Data Science Hub

Scalers: Standard vs MinMax

Last updated 1 year ago


Scaling is an essential step in data preprocessing. It puts features on comparable scales so that models sensitive to feature magnitude do not let large-valued features dominate, which often leads to more accurate predictions. The two most common scaling techniques are:

  • Standardization (also known as Z-scoring or Z-score Normalization)

  • MinMax Scaling (also known as Normalization or Rescaling)

The choice between the two depends on the nature of your data and the specific requirements of your machine learning algorithm.

Standardization (Z-score Normalization)

  • Subtracts the mean and divides by the standard deviation for each feature

  • Resulting distribution has a mean of 0 and a standard deviation of 1

  • Useful when features have different units or scales

  • Preserves the shape of the original distribution

  • Related module for standardization is scikit-learn's StandardScaler.
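The bullets above amount to applying z = (x - mean) / std to each feature. A minimal pure-Python sketch follows; the function name `standardize` and the sample values are illustrative assumptions, not part of any library. It divides by the population standard deviation (n in the denominator), which is also the convention StandardScaler follows.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column (heights in cm).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
print([round(v, 3) for v in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The scaled values keep their relative spacing, which is why the shape of the distribution is preserved; only the location and scale change.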

RobustScaler, another scikit-learn module for scaling data, operates similarly to StandardScaler and ensures that the features are on the same scale. The difference is that RobustScaler uses the median and the quartiles (i.e., percentiles) instead of the mean and standard deviation, so it is far less influenced by a few very large values (outliers) in the dataset.
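The median-and-quartiles idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's implementation: the function name `robust_scale` is hypothetical, and the "inclusive" quartile rule from Python's `statistics` module is one of several possible interpolation conventions.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    # Fall back to 1.0 so a zero IQR (constant middle half) does not divide by 0.
    iqr = (q3 - q1) or 1.0
    med = median(values)
    return [(v - med) / iqr for v in values]

# Hypothetical feature with one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(robust_scale(data))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier stays far away after scaling, but it does not distort the scale used for the other four points, because the median (3.0) and the IQR (2.0) ignore it.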

Min-Max Scaling (Normalization)

  • Subtracts the Min and divides by the range (Max - Min) for each feature

  • Rescales each feature to a common range (usually between 0 and 1)

  • Useful when features have different ranges or units

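  • Sensitive to outliers: a single extreme value sets the Min or Max and compresses all other values into a narrow part of the range

  • Preserves the shape of the original distribution, since it is a linear transformation; only the location and scale change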

  • Related module for normalization is scikit-learn's MinMaxScaler.

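  • A related scaler, MaxAbsScaler, divides each feature by its maximum absolute value, so its output range depends on the signs in the data. If only positive values are present, the values fall within [0, 1] (the same interval MinMaxScaler targets by default).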

  • If only negative values are present, the range is [-1, 0].

  • If both negative and positive values are present, the range is [-1, 1].
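The Min-Max formula described in this section, x' = (x - min) / (max - min), can be sketched as follows. The function name `min_max_scale` and the sample data are illustrative assumptions; the optional `feature_range` argument is named after the MinMaxScaler parameter of the same name and stretches the result to a range other than [0, 1].

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values linearly so min maps to the low bound and max to the high bound."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature has no range; map everything to the low bound.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]

# Hypothetical feature column (ages in years).
ages = [18.0, 30.0, 42.0, 66.0]
print(min_max_scale(ages))                # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(ages, (-1.0, 1.0)))   # [-1.0, -0.5, 0.0, 1.0]
```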

Key differences and nuances

  • Standardization is sensitive to outliers, as it uses the mean and standard deviation, which extreme values can shift and inflate.

  • Min-Max Scaling is not robust to outliers either. The output is bounded between 0 and 1, but the bounds are set by the minimum and maximum alone, so a single outlier squeezes all other values into a small interval. When outliers are a concern, RobustScaler is the safer choice.

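  • Standardization is more suitable for algorithms that work best with centered features of comparable variance, such as Principal Component Analysis (PCA), Support Vector Machines (SVMs) with an RBF kernel, and L1- or L2-regularized linear models.

  • Min-Max Scaling is more suitable when inputs must lie in a bounded range, for example neural networks or image pixel intensities. Tree-based models such as Decision Trees are largely insensitive to feature scaling, so neither method is required for them.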

  • If you need values in a fixed, bounded range and your data has no extreme outliers, Min-Max Scaling might be more appropriate. If your features are roughly bell-shaped or the algorithm does not require bounded inputs, Standardization is usually a safe default.
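To make the outlier behaviour concrete, the sketch below scales the same four "normal" values with and without one extreme value and reports how wide an interval those four values occupy afterwards. All function names and the income figures are illustrative assumptions, and the median/IQR variant is a simplified stand-in for robust scaling rather than a library implementation.

```python
from statistics import fmean, pstdev, median, quantiles

def z_scores(values):
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]

def robust(values):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]

def inlier_spread(scale, values, n_inliers):
    """Scale all values, then measure the width covered by the first n_inliers points."""
    scaled = scale(values)[:n_inliers]
    return max(scaled) - min(scaled)

inliers = [30.0, 35.0, 40.0, 45.0]      # hypothetical incomes (thousands)
with_outlier = inliers + [1000.0]       # same data plus one extreme value

for name, scale in [("min-max", min_max), ("z-score", z_scores), ("median/IQR", robust)]:
    clean = inlier_spread(scale, inliers, len(inliers))
    dirty = inlier_spread(scale, with_outlier, len(inliers))
    print(f"{name:>10}: inlier spread {clean:.3f} -> {dirty:.3f}")
```

On this toy data, the spread of the four ordinary values collapses by more than an order of magnitude under both min-max scaling and standardization once the outlier is added, while the median/IQR version changes only modestly.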

Conclusion

Scaling helps to:

  • Reduce feature dominance

  • Improve model performance

  • Enhance generalization

  • Identify patterns and relationships

  • Prepare data for complex models

  • Create informative visualizations

  • Create new features

In conclusion, scaling is a crucial preprocessing step that ensures machine learning models treat all features equally, leading to more accurate predictions.

Further Reading

Another module that is similar to MinMaxScaler is MaxAbsScaler, which maps the original values to different ranges depending on whether the dataset has negative or positive values.
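The sign-dependent ranges follow from dividing each feature by its largest absolute value. A minimal sketch, with a hypothetical function name `max_abs_scale` and made-up inputs:

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the feature."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        # All zeros: nothing to scale.
        return list(values)
    return [v / peak for v in values]

print(max_abs_scale([2.0, 4.0, 8.0]))        # only positive -> [0.25, 0.5, 1.0]
print(max_abs_scale([-8.0, -4.0, -2.0]))     # only negative -> [-1.0, -0.5, -0.25]
print(max_abs_scale([-4.0, 0.0, 2.0, 8.0]))  # mixed signs   -> [-0.5, 0.0, 0.25, 1.0]
```

Because nothing is subtracted, zeros stay zero and signs are preserved. Unlike Min-Max Scaling, the smallest positive value is not forced to 0.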

In addition to the techniques mentioned above, there are other scaling methods, including but not limited to Normalizer (rescaling the vector for each sample to have unit norm), log scaling (useful for skewed distributions), feature clipping (capping all feature values above or below a certain threshold at a fixed value), and custom scaling to a specific range. Scaling is a general data preprocessing technique used in both supervised and unsupervised learning contexts, and it benefits a wide variety of applications.
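Three of the methods named above can be sketched in a few lines each. The function names, thresholds, and inputs below are illustrative assumptions; the unit-norm example uses the L2 (Euclidean) norm, which is one common choice among several.

```python
import math

def l2_normalize(sample):
    """Rescale one sample (a row of feature values) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample] if norm else list(sample)

def log_scale(values):
    """Compress a right-skewed feature with log(1 + x); requires x > -1."""
    return [math.log1p(v) for v in values]

def clip(values, lower, upper):
    """Cap values below `lower` or above `upper` at those bounds."""
    return [min(max(v, lower), upper) for v in values]

print(l2_normalize([3.0, 4.0]))               # [0.6, 0.8]
print(clip([-5.0, 0.5, 12.0], 0.0, 10.0))     # [0.0, 0.5, 10.0]
print(log_scale([0.0, 9.0, 99.0, 999.0]))     # grows by about 2.3 per tenfold increase
```

Note that unit-norm scaling works per sample (row), whereas standardization and Min-Max Scaling work per feature (column).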

This article from scikit-learn is a good source that compares the effect of different scaling methods on a dataset with outliers.
