Feature Scaling

What is Feature Scaling?

Feature scaling is a technique used to standardize the range of independent variables or features of data. It is also known as data normalization or data standardization.

Why is important?

It is crucial because many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and close to normally distributed.

Normal vs Skewed distributions

Working with data that has vastly different scales, like age in years and income in dollars, can lead to biased results in algorithms that rely on distance calculations (e.g., k-nearest neighbors, k-means clustering) or gradient descent optimization (e.g., linear regression, logistic regression).

It also preserves the relationships between features while ensuring that no single feature dominates the learning process due to its scale.

Common Methods

You have different methods for feature scaling, but two of the most common are Min-Max Scaling and Standardization (Z-score Normalization).

Standardization (Z-score Normalization)

Standardization, also known as Z-score normalization, transforms your data so that it has a mean of 0 and a standard deviation of 1.

Let’s say you have a dataset with features like age (20-80) and income ($30k-$200k), as mentioned before. You want to standardize these features to ensure that they contribute equally to the analysis.

Here’s how you can do it in Python using pandas:

# data is your pandas DataFrame or Series
data_scaled = (data - data.mean()) / data.std()

Mathematical explanation

The formula for calculating the Z-score of a value is:

z = \frac{x - μ}{σ}

Where:

z is the standardized value
x is the original value
μ (mu) is the mean of the feature
σ (sigma) is the standard deviation of the feature

What this does

Centering: data - data.mean() shifts the data so that the new mean becomes 0
Scaling: Dividing by data.std() scales the data so that the new standard deviation becomes 1

After this transformation, the scaled data will have:

Mean = 0: The average of all values becomes zero
Standard Deviation = 1: The spread of the data is normalized
Shape Preservation: The distribution shape remains the same, just relocated and rescaled

Example

If you have original values: [1, 2, 3, 4, 5]

Mean (μ) = 3
Standard deviation (σ) ≈ 1.58

After standardization (applying the formula to each value):

(1-3)/1.58 ≈ -1.27
(2-3)/1.58 ≈ -0.63
(3-3)/1.58 = 0
(4-3)/1.58 ≈ 0.63
(5-3)/1.58 ≈ 1.27

So, instead of having values [1, 2, 3, 4, 5], you now have approximately [-1.27, -0.63, 0, 0.63, 1.27].

$ luctre

Recent Posts

The Elm Architecture

Innersourcing

ORMs