scaling-features


Parent: data-representation

Source: google-ml-course

Scaling/normalising feature values

  • No benefit if the feature set contains only a single feature
  • Can be beneficial for a multi-feature set
    • Helps gradient descent converge more quickly
    • Helps avoid values blowing up to NaN during training
    • Makes the model spend comparable effort learning each feature; without scaling, features with a larger range of values dominate the learning
  • Scaling methods (see the code sketch after this list)
    • Map [min, max] to [-1, +1]
    • Use Z scores
      $$ Z = \frac{x - \mu}{\sigma} $$
      which results in most values lying within [-3, +3]
    • Log scaling for data where min and max differ by several orders of magnitude (the result may still be skewed)
  • Clipping/capping at a maximum value (values exceeding the cap are not thrown out; instead, the outliers are mapped to the cap itself)
(Figure: the same distribution shown original, after log scaling, and after clipping.)
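
A minimal NumPy sketch of the methods above, applied to a synthetic long-tailed feature; the lognormal data and the 99th-percentile cap are illustrative assumptions, not from the source:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical example: a single feature with a long right tail,
# e.g. prices drawn from a lognormal distribution (illustrative only).
x = rng.lognormal(mean=10.0, sigma=1.0, size=1_000)

# Linear scaling: map [min, max] onto [-1, +1].
x_minmax = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

# Z-score scaling: most values end up roughly within [-3, +3].
x_z = (x - x.mean()) / x.std()

# Log scaling: compresses data spanning several orders of magnitude.
x_log = np.log(x)

# Clipping: outliers are mapped to the cap, not discarded.
# The 99th-percentile cap is an illustrative choice, not from the source.
cap = np.percentile(x, 99)
x_clipped = np.minimum(x, cap)

for name, v in [("original", x), ("min-max", x_minmax),
                ("z-score", x_z), ("log", x_log), ("clipped", x_clipped)]:
    print(f"{name:8s} range: [{v.min():12.2f}, {v.max():12.2f}]")
```

Printing the ranges makes the effect of each method visible: min-max lands exactly in [-1, +1], z-scores cluster near zero, log scaling shrinks the spread to a few units, and clipping leaves values below the cap untouched.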