one-hot-encoding
Parent: data-representation
Source: google-ml-course
One-hot encoding
- We can assign each class a unique integer coefficient, i.e. map each category to a number.
- The problems with this (using integer coefficients):
- The coefficients get bundled into the ML math, so the learned weights are specific to the chosen coefficient code.
$$ \text{model} = \sum \text{weight} \cdot \text{feature} $$
The weight for this feature is multiplied by the coefficient value, so predictions depend on the arbitrary code assignment. What happens if we add a new class and/or change the coefficient code? The learned weights no longer apply.
- No way to represent data which belong to multiple classes at once.
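A minimal sketch of the problem (the color names, codes, and weight are hypothetical, not from the source): the learned weight multiplies the arbitrary integer code, so renumbering the classes silently changes the model's output.

```python
# Hypothetical integer coefficient code for a categorical feature.
codes = {"red": 0, "green": 1, "blue": 2}
w = 0.5  # a weight "learned" against this particular code assignment

def predict(color):
    # model = weight * feature, where the feature is the integer code
    return w * codes[color]

print(predict("blue"))   # 0.5 * 2 = 1.0

# Renumber the classes (e.g. after adding a class or shuffling codes):
codes = {"blue": 0, "red": 1, "green": 2}
print(predict("blue"))   # 0.5 * 0 = 0.0 -- same input, different output
```

The weight only has meaning relative to one fixed code assignment, which is exactly the coupling one-hot encoding removes.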
- Solution: transform each category into a binary vector with a 1 in the position for that category and 0 elsewhere.
- Useful for sparse, categorical data.
- Multi-hot encoding when multiple values are allowed to be one.