one-hot-encoding

Search IconIcon to open search

Parent: data-representation

Source: google-ml-course

One-hot encoding

  • We can assign each class to a unique coefficient, i.e. maps a category to a number.
  • The problem with this (using integer coefficients)
    • These coefficients get bundled in with the ML math and the weights learned are specific to the defined coefficient code.
      $$ \text{model} = \sum \text{weight} * \text{feature}$$
      The weight for the specified feature vector is multiplied with all values of the feature, in this case the coefficients… what happens if we add a new class and/or change the coefficient code?
    • No accounting for data which belong to multiple classes.
  • Solution: transform to a binary vector notation
  • Useful for sparse, categorical data.
  • Multi-hot encoding when multiple values are allowed to be one.