embeddings
Source: google-ml-course
Embeddings
- Translate a high-dimensional vector into a lower-dimensional space (the embedding)
- e.g. words, which start out as huge sparse (one-hot) vectors, become dense low-dimensional vectors
- Ideally, similar inputs end up close together in the embedding space, as measured by a similarity metric (e.g. cosine similarity)
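A minimal sketch of one such similarity metric, cosine similarity, assuming plain NumPy vectors (the values are hypothetical):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths;
    # values closer to 1 mean the inputs are more similar.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 2D embeddings of two similar inputs.
print(cosine_similarity(np.array([1.0, 0.5]), np.array([0.9, 0.6])))
```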
Learning embeddings from data
- Embedding layer: a hidden layer with $D$ nodes, where $D$ is the dimension of the embedding space (see the sketch after this list)
- Each of these dimensions is called a latent dimension because it is inferred from the data rather than specified directly
- This hidden layer learns how to cluster the data to optimise the metric we have chosen
- Weights between the input and the embedding layer are the coordinate values of the input within the embedding space
- Hyperparameter: how many embedding dimensions $D$?
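A minimal sketch of such a layer, assuming Keras/TensorFlow and a hypothetical movie-recommendation setup (the course does not prescribe this exact architecture):

```python
import tensorflow as tf

NUM_MOVIES = 1_000_000   # assumed vocabulary size (one integer ID per movie)
D = 32                   # the embedding-dimension hyperparameter

model = tf.keras.Sequential([
    # Maps each movie ID to a D-dimensional vector; the weights of this layer
    # are the coordinates of each movie in the embedding space.
    tf.keras.layers.Embedding(input_dim=NUM_MOVIES, output_dim=D),
    # Average the embeddings of the movies a user has watched.
    tf.keras.layers.GlobalAveragePooling1D(),
    # Logit layer: one score per movie in the catalogue.
    tf.keras.layers.Dense(NUM_MOVIES),
])
```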
Techniques for reducing dimensionality
- Principal component analysis
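A minimal PCA sketch, assuming scikit-learn and a hypothetical data matrix `X` of shape (num_examples, num_features):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 50)        # hypothetical high-dimensional data
pca = PCA(n_components=2)          # keep the 2 strongest principal components
X_embedded = pca.fit_transform(X)  # shape (100, 2)
```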
Example 1
2D embedding for movies:
Each movie is a point in the 2D space.
Input representation: movies watched by different users
For one example user, a dense input representation could be $$ \left[\begin{array}{cccccccc} 0 & 1 & 0 & 1 & \dots & 0 & 0 & 1 \end{array}\right] $$ with one 0/1 entry per movie
For millions of movies it would be inefficient to list out all of the unwatched movies → preprocess the input
Preprocessing
- Build a dictionary mapping each feature to an integer (an integer code for each movie).
- Now the user's input vector can be the sparse representation, listing only the codes of watched movies:
$$\left[\begin{array}{ccc} 1 & 3 & 999999 \end{array}\right]$$
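A minimal sketch of this preprocessing, with a hypothetical movie catalogue and watch history:

```python
# Hypothetical catalogue; in practice this would hold around a million titles.
movie_catalog = ["Movie A", "Movie B", "Movie C", "Movie D"]

# Dictionary mapping each feature (movie) to an integer code.
movie_to_id = {title: i for i, title in enumerate(movie_catalog)}

# A user is represented only by the codes of the movies they watched.
watched = ["Movie B", "Movie D"]   # hypothetical watch history
sparse_input = sorted(movie_to_id[t] for t in watched)
print(sparse_input)                # [1, 3]
```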
Training data
If the user has watched 10 movies:
- Take 3 as inputs
- These are fed through the embedding layer and the rest of the neural network
- The logit layer outputs a prediction of which movies the user would like
- Take the other 7 as ground-truth training labels (for the loss calculation)
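A minimal sketch of building one training example under this 3/7 split (the movie IDs are hypothetical):

```python
import random

watched_ids = [12, 87, 301, 455, 512, 640, 777, 812, 903, 999]  # 10 watched movies
random.shuffle(watched_ids)

input_ids = watched_ids[:3]   # fed through the embedding layer and the network
label_ids = watched_ids[3:]   # ground-truth labels used in the loss calculation
```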
Geometric view
- Each input's weights to the $D$ embedding nodes are its coordinates, so every input is a point in $D$-dimensional space
Example 2
Predicting home sales prices
- Infer the size of the house from words in the listing: ‘roomy’, ‘spacious’, etc.
- Reduce the high-dimensional input vector of words to a lower-dimensional 3D space of house-size categories
- Loss is between predicted price and actual sale price
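A minimal sketch of such a model, assuming Keras/TensorFlow, a hypothetical vocabulary of listing words, and a 3-dimensional embedding:

```python
import tensorflow as tf

VOCAB_SIZE = 10_000  # assumed number of distinct words in listings

model = tf.keras.Sequential([
    # Embed each word ('roomy', 'spacious', ...) into a 3D space.
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=3),
    tf.keras.layers.GlobalAveragePooling1D(),
    # Single output: the predicted sale price.
    tf.keras.layers.Dense(1),
])
# Loss compares the predicted price with the actual sale price.
model.compile(optimizer="adam", loss="mse")
```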