Embedding: Maps each token (usually an integer ID) to a dense vector.
Purpose: Gives each token a dense vector representation that captures its semantic and syntactic features. The vector is the token's row of the embedding matrix, which is learned during training.
Embedding Matrix: A learnable parameter of shape (vocabulary size × embedding dimension) that maps each token in the vocabulary to a point in a high-dimensional space. The embedding dimension typically ranges from a few hundred to a few thousand, depending on the model's size and the task.
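A minimal sketch of the lookup, assuming PyTorch; the vocabulary size, embedding dimension, and token IDs below are illustrative values, not taken from any particular model:

```python
import torch
import torch.nn as nn

vocab_size = 50_000   # number of tokens in the vocabulary (assumed value)
d_model = 512         # embedding dimension (assumed value)

# The embedding matrix is a learnable (vocab_size x d_model) parameter.
embedding = nn.Embedding(vocab_size, d_model)

# Token IDs for a short sequence: integers indexing into the vocabulary.
token_ids = torch.tensor([[15, 2024, 7, 99]])   # shape: (1, 4)

# Lookup: each ID selects its dense row of the embedding matrix.
token_vectors = embedding(token_ids)            # shape: (1, 4, 512)
print(token_vectors.shape)
```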
Positional Encoding: Adds information about each token's position in the sequence to its embedding, since the attention mechanism itself has no notion of order. It can be a fixed sinusoidal pattern or a learned vector per position, and is typically added to the token embeddings before the first transformer block.
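A minimal sketch of the fixed sinusoidal variant from the original Transformer paper, again assuming PyTorch (learned positional embeddings are another common choice); seq_len and d_model are illustrative:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) matrix of fixed positional encodings."""
    position = torch.arange(seq_len).unsqueeze(1).float()              # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )                                                                   # (d_model / 2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions use sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions use cosine
    return pe

# Added to the token embeddings so the model can tell positions apart.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=512)
print(pe.shape)   # torch.Size([4, 512])
```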
References:
But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5
Visualizing transformers and attention | Talk for TNG Big Tech Day '24