RoPE: Rotary Positional Embeddings
Rotary Positional Embeddings (RoPE), proposed in the 2021 RoFormer paper, are swiftly making their way into prominent language models such as Google’s PaLM and Meta’s LLaMA.
RoPE is a positional encoding scheme that unifies the absolute and relative approaches: it encodes absolute position information with a rotation matrix and naturally incorporates explicit relative-position dependency into the self-attention formulation.
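Concretely, for a single pair of feature dimensions, the RoFormer formulation rotates the projected token vector by an angle proportional to its position $m$ (a sketch of the 2-D case; $\theta$ is the frequency assigned to that pair and $W_{\{q,k\}}$ the query/key projection):

$$
f_{\{q,k\}}(x_m, m) = \begin{pmatrix} \cos m\theta & -\sin m\theta \\ \sin m\theta & \cos m\theta \end{pmatrix} W_{\{q,k\}}\, x_m
$$

Because the rotations compose, the attention score between a query at position $m$ and a key at position $n$ depends only on their offset:

$$
\langle f_q(x_m, m),\, f_k(x_n, n) \rangle = g(x_m, x_n, m - n).
$$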
RoPE provides a flexible mechanism for injecting positional context into tokens without modifying the original embeddings. The core principle is to rotate the queries and keys in the attention mechanism, with each position in the sequence receiving a unique rotation. As a result, the dot product between queries and keys depends only on the relative offset between tokens and gradually diminishes as they grow farther apart in the sequence, providing an effective way to encode relative positions.
This approach tends to preserve more of the original token information while still giving the model an effective way to understand sequence positions.
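As a quick toy check of that relative-position property, the snippet below (hypothetical names, plain NumPy, a single 2-D feature pair with one frequency `theta`) rotates a query and a key by their positions and compares the resulting dot products:

```python
import numpy as np

def rotate(vec, pos, theta=0.5):
    """Rotate a 2-D vector by pos * theta radians (the 2-D RoPE building block)."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([[c, -s], [s, c]]) @ vec

q = np.array([0.3, 1.2])   # toy query features for one 2-D pair
k = np.array([0.7, -0.5])  # toy key features

# Same relative offset (3) -> same score, regardless of absolute positions.
print(np.dot(rotate(q, 5), rotate(k, 2)))       # positions 5 and 2
print(np.dot(rotate(q, 105), rotate(k, 102)))   # positions 105 and 102
```

Both lines print the same value: the score is a function of the offset between tokens, not of where they sit in the sequence.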
In code, the implementation looks something like this.
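The following is a minimal PyTorch-style sketch, assuming an interleaved even/odd pairing of the head dimensions and the standard base of 10000 for the frequencies; function names and tensor shapes are illustrative rather than any particular model’s exact code:

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute per-position cos/sin tables, one frequency per pair of dims."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim // 2)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate queries or keys; x has shape (batch, seq_len, n_heads, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]          # split features into (even, odd) pairs
    cos = cos[None, :, None, :]                  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    rotated = torch.stack((x1 * cos - x2 * sin,  # standard 2-D rotation per pair
                           x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                   # re-interleave pairs into head_dim

# Usage with illustrative shapes: rotate queries and keys before the attention dot product.
cos, sin = build_rope_cache(seq_len=128, head_dim=64)
q = torch.randn(2, 128, 8, 64)
k = torch.randn(2, 128, 8, 64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```

Note that only the queries and keys are rotated (the values are left untouched), and the cos/sin cache can be computed once and reused, so RoPE adds no learned parameters.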
See a good video explanation for more intuition.