RoPE: Rotary Positional Embeddings

Rotary Positional Embeddings (RoPE), proposed in 2021 in the RoFormer paper, are swiftly making their way into prominent language models such as Google’s PaLM and Meta’s LLaMA.

RoPE is a new type of positional encoding that unifies the absolute and relative positional encoding approaches.

Rotary Positional Encoding is a type of position encoding that encodes absolute positional information with a rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation.
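
At the heart of this unification is a simple identity. Writing $R_{\Theta,m}$ for the block-diagonal rotation matrix at position $m$ (notation borrowed loosely from the RoFormer paper), rotating the query and key by their absolute positions leaves the attention score dependent only on their relative offset:

$$\langle R_{\Theta,m}\, q,\; R_{\Theta,n}\, k \rangle \;=\; q^{\top} R_{\Theta,\, n-m}\, k$$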

RoPE provides a flexible mechanism to include positional context in tokens without modifying the original embeddings. The core principle is to rotate the queries and keys in the attention mechanism, with each position in the sequence receiving a unique rotation. This way, the dot product between queries and keys gradually diminishes for tokens that are distant from one another in the sequence, providing an effective way to encode relative positions.
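
To make the relative-position property concrete, here is a minimal NumPy sketch (the helper rope_rotate and its pairing of dimensions are illustrative assumptions, not taken from the snippet further below): it rotates a query and a key by their absolute positions and checks that their dot product depends only on the offset between them.

import numpy as np

def rope_rotate(x, position, base=10000.0):
    d = x.shape[-1]  # assumes an even embedding dimension
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair of dimensions
    angles = position * theta  # rotation angle for each pair at this position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]  # split the vector into (even, odd) pair components
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # standard 2-D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
score_a = rope_rotate(q, 3) @ rope_rotate(k, 7)    # positions 3 and 7, offset 4
score_b = rope_rotate(q, 10) @ rope_rotate(k, 14)  # positions 10 and 14, offset 4
print(np.allclose(score_a, score_b))  # True: the score depends only on the offset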

This approach tends to maintain more of the original token information while still providing the model with an effective way to understand sequence positions.

A minimal NumPy implementation of the sin/cos table looks like this:

import numpy as np

def rotary_positional_embedding(position, d_model):
    freqs = np.exp(np.linspace(0., -1., d_model // 2) * np.log(10000.))  # 1: geometric frequencies from 1 down to 1/10000
    angles = position * freqs  # 2: rotation angle for each frequency at this position
    rotary_matrix = np.stack([np.sin(angles), np.cos(angles)], axis=-1)  # 3: pair the sine and cosine of each angle
    return rotary_matrix.reshape(-1, d_model)  # 4: interleave into a (1, d_model) table of [sin, cos] values
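
As a usage sketch, the table returned above can be used to rotate a query or key vector (the helper apply_rotary and the interleaved sin/cos layout it assumes are illustrative, not part of the original snippet). Because the operation is a pure rotation, the vector’s norm, and hence the original token information, is preserved.

def apply_rotary(x, position):
    # Illustrative helper: rotate pairs (x[2i], x[2i+1]) using the interleaved [sin, cos] table above.
    table = rotary_positional_embedding(position, x.shape[-1])[0]
    sin, cos = table[0::2], table[1::2]
    x1, x2 = x[0::2], x[1::2]
    rotated = np.empty_like(x)
    rotated[0::2] = x1 * cos - x2 * sin
    rotated[1::2] = x1 * sin + x2 * cos
    return rotated

q = np.arange(8, dtype=float)
q_rot = apply_rotary(q, position=5)
print(np.allclose(np.linalg.norm(q), np.linalg.norm(q_rot)))  # True: rotation preserves the norm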

See a good video explanation.

References

  1. The Large Language Model Playbook
  2. Efficient NLP
  3. 让研究人员绞尽脑汁的Transformer位置编码 (Transformer positional encodings that have researchers racking their brains)