RoPE: Rotary Positional Embeddings

Rotary Positional Embeddings (RoPE), proposed in 2021, are swiftly making their way into prominent language models such as Google’s PaLM and Meta’s LLaMA. RoPE is a new type of positional encoding that unifies the absolute and relative positional encoding approaches. Rotary Positional Encoding is a type of position encoding that encodes
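Here is a minimal NumPy sketch of the rotation. It assumes the RoFormer formulation, where consecutive coordinate pairs $(x_{2i}, x_{2i+1})$ are rotated by the angle $m\theta_i$ with $\theta_i = 10000^{-2i/d}$. The function name `rope` and the NumPy setting are illustrative choices, not the post's code. The demo checks the property that makes RoPE "relative": the dot product of a rotated query and key depends only on the offset between their positions.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive pairs (x[2i], x[2i+1]) of x by the angle pos * theta_i."""
    d = x.shape[-1]
    assert d % 2 == 0, "embedding dimension must be even"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)      # one rotation frequency per coordinate pair
    angle = pos * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = x_even * cos - x_odd * sin   # 2-D rotation, first coordinate
    out[..., 1::2] = x_even * sin + x_odd * cos   # 2-D rotation, second coordinate
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
s_a = rope(q, 5) @ rope(k, 2)    # query at position 5, key at 2 (offset 3)
s_b = rope(q, 10) @ rope(k, 7)   # query at position 10, key at 7 (offset 3)
print(np.isclose(s_a, s_b))      # True
```

Because each step is a rotation, vector norms are unchanged and position 0 is the identity. Some implementations pair the first half of the vector with the second half instead of interleaving coordinates. The relative-position property holds either way.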


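The most commonly used activation function in modern LLMs such as PaLM and LLaMA.

    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLU(nn.Module):
        def __init__(self, w1, w2, w3):
            super().__init__()
            # w1, w2, w3 are nn.Linear modules; forward uses only their weight matrices
            self.w1 = w1
            self.w2 = w2
            self.w3 = w3

        def forward(self, x):
            x1 = F.linear(x, self.w1.weight)  # bias-free projection of the input
            x2 = F.linear(x, self.w2.weight)  # second bias-free projection of the input
            hidden = F.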
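The excerpt stops before the gating step, so here is a self-contained NumPy sketch of a SwiGLU feed-forward layer. It follows the GLU-variant formula $(\mathrm{Swish}_1(xW_1) \odot xW_2)\,W_3$ from Shazeer (2020). Which weight plays which role is an assumption, because implementations name them differently. The weights here are plain `(in_features, out_features)` matrices rather than `nn.Linear` modules.

```python
import numpy as np

def silu(z):
    # SiLU / Swish-1: z * sigmoid(z), written as z / (1 + exp(-z))
    return z / (1.0 + np.exp(-z))

def swiglu_ffn(x, w1, w2, w3):
    """SwiGLU feed-forward: (silu(x @ w1) * (x @ w2)) @ w3.

    Assumed roles: w1 is the gated branch, w2 the linear branch,
    w3 the output projection back to the model dimension.
    """
    gate = silu(x @ w1)    # (batch, d_hidden)
    value = x @ w2         # (batch, d_hidden)
    return (gate * value) @ w3

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 8
x = rng.standard_normal((2, d_model))
w1 = rng.standard_normal((d_model, d_hidden))
w2 = rng.standard_normal((d_model, d_hidden))
w3 = rng.standard_normal((d_hidden, d_model))
y = swiglu_ffn(x, w1, w2, w3)
print(y.shape)  # (2, 4)
```

The elementwise product lets one linear branch act as a learned, smooth gate on the other. This is why SwiGLU layers carry three weight matrices instead of the two in a standard FFN.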

Bayesian Data Analysis: Basics

The three steps of Bayesian data analysis:
1. Setting up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem.
2. Conditioning on observed data: calculating and interpreting the appropriate posterior distribution, that is, the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.
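Here is a minimal sketch of the first two steps. It assumes a Beta-Binomial model chosen purely for illustration; this model is not from the original post. The model is $\theta \sim \mathrm{Beta}(a, b)$ and $y \mid \theta \sim \mathrm{Binomial}(n, \theta)$. Conditioning on $y$ gives $\theta \mid y \sim \mathrm{Beta}(a + y, b + n - y)$. The sketch also includes a brute-force grid version of Bayes' rule, which confirms that the closed form describes the same posterior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

def posterior(prior: Beta, successes: int, trials: int) -> Beta:
    # Step 1 (model): theta ~ Beta(a, b), y | theta ~ Binomial(trials, theta)
    # Step 2 (conditioning): by conjugacy, theta | y ~ Beta(a + y, b + trials - y)
    return Beta(prior.a + successes, prior.b + trials - successes)

def grid_posterior_mean(prior_pdf, successes, trials, points=10_000):
    # Bayes' rule on a grid: posterior is proportional to prior * likelihood.
    # The binomial coefficient does not depend on theta, so it cancels.
    thetas = [(j + 0.5) / points for j in range(points)]
    weights = [prior_pdf(t) * t**successes * (1 - t) ** (trials - successes) for t in thetas]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)

post = posterior(Beta(1, 1), successes=7, trials=10)   # uniform prior, 7 of 10
print(post, post.mean)  # Beta(a=8, b=4) 0.6666666666666666
print(grid_posterior_mean(lambda t: 1.0, 7, 10))       # approximately 0.6667
```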

Graph: Train, valid, and test dataset split for link prediction

Link Prediction

Link prediction is a common task in knowledge graph completion. It is usually an unsupervised or self-supervised task, which means that we often need to split the dataset and create the corresponding labels on our own.

How do we prepare the train, valid, and test datasets? For link prediction, we will split edges twice
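Here is a minimal pure-Python sketch of one random edge split with negative sampling. It does not reproduce the post's two-stage split, and the function names are hypothetical. Held-out true edges get label 1 and sampled non-edges get label 0, which is how the labels end up being created "on our own". For PyG users, `torch_geometric.transforms.RandomLinkSplit` offers a library implementation of this kind of split.

```python
import random

def split_edges(edges, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Randomly partition an edge list into disjoint train/valid/test lists."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_val = int(len(edges) * val_ratio)
    n_test = int(len(edges) * test_ratio)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]
    return train, val, test

def sample_negatives(num_nodes, existing, k, seed=0):
    """Sample k distinct node pairs (u, v), u != v, that are not edges (label 0).

    Assumes at least k non-edges exist; otherwise the loop would not terminate.
    """
    rng = random.Random(seed)
    existing = set(existing)
    negatives = set()
    while len(negatives) < k:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    return sorted(negatives)

edges = [(i, (i + 1) % 10) for i in range(10)]   # directed ring with 10 edges
train, val, test = split_edges(edges)
neg_val = sample_negatives(10, edges, k=len(val), seed=1)
print(len(train), len(val), len(test))  # 8 1 1
```

The sampler checks negatives against all true edges, not only training edges, so a held-out positive can never be labeled 0. It treats edges as directed; for an undirected graph you would also reject $(v, u)$ when $(u, v)$ is an edge.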

Graph: Mini-batch sampling in large-scale graphs

Mini-batch Sampling

Real-world graphs can be very large, with millions or even billions of nodes and edges, so the naive full-batch implementation of a GNN is not feasible on such large-scale graphs. Two frequently used methods are summarized here: Neighbor Sampling (Hamilton et al. (2017)), available as torch_geometric.loader.NeighborLoader; and Cluster-GCN (Chiang et al.
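Here is a pure-Python sketch of GraphSAGE-style neighbor sampling, assuming the graph is stored as an adjacency-list dict. `sample_neighbors` and its `fanouts` argument are hypothetical names. They are conceptually similar to the per-hop `num_neighbors` list that `NeighborLoader` accepts. Starting from the mini-batch's target nodes, each hop samples at most `fanouts[h]` neighbors per frontier node. This caps the sampled computation graph at roughly `len(seeds) * prod(fanouts)` edges per batch, however large the full graph is.

```python
import random

def sample_neighbors(adj, seeds, fanouts, seed=0):
    """Sample a bounded multi-hop neighborhood around the seed nodes.

    adj: dict mapping node -> list of neighbor nodes
    seeds: target nodes of the mini-batch
    fanouts: maximum neighbors sampled per node at each hop, e.g. [2, 2]
    Returns one list of (src, dst) edges per hop; messages flow src -> dst.
    """
    rng = random.Random(seed)
    frontier = list(seeds)
    hops = []
    for k in fanouts:
        edges = []
        next_frontier = set()
        for dst in frontier:
            nbrs = adj.get(dst, [])
            picked = nbrs if len(nbrs) <= k else rng.sample(nbrs, k)
            for src in picked:
                edges.append((src, dst))
                next_frontier.add(src)
        hops.append(edges)
        frontier = sorted(next_frontier)   # sampled nodes become the next hop's targets
    return hops

adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
hops = sample_neighbors(adj, seeds=[0], fanouts=[2, 2])
print(hops[0])  # two sampled edges pointing into node 0
```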

Graph: Implement a MessagePassing layer in PyTorch Geometric

How do you implement a custom MessagePassing layer in PyTorch Geometric (PyG)? Before you start, here is something you need to know.

Special arguments: e.g. x_j, x_i, edge_index_j, edge_index_i
aggregate: scatter_add, scatter_mean, scatter_min, scatter_max
The PyG MessagePassing framework only works for node_graph.

    x = ...  # Node features of shape [num_nodes, num_features]
    edge_index = .
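The NumPy sketch below shows what one propagation step computes under PyG's default `flow="source_to_target"`. `x_j` gathers the features of each edge's source node (`edge_index[0]`), and the aggregation scatters them onto the target nodes (`edge_index[1]`). The `propagate` function here is a hypothetical stand-in, not PyG's method. In PyG itself you would do the following:

- subclass `MessagePassing`
- pass `aggr="add"` or `aggr="mean"` to `super().__init__`
- override `message(self, x_j)`
- call `self.propagate(edge_index, x=x)`

```python
import numpy as np

def propagate(x, edge_index, aggr="add"):
    """Mimic one message-passing step with flow='source_to_target'.

    x: node features of shape [num_nodes, num_features]
    edge_index: int array of shape [2, num_edges]; row 0 = sources j, row 1 = targets i
    """
    src, dst = edge_index
    x_j = x[src]                          # message(): here simply the source features
    out = np.zeros_like(x, dtype=float)
    np.add.at(out, dst, x_j)              # scatter_add onto target nodes
    if aggr == "mean":
        count = np.bincount(dst, minlength=x.shape[0]).astype(float)
        out /= np.maximum(count, 1.0)[:, None]   # nodes with no incoming edges stay 0
    return out

x = np.array([[1.0], [2.0], [4.0]])
edge_index = np.array([[0, 1, 2],
                       [1, 2, 1]])        # edges 0->1, 1->2, 2->1
print(propagate(x, edge_index).ravel())           # [0. 5. 2.]
print(propagate(x, edge_index, "mean").ravel())   # [0.  2.5 2. ]
```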

Graph: Concepts

Basics

Definition
Graph: $G = (V, E)$
Adjacency matrix: $A$
Degree: $D(v)$, the number of nodes that are adjacent to $v$.
Neighbors: $N(v)$; the number of neighbors $|N(v_i)|$ is equal to the degree $D(v_i)$.

Connectivity
Walk: a walk on a graph is an alternating sequence of nodes and edges, starting with a node and ending with a node, where each edge is incident with the nodes immediately preceding and following it.
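The following small NumPy check applies these definitions to an undirected simple graph; the example graph is made up for illustration. The degree of $v_i$ is the $i$-th row sum of $A$, and it equals the number of neighbors. A walk is valid exactly when each consecutive pair of nodes is joined by an edge. Checking a node sequence is enough here because a simple graph has at most one edge between any two nodes.

```python
import numpy as np

def degrees(A):
    # Degree of v_i = number of adjacent nodes = i-th row sum of A (undirected, simple graph)
    return A.sum(axis=1)

def neighbors(A, i):
    # Indices of nodes adjacent to v_i
    return np.flatnonzero(A[i])

def is_walk(A, nodes):
    """True if every consecutive pair of nodes in the sequence is joined by an edge."""
    return all(A[u, v] != 0 for u, v in zip(nodes, nodes[1:]))

# Path 0 - 1 - 2 plus an extra edge 1 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])
print(degrees(A))              # [1 3 1 1]
print(neighbors(A, 1))         # [0 2 3]
print(is_walk(A, [0, 1, 2, 1, 3]), is_walk(A, [0, 2]))  # True False
```

The walk `[0, 1, 2, 1, 3]` revisits node 1. This is allowed because a walk, unlike a path, may repeat nodes and edges.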

Graph: GraphRNN

Why is it interesting?
- Drug discovery
  - discover highly drug-like molecules
  - complete an existing molecule to optimize a desired property
- Discovering novel structures
- Network science

Why is it hard?
- Large and variable output
- Non-unique representations: an $n$-node graph can be represented in $n!$ ways
  - Hard to compute/optimize objective functions
- Complex dependencies: edge formation has long-range dependencies

Graph Generative Model
Given: graphs sampled from $p_{data}(G)$
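The sketch below makes the $n!$ figure concrete. It enumerates every node ordering of a 3-node path graph and builds the adjacency matrix that each ordering induces. All $3! = 6$ orderings describe the same graph. Because the path is symmetric, some orderings coincide and only 3 distinct matrices appear. A generative model still has to cope with every ordering, which is what makes an objective like reconstruction error hard to define.

```python
from itertools import permutations

def adjacency(n, edges, order):
    """Adjacency matrix (tuple of tuples) when node order[k] is placed at row/column k."""
    pos = {node: k for k, node in enumerate(order)}
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[pos[u]][pos[v]] = A[pos[v]][pos[u]] = 1   # undirected edge
    return tuple(tuple(row) for row in A)

n = 3
edges = [(0, 1), (1, 2)]                  # path graph 0 - 1 - 2
orderings = list(permutations(range(n)))  # n! node orderings
matrices = {adjacency(n, edges, order) for order in orderings}
print(len(orderings), len(matrices))      # 6 3
```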