Graph Augmentations using PyG
Graph Structure Agumentation Half-Hop HalfHop adds a “slow node” to all edges with some probability p . Note that these slow nodes have averaged features from the parent nodes, and additionally are undirected.
Virtual Node VirtualNode (Gilmer 2017) appends a virtual node to the given homogeneous graph that is connected to all other nodes.
Rotary Positional Embeddings, proposed in 2022, this innovation is swiftly making its way into prominent language models like Google’s PaLM and Meta’s LLaMa. RoPE is a new type of positional encoding that unifies absolute and relative positional encoding approaches Rotary Positional Encoding is a type of position encoding that encodes
The most commonly used activation function in LLM.
1 2 3 4 5 6 7 8 9 10 11 12 class SwiGLU(nn.Module): def __init__(self, w1, w2, w3): super.__init__() self.w1 = w1 self.w2 = w2 slef.w3 = w3 def forward(self, x): x1 = F.linear(x, self.w1.weight) x2 = F.linear(x, self.w2.weight) hidden = F.
MuDataSeurat and sceasy are recommended
MuDataSeurat Recommended!!!
MuDataSeurat directly writes h5ad file without requiring Python runtime. All dependencies exist in R and can be easily installed and used.
Install I have added some extra features. Please Install my fork which works for anndata >=0.8 and Seurat V5.
Refer to MuDataSeurat
The three steps of Bayesian data analysis Setting up a full probability model—a joint probability distribution for all observable and unobservable quantities in a problem.
Conditioning on observed data: calculating and interpreting the appropriate posterior distribution—the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.
Link Prediction Link prediction is a common task in knowledgegraph’s link completeion. Link prediction is usually an unsupervised or self-supervised task, which means that sometimes we need to split the dataset and create corresponding labels on our own. How to prepare train, valid, test datasets ? For link prediction, we will split edges twice
Mini-batch Sampling Real world graphs can be very large with millions or even billions of nodes and edges. But the naive full-batch implementation of GNN cannot be feasible to these large-scale graphs.
Two frequently used methods are summarized here:
Neighbor Sampling (Hamilton et al. (2017)) torch_geometric.loader.NeighborLoader Cluster-GCN (Chiang et al.
How to implement a custom MessagePassing layer in Pytorch Geometric (PyG) ?
Before you start, something you need to know.
special_arguments: e.g. x_j, x_i, edge_index_j, edge_index_i aggregate: scatter_add, scatter_mean, scatter_min, scatter_max PyG MessagePassing framework only works for node_graph. 1 2 3 4 5 x = ... # Node features of shape [num_nodes, num_features] edge_index = .
Basics.
Definition Graph: $G(V, E)$ Adjacency Matrix: $A$ Degree: $D$, the number of nodes that are adjacent to $v$. Neighbors: $N$, the number of $N_{v(i)}$ is equal to $D_{v(i)}$. Connectivity Walk A walk on a graph is an alternating sequence of nodes and edges, starting with a node and ending with a node where each edge is incident with the nodes immediately preceding and following it.
Why is it interesting Drug discovery discovery highly drug-like molecules complete an existing molecule to optimize a desired property Discovering novel structures Network science Why is it hard Large and variable output Non-unique representations $n$-node graph can be represented in $n!$ ways Hard to compute/optimize objective functions Complex dependencies edge fprmation has long-range dependencies Graph Generative Model Given: Graphs sampled from $p_{data}(G)$