
Hi, there

Bayesian Data Analysis: Basics

The three steps of Bayesian data analysis: (1) setting up a full probability model, a joint probability distribution for all observable and unobservable quantities in a problem; (2) conditioning on observed data: calculating and interpreting the appropriate posterior distribution, the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data; (3) evaluating the fit of the model and the implications of the resulting posterior distribution.
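To make these steps concrete, here is a minimal sketch, assuming a made-up Beta-Binomial toy model (the counts below are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 -- full probability model (a made-up Beta-Binomial toy example):
#   theta ~ Beta(1, 1),  y | theta ~ Binomial(n, theta)
n, y = 20, 13                     # hypothetical observed data
# Step 2 -- condition on the data: the posterior is Beta(1 + y, 1 + n - y)
theta = rng.beta(1 + y, 1 + n - y, size=10_000)
# Step 3 -- evaluate the fit, e.g. via posterior predictive draws
y_rep = rng.binomial(n, theta)
print(theta.mean(), (y_rep >= y).mean())
```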

Graph: Train, valid, and test dataset split for link prediction

Link Prediction Link prediction is a common task in knowledge graph completion. It is usually an unsupervised or self-supervised task, which means that we sometimes need to split the dataset and create the corresponding labels ourselves. How do we prepare the train, valid, and test datasets? For link prediction, we will split the edges twice.
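A minimal sketch of such a split, assuming PyG's RandomLinkSplit transform and a made-up random toy graph:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.transforms import RandomLinkSplit

# A made-up toy graph: 20 nodes with random directed edges.
data = Data(x=torch.randn(20, 16),
            edge_index=torch.randint(0, 20, (2, 60)))

# First split: train/valid/test edge sets. Within each set, edges are
# further divided into message-passing edges (edge_index) and labeled
# supervision edges, including sampled negatives (edge_label_index).
transform = RandomLinkSplit(num_val=0.1, num_test=0.2,
                            add_negative_train_samples=True)
train_data, val_data, test_data = transform(data)
```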

Graph: Mini-batch sampling in large-scale graphs

Mini-batch Sampling Real-world graphs can be very large, with millions or even billions of nodes and edges, and the naive full-batch implementation of a GNN is not feasible on graphs of this scale. Two frequently used methods are summarized here: Neighbor Sampling (Hamilton et al. (2017)) torch_geometric.loader.NeighborLoader Cluster-GCN (Chiang et al. (2019))
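A minimal sketch of the first method, assuming PyG's NeighborLoader and a made-up graph (the fan-outs and batch size are arbitrary):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# A made-up graph standing in for a large one.
data = Data(x=torch.randn(1000, 32),
            edge_index=torch.randint(0, 1000, (2, 5000)))

# Sample at most 10 neighbors per node in the first hop and 5 in the
# second, yielding small subgraphs around 128 seed nodes per batch
# instead of the full graph.
loader = NeighborLoader(data, num_neighbors=[10, 5], batch_size=128,
                        shuffle=True)

for batch in loader:
    # batch is a subgraph; its first batch.batch_size nodes are the seeds
    print(batch.num_nodes)
    break
```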

Graph: Implement a MessagePassing layer in PyTorch Geometric

How to implement a custom MessagePassing layer in PyTorch Geometric (PyG)? Before you start, there are a few things you need to know. special_arguments: e.g. x_j, x_i, edge_index_j, edge_index_i. aggregate: scatter_add, scatter_mean, scatter_min, scatter_max. The PyG MessagePassing framework only works for node graphs.

```python
x = ...           # Node features of shape [num_nodes, num_features]
edge_index = ...  # Edge indices of shape [2, num_edges]
```
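Building on that, a minimal sketch of a custom layer with GCN-style normalization, in the spirit of the PyG documentation (the class name SimpleConv is made up):

```python
import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class SimpleConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # aggregate messages with scatter_add
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        # x: [num_nodes, in_channels], edge_index: [2, num_edges]
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
        x = self.lin(x)
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]
        # propagate() calls message(), aggregates, then updates node states
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j is a special argument: source-node features lifted to edges
        return norm.view(-1, 1) * x_j
```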

Graph: Concepts

Basics. Definition Graph: $G(V, E)$. Adjacency Matrix: $A$. Degree: $D_v$, the number of nodes that are adjacent to $v$. Neighbors: $N_v$, the set of nodes adjacent to $v$, so $|N_{v(i)}|$ is equal to $D_{v(i)}$. Connectivity Walk A walk on a graph is an alternating sequence of nodes and edges, starting with a node and ending with a node, where each edge is incident with the nodes immediately preceding and following it.
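To make these definitions concrete, a small NumPy sketch on a made-up 4-node graph:

```python
import numpy as np

# Adjacency matrix A of a made-up undirected graph G(V, E) with 4 nodes.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

D = A.sum(axis=1)                                     # degree of each node
N = [np.flatnonzero(A[v]) for v in range(len(A))]     # neighbor sets N_v
assert all(len(N[v]) == D[v] for v in range(len(A)))  # |N_v| == D_v
```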

Graph: GraphRNN

Why is it interesting Drug discovery: discovering highly drug-like molecules, or completing an existing molecule to optimize a desired property. Discovering novel structures. Network science. Why is it hard Large and variable output. Non-unique representations: an $n$-node graph can be represented in $n!$ ways, which makes objective functions hard to compute and optimize. Complex dependencies: edge formation has long-range dependencies. Graph Generative Model Given: graphs sampled from $p_{data}(G)$
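To see the non-uniqueness concretely, a small sketch counting the distinct adjacency matrices that encode a single 3-node path graph:

```python
import numpy as np
from itertools import permutations

# A 3-node path graph; relabeling its nodes permutes the adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

reps = set()
for p in permutations(range(3)):
    P = np.eye(3, dtype=int)[list(p)]   # permutation matrix for ordering p
    reps.add((P @ A @ P.T).tobytes())
print(len(reps))  # 3 distinct matrices, all encoding the same graph
```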

Graph: GCN and GAT

Graph Convolutional Network and Graph Attention Why deep graph encoder ? Limitations of shallow encoders (e.g. node2vec): $O(|V|)$ parameters are needed, since no parameters are shared between nodes and every node has its own unique embedding; they are inherently “transductive” and cannot generate embeddings for nodes that are not seen during training; and they do not incorporate node features, even though many graphs have features that we can and should leverage. Graph Convolutional Network Could get embeddings for unseen nodes!
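A minimal two-layer GCN sketch with PyG's GCNConv (the module name and dimensions are made up):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    # Parameters live in the conv layers and are shared across all nodes,
    # so the trained model can also embed nodes unseen during training.
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
```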

Graph: Semi-supervised Node Classification

Problems: Given a network with labels on some nodes, how do we assign labels to all the other nodes in the network? The classification label of an object $O$ in the network may depend on: the features of $O$, the labels of the objects in $O$'s neighborhood, and the features of the objects in $O$'s neighborhood. Collective classification models: relational classifiers, iterative classification, loopy belief propagation. Intuition Simultaneous classification of interlinked nodes using correlations
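As a rough sketch of the relational-classifier intuition, each unlabeled node can repeatedly average its neighbors' label probabilities (the toy graph and seed labels below are made up):

```python
import numpy as np

# Adjacency matrix of a made-up 4-node network.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

labels = {0: 1.0, 3: 0.0}      # known P(label = 1) for some nodes
p = np.full(len(A), 0.5)       # unlabeled nodes start at 0.5
for v, y in labels.items():
    p[v] = y

for _ in range(20):            # iterate until (approximately) stable
    for v in range(len(A)):
        if v not in labels:    # only update the unlabeled nodes
            p[v] = A[v] @ p / A[v].sum()
print(p.round(2))
```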

Graph: Node2Vec

Node embeddings are learned in the same way as word2vec (the skip-gram model). However, graphs could be (un)directed, (un)weighted, (a)cyclic and are basically much more complex than the structure of a sequence… So how do we generate a “corpus” from a graph ? Random walks on the graph: given a graph and a starting point, we select one of its neighbors at random and move to it; then we select a neighbor of that node at random, and move to it, etc.
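A minimal sketch of corpus generation with uniform random walks (the adjacency list and walk parameters are made up; node2vec itself biases the walks with return and in-out parameters $p$ and $q$):

```python
import random

def random_walk(adj, start, length):
    """Uniform first-order random walk: hop to a random neighbor each step."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Made-up adjacency list; each walk plays the role of one "sentence"
# in the corpus that the skip-gram model is trained on.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = [random_walk(adj, v, length=5) for v in adj for _ in range(10)]
```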