MuDataSeurat and sceasy are recommended
MuDataSeurat (recommended)
MuDataSeurat writes h5ad files directly, without requiring a Python runtime. All of its dependencies are R packages and are easy to install and use.
Refer to MuDataSeurat
Install
I have added some extra features. Please install my fork, which works with anndata >=0.8 and Seurat V5.
The three steps of Bayesian data analysis
Setting up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem.
Conditioning on observed data: calculating and interpreting the appropriate posterior distribution, that is, the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.
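To make the first two steps concrete, here is a minimal sketch using a coin-flip problem. The example itself is an assumption and not part of the text above: the full probability model is a Beta prior on the success probability combined with a Binomial likelihood, and conditioning on the observed counts gives a Beta posterior in closed form. The function names are hypothetical.

```python
from fractions import Fraction


def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Condition a Beta(prior_a, prior_b) prior on Binomial data.

    Step 1 (model): theta ~ Beta(prior_a, prior_b), successes ~ Binomial(trials, theta).
    Step 2 (conditioning): theta | data ~ Beta(prior_a + successes, prior_b + failures).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    failures = trials - successes
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)


# Illustrative data (an assumption): a uniform Beta(1, 1) prior, 7 successes in 10 trials.
post_a, post_b = beta_binomial_posterior(1, 1, successes=7, trials=10)
posterior_mean = beta_mean(post_a, post_b)
```

With these illustrative numbers the posterior is Beta(8, 4), so the posterior mean of the unobserved success probability is 2/3.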
Link Prediction
Link prediction is a common task in knowledge graph completion. It is usually an unsupervised or self-supervised task, which means that we sometimes need to split the dataset and create the corresponding labels ourselves. How do we prepare the train, valid, and test datasets? For link prediction, we will split edges twice
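As a rough sketch of preparing the splits and labels by hand, the code below shuffles the edges, holds out a test set, then holds out a validation set from the remainder, and labels observed edges 1 and sampled non-edges 0. This is only one possible reading of "split edges twice" and is not confirmed by the text above; all function names and ratios are hypothetical, and the graph is treated as undirected.

```python
import random


def split_edges(edges, valid_ratio, test_ratio, seed=0):
    """Shuffle the edges, then split twice: first test, then validation."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    n_valid = int(len(shuffled) * valid_ratio)
    test, rest = shuffled[:n_test], shuffled[n_test:]   # first split
    valid, train = rest[:n_valid], rest[n_valid:]       # second split
    return train, valid, test


def sample_negative_edges(num_nodes, positive_edges, k, seed=0):
    """Sample k node pairs that are not edges (undirected, no self-loops)."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in positive_edges}
    max_negatives = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if k > max_negatives:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair not in existing:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def labelled_examples(positive, negative):
    """Attach label 1 to observed edges and label 0 to sampled non-edges."""
    return [(e, 1) for e in positive] + [(e, 0) for e in negative]


# Toy path graph with 21 nodes and 20 edges (illustrative data only).
edges = [(i, i + 1) for i in range(20)]
train, valid, test = split_edges(edges, valid_ratio=0.25, test_ratio=0.25)
valid_neg = sample_negative_edges(21, edges, k=len(valid))
valid_examples = labelled_examples(valid, valid_neg)
```

Negatives are checked against the full edge list rather than only the training edges, so a held-out positive edge is never labelled 0 by accident.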
Mini-batch Sampling
Real-world graphs can be very large, with millions or even billions of nodes and edges. The naive full-batch implementation of a GNN is not feasible for graphs at this scale.
Two frequently used methods are summarized here:
Neighbor Sampling (Hamilton et al. (2017)): torch_geometric.loader.NeighborLoader
Cluster-GCN (Chiang et al.
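The idea behind neighbor sampling can be sketched in plain Python: starting from a mini-batch of seed nodes, keep at most a fixed number of neighbors per node at each hop, so the computation graph stays bounded no matter how large the full graph is. This is an illustrative sketch only, not how torch_geometric.loader.NeighborLoader is implemented; the function name, the `fanouts` parameter and the adjacency-dict format are assumptions.

```python
import random


def sample_neighborhood(adj, seeds, fanouts, seed=0):
    """Sample a bounded multi-hop neighborhood around the seed nodes.

    adj     : dict mapping each node to a list of its neighbors
    seeds   : nodes of the current mini-batch
    fanouts : maximum number of neighbors kept per node, one entry per hop
    Returns the set of visited nodes and the sampled (source, target) edges.
    """
    rng = random.Random(seed)
    nodes = set(seeds)
    frontier = set(seeds)
    sampled_edges = set()
    for fanout in fanouts:
        next_frontier = set()
        for node in sorted(frontier):
            neighbors = adj.get(node, [])
            if len(neighbors) > fanout:
                chosen = rng.sample(neighbors, fanout)
            else:
                chosen = list(neighbors)
            for nb in chosen:
                sampled_edges.add((nb, node))  # messages flow nb -> node
                if nb not in nodes:
                    next_frontier.add(nb)
        nodes |= next_frontier
        frontier = next_frontier  # only newly reached nodes are expanded
    return nodes, sampled_edges


# Toy graph (illustrative): node 0 is a hub, node 5 hangs off node 1.
adj = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}
nodes, sampled_edges = sample_neighborhood(adj, seeds=[0], fanouts=[2, 1])
```

Here the hub node 0 has four neighbors but only two are kept at the first hop, and each of those contributes at most one neighbor at the second hop.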
How to implement a custom MessagePassing layer in PyTorch Geometric (PyG)?
Before you start, there are a few things you need to know.
special_arguments: e.g. x_j, x_i, edge_index_j, edge_index_i
aggregate: scatter_add, scatter_mean, scatter_min, scatter_max
The PyG MessagePassing framework only works for node_graph.
x = ... # Node features of shape [num_nodes, num_features]
edge_index = .
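To show what the `_j` suffix and a scatter-style aggregation mean, here is a dependency-free sketch of one propagation step. It is not PyG code: `propagate` below is a hypothetical stand-in that assumes the default source-to-target flow, where `edge_index[0]` holds the source nodes `j`, `edge_index[1]` holds the target nodes `i`, and the message is simply `x_j`.

```python
def propagate(x, edge_index, aggr="add"):
    """One message-passing step over plain Python lists.

    x          : node features, a list of [num_features] lists
    edge_index : [sources, targets], two equally long lists of node ids
    aggr       : "add" (like scatter_add) or "mean" (like scatter_mean)
    """
    sources, targets = edge_index
    num_nodes, num_features = len(x), len(x[0])
    out = [[0.0] * num_features for _ in range(num_nodes)]
    counts = [0] * num_nodes
    for j, i in zip(sources, targets):
        x_j = x[j]  # message: features of the source node j
        for f in range(num_features):
            out[i][f] += x_j[f]  # accumulate at the target node i
        counts[i] += 1
    if aggr == "mean":
        for i in range(num_nodes):
            if counts[i] > 0:
                out[i] = [value / counts[i] for value in out[i]]
    elif aggr != "add":
        raise ValueError("this sketch only supports 'add' and 'mean'")
    return out


# Toy graph (illustrative): edges 0 -> 1, 1 -> 2, 2 -> 1.
features = [[1.0], [2.0], [3.0]]
edges = [[0, 1, 2], [1, 2, 1]]
summed = propagate(features, edges, aggr="add")
averaged = propagate(features, edges, aggr="mean")
```

Node 1 receives messages from nodes 0 and 2, so its summed value is 4.0 and its mean is 2.0; node 0 has no incoming edges and stays at 0.0.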
Basics.
Definition
Graph: $G(V, E)$
Adjacency Matrix: $A$
Degree: $D_{v(i)}$, the number of nodes that are adjacent to node $v(i)$.
Neighbors: $N_{v(i)}$, the set of nodes adjacent to $v(i)$; the number of elements in $N_{v(i)}$ is equal to $D_{v(i)}$.
Connectivity
Walk: A walk on a graph is an alternating sequence of nodes and edges, starting with a node and ending with a node, where each edge is incident with the nodes immediately preceding and following it.
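These definitions can be checked on a small undirected graph. The sketch below is illustrative only: the helper names and the toy edge list are assumptions, and a walk is represented by its node sequence alone, since in a simple graph the edges are implied by consecutive nodes.

```python
def adjacency_matrix(num_nodes, edges):
    """Build the adjacency matrix A of an undirected simple graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A


def neighbors(A, v):
    """N_v: the nodes adjacent to v."""
    return [u for u, connected in enumerate(A[v]) if connected]


def degree(A, v):
    """D_v: the number of nodes adjacent to v, i.e. the size of N_v."""
    return sum(A[v])


def is_walk(A, node_sequence):
    """True if every pair of consecutive nodes is joined by an edge."""
    return all(A[u][v] == 1 for u, v in zip(node_sequence, node_sequence[1:]))


# Toy graph (illustrative): a triangle 0-1-2 with a pendant node 3 attached to 2.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
```

For node 2 the neighbors are 0, 1 and 3, so its degree is 3. The sequence 0, 1, 0, 1 is a valid walk because a walk may repeat nodes and edges, while 0, 3 is not, since no edge joins those two nodes.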
Why is it interesting
Drug discovery: discover highly drug-like molecules; complete an existing molecule to optimize a desired property
Discovering novel structures
Network science
Why is it hard
Large and variable output
Non-unique representations: an $n$-node graph can be represented in $n!$ ways
Hard to compute/optimize objective functions
Complex dependencies: edge formation has long-range dependencies
Graph Generative Model
Given: Graphs sampled from $p_{data}(G)$
Graph Convolutional Network and Graph Attention
Why a deep graph encoder? Limitations of Shallow Encoders (e.g. node2vec)
$O(|V|)$ parameters are needed: there is no sharing of parameters between nodes, and every node has its own unique embedding.
Inherently “transductive”: cannot generate embeddings for nodes that are not seen during training.
Do not incorporate node features: many graphs have features that we can and should leverage.
Graph Convolutional Network
Could get embedding for unseen nodes!
Problems: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
The classification label of an object $O$ in the network may depend on:
Features of $O$
Labels of the objects in $O$'s neighborhood
Features of objects in $O$'s neighborhood
Collective classification models
Relational classifiers
Iterative classifications
Loopy belief propagation
Intuition
Simultaneous classification of interlinked nodes using correlations
Node embeddings are learned in the same way as word2vec (skip-gram model).
However, graphs can be (un)directed, (un)weighted, or (a)cyclic, and are much more complex than the structure of a sequence…
So how do we generate a “corpus” from a graph?
Random walk on the graph
Given a graph and a starting point, we select one of its neighbors at random and move to it; then we select a neighbor of this new point at random and move to it, and so on.
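A minimal sketch of this procedure, assuming an unweighted graph stored as an adjacency dict and uniform neighbor selection (the function names, walk length and number of walks per node are hypothetical choices): each walk becomes one "sentence" and each node id one "word" of the corpus.

```python
import random


def random_walk(adj, start, length, rng):
    """Walk `length` steps from `start`, picking a uniform random neighbor each step."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adj, walks_per_node=2, walk_length=4, seed=0):
    """Start several walks from every node; each walk is one 'sentence'."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        for node in adj:
            corpus.append(random_walk(adj, node, walk_length, rng))
    return corpus


# Toy graph (illustrative): a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = build_corpus(adj)
```

The resulting list of node sequences can then be passed to a skip-gram trainer in place of tokenized sentences.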