Convert Seurat to Scanpy h5ad

zqfang included in Make bioinfo uncool again

2023-02-21 999 words 5 minutes

Contents

MuDataSeurat and sceasy are recommended

MuDataSeurat

Recommended!!!

MuDataSeurat directly writes h5ad file without requiring Python runtime. All dependencies exist in R and can be easily installed and used.

Install

I have added some extra features. Please Install my fork which works for anndata >=0.8 and Seurat V5.

Refer to MuDataSeurat

1
2
3
# in R console
## install myfork, which compatible with latest anndata 
remotes::install_github("zqfang/MuDataSeurat", force = T)

Usage

(Optional) Step 1: Slim down a Seurat object.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# (optional) step 1: Slim down a Seurat object. So you get raw counts, lognorm counts
seu = DietSeurat(
  srt,
  counts = TRUE, # so, raw counts save to adata.layers['counts']
  data = TRUE, # so, log1p counts save to adata.X when scale.data = False, else adata.layers['data']
  scale.data = FALSE, # set to false
  features = rownames(srt), # export all genes, not just top highly variable genes
  assays = "RNA",
  dimreducs = c("pca","umap"),
  graphs = c("RNA_nn", "RNA_snn"), # to RNA_nn -> distances, RNA_snn -> connectivities
  misc = TRUE
)


# .X, .layers['counts']. .raw.X
# Assumptions:
#   1. counts/data/scale.data -> X
#   3. counts & data -> layers['counts'], X
#   2. data & scale.data -> layers['data'], X
#   4. counts & scale.data -> layers['counts'], X
#   5. counts & data & scale.data -> layers['counts'], layers['data'], X

Step 2. Write Seurat to h5ad

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# for seurat v5, need to JoinLayer first
DefaultAssay(srt) = "RNA"
srt <- JoinLayers(srt)
# single modality
MuDataSeurat::WriteH5AD(srt, "srt.h5ad", assay="RNA")
# multi modality, ATAC+RNA, CITE-seq 
MuDataSeurat::WriteH5MU(srt, "srt.h5mu")

## data mapping
## GEX
# counts -> adata.layers['counts']
# data -> adata.layers['data'] # log1p data
# scale.data -> adata.X
# adata.raw.X set to None
## Graph
# RNA_nn -> adata.obsp['nn'] # this is knn adjmat. the old .uns['neighbors']['distance']
# RNA_snn -> adata.obsp['snn] # umap graph. the old .uns['neighbors']['connectivies']
## 

ReadH5AD(): Read an .h5mu file and create a Seurat object.
ReadH5MU(): Create a Seurat object from .h5mu file contents
WriteH5AD(): Write one assay to .h5ad
WriteH5MU(): Create an .h5mu file with data from a Seurat object

sceasy

The painless way. Try sceasy

However, the configuration of the running environment is complicated. Need Python and R runtime

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
library(sceasy)
library(reticulate)
use_condaenv('EnvironmentName')
loompy <- reticulate::import('loompy')

# seurat to anndata
sceasy::convertFormat(seurat_object, from="seurat", to="anndata",
                       outFile='filename.h5ad')

# anndata to seurat
sceasy::convertFormat(h5ad_file, from="anndata", to="seurat",
                       outFile='filename.rds')

SeuratDisk

Please see SeuratDisk to convert seurat to scanpy.

Not recommend, since it’s not fully compatible with anndata standards.

Tips:

set default assay to RNA before covert to h5ad.
if raw read count need to be imported to anndata, you should only contain counts slot in your seurat object before convertion

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
library(Seurat)
library(SeuratDisk)

# step 1: Slim down a Seurat object. So you get raw counts, lognorm counts

seu = DietSeurat(
  srt,
  counts = TRUE, # so, raw counts save to adata.raw.X 
  data = TRUE, # so, log1p counts save to adata.X
  scale.data = FALSE, # set to false, or else will save to adata.X
  features = rownames(srt), # export all genes, not just top highly variable genes
  assays = "RNA",
  dimreducs = c("pca","umap"),
  graphs = c("RNA_nn", "RNA_snn"), # to RNA_nn -> distances, RNA_snn -> connectivities
  misc = TRUE
)

# step 2: factor to character, or else your factor will be number in adata 
i <- sapply(seu@meta.data, is.factor)
seu@meta.data[i] <- lapply(seu@meta.data[i], as.character)

# step 3: convert 
SaveH5Seurat(seu, filename = "srt.h5seurat", overwrite = TRUE)
Convert("srt.h5seurat", "srt.h5ad", assay="RNA", overwrite = TRUE)

1
2
3
4
5
6
7
8
# load h5ad
import scanpy as sc
adata = sc.read_h5ad("srt.h5ad")
# save counts to layers
adata.layers['counts'] = adata.raw.X.copy()
adata.layers['log1p'] = data.X.copy()

# you need to scale data for downsream tasks if needed.

Seurat -> loom -> scanpy (OUTDATED)

NOT RECOMMENED

You actually neeed additional steps when seurat -> loom -> scanpy

see here

save to loom format.

1
2
3
4
5
6
7
pbmc.loom <- as.loom(seurat_object, filename = "../output/pbmc.loom", verbose = FALSE)
pbmc.loom$close_all() # alway close when done 

write.csv(Cells(seurat_object), file = "cellID_obs.csv", row.names = FALSE)
write.csv(Embeddings(seurat_object, reduction = "umap"), file = "cell_embeddings.csv")
write.csv(seurat_object@meta.data$seurat_clusters, file = "clusters.csv")

read into scanpy

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import scanpy as sc
adata = sc.read_loom("../output/pbmc.loom")

sample_obs = pd.read_csv("cellID_obs.csv")
umap_cord = pd.read_csv("cell_embeddings.csv")
cell_clusters = pd.read_csv("clusters_obs.csv")


# now add metadata to the adata 
# e.g.
adata.obsm['X_umap'] = umap_ordered.values # cellID should be matched first
adata.uns['Cluster_colors'] = ...
...

open loom in R

1
2
3
immune <- Connect(filename = "../data/mmune_cells.loom", mode = "r")
seurat <- as.Seurat(immune)
immune$close_all() # alway close when done 

rpy2

This is the old way. Very hard to make it work. **Not recommended!*Converting Seurat to Scanpy cost me a lot of time to convert seurat objects to scanpy. It’s not a pleasant experience.

1. Install `Seurat v3.0.2`, or python kernel will always died!!!

Don’t know why latest seurat not work.

2. Set the R version for `rpy2`

1
2
3
4
5
6
7
# user defined R installation
import os
# path to your libR.so, only Seurat v3.0.2 works! 
# create a conda R env for seurat 3.0.2 first
os.environ['R_HOME'] = '/home/fangzq/miniconda/envs/seurat/lib/R' 
# path depends on where you installed Python.
os.environ['R_USER'] = '/home/fangzq/miniconda/lib/python3.7/site-packages/rpy2' 

3. Now, you’re good to go

1
2
import scanpy as sc
import glob

Install anndata2ri first

1
2
3
4
5
import anndata2ri
from rpy2.robjects import r
from rpy2.robjects.conversion import localconverter
# activate rpy2 env
anndata2ri.activate()

1
robjs = glob.glob("data/*Robj")

Convert to h5ad

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
r('library(Seurat)')
for robj in robjs:
    r(f'x<-load("{robj}")')
    r('y=get(x)')
    r('rm(x)')
    r('DefaultAssay(y) <- "RNA"') # get raw count matrix to save
    # seurat2 object
    # adata = r('as.SingleCellExperiment(UpdateSeuratObject(y))')
    adata = r('as.SingleCellExperiment(y)')
    adata.write_h5ad(filename=robj.replace("Robj","h5ad"))