Deploy snakemake pipeline on HPC

zqfang included in Make bioinfo uncool again

2020-08-19 396 words 2 minutes

Contents

The best part of snakemake is allowed you to run your pipeline on HPC automatically. It save you a lot of time.

How to run snakemake on HPC

there are two ways to configure

use --cluster:
- works on different HPC system, e.g. slurm, SGE.
- assign resource in params directive explicitly.
- or provide a config file by --cluster-config
use --profile: (recommend way)
- assign resource in resource directive explicitly.
- or provide a profile file.

Snakemake and slurm

the `--cluster` way

An example for Stanford Sherlock.

1. define resource in `params` directive

1
2
3
4
5
6
7
rule eblocks:
    input: "..."
    output: "..."
    params: time = "30:00", mem = "4g" 
    threads: 8
    shell:
       "..."

run

1
snakemake -s Snakefile --cluster 'sbatch -t {params.time} --mem={params.mem} -c {threads}' -j 10

2. define resource in `cluster_config.yaml`

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# slurm_config.yaml - cluster configuration for Stanford Sherlock
__default__:
    partition: normal
    time_min: "01:00:00" # time limit for each job
    cpus: 1  
    mem: "1g"
    #ntasks-per-node: 14 #Request n cores be allocated per node.
    output: "logs_slurm/{rule}.{wildcards}.out"  ## redirect slurm-JOBID.txt to your directory

strain2trait:
    time_min: "30:00"

eblocks:
    mem: "4g"
    cpus: "{threads}" ## => use `threads` define in rule

run

1
2
3
snakemake -s Snakefile --cluster-config cluster.yaml \
          --cluster 'sbatch -t {cluster.time_min} --mem={cluster.mem} -c {cluster.cpus} -o {cluster.output} -e {cluster.output}' \
          -j 10

3. deploy your pipleline on HPC

make a submit.sh script

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#!/bin/bash
#SBATCH -J snakeflow
#SBATCH --time=120:00:00
#SBATCH --qos long
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1  
#SBATCH -p normal
#SBATCH --mem=1g
####SBATCH --mail-type=FAIL
####SBATCH --mail-user=xxx@gmail.com

# activate conda enviroment
source activate base
# run jobs
snakemake -j 666 -s haplomap.smk \
        --configfile config.yaml \
        --cluster "sbatch --time={cluster.time_min} -p {cluster.partition} --mem={cluster.mem} -c {cluster.cpus} " \
        --cluster-config slurm_config.yaml 

submit this script to cluster using

1
sbatch submit.sh

the `--profile` way

It’s more universal and versatile.

1. create a directory for slurm

1
mkdir -p ~/.config/snakemake/slurm

2. create `config.yaml` in the slurm directory

Note: resource directive for clusters only allow integer now.

1
2
3
jobs: 10  # maximun job numbers submitted each time
cluster: "sbatch -p normal -t {resources.time_min} --mem={resources.mem} -c {resources.cpus} -o logs_slurm/{rule}_{wildcards} -e logs_slurm/{rule}_{wildcards} --mail-type=FAIL --mail-user=user@mail.com"
default-resources: [cpus=1, mem=2000, time_min=60]

3. assign `resource` if different from default

1
2
3
4
5
rule eblocks:
    input: "..."
    output: "..."
    resources: mem = 4000 # only allow integers
    shell: "..."

4. run

1
snakemake --profile slurm -s haplomap.smk --configfile config.yaml -j 666

Contents

Deploy snakemake pipeline on HPC

How to run snakemake on HPC

Snakemake and slurm

the --cluster way

1. define resource in params directive

2. define resource in cluster_config.yaml