The best part of Snakemake is that it lets you run your pipeline on an HPC cluster automatically. It saves you a lot of time.
How to run Snakemake on HPC
There are two ways to configure it (see the sketch after this list):
- use `--cluster`: works on different HPC systems, e.g. Slurm, SGE.
  - assign resources in the `params` directive explicitly, or
  - provide a config file via `--cluster-config`.
- use `--profile` (the recommended way):
  - assign resources in the `resources` directive explicitly, or
  - provide a profile file.
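In short, the two invocations look like this; a minimal sketch with placeholder sbatch options, just to preview the shape of each approach (both are walked through below):

```
# option 1: pass the submission command on the command line
snakemake --cluster 'sbatch -t {params.time} --mem={params.mem}' -j 10

# option 2: keep the cluster settings in a named profile
snakemake --profile slurm -j 10
```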
Snakemake and Slurm
The `--cluster` way
An example for Stanford Sherlock.
1. Define resources in the `params` directive
```
rule eblocks:
    input: "..."
    output: "..."
    params: time = "30:00", mem = "4g"
    threads: 8
    shell:
        "..."
```
Run:
```
snakemake -s Snakefile --cluster 'sbatch -t {params.time} --mem={params.mem} -c {threads}' -j 10
```
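To make the substitution concrete: for rule `eblocks` above, Snakemake fills the placeholders from `params` and `threads` before each submission, so the scheduler effectively receives something like the sketch below (the trailing jobscript path is generated by Snakemake and will differ):

```
# what is effectively submitted for rule eblocks (sketch):
sbatch -t 30:00 --mem=4g -c 8 /path/to/generated/jobscript.sh
```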
2. Define resources in `slurm_config.yaml`
```
# slurm_config.yaml - cluster configuration for Stanford Sherlock
__default__:
    partition: normal
    time_min: "01:00:00"   # time limit for each job
    cpus: 1
    mem: "1g"
    #ntasks-per-node: 14   # request n cores be allocated per node
    output: "logs_slurm/{rule}.{wildcards}.out"   # redirect the default slurm-JOBID.out to your own directory
strain2trait:
    time_min: "30:00"
eblocks:
    mem: "4g"
    cpus: "{threads}"   # => use the `threads` defined in the rule
```
Run:
```
snakemake -s Snakefile --cluster-config slurm_config.yaml \
    --cluster 'sbatch -t {cluster.time_min} --mem={cluster.mem} -c {cluster.cpus} -o {cluster.output} -e {cluster.output}' \
    -j 10
```
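One gotcha worth noting: `sbatch` does not create the directories named in `-o`/`-e`, so if `logs_slurm/` is missing the jobs will usually fail without leaving a log behind. Create it once before submitting:

```
mkdir -p logs_slurm
```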
3. Deploy your pipeline on the HPC
Make a `submit.sh` script:
```
#!/bin/bash
#SBATCH -J snakeflow
#SBATCH --time=120:00:00
#SBATCH --qos long
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -p normal
#SBATCH --mem=1g
####SBATCH --mail-type=FAIL
####SBATCH --mail-user=xxx@gmail.com

# activate conda environment
source activate base

# run jobs
snakemake -j 666 -s haplomap.smk \
    --configfile config.yaml \
    --cluster "sbatch --time={cluster.time_min} -p {cluster.partition} --mem={cluster.mem} -c {cluster.cpus}" \
    --cluster-config slurm_config.yaml
```
Submit this script to the cluster:
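```
sbatch submit.sh
```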
The `--profile` way
It’s more universal and versatile.
1. Create a directory for the Slurm profile
```
mkdir -p ~/.config/snakemake/slurm
```
2. Create `config.yaml` in the `slurm` directory (this is where `--profile slurm` looks for it)
Note: the `resources` directive only allows integer values for now.
```
jobs: 10  # maximum number of jobs submitted at a time
cluster: "sbatch -p normal -t {resources.time_min} --mem={resources.mem} -c {resources.cpus} -o logs_slurm/{rule}_{wildcards} -e logs_slurm/{rule}_{wildcards} --mail-type=FAIL --mail-user=user@mail.com"
default-resources: [cpus=1, mem=2000, time_min=60]
```
3. Assign resources in a rule if they differ from the defaults
```
rule eblocks:
    input: "..."
    output: "..."
    resources: mem = 4000  # only integers are allowed
    shell: "..."
```
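To sketch how the substitution plays out: `{resources.mem}` now resolves to 4000 for this rule, while `cpus` and `time_min` still fall back to `default-resources`. Assuming `eblocks` has no wildcards (so `{wildcards}` expands to nothing), the job would be submitted roughly as (the trailing jobscript path is generated by Snakemake):

```
# sketch of the expanded submission for rule eblocks:
sbatch -p normal -t 60 --mem=4000 -c 1 -o logs_slurm/eblocks_ -e logs_slurm/eblocks_ --mail-type=FAIL --mail-user=user@mail.com /path/to/generated/jobscript.sh
```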
4. Run
```
snakemake --profile slurm -s haplomap.smk --configfile config.yaml -j 666
```
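Before a full submission, a dry run (`-n`) is a cheap sanity check that the DAG and the profile wiring are what you expect; nothing is sent to the scheduler:

```
snakemake --profile slurm -s haplomap.smk --configfile config.yaml -n
```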