GATK for inbred mouse
GATK is designed for human genetics, but it also works well for inbred mice.
However, one of my colleagues, who studies mouse genetics, said:
"I tried the haplotype caller from GATK. But it seems that the haplotype caller is designed for heterogeneous genome like human than for mice. Therefore, the result coming out of HC is worse than samtools, as I manually inspected a few regions that HC calls didn’t make sense."
In addition, in one of their mouse genomics papers that we reviewed, they even skipped the second recalibration step. When we asked why, they gave the same reason: it is good for human data but not as good for the homogeneous inbred mouse.
In my own experience with GATK4, I found that:
- SNPs: GATK and BCFtools make the same call at least 97% of the time for inbred mice. BCFtools, a position-based caller, is as good as GATK for calling SNPs.
- Indels: GATK is the preferable algorithm for calling indels (higher accuracy and lower FDR), because it benefits from an assembly-based caller.
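One way to check this kind of SNP agreement is to compare the two call sets site by site. The Python sketch below is only an illustration: it assumes plain-text VCF input and counts a site as concordant when both callers report the same REF and ALT alleles (genotypes are ignored). The function names `load_snp_calls` and `snp_concordance` are made up for this example.

```python
def load_snp_calls(vcf_lines):
    """Map (chrom, pos) -> (ref, alt) for single-base substitutions only."""
    calls = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep only biallelic SNPs; indels and multiallelic sites are skipped.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            calls[(chrom, pos)] = (ref, alt)
    return calls


def snp_concordance(calls_a, calls_b):
    """Fraction of sites called by either caller where both report the same alleles."""
    all_sites = set(calls_a) | set(calls_b)
    if not all_sites:
        return 0.0
    # A site found by only one caller never matches, because .get() returns None.
    agree = sum(1 for site in all_sites if calls_a.get(site) == calls_b.get(site))
    return agree / len(all_sites)
```

Sites found by only one caller count as disagreements here, so this definition is stricter than comparing only the shared sites.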
GATK resource bundle for inbred mouse.
I found a workflow here; however, the script is out of date. Also, see the discussion here.
For GATK4, we have:
1. Genome
Download from NCBI (mm10) or Sanger Mouse Genetics Programme
2. dbSNP
Depends on your study design.
Download the all-in-one VCF file from NCBI
Download from the Sanger Mouse Genetics Programme (Sanger MGP)
3. Known Indels
For mouse indels, the Sanger Mouse Genetics Programme (Sanger MGP) is probably the best resource.
Download all MGP indels (5/2015 release):
Filter for passing variants
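As a rough illustration of this step, the Python sketch below keeps the header lines and only those records whose FILTER column (the seventh tab-separated field) is exactly `PASS`. It assumes a plain or gzip-compressed text VCF; the helper names and the idea of writing to a new uncompressed file are assumptions for this example. Records with `.` in FILTER (no filtering applied) are dropped here, which may or may not be what you want.

```python
import gzip


def filter_pass(vcf_lines):
    """Yield header lines and records whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


def filter_vcf_file(in_path, out_path):
    """Stream a (possibly gzipped) VCF from in_path and write PASS records to out_path."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_pass(src))
```

Because the records are streamed one line at a time, the whole VCF never has to fit in memory.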
Sort the VCF (the automatically generated index used to have to be deleted due to a known bug, but this is no longer necessary):
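To show what the sort is expected to achieve, here is a minimal Python sketch. It assumes the VCF header carries `##contig` lines in the same order as the reference genome, and it orders records by that contig order and then by position. The function name `sort_vcf_lines` is made up for this example, and everything is held in memory, so for a file of this size a dedicated sorting tool is the more realistic choice.

```python
import re


def sort_vcf_lines(vcf_lines):
    """Return header lines followed by records sorted by contig order, then position."""
    header, records, contig_order = [], [], {}
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            match = re.match(r"##contig=<ID=([^,>]+)", line)
            if match:
                # Rank contigs in the order they appear in the header.
                contig_order.setdefault(match.group(1), len(contig_order))
        elif line.strip():
            records.append(line)

    def sort_key(record):
        chrom, pos = record.split("\t", 2)[:2]
        # Contigs missing from the header sort after the known ones, by name.
        return (contig_order.get(chrom, len(contig_order)), chrom, int(pos))

    return header + sorted(records, key=sort_key)
```

Sorting by header order rather than alphabetically keeps chromosome 2 ahead of chromosome 10, matching the order of the reference.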