Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.
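To make the idea of alignment-free evidence from k-mer counts concrete, the following is a minimal illustrative sketch in Python. It is not PanGenie's algorithm or code: it only counts k-mers in reads and sums the counts of k-mers that are unique to one of two allele sequences. The function names (`count_kmers`, `allele_specific_kmers`, `allele_support`), the toy sequences, and the choice of k are all assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and the haplotype information that a pangenome reference provides.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k yield an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def allele_specific_kmers(allele_seq, other_seq, k):
    """Return the k-mers found in allele_seq but not in other_seq."""
    own = {allele_seq[i:i + k] for i in range(len(allele_seq) - k + 1)}
    other = {other_seq[i:i + k] for i in range(len(other_seq) - k + 1)}
    return own - other


def allele_support(reads, ref_seq, alt_seq, k):
    """Sum read k-mer counts over k-mers unique to each allele.

    Hypothetical helper: returns (ref_support, alt_support) as raw counts,
    not genotype likelihoods.
    """
    counts = count_kmers(reads, k)
    ref_only = allele_specific_kmers(ref_seq, alt_seq, k)
    alt_only = allele_specific_kmers(alt_seq, ref_seq, k)
    ref_support = sum(counts[kmer] for kmer in ref_only)
    alt_support = sum(counts[kmer] for kmer in alt_only)
    return ref_support, alt_support


if __name__ == "__main__":
    # Toy example: the two alleles differ by a single base (A vs T).
    ref = "ACGTACGGA"
    alt = "ACGTTCGGA"
    reads = ["CGTTCGG", "GTTCG"]
    print(allele_support(reads, ref, alt, k=3))
```

In this toy setting, reads drawn from the alternative allele contribute only to the alternative-allele support. A real genotyper would additionally have to model coverage, errors, and k-mers shared with other regions of the genome.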

Nat Genet. 2022 Apr;54(4):518-525

PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w

Authors

Jana Ebler¹, Peter Ebert¹, Wayne E Clarke², Tobias Rausch³ ⁴, Peter A Audano⁵, Torsten Houwaart⁶, Yafei Mao⁵, Jan O Korbel³, Evan E Eichler⁵ ⁷, Michael C Zody², Alexander T Dilthey⁶ ⁸ ⁹, Tobias Marschall¹⁰

¹Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
²New York Genome Center, New York, NY, USA.
³European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
⁴European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany.
⁵Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
⁶Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
⁷Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
⁸Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany.
⁹Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany.
¹⁰Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
