Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index
Affiliation
Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, Manchester, M20 4BX, UK.
Issue Date
2024
Abstract
Motivation
Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods.
Results
In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead.
Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14), it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
Citation
Egor G, Artem K, Maksim B, Gaukhar Z, Ekaterina K, Vsevolod M, et al. Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index. BMC bioinformatics. 2024 JUL 13;25(1). PubMed PMID: WOS:001267092300001. English.
Journal
BMC Bioinformatics
PubMed ID
39003441
Language
en
Related articles
- Calling known variants and identifying new variants while rapidly aligning sequence data.
- Authors: VanRaden PM, Bickhart DM, O'Connell JR
- Issue date: 2019 Apr
- Fast and SNP-aware short read alignment with SALT.
- Authors: Quan W, Liu B, Wang Y
- Issue date: 2021 Aug 25
- Fast and memory efficient approach for mapping NGS reads to a reference genome.
- Authors: Kumar S, Agarwal S, Ranvijay
- Issue date: 2019 Apr
- Fast read alignment with incorporation of known genomic variants.
- Authors: Guo H, Liu B, Guan D, Fu Y, Wang Y
- Issue date: 2019 Dec 19
- Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications.
- Authors: Prodanov T, Bansal V
- Issue date: 2020 Nov 4