
dc.contributor.author: Egor, G. [en]
dc.contributor.author: Artem, K. [en]
dc.contributor.author: Maksim, B. [en]
dc.contributor.author: Gaukhar, Z. [en]
dc.contributor.author: Ekaterina, K. [en]
dc.contributor.author: Vsevolod, Makeev [en]
dc.contributor.author: Evgeny, K. [en]
dc.date.accessioned: 2024-10-07T07:24:54Z
dc.date.available: 2024-10-07T07:24:54Z
dc.date.issued: 2024 [en]
dc.identifier.citation: Egor G, Artem K, Maksim B, Gaukhar Z, Ekaterina K, Vsevolod M, et al. Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index. BMC bioinformatics. 2024 JUL 13;25(1). PubMed PMID: WOS:001267092300001. English. [en]
dc.identifier.pmid: 39003441 [en]
dc.identifier.uri: http://hdl.handle.net/10541/627135
dc.description.abstract: Motivation: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through Next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods.
Results: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium. [en]
dc.language.iso: en [en]
dc.title: Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index [en]
dc.contributor.department: Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, Manchester, M20 4BX, UK. [en]
dc.identifier.journal: BMC Bioinformatics [en]
dc.description.note: [en]
refterms.dateFOA: 2024-10-08T12:11:30Z


Files in this item

Name: s12859-024-05862-y.pdf
Size: 2.488Mb
Format: PDF
Description: Found with Open Access Button
