Ultrafast prediction of somatic structural variations by filtering out reads matched to pan-genome k-mer sets

  • Jang il Sohn
  • Min Hak Choi
  • Dohun Yi
  • Vipin A. Menon
  • Yeon Jeong Kim
  • Junehawk Lee
  • Jung Woo Park
  • Sungkyu Kyung
  • Seung Ho Shin
  • Byunggook Na
  • Je Gun Joung
  • Young Seok Ju
  • Min Sun Yeom
  • Youngil Koh
  • Sung Soo Yoon
  • Daehyun Baek
  • Tae Min Kim
  • Jin Wu Nam

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Variant callers typically produce massive numbers of false positives for structural variations, such as cancer-relevant copy-number alterations and fusion genes resulting from genome rearrangements. Here we describe an ultrafast and accurate detector of somatic structural variations that reduces read-mapping costs by filtering out reads matched to pan-genome k-mer sets. The detector, which we named ETCHING (for efficient detection of chromosomal rearrangements and fusion genes), reduces the number of false positives by leveraging machine-learning classifiers trained with six breakend-related features (clipped-read count, split-read count, supporting paired-end read count, average mapping quality, depth difference and total length of clipped bases). When benchmarked against six callers on reference cell-free DNA, validated biomarkers of structural variants, matched tumour and normal whole genomes, and tumour-only targeted sequencing datasets, ETCHING was 11-fold faster than the second-fastest structural-variant caller at comparable performance and memory use. The speed and accuracy of ETCHING may aid large-scale genome projects and facilitate practical implementations in precision medicine.
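
The read-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the ETCHING implementation. It assumes, purely for illustration, that a read counts as "matched" when every one of its k-mers appears in the reference k-mer set, so only reads carrying at least one unseen k-mer go on to mapping. The function names (`kmers`, `build_kmer_set`, `filter_reads`) and the toy sequences are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(sequences, k):
    """Collect all k-mers from a collection of reference sequences.

    Stands in for a pan-genome k-mer set (an assumption: the real set
    would be built from many genomes, not a short toy string).
    """
    kmer_set = set()
    for seq in sequences:
        kmer_set.update(kmers(seq, k))
    return kmer_set


def filter_reads(reads, reference_kmers, k):
    """Keep only reads with at least one k-mer absent from reference_kmers.

    Assumed rule: a read whose k-mers are all in the set is treated as
    matched and discarded. Reads shorter than k have no k-mers and are
    discarded as well.
    """
    kept = []
    for read in reads:
        if any(km not in reference_kmers for km in kmers(read, k)):
            kept.append(read)
    return kept


# Toy example: the first read is fully covered by the reference k-mers,
# the second contains k-mers the reference does not have.
reference_kmers = build_kmer_set(["ACGTACGTAC"], k=4)
candidate_reads = filter_reads(["ACGTAC", "ACGTTTGA"], reference_kmers, k=4)
```

Under these assumptions, only the reads left in `candidate_reads` would need to be aligned, which is where the saving in read-mapping cost would come from.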

Original language: English
Pages (from-to): 853-866
Number of pages: 14
Journal: Nature Biomedical Engineering
Volume: 7
Issue number: 7
State: Published - Jul 2023

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Nature Limited.
