La Trobe

A database of simulated tumor genomes towards accurate detection of somatic small variants in cancer

journal contribution
posted on 2023-05-25, 06:59 authored by Jing Meng, Yi-Ping Phoebe Chen
Somatic mutations promote the transformation of normal cells into cancer cells. Accurate identification of such mutations facilitates cancer diagnosis and treatment, but biological and technological noise, including intra-tumor heterogeneity, sample contamination, and uncertainties in base sequencing and read alignment, poses a major challenge to somatic mutation discovery. A number of callers have been developed to predict somatic mutations from paired tumor/normal or unpaired tumor sequencing data. However, the small number of currently available experimentally validated somatic sites limits the evaluation, and in turn the improvement, of these callers. Fortunately, the NIST reference material NA12878 genome has been well characterized, with publicly available high-confidence genotype calls, and biological and technological noise can be modeled computationally in terms of the number of sub-clones, the variant allele frequencies (VAFs), and the sequencing and mapping qualities. We used BAMSurgeon to create simulated tumors by introducing somatic small variants (SNVs and small indels) into homozygous reference, or wildtype, sites of NA12878. We generated 135 simulated tumors from 5 pre-tumors/normals. These simulated tumors vary in sequencing and subsequent mapping error profiles, read length, number of sub-clones, VAF, mutation frequency across the genome, and genomic context. Furthermore, these pure tumor/normal pairs can be mixed at desired ratios within each pair to simulate sample contamination. This database (15 terabytes in total) will be of great use for benchmarking somatic small variant callers and guiding their improvement.
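
The mixing step mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the article; it is a minimal illustration under stated assumptions: the spiked-in variant is absent from the normal sample, reads are drawn uniformly at random, and the helper names `mixed_vaf` and `subsample_fractions` are hypothetical. It shows how the expected VAF is diluted when a pure tumor is mixed with its normal, and what fraction of reads one would keep from each alignment file to reach a chosen mixing ratio and depth.

```python
def mixed_vaf(tumor_vaf: float, tumor_fraction: float) -> float:
    """Expected VAF after mixing tumor and normal reads.

    Assumes the variant is absent from the normal sample, so only the
    tumor-derived share of reads can carry the alternate allele.
    """
    if not 0.0 <= tumor_vaf <= 1.0:
        raise ValueError("tumor_vaf must be in [0, 1]")
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in [0, 1]")
    return tumor_vaf * tumor_fraction


def subsample_fractions(
    tumor_depth: float,
    normal_depth: float,
    tumor_fraction: float,
    target_depth: float,
) -> tuple[float, float]:
    """Fractions of reads to keep from the tumor and normal files.

    After keeping these fractions and merging, the mixture has roughly
    `target_depth` coverage, with `tumor_fraction` of the reads coming
    from the tumor. Depths are mean coverages (hypothetical inputs).
    """
    if tumor_depth <= 0 or normal_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in [0, 1]")
    keep_tumor = target_depth * tumor_fraction / tumor_depth
    keep_normal = target_depth * (1.0 - tumor_fraction) / normal_depth
    if keep_tumor > 1.0 or keep_normal > 1.0:
        raise ValueError("target depth is not reachable by subsampling alone")
    return keep_tumor, keep_normal


if __name__ == "__main__":
    # A heterozygous-like variant (VAF 0.5) in a sample that is 70% tumor.
    print(mixed_vaf(0.5, 0.7))
    # Two 60x files mixed 70:30 down to a 30x contaminated sample.
    print(subsample_fractions(60, 60, 0.7, 30))
```

The returned fractions could then be passed to any read-subsampling tool before merging the two files; which tool to use is left open here.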

Funding

This work was supported by La Trobe University full fee scholarship.

History

Publication Date

2018-08-30

Journal

PLoS One

Volume

13

Issue

8

Article Number

e0202982

Pagination

14p. (p. 1-14)

Publisher

PLOS

ISSN

1932-6203

Rights Statement

© 2018 Meng, Chen. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
