La Trobe

Learning sparse log-ratios for high-throughput sequencing data

posted on 2021-12-15, 22:33 authored by Elliott Rodriguez, Fazel Almasi

In high-throughput sequencing (HTS) data, and in compositional data (CoDa) more generally, the log-ratios between the input variables form an important class of biomarkers. However, identifying predictive log-ratio biomarkers from HTS data is a combinatorial optimization problem, which is computationally challenging. Existing methods are slow to run and scale poorly with the dimension of the input, which has limited their application to low- and moderate-dimensional metagenomic datasets. Building on recent advances in deep learning, we develop CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers. Our algorithm uses a continuous relaxation to approximate the underlying combinatorial optimization problem. This relaxation can then be optimized efficiently with the modern machine learning toolbox, in particular gradient descent. As a result, CoDaCoRe runs several orders of magnitude faster than competing methods while achieving state-of-the-art predictive accuracy and sparsity.
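To illustrate the idea of a continuous relaxation, here is a minimal sketch in Python with NumPy. It is not the CoDaCoRe implementation: the function names, the tanh-based soft membership, and the 0.5 threshold are all assumptions chosen for illustration. Each variable gets a real-valued weight. Positive weights softly assign the variable to the numerator of the log-ratio and negative weights to the denominator, so the score is a smooth function of the weights, and such a function could in principle be tuned by gradient descent. Thresholding the weights afterwards recovers a discrete, sparse log-ratio.

```python
import numpy as np

def soft_log_ratio(X, w, eps=1e-12):
    """Relaxed log-ratio score (illustrative assumption, not the published method).

    X : (n, p) array of strictly positive compositional data.
    w : (p,) array of real-valued weights, one per input variable.
    """
    Z = np.log(X)
    s = np.tanh(w)               # soft membership in (-1, 1)
    num = np.maximum(s, 0.0)     # soft numerator weights
    den = np.maximum(-s, 0.0)    # soft denominator weights
    # Difference of weighted mean logs: a log-ratio of weighted geometric means.
    return Z @ num / (num.sum() + eps) - Z @ den / (den.sum() + eps)

def hard_log_ratio(X, w, threshold=0.5):
    """Discretized log-ratio: keep only variables with |tanh(w)| above threshold.

    Assumes at least one variable lands on each side of the ratio.
    """
    Z = np.log(X)
    s = np.tanh(w)
    num = s > threshold          # variables selected for the numerator
    den = s < -threshold         # variables selected for the denominator
    return Z[:, num].mean(axis=1) - Z[:, den].mean(axis=1)

# Toy data: 4 samples, 5 variables, each row sums to 1.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=4)

# Hypothetical weights: variables 0 and 1 in the numerator, variable 3 in the
# denominator, variables 2 and 4 left out (which gives the sparsity).
w = np.array([8.0, 8.0, 0.0, -8.0, 0.0])

soft = soft_log_ratio(X, w)
hard = hard_log_ratio(X, w)
```

With weights this saturated, the relaxed score and the discretized score agree closely. Both are also unchanged when `X` is multiplied by a constant, which is the scale invariance that makes log-ratios suitable for compositional data.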


survey: https://www.surveymonkey.com/r/5H6CYPW


Funding

Intellectual Climate Fund

History

School

  • School of Life Sciences

Publication Date

2021-12-16

Rights Statement

The Author reserves all moral rights over the deposited text and must be credited if any re-use occurs.
