La Trobe

sorry, we can't preview this file

GMT20211207-000208_Learning-sparse-log-ratios-for-high-throughput-sequencing-data_1772x908.mp4 (234.06 MB)

Learning sparse log-ratios for high-throughput sequencing data

Download (234.06 MB)
posted on 15.12.2021, 22:33 by Elliott Rodriguez, Fazel AlmasiFazel Almasi

In the context of high-throughput sequencing (HTS) data, and compositional data (CoDa) more generally, an important class of biomarkers are the log-ratios between the input variables. However, identifying predictive log-ratio biomarkers from HTS data is a combinatorial optimization problem, which is computationally challenging. Existing methods are slow to run and scale poorly with the dimension of the input, which has limited their application to low- and moderate-dimensional metagenomic datasets. Building on recent advances from the field of deep learning, we develop CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers. Our algorithm exploits a continuous relaxation to approximate the underlying combinatorial optimization problem. This relaxation can then be optimized efficiently using the modern ML toolbox, in particular, gradient descent. As a result, CoDaCoRe runs several orders of magnitude faster than competing methods, all while achieving state-of-the-art performance in terms of predictive accuracy and sparsity.



Intellectual climate Fund



  • School of Life Sciences

Publication Date


Rights Statement

The Author reserves all moral rights over the deposited text and must be credited if any re-use occurs.

Usage metrics