La Trobe

Assessing and removing the effect of unwanted technical variations in microbiome data

journal contribution
posted on 2023-08-24, 01:56 authored by M Fachrul, Guillaume Meric, Michael Inouye, SJ Pamp, Agus Salim
Varying technologies and experimental approaches used in microbiome studies often lead to irreproducible results due to unwanted technical variations. Such variations, often unaccounted for and of unknown source, may interfere with true biological signals, resulting in misleading biological conclusions. In this work, we aim to characterize the major sources of technical variation in microbiome data and demonstrate how in-silico approaches can minimize their impact. We analyzed 184 pig faecal metagenomes encompassing 21 specific combinations of deliberately introduced factors of technical and biological variation. Using the novel Removing Unwanted Variations-III-Negative Binomial (RUV-III-NB) method, we identified several known experimental factors, specifically storage conditions and freeze–thaw cycles, as likely major sources of unwanted variation in metagenomes. We also observed that these unwanted technical variations do not affect taxa uniformly; for example, freezing samples affected taxa of the class Bacteroidia the most. Additionally, we benchmarked the performance of different correction methods, including ComBat, ComBat-seq, RUVg, RUVs, and RUV-III-NB. While RUV-III-NB performed consistently well across our sensitivity and specificity metrics, most other methods did not remove unwanted variations optimally. Our analyses suggest that careful consideration of possible technical confounders is critical during the experimental design of microbiome studies, and that the inclusion of technical replicates is necessary to efficiently remove unwanted variations computationally.
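The closing point, that technical replicates are what make computational removal possible, can be illustrated with a minimal sketch. This is not RUV-III-NB, which models raw counts with a negative binomial model; it is a simplified Gaussian RUV-III-style procedure on log-scale abundances, written with NumPy. The function name ruv_iii, the replicate design matrix M, the negative-control taxa ctl, the factor count k and all simulated data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def ruv_iii(Y, M, ctl, k):
    """Simplified Gaussian RUV-III sketch (hypothetical helper, not RUV-III-NB).

    Y   : (samples x taxa) log-scale abundance matrix
    M   : (samples x replicate sets) 0/1 design; M[i, j] = 1 when sample i
          is a technical replicate belonging to set j
    ctl : column indices of negative-control taxa, assumed free of biology
    k   : number of unwanted factors to remove
    Returns the adjusted matrix and the estimated unwanted factors W.
    """
    # Remove each replicate set's mean: what is left is variation among
    # technical replicates of one sample, which cannot be biological.
    Y0 = Y - M @ np.linalg.solve(M.T @ M, M.T @ Y)
    # The leading k components of that residual give the taxon loadings alpha.
    _, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    alpha = s[:k, None] * Vt[:k]  # shape (k, taxa)
    # Regress the control taxa on their loadings to get per-sample factors W,
    # i.e. W = Y_c ac^T (ac ac^T)^{-1}.
    ac = alpha[:, ctl]
    W = np.linalg.solve(ac @ ac.T, ac @ Y[:, ctl].T).T  # shape (samples, k)
    # Subtract the estimated unwanted component from every taxon.
    return Y - W @ alpha, W


# Hypothetical simulation: 4 biological samples x 3 technical replicates,
# 50 taxa, of which the first 10 serve as negative controls.
rng = np.random.default_rng(0)
n_sets, reps, n_taxa = 4, 3, 50
M = np.kron(np.eye(n_sets), np.ones((reps, 1)))  # (12, 4) replicate design
ctl = np.arange(10)
beta = rng.normal(0.0, 2.0, (n_sets, n_taxa))    # true biological signal
beta[:, ctl] = 0.0                                # controls carry no biology
W_true = rng.normal(size=(n_sets * reps, 1))      # e.g. a storage effect per sample
alpha_true = rng.normal(0.0, 3.0, (1, n_taxa))    # taxa are affected non-uniformly
Y = M @ beta + W_true @ alpha_true + rng.normal(0.0, 0.05, (n_sets * reps, n_taxa))

Y_adj, W_hat = ruv_iii(Y, M, ctl, k=1)
print("error before:", np.linalg.norm(Y - M @ beta))
print("error after: ", np.linalg.norm(Y_adj - M @ beta))
```

In this sketch the replicate design does the key work: differences among replicates of the same biological sample can only be technical, so they reveal the pattern of unwanted variation, and the negative-control taxa then let that pattern be scaled back onto each sample and subtracted.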


AS was supported by the ARC Discovery Project Grant DP200101248. MI was supported by the Munz Chair of Cardiovascular Prediction and Prevention. MF was supported by a Melbourne Research Scholarship from The University of Melbourne jointly funded by the Baker Heart and Diabetes Institute. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. This study was also supported by the Victorian Government's Operational Infrastructure Support (OIS) program.


Journal
Scientific Reports

Publisher
Springer Nature

Rights Statement

© The Author(s) 2022 This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
