Created on 11th June 2017
This paper has been published in GigaScience.
Despite great advances in microbial ecology and the explosion of high-throughput sequencing, our ability to understand and integrate the global biogeochemical cycles is still limited. Here we propose a novel approach to summarize the complexity of the sulfur cycle based on the minimum ecosystem concept, the microbial mat model and the relative entropy of protein domains involved in S-metabolism. This methodology produces a single value, called the Sulfur Score (SS), which is informative of the specific S-related molecular machinery. After curating an inventory of microorganisms, pathways and genes taking part in this cycle, we benchmark the performance of the SS on a collection of 2,107 non-redundant RefSeq genomes, 900 metagenomes from MG-RAST and 35 metagenomes analyzed for the first time. We find that the SS correctly classifies microorganisms known to be involved in the S-cycle, yielding an Area Under the ROC Curve of 0.985. Moreover, when environments are ranked by SS, the top-scoring metagenomes come from hydrothermal vents, microbial mats and deep-sea sediments, among others. This methodology can be generalized to the analysis of other biogeochemical cycles or processes. Provided that an inventory of relevant pathways and microorganisms can be compiled, entropy-based scores could be used to detect environmental patterns and informative samples at the multi-genomic scale.
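The idea of an entropy-based score can be illustrated with a minimal sketch. The abstract does not state the formula, so the sketch assumes a Kullback-Leibler-style term per protein domain, p * log2(p / q), where p is the domain's frequency among the curated S-cycle organisms and q its frequency in the background genome collection, and it assumes that a sample's score is the sum of the terms for the domains it contains. The domain names, frequencies and function names are hypothetical and chosen only for illustration.

```python
import math


def relative_entropy(p_inventory: float, q_background: float) -> float:
    """Per-domain term p * log2(p / q), in bits (assumed form, not from the abstract)."""
    if p_inventory <= 0.0:
        return 0.0  # a domain never seen in the inventory contributes nothing
    return p_inventory * math.log2(p_inventory / q_background)


def domain_entropies(inventory_freq: dict, background_freq: dict) -> dict:
    """Map each domain to its relative-entropy term."""
    return {
        domain: relative_entropy(p, background_freq[domain])
        for domain, p in inventory_freq.items()
    }


def score_sample(domains_present: set, entropies: dict) -> float:
    """Sum the terms of the scored domains that occur in one genome or metagenome."""
    return sum(entropies[d] for d in domains_present if d in entropies)


# Hypothetical frequencies: fraction of genomes carrying each domain.
inventory = {"DOM_A": 0.8, "DOM_B": 0.5, "DOM_C": 0.25}   # curated S-cycle organisms
background = {"DOM_A": 0.1, "DOM_B": 0.5, "DOM_C": 0.5}   # all reference genomes

entropies = domain_entropies(inventory, background)
enriched_sample = score_sample({"DOM_A", "DOM_B"}, entropies)
depleted_sample = score_sample({"DOM_C", "DOM_X"}, entropies)  # DOM_X is not scored
print(enriched_sample, depleted_sample)
```

Under these assumptions, a domain enriched in the inventory (DOM_A) raises the score, a domain equally common in both sets (DOM_B) contributes nothing, and a domain that is rarer in the inventory than in the background (DOM_C) lowers it, so samples can be ranked by a single number.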