Completed on 27 Jul 2017 by Daan Speth.
In their manuscript, "A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle", De Anda and colleagues describe a computational approach to characterize the relative importance of sulfur cycling in a (meta)genome dataset.
Their approach, calculating a "sulfur-score" based on the detected vs expected presence of genes involved in sulfur cycling, seems appropriate for this question. I do, however, have several questions on the (description of the) methodology that I'd like to see clarified before recommending this manuscript for publication.
The introduction of the conceptual framework in lines 74-89 was not very clear to me at first reading. For readability of the manuscript, I suggest the authors include some of the information that is present in the methods section, specifically why the minimum ecosystem concept and microbial mats are important.
I'm a little confused as to why the authors use the mean size length metric. Given a well-curated reference database, even short reads should be alignable to protein sequences of any length. The length of the protein will, however, impact the expected number of matches.
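To illustrate the last point, here is a rough sketch of how the expected number of matching reads scales with protein length under uniform random sampling. It is not the authors' method: the function name, the parameters, and the 30-nt minimum-overlap criterion are all assumptions made for illustration.

```python
def expected_read_hits(protein_len_aa, read_len_nt, n_reads,
                       genome_len_nt, min_overlap_nt=30):
    """Expected number of reads overlapping one gene by at least
    min_overlap_nt, assuming reads start uniformly at random along
    a single linear genome (hypothetical, simplified model)."""
    gene_len_nt = 3 * protein_len_aa
    # Read start positions that give an overlap of >= min_overlap_nt
    # (valid when min_overlap_nt <= both gene and read length).
    hit_starts = max(gene_len_nt + read_len_nt - 2 * min_overlap_nt + 1, 0)
    # All possible read start positions on the genome.
    total_starts = genome_len_nt - read_len_nt + 1
    return n_reads * hit_starts / total_starts


# Assumed example values: 100-nt reads, 100,000 reads, 1-Mb genome.
short_protein = expected_read_hits(100, 100, 100_000, 1_000_000)
long_protein = expected_read_hits(500, 100, 100_000, 1_000_000)
print(f"100 aa protein: {short_protein:.1f} expected reads")
print(f"500 aa protein: {long_protein:.1f} expected reads")
```

Under these assumptions the longer protein is expected to recruit several times more reads than the shorter one at equal abundance, which suggests that read counts would need to be normalized by protein length before they are compared.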
After the authors have selected 152 proteins involved in the S cycle, they only use 112 domains as annotated by InterProScan. Do these domains represent 112 proteins? Why did the authors choose not to generate HMMs for the remaining 40 proteins?
Other than calculating the relative entropy, have the authors used any other check to assess whether the detected Pfam domains were specific for the S cycle? Many Pfam domains contain proteins with a range of functions.
The purpose of figure 3 is not entirely clear to me. As it is, it contains too much information to be informative.
The elaborate description of the metagenomes in lines 433-463 seems unnecessary for the flow of the manuscript.
line 32: ROC is used but not defined until later in the text
line 46: I suggest changing "apparition" to "appearance"
line 147: what does a DNA signature of 0.01 mean?