In this manuscript the authors describe the binning and analysis, using sophisticated bioinformatics, of two genomes from organisms belonging to candidate phylum KSB3. These filamentous organisms have been identified as a causative agent of sludge bulking in a UASB reactor. The genomes are used to construct a metabolic model of the organisms and to gain insight into their ecophysiology. Additionally, based on the large number of signal-processing genes in both genomes, the authors hypothesize that these organisms must be motile, and they proceed to show gliding motility using microscopy.
Although I think the study is well done and the manuscript is well written, there are a few things I’d like the authors to address:
In the methods section the DNA extraction procedure should be briefly described, since it provides the underlying material for the data generated. Referring to a previous study (which in turn refers to another previous study) is, in my opinion, not sufficient.
Accession codes are provided for the assembled/scaffolded genomes, but I could not find the raw data, either for this study or for the study in which most of the sequencing was originally reported. I’d like to see the underlying raw data submitted to NCBI/EBI/DDBJ.
Since UASB14 is the second most abundant organism in a non-bulking UASB, it might benefit the paper to briefly discuss its role in the healthy system, and perhaps to speculate on why it is more successful than UASB270.
Although a range of stimulants for motility is mentioned in the methods section, only glucose and maltose are mentioned in the results/discussion. In my opinion the discussion would be more complete if the authors gave their thoughts on the absence of a response to the other stimulants.
In addition to the points above, I have a few minor comments on specific passages in the text.
“cellular processes, including bulking,” seems to imply bulking is a cellular process. Maybe “cellular processes, including those causing bulking,” fits better?
Although I understand what is meant by population genomes, I think it is not (yet) an established term. Perhaps explaining the term in a few words will help establish it more quickly.
It is unclear what is meant by “a minimum similarity of 98% of the read length”, since the CLC mapper allows ‘minimum similarity’ and ‘fraction of read length’ to be specified as separate parameters.
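To illustrate why the distinction matters, here is a minimal Python sketch of a read filter that applies the two thresholds separately. It is not CLC's implementation: the function name, its parameters, and the example numbers are all hypothetical, and similarity is assumed here to be the fraction of matching bases within the aligned part of the read.

```python
def passes_mapping_filter(read_length, aligned_length, matches,
                          min_length_fraction, min_similarity):
    """Hypothetical read filter with two independent thresholds.

    min_length_fraction: minimum share of the read that must be aligned.
    min_similarity: minimum share of matching bases within the aligned part
    (an assumed definition, used only for illustration).
    """
    if read_length <= 0 or aligned_length <= 0:
        return False
    length_fraction = aligned_length / read_length
    similarity = matches / aligned_length
    return length_fraction >= min_length_fraction and similarity >= min_similarity


# A 100 bp read of which only 50 bp align, all of them perfectly:
# 100% similarity, but only 50% of the read length is covered.
half_aligned = passes_mapping_filter(100, 50, 50, 0.98, 0.98)

# A 100 bp read aligned over its full length with 95 matching bases:
# 100% of the read length is covered, but similarity is 95%.
full_but_divergent = passes_mapping_filter(100, 100, 95, 0.98, 0.98)

# A 100 bp read aligned over 99 bp with 98 matching bases passes both.
good_read = passes_mapping_filter(100, 99, 98, 0.98, 0.98)
```

Under these assumptions the first two reads fail for different reasons, which is why the text should state which threshold(s) the 98% refers to.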
The N50 for the assembly is given, but this metric is, in my opinion, quite meaningless for a metagenome, since the sampling is (almost) always partial. I would advise removing it.
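For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. The short Python sketch below computes it and shows, with made-up contig lengths, how the value shifts when fragments from poorly sampled community members are included. The function name and all numbers are hypothetical and serve only to illustrate the concern.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical, well-assembled population genome (contig lengths in kb).
well_sampled = [500, 400, 300]

# The same contigs plus many short fragments from low-coverage organisms.
with_fragments = well_sampled + [50] * 20

n50_genome = n50(well_sampled)          # 400
n50_metagenome = n50(with_fragments)    # 300
```

In this toy case the contigs of the organism of interest are unchanged, yet the reported N50 drops because of unrelated, partially sampled material.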
The authors mention that CRISPRs are present in all Archaea, but I think this is not true. Unless I’m mistaken, they are absent from at least some thaumarchaea.
The authors state that the organisms generate ATP by converting acetyl-CoA to acetate. I’d add something along the lines of “in addition to glycolysis”, as is shown in figure 4.
There is some inconsistency in the notation where the number of ORFs in cellular processes is mentioned. At “glycoside hydrolases” a percentage is given, it is then explained at “protease/peptidase”, and it is absent at “signalling”.