Completed on 7 Jun 2018 by Dominic Wright.
This is an interesting study that looks at the genomics of five different species of bird of paradise. This is an extremely interesting, and under-studied, set of species (as the authors note), and the initial sequencing of these birds can certainly help to resolve the phylogeny surrounding them.
The study itself has two main aspects - firstly to resolve the phylogeny of these birds by sequencing three (and resequencing two) species, and secondly to attempt to identify the genes that underlie the range of different phenotypes that are exhibited by this set of species. The phylogeny resolution results and the conclusions drawn from them are sound and not over-interpreted. The major issues with this manuscript concern the search for positive selection and the use of GO terms to extrapolate the genes underlying the various phenotypes.
1) In the search for positively selected genes, the authors identify 178 genes using a nominal p-value threshold, if I understand their methods correctly. They state that the distribution of p-values showed an excess of non-significant p-values, and therefore felt it was not appropriate to use a Bonferroni or other multiple testing correction. Given that they test 8133 genes, some form of multiple testing correction is definitely required. Permutation testing seems the most obvious approach, as it will be robust against non-normal distributions (see Nielsen, RCD et al, 2005, A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biology 3:723-733 for one example, but I am sure many others exist).
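To illustrate the scale of the multiple testing problem in point 1, here is a minimal Python sketch. The nominal threshold of 0.05 is my assumption (the exact cut-off should be checked against the manuscript), the function names are hypothetical, and the Benjamini-Hochberg procedure is shown only as one standard alternative to Bonferroni, not as the permutation approach suggested above.

```python
N_GENES = 8133   # number of genes tested, as stated in the manuscript
ALPHA = 0.05     # ASSUMPTION: nominal threshold; not confirmed from the manuscript


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests rejected at false discovery rate `alpha`."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under rank * alpha / m.
        if p_values[idx] <= rank * alpha / m:
            largest_passing_rank = rank
    return sorted(order[:largest_passing_rank])


# Number of genes expected below the nominal threshold if no gene were under selection.
expected_false_positives = ALPHA * N_GENES

# Per-gene threshold that a Bonferroni correction would require.
per_gene_cutoff = bonferroni_threshold(ALPHA, N_GENES)
```

Under these assumptions roughly 400 genes would be expected to fall below the nominal threshold by chance alone, which is more than the 178 reported. A list selected on nominal p-values therefore cannot be distinguished from noise without some correction.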
2) The dN/dS ratio technique is more robust when all the genes in a genome are considered together to identify overall patterns than when it is used to identify single genes that are being positively selected (see Hartl and Clark 'Principles of Population Genetics'). With only one representative per species, as in this case, it should be mentioned that some of the variation present within each species may be overlooked, which can bias the estimates accordingly.
3) The use of genes with some signature of positive selection as a means of determining genes underlying phenotypic traits is almost certainly over-extrapolated here. Taking a laundry list of genes, picking potentially interesting putative functions and ascribing them to different traits without further proof is over-speculation. For example, collagen and extracellular matrix genes are identified, but these genes can potentially affect a gamut of different functions; the authors then say that they are possibly affecting feather formation, craniofacial development, etc. As a further example, the authors state that structural colours are typical of these birds, and that some of these genes are potentially related to structural (and other) colouration. However, no actual genes for structural colours have been identified that I am aware of, so it is unlikely that any genes identified by previous means are the exact ones that control structural colour variation. In general, the traits the authors state are of interest are complex quantitative traits, so the QTL controlling the inter-specific variation are most likely in non-coding regions and, depending on the linkage disequilibrium in the genomes of these birds, are easily missed when only coding regions are considered. Therefore, without additional analysis, this speculation over gene function should be drastically reduced.
4) Similarly, GO-enriched terms can also easily be over-extrapolated. For example, in a study by Pavlidis, P. et al. (A critical assessment of storytelling: GO categories and the importance of validating genomic scans. Mol Biol Evol 29(10):3237-3248, 2012), a set of selective sweeps was generated at random in a genome. In each instance, the genes were then analysed for GO enrichment and plausible functions could be ascribed in each case. This means that GO terms should be used as a complement to, rather than as the basis for, the whole study.
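The concern in point 4 can be illustrated with a toy simulation. This sketch is not a reproduction of the Pavlidis et al. analysis: it simply draws a random candidate gene set, tests it against randomly assigned (hypothetical) GO terms with an uncorrected one-sided hypergeometric test, and counts how many terms look nominally enriched. All sizes, the 0.05 threshold and the function names are my assumptions.

```python
import random
from math import comb


def hypergeom_sf(pop, annotated, sample, observed):
    """P(X >= observed) when `sample` genes are drawn without replacement from
    `pop` genes, of which `annotated` carry the GO term."""
    total = comb(pop, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(pop - annotated, sample - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


def nominal_hits_for_random_set(n_genes=2000, n_terms=200, set_size=50,
                                alpha=0.05, seed=0):
    """Count GO terms that look 'enriched' in a purely random candidate gene set."""
    rng = random.Random(seed)
    genes = range(n_genes)
    # Hypothetical annotation: each term is assigned 20-100 genes at random.
    terms = [set(rng.sample(genes, rng.randint(20, 100))) for _ in range(n_terms)]
    # A candidate list with no biological signal at all.
    candidate = set(rng.sample(genes, set_size))
    hits = 0
    for term in terms:
        overlap = len(term & candidate)
        # Uncorrected one-sided enrichment test for this term.
        if hypergeom_sf(n_genes, len(term), set_size, overlap) < alpha:
            hits += 1
    return hits
```

Calling `nominal_hits_for_random_set(seed=1)` with different seeds shows how many terms a random list can return as nominally enriched; any such term could then be given a plausible-sounding story, which is the pattern the cited study warns about.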
In summary, the novel aspects of the sequencing of these genomes are very interesting, their use to resolve the phylogeny is strong, and they are an important step towards bringing the full power of modern genomics to bear on these species. However, using just a single representative of each species to ascribe putative phenotypic functions and relationships to genes that are identified purely on the basis of dN/dS ratios is highly speculative and undermines the study as it currently stands.