Completed on 17 Aug 2018 by Liya Wang.
This paper extends the SNiPlay workflows and provides an AWS-powered Galaxy platform that lets the rice community make use of the rice variation data. This could be useful, but there are several significant concerns regarding the design of the system.
1. One challenge listed in the paper is that substantial CPU and RAM are needed to handle fairly large data matrices. However, the entire Rice Galaxy platform is powered by an AWS machine that has only 2 CPUs and 4 GB of RAM. Is this enough for large-scale computation, or is the system built only for small-scale computation?
2. A related question is how long this system will last. What is the cost of running it? If a user wants to download all rice VCFs from AWS through Rice Galaxy, what is the cost of the data transfer? It is mentioned that the S3 CLI is used to copy the entire gzipped VCF file to Rice Galaxy, where it is then subsetted with BCFtools. Is this copy free? It would be good to know how this is handled; otherwise, the system may not remain available to the rice community for long.
3. The platform wraps many tools/apps for Galaxy. Can the individual tools be dockerized and/or made available via bioconda? If so, these tools could be integrated into other publicly accessible platforms such as CyVerse DE and SciApps.
4. How big is the total data source that users need to access from the Rice Galaxy?
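To make the cost question in point 2 concrete, a back-of-the-envelope estimate could be sketched as below. This is only an illustration: the function name and all three input values are hypothetical placeholders, not figures from the paper or from AWS pricing, and it assumes every request copies the whole file and is billed per GB.

```python
def transfer_cost_usd(file_size_gb: float,
                      downloads_per_month: int,
                      price_per_gb_usd: float) -> float:
    """Estimate monthly transfer cost when each request copies the whole file.

    All inputs are supplied by the caller; nothing here reflects real pricing.
    """
    if file_size_gb < 0 or downloads_per_month < 0 or price_per_gb_usd < 0:
        raise ValueError("inputs must be non-negative")
    return file_size_gb * downloads_per_month * price_per_gb_usd


# Placeholder values only (assumptions, to be replaced by the authors' numbers).
estimate = transfer_cost_usd(file_size_gb=100.0,
                             downloads_per_month=50,
                             price_per_gb_usd=0.05)
print(f"Estimated monthly transfer cost: ${estimate:.2f}")
```

Reporting the actual file sizes, expected request volume, and applicable per-GB rate in the manuscript would let readers run this kind of estimate themselves.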
Other suggestions are:
1. Are there any tutorials and working examples on using the platform with public trait data?
2. Page 2 line 31, extra white space after 'phenotypic'
3. Page 7: please give more details on how the SNP lift-over works. What is the size of the flanking sequences used? How often does this workflow succeed, or fail, in finding the lifted-over SNP?
4. Page 8 line 171, 'in within a JBrowse' should be changed to 'using' or 'in'
5. Page 16 line 359, 'pane' should be 'panel'