Samuel A. Nastase, Yaroslav O. Halchenko, Andrew C. Connolly, M. Ida Gobbini, James V. Haxby
Review posted on 24th February 2018
The paper "Neural responses to naturalistic clips of behaving animals in two different task contexts" describes a new benchmark dataset for brain decoding. It is a fresh and unique addition to the similar datasets currently available, and it will no doubt be recognized as a valuable resource for years to come. Nonetheless, a few improvements would make the manuscript better.
- Please provide stimuli files in the /stimuli folder linking them to the individual events via the stim_file column in _events.tsv files. See the BIDS specification for details. This is probably the most important improvement to the dataset I came across.
- Consider distributing a preprocessed version of the dataset. This would allow scientists to run analyses on this dataset without having to perform preprocessing themselves. In my experience, providing a preprocessed version of the data increases its reuse potential. You can run FMRIPREP directly on OpenNeuro (I recommend using the "--use-syn-sdc" option since the dataset does not include fieldmaps), and the outputs will be available alongside your dataset. The manuscript should include information about the availability of this data and a brief description of FMRIPREP outputs (it's redundant, but convenient for the reader).
- Providing a figure with example frames from each category of stimuli would greatly help readers in understanding the paradigm.
- Similarly, plotting the distributions of selected QC parameters would improve the manuscript.
- The manuscript would benefit from division into sections such as Introduction, Methods, Results, Discussion (where a comparison to other publicly available datasets could be added) and Conclusion.
- It might be useful to consider making it explicit in the title that this paper is a data descriptor.
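The stim_file suggestion in the first comment above can be sketched as follows. This is a minimal illustration only: the onsets, durations, trial types and clip filenames are hypothetical and are not taken from the dataset. Per the BIDS specification, stim_file values are paths relative to the /stimuli folder.

```python
import csv
import io

# Hypothetical events: onsets, durations, labels and filenames are
# illustrative only and are not taken from the dataset.
events = [
    {"onset": 12.0, "duration": 2.0, "trial_type": "bird_eating",
     "stim_file": "bird_eating_1.mp4"},
    {"onset": 16.0, "duration": 2.0, "trial_type": "primate_running",
     "stim_file": "primate_running_1.mp4"},
]


def write_events_tsv(rows, handle):
    """Write BIDS-style events as tab-separated values."""
    fields = ["onset", "duration", "trial_type", "stim_file"]
    writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer; a real dataset would write to a
# sub-XX_task-YY_events.tsv file instead.
buffer = io.StringIO()
write_events_tsv(events, buffer)
events_tsv = buffer.getvalue()
print(events_tsv)
```

With this layout, each event row points at the clip that was shown, so the stimuli can be matched to events without consulting any additional documentation.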
Javier Rasero Daparte, Hannelore Aerts, Jesus M. Cortes, Sebastiano Stramaglia, Daniele Marinazzo
Review posted on 20th February 2018
Functional connectivity as measured by resting-state fMRI could one day be an important clinical biomarker. This paper attempts to push forward our understanding of intrinsic brain connectivity during rest and while performing a cognitive task.
"Predicting functional networks from region connectivity profiles in task-based versus resting-state fMRI data" by Rasero et al. has the potential to contribute meaningfully to our understanding of resting-state connectivity, but is held back by a confusing and unclear presentation.
After reading the abstract, introduction and methods section, it was not clear to me what the authors attempted to predict from connectivity measures. My best guess is that the task was to predict which brain network a brain region belongs to, given a vector of its connectivity measures with all other brain regions. A task formulated this way is, however, straightforward if we assume correspondence of connectivity measures across all the input samples. This assumption means that the first value of the vector always corresponds to connectivity with region A, the second to connectivity with region B, and so on, for all input samples. The consequence of such an encoding is that the connectivity vector for region A will have a correlation value of 1 at the first position of the vector. In other words, the identity of brain regions is represented as a noisy one-hot encoding. All the network or the classifier has to do is figure out which brain regions correspond to which networks, something that could be done without any knowledge of brain connectivity.
This is just speculation since I was not able to grasp the details of the analysis to confirm what was being predicted and how connectivity was encoded.
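The concern about a noisy one-hot encoding can be illustrated with a small sketch. It assumes synthetic random time series rather than real fMRI data, and it assumes that each region's connectivity vector includes its correlation with itself; whether the manuscript encodes connectivity this way is exactly what remains unclear.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
n_regions, n_timepoints = 6, 200

# Synthetic time series (assumption: independent noise, not real fMRI data).
series = [[rng.gauss(0, 1) for _ in range(n_timepoints)]
          for _ in range(n_regions)]

# Row i is the connectivity profile of region i with every region,
# itself included.
fc = [[pearson(series[i], series[j]) for j in range(n_regions)]
      for i in range(n_regions)]

# The self-correlation of 1 marks each row's own position, so the region
# index can be read off each profile without any connectivity knowledge.
recovered = [max(range(n_regions), key=lambda j: row[j]) for row in fc]
print(recovered)
```

Once the region index is recoverable this way, a lookup table from region to network would be enough to reach high accuracy, which is why the encoding needs to be stated explicitly.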
To improve clarity in the future revision of the manuscript, I recommend adding a conceptual figure presenting the prediction task in terms of dependent and independent variables (features and labels).
More specific comments:
- The abstract is confusing. "In this study we use a large cohort of publicly available data to test to which extent one can associate a brain region to one of these Intrinsic Connectivity Networks looking only at its connectivity pattern, and examine at how the correspondence between resting and task-based patterns can be mapped in this context." This sentence is too long and convoluted.
- Page 3: "we will explore..." -> "We will explore..."
- Page 4: missing citation for the HCP project
- Page 4: "has been proved to increase the quality of the original data" citation needed.
- Page 4: "connectivity map" might be a better term than "correlation image"
- Page 4: How was the assignment of each brain region to a brain network performed? Shen and Yeo's parcellations differ in region definitions.
- Page 5: "Finally, the 282 resulting individual FC matrices were concatenated together": it is unclear whether this was done separately for task and rest, or whether the data were combined first. Along which dimension were the matrices concatenated?
- Page 5: Was the cross-validation performed across participants, across nodes, or both? If both, why?
- Page 6: Table 1 is missing the MLP results
- Prediction accuracy on another dataset (with different acquisition parameters) would be good evidence of the robustness of your findings.
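The cross-validation question above matters because the two splitting schemes test different things: holding out participants tests generalization to unseen people, while holding out nodes tests generalization to unseen regions. The following is a minimal sketch, assuming one sample per participant-region pair. The group_folds helper and the toy sizes are hypothetical and do not describe the authors' procedure.

```python
def group_folds(groups, n_folds):
    """Yield (train_idx, test_idx) so that no group spans both sets."""
    unique = sorted(set(groups))
    for fold in range(n_folds):
        # Every n_folds-th group, starting at `fold`, is held out.
        held_out = set(unique[fold::n_folds])
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test


# Hypothetical layout: 4 participants x 3 regions, one sample per pair.
participants = [p for p in range(4) for _ in range(3)]
regions = [r for _ in range(4) for r in range(3)]

# Hold out whole participants, or hold out whole regions.
by_participant = list(group_folds(participants, n_folds=2))
by_region = list(group_folds(regions, n_folds=3))

for train, test in by_participant:
    print("held-out participants:", sorted({participants[i] for i in test}))
```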
Matan Mazor, Noam Mazor, Roy Mukamel
Review posted on 16th December 2017
Mazor and colleagues, in their manuscript titled “Using experimental data as a voucher for study pre-registration”, propose a solution that could prevent bad actors from pretending that they preregistered the analysis protocol prior to data analysis. This is an interesting approach to an increasingly important problem. However, some technical issues might prevent this solution from being practically applicable.
- The proposed solution only works if the authors share raw data (which would be great, but sadly is not common).
- The verification process requires reviewers to reanalyze the data, which seems like an unrealistic expectation.
- Differences between the processing pipelines used by the authors and the reviewers could lead to slightly different results (see Carp et al. 2012) and raise false concerns about changes to the preregistered protocol. This could be exploited further when the randomization scheme has only a very limited set of orders, which yield very similar results.
- “Bob then uses the Python script that he found in the protocol folder to generate a pseudorandom sequence of experimental events, based on the resulting protocol-sum.” Isn’t the code that translates the checksum into a random order provided by the authors themselves? What if it always gives the same answer? Am I missing some detail?
- A more sophisticated attack would involve temporally rearranging already acquired data so that it complies with a protocol defined post hoc. This would, however, require a highly motivated bad actor.
- RPR approaches do not necessarily provide time locking. One could imagine a situation in which a bad actor collects data, picks an analysis protocol post hoc, and submits to the first stage of a registered report while pretending that no data have been acquired yet. This way they could game the system, but only if reviewers do not require changes to the acquisition protocol.
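The checksum-to-sequence step quoted above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: SHA-256 stands in for the protocol-sum, the protocol is reduced to a byte string, and the condition labels and function names are hypothetical.

```python
import hashlib
import random


def protocol_sum(protocol_bytes):
    """Hex digest standing in for the protocol-sum (assumption: SHA-256)."""
    return hashlib.sha256(protocol_bytes).hexdigest()


def event_order(checksum, conditions, n_trials):
    """Derive a reproducible pseudorandom event sequence from a checksum."""
    rng = random.Random(int(checksum, 16))  # the checksum seeds the generator
    return [rng.choice(conditions) for _ in range(n_trials)]


conditions = ["A", "B"]  # hypothetical condition labels
original = event_order(protocol_sum(b"analysis plan v1"), conditions, 64)
repeated = event_order(protocol_sum(b"analysis plan v1"), conditions, 64)
edited = event_order(protocol_sum(b"analysis plan v2"), conditions, 64)

# The same protocol reproduces the same sequence; an edited protocol
# yields a different seed and, with 64 binary trials, almost surely a
# different sequence.
print(original == repeated)
print(original == edited)
```

A verifier who does not trust the supplied script could rerun this step with an independent implementation and confirm that different protocol contents produce different sequences, which addresses the question of a script that always gives the same answer. The sketch also shows why the number of distinct orders matters: with few trials or few possible orders, unrelated protocols can collide on the same sequence.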
Jason D. Yeatman, Adam Richie-Halford, Josh K. Smith, Ariel Rokem
Review posted on 09th October 2017
The paper entitled “AFQ-Browser: Supporting reproducible human neuroscience research through browser-based visualization tools” is a beautifully written description of a software tool that takes the outputs of a specific diffusion MRI analysis method (AFQ) and creates interactive visualizations that make data exploration easy. The tool implements some truly innovative ideas, such as piggybacking on GitHub as a service for hosting data and visualizations, and representing data in a form that is appealing to data scientists with no prior MR experience. I hope that other tools will emulate those features. The manuscript also includes a thoughtful discussion of exploratory vs hypothesis-driven methods.
- The abstract gives the reader the wrong impression that the AFQ-Browser tool is more generic than it really is. It should be clarified that the tool only allows users to visualize and share outputs of AFQ analyses.
- When describing BrainBrowser and its involvement in the MACACC dataset, surely you meant “visualization”, not “analysis”.
- It might be worth introducing the publication feature earlier in the paper. I was quite confused when reading about reproducibility and data sharing without knowing that AFQ-Browser is not just a visualization tool.
- Please mention in the paper the license under which the tool is distributed and any pending or obtained patents that would limit its use or redistribution.
- If all AFQ users start uploading their results to GitHub using AFQ-Browser it might be hard to find or aggregate those results. It might be worth considering (and discussing) a centralized index (also hosted on GitHub) of all publicly available AFQ-Browser generated bundles. This index can be automatically updated during the “publish” procedure.
- GitHub is a great resource, but it offers few guarantees in terms of long-term storage. A solution to this would be depositing the bundles into Zenodo, which could be done directly from GitHub. It would be worth implementing and/or discussing this in the manuscript.
- It’s a technical detail, but it took me a little time to figure out why the tool requires the user to spin up a local server (presumably to be able to access CSV and JSON files). It might be worth elaborating on this.
- Saving the visualization “view” (or “browser state”) seems cumbersome when done via a file. Could the view be encoded in the URL (via GET parameters)? Sharing such views would be much easier and more natural.
- Some example analyses include information about group membership or demographic information such as age. How is such information stored and conveyed to AFQ-Browser? Does it also come as output of AFQ?
- In the manuscript you mention that AFQ-Browser allows users to compare their results with normative distributions. Where do these come from: a central repository (please describe how it is populated), or do users need to provide such distributions themselves?
- It might be worth considering a crowdsourcing scheme such as the one employed in MRIQC Web API (https://mriqc.nimh.nih.gov/) to generate normative distributions of AFQ outputs.
- Is the way you store data in CSV files and their relation to the JSON files (beyond the “tidy” convention) described somewhere in detail? It would be useful for users.
- Please describe the software testing approach you employed in this project.
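The suggestion above to encode the view in the URL can be sketched with a round trip through GET parameters. The state keys, tract names and base URL are hypothetical and do not reflect AFQ-Browser's actual settings schema. The sketch uses Python's standard library, whereas a browser-side implementation would use the equivalent JavaScript facilities.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_view(base_url, state):
    """Append the view state to a URL as GET parameters."""
    return base_url + "?" + urlencode(state, doseq=True)


def decode_view(url):
    """Recover the view state from the query string of a URL."""
    return parse_qs(urlsplit(url).query)


# Hypothetical state keys and URL; not AFQ-Browser's actual settings schema.
state = {"tracts": ["left_arcuate", "right_arcuate"], "metric": ["fa"]}
url = encode_view("https://example.org/afq-browser/", state)
print(url)
print(decode_view(url) == state)
```

A link built this way carries the whole view, so sharing a particular visualization reduces to sharing a URL.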
Tim van Mourik, Lukas Snoek, Tomas Knapen, David Norris
Review posted on 27th September 2017
Porcupine by van Mourik et al. is an extensible, cross-platform desktop application that allows users to quickly design neuroimaging data workflows via a graphical user interface. A graphical user interface has long been a much-needed feature for Nipype, and Porcupine fills this gap.
Porcupine is designed in a very smart and flexible way that allows it to be extended with new code generation backends. Furthermore, since the output is the source code of the pipeline, the processing can be customized by editing the code. Reproducibility of the produced pipelines is increased by the generation of Dockerfiles.
It’s hard to overstate this contribution, since Porcupine will expose a large community of researchers who prefer graphical interfaces to reproducible neuroimaging pipelines.
- The manuscript at some point mentions saving MATLAB code, but I don’t believe such a plugin exists yet.
- It might be worth mentioning NIAK as a potential output plugin.
- In the context of computational clusters, it might be worth clarifying that Docker images can be run via Singularity.
- “Nypipe” -> “Nipype”
- It’s unclear why the user is required to make modifications to the output Dockerfile – it seems that it should be possible to generate a complete Dockerfile without a need for any modifications.
- “It should be noted that Porcupine is not meant for low-level functionality, such as file handling and direct data operations.” What does that mean? Could you give an example?
- In the context of graphical workflow systems: did you mean JIST instead of CBS Tools?
- “providing a direct way of creating this is high on the feature list of Porcupine” –> “planned features list”?
- The license under which Porcupine is distributed is not listed in the manuscript.