Review History


All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.


Summary

  • The initial submission of this article was received on October 5th, 2018 and was peer-reviewed by 2 reviewers and the Academic Editor.
  • The Academic Editor made their initial decision on November 6th, 2018.
  • The first revision was submitted on February 13th, 2019 and was reviewed by the Academic Editor.
  • The article was Accepted by the Academic Editor on February 14th, 2019.

Version 0.2 (accepted)

· Feb 14, 2019 · Academic Editor

Accept

Thank you for the close attention to the details raised by the reviewers. I have considered the revision; the manuscript appears well refined and is ready to move forward. The software should be of utility to many users in the high-throughput data arena. Consider your manuscript Accepted for Publication. Thank you for your contribution.

# PeerJ Staff Note - this decision was reviewed and approved by Elena Papaleo, a PeerJ Section Editor covering this Section #

Version 0.1 (original submission)

· Nov 6, 2018 · Academic Editor

Major Revisions

In general the manuscript was well received; however, the reviewers raise questions about some of the stated claims and about the use case examples presented to demonstrate a test run of the software. Given newer approaches to testing software, might I suggest developing some sort of Docker container that potential users could use as a test platform before installing. The reviewers present some valid use case scenarios that are not apparently addressed within the scope of the manuscript; expanding the number of test sets would be of value. As another way of demonstrating the value of the software, an example with a "deeper" dataset could be formulated to demonstrate its utility with high-throughput "big" data types. Where claims are compared against Galaxy, the benchmark would really need to be measured. The reviewers provide some valid suggestions which should be addressed, and their feedback should provide a valuable starting point for strengthening the presentation of the software. Since considerable revision may be needed to address the suggested points, I rank this as requiring "Major Revision", although it may be easier to address than the categorization implies. We look forward to your revision.

· Reviewer 1

Basic reporting

The manuscript is overall well written, and the figures are clear and easy to follow. Although I like the general idea of organising secondary data more systematically, I was not fully convinced of what Pixel's unique selling point (USP) is compared to popular and well-established tools like Galaxy, which support many data types and offer data tagging.

1. The title is misleading. "Digital lab assistant" can mean anything. "Multi-omics": Pixel is made for a very specific type of omics data; there are many types of omics data that Pixel does not support, e.g. protein-protein interaction data. Similarly, the term "data integration" is misleading, as it usually implies the use of ontologies, standard vocabularies and URIs to integrate data, none of which are used in Pixel as far as I can tell. I suggest using a different title, e.g. “Pixel - a content management platform for quantitative omics data”.

2. Galaxy comparison. The latest version of Galaxy provides tags and group tags (https://docs.galaxyproject.org/en/release_18.09/releases/18.09_announce.html). Primary data can be tagged, and tags are propagated all the way to secondary data in Galaxy histories/libraries. Galaxy also has tools to filter, join and visualise tables. I would like to know what the USP of Pixel is compared to Galaxy.

Experimental design

The stack used to develop and deploy the software is cutting edge. The presented use case nicely illustrates the functionalities of Pixel. First, filters are used to identify two datasets tagged with the same species and “alkaline pH”. Second, genes/proteins are retained if they contain “pathogenesis” in their description. It was not clear from lines 259-262 where this description comes from.

My concerns are:
3. Reproducibility. Pixel clearly helps with data/content management, but I can’t figure out how it helps with reproducibility. As the authors clearly state, Pixel is not a tool for primary data analysis; it is for managing and visualising secondary data. When users upload data into Pixel, they can provide a description of the “Analysis” and “Experiment”. It is very optimistic to expect that users would provide the tool versions and parameters that created the secondary data. I would recommend removing any references to reproducibility.

4. Metadata and tags. The success of Pixel’s search depends on the richness and consistency of user-provided tags and metadata. Two users could choose different tags to describe the same dataset. How does Pixel cope with situations where different terms are used to describe similar data, or with spelling mistakes?
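To illustrate the kind of safeguard I have in mind (purely a sketch of one possibility, not a description of Pixel's current behaviour), tag input could be normalised and checked against the existing tags for near-matches before a new tag is created:

```python
import difflib

# Hypothetical helper, not part of Pixel: a sketch of one possible safeguard.
def suggest_tags(new_tag, existing_tags, cutoff=0.8):
    """Normalise a user-supplied tag and propose close matches among existing tags."""
    normalised = " ".join(new_tag.strip().lower().split())
    if normalised in existing_tags:
        return [normalised]
    # Near-matches catch simple typos such as 'alkalin pH' vs 'alkaline pH'.
    return difflib.get_close_matches(normalised, existing_tags, n=3, cutoff=cutoff)

existing = ["alkaline ph", "candida glabrata", "iron starvation"]
print(suggest_tags("Alkalin pH", existing))    # ['alkaline ph']
print(suggest_tags("alkaline  pH", existing))  # ['alkaline ph']
```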

5. Use case. It would be good if the authors could illustrate why doing the same task in Galaxy would be more difficult: tag search, text search, joining three tables and histogram visualisation.
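For reference, my understanding of what the table operations in this use case amount to is sketched below in pandas (the file and column names are invented for illustration; this is not code from Pixel or Galaxy, and the tag search corresponds simply to selecting the two input datasets):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs standing in for the two pixel sets and an annotation table.
rnaseq = pd.read_csv("pixel_set_A_rnaseq.csv")      # columns: gene, value, quality_score
proteome = pd.read_csv("pixel_set_B_proteome.csv")  # columns: gene, value, quality_score
annotation = pd.read_csv("gene_descriptions.csv")   # columns: gene, description

# Text search: keep genes whose description mentions "pathogenesis".
hits = annotation[annotation["description"].str.contains("pathogenesis", case=False, na=False)]

# Join the three tables on the shared gene identifiers.
merged = (hits.merge(rnaseq, on="gene")
              .merge(proteome, on="gene", suffixes=("_rnaseq", "_proteome")))

# Histogram of the RNA-seq values (e.g. fold changes) for the selected genes.
merged["value_rnaseq"].plot.hist(bins=30)
plt.xlabel("value (RNA-seq)")
plt.show()
```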

6. Excel. Despite its universality, Excel is notorious for identifier mangling (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7). I would like to learn what safeguards there are against this.
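As an example of the kind of safeguard I mean on the import side (a sketch only, not Pixel's actual implementation; the file and column names are hypothetical):

```python
import pandas as pd

# Force the identifier column to be read as plain text, so identifiers are never
# coerced to numbers, and flag values that look as if Excel already converted
# them to dates upstream (e.g. SEPT2 -> "2-Sep").
pixels = pd.read_csv("pixel_set.csv", dtype={"Omics Unit": str})

suspicious = pixels["Omics Unit"].str.match(r"^\d{1,2}-(Sep|Mar|Dec|Oct)$", na=False)
if suspicious.any():
    print("Possible Excel-mangled identifiers:",
          pixels.loc[suspicious, "Omics Unit"].tolist())
```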

Validity of the findings

Minor technical issues:
I ran into the following when following the README:
- 'make bootstrap' (first step) failed with django.db.utils.ProgrammingError: relation "core_omicsarea" does not exist. The error went away after re-running the command.
- Then I got /bin/sh: yarn: command not found; after installing 'yarn', the step went fine.

For the 'make dev' step, the last message is 'Creating pixel-dev_node_1 ... done'. At this point the server seems to have started, but the stdout isn't clear about it.
After this step, I can see a login page at http://127.0.0.1:8000, but I have no clue what to do from there.
There are requirements, such as Yarn, which the documentation should mention.
The documentation should say something about what to do once the application is deployed (is there a default user? should I create one?) and it should mention the first steps I can take as a user.
Moreover, regarding the step 'or run the Django development server and the css build watcher separately', it is not clear what the initial steps are in either case (clone and cd).

· Reviewer 2

Basic reporting

Denecker et al. describe the Pixel infrastructure, a digital lab assistant to manage, annotate and integrate datasets produced with a variety of omics technologies. In the first part of the manuscript, the authors describe the software in terms of underlying technology and design, and provide an overview of its usage. They then present a short use case, leading to the identification of units potentially involved in the pathogenicity of Candida glabrata under alkaline stress using two omics datasets. The article is generally well written and conveys a good idea of the software's capabilities and the underlying technologies and development model.

In terms of spelling, I have two comments:

- Line 79, possibly replace Importation with Import.
- Line 255, 'are search_ed_'

Experimental design

Here, I will comment on the software and its features:

- I struggled to install the software locally. Given the open development model used by the team, I will directly address this in a GitHub issue.

- As the authors acknowledge, installation of the software would require dedicated support, even though its subsequent use is very intuitive. Given my own difficulties installing and testing Pixel, and given that potential users would want to try it out before installing it locally, it would be useful if a public instance could be provided as a test bench.

- I think further clarification or examples of what the Value and QS fields can be used for would be helpful. It becomes clear later, in the example, that they can be used to store fold changes and p-values, but can any other data be stored, such as classification or clustering results? Is there thus a limit of two quantitative values per unit? How would one deal with a case where the experiment compared more than two groups and several fold changes and p-values are to be reported?
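To make the question concrete, my reading of a pixel set is a table with one Value and one Quality score per unit, roughly as in the sketch below (column names and values are my own illustration, not taken from the manuscript):

```python
import pandas as pd

# My reading of the pixel-set model: one row per omics unit, with a single
# Value (e.g. log2 fold change) and a single Quality score (e.g. p-value).
pixel_set_wt_vs_mutant = pd.DataFrame({
    "Omics Unit": ["CAGL0A00110g", "CAGL0B02475g"],   # illustrative gene identifiers
    "Value": [1.8, -0.6],                             # fold change for one contrast
    "Quality score": [0.001, 0.20],                   # p-value for that contrast
})

# A three-condition experiment (e.g. WT vs mutant A, WT vs mutant B) would then
# apparently need one pixel set per contrast -- is that the intended usage, or is
# there another way to attach additional quantitative columns to a unit?
```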

- Are the tags used to document Pixel sets free text, or is there some controlled vocabulary that will prevent different users from using different synonyms for the same term?

- On lines 188-190, I think more information about 'user notebooks' is needed. Are these dynamically executed scripts, or are they only provided as documentation? Are they optional or mandatory?

- I would suggest linking to the documentation in docs and docs-install from the README.md file.

- Is it possible to export and update Pixel sets?

- On line 315, at the end of the discussion, the authors explain that a private web app contains 20,000 pixels. Given the definition of a pixel as one line/entry in a pixel set, this seems very low.

Validity of the findings

Here, I will focus on the use case.

My main concern is that the Pixel Set A zip file is missing and that Pixel Set B isn't available at all. Hence, even if I had managed to install the software, I would not have been able to test it by reproducing the use case presented in the manuscript.

I also find the use case too simplistic, or lacking in detail, to make a convincing case. For example, the integration, based on common identifiers (unit names) which, I assume, need to be mapped accordingly prior to importing the data into Pixel, is trivial. How would Pixel deal with metabolomics data, where linking gene/protein names to metabolites isn't as direct? A more elaborate use case, one that illustrates a data analysis cycle as presented in Figure 1C, would be useful.

In the use case, units matching the 'pathogenicity' term are searched for. How were these units annotated originally? This seems crucial for being able to perform a search and identify the units of interest. How could a user identify units matching a specific GO term, for example?

In the use case, the authors use p-values as an example. Are these adjusted p-values, and if not, shouldn't they be? This, in my opinion, also reduces the relevance of the use case.
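For clarity, the adjustment I have in mind is a standard Benjamini-Hochberg correction applied before import; a minimal sketch, assuming the quality-score column holds raw p-values (the file and column names are hypothetical):

```python
import pandas as pd
from statsmodels.stats.multitest import multipletests

# Assuming the pixel-set "Quality score" column holds raw p-values (my assumption),
# replace them with Benjamini-Hochberg adjusted p-values before importing into Pixel.
pixels = pd.read_csv("pixel_set_A.csv")
_, adjusted, _, _ = multipletests(pixels["Quality score"], method="fdr_bh")
pixels["Quality score"] = adjusted
pixels.to_csv("pixel_set_A_adjusted.csv", index=False)
```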

Additional comments

To summarise, my main concerns are

- being able to test the software and either reproduce the use case, or at least use test data provided by the developers;
- the use case could be improved to make a stronger case in favour of Pixel.

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.