To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
After following all the reviewers' recommendations, I think the paper can be accepted for publication.
The authors have addressed the majority of my concerns, so the paper could be accepted.
Although the paper has been positively evaluated, one of the reviewers has serious concerns about it, so you should prepare a new version of the manuscript addressing all the reviewers' comments.
The research issue is well presented and is a challenging one. The structure of the paper is also sound and easy to follow, although several sub-sections should be combined into larger ones.
The main contribution of the paper is the proposed architecture for providing ML capabilities in the browser. The main drawback of the presentation is the unclear separation of the client and server sides: it was somewhat difficult to figure out exactly which parts run on the server and which on the client.
- Is the Boss a data worker, a web worker, or both?
- Are the Master Server and the Data Server on the same machine?
- Is Hadoop used in this project? Where: on the Master Server, the Data Server, or both?
The experiments are fine, although more accuracy and performance metrics may be required.
Yes, the findings are valid but may be a little preliminary. More structured experimental results may be needed.
The paper needs two improvements:
- refine the presentation of the architectural decisions for a better understanding
- more structured experimental results (accuracy metrics and performance assessment) may be needed.
The paper is clear and readable. The idea is interesting, and I really enjoyed reading the manuscript.
In the introduction, the authors should clearly state the three objectives that are later mentioned in Section 2. Additionally, it is important to clearly state the motivation and justification of the paper; in this sense, research questions are missing. The paper should focus more on the scientific side, but at present it mostly covers technological aspects.
Please describe the organisation of the paper consistently. For example, Section 3.2 is described but Section 3.1 is not even mentioned. Mentioning the main sections is enough.
Please avoid including explanatory text in captions (Figs. 1 to 3). Such text should be incorporated into the body of the section.
All acronyms should be expanded on first use (e.g. SGD).
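For the reader, SGD stands for stochastic gradient descent, the iterative optimiser most commonly behind this acronym in ML papers. The following is a minimal illustrative sketch only; the function names and learning rate are ours, not taken from the manuscript under review.

```javascript
// Illustrative sketch of SGD (stochastic gradient descent), the acronym
// flagged above, shown as one epoch of updates for 1-D linear regression.
// Not the paper's implementation; all names here are hypothetical.
function sgdEpoch(w, data, lr) {
  // For each example (x, y), step w against the gradient of (w*x - y)^2 / 2.
  for (const [x, y] of data) {
    const grad = (w * x - y) * x; // d/dw of the squared error
    w -= lr * grad;
  }
  return w;
}

// Fit y = 2x from noise-free samples; w converges toward 2.
let w = 0;
const data = [[1, 2], [2, 4], [3, 6]];
for (let epoch = 0; epoch < 50; epoch++) w = sgdEpoch(w, data, 0.05);
console.log(w.toFixed(3)); // prints 2.000
```

Because the samples are noise-free, the error contracts geometrically at each update, so 50 epochs suffice for convergence here.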
The paper is not properly motivated and the experimental framework presented is weak. For example, how did the authors arrive at this approach? Which design choices were made? What limitations does their solution have, and how were they mitigated? Are other solutions (e.g. using services) viable in this case?
The technological contribution is notable and technically sound, but the scientific side should be conducted more rigorously. Additionally, the technological choices should be fully justified as well, e.g., why node.js?
A more detailed comparative framework, covering both other design alternatives and previous solutions, would be required.
All the implementation and contribution seem to be focused on a given problem domain. What if the library needed to be scaled, integrated within another system, or just a single module reused? How would this be done? I am afraid it would not be that simple, and most of the current modules would have to be adapted or reimplemented, e.g. to add preprocessing capabilities, data and result handlers, new algorithms, etc. Who is in charge of extending the library? Can an external developer add new features, or must users depend on releases provided by a development team? If the library is considered extensible, please include evidence (case studies, a further discussion, etc.). All these aspects should be explained in detail so that the real contribution is clearer.
The experimental framework should be clearly defined. Important information is missing to support the scalability of this approach. What is the largest amount of data supported? What technical/performance limitations have the authors found in their approach? Are there any limitations regarding the data (type, size, etc.)?
The paper makes some strong assumptions without any substantial and precise scientific support. For example, in Section 2.1, the authors mention that “to make ML truly ubiquitous requires ML writing models and algorithms with web programming languages and using the browser as the engine”. This should be supported by references.
In fact, a major issue is that the authors often make use of subjective references to endorse their work and assumptions. This clearly lacks rigour. References to particular blog entries or subjective articles should be avoided; please cite peer-reviewed works and other precise sources of information instead. If assertions are not properly founded, they become speculation.
Another major issue is that the paper is not clear about what has actually been done and what remains to be done. For example, the abstract seems to indicate that GPU capabilities are already provided (“MlitB offers [..] including: development of distributed learning algorithms, advancement of web GPU algorithms [..]”). Not until Section 2 do we learn whether this is really implemented. In fact, the last paragraph of Section 2.2 should be moved to Future Work.
Please, clearly differentiate the current contribution from future work.
The same happens with Objectives 2 and 3, described in Section 2, which are not properly developed later.
Section 2.3 explains how important it is to provide mechanisms that make reproducibility easier. I totally agree. However, this seems to remain a wish-list item, because how to achieve it with the library is not properly explained in the paper. In this sense, the contribution of Dr. Antonio Ruiz and Dr. Jose Antonio Parejo (University of Seville, Spain) on reproducibility in the field of ML (they are building a framework in this context) is likely to be of interest.
A performance analysis is required so that we can really compare this approach and know whether it would be a viable choice. Citing other external works about the language performance is not enough. Please, design and include a comparative performance study of your own proposal. This becomes especially important in the field of ML because of its increasing computational requirements.
In a real environment, how many users could it support? How would the increase of users affect the performance?
In general, my view on this work is very positive. The idea is really interesting and the work promising and challenging. However, I regret to say that, according to the criteria given to the reviewers by PeerJ, the manuscript still seems immature and requires a substantial rewriting effort. The experimental framework and its validation from a scientific perspective are weak.
Web Workers and WebSockets should be explained. A discussion of why they are the best choice (limitations, characteristics, etc.) would also be interesting.
In Fig. 5, what does “Probability” mean? It is not a precise measure.
Figure 6 is unreadable.
I cannot see Section 5.3 as an opportunity. It should be presented as future work, right?
Section 7.2 does not provide a significant contribution. Please extend it for better clarity.
How are new developments incorporated into the server?
What would happen if the browser is suddenly closed?
Are there any domains to which this library is especially well suited?
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.