To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
Thank you for addressing the reviewer and editor concerns so thoroughly in this revision. Your document describing in detail how you responded to each concern was extremely helpful for enabling a timely decision. Congratulations on getting this interesting and important work into publication.
This work is interesting, important, and, with one exception, well described and well executed.
This submission needs a very targeted rewrite of section 5.2 Text Mining Pipeline Intrinsic Evaluation. As currently described, this evaluation does not meet the PeerJ guidelines on experimental design (conducted rigorously and to a high technical standard) or validity of findings (data should be robust, statistically sound, and controlled). The authors do not need to prove that their algorithm outperforms others; however, the evaluation should make it clear to what degree their algorithm is able to reliably detect Claims and Contributions in their three corpora. The current evaluation only compares the algorithm results with a human-annotated set for one corpus, which has different characteristics than the others (judging from the data presented in Table 2).
There are (at least) three options:
(1) Reviewer number 2 offers advice for comparing against known baselines, which is a great idea.
(2) Make it clear why choosing to only sample documents from one corpus is a reasonable choice. It is not obvious to a reader that it is.
(3) Create an evaluation sample that includes a random mix of documents from all three corpora.
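As an illustration of option (3), a mixed evaluation sample could be drawn roughly as follows. This is only a minimal sketch: it assumes each corpus is available as a list of document identifiers, and the corpus names, sizes, and the function name `sample_mixed_corpus` are hypothetical placeholders, not the authors' data or code.

```python
import random


def sample_mixed_corpus(corpora, total, seed=0):
    """Draw roughly `total` documents, allocated in proportion to corpus size.

    `corpora` maps a corpus name to a list of document identifiers.
    Every corpus contributes at least one document. Because of rounding,
    the sample size can differ slightly from `total` for uneven corpus sizes.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    n_docs = sum(len(docs) for docs in corpora.values())
    sample = []
    for name, docs in sorted(corpora.items()):
        k = max(1, round(total * len(docs) / n_docs))
        k = min(k, len(docs))  # never request more documents than exist
        sample.extend((name, doc) for doc in rng.sample(docs, k))
    rng.shuffle(sample)  # mix the corpora so annotation order is not grouped
    return sample


# Hypothetical corpora with placeholder identifiers.
corpora = {
    "corpus_a": [f"a{i}" for i in range(100)],
    "corpus_b": [f"b{i}" for i in range(50)],
    "corpus_c": [f"c{i}" for i in range(50)],
}
evaluation_sample = sample_mixed_corpus(corpora, total=20)
```

Proportional allocation is only one reasonable choice; an equal number of documents per corpus would also be defensible, provided the choice is stated in the paper.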
The authors should also more clearly describe their processes for creating the gold standard corpus. (Were documents double annotated? What was the agreement between human experts?) This gold standard data set should also be released to support replications of their experiments. They should also provide precision and recall values, in addition to the F-measure.
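To make the requested figures concrete, the sketch below shows one way precision, recall, F-measure, and inter-annotator agreement (here Cohen's kappa, which is one common choice and not necessarily what the authors would use) could be computed. It assumes annotations are reduced to sets of hashable items such as sentence identifiers, and that two annotators labeled the same items; the function names are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for predicted items against a gold set."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentence identifiers and labels, for illustration only.
p, r, f = precision_recall_f1(gold={1, 2, 3, 4}, predicted={2, 3, 4, 5, 6})
kappa = cohen_kappa(
    ["Claim", "Claim", "Contribution", "None"],
    ["Claim", "Contribution", "Contribution", "None"],
)
```

Reporting all three evaluation values per corpus, alongside the agreement score for the gold standard, would let readers see whether a given F-measure is driven by low precision or by low recall.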
There are also a few typos to be fixed:
10 for a wide-spread adoption (for wide-spread adoption)
40 provide for use cases such as summarization (provide support for)
42 (NEs) present in a document (e.g., algorithms, methods, technologies) can help locating (can help locate)
83 extraction task by determining zones of text where further analysis is needed. (extraction tasks)
117 mining, controlled vocabularies are used in form of markup languages, which are added (not grammatical)
220 scientific documents LOD entities combined with rhetorical entities. (documents’)
229 Detection of Named Entities). Finally, the extracted information are stored in a seman- (is stored)
624 the concepts of REs and NEs, enhanced retrieval of document becomes possible, e.g., (documents)
The overall methodology is sound; however, some system components are simplistic and naive, especially the rule-based approach for rhetorical entity detection. It is not clear how expressive and accurate the set of 190 gazetteer entries and the subsequent rules are. The authors present results on a small data set with only a modest 68% F1 score, which makes the assessment even harder. A more detailed description of these entries and rules, along with a qualitative analysis, is required (e.g., what are the major categories of claim and contribution statements? what portion of these are targeted by the system? how much linguistic variation do claim and contribution statements exhibit? how much of it can the designed rules accommodate? etc.).
Without any baseline/benchmark, it is not clear how to properly judge the reported results. I suggest that the authors compare their RE and NE detection subsystems with at least one other system (some of which are mentioned by reviewer 1 in the earlier set of reviews).
In addition, the authors should provide experimental evidence that such a system is helpful for users in some quantifiable way. Something along the lines of what reviewer 2 suggested in the earlier set of reviews would be helpful.
Overall, a well-written paper describing a sound methodology. However, I do strongly feel that the authors should additionally carry out the above experiments and report the results. The 'major revisions' recommendation that follows is really meant to stress the importance of these experiments rather than to suggest extensive rework.
Typo: In the results section of the abstract, "semantic queries than show" -> "semantic queries then show".
This paper lays out a framework for a valuable contribution to bridging the gaps in semantic annotation for scientific literature. It will be interesting to see the results of your future experiments honing DBPedia Spotlight onto scientific domains so that the over-generalizations addressed in this paper do not dim the importance of this line of research.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.