All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
All reviewers are satisfied with the current revised manuscript. I concur with reviewers and suggest accepting this manuscript.
[# PeerJ Staff Note - this decision was reviewed and approved by Jyotismita Chaki, a PeerJ Computer Science Section Editor covering this Section #]
no comment
no comment
no comment
This manuscript has been completely revised. This paper will inspire and help many colleagues.
N/A
N/A
N/A
The revision resolved all my questions and concerns. No additional comments.
The paper is well organized.
The data is sufficient.
After reviewing the revised version, I recommend accepting this paper.
After reviewing the revised version, I recommend accepting this paper.
The reviewers have minor concerns about this manuscript. The authors should provide point-by-point responses addressing all of the concerns and submit a revised manuscript with the revised parts marked in a different color.
no comment
no comment
no comment
Chemical informatics has become an indispensable aspect of modern chemical research. This paper uses artificial neural networks, such as long short-term memory (LSTM), bidirectional LSTM, transformers, and a fine-tuned T5, to extract relevant data from USPTO and EPO patents, and designs a pipeline that converts raw patent data into chemical reactions and their procedures in a structured format, so that new patents can be modified or merged easily. A method that uses machine learning algorithms to extract actions from structured chemical synthesis procedures is proposed, bridging the gap between chemistry and natural language processing. This study promotes the application of AI-based methods, which simplify synthesis pathways, predict reaction results, and optimize experimental conditions.
The following modifications are suggested:
1. Please further clarify the chemical characteristics of the dataset collected in the article.
2. The article should provide a link so that readers can access the dataset and program.
The paper is very well written, well structured, and provides detailed descriptions of the proposed methodology and its evaluation. There is only one minor wording issue:
Lines 58-59: "which can identify patterns and trends in data that humans may not discern easily discernible". I believe "discern" and "discernible" are used redundantly here.
The paper proposes a methodology that uses machine learning algorithms to extract actions from structured chemical synthesis procedures, bridging the gap between chemistry and natural language processing. The methodology includes a pipeline that combines machine learning algorithms and scripts to extract relevant data from USPTO and EPO patents by collecting, processing, and transforming experimental procedures into a series of structured actions. The pipeline has two primary tasks solved by ML algorithms: classification of patent paragraphs to select relevant information and extraction of actions from the selected paragraphs. The paper provides detailed descriptions of each step in the pipeline and evaluates the performance of the methodology on a dataset of chemical synthesis procedures.
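The two tasks summarized above can be pictured with a minimal sketch. It is not the authors' implementation: the keyword rules below are toy stand-ins for the ML models, and the verb table, `Action` class, and function names are all hypothetical, introduced only to show how a paragraph classifier feeds an action extractor.

```python
import re
from dataclasses import dataclass

# Hypothetical action vocabulary; the paper's real action set is not listed here.
ACTION_VERBS = {
    "add": "Add", "added": "Add",
    "stir": "Stir", "stirred": "Stir",
    "filter": "Filter", "filtered": "Filter",
    "wash": "Wash", "washed": "Wash",
}


@dataclass
class Action:
    name: str  # normalized action label, e.g. "Stir"
    text: str  # sentence the action was taken from


def is_procedure(paragraph: str) -> bool:
    """Task 1 stand-in: keep paragraphs mentioning at least two known action verbs."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    return sum(1 for w in words if w in ACTION_VERBS) >= 2


def extract_actions(paragraph: str) -> list:
    """Task 2 stand-in: one action per sentence, chosen by the first matching verb."""
    actions = []
    for sentence in re.split(r"(?<=[.;])\s+", paragraph.strip()):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in ACTION_VERBS:
                actions.append(Action(ACTION_VERBS[word], sentence))
                break
    return actions


def run_pipeline(paragraphs):
    """Classify each paragraph, then extract actions only from the selected ones."""
    return [extract_actions(p) for p in paragraphs if is_procedure(p)]


if __name__ == "__main__":
    sample = [
        "The invention relates to novel compounds.",
        "The mixture was stirred for 2 h. The solid was filtered and washed with water.",
    ]
    for actions in run_pipeline(sample):
        print([a.name for a in actions])
```

In the paper itself both stages are learned models rather than keyword rules; the sketch only mirrors the order of the two steps.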
Great work!
The evaluation results show that the proposed methodology achieves high precision and recall scores, indicating that it is effective in extracting relevant information from patent documents, which is quite convincing.
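For readers less familiar with these metrics, the standard set-based definitions can be sketched as follows. The function name and the example labels are hypothetical; how the paper matches predicted actions against reference actions may differ.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Return (precision, recall) for a set of predicted items against a gold set."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold items that were recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    predicted = {"Add", "Stir", "Filter"}
    gold = {"Stir", "Filter", "Wash", "Dry"}
    print(precision_recall(predicted, gold))
```

High precision means few extracted items are spurious; high recall means few relevant items are missed.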
The paper is well organized but too long. The authors may wish to reorganize the sections to make it more readable.
The experimental design and data are sufficient to support the objective of the paper.
The innovativeness is moderate.
1. The results show that BiLSTM performs best in task 1. Could the authors give a clearer explanation of why this ML model achieved the highest score?
2. Likewise for task 2, could the authors provide more detail on why transformers give the best results compared with the other models?
3. Could the authors explain how sentences labeled InvalidAction and FollowOtherProcedure are handled?
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.