All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Based on the reviewers' reports, the revised manuscript has improved in both quality and readability. Since the contribution is solid, the manuscript is acceptable for publication in the journal.
The manuscript has been well revised. The literature review is sufficient, with good background provided, and the results contain clear definitions of all terms.
The experimental design is clearly described, sufficiently detailed, and well justified. The manuscript has met the standards of the journal.
The findings have been rigorously validated with sufficient detail, and the conclusions are well stated.
All the issues I raised have been addressed.
The manuscript is well written and the topic is interesting. However, you need to compare your study with existing methods to highlight the contribution of the proposed algorithm. Additionally, the typos should be corrected through careful proofreading before resubmission.
[# PeerJ Staff Note: The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at copyediting@peerj.com for pricing (be sure to provide your manuscript number and title) #]
This paper proposes a biped control framework based on reinforcement learning. The expert trajectory of a traditional controller is introduced to accelerate training, and the exploration inherent in reinforcement learning ensures that the final model outperforms the expert rather than simply imitating it. Several further improvements are introduced into the framework to increase training efficiency and performance. The method is validated by various experiments, covering two tasks (walking as fast as possible and tracking a specific velocity) and several environments (flat, uphill, downhill, and uneven floors). The work is well conducted and promising for different control tasks, and both the framework and the training tricks are instructive for other work.
Problems in the content and explanation of the paper:
1. The description of the PPO algorithm is not detailed enough. The meanings of lambda and gamma in Table 1, and how they are used, are not explained.
2. The experiments are rich for the LORM models. However, the performance of the reference motion should also be reported to show the improvement or difference between the proposed algorithm and the reference motion.
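For context on the first point: in PPO implementations, gamma and lambda usually parameterize Generalized Advantage Estimation (GAE), where gamma discounts future rewards and lambda trades bias against variance in the advantage estimate. A minimal sketch of the standard GAE recursion, with illustrative placeholder arrays that are not taken from the paper:

```python
# Sketch of how gamma and lambda are typically used in PPO via
# Generalized Advantage Estimation (GAE). This is an assumption about
# the standard usage, not the paper's exact implementation.
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """gamma discounts future rewards; lam smooths the TD residuals."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards through the trajectory, accumulating discounted
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With lam=0 this reduces to the one-step TD residual; with lam=1 it becomes the full discounted return minus the value baseline.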
Other details should also be edited:
1. Some paragraphs are indented while others are not, for example, Lines 295 and 297, and Lines 273 and 276.
2. Please avoid unnecessary capital letters within sentences, for example, Line 15 in the abstract: “Learn and Outperform the Reference Motion (LORM), an RL based framework ...”.
3. Equation 17 is written as f(x) = ..., but there is no variable x or function f(x) in it. I think it should be Criterion = ... .
4. The names of the curves in the figures could be polished, for example, in Fig. 11, and captions could be used in the figures to make the meaning of the curves clearer.
5. The language should be further polished, especially in the subsection “Symmetrization of actions and states”; could it be made clearer? Although the description is understandable, it takes time to read and digest.
In addition, I have two questions to be answered by the authors:
1. In the subsection “Symmetrization of actions and states”, why are only the joint angles symmetrized while the other observations remain unchanged?
2. The input observation of the agent contains many items that can be obtained in simulation software (base position, position of the centre of mass). However, is it possible to obtain them in the real world?
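To make the first question concrete: symmetrization of this kind typically maps a state or action to its left-right mirror by swapping the left and right joint angles. A minimal sketch under an assumed joint ordering (the index layout is hypothetical, not taken from the paper):

```python
# Hypothetical illustration of left-right symmetrization of a joint
# vector. The layout [hip_L, knee_L, ankle_L, hip_R, knee_R, ankle_R]
# is assumed for illustration only.
import numpy as np

LEFT, RIGHT = slice(0, 3), slice(3, 6)

def mirror_joints(q):
    """Return the left-right mirrored joint-angle vector."""
    m = np.empty_like(q)
    # Swap the left-leg block with the right-leg block.
    m[LEFT], m[RIGHT] = q[RIGHT], q[LEFT]
    return m
```

The reviewer's question is why observations other than these joint angles (e.g. base velocity, whose lateral component would normally flip sign under mirroring) are left unchanged.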
This paper proposes an RL based framework for gait control of a biped robot to overcome the complications of dynamics design and calculation. The results validate the efficiency and advantages of the proposed method. However, there are several suggestions for the authors:
1. As the proposed method is claimed to be novel, more literature should be discussed in the introduction to clarify the state of the art in the field and thus the novelty of the paper.
2. In the results and discussion part, it would be better to compare and validate the results against published works to make them more convincing.
3. There are some typos and grammatical errors in the paper; please proofread the language carefully.
The results are sufficient to validate the aim of the paper. However, more discussion is expected to emphasize the novelty and significance of the method.
In the results and discussion part, it would be better to compare and validate the results against published works to make them more convincing.
As the proposed method is claimed to be novel, more literature should be discussed in the introduction to clarify the state of the art in the field and thus the novelty of the paper.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.