All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
I read the revised manuscript and the rebuttal letter, and found that all reviewers' concerns have been addressed.
[# PeerJ Staff Note - this decision was reviewed and approved by Keith Crandall, a PeerJ Section Editor covering this Section #]
Both reviewers recognize that your manuscript adds more meaningful data to the field. Please improve the manuscript by addressing all concerns raised by the two reviewers, especially their concerns about the experimental design.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
The authors report developing the first valid quantitative structure-activity relationship (QSAR) model, following OECD guidelines and based on multiple linear regression, to improve prediction of the in vivo human fetal-maternal blood concentration ratio (logFM) of chemicals.
This study is overall well designed and the manuscript is well written. There are a few items that need to be addressed in order to further improve the quality of the paper.
1. All abbreviations, such as OECD and AATSC1c, should be defined at first use.
2. Lines 107-108: please explain the biological rationale for the feature selection rules: 2) 50% or more zero values (scarcity), and 3) small variation (fewer than 8 unique values).
3. Please give a reference for Eq. 1.
4. Line 146: Eq. 2 should be interpreted biologically, for example, what negative values mean in the prediction model.
5. Line 156: background on the y-randomization test should be given to explain why it is used to test the model.
6. Lines 175-179: the authors should explain why the 3 exclusion rules were selected to exclude chemicals outside the applicability domain (AD). For example, 3 chemicals (mifepristone, didanosine, diazepam) were identified as outside the AD in the training dataset, whereas 3 compounds (chloroquine, didanosine, and DDE) were considered outliers because their standardized residuals were greater than two (Takaku, T. et al. 2015).
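As background to comment 5, the logic of a y-randomization test can be sketched as follows. This is only a minimal illustration on synthetic data, not the authors' procedure: the function names `r_squared` and `y_randomization`, the number of permutations, and the data are all assumptions made for the example. The response is shuffled repeatedly, the model is refitted each time, and the R² of the real model is compared with the R² values obtained on shuffled responses. A real model whose R² is not clearly above the shuffled ones has probably fitted chance correlations.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def y_randomization(X, y, n_perm=200, seed=0):
    """Return R^2 on the real response and on n_perm shuffled responses."""
    rng = np.random.default_rng(seed)
    r2_true = r_squared(X, y)
    r2_perm = np.array(
        [r_squared(X, rng.permutation(y)) for _ in range(n_perm)]
    )
    return r2_true, r2_perm

# Synthetic stand-in data (illustrative only): 55 "chemicals", 2 descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(55, 2))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=55)

r2_true, r2_perm = y_randomization(X, y)
print(f"R2 (real y):     {r2_true:.2f}")
print(f"R2 (shuffled y): mean {r2_perm.mean():.2f}, max {r2_perm.max():.2f}")
```

In a manuscript, reporting the distribution of the shuffled R² (and Q²) values alongside the real values would make the purpose of the test clear to readers.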
To clarify the importance of the features used in multiple linear regression analysis, the standard partial regression coefficients should be calculated.
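One common way to obtain standardized partial regression coefficients is to rescale each raw coefficient by the ratio of the descriptor's standard deviation to that of the response, which is equivalent to fitting the model on z-scored variables. The sketch below uses synthetic data, and the function name `standardized_coefficients` is hypothetical; it is meant only to show why raw coefficients can mislead when descriptors have very different numeric ranges.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Return (raw, standardized) slopes of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    raw = coef[1:]  # drop the intercept
    # beta_j = b_j * sd(x_j) / sd(y)
    beta = raw * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return raw, beta

# Synthetic example: the second descriptor has a much larger numeric range.
rng = np.random.default_rng(0)
x1 = rng.normal(scale=1.0, size=55)
x2 = rng.normal(scale=100.0, size=55)
X = np.column_stack([x1, x2])
y = 0.5 * x1 + 0.01 * x2 + rng.normal(scale=0.2, size=55)

raw, beta = standardized_coefficients(X, y)
print("raw coefficients:         ", raw)
print("standardized coefficients:", beta)
```

Here the second descriptor has the smaller raw coefficient but the larger standardized one, which is the kind of comparison the standardized values make possible.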
Any limitations of using this model need to be discussed.
The manuscript is well written in general.
A few suggestions:
1) Cite the original data source for the in vivo fetal-maternal blood ratios of the 55 chemicals.
2) Add legends to figures (e.g. color label, Y/N meaning).
3) Include a supplementary table for all the original 1444 descriptors.
1) Correlation coefficients of descriptors should be checked to avoid multicollinearity before feature selection.
2) It is not clear whether the authors performed feature scaling. If so, which method was used? Feature scaling is important for linear regression. With feature scaling, the first 3 steps of feature selection may not be necessary.
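The two checks suggested above could look roughly like the following sketch. It is an illustration under stated assumptions, not a prescribed method: the 0.9 correlation threshold, the descriptor names `d1` to `d3`, and the helper names `correlated_pairs` and `zscore` are all hypothetical, and z-scoring is only one of several possible scaling methods. Constant descriptors would need to be removed first, since their correlation and z-score are undefined.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """List descriptor pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are descriptors
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

def zscore(X):
    """Scale each descriptor to zero mean and unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Synthetic descriptors: d2 is nearly collinear with d1, d3 is independent.
rng = np.random.default_rng(1)
d1 = rng.normal(size=55)
d2 = 2.0 * d1 + rng.normal(scale=0.1, size=55)
d3 = rng.normal(size=55)
X = np.column_stack([d1, d2, d3])
names = ["d1", "d2", "d3"]

pairs = correlated_pairs(X, names)
Xz = zscore(X)
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

For each flagged pair, one descriptor would typically be dropped before model fitting; which one to keep is a modelling choice that the authors would need to justify.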
1) Could the authors explain or discuss why the two descriptors (AATSC1c and ZMIC1) are informative? Is there any biological significance behind them?
2) Since the current study used the same dataset and algorithm to address the same biological question as a previous study (DOI: 10.1248/bpb.b14-00883), the authors should discuss in more detail how and why the predictive performance improved.
3) Previous research identified MW, Hmax and TopoPSA as the best descriptors. Were these three descriptors also included in the original set of 1444 descriptors? Why were they not picked up after feature selection?
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.