Review History
Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales

All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.

Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.

Summary

The initial submission of this article was received on April 8th, 2025 and was peer-reviewed by 4 reviewers and the Academic Editor.
The Academic Editor made their initial decision on June 17th, 2025.
The first revision was submitted on July 31st, 2025 and was reviewed by 3 reviewers and the Academic Editor.
A further revision was submitted on September 5th, 2025 and was reviewed by 3 reviewers and the Academic Editor.
A further revision was submitted on September 19th, 2025 and was reviewed by the Academic Editor.
The article was Accepted by the Academic Editor on September 20th, 2025.

Version 0.4 (accepted)

Faizan Kashoo · Sep 20, 2025 · Academic Editor

I am satisfied with the manuscript now, although some typographical errors remain—for example, the period should appear after the reference, not before. However, these minor corrections can be addressed during the production stage.

[# PeerJ Staff Note - this decision was reviewed and approved by Mike Climstein, a PeerJ Section Editor covering this Section #]

**PeerJ Staff Note:** Although the Academic and Section Editors are happy to accept your article as being scientifically sound, a final check of the manuscript shows that it would benefit from further editing. Therefore, please identify necessary edits and address these while in proof stage.

Download Version 0.4 (PDF) Download author's response letter (v0.4) - submitted Sep 19, 2025

Version 0.3

Faizan Kashoo · Sep 14, 2025 · Academic Editor

Minor Revisions

Please address the minor comments from Reviewer 2.

Reviewer 1 · Sep 7, 2025

Basic reporting

good revisions

Experimental design

good revisions

Validity of the findings

good revisions

Additional comments

good revisions

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.3)". PeerJ

Reviewer 2 · Sep 8, 2025

Basic reporting

No comment

Experimental design

No comment

Validity of the findings

The discussion and conclusion sections need to be aligned to study hypothesis.
Introduction: "It is hypothesized the four-category ordinal scales would be most informative in describing performance with reliable results."
Discussion: There is no mention that the study hypothesis is not supported. The authors should carefully revise the first paragraph of their discussion.
Conclusion: The study should state that the findings did not support the use of the 4-point scale, and that either the 2- or 3-point scale could be used. The 3-point scale can provide more information on movement deficits than the 2-point scale.

Additional comments

Overall, the authors need to be mindful that their study hypothesis is not supported.

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.3)". PeerJ

Reviewer 4 · Sep 9, 2025

Basic reporting

No further comments

Experimental design

No further comments

Validity of the findings

No further comments

Additional comments

No further comments

Cite this review as

Anonymous Reviewer (2025) Peer Review #4 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.3)". PeerJ

Download Version 0.3 (PDF) Download author's response letter (v0.3) - submitted Sep 5, 2025

Version 0.2

Faizan Kashoo · Aug 24, 2025 · Academic Editor

Major Revisions

Please revise your manuscript in accordance with the reviewers' comments and resubmit it for further consideration. If you find any particular comment not feasible to address, kindly provide a detailed explanation in your rebuttal letter.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 · Jul 31, 2025

Basic reporting

Experimental design

Validity of the findings

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.2)". PeerJ

Reviewer 2 · Aug 1, 2025

Basic reporting

The use of English still needs improvement; the authors are to review carefully as mentioned previously.
Original comment: L139-143 Background: No study hypothesis reported.

Authors’ response: Line 160-162. A hypothesis has been added at the end of the third paragraph in the Introduction section: “It is hypothesis the four-category ordinal scales would be most informative in describing performance with reliable results.”

New comment: “It is hypothesis…” is incorrect. It should read “It is hypothesized…”

Experimental design

Original comment: L97-100 Background: It is unclear what the claims on guidelines are when there are 6 references that are not clinical practice guidelines.

Authors’ response: We respectfully disagree with the reviewer’s comment regarding the six referenced articles. All six references are, in fact, clinical practice guidelines (CPGs) published in the Journal of Orthopaedic & Sports Physical Therapy (JOSPT), which is a recognized source for evidence-based clinical guidelines in the field of physical therapy. Here is the link: https://www.orthopt.org/content/publications/pub-cpg

New comments: I noted the authors' response; it is unclear how the CPGs relate to the aim of the study, with only one recommendation for the single-leg squat. I provide my comments on each of the six references below. The authors need to remain focused on their topic of interest.

REF1: Arundale AJH, Bizzini M, Dix C, Giordano A, Kelly R, Logerstedt DS, Mandelbaum B, Scalzitti DA, Silvers-Granelli H, and Snyder-Mackler L. 2023. Exercise-Based Knee and Anterior Cruciate Ligament Injury Prevention. J Orthop Sports Phys Ther 53:Cpg1-cpg34. 10.2519/jospt.2023.0301

Specific comment: This is a CPG recommendation for programming of intervention, not assessment.

REF2: Cibulka MT, White DM, Woehrle J, Harris-Hayes M, Enseki K, Fagerson TL, Slover J, and Godges JJ. 2009. Hip pain and mobility deficits--hip osteoarthritis: clinical practice guidelines linked to the international classification of functioning, disability, and health from the orthopaedic section of the American Physical Therapy Association. J Orthop Sports Phys Ther 39:A1-25. 10.2519/jospt.2009.0301

Specific comment: I am unsure why the authors cite an outdated reference when there is a 2017 version. There is no advisory on the use of the single-leg squat in the 2017 version as well.

REF3: Enseki KR, Bloom NJ, Harris-Hayes M, Cibulka MT, Disantis A, Di Stasi S, Malloy P, Clohisy JC, and Martin RL. 2023. Hip Pain and Movement Dysfunction Associated With Nonarthritic Hip Joint Pain: A Revision. J Orthop Sports Phys Ther 53:Cpg1-cpg70. 10.2519/jospt.2023.0302

Specific comment: This is appropriate and should be the main focus. The sentence should zoom in on the topic of the single-leg squat instead of a preamble on other functional tests.

REF4: Logerstedt DS, Snyder-Mackler L, Ritter RC, Axe MJ, and Godges JJ. 2010. Knee stability and movement coordination impairments: knee ligament sprain. J Orthop Sports Phys Ther 40:A1-a37. 10.2519/jospt.2010.0303

Specific comment: The study highlights the single-leg hop test at Grade B evidence, which is different from your study on the single-leg squat.

REF5: Martin RL, Chimenti R, Cuddeford T, Houck J, Matheson JW, McDonough CM, Paulseth S, Wukich DK, and Carcia CR. 2018. Achilles Pain, Stiffness, and Muscle Power Deficits: Midportion Achilles Tendinopathy Revision 2018. J Orthop Sports Phys Ther 48:A1-a38. 10.2519/jospt.2018.0302

Specific comment: The study suggests hop and heel raise endurance at Grade B evidence, which is different from your study on the single-leg squat.

REF6: Martin RL, Davenport TE, Fraser JJ, Sawdon-Bea J, Carcia CR, Carroll LA, Kivlan BR, and Carreira D. 2021. Ankle Stability and Movement Coordination Impairments: Lateral Ankle Ligament Sprains Revision 2021. J Orthop Sports Phys Ther 51:Cpg1-cpg80. 10.2519/jospt.2021.0302

Specific comment: The study suggests jumping and landing task at Grade B evidence, which is different from your study on the single-leg squat.

Original comment: L204-206 Methods (Statistical Analyses): The authors did not declare a clear cut-off for interpreting their results. Based on their citation (McHugh, 2012), the minimum acceptable cut-off for the kappa coefficient is 0.41. Hence, the results and conclusions are inappropriate for choosing a 3-point categorical scale.

Authors’ response: We appreciate the reviewer’s thoughtful comment and agree that the overall intra-rater reliability for the three-category scale (κ = 0.35) does not meet the minimum acceptable threshold of κ = 0.41 as suggested by McHugh (2012). We acknowledge that this value is insufficient to support the use of the three-category scale for overall composite scoring, and we have revised the Abstract, Discussion, and Conclusion sections of the manuscript to reflect this limitation.
However, we would like to clarify that our recommendation regarding the three-category scale is limited to its application in rating specific movement components (trunk deviation, hip adduction, and lower extremity internal rotation), rather than a composite score.

New comment: I thank the authors for taking the time to revise the manuscript based on my original comment. The kappa difference between the 2- and 3-point scales is not clinically meaningful, so from the lens of a clinician, simple is better, complex is not [REF a]. I understand that the authors have chosen to stay with the 3-point scale because of the specific movements' interpretation. The study title is now inaccurate as it is no longer aligned with the new study purpose to apply an ordinal scale for specific movement appraisal. If specific movement is important to the authors, it validates my earlier point that the introduction needs to be in-depth for the single-leg squat.

REF a: Docking, S. I., Cook, J., & Rio, E. (2016). The diagnostic dartboard: is the bullseye a correct pathoanatomical diagnosis or to guide treatment?. British Journal of Sports Medicine, 50(16), 959-960.

Validity of the findings

New comment: The authors have made substantial revisions to address my concerns. My thoughts are that either the 2-point or 3-point scale is comparable, and it is up to the readers and clinicians to decide whether to use a 2-point or 3-point scale.

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.2)". PeerJ

Reviewer 4 · Aug 4, 2025

Basic reporting

The authors have addressed most of my comments. However, they need to improve the methods section to make the description of the assessment and rating procedures clearer. While the authors state that specific details about the methods and SLS rating are included in the text—and that is true—the information is not presented clearly enough, as evidenced by the fact that all reviewers made similar comments. I recommend revising this section for clarity (the appendix also).

Experimental design

Validity of the findings

Cite this review as

Anonymous Reviewer (2025) Peer Review #4 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.2)". PeerJ

Download Version 0.2 (PDF) Download author's response letter (v0.2) - submitted Jul 31, 2025

Version 0.1 (original submission)

PeerJ Staff · Jun 17, 2025 · Academic Editor

Major Revisions

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors agree that they are relevant and useful.

**PeerJ Staff Note:** Please ensure that all review, editorial, and staff comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.

Reviewer 1 · Apr 25, 2025

Basic reporting

The manuscript is good and conforms to professional standards. However, there are minor grammatical and phrasing issues that could be improved for better clarity, particularly in the Methods and Results sections. For example, the phrase "The study aim to determine" should be corrected to "The study aims to determine." Additionally, the figures and tables are relevant and well-labeled, but the raw data files mentioned in the submission materials were not provided for review, which is a critical omission. The authors should ensure all raw data are included to comply with journal policies and enhance transparency.

Experimental design

The study design is robust, with a clear research question and well-defined methods. The inclusion and exclusion criteria are appropriate, and the use of standardized outcome measures (LEFS and UCLA Activity Scale) strengthens the study. However, the description of the SLST procedure could be more detailed to ensure replicability. For instance, the exact criteria for categorizing trunk deviation, hip adduction, and lower extremity internal rotation into "normal," "moderate," and "severe" are not explicitly stated in the main text. The authors should provide these details either in the Methods section or as supplementary material. Additionally, the rationale for selecting a 1-week interval for intra-rater reliability testing could be better justified, as other studies have used longer intervals, which may affect comparability.

Validity of the findings

The findings are statistically sound and supported by appropriate analyses, including the use of unweighted kappa values for reliability assessment. The results demonstrate that the three-category scale offers a good balance between detail and reliability, which is a valuable contribution to the field. However, the study's generalizability is limited by the homogeneous sample of young, active individuals with lower extremity injuries. The authors should acknowledge this limitation and discuss how the findings might differ in older or less active populations. Furthermore, the discrepancy in intra-rater reliability between this study and previous work using video assessments warrants further discussion, as it raises questions about the consistency of real-time versus recorded evaluations.

Additional comments

The manuscript would benefit from a more thorough discussion of the clinical implications of the findings. For example, how might the three-category scale be integrated into routine clinical practice, and what training might be required for clinicians to achieve reliable results? Additionally, the ethical approval statement is noted, but the authors should confirm that all identifiable information has been removed from the datasets.
Two additional limitations should be addressed:
Reliance on Self-Reported Data: The study uses self-reported measures such as the LEFS and UCLA Activity Scale, which are subject to recall bias and subjective interpretation. Recent large-scale studies on physical activity [doi: 10.1016/j.jesf.2025.03.004; doi: 10.1038/s41514-025-00217-0] have highlighted discrepancies between self-reported and objectively measured activity levels. The authors should refer to previous literature and acknowledge this limitation and discuss how it may influence the interpretation of functional performance in their study.
Cross-Sectional Design: The study’s cross-sectional nature limits the ability to infer causality or long-term reliability of the SLST rating scales. Longitudinal or repeated-measures designs, as employed in other movement assessment studies, could provide stronger evidence for clinical utility. The authors should explicitly discuss this inherent limitation and suggest future research directions, such as test-retest reliability over longer periods or interventional studies tracking changes in SLST performance.

**PeerJ Staff Note:** It is PeerJ policy that additional references suggested during the peer-review process should only be included if the authors are in agreement that they are relevant and useful.

Cite this review as

Anonymous Reviewer (2025) Peer Review #1 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.1)". PeerJ

Reviewer 2 · May 3, 2025

Basic reporting

The use of English is fairly poor, starting from the title and throughout the manuscript. The authors need to put in effort to rectify this before further resubmission. A few examples are provided below.
Title: “Comparing the Reliable of the…” should read “Comparing the Reliability of the…”
L47-49 Abstract (Methods): “Intra-rater reliability was evaluated… single point in time.” The whole sentence does not read well. I believe you mean “A single rater rated the SLST with a 1-week interval to establish intra-rater reliability.” Likewise, the language for inter-rater reliability should be corrected.
L49 Abstract (Methods): The reporting order of the ordinal scales “four, three and two” is not aligned with the order reported in title and results. Please be consistent with the reporting for ease of reading.
L96-97 Background: It is unclear how evaluation of the movement system prevents musculoskeletal injuries.
L139-143 Background: No study hypothesis reported.

Experimental design

L97-100 Background: It is unclear what the claims on guidelines are when there are 6 references that are not clinical practice guidelines.
L204-206 Methods (Statistical Analyses): The authors did not declare a clear cut-off for interpreting their results. Based on their citation (McHugh, 2012), the minimum acceptable cut-off for the kappa coefficient is 0.41. Hence, the results and conclusions are inappropriate for choosing a 3-point categorical scale.
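
To make the cut-off argument concrete, the following is a minimal sketch of how an unweighted Cohen's kappa can be computed and compared against the 0.41 threshold quoted above. The ratings are hypothetical illustration data, not the study's data, and the helper names (`cohen_kappa`, `meets_cutoff`) are assumptions introduced here rather than anything used by the authors.

```python
from collections import Counter

# Cut-off quoted in the review (attributed there to McHugh, 2012).
MIN_ACCEPTABLE_KAPPA = 0.41


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: proportion of items given the same category twice.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def meets_cutoff(kappa, cutoff=MIN_ACCEPTABLE_KAPPA):
    """True when kappa reaches the minimum acceptable value."""
    return kappa >= cutoff


# Hypothetical three-category ratings (0, 1, 2) from one rater, one week apart.
session_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
session_2 = [0, 1, 1, 2, 2, 2, 0, 0, 2, 1]

kappa = cohen_kappa(session_1, session_2)
print(f"kappa = {kappa:.2f}, acceptable: {meets_cutoff(kappa)}")
print(f"reported 0.35 acceptable: {meets_cutoff(0.35)}")
```

Under this rule, the overall intra-rater value of 0.35 discussed for the three-category scale falls below the cut-off, which is the basis of the comment.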

Validity of the findings

L54 Abstract (Results): The overall intra-rater reliability value, particularly for the 4-category scale, does not make sense compared with the values reported later.
L54-55 Abstract (Results): The intra-rater reliability for the 3-category scale is poorer than for the 2-category scale, so I am uncertain how the authors arrived at their conclusion.
L63 Abstract (Conclusion): The study did not investigate assessment details such as specific descriptors of performance, so this can be misleading.
Overall, the authors should review the arguments in their manuscript; at present, the writing contradicts itself from section to section.
L103-105 Background: The reference cited by the authors (Ressman et al., 2019) concluded that a scale with three or fewer points is recommended, so the authors' claim that an optimal method is unknown is surprising.
L129-131 and L169-170: It seems that reliability has already been established in past studies (e.g., McGovern et al., 2018 and McGovern et al., 2019), so the authors should be clear about how this study adds value to previous work.
L228-229 Discussion: The reporting is not aligned with the results. The reliability of the 2-point and 3-point scales is not similar. In fact, the 2-point scale shows better intra- and inter-rater reliability, because the intra-rater reliability of the 3-point scale is 0.35. This means that, within the individual rater, use of the 3-point scale is poor, and that reliability within both raters could be equally poor, leading to good inter-rater reliability.

Additional comments

Nil

Cite this review as

Anonymous Reviewer (2025) Peer Review #2 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.1)". PeerJ

Reviewer 3 · Jun 7, 2025

Basic reporting

The study addresses a clinically relevant question about optimizing ordinal scales for visual assessment of the Single Leg Squat Test (SLST). The methodology is generally sound, and conclusions align with the results.
- The language is clear, unambiguous, and professional
- Enough literature references, and sufficient field background
- Article structure is quite good
- The results are relevant, but the methods need improvement

Experimental design

- The experimental design of this study demonstrates clear alignment with the journal's scope by addressing a clinically significant gap in musculoskeletal rehabilitation research. The research question is precisely defined and highly relevant, aiming to determine whether two-, three-, or four-category ordinal scales offer the optimal balance between detail and reliability for rating trunk deviation, hip adduction, and lower extremity rotation.
- The study’s originality lies in its simultaneous comparison of all three scales within the same cohort and testing session, reducing inter-study variability and enabling direct benchmarking—a novel contribution to the field.
- Methodologically, the design incorporates rigorous technical standards:
- Ethical compliance is robust

- Key limitations undermine replicability and statistical robustness: the definitions and thresholds for ordinal scale categories (e.g., moderate vs. severe deviation) are omitted
- Unweighted kappa was used for >2-category ordinal data, which is statistically inappropriate
- The rationale for defining "trunk overall deviation" as the worst sub-score (flexion/lateral/rotation) lacks biomechanical justification and risks oversimplifying movement quality
- Sample size (n=29/group) is unjustified, risking underpowered reliability analysis.
- The 15-minute rater training protocol may be insufficient for complex 4-category scaling, and rater expertise (11–35 years) limits generalizability to less-experienced clinicians.
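
The concern about unweighted kappa on scales with more than two ordered categories can be illustrated with a short sketch. Unweighted kappa treats every disagreement as equally serious, whereas a weighted kappa gives partial credit when two ratings are only one category apart. The function below and the ratings are hypothetical and assume categories coded as integers 0 to k-1; it is not the analysis used in the manuscript.

```python
def kappa(ratings_a, ratings_b, n_categories, weights="linear"):
    """Cohen's kappa with 'unweighted', 'linear', or 'quadratic' disagreement weights.

    Ratings are assumed to be integers in the range 0 .. n_categories - 1.
    """
    if weights not in ("unweighted", "linear", "quadratic"):
        raise ValueError("weights must be 'unweighted', 'linear', or 'quadratic'")
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n, k = len(ratings_a), n_categories

    # Joint proportions: obs[i][j] is the share of items rated i by A and j by B.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        obs[a][b] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    observed = sum(disagreement(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is 0")
    return 1 - observed / expected


# Hypothetical four-category ratings; every disagreement is by one category only.
rater_1 = [0, 0, 1, 1, 2, 2, 3, 3, 1, 2]
rater_2 = [0, 1, 1, 2, 2, 3, 3, 3, 1, 2]

unweighted = kappa(rater_1, rater_2, 4, weights="unweighted")
linear = kappa(rater_1, rater_2, 4, weights="linear")
print(f"unweighted = {unweighted:.2f}, linear weighted = {linear:.2f}")
```

In this made-up example the weighted coefficient is higher than the unweighted one because all disagreements are between adjacent categories; with real data the difference can go either way in size, which is why the choice of statistic should be stated and justified.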

Validity of the findings

- While the article addresses a clinically meaningful question, the impact and novelty are only implied, not directly addressed. The potential implications for clinical assessment protocols are significant, but the authors should better highlight these to improve the perception of impact.
- There are serious concerns about the statistical soundness and transparency of the data
- Despite methodological limitations, the conclusions are appropriately restrained and closely tied to the study’s central research question. The authors correctly state that reliability differs across scales and that certain scales may offer a balance between detail and inter-rater agreement, but the authors should qualify their claims more explicitly and discuss how methodological adjustments might alter their findings.

Additional comments

The study addresses a clinically relevant question about optimizing ordinal scales for visual assessment of the Single Leg Squat Test (SLST). The methodology is generally sound but needs improvement and revision in several areas. The conclusions align with the results and may have useful implications for the research area.

Cite this review as

Anonymous Reviewer (2025) Peer Review #3 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.1)". PeerJ

Reviewer 4 · Jun 8, 2025

Basic reporting

Thank you for the opportunity to review this manuscript. This study provides important information for the appropriate use of scales to visually assess the SLS; however, clarification is needed regarding the methods and analyses used before it can be accepted. Please see comments below:

Title: Use reliability instead of reliable.

Background:
Line 105: Please cite this recent systematic review (https://pubmed.ncbi.nlm.nih.gov/37549590/), which also explored the construct validity of visual assessments of the SLS, to complement the reliability findings from the study by Ressman.
Line 110: Please cite this recent paper (https://www.jospt.org/doi/10.2519/josptopen.2025.0050) which developed a scale to visually assess the SLS in patients with FAI syndrome to strengthen the references related to non-arthritic intra-articular hip conditions.
Line 128: Please clarify what the ordinal categories are, as this may not be straightforward for all readers. An example might help.
Line 132: I would focus on “methods of assessment” here as the present study used data from individuals with different injuries and this would represent different populations too.

Methods
Line 147: Please clarify from where and how participants were recruited. Also, why were 29 individuals recruited for inter-rater reliability and another 29 for intra-rater reliability? Why did the raters not assess all 59 participants twice, so that both inter- and intra-rater reliability could be calculated?
Line 155: How was this assessed? Clinical assessment? Self-reported? Please clarify.
Line 189: It is not clear how the rating system works across the different scoring systems (including in the supplementary material). For example, in a four-point rating system, the final score could range from 0 to 9, correct? How was this classified into groups? If it was not, how was a kappa statistic used to calculate overall score reliability when this was not a categorical variable? Please clarify, and if needed, use the appropriate method to calculate reliability (e.g., ICC for continuous variables).

Discussion
I think the discussion highlights the important points of the results compared to the available literature.

Experimental design

Listed in basic reporting

Validity of the findings

Listed in basic reporting

Additional comments

I believe this study adds a small but important contribution to improving the use of scales for visually assessing the single-leg squat. However, before publication, the methods section needs to be improved to clearly explain the different rating schemes, how the scores are calculated, and whether they align with the statistical analyses used.

Cite this review as

Anonymous Reviewer (2025) Peer Review #4 of "Comparing the reliability of the single leg squat test using two, three, and four category ordinal rating scales (v0.1)". PeerJ

Download Original Submission (PDF) - submitted Apr 8, 2025

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
