All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
The revision is well done. Thanks for sending the paper to PeerJ!
[# PeerJ Staff Note - this decision was reviewed and approved by Stephen Macknik, a PeerJ Section Editor covering this Section #]
Please further edit your paper according to the review.
While the authors say they have removed causal language, “impact” remains in the title (this was noted previously) and on Line 242. See also “increased” on Lines 178–179. It would be worth checking the entire manuscript for any other language that alludes to causality rather than association. The limitations of cross-sectional data are appropriately noted on Lines 244–245.
Line 95: Repeated “as a as a”.
The details on Lines 119–120 (about measuring blood pressure) might fit better on Line 113.
Line 126: “…t-testS WERE used…” or “…THE t-test was…”
Line 134: Missing space in “andLUTS”.
Line 150: It would be helpful to specify “mean ± SD” on Line 150. (I appreciate you mention this in the methods on Line 125, but some readers will skim/skip that section.)
Line 182: “more significantly associated” isn’t a meaningful description. You could note that the odds were higher in one group. This numerical comparison would be strengthened by statistically comparing these two groups and noting the statistical significance in the text around here. Also Line 221.
Line 189: I’m not sure what you mean by “depression as the outcome of interest” here. Do you mean study outcome rather than outcome variable in the statistical models? If so, that wasn’t clear to me.
Lines 210–211: I think there is some unintended repetition in “in the pathogenesis of major depression in the pathogenesis of depression and LUTS”.
Table 1 talks about “weighted percentages” but you do not otherwise talk about the complex sampling in NHANES. Is this spurious? The percentages I checked did not appear to be weighted (or if they were, the effects of weighting were negligible). Line 252 suggests that weights were not used at all.
Table 1: Note that there is clearly some skew in age (a 95% reference range would be 33.8 to 80.0 which doesn’t make sense given the minimum age of 40 here). I would also expect some skew in weight and BMI. This might make arithmetic means and SDs inappropriate descriptive statistics for such variables and it would be worth checking that medians for these are not meaningfully different to arithmetic means, with medians and IQRs or geometric means and SDs used as appropriate if this is the case. The approach used to determine appropriate descriptive statistics should be mentioned in the statistical methods.
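The reference-range check in the comment above is simple arithmetic and can be reproduced as a minimal sketch. The mean and SD below are backed out from the quoted range (33.8 to 80.0) under a normal assumption. They are implied values for illustration only, not figures copied from the manuscript's Table 1, and the function names are hypothetical.

```python
# Back out the mean and SD implied by a 95% reference range (mean ± 1.96 SD),
# then compare the lower limit with the known minimum age of the sample.
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def implied_mean_sd(lower, upper, z=Z_95):
    """Mean and SD that would produce the given normal-theory reference range."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return mean, sd


def reference_range(mean, sd, z=Z_95):
    """Normal-theory 95% reference range for a given mean and SD."""
    return mean - z * sd, mean + z * sd


lower, upper = 33.8, 80.0   # range quoted in the comment above
minimum_age = 40            # inclusion criterion: men aged 40+

mean, sd = implied_mean_sd(lower, upper)
range_low, range_high = reference_range(mean, sd)

# A lower limit below the minimum possible value signals right skew.
range_is_implausible = range_low < minimum_age
print(f"implied mean={mean:.1f}, SD={sd:.1f}, implausible={range_is_implausible}")
```

When the lower limit falls below a value the variable cannot take, medians and IQRs (or geometric means and SDs) are usually the safer descriptive statistics.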
Tables 1 and 2: You mention “weighted percentage” for both of these, but, as mentioned above, I’d understood that the sampling design was not incorporated into the analyses. If they were, it’s unclear why means were not similarly weighted. See Line 252 again.
Table 2: Given the skew in age, t-tests might be questionable here. Similarly for any other variables with substantial skew.
Table 3: You report unadjusted p-values, but not Wald test p-values for the adjusted models. Is there a reason for this? Personally, I’d recommend using a univariable Wald test p-value rather than the Chi-squared test p-value here just for consistency (so using a logistic/modified Poisson regression model with depression category only, then this with age, and finally adding the other variables).
Table 3: Refers twice to using “sampling weights”. These were not mentioned in the statistical methods and based on Line 252 and your rebuttal response, I had understood that these were not used here. The Chi-squared tests, however, do not mention weights (note b). Could you please clarify?
My comments about multivariable and multivariate might not have been clear. A common convention is that multivariable is the correct term here as it is multiple independent variables being modelled. See Lines 128 (twice), 130, 133, 176, and Table 3. Similarly, “univariable” is correct when one predictor (including groups) is being looked at, e.g. Lines 126, 129, and 168. See https://dx.doi.org/10.2105%2FAJPH.2012.300897 for arguments in favour of this nomenclature.
Following on from my comments about Tables 1 and 2 above, I had previously asked about whether some variables were too skewed to warrant using means and SDs as descriptive statistics (suggesting geometric means and SDs or medians and IQRs) and I could not follow your response: “Thank you for your comments. The size of the study population was large enough to represent the population and in the clinical perspective. According to recently reported demographics of the United States population, the value of continuous variables were not skewed, and we think that using means and SDs is justifiable.” While a larger sample provides some protection with the sampling statistic being more likely to be approximately normally distributed, and so the p-values being appropriate despite the underlying distribution, this does not address the issue around descriptive statistics. The demographics of the US population are also not relevant to your sample here, unless these were for men aged 40+.
I still find the causal model here to be, in part at least, unexpected, but not unreasonable. By looking at LUTS as the outcome in logistic regression models, it is implied that you’re interested in whether depression leads to LUTS, or at least whether depression can be used in screening for LUTS, and you are adjusting LUTS for the covariates rather than adjusting depression for these variables. From a strictly association perspective, which is the dependent and which the independent variable is not too important, but the variable that is being adjusted for is important. It seems to me that depression is potentially an outcome of LUTS, although I’m also open to arguments that depression could be causally linked to LUTS beyond just nocturia, and I appreciate that investigating patients with depression for LUTS is also an option. The studies cited here (3, 10–13; as referenced on Line 55) appear to be a mixture in terms of the direction, and my comments on this point can be seen as inquisitive rather than directive. One option when there is genuine uncertainty about the causal direction is to look at models in both directions and it is not uncommon, in my experience, for one direction to be statistically significant even after adjusting for potential confounders and the other direction to not be so, providing some guidance in thinking about the causal direction of the association. This comment is just following up on my previous comments on this point and I am not requesting further changes here.
Related to this though, the statement on Lines 241–243 is much too strong for a cross-sectional association. Words such as “impact” require justification beyond that possible here.
In response to my comment: “Note that given that this is not a case-control study, the authors are not required to use logistic regression, and so can use modified Poisson regression to produce relative risks, or could instead justify using odds ratios to estimate these.”, the authors responded “Since this was a case-control study, we could not compute the probability of the disease in each exposure group, and RR could not be calculated.” However, this is definitely not a case-control study, despite Lines 140–142. In a case-control study, participants are selected based on their case or control status and this means that no estimate of prevalence is available (the ratio of cases to controls is fixed by the design of the study rather than able to be determined from the data); here, all men aged 40+ who provided the necessary data were included and their case or non-case status was determined by the data and not the study design. The choice to present relative risks from modified Poisson regression or odds ratios from logistic regression is up to the authors, but it is important that they appreciate that this does not appear to be a case-control study and so RRs are not precluded. For completeness, the statement on Lines 142–143: “As LUTS and depression are not common disease entitie, we used ORs to estimate relative risks.” (note “entitie”) would be true but only requires LUTS to not be common; the prevalence of depression is not relevant here.
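To illustrate why only the prevalence of the outcome matters when odds ratios are used to approximate relative risks, here is a minimal sketch with two hypothetical 2x2 tables. The counts are invented for illustration and have nothing to do with the NHANES data. In both tables the relative risk is 2, but the odds ratio is close to it only when the outcome is rare.

```python
def risk_ratio(a, b, c, d):
    """a, b: exposed with/without outcome; c, d: unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same 2x2 layout."""
    return (a * d) / (b * c)


# Hypothetical counts: outcome risk 2% vs 1% (rare outcome).
rare = (20, 980, 10, 990)
# Hypothetical counts: outcome risk 40% vs 20% (common outcome).
common = (400, 600, 200, 800)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print(f"rare outcome:   RR={rr_rare:.2f}, OR={or_rare:.2f}")
print(f"common outcome: RR={rr_common:.2f}, OR={or_common:.2f}")
```

Because prevalence can be estimated directly in a cross-sectional sample like this one, both quantities are computable and the authors can choose which to report.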
Thank you for adding details looking at collinearity (Lines 129–132) and for your response about some of the diagnostics you used. I suggest adding a reference justifying your cut-off of 2 for VIFs around Line 131. Diagnostics for the logistic regression extend beyond collinearity, and tests of goodness of fit are common here, e.g. Hosmer-Lemeshow’s test. Some readers will also want to see mention of model assumptions and diagnostics for the t-tests included in the manuscript (Line 126). Note that it is not the normality of the dependent variables that matters, but the residuals of the model, which for a t-test is equivalent to normality within each group, and I think it’s worth being careful when explaining this point in the text.
Because this is observational data, I think more caution is needed in interpreting the results, particularly in the absence of a thoroughly justified causal model. For example, perhaps Line 191 should read something like “…our study suggests that multiple factors MIGHT be involved…” Similarly, the “should” on Line 192 is very definite, more than I feel is justified here. There is no evidence here, after all, that LUTS would not predict depression after adjusting for plausible available confounders (and indeed, you discuss this causal direction for the association on Lines 201–203), and both could be being caused by a third variable (as you discuss on Lines 205–206). I think the results here justify looking at interventions, but caution is definitely needed without those RCTs.
Lines 251–252: While this is true in terms of designing the study, the actual power of the study is reflected in the widths of the 95% CIs and so the actual adequacy of the sample size can be evaluated here based on these widths and their relationship to clinically important effects.
The point on Lines 254–255 does not make sense to me. If there are interactions that exist in the data generating process and these are unmodelled, this makes the estimates suspect (the model is mis-specified).
The manuscript has been appreciably improved and reads much more clearly now.
Dear Dr Chul,
Your paper needs some improvements, as stated:
"Especially important is establishing an underlying causal model so that the appropriate variables are included in the appropriate statistical models (which have been checked for their mathematical assumptions), leading to full and complete reporting and interpretation of the results."
If you are willing to revise it adequately, it will be a pleasure to review your updated paper.
[# PeerJ Staff Note: Please ensure that all review comments are addressed in an appropriate rebuttal letter, and please ensure that any edits or clarifications mentioned in the rebuttal letter are also inserted into the revised manuscript (where appropriate). Direction on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]
The writing is generally fine, but there are some instances where the writing is a little awkward or slightly unclear (e.g. Lines 129–130, 132). Careful proofreading should find these.
The authors need to be careful to avoid making causal statements from observational cross-sectional surveys (including “impact” in the title and on Line 239, and “cause” on Lines 18 and 64).
Lines 16–18: Because the authors use logistic regression for their adjusted analyses, the question being investigated here is whether these variables are associated with the risk of LUTS (despite the data being cross-sectional, there is still a direction in the statistical models which generally becomes the implied direction for causality in adjusted models—while causality cannot be confidently determined from such data, we can, and do, test the consistency of the observed data with an assumed, global null, causal model). This is not meaningfully different to the question as stated here for unadjusted analyses (comparing means and comparing odds would lead to similarly worded questions with basically the same intent), but in the adjusted analyses, the effects are conditional on other variables’ effects on the dependent variable and so the two approaches (comparing means and comparing odds, each time adjusting a different dependent variable for the other variables) diverge in terms of their research questions. See also Lines 25, 27–28 (but note that Lines 28–29 is correct as are Lines 29–32), and elsewhere for this. If all results are presented in the same way (with LUTS as the dependent variable), I think the reader will find the results easier to follow.
I wouldn’t describe depression and sleep disorders as demographics (e.g. Lines 16–17) but rather as measures of health or as morbidities.
Line 20: 2005 not 2006.
Line 37: “discussed” or “investigated”? This doesn’t seem to me to hold though as you are modelling unemployment as a statistical predictor of LUTS, rather than LUTS as statistically predicting unemployment. I think you could note the association between unemployment and LUTS and that this could motivate investigation into the economic burden of LUTS. (See comments about the underlying causal model below, particularly the possibility that unemployment is a collider in the fully adjusted models here.)
Readers might be interested in the number of men excluded due to each criterion listed on Lines 79–80.
Lines 129, 130, 131, 132, etc.: “multivariable” is for multiple independent variables; “multivariate” is for multiple dependent variables. Similar for univariate and univariable. Please check your usage of these terms throughout the manuscript but note that your usage on Lines 24 and 29 was correct.
Line 127: Are all continuous variables approximately normally distributed so as to justify using means and SDs (rather than medians and IQRs or geometric means and geometric SDs)? Weight and BMI tend to be positively skewed and CRP (Line 211) is always highly positively skewed, as is the case here.
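As a minimal sketch of the check being asked for, the snippet below compares the arithmetic mean with the median and the geometric mean for a positively skewed variable. The CRP-like values are hypothetical and chosen only to show the pattern. They are not taken from the manuscript.

```python
import math
import statistics

# Hypothetical, positively skewed CRP-like values (illustration only).
crp = [0.4, 0.6, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 6.0, 25.0]

mean = statistics.mean(crp)
sd = statistics.stdev(crp)
median = statistics.median(crp)

# Geometric mean and geometric SD: exponentiate the mean and SD of the logs.
geo_mean = statistics.geometric_mean(crp)
geo_sd = math.exp(statistics.stdev([math.log(x) for x in crp]))

print(f"mean={mean:.2f} (SD {sd:.2f}), median={median:.2f}")
print(f"geometric mean={geo_mean:.2f} (geometric SD {geo_sd:.2f})")
```

A mean well above the median, as here, suggests that medians and IQRs or geometric summaries describe the typical participant better than the arithmetic mean and SD.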
Lines 132–135: This text read to me initially as if you would model the symptoms as statistical predictors of LUTS. This does not match the models in Table 4 and so the text may need to be reworded to make the intent clearer.
Normally, where a mean is presented, a standard deviation will also be included, e.g. Line 146.
Lines 146 and 147: The decimal places for this are inconsistent with other percentages. There may be other instances of this and the manuscript should be checked carefully.
Lines 157–158: The decimal places for ORs and their associated 95% CIs are inconsistent. I would suggest that two decimal places would be sufficient in all cases here.
Line 157: If this is per year of age, it must be made clear to the reader (also Line 165). The level of interest (the non-reference level), e.g. for employment status, must also be clear to the reader.
For Tables 2 and 3, ORs, 95% CIs, and p-values should be reported for both unadjusted and adjusted analyses. As discussed below, the selection of variables for the adjusted model cannot be entirely empirically-driven due to the need to still have a sensible causal model underlying these cross-sectional analyses.
For Table 4, if effect modification by depression status is of interest, p-values for the appropriate interactions should be added after the first three results columns.
I’d like to see the authors’ thoughts on the next steps needed for research in this area added to the discussion or conclusions.
While the data is not provided directly, it can be obtained easily enough through the NHANES website.
The single most important part of a study such as this is identifying the necessary set of variables (confounders) to adjust for in order to estimate the association(s) of interest, without including variables that should not be included (colliders and variables on the causal pathway should the total effect and not the direct effect be of interest). This part of the study needs to be entirely clear to the reader in both the abstract methods and in the body of the manuscript. It is good that these variables are clearly listed in the body of the manuscript, although they are not clear in the abstract’s methods, but they still need to be put into context with an underlying causal model. Identifying the necessary variables cannot be done by looking at unadjusted models (as in Tables 2 and 3), for example, due to the potential for colliders and variables on the causal pathway being included, and the possibility that other important confounders were not examined or not available. Preparing a directed acyclic graph (DAG) for this purpose is often useful to the reader (perhaps included as a supplementary file) in understanding the modelling, as well as to the authors in planning their statistical analyses and in interpreting their results. For example, depression may lead to sleep disorder, which in turn leads to LUTS, i.e., sleep disorder could in theory be on the causal pathway between depression and LUTS, making its inclusion an overadjustment for the purposes of estimating the total effect of depression. Alternatively, you might have a different DAG in mind. Similarly, unemployment might increase the risk of depression, in turn increasing the risk of LUTS. Alternatively, unemployment might be a collider if the DAG was drawn differently.
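One way to make such a causal model explicit before fitting anything is to write the DAG down as data. The sketch below encodes a hypothetical DAG that combines the examples in the comment above (depression leading to sleep disorder and then to LUTS, and unemployment as a common effect) with an assumed age confounder. The edges and the coarse classification rule are illustrative assumptions, not the authors' model and not a full backdoor-criterion analysis.

```python
# Hypothetical DAG: each key points to the variables it directly causes.
dag = {
    "age": ["depression", "luts"],
    "depression": ["sleep_disorder", "luts", "unemployment"],
    "sleep_disorder": ["luts"],
    "luts": ["unemployment"],
    "unemployment": [],
}


def descendants(graph, node):
    """All variables reachable from `node` by following directed edges."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def classify(graph, exposure, outcome):
    """Coarse role of each covariate for the exposure -> outcome question."""
    roles = {}
    for node in graph:
        if node in (exposure, outcome):
            continue
        below = descendants(graph, node)
        caused_by_exposure = node in descendants(graph, exposure)
        caused_by_outcome = node in descendants(graph, outcome)
        if exposure in below and outcome in below:
            roles[node] = "confounder"   # common cause: adjust
        elif caused_by_exposure and outcome in below:
            roles[node] = "mediator"     # on the pathway: omit for total effect
        elif caused_by_exposure and caused_by_outcome:
            roles[node] = "collider"     # common effect: do not adjust
        else:
            roles[node] = "unclassified"
    return roles


roles = classify(dag, exposure="depression", outcome="luts")
print(roles)
```

Redrawing the edges (for example, letting unemployment cause depression instead) changes the roles, which is why the assumed DAG needs to be stated alongside the adjustment set.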
Some of the models in Table 4 should be identical to models in Tables 2 or 3 once the appropriate underlying causal model is identified, unless there is a very good reason for these to differ. Can the authors comment on this point?
You look at a large number of independent variables. Did you consider interactions between any of these? Model specification tests might be useful in identifying any omitted interactions, but better would be to address this based on the plausibility of the effect modification suggested by the interactions.
The level of detail in the statistical methods should be enough for a (bio)statistician to be able to replicate your results with the data sets downloaded from NHANES without needing to resort to trial and error. This includes all model diagnostics and decisions made on the basis of these.
Lines 71–72 could be misread as saying that you obtained this consent for this particular project.
Lines 128–131: While unadjusted logistic regression and Chi-squared tests will give comparable p-values for binary dependent variables, as noted above, it doesn’t make sense to use the latter for the unadjusted analyses (with no measure of effect or 95% CI provided) and then the former for the adjusted analyses. A simpler approach is to use logistic regression for all models with a binary dependent variable, making a smoother transition from unadjusted to adjusted models and providing effects and CIs for all of these results.
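For a single binary predictor, the unadjusted logistic regression estimate coincides with the cross-product odds ratio from the 2x2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below uses that identity to produce an OR, 95% CI, and Wald p-value without any model-fitting library. The counts are hypothetical, and the function name is an illustrative choice.

```python
import math
from statistics import NormalDist


def unadjusted_or(a, b, c, d, level=0.95):
    """OR, Wald CI, and Wald p-value for a 2x2 table.

    a, b: exposed with/without outcome; c, d: unexposed with/without outcome.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    z = log_or / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(log_or), ci, p_value


# Hypothetical counts (illustration only).
or_estimate, (ci_low, ci_high), p_value = unadjusted_or(60, 140, 150, 650)
print(f"OR={or_estimate:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, p={p_value:.4f}")
```

Reporting unadjusted results in this form gives the same kind of output as the adjusted models, so the reader can follow how estimates change as covariates are added.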
Given the number of cases, and note that the effective number of cases will be reduced by design effects, the authors need to outline how they have controlled for excessive model complexity (see, for example, Peduzzi, et al., 1996's guidelines about sample sizes and logistic regression; if alternative approaches are used, they should be justified).
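A rough complexity check of this kind can be sketched as an events-per-variable calculation. The numbers below are hypothetical, the design effect is an assumed value, and the threshold of about 10 events per variable is the rule of thumb commonly attributed to Peduzzi et al. (1996) rather than a hard requirement.

```python
def events_per_variable(n_cases, n_noncases, n_parameters, design_effect=1.0):
    """Events per estimated parameter, using the smaller outcome group.

    Dividing by the design effect approximates the loss of effective
    sample size under a complex survey design.
    """
    limiting_events = min(n_cases, n_noncases) / design_effect
    return limiting_events / n_parameters


# Hypothetical inputs (illustration only).
epv = events_per_variable(
    n_cases=300, n_noncases=2700, n_parameters=12, design_effect=1.5
)
meets_rule_of_thumb = epv >= 10
print(f"EPV={epv:.1f}, meets ~10 EPV guideline: {meets_rule_of_thumb}")
```

Note that `n_parameters` should count every estimated coefficient, so a categorical variable with k levels contributes k - 1.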
Note that given that this is not a case-control study, the authors are not required to use logistic regression, and so can use modified Poisson regression to produce relative risks, or could instead justify using odds ratios to estimate these.
The reader might wonder why you used NHANES 2005/6 and 2007/8 and not a more recent survey/surveys.
In Tables 2 and 3, it is not clear why you have not used logistic regression for all independent variables (providing ORs, 95% CIs, and p-values in each case irrespective of statistical significance, as you do in Table 4 for the ORs and 95% CIs, with p-values still needing to be added there). Note that it does not make sense to both compare mean age and use age as a predictor of the odds of LUTS, for example. The linearity (on the log-odds scale) of associations will need to be checked, with this process explained in the methods, for all continuous variables. It is useful to show both unadjusted and adjusted results (and in some cases, partially adjusted results are also useful where there is some uncertainty about the underlying causal model that is informing the analyses, e.g. whether something is a confounder or potentially on the causal pathway).
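One simple way to examine linearity on the log-odds scale is to compute empirical logits within bands of the continuous variable and see whether they change by roughly equal steps. The sketch below does this for hypothetical age bands and counts. The 0.5 continuity correction and the banding are illustrative choices, and other approaches (such as splines or fractional polynomials) are equally valid.

```python
import math

# Hypothetical (cases, non-cases) by age band; illustration only.
bands = {
    "40-49": (30, 570),
    "50-59": (60, 540),
    "60-69": (110, 490),
    "70-79": (180, 420),
}


def empirical_logit(cases, noncases):
    """Log odds with a 0.5 continuity correction to guard against zero cells."""
    return math.log((cases + 0.5) / (noncases + 0.5))


logits = {band: empirical_logit(*counts) for band, counts in bands.items()}

# Differences between consecutive bands; similar sizes suggest linearity.
values = list(logits.values())
steps = [later - earlier for earlier, later in zip(values, values[1:])]

for band, value in logits.items():
    print(f"{band}: logit={value:.2f}")
print("steps between bands:", [round(s, 2) for s in steps])
```

Markedly unequal steps would argue for modelling the variable with categories or a flexible term rather than a single linear coefficient.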
While I appreciate that the number of cases and non-cases is out of the authors’ control, I would still hope that they performed some sample size adequacy checks prior to analyses and, if so, these should be presented here. If this was not done, this fact should still be noted.
All statistical models make mathematical assumptions. How were the assumptions for the t-tests, Chi-squared tests, and logistic regression models checked? These diagnostics need to be clearly explained in the statistical methods in order for the reader to have confidence in the analyses. Note that I am also suggesting that you use logistic regression (or perhaps better, modified Poisson regression) for all analyses involving LUTS, which would simplify the set of assumptions that need to be discussed here.
It is good that the authors have described the NHANES (Lines 68–74) but please note that NHANES is a complex survey and as its analytic guidelines state: “Survey sample weights should be used, and the complex survey design must be accounted for in the estimation of variance.” Such methods are not described as part of the statistical methods here. As you are looking at a subset of the data, please also note the instruction: “An analyst may have a certain demographic subgroup of interest, such as a particular age range or gender, or a subsample of participants who received a particular laboratory test. For proper variance estimation, the entire set of data containing the appropriate weights must be used. The estimation procedure must then indicate which records are in the subgroup of interest.” (Both quotes are from the National Health and Nutrition Examination Survey: Analytic Guidelines, 1999-2010.) This has important implications for statistical analyses, especially for prevalences and other descriptive measures. While the statistical methods are silent on these, the legend to Tables 1, 2, and 3 says that percentages (but does not mention means) are weighted and the legend to Table 4 mentions sampling weights (but not clustering, etc.). Could the authors please explain, and if necessary, justify, what was done in this regard?
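To show why it matters whether the reported percentages are weighted, the sketch below compares an unweighted and a weighted percentage on a tiny hypothetical data set. The flags and weights are invented for illustration and the function name is hypothetical. The sketch covers point estimates only, since variance estimation under the complex design requires survey-aware software applied to the full data set with a subgroup indicator, as the quoted guidelines state.

```python
def weighted_percentage(flags, weights):
    """Percentage with the condition, weighting each record by its sample weight."""
    total_weight = sum(weights)
    weight_with_condition = sum(w for f, w in zip(flags, weights) if f)
    return 100 * weight_with_condition / total_weight


# Hypothetical records: 1 = has the condition, with hypothetical sample weights.
has_condition = [1, 0, 0, 1, 0, 0, 0, 1]
sample_weights = [1000, 4000, 3000, 1500, 5000, 2500, 2000, 1000]

unweighted = 100 * sum(has_condition) / len(has_condition)
weighted = weighted_percentage(has_condition, sample_weights)

print(f"unweighted={unweighted:.1f}%, weighted={weighted:.1f}%")
```

If the two figures in a table are essentially identical across many rows, that can indicate the weights were not applied, which is the kind of check described for Table 1.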
How has multiplicity been addressed?
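As one possible answer, a step-down Holm adjustment is straightforward to apply and report. The sketch below is a minimal implementation run on hypothetical p-values. It is offered as an example of a familywise error-rate correction, not as the method the authors used or must use.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four comparisons (illustration only).
raw_p = [0.001, 0.02, 0.03, 0.04]
adjusted_p = holm_adjust(raw_p)
print([round(p, 3) for p in adjusted_p])
```

Whatever approach is chosen, stating the family of tests it applies to lets readers judge which findings are primary and which are exploratory.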
Note that ORs and 95% CIs are required irrespective of statistical significance (Tables 2 and 3).
The ORs and 95% CIs should be used to interpret clinical/practical significance as well as whether further research is needed for certain questions.
While there is certainly value in the research question posed here, there are some significant issues that I feel need to be very carefully addressed. Especially important is establishing an underlying causal model so that the appropriate variables are included in the appropriate statistical models (which have been checked for their mathematical assumptions), leading to full and complete reporting and interpretation of the results.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.