PeerJ: Statistics
https://peerj.com/articles/index.atom?journal=peerj&subject=7900
Statistics articles published in PeerJ

Estimation of the percentile of Birnbaum-Saunders distribution and its application to PM2.5 in Northern Thailand
https://peerj.com/articles/17019 (2024-02-29)
Warisa Thangjai, Sa-Aat Niwitpong, Suparat Niwitpong
The Birnbaum-Saunders distribution plays a crucial role in statistical analysis, serving as a model for failure time distribution in engineering and the distribution of particulate matter 2.5 (PM2.5) in environmental sciences. When assessing the health risks linked to PM2.5, it is crucial to give significant weight to percentile values, particularly focusing on lower percentiles, as they offer a more precise depiction of exposure levels and potential health hazards for the population. Mean and variance metrics may not fully encapsulate the comprehensive spectrum of risks connected to PM2.5 exposure. Various approaches, including the generalized confidence interval (GCI) approach, the bootstrap approach, the Bayesian approach, and the highest posterior density (HPD) approach, were employed to establish confidence intervals for the percentile of the Birnbaum-Saunders distribution. To assess the performance of these intervals, Monte Carlo simulations were conducted, evaluating them based on coverage probability and average length. The results demonstrate that the GCI approach is a favorable choice for estimating percentile confidence intervals. In conclusion, this article presents the results of the simulation study and showcases the practical application of these findings in the field of environmental sciences.
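To make the quantities concrete, the sketch below simulates Birnbaum-Saunders data, estimates the shape and scale parameters with the modified moment estimators, and builds a percentile-bootstrap interval for a percentile. This is an illustrative stand-in only: the GCI, Bayesian, and HPD constructions compared in the article are more involved, and the estimator and bootstrap scheme here are assumptions chosen for simplicity, not the authors' procedures.

```python
import math
import random
from statistics import NormalDist

def bs_sample(alpha, beta, n, rng):
    """Draw n values from a Birnbaum-Saunders(alpha, beta) distribution
    via its normal representation: T = beta*(w + sqrt(w^2 + 1))^2,
    where w = alpha*Z/2 and Z ~ N(0, 1)."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out

def bs_mme(data):
    """Modified moment estimators: beta_hat = sqrt(s*r) and
    alpha_hat = sqrt(2*(sqrt(s/r) - 1)), where s and r are the
    arithmetic and harmonic sample means (s >= r always holds)."""
    n = len(data)
    s = sum(data) / n
    r = n / sum(1.0 / x for x in data)
    return math.sqrt(2.0 * (math.sqrt(s / r) - 1.0)), math.sqrt(s * r)

def bs_percentile(alpha, beta, p):
    """Closed-form 100p-th percentile of the BS distribution."""
    w = alpha * NormalDist().inv_cdf(p) / 2.0
    return beta * (w + math.sqrt(w * w + 1.0)) ** 2

def bootstrap_percentile_ci(data, p, b=500, level=0.95, seed=1):
    """Percentile-bootstrap interval for the 100p-th percentile,
    a simpler stand-in for the GCI construction in the article."""
    rng = random.Random(seed)
    n = len(data)
    ests = []
    for _ in range(b):
        boot = [data[rng.randrange(n)] for _ in range(n)]
        a_hat, b_hat = bs_mme(boot)
        ests.append(bs_percentile(a_hat, b_hat, p))
    ests.sort()
    return ests[int((1 - level) / 2 * b)], ests[int((1 + level) / 2 * b) - 1]

# Simulated data with alpha = 0.5, beta = 2; the true median equals beta.
data = bs_sample(0.5, 2.0, 500, random.Random(42))
a_hat, b_hat = bs_mme(data)
est = bs_percentile(a_hat, b_hat, 0.5)
lo, hi = bootstrap_percentile_ci(data, 0.5)
```

Coverage probability and average length, the criteria used in the article, would be obtained by repeating this interval construction over many simulated samples and recording how often the interval captures the true percentile.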
Does it pay to pay? A comparison of the benefits of open-access publishing across various sub-fields in biology
https://peerj.com/articles/16824 (2024-02-27)
Amanda D. Clark, Tanner C. Myers, Todd D. Steury, Ali Krzton, Julio Yanes, Angela Barber, Jacqueline Barry, Subarna Barua, Katherine Eaton, Devadatta Gosavi, Rebecca Nance, Zahida Pervaiz, Chidozie Ugochukwu, Patricia Hartman, Laurie S. Stevison
Authors are often faced with the decision of whether to maximize traditional impact metrics or minimize costs when choosing where to publish the results of their research. Many subscription-based journals now offer the option of paying an article processing charge (APC) to make their work open. Though such “hybrid” journals make research more accessible to readers, their APCs often come with high price tags and can exclude authors who lack the capacity to pay to make their research accessible. Here, we tested if paying to publish open access in a subscription-based journal benefited authors by conferring more citations relative to closed access articles. We identified 146,415 articles published in 152 hybrid journals in the field of biology from 2013–2018 to compare the number of citations between various types of open access and closed access articles. In a simple generalized linear model analysis of our full dataset, we found that publishing open access in hybrid journals that offer the option confers an average citation advantage to authors of 17.8 citations compared to closed access articles in similar journals. After taking into account the number of authors, Journal Citation Reports 2020 Quartile, year of publication, and Web of Science category, we still found that open access generated significantly more citations than closed access (p < 0.0001). However, results were complex, with exact differences in citation rates among access types impacted by these other variables. This citation advantage based on access type persisted even when comparing open and closed access articles published in the same issue of a journal (p < 0.0001). However, by examining articles where the authors paid an article processing charge, we found that cost itself was not predictive of citation rates (p = 0.14).
Based on our findings of access type and other model parameters, we suggest that, in the case of the 152 journals we analyzed, paying for open access does confer a citation advantage. For authors with limited budgets, we recommend pursuing open access alternatives that do not require paying a fee as they still yielded more citations than closed access. For authors who are considering where to submit their next article, we offer additional suggestions on how to balance exposure via citations with publishing costs.
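The citation advantage can be read as a rate ratio. For a single binary predictor, the slope of a Poisson generalized linear model reduces to the log ratio of the two group means, which the toy simulation below illustrates. The counts and means here are hypothetical, and the sketch is far simpler than the authors' model, which adjusts for author count, journal quartile, year, and field.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for small lambda)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

rng = random.Random(7)
# Hypothetical citation counts: open-access articles drawn with a
# higher Poisson mean (30) than closed-access ones (18).
open_cites = [poisson(30.0, rng) for _ in range(2000)]
closed_cites = [poisson(18.0, rng) for _ in range(2000)]

# With a single binary covariate, the Poisson-GLM slope has a closed
# form: beta = log(mean_open / mean_closed), the log citation-rate ratio.
mean_open = sum(open_cites) / len(open_cites)
mean_closed = sum(closed_cites) / len(closed_cites)
rate_ratio = mean_open / mean_closed
log_rate_ratio = math.log(rate_ratio)
```

A rate ratio above 1 corresponds to a citation advantage for open access; adding covariates, as the authors did, changes the coefficient from this simple closed form to one estimated by maximum likelihood.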
How to account for behavioral states in step-selection analysis: a model comparison
https://peerj.com/articles/16509 (2024-02-26)
Jennifer Pohle, Johannes Signer, Jana A. Eccard, Melanie Dammhahn, Ulrike E. Schlägel
Step-selection models are widely used to study animals’ fine-scale habitat selection based on movement data. Resource preferences and movement patterns, however, often depend on the animal’s unobserved behavioral states, such as resting or foraging. As this is ignored in standard (integrated) step-selection analyses (SSA, iSSA), different approaches have emerged to account for such states in the analysis. The performance of these approaches and the consequences of ignoring the states in step-selection analysis, however, have rarely been quantified. We evaluate the recent idea of combining iSSAs with hidden Markov models (HMMs), which allows for a joint estimation of the unobserved behavioral states and the associated state-dependent habitat selection. Besides theoretical considerations, we use an extensive simulation study and a case study on fine-scale interactions of simultaneously tracked bank voles (Myodes glareolus) to compare this HMM-iSSA empirically to both the standard and a widely used classification-based iSSA (i.e., a two-step approach based on a separate prior state classification). Moreover, to facilitate its use, we implemented the basic HMM-iSSA approach in the R package HMMiSSA available on GitHub.
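The state-decoding step that HMM-based approaches add to step-selection analysis can be illustrated with a minimal two-state Viterbi decoder on step lengths, where short steps suggest resting and long steps suggest foraging. This is a sketch only: it is not the HMMiSSA package's API, and the exponential emission model, transition matrix, and state means are assumptions chosen for illustration.

```python
import math

def viterbi(obs, means, trans, init):
    """Most-likely state sequence for an HMM with exponential
    step-length emissions, computed in log space."""
    def logf(x, m):  # log density of Exp with mean m
        return -math.log(m) - x / m
    n_states = len(means)
    V = [[0.0] * n_states for _ in obs]
    back = [[0] * n_states for _ in obs]
    for s in range(n_states):
        V[0][s] = math.log(init[s]) + logf(obs[0], means[s])
    for t in range(1, len(obs)):
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: V[t - 1][r] + math.log(trans[r][s]))
            back[t][s] = best_prev
            V[t][s] = (V[t - 1][best_prev] + math.log(trans[best_prev][s])
                       + logf(obs[t], means[s]))
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: V[-1][s])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

# Hypothetical step lengths: short steps suggest resting (state 0),
# long steps suggest foraging/exploring (state 1).
steps = [0.2, 0.5, 12.0, 9.0, 0.3, 0.4]
states = viterbi(steps,
                 means=[1.0, 10.0],               # mean step length per state
                 trans=[[0.9, 0.1], [0.1, 0.9]],  # sticky transitions
                 init=[0.5, 0.5])
```

A classification-based iSSA would fit separate step-selection models to the segments labeled 0 and 1; the joint HMM-iSSA instead estimates the states and the state-dependent selection coefficients together.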
Mathematical model of voluntary vaccination against schistosomiasis
https://peerj.com/articles/16869 (2024-02-07)
Santiago Lopez, Samiya Majid, Rida Syed, Jan Rychtar, Dewey Taylor
Human schistosomiasis is a chronic and debilitating neglected tropical disease caused by parasitic worms of the genus Schistosoma. It is endemic in many countries in sub-Saharan Africa. Although there is currently no vaccine available, vaccines are in development. In this paper, we extend a simple compartmental model of schistosomiasis transmission by incorporating the vaccination option. Unlike previous models of schistosomiasis transmission that focus on control and treatment at the population level, our model focuses on incorporating human behavior and voluntary individual vaccination. We identify vaccination rates needed to achieve herd immunity as well as optimal voluntary vaccination rates. We demonstrate that the prevalence remains too high (higher than 1%) unless the vaccination costs are sufficiently low. Thus, we can conclude that voluntary vaccination (with or without mass drug administration) may not be sufficient to eliminate schistosomiasis as a public health concern. The cost of the vaccine (relative to the cost of schistosomiasis infection) is the most important factor determining whether voluntary vaccination can yield elimination of schistosomiasis. When the cost is low, the optimal voluntary vaccination rate is high enough that the prevalence of schistosomiasis declines below 1%. Once the vaccine becomes available for public use, it will be crucial to ensure that individuals can access the vaccine as cheaply as possible.
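The interplay between vaccination rate and endemic prevalence can be sketched with a deliberately simplified SIS-type model with vaccination and waning immunity. The actual model in the article (with snail-host dynamics and behavioral choice of vaccination) is richer, so the equations and parameter values below are illustrative assumptions only.

```python
def simulate(beta, gamma, psi, omega, t_end=200.0, dt=0.01):
    """Euler integration of a toy SIS model with vaccination.
    s, i, v are population fractions (susceptible, infected,
    vaccinated); beta is the transmission rate, gamma the recovery
    rate, psi the vaccination rate, omega the waning rate."""
    s, i, v = 0.99, 0.01, 0.0
    for _ in range(int(t_end / dt)):
        ds = -beta * s * i + gamma * i - psi * s + omega * v
        di = beta * s * i - gamma * i
        dv = psi * s - omega * v
        s, i, v = s + dt * ds, i + dt * di, v + dt * dv
    return s, i, v

# R0 = beta/gamma = 2: endemic without vaccination (i* = 1 - 1/R0 = 0.5),
# but eliminated once vaccination holds s below gamma/beta.
s0, i_no_vacc, v0 = simulate(beta=2.0, gamma=1.0, psi=0.0, omega=0.1)
s1, i_vacc, v1 = simulate(beta=2.0, gamma=1.0, psi=0.5, omega=0.1)
```

In this toy system, vaccination at rate psi = 0.5 drives the susceptible fraction to omega/(psi + omega), pushing the effective reproduction number below 1 and eliminating the infection; the article's contribution is to ask whether *voluntary* uptake, driven by the relative cost of vaccine and infection, reaches such a rate.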
Prevalence of type 2 diabetes mellitus and impaired fasting glucose, and their associated lifestyle factors among teachers in the CLUSTer cohort
https://peerj.com/articles/16778 (2024-01-22)
Yit Han Ng, Foong Ming Moy, Noran Naqiah Hairi, Awang Bulgiba
Background
Teachers are responsible for educating future generations and therefore play an important role in a country’s education system. Teachers constitute about 2.6% of all employees in Malaysia, making it one of the largest workforces in the country. While health and well-being are crucial to ensuring teachers’ work performance, reports on non-communicable diseases such as type 2 diabetes mellitus (T2DM) among Malaysian teachers are scarce. Hence, this study focused on the prevalence of T2DM, undiagnosed diabetes mellitus (DM), impaired fasting glucose (IFG), and underlying lifestyle factors associated with these outcomes among Malaysian teachers.
Methods
This is a cross-sectional study from the CLUSTer cohort. A total of 14,144 teachers from Peninsular Malaysia were included in this study. The teachers’ sociodemographic and lifestyle characteristics were described using a weighted complex-sample analysis. A matched age group comparison was carried out between teachers and the Malaysian general population on T2DM, undiagnosed DM, and IFG status. Next, the researchers examined the association of lifestyle factors with T2DM and IFG using multivariable logistic regression.
Results
The prevalence of T2DM, undiagnosed DM, and IFG among the Malaysian teachers were 4.1%, 5.1%, and 5.6%, respectively. The proportions of teachers with T2DM (both diagnosed and undiagnosed) and IFG increased linearly with age. Teachers had a lower weighted prevalence of T2DM (known and undiagnosed) than the general population. However, teachers were more likely to have IFG than the general population, particularly those aged 45 years and older. Among all lifestyle indicators, only waist circumference (aOR: 1.14, 95% CI: 1.08, 1.20) was found to be associated with T2DM, whereas waist circumference (aOR: 1.10, 95% CI: 1.05, 1.15) and physical activity [moderately active = (aOR: 0.71, 95% CI: 0.52, 0.98); highly active = (aOR: 0.56, 95% CI: 0.40, 0.80)] were associated with IFG.
Conclusions
Modifiable lifestyle factors such as abdominal obesity and physical activity were associated with T2DM and IFG. Intervention programs targeting these factors could help reduce future treatment costs and increase productivity.
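The adjusted odds ratios above come from multivariable logistic regression; the crude (unadjusted) version of the same quantity, together with a Woolf-type confidence interval, can be computed directly from a 2x2 table. The counts below are hypothetical, not the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf 95% CI from a 2x2 table:
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    SE of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: T2DM by abdominal-obesity status.
or_, lo, hi = odds_ratio_ci(a=120, b=880, c=60, d=940)
```

Multivariable logistic regression generalizes this: each adjusted OR is exp of the corresponding coefficient, holding the other covariates fixed.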
A systematic review of conference papers presented at two large Japanese psychology conferences in 2013 and 2018: did Japanese social psychologists selectively report p < 0.05 results without peer review?
https://peerj.com/articles/16763 (2024-01-18)
Kai Hiraishi, Asako Miura, Masataka Higuchi, Yoshitsugu Fujishima, Daiki Nakamura, Masaki Suyama
We conducted a systematic review of conference papers in social psychology at two large psychology conferences in Japan: the Japanese Psychological Association and the Japanese Society for Social Psychology. The conference papers were effectively not subjected to peer review; hence, they were suitable for testing if psychologists selectively reported statistically significant findings without pressure from journal editors and reviewers. We investigated the distributions of z-values converted from the p-values reported in the articles presented at the 2013 and 2018 conferences. The z-curve analyses suggest the existence of selective reporting by the authors in 2013. The expected discovery rate (EDR) was much lower than the observed discovery rate (ODR; 7% vs. 76%, respectively), and the 95% confidence interval (CI) did not include the ODR. However, this does not mean that the set of studies completely lacked evidential value. The expected replication rate (ERR) was 31%; this is significantly higher than 5%, which was expected under the null hypothesis of no effect. Changes were observed between 2013 and 2018. The ERR increased (31% to 44%), and the EDR almost doubled (7% to 13%). However, the estimation of the maximum false discovery rate (FDR; 68% in 2013 and 35% in 2018) suggested that a substantial proportion of the reported findings were false positives. Overall, while social psychologists in Japan engaged in selective reporting, this does not mean that the entire field was covered with false positives. In addition, slight signs of improvement were observed in how they reported their findings. Still, the evidential value of the target studies was weak, even in 2018, allowing for no optimism.
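z-curve analysis starts by converting the reported two-sided p-values into the z-statistics that produced them; the distribution of those z-values is then modeled to estimate quantities like the EDR and ERR. A minimal sketch of the conversion step:

```python
from statistics import NormalDist

def p_to_z(p):
    """Convert a two-sided p-value to the absolute z-statistic that
    produced it: |z| = Phi^{-1}(1 - p/2)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)

# p = 0.05 corresponds to |z| of about 1.96, the conventional cutoff;
# a pile-up of z-values just above 1.96 is the signature of selective
# reporting that z-curve quantifies.
zs = [p_to_z(p) for p in (0.05, 0.01, 0.001)]
```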
phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things)
https://peerj.com/articles/16505 (2024-01-05)
Liam J. Revell
Phylogenetic comparative methods comprise the general endeavor of using an estimated phylogenetic tree (or set of trees) to make secondary inferences: about trait evolution, diversification dynamics, biogeography, community ecology, and a wide range of other phenomena or processes. Over the past ten years or so, the phytools R package has grown to become an important research tool for phylogenetic comparative analysis. phytools is a diverse contributed R library now consisting of hundreds of different functions covering a variety of methods and purposes in phylogenetic biology. As of the time of writing, phytools included functionality for fitting models of trait evolution, for reconstructing ancestral states, for studying diversification on trees, and for visualizing phylogenies, comparative data, and fitted models, as well as numerous other tasks related to phylogenetic biology. Here, I describe some significant features of and recent updates to phytools, while also illustrating several popular workflows of the phytools computational software.
Modeling ocean distributions and abundances of natural- and hatchery-origin Chinook salmon stocks with integrated genetic and tagging data
https://peerj.com/articles/16487 (2023-11-28)
Alexander J. Jensen, Ryan P. Kelly, William H. Satterthwaite, Eric J. Ward, Paul Moran, Andrew Olaf Shelton
Background
Considerable resources are spent to track fish movement in marine environments, often with the intent of estimating behavior, distribution, and abundance. Resulting data from these monitoring efforts, including tagging studies and genetic sampling, often can be siloed. For Pacific salmon in the Northeast Pacific Ocean, predominant data sources for fish monitoring are coded wire tags (CWTs) and genetic stock identification (GSI). Despite their complementary strengths and weaknesses in coverage and information content, the two data streams rarely have been integrated to inform Pacific salmon biology and management. Joint, or integrated, models can combine and contextualize multiple data sources in a single statistical framework to produce more robust estimates of fish populations.
Methods
We introduce and fit a comprehensive joint model that integrates data from CWT recoveries and GSI sampling to inform the marine life history of Chinook salmon stocks at spatial and temporal scales relevant to ongoing fisheries management efforts. In a departure from similar models based primarily on CWT recoveries, modeled stocks in the new framework encompass both hatchery- and natural-origin fish. We specifically model the spatial distribution and marine abundance of four distinct stocks with spawning locations in California and southern Oregon, one of which is listed under the U.S. Endangered Species Act.
Results
Using the joint model, we generated the most comprehensive estimates of marine distribution to date for all modeled Chinook salmon stocks, including historically data-poor and low-abundance stocks. Estimated marine distributions from the joint model were broadly similar to estimates from a simpler, CWT-only model but did suggest some differences in distribution in select seasons. Model output also included novel stock-, year-, and season-specific estimates of marine abundance. We observed and partially addressed several challenges in model convergence with the use of supplemental data sources and model constraints; similar difficulties are not unexpected with integrated modeling. We identify several options for improved data collection that could address issues in convergence and increase confidence in model estimates of abundance. We expect these model advances and results to provide management-relevant biological insights, with the potential to inform future mixed-stock fisheries management efforts, as well as a foundation for more expansive and comprehensive analyses to follow.
Sensitivity analysis of selection bias: a graphical display by bias-correction index
https://peerj.com/articles/16411 (2023-11-16)
Ping-Chen Chung, I-Feng Lin
Background
In observational studies, how the magnitude of potential selection bias in a sensitivity analysis can be quantified is rarely discussed. The purpose of this study was to develop a sensitivity analysis strategy by using the bias-correction index (BCI) approach for quantifying the influence and direction of selection bias.
Methods
We used a BCI, a function of selection probabilities conditional on outcome and covariates, with different selection bias scenarios in a logistic regression setting. A bias-correction sensitivity plot was used to illustrate the associations between proctoscopy examination and sociodemographic variables, using data from the Taiwan National Health Interview Survey (NHIS) and from the subset of individuals who consented to having their health insurance data further linked.
Results
We included 15,247 people aged ≥20 years, 87.74% of whom signed the informed consent. When the entire sample was considered, smokers were less likely to undergo proctoscopic examination (odds ratio (OR): 0.69, 95% CI [0.57–0.84]) than nonsmokers were. When the data of only the people who provided consent were considered, the OR was 0.76 (95% CI [0.62–0.94]). The bias-correction sensitivity plot indicated varying ORs under different degrees of selection bias.
Conclusions
When data are only available for a subsample of a population, a bias-correction sensitivity plot can be used to easily visualize varying ORs under different selection bias scenarios. A similar strategy can be applied to models other than logistic regression if an appropriate BCI is derived.
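The kind of sensitivity grid behind such a plot can be sketched with the standard 2x2 selection-bias factor, used here as a stand-in for the article's BCI (which conditions on covariates as well). The selection probabilities in the grid are assumed values for illustration, applied to the reported subsample OR of 0.76.

```python
def corrected_or(or_observed, s11, s10, s01, s00):
    """Selection-bias correction for an odds ratio. s_ay is the
    assumed probability of inclusion given exposure a and outcome y;
    the multiplicative bias factor is (s11*s00)/(s10*s01), so the
    corrected OR is the observed OR divided by that factor."""
    bias = (s11 * s00) / (s10 * s01)
    return or_observed / bias

# Sensitivity grid: how the OR of 0.76 from the consented subsample
# would shift if inclusion of exposed cases (s11) differed from the
# other cells (held at 0.9 here).
grid = [(s11, corrected_or(0.76, s11, 0.9, 0.9, 0.9))
        for s11 in (0.7, 0.8, 0.9, 1.0)]
```

When s11 equals the other cells the bias factor is 1 and the OR is unchanged; plotting the corrected OR over such a grid gives a bias-correction sensitivity display of the kind the article proposes.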
Confidence intervals for ratio of means of delta-lognormal distributions based on left-censored data with application to rainfall data in Thailand
https://peerj.com/articles/16397 (2023-11-09)
Warisa Thangjai, Sa-Aat Niwitpong
Thailand is a country that is prone to both floods and droughts, and these natural disasters have significant impacts on the country’s people, economy, and environment. Estimating rainfall is an important part of flood and drought prevention. Rainfall data typically contain both zero and positive observations, and the distribution of rainfall often follows the delta-lognormal distribution. However, it is important to note that rainfall data can be censored, meaning that some values may be missing or truncated. The interval estimator for the ratio of means is useful when comparing the means of two samples. The purpose of this article was to compare the performance of several approaches for statistically analyzing left-censored data. The performance of the confidence intervals was evaluated using the coverage probability and average length, assessed through Monte Carlo simulation. The approaches examined included several variations of the generalized confidence interval (GCI) approach, as well as the Bayesian, parametric bootstrap, and method of variance estimates recovery approaches. For (ξ1, ξ2) = (0.10, 0.10), simulations showed that the Bayesian approach is a suitable choice for constructing the credible interval for the ratio of means of delta-lognormal distributions based on left-censored data. For (ξ1, ξ2) = (0.10, 0.25), the parametric bootstrap approach was a strong alternative for constructing the confidence interval. However, the GCI approach can be considered for constructing the confidence interval when the sample sizes increase. Practical applications demonstrating the use of these techniques on rainfall data showed that the confidence interval based on the GCI approach covered the ratio of population means and had the smallest length. The proposed approaches’ effectiveness was illustrated using daily rainfall datasets from the provinces of Chiang Rai and Chiang Mai in Thailand.
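As a simplified, hypothetical sketch of the interval-estimation task (a nonparametric percentile bootstrap; the article uses a parametric bootstrap and additionally handles left-censoring, which is ignored here), a confidence interval for the ratio of means of two delta-lognormal samples can be formed as follows. All parameter values are illustrative.

```python
# Simplified sketch: percentile-bootstrap CI for the ratio of means of two
# delta-lognormal samples. Illustrative only: this is a nonparametric
# bootstrap (the article's approach is parametric) and censoring is ignored.
import random
import statistics

def delta_lognormal_sample(n, delta, mu, sigma, rng):
    # delta is the probability of a zero observation; positives are lognormal
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma)
            for _ in range(n)]

def bootstrap_ci_ratio(x1, x2, b=2000, alpha=0.05, rng=None):
    # Resample each sample with replacement, collect the ratio of means,
    # and take the central (1 - alpha) quantile range.
    rng = rng or random.Random()
    ratios = sorted(
        statistics.fmean(rng.choices(x1, k=len(x1)))
        / statistics.fmean(rng.choices(x2, k=len(x2)))
        for _ in range(b)
    )
    return ratios[int(b * alpha / 2)], ratios[int(b * (1 - alpha / 2)) - 1]

rng = random.Random(1)
x1 = delta_lognormal_sample(200, 0.10, 1.0, 0.5, rng)  # xi_1 = 0.10
x2 = delta_lognormal_sample(200, 0.10, 1.0, 0.5, rng)  # xi_2 = 0.10
lo, hi = bootstrap_ci_ratio(x1, x2, rng=rng)
print(f"95% bootstrap CI for the ratio of means: [{lo:.3f}, {hi:.3f}]")
```

Since the two samples here are drawn from the same distribution, the interval should lie around 1; coverage probability and average length, as in the article's simulations, would be estimated by repeating this over many replications.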