Confidence intervals for ratio of means of delta-lognormal distributions based on left-censored data with application to rainfall data in Thailand
- Published
- Accepted
- Received
- Academic Editor
- Haider Mahmood
- Subject Areas
- Statistics, Natural Resource Management, Ecohydrology, Environmental Impacts
- Keywords
- Bayesian credible interval, Delta-lognormal distribution, Left-censored data, Ratio, Rainfall
- Copyright
- © 2023 Thangjai and Niwitpong
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
- Cite this article
- 2023. Confidence intervals for ratio of means of delta-lognormal distributions based on left-censored data with application to rainfall data in Thailand. PeerJ 11:e16397 https://doi.org/10.7717/peerj.16397
Abstract
Thailand is a country that is prone to both floods and droughts, and these natural disasters have significant impacts on the country’s people, economy, and environment. Estimating rainfall is an important part of flood and drought prevention. Rainfall data typically contains both zero and positive observations, and the distribution of rainfall often follows the delta-lognormal distribution. However, it is important to note that rainfall data can be censored, meaning that some values may be missing or truncated. The interval estimator for the ratio of means will be useful when comparing the means of two samples. The purpose of this article was to compare the performance of several approaches for statistically analyzing left-censored data. The performance of the confidence intervals was evaluated using the coverage probability and average length, which were assessed through Monte Carlo simulation. The approaches examined included several variations of the generalized confidence interval, the Bayesian, the parametric bootstrap, and the method of variance estimates recovery approaches. For (ξ_{1}, ξ_{2}) = (0.10,0.10), simulations showed that the Bayesian approach would be a suitable choice for constructing the credible interval for the ratio of means of delta-lognormal distributions based on left-censored data. For (ξ_{1}, ξ_{2}) = (0.10,0.25), the parametric bootstrap approach was a strong alternative for constructing the confidence interval. However, the generalized confidence interval approach can be considered to construct the confidence when the sample sizes are increase. Practical applications demonstrating the use of these techniques on rainfall data showed that the confidence interval based on the generalized confidence interval approach covered the ratio of population means and had the smallest length. The proposed approaches’ effectiveness was illustrated using daily rainfall datasets from the provinces of Chiang Rai and Chiang Mai in Thailand.
Introduction
Floods occur in Thailand primarily during the monsoon season, which typically lasts from May to October. During this time, heavy rainfall can cause rivers to overflow and inundate low-lying areas, leading to widespread flooding. In some cases, flash floods can also occur, particularly in urban areas with poor drainage systems. The impacts of floods in Thailand can include damage to crops and infrastructure, loss of property and livelihoods, and loss of life. On the other hand, droughts are more common during the dry season, which typically lasts from November to April. During this time, rainfall is limited, and water resources such as rivers, reservoirs, and groundwater become depleted. Droughts can have significant impacts on agriculture, which is a major sector of Thailand’s economy. They can also lead to water shortages for households and industries, as well as damage to the environment. Rainfall estimation is an essential tool for addressing these challenges and preventing floods and droughts in Thailand. By accurately estimating rainfall patterns, authorities can better prepare for and mitigate the impacts of these natural disasters, helping to protect the country’s people, economy, and environment. The analysis and modeling of rainfall patterns play a crucial role in various fields, such as hydrology, agriculture, and climate science. Thailand experiences a wide range of rainfall patterns due to its geographic location, varying topography, and monsoonal climate. The characteristics of rainfall data fit various distributions, such as the normal distribution, log-normal distribution, gamma distribution, Weibull distribution, inverse Gaussian distribution, and delta-lognormal distribution. Several researchers have indicated that the delta-lognormal distribution is a suitable model for rainfall data in Thailand; see Krstanovic & Singh (1992), Singh & Rajagopal (1986) and Singh & Singh (1987).
The delta-lognormal distribution is a statistical distribution commonly used to model continuous positive data, such as rainfall, crop yields, and stock prices. It is a two-part distribution that contains both zero and positive values. The number of zero values follows a binomial distribution, while the positive values follow a log-normal distribution. The log-normal distribution is formed by taking the exponential of a normal distribution. The delta-lognormal distribution is often used in hydrology and other fields to model variables that exhibit high variability and are bounded by zero. It has been found to be an effective model for a wide range of applications, particularly for modeling rainfall data. Several researchers have studied confidence intervals for functions of the delta-lognormal distribution. Thangjai et al. (2023) proposed the confidence interval estimation for the ratio of the percentiles of two delta-lognormal distributions. Additionally, given that rainfall data is often censored, it is crucial to estimate the functions of the delta-lognormal distribution based on left-censored data for obtaining accurate statistics. Thangjai & Niwitpong (2023) estimated the confidence intervals for mean and difference between means of delta-lognormal distributions based on left-censored data with application to rainfall data.
The difference of parameters refers to the subtraction or comparison of two parameters or coefficients in a statistical model. This is typically done to quantify the difference in the impact or effect of two variables on an outcome. In rainfall data analysis, the difference of parameters is used to assess the difference in the effects of two variables. For instance, the difference between the coefficients of rainfall associated with different geographical regions is computed to determine if one region experiences significantly more or less rainfall than the other. The ratio of parameters refers to the division or comparison of two parameters or coefficients in a statistical model. Parameters can represent various aspects of a model, such as the coefficients of variables in a regression model, the proportions in a probability distribution, or the odds ratios in logistic regression. In the context of rainfall data analysis, the ratio of parameters is used to compare the effects of different variables on rainfall. For example, the ratio of the coefficients of temperature and humidity in a regression model is used to determine how each of these factors contributes to changes in rainfall. Moreover, the difference of means is likely to be minor when both means are small, and such a minor difference can lead to an inability to draw powerful or definite conclusions. Therefore, the ratio of means is often considered more accurate than the difference of means. The ratio of means is the ratio of two means and is used in many fields. For instance, in bioequivalence, the ratio of means is used to compare the mean of the test drug and the mean of the reference drug. In epidemiology, the ratio of means is used to compare the particulate matter with a diameter of less than 2.5 µm (PM2.5) level averages of two areas. In climate sciences, the ratio of means is used to compare the daily rainfall averages of two areas. Confidence intervals for the ratio of means have been constructed in many research studies, such as those by Chen & Zhou (2006a).
In statistics, the information in a sample is used to make inferences about an unknown parameter. The inference methods are hypothesis testing and estimation (Casella & Berger, 2002). Estimations have a point estimation and an interval estimation. Estimation is of interest in many fields. For example, in the environment, Luo, Shen & Xu (2022) studied the modeling and estimation of system reliability under dynamic operating environments and lifetime ordering constraint. They used the maximum likelihood method for point estimation while proposing generalized inference methods for interval estimation. In industry, Zhang et al. (2022) studied the problem of reliability estimation for a parallel system when one stress variable is involved, referred to as the multicomponent stress–strength model.
The construction of confidence intervals is a crucial aspect of statistical inference, and many researchers have proposed various approaches for constructing such intervals. The generalized confidence interval (GCI) approach uses the concepts of the generalized pivotal quantity (GPQ) to construct the confidence interval. Chen & Zhou (2006a) presented the GCI estimate for the ratio and the difference between the means of log-normal distributions. It gave a highly accurate coverage rate and fairly low bias, especially for small sample sizes. Tian & Wu (2007) proposed the GCI approach for inferences on the common mean of log-normal distributions. Ye, Ma & Wang (2010) proposed inferences on the common mean of several inverse Gaussian populations using the GCI approach.
The bootstrap approach relies heavily on computer simulations. Traditionally, standard errors have been calculated using well-known formulae, often based on assumptions that are not satisfied or only approximately satisfied. In some cases, it may not even be known if the assumptions hold or not. In essence, the bootstrap approach relies on resampling with replacement from the given sample and calculating the required statistic from these repeated samples. The values of the statistic from the repeated sampling can then be used to generate standard errors and confidence intervals for the statistic (Dunn, 2001). Thangjai et al. (2023) constructed the confidence interval for the ratio of the percentiles of two delta-lognormal distributions based on the parametric bootstrap approach. Moreover, Altunkaynak & Gamgam (2019) proposed the bootstrap confidence intervals for the coefficient of quartile variation.
The method of variance estimates recovery (MOVER) approach utilizes the initial confidence interval of a single parameter of interest to construct the final confidence interval. Zou & Donner (2008) constructed the confidence limits about effect measures using the MOVER approach. Zou, Taleban & Hao (2009) proposed the MOVER approach to estimate the confidence interval for log-normal distribution.
Statistics can be divided into two different techniques: the classical approach and the Bayesian approach. The classical approach includes techniques such as the GCI, parametric bootstrap, and MOVER. In this approach, the parameter of interest is unknown but fixed. In contrast, the Bayesian approach considers the parameter of interest as a quantity, and its variation is described by the prior distribution. There are many reasons why a researcher may prefer to use Bayesian estimation over classical estimation. The main reason for choosing the Bayesian approach is that the models are often too complex for traditional methods to handle. It is important to note that, regardless of the reasons for implementing the Bayesian approach, conducting a sensitivity analysis of priors is always crucial and should be included (Depaoli, Winter & Visser, 2020). The impact of the priors is highly dependent on model complexity, and it is crucial to thoroughly examine their influence on the final model estimates. The Bayesian approach can effectively capture the features of posteriors of the parameters of interest by combining information from data and priors. However, obtaining closed forms for each marginal posterior in Bayesian analysis is a challenging task. Markov chain Monte Carlo (MCMC) can be employed to obtain posterior samples from a set of Markov chains with respect to the parameters of interest and nuisance parameters. The iteration is terminated when all chains are stable and well-mixed. MCMC has been well-developed and widely utilized for complex models, allowing parameter estimation through posterior samples generated from a collection of Markov chains (Zhou et al., 2023). Thangjai et al. (2023) proposed the credible interval estimation for the ratio of the percentiles of two delta-lognormal distributions using the Bayesian approach. Furthermore, Thangjai & Niwitpong (2023) constructed the Bayesian credible interval for mean and difference between means of delta-lognormal distributions based on left-censored data. Moreover, the Bayesian approach is also widely used for uncertainty quantification (Zhuang, Xu & Wang, 2023). Aizpurua et al. (2022) studied how uncertainty quantification was incorporated into machine health prognostics through the Bayesian approach.
The rest of this article is organized as follows. ‘Materials & Methods’ describes the four approaches used to construct confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. ‘Results’ presents the performance of the proposed approaches using simulation studies. ‘Empirical Application’ shows a real data example. ‘Discussion’ provides a discussion. Finally, ‘Conclusions’ presents concluding remarks.
Materials & Methods
Suppose ${Z}_{1}=\left({Z}_{11},{Z}_{12},\dots ,{Z}_{1{n}_{1}}\right)$ is random sample from delta-lognormal distribution with parameters mean μ_{1}, variance ${\sigma}_{1}^{2}$, and probability of obtaining a zero observation δ_{1}. Similary, let Z_{2} = (Z_{21}, Z_{22}, …, Z_{2n2}) be random sample following the delta-lognormal distribution with parameters mean μ_{2}, variance ${\sigma}_{2}^{2}$, and probability of obtaining a zero observation δ_{2}. The delta-lognormal distribution is combination of zero and positive values. This distribution contains a binomial distribution and a log-normal distribution. This is because the zero values follow the binomial distribution and the positive values follow the log-normal distribution. The mean of delta-lognormal distribution for Z_{1} and Z_{2} are given by (1)${\gamma}_{1}=\left(1-{\delta}_{1}\right)exp\left({\mu}_{1}+\frac{1}{2}{\sigma}_{1}^{2}\right)$ and (2)${\gamma}_{2}=\left(1-{\delta}_{2}\right)exp\left({\mu}_{2}+\frac{1}{2}{\sigma}_{2}^{2}\right).$
Suppose X_{1} and X_{2} are nonnegative random variables drawn from Z_{1} and Z_{2}, respectively. In other words, X_{1} = (X_{11}, X_{12}, …, X_{1n1}) and X_{2} = (X_{21}, X_{22}, …, X_{2n2}) follow the log-normal distributions so that Y_{1} = log(X_{1}) and Y_{2} = log(X_{2}) follow the normal distributions. Suppose that ${\stackrel{\u0304}{X}}_{1}$ and ${\stackrel{\u0304}{X}}_{2}$ are the means of X_{1} and X_{2}, respectively. Moreover, suppose that ${S}_{{X}_{1}}^{2}$ and ${S}_{{X}_{2}}^{2}$ are the variances of X_{1} and X_{2}, respectively. Let log(ξ_{1}) be censoring point value. Let n_{1(1)} be the number of observations less than or equal to some censoring point log(ξ_{1}) and let n_{1(2)} be the number of observations greater than some censoring point log(ξ_{1}). Let ${Y}_{1}=\left({Y}_{11},{Y}_{12},\dots ,{Y}_{1{n}_{1\left(2\right)}}\right)$ be the observations above log(ξ_{1}). The mean and variance of Y_{1} are given by (3)$\stackrel{\u0304}{Y}}_{1}=\frac{1}{{n}_{1\left(2\right)}}\sum _{j=1}^{{n}_{1\left(2\right)}}{Y}_{1j$ and (4)${S}_{1}^{2}=\frac{1}{{n}_{1\left(2\right)}}\sum _{j=1}^{{n}_{1\left(2\right)}}{\left({Y}_{1j}-{\stackrel{\u0304}{Y}}_{1}\right)}^{2}.$
Suppose that ϕ and Φ are the density function and the distribution function of the standard normal distribution. According to Krishnamoorthy, Mallick & Mathew (2011), the maximum likelihood estimators of μ_{1} and ${\sigma}_{1}^{2}$ are given by (5)${\stackrel{\u02c6}{\mu}}_{1}={\stackrel{\u0304}{Y}}_{1}-\psi \left({h}_{1},{a}_{1}\right)\left({\stackrel{\u0304}{Y}}_{1}-log\left({\xi}_{1}\right)\right)$ and (6)${\stackrel{\u02c6}{\sigma}}_{1}^{2}={S}_{1}^{2}+\psi \left({h}_{1},{a}_{1}\right){\left({\stackrel{\u0304}{Y}}_{1}-log\left({\xi}_{1}\right)\right)}^{2},$ where $h}_{1}=\frac{{n}_{1\left(1\right)}}{{n}_{1}$ $a}_{1}=\frac{log\left({\xi}_{1}\right)-{\mu}_{1}}{{\sigma}_{1}$ $W\left({a}_{1}\right)=\frac{\varphi \left({a}_{1}\right)}{1-\Phi \left({a}_{1}\right)}$ $V\left({h}_{1},{a}_{1}\right)=\frac{{h}_{1}W\left(-{a}_{1}\right)}{1-{h}_{1}}$ and $\psi \left({h}_{1},{a}_{1}\right)=\frac{V\left({h}_{1},{a}_{1}\right)}{V\left({h}_{1},{a}_{1}\right)-{a}_{1}}.$
The mean of censored log-normal distribution is (7)${\theta}_{1}=exp\left({\mu}_{1}+\frac{1}{2}{\sigma}_{1}^{2}\right).$
The estimator of the mean of censored log-normal distribution is (8)${\stackrel{\u02c6}{\theta}}_{1}=exp\left({\stackrel{\u02c6}{\mu}}_{1}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{1}^{2}\right).$
Similarly, let log(ξ_{2}) be censoring point value. Let n_{2(1)} be the number of observations less than or equal to log(ξ_{2}) and let n_{2(2)} be the number of observations greater than some censoring point log(ξ_{2}). Let ${Y}_{2}=\left({Y}_{21},{Y}_{22},\dots ,{Y}_{2{n}_{2\left(2\right)}}\right)$ be the observations above log(ξ_{2}). The mean and variance of Y_{2} are given by (9)$\stackrel{\u0304}{Y}}_{2}=\frac{1}{{n}_{2\left(2\right)}}\sum _{j=1}^{{n}_{2\left(2\right)}}{Y}_{2j$ and (10)${S}_{2}^{2}=\frac{1}{{n}_{2\left(2\right)}}\sum _{j=1}^{{n}_{2\left(2\right)}}{\left({Y}_{2j}-{\stackrel{\u0304}{Y}}_{2}\right)}^{2}.$
The maximum likelihood estimators of μ_{2} and ${\sigma}_{2}^{2}$ are given by (11)${\stackrel{\u02c6}{\mu}}_{2}={\stackrel{\u0304}{Y}}_{2}-\psi \left({h}_{2},{a}_{2}\right)\left({\stackrel{\u0304}{Y}}_{2}-log\left({\xi}_{2}\right)\right)$ and (12)${\stackrel{\u02c6}{\sigma}}_{2}^{2}={S}_{2}^{2}+\psi \left({h}_{2},{a}_{2}\right){\left({\stackrel{\u0304}{Y}}_{2}-log\left({\xi}_{2}\right)\right)}^{2},$ where $h}_{2}=\frac{{n}_{2\left(1\right)}}{{n}_{2}$ $a}_{2}=\frac{log\left({\xi}_{2}\right)-{\mu}_{2}}{{\sigma}_{2}$ $W\left({a}_{2}\right)=\frac{\varphi \left({a}_{2}\right)}{1-\Phi \left({a}_{2}\right)}$ $V\left({h}_{2},{a}_{2}\right)=\frac{{h}_{2}W\left(-{a}_{2}\right)}{1-{h}_{2}}$ and $\psi \left({h}_{2},{a}_{2}\right)=\frac{V\left({h}_{2},{a}_{2}\right)}{V\left({h}_{2},{a}_{2}\right)-{a}_{2}}.$
The mean of censored log-normal distribution is (13)${\theta}_{2}=exp\left({\mu}_{2}+\frac{1}{2}{\sigma}_{2}^{2}\right).$
The estimator of the mean of censored log-normal distribution is (14)${\stackrel{\u02c6}{\theta}}_{2}=exp\left({\stackrel{\u02c6}{\mu}}_{2}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{2}^{2}\right).$
Therefore, the estimator of the ratio of means of censored log-normal distributions is given by (15)$\stackrel{\u02c6}{\theta}=\frac{{\stackrel{\u02c6}{\theta}}_{1}}{{\stackrel{\u02c6}{\theta}}_{2}},$
where ${\stackrel{\u02c6}{\theta}}_{1}$ and ${\stackrel{\u02c6}{\theta}}_{2}$ are defined in Eqs. (8) and (14), respectively.
The estimator of the ratio of means of censored log-normal distributions, defined in Eq. (15), is used to construct the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. Here, four newly proposed approaches are applied to construct the confidence intervals. Next, the computation of the GCI, Bayesian, parametric bootstrap, and MOVER approaches is explained.
Generalized confidence interval approach
The generalized pivotal quantity (GPQ) is used to construct the GCI which is defined in Weerahandi (1993). According to Krishnamoorthy, Mallick & Mathew (2011), the GPQs for μ_{1}, σ_{1}, and θ_{1} are defined by (16)$R}_{{\mu}_{1}}={\stackrel{\u02c6}{\mu}}_{1}-\frac{{\stackrel{\u02c6}{\mu}}_{1}^{\ast}}{{\stackrel{\u02c6}{\sigma}}_{1}^{\ast}}{\stackrel{\u02c6}{\sigma}}_{1$ (17)$R}_{{\sigma}_{1}}=\frac{{\stackrel{\u02c6}{\sigma}}_{1}}{{\stackrel{\u02c6}{\sigma}}_{1}^{\ast}$ and (18)${R}_{{\theta}_{1}}=exp\left({R}_{{\mu}_{1}}+\frac{1}{2}{\left({R}_{{\sigma}_{1}}\right)}^{2}\right),$ where ${\stackrel{\u02c6}{\mu}}_{1}^{\ast}$ and ${\stackrel{\u02c6}{\sigma}}_{1}^{\ast}$ are the maximum likelihood estimators based on a censored sample from standard normal distribution.
Similarly, the GPQs for μ_{2}, σ_{2}, and θ_{2} are defined by (19)$R}_{{\mu}_{2}}={\stackrel{\u02c6}{\mu}}_{2}-\frac{{\stackrel{\u02c6}{\mu}}_{2}^{\ast}}{{\stackrel{\u02c6}{\sigma}}_{2}^{\ast}}{\stackrel{\u02c6}{\sigma}}_{2$ (20)$R}_{{\sigma}_{2}}=\frac{{\stackrel{\u02c6}{\sigma}}_{2}}{{\stackrel{\u02c6}{\sigma}}_{2}^{\ast}$ and (21)${R}_{{\theta}_{2}}=exp\left({R}_{{\mu}_{2}}+\frac{1}{2}{\left({R}_{{\sigma}_{2}}\right)}^{2}\right),$ where ${\stackrel{\u02c6}{\mu}}_{2}^{\ast}$ and ${\stackrel{\u02c6}{\sigma}}_{2}^{\ast}$ are the maximum likelihood estimators based on a censored sample from standard normal distribution.
The GPQ for the difference between means of delta-lognormal distributions based on left-censored data was used as previously described in Thangjai & Niwitpong (2023). In this article, the GPQ for the ratio of means of delta-lognormal distributions based on left-censored data is given by (22)${R}_{\theta}=\frac{{R}_{{\theta}_{1}}}{{R}_{{\theta}_{2}}},$ where R_{θ1} and R_{θ2} are defined in Eqs. (18) and (21), respectively.
Therefore, the 100(1 − α)% two-sided confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data using the GCI approach is given by (23)$C{I}_{GCI}=\left[{L}_{GCI},{U}_{GCI}\right]=\left[{R}_{\theta}\left(\alpha /2\right),{R}_{\theta}\left(1-\alpha /2\right)\right],$ where R_{θ}(α/2) and R_{θ}(1 − α/2) denote the 100(α/2)-th and 100(1 − α/2)-th percentiles of R_{θ}, respectively.
The Algorithm 1 is used to construct the GCI for the ratio of means of delta-lognormal distributions based on left-censored data.
Algorithm 1.
Step 1: Generate sample from the standard normal distribution and compute ${\stackrel{\u02c6}{\mu}}_{1}^{\ast}$, ${\stackrel{\u02c6}{\mu}}_{2}^{\ast}$, ${\stackrel{\u02c6}{\sigma}}_{1}^{\ast}$, and ${\stackrel{\u02c6}{\sigma}}_{2}^{\ast}$
Step 2: Compute R_{μ1}, R_{σ1}, and R_{θ1} from Eqs. (16), (17), and (18), respectively.
Step 3: Compute R_{μ2}, R_{σ2}, and R_{θ2} from Eqs. (19), (20), and (21), respectively.
Step 4: Compute R_{θ} from Eq. (22)
Step 5: Repeat step 1 - step 4, a total times and obtain an array of R_{θ}’s
Step 6: Compute L_{GCI} and U_{GCI}
Bayesian approach
The Bayesian approach offers a framework for updating beliefs and making predictions using new evidence or data. It is rooted in Bayes’ theorem, which combines prior probability and likelihood to calculate the posterior probability. The prior distribution represents uncertainty about parameters before observing data. In this article, we employed the Jeffreys Independence prior. According to Thangjai & Niwitpong (2023), the posterior distributions of ${\sigma}_{1}^{2}$, μ_{1}, and θ_{1.BS} are defined by (24)${\sigma}_{1}^{2}|{y}_{1}\sim IG\left(\frac{{n}_{1\left(2\right)}-1}{2},\frac{\left({n}_{1\left(2\right)}-1\right){s}_{1}^{2}}{2}\right)$ (25)${\mu}_{1}|{\sigma}_{1}^{2},{y}_{1}\sim N\left({\stackrel{\u0304}{y}}_{1},\frac{{\sigma}_{1}^{2}}{{n}_{1\left(2\right)}}\right)$
and (26)${\theta}_{1.BS}=exp\left({\mu}_{1}+\frac{1}{2}{\sigma}_{1}^{2}\right),$ where ${\stackrel{\u0304}{y}}_{1}$ is observed value of ${\stackrel{\u0304}{Y}}_{1}$ defined in Eq. (3) and ${s}_{1}^{2}$ is the observed value of ${S}_{1}^{2}$ defined in Eq. (4).
The posterior distributions of ${\sigma}_{2}^{2}$, μ_{2}, and θ_{2.BS} are defined by (27)${\sigma}_{2}^{2}|{y}_{2}\sim IG\left(\frac{{n}_{2\left(2\right)}-1}{2},\frac{\left({n}_{2\left(2\right)}-1\right){s}_{2}^{2}}{2}\right)$ (28)${\mu}_{2}|{\sigma}_{2}^{2},{y}_{2}\sim N\left({\stackrel{\u0304}{y}}_{2},\frac{{\sigma}_{2}^{2}}{{n}_{2\left(2\right)}}\right)$
and (29)${\theta}_{2.BS}=exp\left({\mu}_{2}+\frac{1}{2}{\sigma}_{2}^{2}\right),$ where ${\stackrel{\u0304}{y}}_{2}$ is observed value of ${\stackrel{\u0304}{Y}}_{2}$ defined in Eq. (9) and ${s}_{2}^{2}$ is the observed value of ${S}_{2}^{2}$ defined in Eq. (10).
Thangjai & Niwitpong (2023) proposed the posterior distribution of the difference between means of delta-lognormal distributions based on left-censored data. Therefore, the posterior distribution of the ratio of means of delta-lognormal distributions based on left-censored data is given by (30)${\theta}_{BS}=\frac{{\theta}_{1.BS}}{{\theta}_{2.BS}},$ where θ_{1.BS} and θ_{2.BS} are defined in Eqs. (26) and (29), respectively.
Therefore, the 100(1 − α)% two-sided credible interval for the ratio of means of delta-lognormal distributions based on left-censored data using the Bayesian approach is given by (31)$C{I}_{\theta .BS}=\left[{L}_{\theta .BS},{U}_{\theta .BS}\right],$ where L_{θ.BS} and U_{θ.BS} denote the lower and upper limits of the shortest 100(1 − α)% highest posterior density interval of θ_{BS}, respectively.
The Algorithm 2 is used to construct the Bayesian credible interval for the ratio of means of delta-lognormal distributions based on left-censored data.
Algorithm 2.
Step 1: Compute ${\sigma}_{1}^{2}|{y}_{1}$, ${\mu}_{1}|{\sigma}_{1}^{2},{y}_{1}$, and θ_{1.BS} from Eqs. (24), (25), and (26), respectively.
Step 2: Compute ${\sigma}_{2}^{2}|{y}_{2}$, ${\mu}_{2}|{\sigma}_{2}^{2},{y}_{2}$, and θ_{2.BS} from Eqs. (27), (28), and (29), respectively.
Step 3: Compute θ_{BS} from Eq. (30)
Step 4: Repeat step 1 - step 3, a total m times and obtain an array of θ_{BS}’s
Step 5: Compute L_{θ.BS} and U_{θ.BS}
Parametric bootstrap approach
Let ${Y}_{1}^{\ast}=\left({Y}_{11}^{\ast},{Y}_{12}^{\ast},\dots ,{Y}_{1{n}_{1\left(2\right)}}^{\ast}\right)$ be the observations above log(ξ_{1}). The mean and variance of ${Y}_{1}^{\ast}$ are given by (32)$\stackrel{\u0304}{Y}}_{1}^{\ast}=\frac{1}{{n}_{1\left(2\right)}}\sum _{j=1}^{{n}_{1\left(2\right)}}{Y}_{1j}^{\ast$ and (33)${S}_{1}^{2\ast}=\frac{1}{{n}_{1\left(2\right)}}\sum _{j=1}^{{n}_{1\left(2\right)}}{\left({Y}_{1j}^{\ast}-{\stackrel{\u0304}{Y}}_{1}^{\ast}\right)}^{2}.$
The estimator of the mean of delta-lognormal distribution based on left-censored data is (34)${\stackrel{\u02c6}{\theta}}_{1}^{\ast}=exp\left({\stackrel{\u0304}{Y}}_{1}^{\ast}+\frac{1}{2}{S}_{1}^{2\ast}\right),$ where ${\stackrel{\u0304}{Y}}_{1}^{\ast}$ and ${S}_{1}^{2\ast}$ are defined in Eqs. (32) and (33), respectively.
Let ${Y}_{2}^{\ast}=\left({Y}_{21}^{\ast},{Y}_{22}^{\ast},\dots ,{Y}_{2{n}_{2\left(2\right)}}^{\ast}\right)$ be the observations above log(ξ_{2}). The mean and variance of ${Y}_{2}^{\ast}$ are given by (35)$\stackrel{\u0304}{Y}}_{2}^{\ast}=\frac{1}{{n}_{2\left(2\right)}}\sum _{j=1}^{{n}_{2\left(2\right)}}{Y}_{2j}^{\ast$ and (36)${S}_{2}^{2\ast}=\frac{1}{{n}_{2\left(2\right)}}\sum _{j=1}^{{n}_{2\left(2\right)}}{\left({Y}_{2j}^{\ast}-{\stackrel{\u0304}{Y}}_{2}^{\ast}\right)}^{2}.$
The estimator of the mean of delta-lognormal distribution based on left-censored data is (37)${\stackrel{\u02c6}{\theta}}_{2}^{\ast}=exp\left({\stackrel{\u0304}{Y}}_{2}^{\ast}+\frac{1}{2}{S}_{2}^{2\ast}\right),$ where ${\stackrel{\u0304}{Y}}_{2}^{\ast}$ and ${S}_{2}^{2\ast}$ are defined in Eqs. (35) and (36), respectively.
The estimator of the difference between means of delta-lognormal distributions based on left-censored data was proposed as previously described in Thangjai & Niwitpong (2023). According to Thangjai & Niwitpong (2023), the estimator of the ratio of means of delta-lognormal distributions based on left-censored data is given by (38)${\stackrel{\u02c6}{\theta}}^{\ast}=\frac{{\stackrel{\u02c6}{\theta}}_{1}^{\ast}}{{\stackrel{\u02c6}{\theta}}_{2}^{\ast}},$ where ${\stackrel{\u02c6}{\theta}}_{1}^{\ast}$ and ${\stackrel{\u02c6}{\theta}}_{2}^{\ast}$ are defined in Eqs. (34) and (37), respectively.
The lower and upper limits of the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data are given by (39)${L}_{PB}=\overline{{\stackrel{\u02c6}{\theta}}^{\ast}}-{z}_{1-\alpha /2}sd\left({\stackrel{\u02c6}{\theta}}^{\ast}\right)$ and (40)${U}_{PB}=\overline{{\stackrel{\u02c6}{\theta}}^{\ast}}+{z}_{1-\alpha /2}sd\left({\stackrel{\u02c6}{\theta}}^{\ast}\right),$ where $\overline{{\stackrel{\u02c6}{\theta}}^{\ast}}$ is the mean of ${\stackrel{\u02c6}{\theta}}^{\ast}$, $sd\left({\stackrel{\u02c6}{\theta}}^{\ast}\right)$ is the standard deviation of ${\stackrel{\u02c6}{\theta}}^{\ast}$, and z_{1−α/2} is the 100(1 − α/2)-th percentile of the standard normal distribution.
Therefore, the 100(1 − α)% two-sided confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data using the parametric bootstrap approach is given by (41)$C{I}_{PB}=\left[{L}_{PB},{U}_{PB}\right],$ where L_{PB} and U_{PB} are defined in Eqs. (39) and (40), respectively.
The Algorithm 3 is used to construct the parametric bootstrap confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data.
Algorithm 3.
Step 1: Parametric bootstrapping assumes that the data comes from a known distribution with unknown parameters. We estimate these parameters from the data and then use the estimated distributions to simulate the samples. Generate ${y}_{1}^{\ast}=\left({y}_{11}^{\ast},{y}_{12}^{\ast},\dots ,{y}_{1{n}_{1\left(2\right)}}^{\ast}\right)$ from normal distribution with ${\stackrel{\u02c6}{\mu}}_{1}$ and ${\stackrel{\u02c6}{\sigma}}_{1}^{2}$ and generate ${y}_{2}^{\ast}=\left({y}_{21}^{\ast},{y}_{22}^{\ast},\dots ,{y}_{2{n}_{2\left(2\right)}}^{\ast}\right)$ from normal distribution with ${\stackrel{\u02c6}{\mu}}_{2}$ and ${\stackrel{\u02c6}{\sigma}}_{2}^{2}$
Step 2: Compute ${\stackrel{\u0304}{y}}_{1}^{\ast}$ from Eq. (32), ${s}_{1}^{2\ast}$ from Eq. (33), and ${\stackrel{\u02c6}{\theta}}_{1}^{\ast}$ from Eq. (34)
Step 3: Compute ${\stackrel{\u0304}{y}}_{2}^{\ast}$ from Eq. (35), ${s}_{2}^{2\ast}$ from Eq. (36), and ${\stackrel{\u02c6}{\theta}}_{2}^{\ast}$ from Eq. (37)
Step 4: Compute ${\stackrel{\u02c6}{\theta}}^{\ast}$ from Eq. (38)
Step 5: Repeat step 1 - step 4, a total m times and obtain an array of ${\stackrel{\u02c6}{\theta}}^{\ast}$’s
Method of variance estimates recovery approach
According to Maneerat, Niwitpong & Niwitpong (2020), the lower and upper limits of the confidence interval for μ_{1} are defined by (42)$l}_{{\mu}_{1}}={\stackrel{\u02c6}{\mu}}_{1}-{z}_{1-\alpha /2}\sqrt{\frac{\left({n}_{1\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{1}^{2}}{{n}_{1\left(2\right)}{\chi}_{{n}_{1\left(2\right)}-1}^{2}}$ and (43)${u}_{{\mu}_{1}}={\stackrel{\u02c6}{\mu}}_{1}+{z}_{1-\alpha /2}\sqrt{\frac{\left({n}_{1\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{1}^{2}}{{n}_{1\left(2\right)}{\chi}_{{n}_{1\left(2\right)}-1}^{2}}},$ where z_{1−α/2} is the 100(1 − α/2)-th percentile of the standard normal distribution, ${\chi}_{{n}_{1\left(2\right)}-1}^{2}$ is the chi-squared distribution with n_{1(2)} − 1 degrees of freedom, and ${\stackrel{\u02c6}{\mu}}_{1}$ and ${\stackrel{\u02c6}{\sigma}}_{1}^{2}$ are defined in Eqs. (5) and (6), respectively.
Similarly, the lower and upper limits of the confidence interval for μ_{2} are given by (44)$l}_{{\mu}_{2}}={\stackrel{\u02c6}{\mu}}_{2}-{z}_{1-\alpha /2}\sqrt{\frac{\left({n}_{2\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{2}^{2}}{{n}_{2\left(2\right)}{\chi}_{{n}_{2\left(2\right)}-1}^{2}}$ and (45)${u}_{{\mu}_{2}}={\stackrel{\u02c6}{\mu}}_{2}+{z}_{1-\alpha /2}\sqrt{\frac{\left({n}_{2\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{2}^{2}}{{n}_{2\left(2\right)}{\chi}_{{n}_{2\left(2\right)}-1}^{2}}},$ where z_{1−α/2} is the 100(1 − α/2)-th percentile of the standard normal distribution, ${\chi}_{{n}_{2\left(2\right)}-1}^{2}$ is the chi-squared distribution with n_{2(2)} − 1 degrees of freedom, and ${\stackrel{\u02c6}{\mu}}_{2}$ and ${\stackrel{\u02c6}{\sigma}}_{2}^{2}$ are defined in Eqs. (11) and (12), respectively.
Following Maneerat, Niwitpong & Niwitpong (2020), the lower and upper limits of the confidence interval for ${\sigma}_{1}^{2}$ are defined by (46)$l}_{{\sigma}_{1}^{2}}=\frac{\left({n}_{1\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{1}^{2}}{{\chi}_{1-\alpha /2,{n}_{1\left(2\right)}-1}^{2}$ and (47)${u}_{{\sigma}_{1}^{2}}=\frac{\left({n}_{1\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{1}^{2}}{{\chi}_{\alpha /2,{n}_{1\left(2\right)}-1}^{2}},$ where ${\chi}_{1-\alpha /2,{n}_{1\left(2\right)}-1}^{2}$ and ${\chi}_{\alpha /2,{n}_{1\left(2\right)}-1}^{2}$ are the 100(1 − α/2)-th and 100(α/2)-th percentiles of the chi-squared distribution with n_{1(2)} − 1 degrees of freedom, and ${\stackrel{\u02c6}{\sigma}}_{1}^{2}$ is defined in Eq. (6).
The lower and upper limits of the confidence interval for ${\sigma}_{2}^{2}$ are defined by (48)$l}_{{\sigma}_{2}^{2}}=\frac{\left({n}_{2\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{2}^{2}}{{\chi}_{1-\alpha /2,{n}_{2\left(2\right)}-1}^{2}$ and (49)${u}_{{\sigma}_{2}^{2}}=\frac{\left({n}_{2\left(2\right)}-1\right){\stackrel{\u02c6}{\sigma}}_{2}^{2}}{{\chi}_{\alpha /2,{n}_{2\left(2\right)}-1}^{2}},$ where ${\chi}_{1-\alpha /2,{n}_{2\left(2\right)}-1}^{2}$ and ${\chi}_{\alpha /2,{n}_{2\left(2\right)}-1}^{2}$ are the 100(1 − α/2)-th and 100(α/2)-th percentiles of the chi-squared distribution with n_{2(2)} − 1 degrees of freedom, and ${\stackrel{\u02c6}{\sigma}}_{2}^{2}$ is defined in Eq (12).
Applying the concept of Donner & Zou (1993), the lower and upper limits of confidence interval for ${\theta}_{1}=exp\left({\mu}_{1}+\frac{1}{2}{\sigma}_{1}^{2}\right)$ are given by (50)${l}_{{\theta}_{1}}=exp\left[\left({\stackrel{\u02c6}{\mu}}_{1}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{1}^{2}\right)-\sqrt{{\left({\stackrel{\u02c6}{\mu}}_{1}-{l}_{{\mu}_{1}}\right)}^{2}+{\left(\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{1}^{2}-\frac{1}{2}{l}_{{\sigma}_{1}^{2}}\right)}^{2}}\right]$ and (51)${u}_{{\theta}_{1}}=exp\left[\left({\stackrel{\u02c6}{\mu}}_{1}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{1}^{2}\right)+\sqrt{{\left({u}_{{\mu}_{1}}-{\stackrel{\u02c6}{\mu}}_{1}\right)}^{2}+{\left(\frac{1}{2}{u}_{{\sigma}_{1}^{2}}-\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{1}^{2}\right)}^{2}}\right],$ where ${\stackrel{\u02c6}{\mu}}_{1}$, ${\stackrel{\u02c6}{\sigma}}_{1}^{2}$, l_{μ1}, u_{μ1}, ${l}_{{\sigma}_{1}^{2}}$, and ${u}_{{\sigma}_{1}^{2}}$ are defined in Eqs. (5), (6), (42), (43), (46), and (47), respectively.
Similarly, the lower and upper limits of confidence interval for ${\theta}_{2}=exp\left({\mu}_{2}+\frac{1}{2}{\sigma}_{2}^{2}\right)$ are given by (52)${l}_{{\theta}_{2}}=exp\left[\left({\stackrel{\u02c6}{\mu}}_{2}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{2}^{2}\right)-\sqrt{{\left({\stackrel{\u02c6}{\mu}}_{2}-{l}_{{\mu}_{2}}\right)}^{2}+{\left(\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{2}^{2}-\frac{1}{2}{l}_{{\sigma}_{2}^{2}}\right)}^{2}}\right]$ and (53)${u}_{{\theta}_{2}}=exp\left[\left({\stackrel{\u02c6}{\mu}}_{2}+\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{2}^{2}\right)+\sqrt{{\left({u}_{{\mu}_{2}}-{\stackrel{\u02c6}{\mu}}_{2}\right)}^{2}+{\left(\frac{1}{2}{u}_{{\sigma}_{2}^{2}}-\frac{1}{2}{\stackrel{\u02c6}{\sigma}}_{2}^{2}\right)}^{2}}\right],$ where ${\stackrel{\u02c6}{\mu}}_{2}$, ${\stackrel{\u02c6}{\sigma}}_{2}^{2}$, l_{μ2}, u_{μ2}, ${l}_{{\sigma}_{2}^{2}}$, and ${u}_{{\sigma}_{2}^{2}}$ are defined in Eqs. (11), (12), (44), (45), (48), (49), respectively.
Using the concept of Donner & Zou (2012) and Thangjai & Niwitpong (2023), the lower and upper limits of confidence interval for $\theta =\frac{{\theta}_{1}}{{\theta}_{2}}$ are given by (54)$L}_{MOVER}=\frac{{\stackrel{\u02c6}{\theta}}_{1}{\stackrel{\u02c6}{\theta}}_{2}-\sqrt{{\left({\stackrel{\u02c6}{\theta}}_{1}{\stackrel{\u02c6}{\theta}}_{2}\right)}^{2}-{l}_{{\theta}_{1}}{u}_{{\theta}_{2}}\left(2{\stackrel{\u02c6}{\theta}}_{1}-{l}_{{\theta}_{1}}\right)\left(2{\stackrel{\u02c6}{\theta}}_{2}-{u}_{{\theta}_{2}}\right)}}{{u}_{{\theta}_{2}}\left(2{\stackrel{\u02c6}{\theta}}_{2}-{u}_{{\theta}_{2}}\right)$ and (55)${U}_{MOVER}=\frac{{\stackrel{\u02c6}{\theta}}_{1}{\stackrel{\u02c6}{\theta}}_{2}+\sqrt{{\left({\stackrel{\u02c6}{\theta}}_{1}{\stackrel{\u02c6}{\theta}}_{2}\right)}^{2}-{u}_{{\theta}_{1}}{l}_{{\theta}_{2}}\left(2{\stackrel{\u02c6}{\theta}}_{1}-{u}_{{\theta}_{1}}\right)\left(2{\stackrel{\u02c6}{\theta}}_{2}-{l}_{{\theta}_{2}}\right)}}{{l}_{{\theta}_{2}}\left(2{\stackrel{\u02c6}{\theta}}_{2}-{l}_{{\theta}_{2}}\right)},$ where ${\stackrel{\u02c6}{\theta}}_{1}$, ${\stackrel{\u02c6}{\theta}}_{2}$, l_{θ1}, u_{θ1}, l_{θ2}, and u_{θ2} are defined in Eqs. (8), (14), (50), (51), (52), and (53), respectively.
Therefore, the 100(1 − α)% two-sided confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data using the MOVER approach is given by (56)$C{I}_{MOVER}=\left[{L}_{MOVER},{U}_{MOVER}\right],$ where L_{MOVER} and U_{MOVER} are defined in Eqs. (54) and (55), respectively.
Results
In this section, we conducted simulation studies to evaluate the performance of the proposed confidence interval, which was constructed using four different approaches. We calculated the coverage probability and average length using R software. The criteria for choosing the best performing confidence interval were a coverage probability greater than or equal to 0.95 and the shortest average length for each tested scenario. For each generated data set, we used R code to compute the confidence intervals based on the GCI, Bayesian, parametric bootstrap, and MOVER approaches (Thangjai & Niwitpong, 2023), with M = 5,000 runs for each and m = 2,500 runs for the GCI, Bayesian, and parametric bootstrap approaches. Following Owen & De Rouen (1980), the standardized sensitivity for many air contaminants is typically around 0.25. Although 0.25 was determined to be the most appropriate value for ξ when examining the censoring techniques, additional runs were performed using ξ = 0.10, allowing for the examination of results for the delta distribution with different values of μ and σ. We generated random sample sizes of (n_{1}, n_{2}) = (20,20), (30,30), (20,30), (50,50), (30,50), (100,100), and (50,100) with specific parameters, as described in Table 1.
Run number | (μ_{1}, μ_{2}) | (σ_{1}, σ_{2}) | (δ_{1}, δ_{2}) | (ξ_{1}, ξ_{2}) |
---|---|---|---|---|
0 | (0.00,0.00) | (0.30,0.30) | (0.10,0.10) | No censoring |
1 | (0.00,0.00) | (1.00,1.00) | (0.10,0.10) | (0.10,0.10) |
2 | (0.00,0.00) | (1.00,1.00) | (0.10,0.10) | (0.10,0.25) |
3 | (0.00,0.00) | (1.00,1.00) | (0.10,0.25) | (0.10,0.10) |
4 | (0.00,0.00) | (1.00,1.00) | (0.10,0.25) | (0.10,0.25) |
5 | (0.00,0.00) | (1.00,2.00) | (0.10,0.10) | (0.10,0.10) |
6 | (0.00,0.00) | (1.00,2.00) | (0.10,0.10) | (0.10,0.25) |
7 | (0.00,0.00) | (1.00,2.00) | (0.10,0.25) | (0.10,0.10) |
8 | (0.00,0.00) | (1.00,2.00) | (0.10,0.25) | (0.10,0.25) |
The Algorithm 4 is used to construct the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data, and then the coverage probability and average length of the confidence intervals are computed.
Algorithm 4.
Step 1: Generate z_{1} from delta-lognormal distribution with parameters μ_{1}, σ_{1}, and δ_{1} and set x_{1} from log-normal distribution with parameters μ_{1} and σ_{1}
Step 2: Generate z_{2} from delta-lognormal distribution with parameters μ_{2}, σ_{2}, and δ_{2} and set x_{2} from log-normal distribution with parameters μ_{2} and σ_{2}
Step 3: Compute y_{1} = log(x_{1}) and select y_{1} > log(ξ_{1})
Step 4: Compute y_{2} = log(x_{2}) and select y_{2} > log(ξ_{2})
Step 5: Compute n_{1(1)}, n_{1(2)}, n_{2(1)}, n_{2(2)}, ${\stackrel{\u02c6}{\mu}}_{1}$, ${\stackrel{\u02c6}{\mu}}_{2}$, ${\stackrel{\u02c6}{\sigma}}_{1}$, ${\stackrel{\u02c6}{\sigma}}_{2}$, ${\stackrel{\u02c6}{\theta}}_{1}$, ${\stackrel{\u02c6}{\theta}}_{2}$, and $\stackrel{\u02c6}{\theta}$
Step 6: Construct the confidence intervals CI_{GCI}, CI_{BS}, CI_{PB}, and CI_{MOVER}
Step 7: If L ⩽ θ ⩽ U, set p = 1; else set p = 0
Step 8: Compute U − L
Step 9: Repeat step 1 - step 8, a total M times
Step 10: Compute mean of p defined by the coverage probability
Step 11: Compute mean of U − L defined by the average length
(n_{1}, n_{2}) | Run number | CP (AL) | |||
---|---|---|---|---|---|
CI_{GCI} | CI_{BS} | CI_{PB} | CI_{MOVER} | ||
(20, 20) | 0 | 0.9442 | 0.9570 | 0.9376 | 0.9480 |
(0.4111) | (0.4326) | (0.3913) | (0.4358) | ||
1 | 0.9822 | 0.9812 | 0.9396 | 0.9870 | |
(4.2060) | (3.5736) | (2.1539) | (5.1344) | ||
2 | 0.9796 | 0.9850 | 0.9488 | 0.9850 | |
(4.6143) | (3.8363) | (2.2088) | (6.4682) | ||
3 | 0.9862 | 0.9858 | 0.9258 | 0.9948 | |
(4.5750) | (4.0626) | (2.4027) | (6.2972) | ||
4 | 0.9828 | 0.9904 | 0.9574 | 0.9898 | |
(5.0102) | (4.2274) | (2.4848) | (6.2869) | ||
5 | 0.9566 | 0.9718 | 0.9668 | 0.9774 | |
(1.7387) | (1.5506) | (1.5333) | (2.4622) | ||
6 | 0.9540 | 0.9768 | 0.9642 | 0.9772 | |
(1.7741) | (1.6403) | (1.5060) | (2.6641) | ||
7 | 0.9476 | 0.9718 | 0.9578 | 0.9864 | |
(1.8618) | (1.7981) | (1.8586) | (3.0510) | ||
8 | 0.9342 | 0.9754 | 0.9614 | 0.9848 | |
(1.9484) | (1.9493) | (1.8062) | (5.8979) | ||
(30,30) | 0 | 0.9344 | 0.9506 | 0.9334 | 0.9460 |
(0.3275) | (0.3440) | (0.3233) | (0.3468) | ||
1 | 0.9816 | 0.9824 | 0.9420 | 0.9868 | |
(2.6356) | (2.3787) | (1.6508) | (2.8968) | ||
2 | 0.9808 | 0.9878 | 0.9582 | 0.9832 | |
(2.8724) | (2.4416) | (1.6901) | (2.9633) | ||
3 | 0.9880 | 0.9878 | 0.9366 | 0.9952 | |
(2.8720) | (2.7300) | (1.8001) | (3.4266) | ||
4 | 0.9798 | 0.9924 | 0.9614 | 0.9880 | |
(3.2285) | (2.8937) | (1.9375) | (3.5624) | ||
5 | 0.9540 | 0.9678 | 0.9612 | 0.9766 | |
(1.0608) | (0.9934) | (1.0431) | (1.2940) | ||
6 | 0.9520 | 0.9760 | 0.9630 | 0.9786 | |
(1.0498) | (1.0423) | (1.0376) | (1.3833) | ||
7 | 0.9340 | 0.9622 | 0.9432 | 0.9810 | |
(1.1444) | (1.1641) | (1.2322) | (1.5948) | ||
8 | 0.9378 | 0.9784 | 0.9628 | 0.9816 | |
(1.1568) | (1.2536) | (1.2359) | (1.7320) | ||
(20,30) | 0 | 0.9428 | 0.9568 | 0.9406 | 0.9508 |
(0.3748) | (0.3932) | (0.3589) | (0.3964) | ||
1 | 0.9814 | 0.9858 | 0.9328 | 0.9874 | |
(3.9451) | (3.4056) | (1.9245) | (4.9194) | ||
2 | 0.9778 | 0.9866 | 0.9418 | 0.9814 | |
(4.3365) | (3.5499) | (1.9858) | (5.1065) | ||
3 | 0.9856 | 0.9814 | 0.9080 | 0.9928 | |
(4.0345) | (3.5815) | (2.0299) | (5.1957) | ||
4 | 0.9820 | 0.9938 | 0.9616 | 0.9886 | |
(4.7817) | (4.0717) | (2.2738) | (5.8989) | ||
5 | 0.9610 | 0.9722 | 0.9580 | 0.9768 | |
(1.3996) | (1.2598) | (1.1147) | (1.9116) | ||
6 | 0.9572 | 0.9774 | 0.9670 | 0.9824 | |
(1.3685) | (1.3347) | (1.1302) | (2.0536) | ||
7 | 0.9372 | 0.9638 | 0.9416 | 0.9818 | |
(1.4648) | (1.4277) | (1.3012) | (2.2402) | ||
8 | 0.9382 | 0.9826 | 0.9658 | 0.9856 | |
(1.5085) | (1.5732) | (1.3543) | (2.5036) | ||
(50,50) | 0 | 0.9374 | 0.9496 | 0.9422 | 0.9488 |
(0.2475) | (0.2593) | (0.2508) | (0.2609) | ||
1 | 0.9824 | 0.9806 | 0.9460 | 0.9868 | |
(1.7057) | (1.6105) | (1.1956) | (1.7847) | ||
2 | 0.9808 | 0.9876 | 0.9632 | 0.9826 | |
(1.8963) | (1.6690) | (1.2599) | (1.8472) | ||
3 | 0.9854 | 0.9842 | 0.9246 | 0.9932 | |
(1.8930) | (1.9056) | (1.3053) | (2.1481) | ||
4 | 0.9754 | 0.9924 | 0.9600 | 0.9810 | |
(2.1639) | (2.0206) | (1.4505) | (2.2564) | ||
5 | 0.9516 | 0.9660 | 0.9578 | 0.9750 | |
(0.6564) | (0.6582) | (0.6677) | (0.7660) | ||
6 | 0.9446 | 0.9722 | 0.9636 | 0.9764 | |
(0.6627) | (0.7121) | (0.7045) | (0.8355) | ||
7 | 0.9254 | 0.9540 | 0.9288 | 0.9776 | |
(0.6808) | (0.7374) | (0.7530) | (0.9003) | ||
8 | 0.9276 | 0.9760 | 0.9594 | 0.9834 | |
(0.6866) | (0.8167) | (0.8121) | (0.9944) | ||
(30,50) | 0 | 0.9376 | 0.9510 | 0.9390 | 0.9464 |
(0.2918) | (0.3055) | (0.2894) | (0.3074) | ||
1 | 0.9852 | 0.9866 | 0.9382 | 0.9902 | |
(2.3965) | (2.1589) | (1.4367) | (2.5955) | ||
2 | 0.9836 | 0.9900 | 0.9534 | 0.9844 | |
(2.6092) | (2.2311) | (1.5015) | (2.6781) | ||
3 | 0.9846 | 0.9820 | 0.9158 | 0.9922 | |
(2.4721) | (2.3535) | (1.5021) | (2.8636) | ||
4 | 0.9818 | 0.9932 | 0.9682 | 0.9888 | |
(2.8244) | (2.5417) | (1.6759) | (3.0589) | ||
5 | 0.9550 | 0.9676 | 0.9498 | 0.9774 | |
(0.8059) | (0.7777) | (0.7231) | (0.9666) | ||
6 | 0.9412 | 0.9736 | 0.9584 | 0.9794 | |
(0.7694) | (0.8141) | (0.7389) | (1.0140) | ||
7 | 0.9362 | 0.9606 | 0.9284 | 0.9804 | |
(0.8027) | (0.8387) | (0.7959) | (1.0721) | ||
8 | 0.9284 | 0.9782 | 0.9594 | 0.9816 | |
(0.8330) | (0.9602) | (0.8796) | (1.2372) | ||
(100,100) | 0 | 0.9400 | 0.9530 | 0.9512 | 0.9520 |
(0.1726) | (0.1807) | (0.1784) | (0.1820) | ||
1 | 0.9852 | 0.9858 | 0.9530 | 0.9888 | |
(1.0970) | (1.0656) | (0.8239) | (1.1218) | ||
2 | 0.9696 | 0.9832 | 0.9512 | 0.9730 | |
(1.2069) | (1.0866) | (0.8579) | (1.1443) | ||
3 | 0.9832 | 0.9862 | 0.9238 | 0.9934 | |
(1.2068) | (1.2656) | (0.8807) | (1.3390) | ||
4 | 0.9716 | 0.9872 | 0.9372 | 0.9724 | |
(1.3686) | (1.3111) | (0.9788) | (1.3837) | ||
5 | 0.9462 | 0.9610 | 0.9438 | 0.9734 | |
(0.3998) | (0.4269) | (0.4103) | (0.4614) | ||
6 | 0.9156 | 0.9706 | 0.9596 | 0.9758 | |
(0.3991) | (0.4655) | (0.4399) | (0.5040) | ||
7 | 0.9014 | 0.9382 | 0.9036 | 0.9712 | |
(0.4029) | (0.4698) | (0.4404) | (0.5244) | ||
8 | 0.8904 | 0.9720 | 0.9538 | 0.9804 | |
(0.4092) | (0.5367) | (0.5033) | (0.5946) | ||
(50,100) | 0 | 0.9422 | 0.9530 | 0.9482 | 0.9510 |
(0.2137) | (0.2237) | (0.2177) | (0.2250) | ||
1 | 0.9846 | 0.9838 | 0.9432 | 0.9890 | |
(1.5013) | (1.4103) | (1.0234) | (1.5517) | ||
2 | 0.9760 | 0.9846 | 0.9564 | 0.9804 | |
(1.6523) | (1.4719) | (1.0816) | (1.6180) | ||
3 | 0.9842 | 0.9848 | 0.9204 | 0.9932 | |
(1.5717) | (1.5690) | (1.0646) | (1.7336) | ||
4 | 0.9786 | 0.9930 | 0.9642 | 0.9808 | |
(1.8058) | (1.6931) | (1.2063) | (1.8667) | ||
5 | 0.9508 | 0.9608 | 0.9426 | 0.9732 | |
(0.4717) | (0.4879) | (0.4421) | (0.5458) | ||
6 | 0.9256 | 0.9722 | 0.9534 | 0.9728 | |
(0.4616) | (0.5328) | (0.4777) | (0.5973) | ||
7 | 0.9098 | 0.9412 | 0.8990 | 0.9724 | |
(0.4611) | (0.5181) | (0.4653) | (0.5922) | ||
8 | 0.8950 | 0.9716 | 0.9502 | 0.9774 | |
(0.4714) | (0.6041) | (0.5407) | (0.6880) |
Notes:
Bold font means the confidence interval with coverage probability greater than or equal to 0.95 and the shortest average length.
The coverage probability and average length of the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data are presented in Table 2 and shown in Figs. 1–3. For run 0, the Bayesian approach outperforms the others in terms of both coverage probability and average length for all sample sizes. Overall, the coverage probabilities are less than or equal to the nominal confidence level of 0.95. Therefore, we used the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data to estimate the ratio of means for datasets containing zero, positive, and censored observations. For runs 1–8, the results show that the coverage probabilities of the confidence intervals based on the GCI, Bayesian, parametric bootstrap and MOVER approaches were almost greater than the nominal confidence level of 0.95. The average lengths of the confidence interval based on the Bayesian approach were the shortest for (ξ_{1}, ξ_{2}) = (0.10,0.10), while the average lengths of the confidence interval based on the parametric bootstrap approach were shorter than those of the others for (ξ_{1}, ξ_{2}) = (0.10,0.25). The results indicate that the Bayesian approach is recommended for constructing the credible interval for the ratio of means of delta-lognormal distributions based on left-censored data for (ξ_{1}, ξ_{2}) = (0.10,0.10). However, the parametric bootstrap approach can be used to estimate the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data for (ξ_{1}, ξ_{2}) = (0.10,0.25). Moreover, the GCI approach can be used to construct the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data for run 3 and run 5 when the sample sizes are increase.
Empirical application
The GCI, Bayesian, parametric bootstrap, and MOVER approaches discussed in ‘Materials & Methods’ can be applied to estimate the ratio of average daily rainfall datasets from Chiang Rai and Chiang Mai provinces in Thailand. Table 3 shows the daily rainfall data from June 1st to June 30th, 2022, presented by the Thai Meteorological Department. The table includes 30 observations, out of which 13 of 30 (43.33%) represent positive observed values in Chiang Rai province, and nine of 30 (30.00%) represent positive observed values in Chiang Mai province. Table 4 shows the possible distributions for the positive rainfall data applied to the minimum Akaike information criterion (AIC). Figure 4 presents the densities of the daily rainfall data in Chiang Rai and Chiang Mai provinces. Figure 5 presents the histograms of the daily rainfall data in Chiang Rai and Chiang Mai provinces. Figure 6 presents the normal QQ-plots of the log-transformed the daily rainfall data in Chiang Rai and Chiang Mai provinces. The log-transformed positive daily rainfall values of Chiang Rai and Chiang Mai provinces follow normal distributions. Therefore, the daily rainfall datasets in Chiang Rai and Chiang Mai provinces fit the delta-lognormal distributions.
Province | Daily rainfall data (mm) | ||||
---|---|---|---|---|---|
Chiang Rai province | 0.0 | 0.4 | 0.5 | 0.0 | 0.0 |
0.0 | 0.1 | 0.0 | 0.2 | 0.0 | |
0.0 | 0.0 | 0.0 | 3.3 | 21.0 | |
7.9 | 0.2 | 0.0 | – | 2.5 | |
0.0 | 0.0 | 47.5 | 0.5 | 0.0 | |
10.4 | 0.0 | 0.0 | 9.0 | – | |
Chiang Mai province | 0.0 | 0.0 | 0.0 | 0.0 | 41.9 |
0.4 | 0.0 | 0.0 | 0.1 | 0.0 | |
0.0 | 0.0 | 0.0 | 0.0 | 1.3 | |
0.0 | 0.2 | – | – | 0.0 | |
0.0 | 0.0 | 23.1 | 0.0 | 0.1 | |
– | 0.0 | – | 5.4 | 8.7 |
Notes:
Source: Thai Meteorological Department.
Distribution | Chiang Rai province | Chiang Mai province |
---|---|---|
Normal | 107.2557 | 76.5153 |
Log-Normal | 73.6376 | 51.4921 |
Gamma | 75.4915 | 52.5742 |
Exponential | 80.9402 | 58.5944 |
Notes:
Bold font means the distribution with the lowest AIC value.
For Chiang Rai province, the statistics are n_{1} = 30, n_{1(1)} = 18, n_{1(2)} = 12, ${\stackrel{\u02c6}{\mu}}_{1}=-1.24$, ${\stackrel{\u02c6}{\sigma}}_{1}^{2}=9.84$, and ${\stackrel{\u02c6}{\theta}}_{1}=39.61$. For Chiang Mai province, the statistics are n_{2} = 30, n_{2(1)} = 23, n_{2(2)} = 7, ${\stackrel{\u02c6}{\mu}}_{2}=-1.65$, ${\stackrel{\u02c6}{\sigma}}_{2}^{2}=13.57$, and ${\stackrel{\u02c6}{\theta}}_{2}=168.88$. Therefore, the ratio of means of the daily rainfall data in Chiang Rai and Chiang Mai provinces is $\stackrel{\u02c6}{\theta}=0.23$. The 95% two-sided confidence intervals for the ratio of means of the daily rainfall data in Chiang Rai and Chiang Mai provinces are constructed based on GCI, Bayesian, parametric bootstrap, and MOVER approaches. For GCI approach, CI_{GCI} = [0.0005,34.4100] with an interval length of 34.4095. For Bayesian approach, CI_{BS} = [0.0000,291.1944] with an interval length of 291.1944. For parametric bootstrap approach, CI_{PB} = [-54.4900,63.6449] with an interval length of 118.1349. For MOVER approach, CI_{MOVER} = [0.0000,40960.7200] with an interval length of 40960.7200. The lower and upper limits of the 95% confidence interval correspond to the 2.50-th and 97.50-th percentiles of the average rainfall average ratio between Chiang Rai and Chiang Mai provinces. Therefore, the GCI approach has the shortest interval length. Therefore, the GCI approach is recommended for constructing the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. Moreover, confidence intervals for the ratio of means of delta-lognormal distributions, based on left-censored data, can be applied to environmental, meteorological, and climatological data, which often consist of positive values or exhibit right-skewed distributions, such as PM2.5 and PM10.
Discussion
Ratio of parameters focuses on the relative strength or proportion of effects, while difference of parameters emphasizes the absolute difference in the effects of two variables. Both concepts are valuable in statistical analysis, and their application in rainfall data analysis depends on the specific research question and the variables being examined. In bioassays, the ratio quantities are of potential interest. Calculating relative potency necessitates estimating the ratio of normal means. This is due to the fact that the ratio of means represents the expected values of the least squares estimates in a simple linear regression. Moreover, the problem of estimating the unoriented direction of a mean vector can lead to ratio estimation, as the direction is fully specified by the collection of all ratios of the component means (James, 1982). Therefore, the ratio of means is important. Several researchers have studied interval estimation for the ratio of means. For example, in environmental science, Zhang et al. (2021) constructed simultaneous confidence intervals for ratios of means of zero-inflated log-normal populations, with an application to rainfall data. Moreover, Singhasomboon & Piladaeng (2023) approximated the estimation of the ratio of means for log-normal distributions, applied to PM2.5 concentrations in northern Thailand. In medical science, Abdel-Karim (2015), Zhou & Tu (2020), and Chen & Zhou (2006b) estimated the ratio of means for medical costs. Furthermore, Singhasomboon, Panichkitkosolkul & Volodin (2021) constructed confidence intervals for the ratio of means to compare the survival times in months for patients who died from two cancer groups.
Thangjai et al. (2023) proposed the confidence interval estimation for the ratio of the percentiles of two delta-lognormal distributions using the Bayesian and parametric bootstrap approaches. Thangjai & Niwitpong (2023) estimated the confidence intervals for mean and difference between means of delta-lognormal distributions based on left-censored data using the GCI, Bayesian, parametric bootstrap, and MOVER approaches. In this study, we extended the GCI, Bayesian, parametric bootstrap, and MOVER approaches to construct the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data.
We recommend using the Bayesian approach to construct the credible intervals for the ratio of means of delta-lognormal distributions based on left-censored data. This is consistent with the findings of previous studies by Thangjai et al. (2023) and Thangjai & Niwitpong (2023). Moreover, the parametric bootstrap is considered to construct the confidence interval. It is similar to Thangjai et al. (2023) and Altunkaynak & Gamgam (2019). Additionally, the GCI approach, which is similar to the method proposed by Chen & Zhou (2006a), Tian & Wu (2007), and Ye, Ma & Wang (2010), can also be used for estimating the confidence interval.
Conclusions
We constructed confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data using the GCI, Bayesian, parametric bootstrap, and MOVER approaches. The Bayesian credible intervals performed the best in terms of coverage probabilities and average lengths for (ξ_{1}, ξ_{2}) = (0.10,0.10). Moreover, we recommend using the GCI approach to construct the confidence interval for (ξ_{1}, ξ_{2}) = (0.10,0.25).
Supplemental Information
The daily rainfall data of Chiang Rai and Chiang Mai provinces
(A) Chiang Rai (B) Chiang Mai.