The Bayesian confidence intervals for measuring the difference between dispersions of rainfall in Thailand

The coefficient of variation is often used to describe the variability of precipitation. Moreover, the difference between two independent coefficients of variation can describe the dissimilarity of rainfall between two areas or time periods. Several studies have reported that rainfall data follow a delta-lognormal distribution. To estimate the dynamics of precipitation, confidence interval construction is another effective method of statistical inference for rainfall data. In this study, we propose confidence intervals for the difference between two independent coefficients of variation of two delta-lognormal distributions based on three concepts: the fiducial generalized confidence interval, Bayesian methods, and the standard bootstrap. The performance of the proposed methods was assessed in terms of coverage probability and expected length via Monte Carlo simulations. The simulation studies showed that the highest posterior density Bayesian interval using the Jeffreys' Rule prior outperformed the other methods in virtually all cases except those with large variance, for which the standard bootstrap was the best. Rainfall series from Songkhla, Thailand, are used to illustrate the proposed confidence intervals.


INTRODUCTION
Recently, the Earth's climate has been changing significantly due to the greenhouse effect, which is causing both rising temperatures and variability in precipitation (Attavanich, 2013). Thailand, an agricultural country, is particularly affected by such phenomena since agriculture relies mainly on rainfall. The amount of rainfall in Thailand fluctuates quite widely due to the influence of the southwest and northeast monsoons (Eso, Kuning & Chuai-Aree, 2015). In recent years, several areas in Thailand have been affected by heavy rain that produced flooding, a major cause of economic losses, loss of life, and property damage.
It is important to investigate the coefficient of variation of rainfall data series to understand the dynamics of precipitation in each area. Furthermore, comparing heavy rainfall between two areas or time periods through the difference between their coefficients of variation is of interest. The government can use this information for advance planning to prevent problems caused by excessive rainfall. Many researchers have found that rainfall data series follow a bivariate lognormal distribution (a delta-lognormal distribution) (Fukuchi, 1988; Shimizu, 1993; Kong et al., 2012; Maneerat, Niwitpong & Niwitpong, 2019a; Yosboonruang, Niwitpong & Niwitpong, 2019b; Yue, 2000).
Confidence interval construction is another effective method of statistical inference for delta-lognormal distributions, and methods for constructing such intervals have been reported by several researchers. Zhou & Tu (2000) proposed confidence intervals for the mean, including a percentile-t bootstrap interval based on sufficient statistics, a bias-corrected maximum likelihood method, and an interval based on a likelihood ratio test; the bootstrap interval performed the best for both one-sided and two-sided intervals with small sample sizes. Tian (2005) compared the generalized variables method and the generalized pivotal quantity (GPQ) for constructing confidence intervals for the mean and found the generalized variables method preferable. Tian & Wu (2006) recommended the adjusted signed log-likelihood ratio statistic for constructing confidence intervals for the mean. Chen & Zhou (2006) considered interval estimation for the ratio of, or the difference between, two means using a true generalized pivotal (GP) method, an approximate GP method, a signed log-likelihood ratio method, and a modified signed log-likelihood ratio method; their results show that the approximate GP method performed the best. Fletcher (2008) used three methods (Aitchison's estimator, a modification of Cox's method, and a profile-likelihood interval) to construct confidence intervals for the mean and found that the profile-likelihood interval was the best unless the sample size was small with a low-to-moderate level of skewness. Li, Zhou & Tian (2013) presented an approximate GPQ and a fiducial quantity for establishing confidence intervals for the mean; their results indicate that the fiducial method was the most suitable. Wu & Hsieh (2014) introduced a generalized confidence interval (GCI) for the mean that performed better than Aitchison's method, a modified Land's method, and the profile-likelihood interval.
Maneerat, Niwitpong & Niwitpong (2018) constructed confidence intervals for the mean using GCI and the method of variance estimate recovery (MOVER) based on the variance stabilizing transformation (VST), Wilson's score, and Jeffreys' method; GCI and the three MOVER methods performed similarly except when the probability was close to zero and the coefficient of variation was large. Moreover, they compared GCI and MOVER based on a weighted beta distribution and VST for constructing confidence intervals for the mean and recommended MOVER based on VST (Maneerat, Niwitpong & Niwitpong, 2019b). In addition, Maneerat, Niwitpong & Niwitpong (2019a) suggested Bayesian methods for constructing highest posterior density (HPD) intervals for a single mean and for the difference between two means.
Apart from the mean, the coefficient of variation, defined as the ratio of the standard deviation to the mean, has also been the subject of interval estimation. Many studies have focused on confidence interval estimation for the coefficient of variation of normal and non-normal distributions. For instance, Wong & Wu (2002) constructed confidence intervals by developing small-sample asymptotic methods for both normal and non-normal models. In addition, confidence interval estimation for the coefficient of variation of a normal distribution has been reported by Tian (2005), Donner & Zou (2012), and Wongkhao, Niwitpong & Niwitpong (2015). Confidence intervals for the coefficient of variation have also been established for skewed distributions. Sangnawakij & Niwitpong (2017a) presented confidence interval estimation for the coefficient of variation and the difference between coefficients of variation of two-parameter exponential distributions based on MOVER, GCI, and an asymptotic confidence interval, and found that GCI outperformed the other methods. Thangjai & Niwitpong (2017) proposed confidence intervals for the weighted coefficient of variation of two-parameter exponential distributions using the adjusted MOVER, GCI, and large-sample methods, and found GCI to be the best choice. Yosboonruang, Niwitpong & Niwitpong (2018) constructed confidence intervals for the coefficient of variation of a delta-lognormal distribution using GCI and a modified Fletcher method and found that GCI was the most appropriate. Yosboonruang, Niwitpong & Niwitpong (2019a) presented the fiducial generalized confidence interval (FGCI) and MOVER for constructing confidence intervals for the coefficient of variation of a three-parameter delta-lognormal distribution. They reported that FGCI was suitable for small sample sizes, while MOVER performed similarly to FGCI when the sample sizes were large.
In addition, they constructed confidence intervals using Bayesian methods with equitailed confidence intervals and the HPD interval and compared them with FGCI; their results show that the Bayesian equitailed confidence interval was appropriate in all cases (Yosboonruang, Niwitpong & Niwitpong, 2019b).
Confidence interval estimation for functions of coefficients of variation is also of interest. For normal distributions, Donner & Zou (2012) presented MOVER for constructing a confidence interval for the difference between two coefficients of variation; their method performed well in terms of both the coverage percentage and the balance between the tail errors. Niwitpong (2015) proposed confidence intervals for the difference between coefficients of variation with bounded parameters, which outperformed other classical intervals in terms of coverage probability and average length.
For skewed distributions, Buntao & Niwitpong (2012) constructed confidence intervals for the difference between coefficients of variation of lognormal and delta-lognormal distributions using the GP method and a closed-form method of variance estimation; for both distributions, the GP method was better than the closed-form method in all cases. Buntao & Niwitpong (2013) produced confidence intervals for the ratio of the coefficients of variation of delta-lognormal distributions based on the GP method and MOVER based on the Wald interval, and suggested that the GP method was the most appropriate. Sangnawakij & Niwitpong (2017b) constructed new confidence intervals for the difference between and the ratio of the coefficients of variation of two gamma distributions with restricted parameters, and found that the expected lengths of the proposed intervals were shorter than those of classical intervals.
Although a number of previous studies have constructed confidence intervals for various parameters of each distribution, only one study has constructed confidence intervals for the difference between the coefficients of variation of two delta-lognormal distributions. Constructing confidence intervals with general methods is quite complex. Furthermore, the performance of these methods has been found to be inconsistent, since their coverage probabilities fall below the target in a few cases. For rainfall data, estimating the difference between two independent coefficients of variation can help to elucidate rainfall variability over time or between areas. Such estimates are useful for rainfall forecasting and for planning for and managing risky situations that can arise from rainfall variation.
In this study, the difference between the coefficients of variation of two delta-lognormal distributions was investigated. In previous studies (Donner & Zou, 2012; Li, Zhou & Tian, 2013; Wu & Hsieh, 2014; Sangnawakij & Niwitpong, 2017a; Thangjai & Niwitpong, 2017; Maneerat, Niwitpong & Niwitpong, 2018; Yosboonruang, Niwitpong & Niwitpong, 2018), confidence intervals for the difference between the coefficients of variation of two delta-lognormal distributions were constructed using three methods (GCI, FGCI, and MOVER). Our preliminary study indicates that these methods performed similarly, although FGCI was the best because it had the shortest expected length. Therefore, we constructed new confidence intervals for the difference between the coefficients of variation of two delta-lognormal distributions using Bayesian methods and a standard bootstrap (SB) method and compared them with FGCI. The details of each method are presented in the next section, followed by the simulation results. Next, the efficacy of the proposed methods is illustrated using rainfall data in an empirical example, followed by a discussion and conclusions.

MATERIALS AND METHODS
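In statistical inference and its applications, data containing non-negative values can be skewed, and many zero observations may occur. Aitchison (1955) introduced the delta-lognormal distribution for data series that contain positive values together with true zeros. For sample $i$ ($i = 1, 2$) of size $n_i = n_{i(0)} + n_{i(1)}$, the $n_{i(1)}$ positive observations follow a lognormal distribution, and the number of true-zero observations, $n_{i(0)}$, follows a binomial distribution with probability of a zero observation $\delta_i$. The distribution function of the delta-lognormal distribution presented by Tian & Wu (2006) is

$$G(x_{ij}; \delta_i, \mu_i, \sigma_i^2) = \begin{cases} \delta_i, & x_{ij} = 0, \\ \delta_i + (1 - \delta_i)\, H(x_{ij}; \mu_i, \sigma_i^2), & x_{ij} > 0, \end{cases} \tag{1}$$

where $H(x_{ij}; \mu_i, \sigma_i^2)$ is the lognormal cumulative distribution function. Assume that $Y_{ij} = \ln(X_{ij})$, $i = 1, 2$, $j = 1, 2, \ldots, n_{i(1)}$, is normally distributed with mean $\mu_i$ and variance $\sigma_i^2$. Thus, the population mean and variance of a delta-lognormal distribution, as presented by Aitchison (1955), are

$$E(X_{ij}) = (1 - \delta_i)\exp\left(\mu_i + \frac{\sigma_i^2}{2}\right), \qquad \operatorname{Var}(X_{ij}) = (1 - \delta_i)\exp(2\mu_i + \sigma_i^2)\left[\exp(\sigma_i^2) + \delta_i - 1\right].$$

Herein, we focus on confidence interval estimation for the difference between the coefficients of variation of two delta-lognormal distributions. The coefficient of variation of a delta-lognormal distribution, $\gamma_i = \sqrt{\operatorname{Var}(X_{ij})}/E(X_{ij})$, can be expressed as

$$\gamma_i = \sqrt{\frac{\exp(\sigma_i^2) + \delta_i - 1}{1 - \delta_i}}. \tag{2}$$

The difference between two independent coefficients of variation is then

$$\gamma = \gamma_1 - \gamma_2 = \sqrt{\frac{\exp(\sigma_1^2) + \delta_1 - 1}{1 - \delta_1}} - \sqrt{\frac{\exp(\sigma_2^2) + \delta_2 - 1}{1 - \delta_2}}. \tag{3}$$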
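As a quick numerical check of the coefficient of variation $\gamma_i = \sqrt{(\exp(\sigma_i^2) + \delta_i - 1)/(1 - \delta_i)}$ and of the difference $\gamma = \gamma_1 - \gamma_2$, the following Python sketch (assuming NumPy; the helper names are hypothetical, not from the paper) evaluates the closed form and compares it with the sample coefficient of variation of simulated delta-lognormal data, in which each observation is zero with probability $\delta_i$ and lognormal otherwise.

```python
import numpy as np


def delta_lognormal_cv(delta, sigma2):
    """Coefficient of variation of a delta-lognormal distribution.

    delta  -- probability of a zero observation (0 <= delta < 1)
    sigma2 -- variance of ln(X) for the positive observations
    mu cancels out of the ratio, so it is not an argument.
    """
    return np.sqrt((np.exp(sigma2) + delta - 1.0) / (1.0 - delta))


def cv_difference(delta1, sigma2_1, delta2, sigma2_2):
    """gamma = gamma_1 - gamma_2 for two independent populations."""
    return delta_lognormal_cv(delta1, sigma2_1) - delta_lognormal_cv(delta2, sigma2_2)


def sample_delta_lognormal(n, delta, mu, sigma2, rng):
    """Draw n values: 0 with probability delta, otherwise exp(N(mu, sigma2))."""
    positive = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
    is_zero = rng.random(n) < delta
    return np.where(is_zero, 0.0, positive)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = sample_delta_lognormal(500_000, delta=0.2, mu=0.0, sigma2=0.5, rng=rng)
    print("closed form :", delta_lognormal_cv(0.2, 0.5))
    print("simulated   :", x.std() / x.mean())
    print("difference  :", cv_difference(0.2, 0.5, 0.5, 1.0))
```

For $\delta_i = 0.2$ and $\sigma_i^2 = 0.5$ the closed form is about 1.03, and the simulated value should agree to roughly two decimal places; this is only a sanity check of the formula, not part of the authors' procedure.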

The fiducial generalized confidence interval
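The basic concept of the fiducial distribution was introduced by Fisher (1930), and statistical inference based on fiducial arguments can be found in several studies (Dawid & Stone, 1982; Aldrich, 2000; Hannig, Iyer & Patterson, 2006; Hannig, 2009; Hannig & Lee, 2009). Li, Zhou & Tian (2013) later proposed the generalized fiducial quantity (GFQ) of a population mean using the concept of fiducial inference and then constructed confidence intervals for the mean of a lognormal distribution with excess zeros. Furthermore, Yosboonruang, Niwitpong & Niwitpong (2019a) recommended FGCI for constructing confidence intervals for the coefficient of variation of a delta-lognormal distribution. Following Hannig (2009) and Li, Zhou & Tian (2013), the GFQs for $\delta_i$ and $\sigma_i^2$ are

$$T_{\delta_i} \sim \frac{1}{2}\,\mathrm{Beta}\left(n_{i(0)},\, n_{i(1)} + 1\right) + \frac{1}{2}\,\mathrm{Beta}\left(n_{i(0)} + 1,\, n_{i(1)}\right)$$

and

$$T_{\sigma_i^2} = \frac{(n_{i(1)} - 1)\,\hat\sigma_i^2}{U_i},$$

respectively, where $U_i \sim \chi^2_{n_{i(1)} - 1}$ and $\hat\sigma_i^2$ is the sample variance of the log-transformed positive observations. Next, the GFQ for $\gamma$ can be defined as

$$T_\gamma = \sqrt{\frac{\exp(T_{\sigma_1^2}) + T_{\delta_1} - 1}{1 - T_{\delta_1}}} - \sqrt{\frac{\exp(T_{\sigma_2^2}) + T_{\delta_2} - 1}{1 - T_{\delta_2}}}.$$

Therefore, the $100(1 - \alpha)\%$ confidence interval for $\gamma$ is

$$CI_{FGCI} = \left[T_\gamma(\alpha/2),\; T_\gamma(1 - \alpha/2)\right],$$

where $T_\gamma(\alpha/2)$ and $T_\gamma(1 - \alpha/2)$ are the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of the distribution of $T_\gamma$, respectively.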
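The FGCI procedure can be sketched in Python (assuming NumPy; all function names are hypothetical). The sketch draws $T_{\delta_i}$ as an equal mixture of Beta$(n_{i(0)}, n_{i(1)} + 1)$ and Beta$(n_{i(0)} + 1, n_{i(1)})$, following Hannig (2009), and $T_{\sigma_i^2} = (n_{i(1)} - 1)\hat\sigma_i^2/U_i$ with $U_i \sim \chi^2_{n_{i(1)}-1}$, plugs both into the coefficient of variation, and takes percentiles of $T_\gamma = T_{\gamma_1} - T_{\gamma_2}$. Replacing a mixture component by its limit when $n_{i(0)} = 0$ is an assumption of this sketch.

```python
import numpy as np


def delta_lognormal_cv(delta, sigma2):
    """Coefficient of variation of a delta-lognormal distribution."""
    return np.sqrt((np.exp(sigma2) + delta - 1.0) / (1.0 - delta))


def gfq_delta(n0, n1, size, rng):
    """Draws of T_delta: equal mixture of Beta(n0, n1 + 1) and Beta(n0 + 1, n1).
    A component with a zero shape parameter is replaced by its limit (0 or 1)."""
    a = rng.beta(n0, n1 + 1, size) if n0 > 0 else np.zeros(size)
    b = rng.beta(n0 + 1, n1, size) if n1 > 0 else np.ones(size)
    return np.where(rng.random(size) < 0.5, a, b)


def gfq_cv(x, size, rng):
    """Draws of the GFQ for gamma_i from one sample x that may contain zeros."""
    x = np.asarray(x, dtype=float)
    logs = np.log(x[x > 0])
    n0, n1 = int(np.sum(x == 0)), logs.size
    if n1 < 2:
        raise ValueError("need at least two positive observations")
    s2 = logs.var(ddof=1)                       # sigma_hat_i^2
    t_delta = gfq_delta(n0, n1, size, rng)
    u = rng.chisquare(n1 - 1, size)             # U_i ~ chi^2 with n_i(1) - 1 df
    t_sigma2 = (n1 - 1) * s2 / u
    return delta_lognormal_cv(t_delta, t_sigma2)


def fgci_cv_difference(x1, x2, alpha=0.05, size=5000, seed=None):
    """Percentile interval [T_gamma(alpha/2), T_gamma(1 - alpha/2)]."""
    rng = np.random.default_rng(seed)
    t_gamma = gfq_cv(x1, size, rng) - gfq_cv(x2, size, rng)
    lower, upper = np.quantile(t_gamma, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

Calling `fgci_cv_difference(x1, x2, seed=0)` on two samples that contain zeros returns the lower and upper limits; the default of 5,000 draws matches the number used in the simulation study.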

Bayesian methods
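A delta-lognormal distribution combines the two distributions mentioned earlier, with unknown parameters $\delta_i$, $\mu_i$, and $\sigma_i^2$, denoted as $\theta_i = (\delta_i, \mu_i, \sigma_i^2)$. For sample $i$, the likelihood function is

$$L(\theta_i \mid x_i) \propto \delta_i^{n_{i(0)}}(1 - \delta_i)^{n_{i(1)}}\left(\sigma_i^2\right)^{-n_{i(1)}/2}\exp\left\{-\frac{1}{2\sigma_i^2}\sum_{j=1}^{n_{i(1)}}\left[\ln(x_{ij}) - \mu_i\right]^2\right\},$$

where the sum runs over the positive observations. Our interest is in the difference between two independent coefficients of variation given in Eq. (3), so the unknown parameters are $\tilde\theta = (\delta_1, \mu_1, \sigma_1^2, \delta_2, \mu_2, \sigma_2^2)$, and because the two samples are independent, the joint likelihood is the product $L(\tilde\theta) = L(\theta_1 \mid x_1)\,L(\theta_2 \mid x_2)$. The Fisher information of $\tilde\theta$ is computed from the second-order derivatives of the log-likelihood function, $I(\tilde\theta) = -E\left[\partial^2 \ln L(\tilde\theta) / \partial\tilde\theta\,\partial\tilde\theta^{\top}\right]$. Treating $n_{i(1)}$ as fixed for the lognormal parameters, the Fisher information matrix for $\tilde\theta$ becomes the diagonal matrix

$$I(\tilde\theta) = \operatorname{diag}\left(\frac{n_1}{\delta_1(1 - \delta_1)},\ \frac{n_{1(1)}}{\sigma_1^2},\ \frac{n_{1(1)}}{2\sigma_1^4},\ \frac{n_2}{\delta_2(1 - \delta_2)},\ \frac{n_{2(1)}}{\sigma_2^2},\ \frac{n_{2(1)}}{2\sigma_2^4}\right).$$

To establish confidence intervals using the Bayesian methods, the left-invariant Jeffreys, Jeffreys' Rule, and uniform priors were used. In this study, we are interested in constructing HPD intervals. The HPD interval is the shortest interval with the required posterior probability; its lower and upper limits have equal posterior density, so the upper and lower tail areas are not necessarily equal (Bolstad & Curran, 2017).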
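A common way to approximate an HPD interval from Monte Carlo draws of a unimodal posterior, such as draws of $\gamma$, is to take the shortest interval that contains the required fraction of the sorted draws. The Python sketch below (assuming NumPy; `hpd_interval` is a hypothetical helper, and the paper does not state which HPD algorithm it used) implements that approximation and contrasts it with the equal-tailed interval on a right-skewed sample.

```python
import numpy as np


def hpd_interval(draws, cred=0.95):
    """Shortest interval covering a fraction `cred` of the draws.

    For a unimodal posterior this approximates the HPD interval: the limits
    have (approximately) equal density and the tails need not be equal.
    """
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    m = int(np.floor(cred * n))          # each candidate spans x[i] .. x[i + m]
    widths = x[m:] - x[: n - m]          # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + m])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = rng.exponential(1.0, 100_000)   # a right-skewed stand-in posterior
    print("HPD         :", hpd_interval(draws))
    print("equal-tailed:", np.quantile(draws, [0.025, 0.975]))
```

For the exponential stand-in, the theoretical 95% HPD interval is $[0, \ln 20] \approx [0, 3.00]$, while the equal-tailed interval is about $[0.025, 3.69]$, so the HPD interval is shorter and its tail areas are unequal.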

The Bayesian method using the left-invariant Jeffreys prior
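Rainfall series that consist of zero and non-zero values follow a combination of two distributions: binomial and lognormal. As mentioned previously, the parameter of interest for the binomial part is $\delta_i$. Taking the square root of its Fisher information, $n_i/[\delta_i(1 - \delta_i)]$, gives the invariant Jeffreys prior

$$p(\delta_i) \propto \delta_i^{-1/2}(1 - \delta_i)^{-1/2},$$

which is Beta(1/2, 1/2). Subsequently, the posterior distribution of $\delta_i$ for the binomial distribution can be expressed as

$$p(\delta_i \mid \text{data}) \propto \delta_i^{n_{i(0)} - 1/2}(1 - \delta_i)^{n_{i(1)} - 1/2},$$

which is Beta$(n_{i(0)} + 1/2, n_{i(1)} + 1/2)$. From the Fisher information matrix of $\tilde\theta$, the left-invariant Jeffreys prior for the parameter of interest from the lognormal part, $\sigma_i^2$, is $p(\sigma_i^2) = 1/\sigma_i^2$ (Rao & D'Cunha, 2016). Supposing that $\delta_i$ and $\sigma_i^2$ are independent, the prior distribution for a delta-lognormal distribution can be written as

$$p(\delta_i, \mu_i, \sigma_i^2) \propto \delta_i^{-1/2}(1 - \delta_i)^{-1/2}\,\frac{1}{\sigma_i^2}.$$

Consequently, the joint posterior density function can be defined as

$$p(\delta_i, \mu_i, \sigma_i^2 \mid \text{data}) \propto \delta_i^{n_{i(0)} - 1/2}(1 - \delta_i)^{n_{i(1)} - 1/2}\left(\sigma_i^2\right)^{-n_{i(1)}/2 - 1}\exp\left\{-\frac{(n_{i(1)} - 1)\hat\sigma_i^2 + n_{i(1)}(\mu_i - \hat\mu_i)^2}{2\sigma_i^2}\right\},$$

where $\hat\mu_i = \sum_{j=1}^{n_{i(1)}}\ln(x_{ij})/n_{i(1)}$ and $\hat\sigma_i^2 = \sum_{j=1}^{n_{i(1)}}[\ln(x_{ij}) - \hat\mu_i]^2/(n_{i(1)} - 1)$. Therefore, the posterior distribution of $\delta_i$ is a beta distribution, $\delta_i \mid \text{data} \sim \mathrm{Beta}(n_{i(0)} + 1/2, n_{i(1)} + 1/2)$. Similarly, after integrating out $\mu_i$, the posterior distribution of $\sigma_i^2$ is an inverse gamma distribution, $\sigma_i^2 \mid \text{data} \sim \text{Inv-Gamma}[(n_{i(1)} - 1)/2, (n_{i(1)} - 1)\hat\sigma_i^2/2]$.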
2. Generate $\delta_i$ from its posterior density, $\delta_i \mid \text{data}$.
3. Generate $\sigma_i^2$ from its posterior density, $\sigma_i^2 \mid \text{data}$.
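The sampling steps in this subsection can be sketched in Python (assuming NumPy; all function names are hypothetical). The sketch draws $\delta_i \sim \mathrm{Beta}(n_{i(0)} + 1/2, n_{i(1)} + 1/2)$ and $\sigma_i^2 \sim \text{Inv-Gamma}[(n_{i(1)} - 1)/2, (n_{i(1)} - 1)\hat\sigma_i^2/2]$ under the left-invariant Jeffreys prior. The remaining steps (evaluating $\gamma_i = \sqrt{(\exp(\sigma_i^2) + \delta_i - 1)/(1 - \delta_i)}$ for each draw, differencing the two groups, and reporting the shortest 95% interval of the draws as the HPD interval) are assumptions of this sketch rather than the authors' exact algorithm.

```python
import numpy as np


def posterior_cv_draws(x, size, rng):
    """Posterior draws of gamma_i under the left-invariant Jeffreys prior."""
    x = np.asarray(x, dtype=float)
    logs = np.log(x[x > 0])
    n0, n1 = int(np.sum(x == 0)), logs.size
    if n1 < 2:
        raise ValueError("need at least two positive observations")
    s2 = logs.var(ddof=1)                                  # sigma_hat_i^2
    delta = rng.beta(n0 + 0.5, n1 + 0.5, size)             # delta_i | data
    shape, scale = (n1 - 1) / 2.0, (n1 - 1) * s2 / 2.0
    # If G ~ Gamma(shape, 1), then scale / G ~ Inv-Gamma(shape, scale).
    sigma2 = scale / rng.gamma(shape, 1.0, size)
    return np.sqrt((np.exp(sigma2) + delta - 1.0) / (1.0 - delta))


def hpd_interval(draws, cred=0.95):
    """Shortest interval containing a fraction `cred` of the sorted draws."""
    x = np.sort(draws)
    m = int(np.floor(cred * x.size))
    i = int(np.argmin(x[m:] - x[: x.size - m]))
    return float(x[i]), float(x[i + m])


def bayes_hpd_cv_difference(x1, x2, cred=0.95, size=5000, seed=None):
    """HPD interval for gamma = gamma_1 - gamma_2 (mu_i is not needed)."""
    rng = np.random.default_rng(seed)
    gamma = posterior_cv_draws(x1, size, rng) - posterior_cv_draws(x2, size, rng)
    return hpd_interval(gamma, cred)
```

Because $\gamma_i$ does not depend on $\mu_i$, only the posteriors of $\delta_i$ and $\sigma_i^2$ need to be sampled.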

The standard bootstrap method
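Bootstrapping is a resampling method that draws samples with replacement from the observed data (Efron, 1979). Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{in_i})$ be the observed sample from population $i$, and let $x_i^* = (x_{i1}^*, x_{i2}^*, \ldots, x_{in_i}^*)$ be a bootstrap sample drawn from it with replacement. Since $\hat\delta_i$ and $\hat\sigma_i^2$ are independent unbiased estimators of $\delta_i$ and $\sigma_i^2$, respectively, the bootstrap estimators of $\delta_i$ and $\sigma_i^2$ are $\hat\delta_i^*$ and $\hat\sigma_i^{2*}$, respectively, computed from $x_i^*$. By resampling $K$ bootstrap samples, let $\hat\gamma_k^* = \hat\gamma_{1,k}^* - \hat\gamma_{2,k}^*$, $k = 1, 2, \ldots, K$, be the $k$th bootstrap estimator of $\gamma$. Subsequently, the $100(1 - \alpha)\%$ confidence interval for $\gamma$ using SB is

$$CI_{SB} = \left[\hat\gamma - z_{1-\alpha/2}\,S^*_{\hat\gamma^*},\; \hat\gamma + z_{1-\alpha/2}\,S^*_{\hat\gamma^*}\right],$$

where $z_{1-\alpha/2}$ is the $100(1 - \alpha/2)$-th percentile of the standard normal distribution and $S^*_{\hat\gamma^*}$ is the standard error of $\hat\gamma^*$.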
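A minimal Python sketch of the SB interval (assuming NumPy and the standard library's `statistics.NormalDist`; the function names are hypothetical). It resamples each group with replacement, recomputes $\hat\gamma_{1,k}^* - \hat\gamma_{2,k}^*$ from $\hat\delta_i^* = n_{i(0)}^*/n_i$ and $\hat\sigma_i^{2*}$, and uses the standard deviation of the $K$ bootstrap differences as $S^*_{\hat\gamma^*}$. Centring the interval at the original-sample estimate $\hat\gamma$, i.e. $\hat\gamma \pm z_{1-\alpha/2} S^*_{\hat\gamma^*}$, is an assumption of this sketch.

```python
from statistics import NormalDist

import numpy as np


def cv_estimate(x):
    """Plug-in gamma_hat_i from delta_hat_i = n_i(0) / n_i and sigma_hat_i^2."""
    delta_hat = np.mean(x == 0)
    s2 = np.log(x[x > 0]).var(ddof=1)
    return np.sqrt((np.exp(s2) + delta_hat - 1.0) / (1.0 - delta_hat))


def sb_interval(x1, x2, alpha=0.05, K=3000, seed=None):
    """Standard bootstrap interval gamma_hat -/+ z_{1-alpha/2} * S*."""
    rng = np.random.default_rng(seed)
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    gamma_hat = cv_estimate(x1) - cv_estimate(x2)
    boot = np.empty(K)
    for k in range(K):
        # Resampling can, rarely, leave fewer than two positive values in a
        # small sample; this sketch does not guard against that case.
        b1 = rng.choice(x1, size=x1.size, replace=True)
        b2 = rng.choice(x2, size=x2.size, replace=True)
        boot[k] = cv_estimate(b1) - cv_estimate(b2)
    se = boot.std(ddof=1)                         # S* of the bootstrap estimates
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return float(gamma_hat - z * se), float(gamma_hat + z * se)
```

The default `K=3000` matches the number of bootstrap samples used in the simulation study.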

The Monte Carlo simulation study
Coverage probabilities and expected lengths were used to compare the performance of the proposed confidence intervals via Monte Carlo simulation at a nominal confidence level of 0.95. A method was considered the best when its coverage probability was at least the nominal confidence level and its expected length was the shortest. A total of 15,000 replications for each parameter combination were used for all of the methods. Within each replication, 5,000 draws were used for the FGCI and Bayesian methods, and 3,000 bootstrap samples were used for the bootstrap method. The parameter settings were $n_1, n_2 = 25, 50, 100$; $\mu_1 = \mu_2 = 0$; $\delta_1, \delta_2 = 0.2, 0.5, 0.8$; and $\sigma_1^2, \sigma_2^2 = 0.5, 1.0, 2.0$. As in the studies by Fletcher (2008) and Wu & Hsieh (2014), the combinations of $n_1, n_2 = 25$; $\delta_1, \delta_2 = 0.2$; and $\sigma_1^2, \sigma_2^2 = 0.5, 1.0, 2.0$ were not considered because the expected number of non-zero values was less than 10.
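The simulation design can be sketched for one method and one parameter combination. The Python code below (assuming NumPy; all names are hypothetical) estimates the coverage probability and expected length of FGCI by repeatedly generating two delta-lognormal samples with $\mu_1 = \mu_2 = 0$, building the interval, and checking whether it contains the true $\gamma = \gamma_1 - \gamma_2$. The FGCI uses the GFQs $T_{\delta_i}$ (an equal mixture of Beta$(n_{i(0)}, n_{i(1)} + 1)$ and Beta$(n_{i(0)} + 1, n_{i(1)})$) and $T_{\sigma_i^2} = (n_{i(1)} - 1)\hat\sigma_i^2/U_i$. The replication counts are deliberately much smaller than the 15,000 replications and 5,000 draws of the study so that the sketch runs quickly; the other methods would be plugged in the same way.

```python
import numpy as np


def cv(delta, sigma2):
    """Coefficient of variation of a delta-lognormal distribution."""
    return np.sqrt((np.exp(sigma2) + delta - 1.0) / (1.0 - delta))


def simulate(n, delta, sigma2, rng, mu=0.0):
    """One delta-lognormal sample: 0 w.p. delta, else exp(N(mu, sigma2))."""
    positive = rng.lognormal(mu, np.sqrt(sigma2), n)
    return np.where(rng.random(n) < delta, 0.0, positive)


def fgci(x1, x2, alpha, size, rng):
    """FGCI for gamma_1 - gamma_2, used here as the method under study.
    Assumes each sample has at least two positive values."""
    draws = []
    for x in (x1, x2):
        logs = np.log(x[x > 0])
        n0, n1 = int(np.sum(x == 0)), logs.size
        a = rng.beta(n0, n1 + 1, size) if n0 > 0 else np.zeros(size)
        b = rng.beta(n0 + 1, n1, size)
        t_delta = np.where(rng.random(size) < 0.5, a, b)
        t_sigma2 = (n1 - 1) * logs.var(ddof=1) / rng.chisquare(n1 - 1, size)
        draws.append(cv(t_delta, t_sigma2))
    return np.quantile(draws[0] - draws[1], [alpha / 2, 1 - alpha / 2])


def coverage_and_length(n1, n2, d1, d2, s1, s2, reps=1000, size=2000,
                        alpha=0.05, seed=0):
    """Monte Carlo coverage probability and expected length."""
    rng = np.random.default_rng(seed)
    true_gamma = cv(d1, s1) - cv(d2, s2)
    hits, total_length = 0, 0.0
    for _ in range(reps):
        x1 = simulate(n1, d1, s1, rng)
        x2 = simulate(n2, d2, s2, rng)
        lower, upper = fgci(x1, x2, alpha, size, rng)
        hits += int(lower <= true_gamma <= upper)
        total_length += upper - lower
    return hits / reps, total_length / reps


if __name__ == "__main__":
    cp, el = coverage_and_length(50, 50, 0.2, 0.5, 0.5, 1.0, reps=500)
    print(f"coverage = {cp:.3f}, expected length = {el:.3f}")
```

Looping this function over the grid of sample sizes, $\delta_i$, and $\sigma_i^2$ values, and over each interval method, would reproduce the structure (not the exact numbers) of the comparison described above.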
The methods for constructing confidence intervals for the difference between the independent coefficients of variation of two delta-lognormal distributions were evaluated. The results in Table 1 and Figs. 1–3 show that FGCI was stable, with coverage probabilities close to the target in almost all cases. For the Bayesian HPD intervals based on the left-invariant Jeffreys prior ($B_{linvj}$), the Jeffreys' Rule prior ($B_{jrule}$), and the uniform prior ($B_{uni}$), the coverage probabilities were close to or greater than the target in all cases. In addition, the coverage probabilities of SB were greater than the target when the variances were equal to 1.0 and 2.0. In terms of expected length, $B_{jrule}$ mostly had shorter expected lengths than the other methods, except in a few cases with large sample sizes in both groups ($n_1, n_2 = 50, 100$) and variances equal to 0.5 and 1.0, for which FGCI had the shortest expected lengths. Moreover, SB had the shortest expected lengths in the following cases: $n_1:n_2 = 25:25$, 50:50, and 100:100 with $\sigma_1^2:\sigma_2^2 = 1.0:1.0$ or 2.0:2.0; $n_1:n_2 = 25:50$ and 50:100 with $\sigma_1^2:\sigma_2^2 = 2.0:2.0$; and $n_1:n_2 = 25:100$ with $\sigma_1^2:\sigma_2^2 = 1.0:2.0$.

The empirical study
Rainfall datasets from Thailand were chosen because they usually contain zero values, while the non-zero values normally follow a lognormal distribution. For rainfall data, Ananthakrishnan & Soman (1989) used the normalized rainfall curve (NRC) to describe the relationship between the accumulated percentage of the rain amount and the number of rain days in a rainfall series. Their results indicate that the coefficient of variation of a rainfall dataset can be used to uniquely determine the NRC. Moreover, Shimizu (1993) introduced a probability model combining bivariate and lognormal distributions to represent rainfall data. We used monthly rainfall data from Jana and Ranod, Songkhla, Thailand, from 2008 to 2017 to illustrate confidence intervals for the difference between the coefficients of variation of two areas. Songkhla is located on the east coast of southern Thailand and is rather rainy due to the influence of the southwest monsoon from the Indian Ocean and the northeast monsoon from the Gulf of Thailand. Rainfall in this area is heavy from May to December and decreases from January to April (the datasets were collected by the Southern Meteorological Center (East Coast)). These datasets include both positive and true-zero observations. The positive values for each area are skewed, as shown in Fig. 4, so their distributions were compared using the Akaike information criterion (AIC). The AIC values for the normal, Cauchy, lognormal, exponential, and gamma distributions in Jana were 1421.5050, 1355.5600, 1279.9710, 1281.8810, and … To confirm the AIC results, the normal probability plots of the log-transformed monthly rainfall data from both areas in Figs. 5 and 6 indicate that the positive values of both rainfall series follow lognormal distributions. Moreover, the numbers of true-zero values from Jana and Ranod follow binomial distributions.
Therefore, the monthly rainfall series from Jana and Ranod follow delta-lognormal distributions. The summary statistics for Jana are $n_1 = 120$, $\hat\delta_1 = 0.8917$, $\hat\mu_1 = 4.2556$, $\hat\sigma_1^2 = 1.7953$, and $\hat\gamma_1 = 0.3149$, and for Ranod are $n_2 = 120$, $\hat\delta_2 = 0.7417$, $\hat\mu_2 = 4.0846$, $\hat\sigma_2^2 = 2.4928$, and $\hat\gamma_2 = 0.3865$. The difference between $\hat\gamma_1$ and $\hat\gamma_2$ is $\hat\gamma = -0.0716$. The 95% confidence intervals for FGCI and SB are (−4.0492, −0.0558), … Fig. 7. The Bayesian method using the Jeffreys' Rule prior outperformed the others in terms of coverage probability and interval length. These results are therefore in accordance with those of the simulation studies when the variance is large. Furthermore, the results for the Bayesian method using the Jeffreys' Rule prior indicate that there is no difference in rainfall dispersion between the two areas.
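The AIC comparison can be sketched for the candidate families whose maximum likelihood estimates have closed forms. The Python sketch below (assuming NumPy; `aic_closed_form` is a hypothetical helper) computes $\mathrm{AIC} = 2k - 2\ln L$ for normal, lognormal, and exponential fits to the positive values only. The Cauchy and gamma fits used in the paper require numerical maximization and are omitted, and the sketch does not reproduce the reported AIC values.

```python
import numpy as np


def aic_closed_form(x):
    """AIC = 2k - 2 ln L at the closed-form MLEs for strictly positive data."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("use the positive (non-zero) observations only")
    n = x.size
    # Normal(mu, sigma^2): MLE variance divides by n, so the quadratic term is n/2.
    ll_normal = -0.5 * n * np.log(2 * np.pi * x.var()) - 0.5 * n
    # Lognormal: normal fit to ln(x) plus the Jacobian term -sum(ln x).
    y = np.log(x)
    ll_lognormal = -y.sum() - 0.5 * n * np.log(2 * np.pi * y.var()) - 0.5 * n
    # Exponential(rate): MLE rate = 1 / mean, so ln L = n ln(rate) - n.
    rate = 1.0 / x.mean()
    ll_expon = n * np.log(rate) - n
    return {
        "normal": 2 * 2 - 2 * ll_normal,          # k = 2
        "lognormal": 2 * 2 - 2 * ll_lognormal,    # k = 2
        "exponential": 2 * 1 - 2 * ll_expon,      # k = 1
    }
```

The family with the smallest AIC is preferred; zeros must be removed before calling the function because they belong to the binomial part of the delta-lognormal model.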

DISCUSSION
Herein, Bayesian and SB methods were proposed to construct confidence intervals for the difference between delta-lognormal coefficients of variation and were compared with FGCI, which was recommended by Yosboonruang, Niwitpong & Niwitpong (2019a). The coverage probabilities of FGCI were more consistent with the target than those of the Bayesian and SB methods. The coverage probabilities of the Bayesian methods were greater than the nominal confidence level and mostly close to 1.00, which indicates that these intervals are conservative. Nevertheless, the expected lengths of the Bayesian method using the Jeffreys' Rule prior were shorter than those of FGCI in almost every case. This is due to the HPD criterion that the posterior density values at the lower and upper limits are equal, which was applied to construct the Bayesian intervals. Moreover, for small variances, the expected lengths of the confidence intervals were notably narrow, which indicates that FGCI and the Bayesian methods can be used efficiently to construct the confidence intervals. Furthermore, the coverage probabilities of the SB method were greater than the nominal confidence level only for the large-variance cases, although, remarkably, it provided the shortest expected lengths. However, all three methods require substantial computation to obtain the interval estimates: FGCI requires computing the GFQs for the parameters of interest ($\delta_i$ and $\sigma_i^2$), and the Bayesian methods require sampling from the posterior densities of $\delta_i$ and $\sigma_i^2$. In addition, the SB method has to draw bootstrap samples to compute the estimators of $\delta_i$ and $\sigma_i^2$, which takes more time than FGCI and the Bayesian methods. The results for the two rainfall series agreed with the simulation results, with the Bayesian method using the Jeffreys' Rule prior describing the difference between their coefficients of variation much better than the other methods.

CONCLUSIONS
In this study, three concepts (FGCI, Bayesian, and SB methods) were used to construct five confidence intervals for the difference between the coefficients of variation of two independent delta-lognormal distributions. The Bayesian approach was used to construct three HPD intervals based on the left-invariant Jeffreys, Jeffreys' Rule, and uniform priors, and the remaining two intervals were based on the SB method and FGCI.
The simulation results indicate that the Bayesian HPD interval based on the Jeffreys' Rule prior performed the best in almost all cases. Although the coverage probabilities were close to 1.00 for all of the priors, the Jeffreys' Rule prior yielded shorter expected lengths for the confidence intervals of the difference between the coefficients of variation of two delta-lognormal distributions in almost all cases. Moreover, FGCI is appropriate for large sample sizes together with small variances, while the SB method is suggested for large variances. Furthermore, a comparison of the simulation