Measuring the dispersion of rainfall using Bayesian confidence intervals for coefficient of variation of delta-lognormal distribution: a study from Thailand

Since rainfall data series often contain zero values and thus follow a delta-lognormal distribution, the coefficient of variation is often used to illustrate the dispersion of rainfall in a number of areas and so is an important tool in statistical inference for a rainfall data series. Therefore, the aim in this paper is to establish new confidence intervals for a single coefficient of variation for delta-lognormal distributions using Bayesian methods based on the independent Jeffreys’, the Jeffreys’ Rule, and the uniform priors compared with the fiducial generalized confidence interval. The Bayesian methods are constructed with either equitailed confidence intervals or the highest posterior density interval. The performance of the proposed confidence intervals was evaluated using coverage probabilities and expected lengths via Monte Carlo simulations. The results indicate that the Bayesian equitailed confidence interval based on the independent Jeffreys’ prior outperformed the other methods. Rainfall data recorded in national parks in July 2015 and in precipitation stations in August 2018 in Nan province, Thailand are used to illustrate the efficacy of the proposed methods using a real-life dataset.


INTRODUCTION
The effects of global climate change, which is driven by many natural and man-made factors (such as fuel burning, forest burning, deforestation, and oil drilling), are ongoing. Such factors directly enhance natural changes such as the greenhouse effect and cause changes in precipitation, sea level, and the polar vortex. Thailand has been affected, as has been seen in the past few years. In the north of Thailand especially, extensive deforestation has led to flooding because there are too few trees to absorb the water from heavy rain. Consequently, many governmental and private-sector organizations are interested in finding ways to mitigate the damage from such events, and a study measuring the dispersion of rainfall in areas at risk of flooding has become necessary. In statistics, the coefficient of variation, defined as the ratio of the standard deviation to the mean, is a useful parameter for this purpose. It describes the dispersion of data and can be used to compare the degree of variation between two or more datasets with different measurement units. The coefficient of variation is used in several fields, such as medical science, meteorology, agriculture, and economics (Kim, Lee & Choi, 2005; Gulhar et al., 2012; Tian, 2005). Recently, several researchers have considered various approaches to construct confidence intervals for coefficients of variation. For example, Wong & Wu (2002), Tian (2005), Mahmoudvand & Hassani (2009), Donner & Zou (2012), and Wongkhao, Niwitpong & Niwitpong (2015) established confidence intervals for the coefficient of variation of a normal distribution. Subsequently, Van Zyl & Van Der Merwe (2017) proposed a Bayesian control chart for the common coefficient of variation of a normal distribution.
In studies on two-parameter exponential distributions, Sangnawakij & Niwitpong (2017) used three methods, namely the method of variance estimates recovery (MOVER), the generalized confidence interval (GCI), and the asymptotic confidence interval, to establish confidence intervals for a single coefficient of variation and for the difference between coefficients of variation, and Thangjai & Niwitpong (2017) presented confidence intervals based on an adjusted MOVER, GCI, and a large sample method for weighted coefficients of variation.
There have also been studies on other skewed distributions. Fletcher (2008) presented three methods for constructing confidence intervals for the mean of a delta-lognormal distribution: Aitchison's estimator (the classical method), a modification of Cox's method for the lognormal, and a profile-likelihood interval. Fletcher found that the modified Cox's method and the profile-likelihood interval perform well, whereas the interval based on Aitchison's estimator tends to have too low an upper limit, and therefore did not recommend the latter. Buntao & Niwitpong (2012) presented the GPA and a closed-form method of variance estimation for coefficients of variation for both lognormal and delta-lognormal distributions. Harvey & Van Der Merwe (2012) constructed confidence intervals for means and variances of lognormal and bivariate lognormal distributions using a Bayesian method. Niwitpong (2013) presented a new confidence interval for the coefficient of variation of a lognormal distribution with restricted parameters. D'Cunha & Rao (2014) offered a Bayes confidence interval for the mean of a lognormal distribution and compared it with the maximum likelihood estimator method. Sangnawakij, Niwitpong & Niwitpong (2015) proposed MOVER with Score and Wald interval methods to construct confidence intervals for the ratio of coefficients of variation of gamma distributions. Rao & D'Cunha (2016) presented Bayesian confidence intervals for the median of a lognormal distribution and compared them with the confidence interval obtained from a Monte Carlo simulation. Recently, Yosboonruang, Niwitpong & Niwitpong (2018) constructed confidence intervals for the coefficient of variation of a delta-lognormal distribution based on the GCI and on a modified Fletcher method that uses the concept of Fletcher (2008).
The modified Fletcher method, which is based on its variance, is the basic method to construct the confidence interval. Although this method performed poorly in terms of coverage probability and expected length, it was used for comparison with the GCI. Moreover, they proposed methods including the fiducial generalized confidence interval (FGCI) and MOVER based on the VST, the Wilson score, and Jeffreys' method to establish confidence intervals for the coefficient of variation of three parameters of a delta-lognormal distribution, of which FGCI is recommended for constructing confidence intervals (Yosboonruang, Niwitpong & Niwitpong, 2019). In addition, they extended this study to construct confidence intervals for the coefficient of variation.
The goal of this study is to propose new confidence intervals for a single coefficient of variation of a delta-lognormal distribution using Bayesian methods and to compare them with the FGCI proposed by Yosboonruang, Niwitpong & Niwitpong (2019). The methods and theories used to establish the confidence intervals are described in the section "Methods". Next, a simulation study and its results are presented in the section "Results", and the proposed methods are then applied to real-world datasets, as detailed in "An empirical study". The last two sections contain the discussion and conclusions of the study.

METHODS
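Let $V = (V_1, V_2, \ldots, V_n)$ be a positive random variable from a lognormal distribution with parameters $\mu$ and $\sigma^2$, denoted as $LN(\mu, \sigma^2)$. The probability density function of $V_i$ is given by

$$f(v_i; \mu, \sigma^2) = \frac{1}{v_i \sigma \sqrt{2\pi}} \exp\left\{-\frac{[\ln(v_i) - \mu]^2}{2\sigma^2}\right\}, \quad v_i > 0.$$

Suppose that the population of interest contains both zero and non-zero observed values, whose numbers are denoted by $n_{(0)}$ and $n_{(1)}$, respectively, where $n = n_{(0)} + n_{(1)}$. The number of zero observations follows a binomial distribution, $n_{(0)} \sim Bin(n, \delta')$, where $\delta' = 1 - \delta$ is the probability of a zero observation, and the non-zero observations follow a lognormal distribution, thus resulting in a delta-lognormal distribution. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a delta-lognormal distribution, denoted by $\Delta(\delta', \mu, \sigma^2)$. The distribution function of a delta-lognormal population presented by Tian & Wu (2006) is

$$G(x_i; \delta', \mu, \sigma^2) = \begin{cases} \delta', & x_i = 0 \\ \delta' + (1 - \delta') F(x_i; \mu, \sigma^2), & x_i > 0 \end{cases}$$

where $F(x_i; \mu, \sigma^2)$ is the lognormal cumulative distribution function. Aitchison (1955) described the respective population mean and variance of $X$ as

$$\mu_X = \delta \exp\left(\mu + \frac{\sigma^2}{2}\right)$$

and

$$\sigma_X^2 = \delta \exp(2\mu + \sigma^2)\left[\exp(\sigma^2) - \delta\right].$$

The minimum variance unbiased estimator of $\mu_X$ was also given by Aitchison (1955). Let $\hat{\delta} = n_{(1)}/n$, $\hat{\mu} = \frac{1}{n_{(1)}} \sum_{i: x_i > 0} \ln(x_i)$, and $\hat{\sigma}^2 = \frac{1}{n_{(1)} - 1} \sum_{i: x_i > 0} [\ln(x_i) - \hat{\mu}]^2$; then the coefficient of variation of $X$ can be expressed as

$$\eta = \frac{\sigma_X}{\mu_X} = \sqrt{\frac{\exp(\sigma^2) - \delta}{\delta}}. \tag{6}$$

The methods to construct the confidence intervals for $\eta$ are proposed in the following section.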
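The point estimates and the coefficient of variation defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code (the paper's computations were done in R). The function names are hypothetical, and the sample variance of the log-transformed positive values is assumed to use the divisor n(1) - 1.

```python
import math

def delta_lognormal_estimates(x):
    """Return (delta_hat, mu_hat, sigma2_hat) for a sample that may contain zeros.

    delta_hat is the proportion of non-zero observations; mu_hat and sigma2_hat
    are the mean and variance of the log-transformed positive observations
    (assumption: divisor n1 - 1 for the variance).
    """
    logs = [math.log(v) for v in x if v > 0]
    n, n1 = len(x), len(logs)
    if n1 < 2:
        raise ValueError("need at least two positive observations")
    mu_hat = sum(logs) / n1
    sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / (n1 - 1)
    return n1 / n, mu_hat, sigma2_hat

def coefficient_of_variation(delta, sigma2):
    """Coefficient of variation of a delta-lognormal distribution (Eq. 6).

    delta is the probability of a non-zero observation.
    """
    return math.sqrt((math.exp(sigma2) - delta) / delta)

# Summary statistics reported for Example 1 of the empirical study.
cv_example1 = coefficient_of_variation(0.5392, 0.9381)
print(round(cv_example1, 3))
```

Plugging in the rounded summary statistics of Example 1 gives a value that agrees with the reported CV of 1.9337 to about three decimal places; the small difference comes from rounding of the inputs.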
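
The Bayesian confidence interval for a single coefficient of variation
If a delta-lognormal distribution has three unknown parameters $(\delta', \mu, \sigma^2)$, then the joint likelihood function is given by

$$L(\delta', \mu, \sigma^2 \mid x) \propto (\delta')^{n_{(0)}} (1 - \delta')^{n_{(1)}} (\sigma^2)^{-n_{(1)}/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i: x_i > 0} [\ln(x_i) - \mu]^2\right\}.$$

Therefore, the Fisher information matrix of the unknown parameters $(\delta', \mu, \sigma^2)$ per unit observation is written as

$$I(\delta', \mu, \sigma^2) = \mathrm{diag}\left[\frac{1}{\delta'(1 - \delta')}, \; \frac{1 - \delta'}{\sigma^2}, \; \frac{1 - \delta'}{2\sigma^4}\right].$$

In the following sections, the Bayesian confidence interval is constructed using three priors: the independent Jeffreys' prior, the Jeffreys' Rule prior, and the uniform prior.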

The Bayesian confidence interval using the independent Jeffreys' prior
The Jeffreys' prior is defined as $p(\eta) \propto \sqrt{|I(\eta)|}$, where $I(\eta)$ is the Fisher information matrix. It is a non-informative prior distribution used in Bayesian parameter estimation and is very useful because it has the notable property of invariance under reparameterization of $\eta$ (Jeffreys, 1946).
The independent Jeffreys' prior is a non-informative prior formed as the product of the Jeffreys' priors for each parameter, each derived while holding the other parameters fixed (Rubio & Liseo, 2014).
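For a binomial distribution, the parameter of interest is the probability $\delta'$, and the Jeffreys' invariant prior for a binomial parameter is given by

$$p(\delta') \propto (\delta')^{-1/2} (1 - \delta')^{-1/2},$$

which is $Beta(1/2, 1/2)$ (Bolstad & Curran, 2017). Subsequently, the posterior distribution of $\delta'$ is of the form

$$p(\delta' \mid x) \propto (\delta')^{n_{(0)} - 1/2} (1 - \delta')^{n_{(1)} - 1/2},$$

which is a beta distribution $Beta(n_{(0)} + 1/2, n_{(1)} + 1/2)$. Similarly, the independent Jeffreys' prior for a lognormal distribution is $p(\sigma^2) \propto \sigma^{-2}$. Therefore, the prior distribution for a delta-lognormal distribution can be expressed as

$$p(\delta', \sigma^2) \propto \sigma^{-2} (\delta')^{-1/2} (1 - \delta')^{-1/2}.$$

The joint posterior density function is

$$p(\delta', \mu, \sigma^2 \mid x) \propto (\delta')^{n_{(0)} - 1/2} (1 - \delta')^{n_{(1)} - 1/2} (\sigma^2)^{-(n_{(1)}/2 + 1)} \exp\left\{-\frac{1}{2\sigma^2}\left[(n_{(1)} - 1)\hat{\sigma}^2 + n_{(1)}(\hat{\mu} - \mu)^2\right]\right\},$$

where $\hat{\mu} = \frac{1}{n_{(1)}} \sum_{i: x_i > 0} \ln(x_i)$ and $\hat{\sigma}^2 = \frac{1}{n_{(1)} - 1} \sum_{i: x_i > 0} [\ln(x_i) - \hat{\mu}]^2$. Since $\delta'$ and $\sigma^2$ are independent, the posterior distributions of $\delta'$ and $\sigma^2$ are a beta and an inverse gamma distribution, respectively, as follows:

$$\delta' \mid x \sim Beta\left(n_{(0)} + \tfrac{1}{2}, \; n_{(1)} + \tfrac{1}{2}\right) \tag{13}$$

and

$$\sigma^2 \mid x \sim \text{Inv-Gamma}\left(\frac{n_{(1)} - 1}{2}, \; \frac{(n_{(1)} - 1)\hat{\sigma}^2}{2}\right). \tag{14}$$

To construct the Bayesian confidence interval, $\delta = 1 - \delta'$ and $\sigma^2$ in Eq. (6) are replaced by the corresponding posterior quantities $\delta' \mid x$ and $\sigma^2 \mid x$ defined in Eqs. (13) and (14), respectively. Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for the coefficient of variation based on the independent Jeffreys' prior is obtained by

$$CI_{\eta}^{B.indj} = \left[L_{\eta}^{B.indj}, \; U_{\eta}^{B.indj}\right],$$

where $L_{\eta}^{B.indj}$ and $U_{\eta}^{B.indj}$ are the lower and upper bounds of either the $100(1 - \alpha)\%$ equitailed confidence interval or the highest posterior density (HPD) interval of $\eta$.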
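The construction above can be sketched as a small simulation: draw the probability of a zero from its beta posterior, draw the variance from an inverse gamma posterior, plug both into the formula for the coefficient of variation, and take percentiles of the resulting draws. This is a hedged illustration, not the authors' implementation. The function names are hypothetical, and the inverse gamma posterior is assumed to have shape (n1 - 1)/2 and scale (n1 - 1) * sigma2_hat / 2, which is what the prior proportional to sigma^-2 yields when combined with a flat prior on mu.

```python
import math
import random

def posterior_cv_draws(n0, n1, sigma2_hat, draws=5000, seed=1):
    """Posterior draws of the CV under the independent Jeffreys' prior.

    n0, n1     : numbers of zero and non-zero observations
    sigma2_hat : sample variance of the log-transformed positive values
    """
    rng = random.Random(seed)
    shape = (n1 - 1) / 2.0                  # assumed inverse gamma shape
    scale = (n1 - 1) * sigma2_hat / 2.0     # assumed inverse gamma scale
    out = []
    for _ in range(draws):
        p_zero = rng.betavariate(n0 + 0.5, n1 + 0.5)   # posterior of P(X = 0)
        delta = 1.0 - p_zero                            # P(X > 0)
        # If G ~ Gamma(shape, 1), then scale / G ~ Inv-Gamma(shape, scale).
        sigma2 = scale / rng.gammavariate(shape, 1.0)
        out.append(math.sqrt((math.exp(sigma2) - delta) / delta))
    return out

def equitailed_interval(draws, alpha=0.05):
    """Equitailed interval from the alpha/2 and 1 - alpha/2 sample percentiles."""
    s = sorted(draws)
    lo = s[int(math.floor(alpha / 2 * (len(s) - 1)))]
    hi = s[int(math.ceil((1 - alpha / 2) * (len(s) - 1)))]
    return lo, hi

# Counts from Example 1: 217 measurements, 117 of them positive.
lower, upper = equitailed_interval(posterior_cv_draws(100, 117, 0.9381))
print(lower, upper)
```

The interval depends on the random seed and the number of draws, so it will not reproduce the published table exactly.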
The HPD interval is the narrowest interval in the domain of a posterior probability distribution that contains the required posterior probability (Hyndman, 1995; Yau & Campbell, 2019). It contains the most credible values of the parameter, and every point inside the interval has a higher probability density than any point outside it.
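For a unimodal posterior represented by simulated draws, a common way to approximate the HPD interval is to take the shortest window of sorted draws that contains the required share of them. The sketch below shows that approach; the function name is hypothetical, and the paper does not state which HPD algorithm was used.

```python
import math

def hpd_interval(draws, alpha=0.05):
    """Shortest interval containing at least (1 - alpha) of the draws.

    Approximates the HPD interval when the posterior is unimodal.
    """
    s = sorted(draws)
    n = len(s)
    k = int(math.ceil((1 - alpha) * n))   # number of draws the interval must cover
    if k < 2 or k > n:
        raise ValueError("not enough draws for the requested level")
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

# Toy example: half of the eight values sit in the tight cluster 4.0 to 4.3.
print(hpd_interval([0, 4, 4.1, 4.2, 4.3, 9, 10, 20], alpha=0.5))
```

For a right-skewed posterior, such as that of a coefficient of variation, this window typically sits to the left of the equitailed interval and is no longer than it.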
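
The Bayesian confidence interval using the Jeffreys' Rule prior
As mentioned previously, the Jeffreys' Rule prior is obtained from the square root of the determinant of the Fisher information matrix. This prior is appropriate for a single parameter. The Jeffreys' Rule prior has the valuable property of invariance (Lee, 2012) and is given by $p(\sigma^2) \propto \sigma^{-3}$. From Harvey & Van Der Merwe (2012), the Jeffreys' Rule prior for $\delta'$ in a binomial distribution is

$$p(\delta') \propto (\delta')^{-1/2} (1 - \delta')^{1/2}.$$

The Jeffreys' Rule prior for the delta-lognormal distribution is then

$$p(\delta', \sigma^2) \propto \sigma^{-3} (\delta')^{-1/2} (1 - \delta')^{1/2}.$$

Subsequently, the joint posterior density is given by

$$p(\delta', \mu, \sigma^2 \mid x) \propto (\delta')^{n_{(0)} - 1/2} (1 - \delta')^{n_{(1)} + 1/2} (\sigma^2)^{-(n_{(1)} + 3)/2} \exp\left\{-\frac{1}{2\sigma^2}\left[(n_{(1)} - 1)\hat{\sigma}^2 + n_{(1)}(\hat{\mu} - \mu)^2\right]\right\},$$

where $\hat{\mu} = \frac{1}{n_{(1)}} \sum_{i: x_i > 0} \ln(x_i)$ and $\hat{\sigma}^2 = \frac{1}{n_{(1)} - 1} \sum_{i: x_i > 0} [\ln(x_i) - \hat{\mu}]^2$. In addition, the posterior density of $\delta'$ becomes

$$\delta' \mid x \sim Beta\left(n_{(0)} + \tfrac{1}{2}, \; n_{(1)} + \tfrac{3}{2}\right) \tag{18}$$

and the posterior distribution of $\sigma^2$ can be expressed as

$$\sigma^2 \mid x \sim \text{Inv-Gamma}\left(\frac{n_{(1)}}{2}, \; \frac{(n_{(1)} - 1)\hat{\sigma}^2}{2}\right). \tag{19}$$

Next, the confidence limit of $\eta$ is constructed using $\delta' \mid x$ and $\sigma^2 \mid x$ given by Eqs. (18) and (19), respectively. Therefore, the $100(1 - \alpha)\%$ equitailed confidence interval and HPD interval for the coefficient of variation based on the Jeffreys' Rule prior are obtained by

$$CI_{\eta}^{B.jrule} = \left[L_{\eta}^{B.jrule}, \; U_{\eta}^{B.jrule}\right],$$

where $L_{\eta}^{B.jrule}$ and $U_{\eta}^{B.jrule}$ are the lower and upper bounds of the confidence limit, respectively.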

The Bayesian confidence interval using the uniform prior
The uniform prior is a constant function of the parameter (Stone, 2013), which means that it treats all possible values as equally likely a priori (O'Reilly & Mars, 2015). The uniform prior for the binomial proportion is $p(\delta') \propto 1$ (... Rao, 2017), and that of a delta-lognormal distribution is $p(\delta', \sigma^2) \propto 1$. The joint posterior density function can be expressed as

$$p(\delta', \mu, \sigma^2 \mid x) \propto (\delta')^{n_{(0)}} (1 - \delta')^{n_{(1)}} (\sigma^2)^{-n_{(1)}/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i: x_i > 0} [\ln(x_i) - \mu]^2\right\}. \tag{21}$$

From Eq. (21), the respective posterior distributions of $\delta'$ and $\sigma^2$, Eqs. (22) and (23), are beta and inverse gamma distributions; the former is

$$\delta' \mid x \sim Beta\left(n_{(0)} + 1, \; n_{(1)} + 1\right). \tag{22}$$

From Eqs. (22) and (23), the confidence limit for $\eta$ can be established, and consequently, the $100(1 - \alpha)\%$ equitailed confidence interval and HPD interval for the coefficient of variation based on the uniform prior are as follows:

$$CI_{\eta}^{B.uni} = \left[L_{\eta}^{B.uni}, \; U_{\eta}^{B.uni}\right],$$

where $L_{\eta}^{B.uni}$ and $U_{\eta}^{B.uni}$ are the lower and upper bounds of the confidence limit, respectively.

The FGCI for a single coefficient of variation
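The fiducial approach was first introduced by Fisher (1930). The generalized fiducial quantity for $\sigma^2$ is

$$R_{\sigma^2} = \frac{(n_{(1)} - 1)\hat{\sigma}^2}{U},$$

where $U \sim \chi^2_{n_{(1)} - 1}$. Subsequently, the generalized fiducial quantity for $\eta$, denoted $R_{\eta}$, is obtained by substituting the generalized fiducial quantities of $\delta$ and $\sigma^2$ into Eq. (6). Therefore, the $100(1 - \alpha)\%$ generalized fiducial quantity interval for the coefficient of variation is defined by

$$CI_{\eta}^{fgci} = \left[R_{\eta}(\alpha/2), \; R_{\eta}(1 - \alpha/2)\right],$$

where $R_{\eta}(\alpha/2)$ and $R_{\eta}(1 - \alpha/2)$ are the $100(\alpha/2)$-th and $100(1 - \alpha/2)$-th percentiles of the distribution of $R_{\eta}$, respectively.

Algorithm 1
Step 1: Generate $x_i$, $i = 1, 2, \ldots, n$, from a delta-lognormal distribution.
Step 6: Repeat Steps 3-5 5,000 times and obtain an array of $\eta$.
Step 8: Repeat Steps 1-7 15,000 times to compute the coverage probability and the expected length.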
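The fiducial interval for the coefficient of variation can be sketched in the same simulation style as the Bayesian intervals. The fiducial quantity for the variance follows the chi-square form given in this section. The fiducial quantity for the proportion of non-zero values is not reproduced in the text, so the sketch assumes a commonly used form for a binomial proportion, namely an equal-weight mixture of Beta(n1, n0 + 1) and Beta(n1 + 1, n0). That choice and the function name are assumptions, not details confirmed by the source.

```python
import math
import random

def fiducial_cv_interval(n0, n1, sigma2_hat, alpha=0.05, draws=5000, seed=1):
    """Percentile interval of simulated fiducial quantities for the CV."""
    if n0 < 1 or n1 < 2:
        raise ValueError("sketch requires n0 >= 1 and n1 >= 2")
    rng = random.Random(seed)
    df = n1 - 1
    values = []
    for _ in range(draws):
        # Chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
        u = rng.gammavariate(df / 2.0, 2.0)
        r_sigma2 = df * sigma2_hat / u
        # ASSUMPTION: equal-weight beta mixture for the fiducial quantity of
        # delta = P(X > 0); not taken from the text above.
        if rng.random() < 0.5:
            r_delta = rng.betavariate(n1, n0 + 1)
        else:
            r_delta = rng.betavariate(n1 + 1, n0)
        values.append(math.sqrt((math.exp(r_sigma2) - r_delta) / r_delta))
    values.sort()
    lo = values[int(math.floor(alpha / 2 * (draws - 1)))]
    hi = values[int(math.ceil((1 - alpha / 2) * (draws - 1)))]
    return lo, hi

# Counts from Example 1: 100 zeros and 117 positive measurements.
print(fiducial_cv_interval(100, 117, 0.9381))
```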

RESULTS
To evaluate the performance of the proposed methods, their coverage probabilities and expected lengths were estimated via Monte Carlo simulation using the R statistical programming language (Venables & Smith, 2009). The preferred confidence interval is the one whose coverage probability is greater than or closest to the nominal confidence level and whose expected length is the shortest. In the simulation study, the sample size $n$ was set to 25, 50, 100, and 200; $\mu$ to 0; $\delta$ to 0.2, 0.5, 0.8, and 0.9; and $\sigma^2$ to 0.1, 0.5, 1.0, and 2.0. We eliminated the case of $n = 25$ and $\delta = 0.2$ for all four values of $\sigma^2$ because the expected number of non-zero observations was less than 10 (see Fletcher, 2008; Wu & Hsieh, 2014). For all of the simulations, the number of replications was set to 15,000, and 5,000 repetitions were used within each replication for the Bayesian and FGCI methods; the nominal confidence level was 0.95.
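The simulation design described above can be sketched as follows for one parameter setting and one method. This is an illustrative Python outline, not the authors' R code. It uses far fewer replications than the study so that it runs quickly, it evaluates only the equitailed interval under the independent Jeffreys' prior, and the inverse gamma posterior it uses (shape (n1 - 1)/2, scale equal to half the sum of squared deviations of the logs) is an assumption. All function names are hypothetical.

```python
import math
import random

def true_cv(delta, sigma2):
    """Population CV of a delta-lognormal distribution; delta = P(X > 0)."""
    return math.sqrt((math.exp(sigma2) - delta) / delta)

def simulate_sample(rng, n, delta, mu, sigma2):
    """Draw n values: lognormal with probability delta, zero otherwise."""
    sd = math.sqrt(sigma2)
    return [rng.lognormvariate(mu, sd) if rng.random() < delta else 0.0
            for _ in range(n)]

def equitailed_cv_interval(rng, x, alpha, draws):
    """Equitailed interval from posterior draws (independent Jeffreys' prior)."""
    logs = [math.log(v) for v in x if v > 0]
    n1 = len(logs)
    n0 = len(x) - n1
    mu_hat = sum(logs) / n1
    ss = sum((y - mu_hat) ** 2 for y in logs)
    etas = []
    for _ in range(draws):
        delta = 1.0 - rng.betavariate(n0 + 0.5, n1 + 0.5)
        sigma2 = (ss / 2.0) / rng.gammavariate((n1 - 1) / 2.0, 1.0)
        sigma2 = min(sigma2, 700.0)   # numerical guard against overflow in exp
        etas.append(math.sqrt((math.exp(sigma2) - delta) / delta))
    etas.sort()
    lo = etas[int(math.floor(alpha / 2 * (draws - 1)))]
    hi = etas[int(math.ceil((1 - alpha / 2) * (draws - 1)))]
    return lo, hi

def coverage_and_length(n, delta, mu, sigma2, reps=300, draws=500,
                        alpha=0.05, seed=2019):
    """Estimate the coverage probability and expected length by Monte Carlo."""
    rng = random.Random(seed)
    eta = true_cv(delta, sigma2)
    hits, total_length, used = 0, 0.0, 0
    for _ in range(reps):
        x = simulate_sample(rng, n, delta, mu, sigma2)
        if sum(1 for v in x if v > 0) < 3:
            continue                  # too few positive values to estimate
        lo, hi = equitailed_cv_interval(rng, x, alpha, draws)
        hits += 1 if lo <= eta <= hi else 0
        total_length += hi - lo
        used += 1
    return hits / used, total_length / used

print(coverage_and_length(n=50, delta=0.5, mu=0.0, sigma2=1.0))
```

With only a few hundred replications the estimated coverage carries a standard error of roughly 0.01 to 0.02, so it should be read as a rough check rather than a reproduction of the tabulated results.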
The results in Table 1 show that the Bayesian method using the independent Jeffreys' prior for the equitailed confidence interval outperformed the others because its coverage probabilities were consistently greater than or close to the target in all cases. In addition, for the equitailed confidence intervals, the coverage probabilities of the Bayesian method using the Jeffreys' Rule prior were less than the nominal confidence level of 0.95 in some cases: $n = 25$, $\delta = 0.5$, $\sigma^2 = 0.1, 2.0$; $n = 50, 100$, $\delta = 0.2$, $\sigma^2 = 0.1, 2.0$; and $n = 200$, $\delta = 0.2$, $\sigma^2 = 0.1$. For the Bayesian method using the uniform prior, the coverage probabilities were close to 1 in a few cases in which the sample size was less than 100 and a small variance was combined with a high proportion of non-zero values. For the HPD intervals, the coverage probabilities of the independent Jeffreys' prior did not reach the target in most cases, especially for large sample sizes. Similarly, a few cases with the Bayesian method using the uniform prior had coverage probabilities less than the nominal confidence level when the sample sizes were large. Moreover, the Bayesian method using the Jeffreys' Rule prior had coverage probabilities of less than 0.95 in almost all cases. Last, the coverage probabilities of the FGCI fell below the nominal confidence level when the variances were small, for all sample sizes. The expected lengths of all methods, shown in Table 2, were wide when $\sigma^2 = 2.0$ and became narrower as the sample size increased, although they corresponded with the coverage probabilities in almost all cases. Furthermore, the values were similar for all of the methods. Moreover, the expected lengths of the intervals when $n = 50$, $\delta = 0.2$, and $\sigma^2 = 2.0$ were much larger than in the other cases because a small expected number of non-zero observations was combined with a large variance. This case might have affected the parameter estimation, so the confidence intervals constructed from it may not be very effective.

Algorithm 2
Step 1: Generate $x_i$, $i = 1, 2, \ldots, n$, from a delta-lognormal distribution.
Step 5: Repeat Steps 3-4 5,000 times and obtain an array of $R_{\eta}$.
Step 7: Repeat Steps 1-6 15,000 times to compute the coverage probability and the expected length.

An empirical study
Ananthakrishnan & Soman (1989) studied a daily rainfall data series focusing on the normalized rainfall curve (NRC). They found that the NRC is uniquely determined by the coefficient of variation of the rainfall series. To verify the effectiveness of the proposed confidence intervals, we used two examples of rainfall datasets from Nan province, Thailand as follows.

Example 1
The rainfall data were collected in July 2015 for national parks in Nan province, Thailand (Doi Phu Kha, Mae Charim, Nanthaburi, Tham Sa Koen, Sri Nan, Khun Sathan, and Doi Pha Klong) and recorded by the Protected Area Regional Office 13 Phrae, Thailand. This data series comprised 217 rainfall measurements, of which 117 were positive and showed a right-skewed distribution. The density of these data is presented in Fig. 1. The minimum Akaike information criterion (AIC) was first used to test the distribution of the positive rainfall data. The results in Table 3 reveal that the AIC value of the lognormal distribution was the smallest, and thus the positive data series followed a lognormal distribution. To validate the AIC test, a normal Q-Q plot of the log-transformed data series is shown in Fig. 2. The distribution of zero values in this rainfall series coincided with the binomial distribution described in the "Methods" section. Therefore, a delta-lognormal distribution was appropriate for these data. Next, summary statistics were computed: $n = 217$, $\hat{\delta} = 0.5392$, $\hat{\mu} = 2.4762$, $\hat{\sigma}^2 = 0.9381$, and CV = 1.9337. Finally, the 95% confidence intervals for $\eta$ were calculated, as reported in Table 4. These results correspond with those from the simulation study when the sample size was large in that the coverage probabilities of the Bayesian methods (equitailed confidence intervals) were greater than the target. This indicates that the Bayesian method using the Jeffreys' Rule prior is appropriate to construct a confidence interval for this rainfall data because it had the shortest length compared to the other methods. The estimated coefficient of variation in Table 4 means that the variability of the rainfall was rather high.

Example 2
The second dataset comprises rainfall recorded at precipitation stations in Nan province, Thailand in August 2018. The density of these data is presented in Fig. 3. The positive values follow a lognormal distribution, as indicated by the AIC results in Table 5 and a normal Q-Q plot of the log-transformed data displayed in Fig. 4.
In addition, the zero values have a binomial distribution (as discussed by Aitchison (1955)), and thus the overall distribution is delta-lognormal. The summary statistics were $n = 248$, $\hat{\delta} = 0.6331$, $\hat{\mu} = 1.5822$, $\hat{\sigma}^2 = 2.2598$, and CV = 3.7595. Table 6 reports the 95% confidence intervals for $\eta$. The results of the methods used to construct the confidence intervals are in accordance with those in the simulation study for the case of a large sample size. The Bayesian method based on the Jeffreys' Rule prior (equitailed confidence intervals) had the shortest length. The estimated coefficient of variation in Table 6 indicates that the rainfall in this area was highly volatile, which affected the water level of the Nan River. Moreover, there might have been flooding in some of the areas due to high rainfall.
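As a quick check on the reported point estimate, the coefficient of variation can be recomputed from the rounded summary statistics, taking the CV to be the square root of $(\exp(\sigma^2) - \delta)/\delta$ with $\delta$ the proportion of non-zero values. With $\hat{\delta} = 0.6331$ and $\hat{\sigma}^2 = 2.2598$:

$$\hat{\eta} = \sqrt{\frac{\exp(2.2598) - 0.6331}{0.6331}} \approx \sqrt{\frac{9.581 - 0.633}{0.633}} \approx \sqrt{14.13} \approx 3.76,$$

which agrees with the reported CV of 3.7595 up to rounding. Note that $\hat{\mu}$ does not enter the calculation, because the coefficient of variation of a delta-lognormal distribution depends only on $\delta$ and $\sigma^2$.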

DISCUSSION
Our findings reveal that the Bayesian method using the independent Jeffreys' prior to construct the equitailed confidence intervals performed well in all cases because its coverage probabilities were consistently greater than or close to the nominal confidence level, while its expected lengths were mostly no different from those of the other methods. Moreover, underestimation occurred in a few cases when applying the Bayesian methods based on the Jeffreys' Rule prior (equitailed), the independent Jeffreys' prior (HPD), and the uniform prior (HPD), and it appeared in almost all cases of the Jeffreys' Rule prior (HPD). In contrast, overestimation occurred in a few cases when applying the Bayesian method based on the uniform prior (equitailed) when the sample size was less than 100 together with a small variance and a high proportion of non-zero values.

CONCLUSIONS
We proposed the construction of confidence intervals for a single coefficient of variation of a delta-lognormal distribution using Bayesian methods and compared them with FGCI. The Bayesian methods, which are based on the independent Jeffreys' prior, the Jeffreys' Rule prior, and the uniform prior, were constructed as equitailed confidence intervals or HPD intervals. The performance of the confidence intervals was assessed using the coverage probability and expected length through Monte Carlo simulations.
The simulation studies showed that the Bayesian equitailed confidence interval based on the independent Jeffreys' prior is recommended as a confidence interval for a single coefficient of variation. Future research may extend this work to functions of coefficients of variation.