A Bayesian approach to construct confidence intervals for comparing the rainfall dispersion in Thailand

Natural disasters such as drought and flooding are the consequence of severe rainfall fluctuation, and rainfall amount data often contain both zero and positive observations, thus making them fit a delta-lognormal distribution. By way of comparison, rainfall dispersion may not be similar in enclosed regions if the topography and the drainage basin are different, so it can be evaluated by the ratio of variances. To estimate this, credible intervals using the highest posterior density based on the normal-gamma prior (HPD-NG) and the method of variance estimates recovery (MOVER) for the ratio of delta-lognormal variances are proposed. Monte Carlo simulation was used to assess the performance of the proposed methods in terms of coverage probability and relative average length. The results of the study reveal that HPD-NG performed very well and was able to meet the requirements in various situations, even with a large difference between the proportions of zeros. However, MOVER is the recommended method for equal small sample sizes. Natural rainfall datasets for the northern and northeastern regions of Thailand are used to illustrate the practical use of the proposed credible intervals.


INTRODUCTION
Natural phenomena can often be random events in statistics, and natural rainfall is an important one because it is directly related to the quality of life, agriculture, economic growth, and industry, among others. Rice is the main export product from Thai agriculture, so natural rainfall is a crucial water resource for farmers. However, climate change has incurred prolonged droughts, thereby decreasing agricultural and fishery yields, and conversely, caused violent flooding and health-related issues in Thailand (Marks, 2011). Farmers who consume approximately 70% of the country's water supply are at the forefront of these impacts due to changes in rainfall amount (Marks, 2011). Thus, extreme rainfall variation is one of the main consequences of climate change and can result in drought or flooding, and ineffective water management can exacerbate both situations (Duangdai & Likasiri, 2017).
Thailand is divided into five regions: northern, northeastern, central, eastern, and southern. The north is mountainous and contains the sources of several important rivers, including the Mekong Basin (a principal river system of Thailand). Nan is one of the provinces in the northern region faced with heavy rainfall for around two weeks which caused landslides and flooding during the rainy season (mid-May to mid-October) (Hariraksapitak, Sriring & Navaratnam, 2018). Nicely & Counselor (2018) reported that approximately 60% of the northeastern region's arable land has been turned over to total rice cultivation. Meanwhile, 5-10% of precipitation above the normal level occurred during May-July 2018 in the northern and northeastern regions, as reported by the Thai Meteorological Department (Nicely & Counselor, 2018). These situations have led us to consider estimating the rainfall fluctuation between the northern and northeastern regions of Thailand in terms of their ratio of variances during July 2018. It is of major importance to estimate the dispersion in weekly rainfall amount between the northern and northeastern regions in Thailand for the benefit of key industries, and the information gained could help in realizing climate change awareness. The information could be advantageous for the Thai government and other related organizations to realize and plan for solving or reducing the risk of environmental issues if a large variation in rainfall is known in advance. The results from normality plots (Fig. 1), histograms (Fig. 2), and applying the Akaike information criterion (AIC) show that the weekly natural rainfall data for both regions fit delta-lognormal distributions. Note that the log-transformed data of the positive rainfall amounts fit a normal distribution which is symmetrical. Aitchison & Brown (1963) first introduced the delta-lognormal distribution for non-negative data containing zero values with the probability 0 < δ < 1; positive observations with the remainder of the probability 1 − δ follow a lognormal distribution and the zeros follow a binomial distribution with binomial proportion δ. The delta-lognormal distribution has been fitted for real-world examples in many research areas such as the environment (Owen & DeRouen, 1980;Hasan & Krishnamoorthy, 2018), fishery surveys (Pennington, 1983;Smith, 1988Smith, , 1990Lo, Jacobson & Squire, 1992;Fletcher, 2008;Wu & Hsieh, 2014) and medicine (Zhou & Tu, 1999, 2000Tian & Wu, 2006;Hasan & Krishnamoorthy, 2018).
In statistical inference, one of the parameters of interest is the variance (dispersion in environment) defined as the second central moment; the positive square root of the variance is called the standard deviation (Casella & Berger, 2002). To compare two independent populations, the ratio of their variances is the measurement of the variation between them with no difference resulting in a ratio equal to 1. Confidence intervals (CIs) have been applied to several distributions to estimate the variability in terms of variance and the ratio of variances. For example, Krishnamoorthy, Mathew & Ramachandran (2006) obtained CIs for lognormal variance based on the idea of the generalized confidence interval (GCI) which they applied to the variation in the air lead levels in 15 industrial facilities in a health hazard evaluation. Bebu & Mathew (2008) claimed that GCI was better than a modified signed log-likelihood ratio test for establishing CIs for the ratio of variances for bivariate lognormal distributions even when the sample sizes were small. Niwitpong (2017) showed that GCI performed well for the ratio of variances of lognormal distributions to solve the problems that occur when applying the traditional approach. Casella & Berger (2002) argued that CIs can capture the parameter of interest better than point estimates. Although a few attempts have been conducted on CIs for the ratio of variances for some distributions, research on the delta-lognormal distribution has yet to be carried out. Clearly, there is a need for research that investigates and constructs CIs for the ratio of delta-lognormal variances using the highest posterior density based on normal-gamma prior (HPD-NG) and MOVER. These proposed CIs were compared with the existing HPD-based Jeffreys' (HPD-Jef) and Jeffreys' Rule (HPD-Rul) priors of Harvey & Van der Merwe (2012), GCI of Wu & Hsieh (2014) and fiducial GCI (FGCI) of Hasan & Krishnamoorthy (2018).
The organization of this paper is as follows. The concepts of all of the CIs for the ratio of variances in delta-lognormal distributions are elaborated in "Methods". The simulation procedure and numerical results are reported in "Results." In "Discussion," the proposed methods are applied to real-world datasets to estimate the rainfall fluctuation between the northern and northeastern regions in Thailand. Last, we present a discussion and conclusions on the study outcomes.

METHODS
Let X ij = (X i1 , X i2 , …, X ini ); i = 1, 2 and j = 1, 2, …, n i be non-negative random samples draw from a delta-lognormal distribution with parameters m i , σ 2 i and δ i , stand for The maximum likelihood estimates of δ i , m i and σ 2 P n ið1Þ j¼1 y ij , and s 2 i ¼ 1 n ið1Þ P n ið1Þ j¼1 y ij Àm i À Á 2 , respectively. From Eq. (1), the ratio in delta-lognormal variances is obtained as where ω i is log-transformed as ln The methods for constructing CIs for θ are described as seen below.

GCI
On the ideas of GCI, the generalized pivotal quantity (GPQ) is necessary to satisfy the two requirements of Weerahandi (1993). The GPQ of δ i based on VST was proposed by Wu & Hsieh (2014) as 1Þ. The GPQs of m i and σ 2 i are which are presented by Krishnamoorthy & Mathew (2003). These GPQs lead to obtain the GPQ of θ as becomes the 100(1− ζ)%GCI for θ. Algorithm 1 shows the steps to compute GCI as seen above. (1) Generate W i ∼ N(0, 1), Z i ∼ N(0, 1) and U i $ x 2 n ið1Þ À1 . (2) Compute R mi , R s 2 i , and R δi . (3) Compute R ln ωi .

HPD credible interval
The HPD credible interval is a parameter estimate based on the posterior probability in Bayesian framework. Box & Tiao (1973) defined the HPD ideas consisting of two requirements: the set of value that contain 100(1 − ζ)% of the posterior distribution and the property that the density within the region is equal or greater than outside. The posterior density is unimodal and symmetric, the region based on their definition becomes to a equal-sided CI (ζ/2 and 1 − ζ/2 percentiles of the posterior density). This is clearly different between equal-sided CI and HPD credible interval if there is a highly skewed in the posterior density. Recently, HPDs based on beta and uniform priors were recommended to constructed for single mean and the difference between two delta-lognormal means proposed by Maneerat, Niwitpong & Niwitpong (2019). The HPD credible interval is then focused on our study. Recall that X ij > 0 be random variables from lognormal i . For X ij = 0, the number of zero values n i(0) be random sample of binomial distribution, denoted as B(n i ,δ i ). The likelihood function of X ij is Algorithm 3. ( (3) Compute CI ln ωi and CI θ .

which leads to obtain the Fisher information matrix of
Here θ = ln ω 1 − ln ω 2 so that the HPD credible intervals based on different priors for θ are described.

Jeffreys' prior
The Jeffreys' prior for ω is defined as where PðkÞ J / ffiffiffiffiffiffiffiffiffiffi ffi Iðs 2 Þ p and PðdÞ J / ffiffiffiffiffiffiffiffi IðdÞ p . The prior Eq. (17) is combined with its likelihood Eq. (16) to obtain the posterior distributions of each parameters σ 2 i , m i , and δ i . Firstly, the posterior density of ω is using Bayes' theorem (Casella & Berger, 2002), given by Obtain that which is the posterior density of σ 2 |x, as inverted gamma distribution, denoted as IG(a i , b i ); a i = r i /2 and b i ¼ r iŝ 2 i =2; r i = n i(1) − 1. Given σ 2 i and x, one obtains that This is the posterior density of m i |σ 2 i , x, as normal Nðm i ; s 2 i =n ið1Þ Þ. For δ i , its posterior was proved as which is the beta distribution with c i and d i , denoted as beta(c i , d i ); c i = n i(0) + 1/2 and d i = n i(1) + 1/2.

Jeffreys' Rule prior
According to Harvey & Van der Merwe (2012), the difference between Jeffreys and Jeffreys' Rule priors is the posterior densities of σ 2 i and δ i , while Jeffreys' Rule prior is defined as To obtain the joint posterior of ω denoted as P(ω|x) JR , the Jeffreys' Rule prior Eq. (21) is combined with the likelihood Eq. (16). For σ 2 i , its posterior has been changed to inverted gamma distribution with shape parameter s i /2 and scale parameter s iŝ 2 i =2; s i = n i(1) + 1. Then, the posterior distribution of m i given σ 2 i and x is changed because it depends on the posterior density of σ 2 i . Likewise, the posterior of δ i becomes P(δ|x) JR = beta(n i(0) + 1/2, n i(1) + 3/2). Their posterior distributions represent to estimate own parameter, meanwhile Pðs 2 i jx ij Þ JR , P(m i |σ 2 i , x ij ) JR and P(δ i |x ij ) JR are independent. As a result, HPD credible interval for θ is computed based on Jeffreys' Rule prior. The steps for establishing HPD-intervals based on two priors are detailed as Algorithm 4.

Normal-gamma prior
DeGroot (1970) defined the conjugate families for random sample of normal distribution. This is a necessary to develop credible interval based on Bayesian approach for the delta-lognormal variance. Using Theorem 1 of the conjugate families for random sample of normal distribution (DeGroot, 1970), assume that (Y = ln X ij ; i = 1, 2 j = 1, 2, …, n i(1) ) be a random sample from normal distribution with the mean m = (m 1 , m 2 ) and precision λ = (λ 1 , λ 2 ); λ i = 1/σ 2 i where X ∼ LN(m, λ). The normal-gamma prior of τ = (m, λ)′ is defined the marginal distributions between normal m i |λ i ∼ N(m, [k i λ i ] − 1 ) and gamma λ i ∼ G(a i , β i ), given by (1) Generate σ 2 * i denoted as the posterior distribution of σ 2 i based on priors: (1) ).

RESULTS
The CIs proposed in this study are HPD-NG and MOVER. The former develops the NG prior to obtain the posterior density, while the latter is an extension of Hasan and Krishnamoorthy's MOVER (Hasan & Krishnamoorthy, 2018). Both were compared with the existing CIs HPD-Jef and HPD-Rul of Harvey & Van der Merwe (2012), GCI of Wu & Hsieh (2014), and FGCI of Hasan & Krishnamoorthy (2018). Two simulation studies were conducted to show the aforementioned CI performances under equal and unequal sample sizes in different situations: S1. The probability of additional zero δ i is varied while m i = 3 and σ 2 i = 1. S2. The mean m i and δ i are varied while σ 2 i = 1. S1 was designed and simulated to be consistent with the weekly rainfall datasets, as can be seen in the next section. The second one was constructed to indicate CI performance when both the mean m i and δ i are changed, i.e. whether the numerical simulation is consistent with S1.
The coverage probability (CP) and relative average length (RAL, denoted as the ratio of the average lengths of a proposed CI to HPD-Rul) were used to assess the performances of the CIs using Monte Carlo simulations. For 5,000 simulation runs, GPQs and FGPQs were fixed at 2,500 at a nominal level of 0.95. In a comparison of the methods, the following criteria were used to judge the best-performing CI: a CP closer to or greater than the nominal level and the narrowest RAL of less than 1 and minimal. The steps of the simulation procedure are given in Algorithm 5.
The findings are based on the simulation work as follows. For the ratio of variances, the numerical evaluation shows that for the method in the S1 scenario, MOVER provided good performance for a small difference between δ i with equal small sample sizes (Table 1). Importantly, HPD-NG had good coverage (a CP greater than 0.95) and the shortest CIs for both equal and unequal medium sample sizes as well as a large difference in δ i for unequal large sample sizes. Meanwhile, HPD-Rul's performance satisfied the criteria for a Algorithm 5.
(5) Compute CPs and RALs with all CIs. small difference in δ i and unequal large sample sizes. The results in Table 2 for case S2 indicate that for a small difference in δ i , MOVER could meet the requirements for equal small sample sizes, while HPD-NG maintained the given target for both equal and unequal medium sample sizes, and unequal large sample sizes. Moreover, HPD-Rul performed well for a large difference in δ i and unequal medium-to-large sample sizes. From the evidence of both situations, HPD-Jef gave the best CP in all cases even though its average lengths were mostly wider than HPD-NG. Meanwhile, GCI and FGCI performances provided average lengths that were broader than other methods in all situations.
An empirical application Duangdai & Likasiri (2017) predicted the global temperature, and forest and seasonal rainfall amount in northern Thailand by various mathematical models. Their results indicate that during 1973-2008, the rainfall fluctuation during the rainy season was less than the summer rainfall, although the forest cover was higher in the summer than in the rainy season. Approximately 60% of the total arable land in the northeast has been turned over to rice cultivation (Nicely & Counselor, 2018). Importantly, both the northern and northeastern regions of Thailand are important agricultural areas where the planted rice is rain-fed and is the most valuable crop in the Lower Mekong Basin (Zhang et al., 2014). These findings led us to focus on the rainfall amounts in the northern and northeastern areas in Thailand, especially regarding rainfall variation in the rainy season. Importantly, a comparison of the rainfall in the northern and northeastern regions in terms of the ratio of variances was investigated and estimated with our proposed CIs. There were 272 substations in total for both regions to record rainfall measurements by the Thai Meteorological Center. The 272 rainfall observed values contained 40 of 62 (64.52%) and 145 of 210 (69.05%) positive records in the north (62 substations) and northeast (210 substations) areas during 2-8 July 2018, respectively. The remainder were zero observations for both sets.
As mentioned previously, the results can be interpreted as the rainfall variability in the northern region being less than the northeastern one during 2-8 July 2018, which implies that growing rice in the northeastern region was probably more affected than in the northern region because the former's rain variation was larger. It is possible that this information could influence approximately 60% of the total arable area in the northeast. Note that Nicely & Counselor (2018) estimated 70% of rice production during the marketing year 2018-2019 cultivated under desirable weather conditions. This analysis might provide useful information to the Royal Thai Government to carry out effective water management. Our findings made a few realizations about natural disasters due to climate change during the rainy season in the northern and northeastern Thailand as well.
The results are also in agreement with the simulation study ones in Table 1.

DISCUSSION
The HPD-NG was proposed for constructing confidence intervals for comparing the rainfall dispersion in north and northeast regions. How to select the prior in this situation is that we found the CP performance of HPD depend on the posterior densities of σ 2 and δ in the previous study. To our knowledge, we believed that m and σ 2 might be suitable for random variables of the normal and gamma distributions, respectively. This becomes the normal-gamma prior of (m,σ 2 ). For δ, it was motivated by Jeffreys' prior, while a beta distribution was also developed and recommended by Jin, Thulin & Larsson (2017) so that beta (d,d) becomes a prior of δ in this study.  The difference between HPD-Jef and HPD-Rul is the posterior densities of σ 2 and δ, although it can be seen from the results of the simulation study that HPD-Rul's outcomes were agreement with Harvey & Van der Merwe (2012) when focusing on the ratio of delta-lognormal means. This implies that both HPD performances are dependent on the posterior of σ 2 and δ. For the proposed HPD-NG, its prior of τ = (m,λ); λ = 1/σ 2 is the inverse of the Jeffreys' prior, leading us to obtain the posterior distributions of m and σ 2 derived from the NG prior under the assumption that the mean m has a normal distribution and the precision λ has a gamma distribution. After that, the posterior of σ 2 has a inverse-gamma distribution with its parameters (a i,ni(1) ,β i,ni(1) ). Importantly, the parameter β i,ni(1) is different from the inverse-gamma based one on Jeffreys' prior, resulting in a different posterior of m as well. Likewise, the prior of δ was found and recommended by Jin, Thulin & Larsson (2017). For these reasons, the HPD-NG prior was developed to estimate the ratio between delta-lognormal variances. However, more research on this topic needs to be conduct to find an appropriate prior to obtain a better performance. Furthermore, MOVER could maintain its performance to satisfy the target criteria even with a small difference between the binomial proportions and small sample sizes. It is possible that an interval estimate for δ based on Wilson's interval satisfies the criteria for small to moderate sample sizes, as confirmed by Donner & Zou (2011).

CONCLUSIONS
The purpose of this study was to develop CIs, namely HPD-NG and MOVER, for the ratio of delta-lognormal variances through Monte Carlo simulations. By way of comparison, both were examined to report their performance with the existing methods: HPD-Jef, HPD-Rul, GCI and FGCI. These proposed CIs were applied to estimate and compare the rainfall fluctuation between the northern and northeastern regions in Thailand in terms of the ratio of variances. The findings of this study indicate that HPD-NG is the best and most recommended CI in situations S1 (with varied δ) for both equal and unequal medium sample sizes, and for a  large difference between the binomial proportions for unequal large sample sizes. Both MOVER and HPD-Rul are the next best CIs recommended for a small difference between the binomial proportions with different sample sizes: MOVER for equal small sample sizes and HPD-Rul for unequal large sample sizes. For situation S2 where δ and m are both varied, HPD-NG delivered the best performance for a small binomial proportion and both equal and unequal medium sample sizes as well as a small proportion of zeros for unequal large sample sizes. MOVER is also recommended for situations similar to situation S1, while HPD-Rul is recommended for a large proportion of zeros and unequal medium-tolarge sample sizes. Although the CPs of HPD-Jef, GCI and FGCI results satisfied the criteria, their average lengths were wider than our proposed methods in all situations. Note: In this paper, confidence intervals for the ratio variances of delta-lognormal are proposed, whereas, other submission proposed CIs for the difference between variances of delta-lognormal populations based on Herbert et al. (2011) andKrishnamoorthy, Lian &Mondal (2010). Although, the difference between them is that the former was considered to estimate the ratio of two delta-lognormal variances (v 1 /v 2 ) using HPD-based normal gamma prior. The latter was presented HPDs based on other priors for the difference in delta-lognormal variances (v 1v 2 ) for large variational situations. According to Herbert et al. (2011) and Krishnamoorthy, Lian & Mondal (2010), the assessment of the range of distribution of (v 1v 2 ) can be estimated using CIs for (v 1v 2 ) such that it can be implied that there is the distributed range between (v 1v 2 ) and (v 1 /v 2 ). Finally, both of random variables are used to check significantly difference in the two independent datasets. The next contrast between this submission and other one is the illustrated data that there are different dispersions and locations in each paper.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding