Estimation of the percentile of Birnbaum-Saunders distribution and its application to PM2.5 in Northern Thailand

Warisa Thangjai; Sa-Aat Niwitpong; Suparat Niwitpong

doi:10.7717/peerj.17019

Warisa Thangjai¹, Sa-Aat Niwitpong², Suparat Niwitpong²

¹Department of Statistics, Ramkhamhaeng University, Bangkok, Thailand

²Department of Applied Statistics, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand

Published: 2024-02-29
Accepted: 2024-02-06
Received: 2023-11-01

Academic Editor: Maria Gavrilescu

Subject Areas: Statistics, Computational Science, Ecotoxicology, Environmental Contamination and Remediation, Environmental Impacts
Keywords: Bayesian approach, Birnbaum-Saunders distribution, Bootstrap approach, Generalized confidence interval approach, Percentile

Copyright: © 2024 Thangjai et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Thangjai W, Niwitpong S, Niwitpong S. 2024. Estimation of the percentile of Birnbaum-Saunders distribution and its application to PM2.5 in Northern Thailand. PeerJ 12:e17019 https://doi.org/10.7717/peerj.17019

The authors have chosen to make the review history of this article public.

Abstract

The Birnbaum-Saunders distribution plays a crucial role in statistical analysis, serving as a model for failure time distribution in engineering and the distribution of particulate matter 2.5 (PM2.5) in environmental sciences. When assessing the health risks linked to PM2.5, it is crucial to give significant weight to percentile values, particularly focusing on lower percentiles, as they offer a more precise depiction of exposure levels and potential health hazards for the population. Mean and variance metrics may not fully encapsulate the comprehensive spectrum of risks connected to PM2.5 exposure. Various approaches, including the generalized confidence interval (GCI) approach, the bootstrap approach, the Bayesian approach, and the highest posterior density (HPD) approach, were employed to establish confidence intervals for the percentile of the Birnbaum-Saunders distribution. To assess the performance of these intervals, Monte Carlo simulations were conducted, evaluating them based on coverage probability and average length. The results demonstrate that the GCI approach is a favorable choice for estimating percentile confidence intervals. In conclusion, this article presents the results of the simulation study and showcases the practical application of these findings in the field of environmental sciences.

Introduction

PM2.5, also known as particulate matter 2.5, refers to tiny particles or droplets suspended in the air with a size not exceeding 2.5 micrometers. These minute particles fall under the category of airborne pollutants and can be composed of a variety of materials, including dust, soil, soot, smoke, and liquid droplets (Thangjai & Niwitpong, 2020). Due to their diminutive size, PM2.5 particles can be readily inhaled into the respiratory system, penetrating deeply. The presence of PM2.5 in the atmosphere raises notable environmental and health concerns, given its potential adverse impact on human health. Long-term exposure to PM2.5 is associated with a spectrum of health problems, including respiratory issues, cardiovascular ailments, and even premature mortality. Short-term exposure to elevated PM2.5 levels can exacerbate conditions such as asthma and bronchitis. The monitoring of PM2.5 levels is a routine practice in assessing air quality, and numerous countries have established air quality standards and regulations to limit the concentration of these fine particles in the air to protect public health. Residents in regions with heightened levels of PM2.5 are frequently advised to take precautions, like staying indoors during periods of poor air quality and using air purifiers to reduce their exposure. Thailand has encountered air quality challenges related to PM2.5, especially during the dry season when activities like agricultural and forest burning contribute to increased levels of particulate matter in the air. Cities like Bangkok, Chiang Mai, Mae Hong Son, and Lampang provinces have experienced periods of subpar air quality due to PM2.5 pollution, prompting health advisories and government interventions to mitigate its impact. 
Thailand has initiated a variety of measures to combat air pollution, including the implementation of regulations, public awareness campaigns, and steps to reduce pollution sources, such as curbing open burning and enforcing vehicle emission standards. Much like many other governments globally, the Thai government has been actively working to enhance air quality and reduce the health risks associated with PM2.5 exposure. Many researchers have studied PM2.5 air pollution, including Broomandi et al. (2021) and Galán-Madruga et al. (2023).

The Birnbaum-Saunders distribution has become well-known in the fields of life testing and engineering. Its initial application was to model the time until failure, a consequence of the growth of a dominant crack under cyclic stress, where failure occurs once it surpasses a specified threshold (Birnbaum & Saunders, 1969a). Additionally, Bhattacharyya & Fries (1982) illustrated that the Birnbaum-Saunders distribution can function as an approximation of the inverse Gaussian distribution. Desmond (1986) also proposed that this distribution can be considered a balanced combination of the inverse Gaussian distribution and its reciprocal. In the context of statistical inference, Birnbaum & Saunders (1969b) introduced maximum likelihood estimation for the distribution’s parameters. Ng, Kundu & Balakrishnan (2003) put forward a modified moment estimation method for these parameters, along with a technique to mitigate bias in both the maximum likelihood estimator and the modified moment estimator. Wu & Wong (2004) presented approximated confidence intervals for the Birnbaum-Saunders distribution using higher-order likelihood asymptotic procedures. Furthermore, Li & Xu (2016) carried out a comparative study involving the fiducial estimator, maximum likelihood estimator, and Bayesian estimator for the distribution’s unknown parameters. Guo et al. (2017) developed approaches for interval estimation and hypothesis testing pertaining to the common mean of multiple Birnbaum-Saunders populations, utilizing a hybrid methodology that combines generalized inference and large sample theory. Jayalath (2021) applied a flexible Gibbs sampler to draw inferences about the two-parameter Birnbaum-Saunders distribution when dealing with right-censored data. Lastly, Puggard, Niwitpong & Niwitpong (2022) introduced novel techniques for estimating confidence intervals for both the variance and the difference in variances of Birnbaum-Saunders distributions, using PM2.5 data. 
Based on their work, we subsequently developed confidence intervals for the percentiles of Birnbaum-Saunders distributions, also leveraging PM2.5 data.

Although the mean and variance are the most frequently used summary measures, percentiles often carry greater practical significance. A percentile describes the relative position of a value within a dataset or a probability distribution: it is the value below which a stated percentage of the data points lie. Percentiles therefore serve as indicators of both central tendency and variability. The 25th percentile, also known as the first quartile, is the threshold below which 25% of the data points are situated. The 50th percentile, commonly referred to as the median, divides the dataset into two equal halves, with half of the data points below it and half above it. The 75th percentile, or the third quartile, is the boundary below which 75% of the data points are situated. When two distributions must be compared, the researcher needs a specific parameter for the comparison. Researchers commonly select the mean, presuming it to be the most dependable parameter for characterizing the population. This preference is not universally applicable, however, and the median can be a more reliable reference, especially for a strongly skewed distribution (Price & Bonett, 2002). 
Percentiles are invaluable for comprehending the distribution of data, identifying atypical data points, and facilitating comparative analyses. They are widely used in various fields, including education, where they aid in grading and ranking students, healthcare, for evaluating growth and health metrics, and finance, where they are instrumental in scrutinizing investment returns and associated risks. In situations where PM2.5 levels are high and pose a health threat, percentiles offer a more appropriate option than utilizing the mean and variance. Several researchers have delved into the realm of statistical inference pertaining to percentiles and quartiles. These scholars include Marshall & Walsh (1950), Harrell & Davis (1982), Kaigh & Lachenbruch (1982), Albers & Löhnberg (1984), Cox & Jaber (1985), Chang & Tang (1994), Padgett & Tomlinson (2003), Guo & Krishnamoorthy (2005), Huang & Johnson (2006), Navruz & Özdemir (2018), Hasan & Krishnamoorthy (2018), as well as Abdollahnezhad & Jafari (2018).

A multitude of scholars have delved into statistical investigations regarding the Birnbaum-Saunders distribution. Prominent contributors in this domain comprise Birnbaum & Saunders (1969b), Engelhardt, Bain & Wright (1981), Achcar (1993), Lu & Chang (1997), Ng, Kundu & Balakrishnan (2003), Wu & Wong (2004), Leiva et al. (2008), Wang (2012), Niu et al. (2014), Wang, Sun & Park (2016), and Guo et al. (2017). The objective of this article is to present confidence intervals for percentiles within the Birnbaum-Saunders distribution. Four distinct methods, namely the generalized confidence interval (GCI) approach, the bootstrap approach, the Bayesian approach, and the highest posterior density (HPD) approach, are employed to compute interval estimations for percentiles within the population. These methods make use of simulated data to establish the confidence intervals. To enhance their practical utility, a computer program has been created in the R programming language to compute coverage probability and average interval length. The article includes a numerical example to demonstrate the application of this program.

Methods

Let $X = (X_{1}, X_{2}, . . ., X_{n})$ be a random sample of size $n$ drawn from a Birnbaum-Saunders distribution. The probability density function is defined by

(1) $f (x; α, β) = \frac{1}{2 α β \sqrt{2 π}} ({(\frac{β}{x})}^{1 / 2} + {(\frac{β}{x})}^{3 / 2}) \exp (- \frac{1}{2 α^{2}} (\frac{x}{β} + \frac{β}{x} - 2)); x > 0, α, β > 0,$ where $α$ is the shape parameter and $β$ is the scale parameter.

According to Chang & Tang (1994) and Padgett & Tomlinson (2003), the percentile of X is given by

(2) $θ = \frac{β}{4} {(α z_{p} + \sqrt{α^{2} z_{p}^{2} + 4})}^{2},$ where $z_{p} = Φ^{- 1} (p)$ is the standard normal $p$ -th quantile.
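As a minimal illustration of Eq. (2), the percentile can be evaluated directly from $α$, $β$, and $p$. The sketch below is written in Python with the standard library only (the article's own program was written in R); the function names `bs_percentile` and `bs_cdf` are illustrative and do not come from the article. The second function uses the standard Birnbaum-Saunders distribution function $Φ ((\sqrt{x / β} - \sqrt{β / x}) / α)$ to check that the value returned by Eq. (2) has cumulative probability $p$.

```python
import math
from statistics import NormalDist


def bs_percentile(p, alpha, beta):
    """100p-th percentile of BS(alpha, beta), following Eq. (2)."""
    z_p = NormalDist().inv_cdf(p)  # standard normal p-th quantile
    return beta / 4.0 * (alpha * z_p + math.sqrt(alpha**2 * z_p**2 + 4.0)) ** 2


def bs_cdf(x, alpha, beta):
    """Birnbaum-Saunders distribution function evaluated at x > 0."""
    return NormalDist().cdf((math.sqrt(x / beta) - math.sqrt(beta / x)) / alpha)


# Illustrative parameter values (alpha = 0.5, beta = 1.0 also appear in the simulation study).
median = bs_percentile(0.50, alpha=0.5, beta=1.0)
first_quartile = bs_percentile(0.25, alpha=0.5, beta=1.0)
```

For $p = 0.5$ the quantile $z_{p}$ is zero, so Eq. (2) reduces to $θ = β$; the median of the distribution is its scale parameter.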

Generalized confidence interval approach

Following Puggard, Niwitpong & Niwitpong (2022), the generalized pivotal quantity (GPQ) for $β$ is given by

(3) $R_{β} = \begin{cases} \max (β_{1}, β_{2}); & T \leq 0 \\ \min (β_{1}, β_{2}); & T > 0 \end{cases},$ where $β_{1}$ and $β_{2}$ are the two solutions for $β$ of the quadratic equation $A β^{2} - 2 B β + C = 0$ , with $A = (n - 1) J^{2} - \frac{1}{n} L T^{2}$ , $B = (n - 1) I J - (1 - I J) T^{2}$ , $C = (n - 1) I^{2} - \frac{1}{n} K T^{2}$ , $I = \frac{1}{n} \sum_{i = 1}^{n} \sqrt{X_{i}}$ , $J = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{\sqrt{X_{i}}}$ , $K = \sum_{i = 1}^{n} {(\sqrt{X_{i}} - I)}^{2}$ , $L = \sum_{i = 1}^{n} {(\frac{1}{\sqrt{X_{i}}} - J)}^{2}$ , and $T \sim t (n - 1)$ .

Suppose that $S_{1} = \sum_{i = 1}^{n} X_{i}$ , $S_{2} = \sum_{i = 1}^{n} \frac{1}{X_{i}}$ , and $V \sim χ^{2} (n)$ . The GPQ for $α$ is given by

(4) $R_{α} = \sqrt{\frac{s_{2} {(R_{β})}^{2} - 2 n R_{β} + s_{1}}{R_{β} V}},$ where $R_{β}$ is defined in Eq. (3) and $s_{1}$ and $s_{2}$ are the observed values of $S_{1}$ and $S_{2}$ , respectively.

The GPQ for $θ$ is obtained by

(5) $R_{θ} = \frac{R_{β}}{4} {(R_{α} z_{p} + \sqrt{{(R_{α})}^{2} z_{p}^{2} + 4})}^{2},$ where $z_{p} = Φ^{- 1} (p)$ is the standard normal $p$ -th quantile, $R_{β}$ is defined in Eq. (3), and $R_{α}$ is defined in Eq. (4).

Therefore, the $100 (1 - γ) %$ two-sided confidence interval for the percentile of Birnbaum-Saunders distribution using the GCI approach is given by

(6) $C I_{θ . G C I} = [L_{θ . G C I}, U_{θ . G C I}] = [R_{θ} (γ / 2), R_{θ} (1 - γ / 2)],$ where $R_{θ} (γ / 2)$ and $R_{θ} (1 - γ / 2)$ denote the $100 (γ / 2)$ -th and $100 (1 - γ / 2)$ -th percentiles of $R_{θ}$ , respectively.

Algorithm 1 was used to construct the GCI for the percentile of Birnbaum-Saunders distribution.

Step 1: Generate a sample from the Birnbaum-Saunders distribution

Step 2: Compute A, B, C, $S_{1}$ , and $S_{2}$

Step 3: At the $m$ -th step

(a) Simulate $T \sim t (n - 1)$ and compute $R_{β}$ using Eq. (3)

(b) If $R_{β} < 0$ , resimulate $T \sim t (n - 1)$

(c) Simulate $V \sim χ^{2} (n)$ and compute $R_{α}$ using Eq. (4)

(d) Compute $R_{θ}$ using Eq. (5)

Step 4: Repeat step 3 a total of M times and obtain an array of $R_{θ}$ ’s

Step 5: Compute $L_{θ . G C I} = R_{θ} (γ / 2)$ and $U_{θ . G C I} = R_{θ} (1 - γ / 2)$

DOI: 10.7717/peerj.17019/table-6
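The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R program: the helper names (`rbs`, `gci_percentile`), the seed, and the nearest-rank choice of empirical percentiles are ours, the $t$ and $χ^{2}$ variates are built from normal and gamma draws so that only the standard library is needed, and the guard that resimulates $T$ when the quadratic has no real root is an added safeguard that the article does not mention.

```python
import math
import random
from statistics import NormalDist


def rbs(n, alpha, beta, rng):
    """Draw n Birnbaum-Saunders variates from standard normal draws."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out


def gci_percentile(x, p, gamma=0.05, M=2000, rng=None):
    """Generalized confidence interval for the 100p-th percentile (Eqs. 3-6)."""
    rng = rng or random.Random()
    n = len(x)
    z_p = NormalDist().inv_cdf(p)
    I = sum(math.sqrt(v) for v in x) / n
    J = sum(1.0 / math.sqrt(v) for v in x) / n
    K = sum((math.sqrt(v) - I) ** 2 for v in x)
    L = sum((1.0 / math.sqrt(v) - J) ** 2 for v in x)
    s1 = sum(x)
    s2 = sum(1.0 / v for v in x)
    draws = []
    while len(draws) < M:
        # T ~ t(n-1), built as N(0,1) / sqrt(chi2(n-1) / (n-1))
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / (n - 1))
        A = (n - 1) * J * J - L * t * t / n
        B = (n - 1) * I * J - (1.0 - I * J) * t * t
        C = (n - 1) * I * I - K * t * t / n
        disc = B * B - A * C
        if A == 0.0 or disc < 0.0:
            continue  # added guard (assumption): no usable root, resimulate T
        roots = ((B - math.sqrt(disc)) / A, (B + math.sqrt(disc)) / A)
        r_beta = max(roots) if t <= 0.0 else min(roots)  # Eq. (3)
        if r_beta <= 0.0:
            continue  # step 3(b): resimulate T
        v = rng.gammavariate(n / 2.0, 2.0)  # V ~ chi2(n)
        num = max(s2 * r_beta**2 - 2.0 * n * r_beta + s1, 0.0)
        r_alpha = math.sqrt(num / (r_beta * v))  # Eq. (4)
        r_theta = r_beta / 4.0 * (r_alpha * z_p + math.sqrt(r_alpha**2 * z_p**2 + 4.0)) ** 2
        draws.append(r_theta)  # Eq. (5)
    draws.sort()
    lower = draws[int(math.floor(gamma / 2.0 * (M - 1)))]
    upper = draws[int(math.ceil((1.0 - gamma / 2.0) * (M - 1)))]
    return lower, upper  # Eq. (6)


rng = random.Random(2024)  # seed chosen only for reproducibility
x = rbs(50, alpha=0.5, beta=1.0, rng=rng)
lower, upper = gci_percentile(x, p=0.5, M=2000, rng=rng)
```

The sample statistics $I$, $J$, $K$, $L$, $s_{1}$, and $s_{2}$ are computed once, whereas $A$, $B$, and $C$ are recomputed for every simulated $T$ because they depend on it.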

Bootstrap approach

Suppose that $x = (x_{1}, x_{2}, . . ., x_{n})$ is a random sample drawn from Birnbaum-Saunders distribution with shape parameter $α$ and scale parameter $β$ . Let $\hat{α}$ and $\hat{β}$ be the maximum likelihood estimators of $α$ and $β$ , respectively. Let $x^{*} = (x_{1}^{*}, x_{2}^{*}, . . ., x_{n}^{*})$ be a bootstrap sample drawn from Birnbaum-Saunders distribution with $\hat{α}$ and $\hat{β}$ . Hence, ${\hat{α}}^{*}$ and ${\hat{β}}^{*}$ are acquired by utilizing B bootstrap samples.

Suppose that $b (\hat{α}, α)$ and $b (\hat{β}, β)$ are the bias estimators of $\hat{α}$ and $\hat{β}$ , respectively. According to Puggard, Niwitpong & Niwitpong (2022), the estimators for $b (\hat{α}, α)$ and $b (\hat{β}, β)$ are obtained by

(7) $\hat{b} (\hat{α}, α) = \frac{1}{B} \sum_{k = 1}^{B} {\hat{α}}_{k}^{*} - \hat{α}$ and

(8) $\hat{b} (\hat{β}, β) = \frac{1}{B} \sum_{k = 1}^{B} {\hat{β}}_{k}^{*} - \hat{β} .$

Following MacKinnon & Smith (1998), the bias-corrected estimates of ${\hat{α}}^{*}$ and ${\hat{β}}^{*}$ are obtained by

(9) $\tilde{α_{k}} = {\hat{α}}_{k}^{*} - 2 \hat{b} (\hat{α}, α)$ and

(10) $\tilde{β_{k}} = {\hat{β}}_{k}^{*} - 2 \hat{b} (\hat{β}, β),$ where $k = 1, 2, . . ., B$ .

Therefore, the bootstrap estimator of the percentile is obtained as

(11) ${\hat{θ}}_{k} = \frac{\tilde{β_{k}}}{4} {(\tilde{α_{k}} z_{p} + \sqrt{{(\tilde{α_{k}})}^{2} z_{p}^{2} + 4})}^{2},$ where $z_{p} = Φ^{- 1} (p)$ is the standard normal $p$ -th quantile, $\tilde{α_{k}}$ is defined in Eq. (9), and $\tilde{β_{k}}$ is defined in Eq. (10).

Therefore, the $100 (1 - γ) %$ two-sided confidence interval for the percentile of Birnbaum-Saunders distribution using the bootstrap approach is given by

(12) $C I_{θ . B} = [L_{θ . B}, U_{θ . B}] = [{\hat{θ}}_{k} (γ / 2), {\hat{θ}}_{k} (1 - γ / 2)],$ where ${\hat{θ}}_{k} (γ / 2)$ and ${\hat{θ}}_{k} (1 - γ / 2)$ denote the $100 (γ / 2)$ -th and $100 (1 - γ / 2)$ -th percentiles of ${\hat{θ}}_{k}$ , respectively.

Algorithm 2 was used to construct the bootstrap confidence interval for the percentile of Birnbaum-Saunders distribution.

Step 1: Generate a sample from the Birnbaum-Saunders distribution

Step 2: At the $b$ -th step

(a) Generate $x^{*} = (x_{1}^{*}, x_{2}^{*}, . . ., x_{n}^{*})$ with replacement from $x = (x_{1}, x_{2}, . . ., x_{n})$

(b) Compute $\hat{b} (\hat{α}, α)$ using Eq. (7) and compute $\hat{b} (\hat{β}, β)$ using Eq. (8)

(c) Compute $\tilde{α_{k}}$ using Eq. (9) and $\tilde{β_{k}}$ using Eq. (10)

(d) Compute ${\hat{θ}}_{k}$ using Eq. (11)

Step 3: Repeat step 2 a total of B times and obtain an array of ${\hat{θ}}_{k}$ ’s

Step 4: Compute $L_{θ . B} = {\hat{θ}}_{k} (γ / 2)$ and $U_{θ . B} = {\hat{θ}}_{k} (1 - γ / 2)$

DOI: 10.7717/peerj.17019/table-7
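A compact Python sketch of Algorithm 2 is given below. Several choices are assumptions on our part rather than details confirmed by the article: the maximum likelihood estimate of $β$ is found by bisection on the standard likelihood equation between the harmonic and arithmetic means of the sample, resampling is done with replacement as stated in step (a), and the bias terms of Eqs. (7) and (8) are computed once from all $B$ resamples before the corrections of Eqs. (9) and (10) are applied. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def rbs(n, alpha, beta, rng):
    """Draw n Birnbaum-Saunders variates from standard normal draws."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out


def bs_mle(x):
    """Maximum likelihood estimates (alpha_hat, beta_hat) by bisection."""
    n = len(x)
    s = sum(x) / n                    # arithmetic mean
    r = n / sum(1.0 / v for v in x)   # harmonic mean

    def g(b):
        k = n / sum(1.0 / (b + v) for v in x)
        return b * b - b * (2.0 * r + k) + r * (s + k)

    lo, hi = r, s                     # g(r) >= 0 >= g(s), so the root lies in [r, s]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = math.sqrt(max(s / beta + beta / r - 2.0, 0.0))
    return alpha, beta


def bootstrap_percentile_ci(x, p, gamma=0.05, B=500, rng=None):
    """Bias-corrected bootstrap interval for the 100p-th percentile (Eqs. 7-12)."""
    rng = rng or random.Random()
    z_p = NormalDist().inv_cdf(p)
    a_hat, b_hat = bs_mle(x)
    stars = [bs_mle(rng.choices(x, k=len(x))) for _ in range(B)]
    bias_a = sum(a for a, _ in stars) / B - a_hat   # Eq. (7)
    bias_b = sum(b for _, b in stars) / B - b_hat   # Eq. (8)
    thetas = []
    for a_star, b_star in stars:
        a_t = a_star - 2.0 * bias_a                 # Eq. (9)
        b_t = b_star - 2.0 * bias_b                 # Eq. (10)
        thetas.append(b_t / 4.0 * (a_t * z_p + math.sqrt(a_t**2 * z_p**2 + 4.0)) ** 2)  # Eq. (11)
    thetas.sort()
    lower = thetas[int(math.floor(gamma / 2.0 * (B - 1)))]
    upper = thetas[int(math.ceil((1.0 - gamma / 2.0) * (B - 1)))]
    return lower, upper                             # Eq. (12)


rng = random.Random(2024)  # seed chosen only for reproducibility
x = rbs(50, alpha=0.5, beta=1.0, rng=rng)
lower, upper = bootstrap_percentile_ci(x, p=0.5, B=500, rng=rng)
```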

Bayesian approach

The Bayesian approach offers a structured way to integrate prior knowledge and revise beliefs as new data emerge. Assume that $υ$ follows an inverse gamma distribution with parameters $a$ and $b$ , represented as $I G (υ | a, b)$ . Wang, Sun & Park (2016) utilized inverse gamma distributions as prior distributions of $β$ and $α^{2}$ , denoted as $I G (β | a_{1}, b_{1})$ and $I G (α^{2} | a_{2}, b_{2})$ , respectively. The marginal posterior distribution of $β$ is given by

(13) $\begin{array}{l} p (β | x) \propto β^{- (n + a_{1} + 1)} \exp (- \frac{b_{1}}{β}) \prod_{i = 1}^{n} ({(\frac{β}{x_{i}})}^{1 / 2} + {(\frac{β}{x_{i}})}^{3 / 2}) \\ \times {(\sum_{i = 1}^{n} \frac{1}{2} (\frac{x_{i}}{β} + \frac{β}{x_{i}} - 2) + b_{2})}^{- \frac{n + 1}{2} - a_{2}} . \end{array}$

The conditional posterior distribution of $α^{2}$ given $β$ is defined by

(14) $p (α^{2} | x, β) \propto I G (\frac{n}{2} + a_{2}, \frac{1}{2} \sum_{i = 1}^{n} (\frac{x_{i}}{β} + \frac{β}{x_{i}} - 2) + b_{2}) .$

The samples in Eqs. (13) and (14) are derived using Markov Chain Monte Carlo methods.

Wang, Sun & Park (2016) employed the generalized ratio-of-uniforms method, introduced by Wakefield, Gelfand & Smith (1991), to generate posterior samples for $β$ . The method draws a pair of random variables $(u, v)$ uniformly over the region

(15) $A (r) = \{ (u, v) : 0 < u \leq {[p (\frac{v}{u^{r}} | x)]}^{1 / (r + 1)} \}, r \geq 0,$ where $r$ is a constant and $p (\cdot | x)$ is defined by using Eq. (13). Therefore, $β = \frac{v}{u^{r}}$ has density $\frac{p (β | x)}{\int p (β | x) d β}$ . To sample random data points in $A (r)$ , the random variables $(u, v)$ are generated from a uniform distribution over a two-dimensional bounded rectangle $[0, a (r)] \times [b^{-} (r), b^{+} (r)]$ that contains $A (r)$ .

Suppose that $a (r)$ , $b^{-} (r)$ , and $b^{+} (r)$ are given by

(16) $a (r) = \sup_{β > 0} \{ [p (β | x)]^{1 / (r + 1)} \},$

(17) $b^{-} (r) = \inf_{β > 0} \{ β [p (β | x)]^{r / (r + 1)} \},$ and

(18) $b^{+} (r) = \sup_{β > 0} \{ β [p (β | x)]^{r / (r + 1)} \} .$

Wang, Sun & Park (2016) showed that $a (r)$ and $b^{+} (r)$ are finite and $b^{-} (r) = 0$ . The candidate variate $β = \frac{v}{u^{r}}$ is accepted if $u \leq [p (β | x)]^{1 / (r + 1)}$ ; otherwise, the process is repeated. The posterior samples for $α^{2}$ are acquired using the LearnBayes package within the R software suite. Hence, the square root of each sampled $α^{2}$ gives a posterior sample of $α$ .

Therefore, the posterior distribution of $θ$ is obtained as

(19) $θ_{B a y e} = \frac{β}{4} {(α z_{p} + \sqrt{α^{2} z_{p}^{2} + 4})}^{2},$ where $z_{p} = Φ^{- 1} (p)$ is the standard normal $p$ -th quantile.

Therefore, the $100 (1 - γ) %$ two-sided credible interval for the percentile of Birnbaum-Saunders distribution using the Bayesian approach is given by

(20) $C I_{θ . B a y e} = [L_{θ . B a y e}, U_{θ . B a y e}] = [θ_{B a y e} (γ / 2), θ_{B a y e} (1 - γ / 2)],$ where $θ_{B a y e} (γ / 2)$ and $θ_{B a y e} (1 - γ / 2)$ denote the $100 (γ / 2)$ -th and $100 (1 - γ / 2)$ -th percentiles of $θ_{B a y e}$ , respectively.

Algorithm 3 was used to construct the Bayesian credible interval for the percentile of Birnbaum-Saunders distribution.

Step 1: Specify the values of $a_{1}$ , $a_{2}$ , $b_{1}$ , $b_{2}$ , and $r$ , then compute $a (r)$ using Eq. (16) and compute $b^{+} (r)$ using Eq. (18)

Step 2: At the $i$ -th step

(a) Generate $u$ from uniform distribution with parameters 0 and $a (r)$ , denoted as $U (0, a (r))$

(b) Generate $v$ from uniform distribution with parameters 0 and $b^{+} (r)$ , denoted as $U (0, b^{+} (r))$

(c) Compute $ρ = \frac{v}{u^{r}}$

(d) If $u \leq [p (β | x)]^{1 / (r + 1)}$ with $β = ρ$ , the value of $ρ$ is accepted, then set $β_{(i)} = ρ$ ; otherwise, repeat step (a)–step (c)

(e) Generate $λ$ from inverse gamma distribution with parameters $\frac{n}{2} + a_{2}$ and $\frac{1}{2} \sum_{j = 1}^{n} (\frac{x_{j}}{β_{(i)}} + \frac{β_{(i)}}{x_{j}} - 2) + b_{2}$ , denoted as $I G (\frac{n}{2} + a_{2}, \frac{1}{2} \sum_{j = 1}^{n} (\frac{x_{j}}{β_{(i)}} + \frac{β_{(i)}}{x_{j}} - 2) + b_{2})$ , and compute $α_{(i)} = \sqrt{λ}$

Step 3: Compute the posterior distribution of $θ$ , denoted as $θ_{B a y e}$ , using Eq. (19)

Step 4: Repeat step 2 and step 3 a total of M times and obtain an array of $θ_{B a y e}$ ’s

Step 5: Compute $L_{θ . B a y e} = θ_{B a y e} (γ / 2)$ and $U_{θ . B a y e} = θ_{B a y e} (1 - γ / 2)$

DOI: 10.7717/peerj.17019/table-8
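Algorithm 3 can be sketched in Python as shown below, with several simplifications that are our assumptions and not the authors' implementation. The posterior of Eq. (13) is evaluated on the log scale and rescaled so that its peak is about one, which leaves the ratio-of-uniforms method valid because it only needs an unnormalized density. The suprema in Eqs. (16) and (18) are approximated by a grid search over the range of the data and inflated by 5% so that the rectangle still covers $A (r)$. The inverse gamma draw is obtained as the reciprocal of a gamma draw instead of using the LearnBayes package. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def rbs(n, alpha, beta, rng):
    """Draw n Birnbaum-Saunders variates from standard normal draws."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out


def log_post_beta(beta, x, a1, b1, a2, b2):
    """Log of the unnormalized marginal posterior of beta, Eq. (13)."""
    n = len(x)
    q = sum(0.5 * (v / beta + beta / v - 2.0) for v in x) + b2
    return (-(n + a1 + 1.0) * math.log(beta) - b1 / beta
            + sum(math.log(math.sqrt(beta / v) + (beta / v) ** 1.5) for v in x)
            - ((n + 1.0) / 2.0 + a2) * math.log(q))


def posterior_theta(x, p, M=1000, r=2.0, hyper=1e-4, grid=400, rng=None):
    """Posterior draws of the 100p-th percentile via ratio-of-uniforms sampling."""
    rng = rng or random.Random()
    n, z_p = len(x), NormalDist().inv_cdf(p)
    a1 = b1 = a2 = b2 = hyper
    lo, hi = min(x), max(x)
    betas = [lo * (hi / lo) ** (i / (grid - 1)) for i in range(grid)]
    lps = [log_post_beta(b, x, a1, b1, a2, b2) for b in betas]
    shift = max(lps)                   # rescale so the posterior peak is about 1
    a_r = 1.05                         # Eq. (16) after rescaling, inflated by 5%
    b_plus = 1.05 * max(b * math.exp(r / (r + 1.0) * (lp - shift))
                        for b, lp in zip(betas, lps))  # Eq. (18), inflated by 5%
    thetas = []
    while len(thetas) < M:
        u, v = rng.uniform(0.0, a_r), rng.uniform(0.0, b_plus)
        if u <= 0.0 or v <= 0.0:
            continue
        rho = v / u ** r
        log_p = log_post_beta(rho, x, a1, b1, a2, b2) - shift
        if math.log(u) > log_p / (r + 1.0):
            continue                   # candidate rejected, draw (u, v) again
        rate = sum(0.5 * (xj / rho + rho / xj - 2.0) for xj in x) + b2
        lam = 1.0 / rng.gammavariate(n / 2.0 + a2, 1.0 / rate)  # Eq. (14)
        alpha = math.sqrt(lam)
        thetas.append(rho / 4.0 * (alpha * z_p + math.sqrt(lam * z_p**2 + 4.0)) ** 2)  # Eq. (19)
    return thetas


def equal_tailed(draws, gamma=0.05):
    """Equal-tailed credible interval from posterior draws, Eq. (20)."""
    s = sorted(draws)
    m = len(s)
    return s[int(math.floor(gamma / 2.0 * (m - 1)))], s[int(math.ceil((1.0 - gamma / 2.0) * (m - 1)))]


rng = random.Random(2024)  # seed chosen only for reproducibility
x = rbs(30, alpha=0.5, beta=1.0, rng=rng)
draws = posterior_theta(x, p=0.5, M=300, rng=rng)
lower, upper = equal_tailed(draws)
```

Because the rectangle is much larger than the acceptance region when the posterior is concentrated, most candidate pairs are rejected; the loop simply continues until $M$ draws have been accepted.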

Highest posterior density approach

The HPD interval was constructed using the posterior distribution of $θ$ as specified in Eq. (19). This interval has the smallest length among all intervals that contain $100 (1 - γ) %$ of the posterior probability. Every point within the interval has a greater posterior density than any point located outside of it, as explained by Box & Tiao (2011).

Therefore, the $100 (1 - γ) %$ two-sided credible interval for the percentile of Birnbaum-Saunders distribution using the HPD approach is given by

(21) $C I_{θ . H P D} = [L_{θ . H P D}, U_{θ . H P D}],$ where $L_{θ . H P D}$ and $U_{θ . H P D}$ are determined using the hdi function within the HDInterval package of the R software suite.

Algorithm 4 was used to construct the HPD interval for the percentile of the Birnbaum-Saunders distribution.

Step 1: Specify the values of $a_{1}$ , $a_{2}$ , $b_{1}$ , $b_{2}$ , and $r$ , then compute $a (r)$ using Eq. (16) and compute $b^{+} (r)$ using Eq. (18)

Step 2: At the $i$ -th step

(a) Generate $u$ from uniform distribution with parameters 0 and $a (r)$ , denoted as $U (0, a (r))$

(b) Generate $v$ from uniform distribution with parameters 0 and $b^{+} (r)$ , denoted as $U (0, b^{+} (r))$

(c) Compute $ρ = \frac{v}{u^{r}}$

(d) If $u \leq [p (β | x)]^{1 / (r + 1)}$ with $β = ρ$ , the value of $ρ$ is accepted, then set $β_{(i)} = ρ$ ; otherwise, repeat step (a)–step (c)

(e) Generate $λ$ from inverse gamma distribution with parameters $\frac{n}{2} + a_{2}$ and $\frac{1}{2} \sum_{j = 1}^{n} (\frac{x_{j}}{β_{(i)}} + \frac{β_{(i)}}{x_{j}} - 2) + b_{2}$ , denoted as $I G (\frac{n}{2} + a_{2}, \frac{1}{2} \sum_{j = 1}^{n} (\frac{x_{j}}{β_{(i)}} + \frac{β_{(i)}}{x_{j}} - 2) + b_{2})$ , and compute $α_{(i)} = \sqrt{λ}$

Step 3: Compute the posterior distribution of $θ$ , denoted as $θ_{B a y e}$ , using Eq. (19)

Step 4: Repeat step 2 and step 3 a total of M times and obtain an array of $θ_{B a y e}$ ’s

Step 5: Compute $L_{θ . H P D}$ and $U_{θ . H P D}$

DOI: 10.7717/peerj.17019/table-9
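Step 5 of Algorithm 4 relies on the hdi function in R. For readers without that package, a common sample-based construction is sketched below in Python: sort the posterior draws and take the narrowest window that contains the required share of them. Whether this matches the internals of hdi exactly is an assumption. The function names are ours, and the log-normal draws are only a skewed stand-in for an array of $θ_{B a y e}$ values.

```python
import math
import random


def hpd_interval(draws, gamma=0.05):
    """Shortest interval containing at least 100(1 - gamma)% of the draws."""
    s = sorted(draws)
    m = len(s)
    k = min(m, math.ceil((1.0 - gamma) * m))  # draws the interval must contain
    widths = [s[i + k - 1] - s[i] for i in range(m - k + 1)]
    i = widths.index(min(widths))
    return s[i], s[i + k - 1]


def central_interval(draws, gamma=0.05):
    """Window of the same size centred in the sorted draws, for comparison."""
    s = sorted(draws)
    m = len(s)
    k = min(m, math.ceil((1.0 - gamma) * m))
    i = (m - k) // 2
    return s[i], s[i + k - 1]


rng = random.Random(3)  # seed chosen only for reproducibility
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]  # stand-in posterior draws
hpd_lower, hpd_upper = hpd_interval(draws)
cen_lower, cen_upper = central_interval(draws)
```

By construction the HPD window is never wider than the central window of the same size, which is consistent with the shorter HPD lengths reported in the simulation results.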

Results

In this study, confidence intervals are proposed using the GCI approach, the bootstrap approach, the Bayesian approach, and the HPD approach. A Monte Carlo simulation study was carried out in R to evaluate these confidence intervals for the percentile of the Birnbaum-Saunders distribution, comparing them in terms of coverage probability and average length. The preferred confidence interval is the one that achieves a coverage probability at or above the nominal confidence level of 0.95 with the shortest average length.

Following Puggard, Niwitpong & Niwitpong (2022), normal random variables were used to generate the Birnbaum-Saunders random variables. For percentiles, the shape parameter was set as $α =$ 0.10, 0.25, 0.50, 0.75, and 1.00, while the scale parameter was fixed at $β =$ 1.00. Moreover, for the Bayesian credible interval and the HPD interval, we considered $r =$ 2.00 and set the hyperparameters $a_{1}$ , $a_{2}$ , $b_{1}$ and $b_{2}$ to $10^{- 4}$ . We conducted 5,000 replications, with 5,000 GPQ draws for the GCI, B = 500 for the bootstrap confidence interval, and M = 1,000 for the Bayesian credible interval and HPD interval.
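The generation of Birnbaum-Saunders variates from normal variates rests on a standard representation: if $Z$ is standard normal, then $β {(α Z / 2 + \sqrt{{(α Z / 2)}^{2} + 1})}^{2}$ follows the Birnbaum-Saunders distribution with parameters $α$ and $β$ . A minimal Python sketch is shown below; the function names and the seed are illustrative, and the article does not list its own generator.

```python
import math
import random


def bs_from_normal(z, alpha, beta):
    """Map a standard normal value z to a BS(alpha, beta) value."""
    w = alpha * z / 2.0
    return beta * (w + math.sqrt(w * w + 1.0)) ** 2


def rbs(n, alpha, beta, rng):
    """Draw n BS(alpha, beta) variates from n standard normal draws."""
    return [bs_from_normal(rng.gauss(0.0, 1.0), alpha, beta) for _ in range(n)]


rng = random.Random(2024)  # seed chosen only for reproducibility
sample = rbs(30, alpha=0.5, beta=1.0, rng=rng)
```

The transformation is monotone in $z$ and inverts to $z = (\sqrt{x / β} - \sqrt{β / x}) / α$ , which is the argument of the normal distribution function in the Birnbaum-Saunders distribution function.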

Algorithm 5 was used to compute the coverage probabilities and average lengths of the proposed confidence intervals for the percentile of Birnbaum-Saunders distribution.

Step 1: Use Algorithm 1–Algorithm 4 to construct the confidence intervals

Step 2: If $L_{θ} \leq θ \leq U_{θ}$ , set $p =$ 1; else set $p =$ 0

Step 3: Compute $U_{θ} - L_{θ}$

Step 4: Repeat step 1–step 3 a total of 5,000 times

Step 5: Compute the mean of $p$ , which defines the coverage probability

Step 6: Compute the mean of $U_{θ} - L_{θ}$ , which defines the average length

DOI: 10.7717/peerj.17019/table-10
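The coverage and length computation of Algorithm 5 can be sketched as a small Python harness that accepts any interval constructor. To keep the example self-contained it is paired with a compact version of the GCI and run with far fewer replications and GPQ draws than the study used (200 and 500 here, versus 5,000 and 5,000), so its output is only a rough check and will not reproduce Table 1. The function names, the seed, and the guard against a quadratic without real roots are assumptions of this sketch.

```python
import math
import random
from statistics import NormalDist


def rbs(n, alpha, beta, rng):
    """Draw n Birnbaum-Saunders variates from standard normal draws."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + math.sqrt(w * w + 1.0)) ** 2)
    return out


def theta(p, alpha, beta):
    """True 100p-th percentile, Eq. (2)."""
    z = NormalDist().inv_cdf(p)
    return beta / 4.0 * (alpha * z + math.sqrt(alpha**2 * z**2 + 4.0)) ** 2


def gci(x, p, gamma, M, rng):
    """Compact GCI for the percentile (Eqs. 3-6)."""
    n, z = len(x), NormalDist().inv_cdf(p)
    I = sum(math.sqrt(v) for v in x) / n
    J = sum(1.0 / math.sqrt(v) for v in x) / n
    K = sum((math.sqrt(v) - I) ** 2 for v in x)
    L = sum((1.0 / math.sqrt(v) - J) ** 2 for v in x)
    s1, s2 = sum(x), sum(1.0 / v for v in x)
    out = []
    while len(out) < M:
        t = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate((n - 1) / 2.0, 2.0) / (n - 1))
        A = (n - 1) * J * J - L * t * t / n
        B = (n - 1) * I * J - (1.0 - I * J) * t * t
        C = (n - 1) * I * I - K * t * t / n
        disc = B * B - A * C
        if A == 0.0 or disc < 0.0:
            continue  # added guard (assumption): resimulate T
        roots = ((B - math.sqrt(disc)) / A, (B + math.sqrt(disc)) / A)
        rb = max(roots) if t <= 0.0 else min(roots)
        if rb <= 0.0:
            continue
        ra2 = max(s2 * rb * rb - 2.0 * n * rb + s1, 0.0) / (rb * rng.gammavariate(n / 2.0, 2.0))
        out.append(rb / 4.0 * (math.sqrt(ra2) * z + math.sqrt(ra2 * z * z + 4.0)) ** 2)
    out.sort()
    return out[int(math.floor(gamma / 2.0 * (M - 1)))], out[int(math.ceil((1.0 - gamma / 2.0) * (M - 1)))]


def coverage_and_length(interval_fn, n, p, alpha, beta, reps, rng):
    """Steps 1-6 of Algorithm 5 for one interval constructor."""
    true_theta = theta(p, alpha, beta)
    hits, total_length = 0, 0.0
    for _ in range(reps):
        x = rbs(n, alpha, beta, rng)
        lower, upper = interval_fn(x, p)
        hits += 1 if lower <= true_theta <= upper else 0
        total_length += upper - lower
    return hits / reps, total_length / reps


rng = random.Random(7)  # seed chosen only for reproducibility
cp, avg_len = coverage_and_length(lambda x, p: gci(x, p, 0.05, 500, rng),
                                  n=30, p=0.5, alpha=0.5, beta=1.0, reps=200, rng=rng)
```

Passing the interval constructor as an argument lets the same harness evaluate the bootstrap, Bayesian, and HPD intervals without changing the loop.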

The performances of the proposed confidence intervals for the percentile of the Birnbaum-Saunders distribution are presented in Table 1 and displayed in Figs. 1 and 2. From Table 1, for $n \leq 50$ , the coverage probabilities of the bootstrap confidence interval, the Bayesian credible interval, and the HPD interval were all lower than the nominal confidence level of 0.95, whereas the GCI had coverage probabilities close to 0.95 and exceeded it in some cases. For $n = 100$ , the coverage probabilities of both the GCI and the Bayesian credible interval reached or exceeded the nominal confidence level of 0.95 in some cases. Figures 1 and 2 present the coverage probabilities and average lengths of the confidence intervals for the percentile, corresponding to various sample sizes and shape parameters, respectively. According to Fig. 1, the coverage probabilities moved closer to the nominal confidence level of 0.95 as the sample size increased. Furthermore, the average lengths of all approaches decreased as the sample size increased. Based on the simulation results presented in Fig. 2, the coverage probabilities were close to the nominal confidence level of 0.95 as the shape parameter increased. Additionally, the average lengths of all approaches increased with the shape parameter.

Table 1:

The coverage probabilities (with average lengths in parentheses) of 95% two-sided confidence intervals for the percentile of Birnbaum-Saunders distribution.

$n$	$p$	$α$	$C I_{θ . G C I}$	$C I_{θ . B}$	$C I_{θ . B a y e}$	$C I_{θ . H P D}$
10	0.50	0.10	0.9494 (0.1388)	0.9008 (0.1130)	0.9354 (0.1294)	0.9320 (0.1281)
		0.25	0.9514 (0.3503)	0.8952 (0.2842)	0.9356 (0.3261)	0.9284 (0.3217)
		0.50	0.9490 (0.7066)	0.9020 (0.5630)	0.9334 (0.6552)	0.9330 (0.6386)
		0.75	0.9474 (1.0770)	0.9026 (0.8353)	0.9320 (0.9837)	0.9290 (0.9434)
		1.00	0.9484 (1.4662)	0.9044 (1.0969)	0.9340 (1.3028)	0.9352 (1.2265)
30	0.50	0.10	0.9498 (0.0738)	0.9298 (0.0689)	0.9424 (0.0721)	0.9412 (0.0715)
		0.25	0.9480 (0.1840)	0.9296 (0.1718)	0.9428 (0.1801)	0.9382 (0.1784)
		0.50	0.9510 (0.3628)	0.9342 (0.3369)	0.9466 (0.3543)	0.9436 (0.3498)
		0.75	0.9462 (0.5350)	0.9280 (0.4917)	0.9396 (0.5177)	0.9358 (0.5088)
		1.00	0.9526 (0.6948)	0.9408 (0.6287)	0.9496 (0.6623)	0.9466 (0.6475)
50	0.50	0.10	0.9422 (0.0564)	0.9308 (0.0540)	0.9400 (0.0556)	0.9352 (0.0552)
		0.25	0.9488 (0.1408)	0.9372 (0.1349)	0.9446 (0.1388)	0.9436 (0.1376)
		0.50	0.9490 (0.2754)	0.9398 (0.2625)	0.9460 (0.2711)	0.9438 (0.2681)
		0.75	0.9490 (0.4035)	0.9356 (0.3812)	0.9446 (0.3937)	0.9426 (0.3884)
		1.00	0.9512 (0.5232)	0.9382 (0.4869)	0.9444 (0.5027)	0.9432 (0.4946)
100	0.50	0.10	0.9560 (0.0396)	0.9466 (0.0385)	0.9506 (0.0392)	0.9486 (0.0389)
		0.25	0.9472 (0.0982)	0.9374 (0.0957)	0.9452 (0.0974)	0.9414 (0.0966)
		0.50	0.9526 (0.1925)	0.9438 (0.1869)	0.9490 (0.1903)	0.9462 (0.1885)
		0.75	0.9486 (0.2804)	0.9442 (0.2704)	0.9500 (0.2754)	0.9476 (0.2724)
		1.00	0.9520 (0.3595)	0.9428 (0.3419)	0.9494 (0.3481)	0.9454 (0.3440)

DOI: 10.7717/peerj.17019/table-1

Note:

Bold font means the confidence interval with coverage probability greater than or equal to 0.95 and the shortest average length.

Figure 1: Comparison of the coverage probabilities and average lengths of the confidence intervals for the percentile according to sample sizes.
(A) Coverage probability (B) Average lengths.

DOI: 10.7717/peerj.17019/fig-1

Figure 2: Comparison of the coverage probabilities and average lengths of the confidence intervals for the percentile according to shape parameters.
(A) Coverage probability (B) Average lengths.

DOI: 10.7717/peerj.17019/fig-2

Empirical application

The GCI, bootstrap, Bayesian, and HPD approaches can be employed to calculate confidence intervals for the percentile of PM2.5 levels in Mae Hong Son province and Lampang province, Thailand.

Table 2 contains data on daily PM2.5 levels reported by the Pollution Control Department from January 1 to June 30, 2023. Figure 3 provides histograms illustrating the distribution of daily PM2.5 levels in Mae Hong Son and Lampang provinces. Due to the positively skewed nature of the data, seven distribution models were considered: normal, log-normal, Weibull, gamma, exponential, Cauchy, and Birnbaum-Saunders distributions. The suitability of these models for fitting the daily PM2.5 level data was assessed using the Akaike Information Criterion (AIC). Table 3 displays the AIC values for these seven probability models, calculated based on the PM2.5 level data from both provinces. The results in Table 3 indicated that the Birnbaum-Saunders distribution is the most appropriate model for fitting the daily PM2.5 level data in both Mae Hong Son and Lampang provinces, as it yields the lowest AIC value.

Table 2:

Daily PM2.5 levels data in Mae Hong Son and Lampang provinces.

Provinces	Daily PM2.5 levels ( $μ g / m^{3}$ )
Mae Hong Son	25	21	19	18	18	20	16	23	25	26
	20	18	17	21	27	28	24	28	27	28
	25	21	26	30	36	38	40	42	41	35
	40	41	47	42	41	34	30	31	37	39
	54	64	78	69	57	70	68	13	9	15
	21	27	28	34	36	48	59	66	75	78
	79	92	97	103	104	91	91	111	118	142
	216	134	53	59	62	60	98	116	104	148
	137	144	255	309	244	237	209	263	321	256
	233	144	109	147	176	204	203	147	117	108
	124	119	104	106	166	214	138	61	59	56
	53	50	38	34	34	49	39	36	50	49
	34	23	23	22	28	29	31	26	13	18
	8	8	5	11	12	13	17	18	19	19
	21	19	19	15	17	17	16	17	13	10
	11	8.3	11.6	13.3	10.3	6.2	5.5	5.8	7.1	14.1
	13.5	15.2	10	10.1	10.2	10.2	9	5.5	3.7	3.7
	3.9	3.6	3.3	4.2	4.2	4.4	4.4	5	5.5	4.8
	7.1
Lampang	29	30	28	30	27	21	19	33	39	32
	37	33	30	27	32	41	43	39	41	40
	44	50	49	57	60	59	71	62	59	67
	88	87	80	82	68	52	40	39	65	106
	108	113	125	158	211	153	42	23	19	27
	57	67	74	72	88	92	59	96	143	155
	150	138	132	119	100	84	90	97	101	99
	103	64	49	48	53	41	51	34	32	39
	42	51	62	129	119	99	121	94	103	148
	148	86	64	92	172	185	182	113	80	72
	96	133	130	100	122	119	82	89	91	93
	91	60	55	37	61	48	30	38	44	34
	36	63	32	37	48	52	51	46	17	19
	21	20	16	17	22	23	23	35	33	31
	36	27	24	23	29	31	28	34	25	22
	18	18.6	17.4	20.3	20.6	16.1	16.8	16.7	14.9	16.5
	19.2	20.4	21.6	18.7	16.91	8.2	16.3	16	14.8	15.6
	14.8	13.5	13.3	14.7	12.8	13	12.5	12.7	14.4	13
	12.7

DOI: 10.7717/peerj.17019/table-2

Note:

Pollution Control Department, Thailand http://air4thai.pcd.go.th/webV3/#/History.

Figure 3: Histograms of the daily PM2.5 level data for (A) Mae Hong Son and (B) Lampang provinces.


DOI: 10.7717/peerj.17019/fig-3

Table 3:

The estimated AIC values for the probability models using the PM2.5 level data from Mae Hong Son and Lampang provinces.

Distributions	Mae Hong Son province	Lampang province
Normal	2,031.7990	1,876.4660
Log-normal	1,818.1560	1,787.6230
Weibull	1,836.4100	1,802.0070
Gamma	1,836.9690	1,795.2050
Exponential	1,834.9720	1,836.3210
Cauchy	1,946.9400	1,910.7870
Birnbaum-Saunders	1,812.9130	1,782.0890

DOI: 10.7717/peerj.17019/table-3

Note:

In the original table, bold font marks the distribution with the lowest AIC value, which is the Birnbaum-Saunders distribution for both provinces.
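The AIC comparison can be illustrated with a minimal sketch, using AIC = 2k - 2 ln L with k the number of fitted parameters. It is not the authors' R code. It covers only three of the seven models, uses only the first ten Mae Hong Son values from Table 2, and, as an assumption made to keep the example closed-form, evaluates the Birnbaum-Saunders likelihood at the modified moment estimates rather than at the maximum likelihood estimates, so it does not reproduce the values in Table 3.

```python
import math

def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L, where k is the number of fitted parameters."""
    return 2 * n_params - 2 * log_likelihood

def exponential_loglik(x):
    """Maximised exponential log-likelihood (the MLE of the rate is 1 / mean)."""
    n = len(x)
    rate = n / sum(x)
    return n * math.log(rate) - rate * sum(x)

def lognormal_loglik(x):
    """Maximised log-normal log-likelihood (MLEs of the log-scale mean and variance)."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - n / 2 * math.log(2 * math.pi * var) - n / 2

def birnbaum_saunders_loglik(x):
    """Birnbaum-Saunders log-likelihood at modified moment estimates.

    Assumption: moment-type estimates are used instead of the MLE, so the
    resulting AIC is an approximation (slightly larger than the true AIC).
    """
    n = len(x)
    s = sum(x) / n                      # arithmetic mean
    r = n / sum(1 / v for v in x)       # harmonic mean
    alpha = math.sqrt(2 * (math.sqrt(s / r) - 1))
    beta = math.sqrt(s * r)
    loglik = 0.0
    for t in x:
        up = math.sqrt(t / beta)
        down = math.sqrt(beta / t)
        z = (up - down) / alpha
        # density f(t) = phi(z) * dz/dt with dz/dt = (up + down) / (2 * alpha * t)
        loglik += -0.5 * math.log(2 * math.pi) - 0.5 * z * z
        loglik += math.log((up + down) / (2 * alpha * t))
    return loglik

# First row of the Mae Hong Son data in Table 2 (illustration only).
pm25 = [25, 21, 19, 18, 18, 20, 16, 23, 25, 26]

aic_values = {
    "exponential": aic(exponential_loglik(pm25), 1),
    "log-normal": aic(lognormal_loglik(pm25), 2),
    "Birnbaum-Saunders": aic(birnbaum_saunders_loglik(pm25), 2),
}
best = min(aic_values, key=aic_values.get)
for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.2f}")
print("lowest AIC:", best)
```

The model with the smallest AIC is preferred; with the full 181 observations per province this comparison is what Table 3 summarises.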

Table 4 reports summary statistics of the daily PM2.5 level data in Mae Hong Son and Lampang provinces. Table 5 displays 95% two-sided confidence intervals for the percentile of daily PM2.5 level data in these provinces, using the GCI, bootstrap, Bayesian, and HPD approaches. All four confidence intervals contain the estimated percentile $\hat{θ}$ of the corresponding province (31.93 for Mae Hong Son and 45.46 for Lampang; Table 4). Regarding interval length, the HPD approach yielded the shortest intervals and the GCI approach the longest. In the simulation, the bootstrap, Bayesian, and HPD intervals were likewise shorter than the GCI interval, but their coverage probabilities were below the nominal confidence level of 0.95. Moreover, the coverage probability and average length in the simulation were calculated from 5,000 random samples, whereas the length in the example was computed from a single sample. Consequently, the shorter intervals in the example do not offset the coverage shortfall, and the bootstrap, Bayesian, and HPD approaches are not recommended for constructing confidence intervals for the percentile.

Table 4:

Sample statistics for the daily PM2.5 level data in Mae Hong Son and Lampang provinces.

Statistics	Mae Hong Son province	Lampang province
Minimum	3.30	8.20
Mean	58.18	58.39
Maximum	321.00	211.00
$n$	181	181
$\hat{α}$	1.25	0.78
$\hat{β}$	31.93	45.46
$\hat{θ}$	31.93	45.46

DOI: 10.7717/peerj.17019/table-4
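The equality of $\hat{θ}$ and $\hat{β}$ in Table 4 can be checked against the standard quantile function of the Birnbaum-Saunders distribution, $θ_p = (β/4)\,[α z_p + \sqrt{α^2 z_p^2 + 4}]^2$, where $z_p$ is the standard normal quantile. The sketch below assumes that the example uses the 50th percentile (p = 0.5); this is inferred from the table rather than stated in this section. The function name `bs_percentile` is illustrative.

```python
import math
from statistics import NormalDist

def bs_percentile(alpha, beta, p):
    """p-th quantile of a Birnbaum-Saunders(alpha, beta) distribution."""
    z_p = NormalDist().inv_cdf(p)  # standard normal quantile
    return beta / 4 * (alpha * z_p + math.sqrt(alpha**2 * z_p**2 + 4)) ** 2

# Point estimates from Table 4; p = 0.5 is an assumption (see the lead-in).
estimates = {"Mae Hong Son": (1.25, 31.93), "Lampang": (0.78, 45.46)}
for province, (alpha_hat, beta_hat) in estimates.items():
    theta_hat = bs_percentile(alpha_hat, beta_hat, 0.5)
    print(f"{province}: estimated 50th percentile = {theta_hat:.2f}")
```

At p = 0.5 the normal quantile is zero, so the percentile reduces to β regardless of α, which is consistent with the identical $\hat{β}$ and $\hat{θ}$ rows in Table 4.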

Table 5:

The lower limit ($L_{θ}$) and upper limit ($U_{θ}$) of the 95% confidence intervals for the percentile of the daily PM2.5 level data in Mae Hong Son and Lampang provinces.

Approaches	Mae Hong Son province: $[L_{θ}, U_{θ}]$	Mae Hong Son province: interval length	Lampang province: $[L_{θ}, U_{θ}]$	Lampang province: interval length
GCI	$[27.8787, 38.3599]$	10.4812	$[40.0647, 49.7619]$	9.6972
Bootstrap	$[28.0199, 38.1855]$	10.1656	$[40.0333, 49.6188]$	9.5855
Bayesian	$[27.9843, 37.7447]$	9.7604	$[40.5136, 49.7726]$	9.2590
HPD	$[28.0339, 37.7767]$	9.7428	$[40.4549, 49.5468]$	9.0919

DOI: 10.7717/peerj.17019/table-5
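As a consistency check on Table 5, the short sketch below recomputes each interval length as $U_{θ} - L_{θ}$ and tests whether each interval contains the point estimate $\hat{θ}$ from Table 4. All numbers are copied from Tables 4 and 5; the variable names are illustrative.

```python
# (lower, upper, reported length) per approach, copied from Table 5.
table5 = {
    "Mae Hong Son": {
        "GCI": (27.8787, 38.3599, 10.4812),
        "Bootstrap": (28.0199, 38.1855, 10.1656),
        "Bayesian": (27.9843, 37.7447, 9.7604),
        "HPD": (28.0339, 37.7767, 9.7428),
    },
    "Lampang": {
        "GCI": (40.0647, 49.7619, 9.6972),
        "Bootstrap": (40.0333, 49.6188, 9.5855),
        "Bayesian": (40.5136, 49.7726, 9.2590),
        "HPD": (40.4549, 49.5468, 9.0919),
    },
}
theta_hat = {"Mae Hong Son": 31.93, "Lampang": 45.46}  # from Table 4

checks = []
for province, rows in table5.items():
    for approach, (lower, upper, reported) in rows.items():
        length = upper - lower
        length_ok = abs(length - reported) < 5e-5   # agreement to 4 decimals
        contains = lower <= theta_hat[province] <= upper
        checks.append((province, approach, length_ok, contains))
        print(f"{province:12s} {approach:9s} length={length:.4f} "
              f"matches={length_ok} contains_estimate={contains}")

shortest = {p: min(rows, key=lambda a: rows[a][1] - rows[a][0])
            for p, rows in table5.items()}
print("shortest interval per province:", shortest)
```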

Discussion

In environmental science and air quality studies, a percentile gives the level below which a stated share of PM2.5 observations falls, and so characterizes the majority of PM2.5 levels. A confidence interval for the percentile therefore provides an interval estimate of the predominant PM2.5 levels. The estimated percentile of PM2.5 levels can in turn support strategic planning for the reduction of air pollutants.

Puggard, Niwitpong & Niwitpong (2022) introduced the HPD approach for constructing confidence intervals for the variance and difference of variances of Birnbaum-Saunders distributions. In certain situations, however, percentiles may be more suitable than the variance. Hence, the aim of this study was to estimate the percentiles of the Birnbaum-Saunders distribution. Confidence intervals for the percentile of the Birnbaum-Saunders distribution were constructed using the GCI, bootstrap, Bayesian, and HPD approaches. All four approaches utilized simulation data to create these confidence intervals. The GCI approach uses generalized pivotal quantities (GPQs) for interval construction, while the bootstrap approach relies on the sampling distribution. In contrast, the Bayesian and HPD approaches are rooted in prior distributions. The GCI approach exhibited strong performance, and the study results suggest that it is the preferred method for constructing confidence intervals for the percentile of the Birnbaum-Saunders distribution. This conclusion is consistent with the findings of previous studies by Ye, Ma & Wang (2010), Thangjai, Niwitpong & Niwitpong (2018), and Thangjai & Niwitpong (2022).
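To make the bootstrap idea concrete, the following is a minimal sketch of a plain parametric percentile bootstrap for the Birnbaum-Saunders percentile. It is not the authors' procedure: it applies no bias correction, it uses modified moment estimators as an assumption, and the function names, seed, and number of replicates are illustrative. The data are the first ten Mae Hong Son values from Table 2.

```python
import math
import random
from statistics import NormalDist

def mme(x):
    """Modified moment estimates (alpha, beta) from arithmetic and harmonic means."""
    n = len(x)
    s = sum(x) / n
    r = n / sum(1 / v for v in x)
    alpha = math.sqrt(max(0.0, 2 * (math.sqrt(s / r) - 1)))
    beta = math.sqrt(s * r)
    return alpha, beta

def bs_percentile(alpha, beta, p):
    """p-th quantile of a Birnbaum-Saunders(alpha, beta) distribution."""
    z_p = NormalDist().inv_cdf(p)
    return beta / 4 * (alpha * z_p + math.sqrt(alpha**2 * z_p**2 + 4)) ** 2

def bs_sample(alpha, beta, n, rng):
    """Draw n variates via T = (beta/4) * (alpha*Z + sqrt(alpha^2 Z^2 + 4))^2, Z ~ N(0, 1)."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        out.append(beta / 4 * (alpha * z + math.sqrt(alpha**2 * z**2 + 4)) ** 2)
    return out

def bootstrap_ci(x, p=0.5, level=0.95, n_boot=2000, seed=1):
    """Percentile bootstrap interval for the p-th Birnbaum-Saunders quantile."""
    rng = random.Random(seed)
    alpha_hat, beta_hat = mme(x)
    replicates = []
    for _ in range(n_boot):
        resample = bs_sample(alpha_hat, beta_hat, len(x), rng)
        replicates.append(bs_percentile(*mme(resample), p))
    replicates.sort()
    lower_idx = int((1 - level) / 2 * n_boot)
    upper_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot) - 1)
    return replicates[lower_idx], replicates[upper_idx]

# First row of the Mae Hong Son data in Table 2 (illustration only).
pm25 = [25, 21, 19, 18, 18, 20, 16, 23, 25, 26]
alpha_hat, beta_hat = mme(pm25)
lower, upper = bootstrap_ci(pm25)
print(f"point estimate of the median: {beta_hat:.2f}")
print(f"95% bootstrap interval: [{lower:.2f}, {upper:.2f}]")
```

Because only ten observations are used, the resulting interval is not comparable with the bootstrap row of Table 5, which is based on all 181 observations per province.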

Conclusion

Confidence intervals for the percentile of the Birnbaum-Saunders distribution were constructed using four approaches: GCI, bootstrap, Bayesian, and HPD. In the simulation, the average lengths of the bootstrap, Bayesian, and HPD intervals were shorter than those of the GCI intervals, but their coverage probabilities fell below the nominal confidence level of 0.95. As a result, the bootstrap, Bayesian, and HPD approaches are not recommended for constructing confidence intervals for the percentile. The findings consistently identify the GCI approach as the most reliable in terms of coverage probability, and it is therefore recommended for constructing confidence intervals for the percentile.
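The two evaluation criteria, coverage probability and average length, can be sketched with a small Monte Carlo loop. Coverage is the fraction of simulated samples whose interval contains the true percentile, and average length is the mean of $U_{θ} - L_{θ}$. To keep the example short, the interval plugged in here is a stand-in and not one of the four approaches compared in the article: a normal-approximation interval for the mean of log T, exponentiated, which targets the median β because log(T/β) is symmetric about zero for the Birnbaum-Saunders distribution. The parameter values, sample size, seed, and number of replications are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def bs_sample(alpha, beta, n, rng):
    """Draw n variates via T = (beta/4) * (alpha*Z + sqrt(alpha^2 Z^2 + 4))^2, Z ~ N(0, 1)."""
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        out.append(beta / 4 * (alpha * z + math.sqrt(alpha**2 * z**2 + 4)) ** 2)
    return out

def stand_in_median_interval(x, level=0.95):
    """Stand-in interval for the median beta (not a method from the article)."""
    logs = [math.log(v) for v in x]
    z = NormalDist().inv_cdf((1 + level) / 2)
    half_width = z * stdev(logs) / math.sqrt(len(logs))
    centre = mean(logs)
    return math.exp(centre - half_width), math.exp(centre + half_width)

def coverage_and_length(interval_fn, alpha, beta, n, n_rep=2000, seed=2024):
    """Monte Carlo coverage probability and average length for the median beta."""
    rng = random.Random(seed)
    hits = 0
    total_length = 0.0
    for _ in range(n_rep):
        lower, upper = interval_fn(bs_sample(alpha, beta, n, rng))
        hits += lower <= beta <= upper   # True counts as 1
        total_length += upper - lower
    return hits / n_rep, total_length / n_rep

# Hypothetical settings chosen only for illustration.
coverage, avg_length = coverage_and_length(
    stand_in_median_interval, alpha=0.5, beta=1.0, n=30
)
print(f"coverage probability: {coverage:.3f}")
print(f"average length: {avg_length:.3f}")
```

Any interval constructor with the same signature, for example a GCI or bootstrap routine, could be passed as `interval_fn`; an approach is preferred when its coverage is at or above the nominal level and, among such approaches, its average length is smallest.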

Supplemental Information

Daily PM2.5 levels data in Mae Hong Son and Lampang provinces.

Source: Pollution Control Department, Thailand. http://air4thai.pcd.go.th/webV3/#/History.

DOI: 10.7717/peerj.17019/supp-1


R code to construct histograms of PM2.5 data from (A) Mae Hong Son Province (B) Lampang Province.

DOI: 10.7717/peerj.17019/supp-2


R code to construct confidence intervals from all methods in this article.

DOI: 10.7717/peerj.17019/supp-3


[1] Abdollahnezhad K, Jafari AA. 2018. Testing the equality of quantiles for several normal populations. Communications in Statistics-Simulation and Computation 47(7):1890-1898

[2] Achcar JA. 1993. Inferences for the Birnbaum-Saunders fatigue life model using Bayesian methods. Computational Statistics and Data Analysis 15(4):367-380

[3] Albers W, Löhnberg P. 1984. An approximate confidence interval for the difference between quantiles in a biomedical problem. Statistica Neerlandica 38(1):20-22

[4] Bhattacharyya G, Fries A. 1982. Fatigue failure models-Birnbaum-Saunders vs. inverse Gaussian. IEEE Transactions on Reliability 31(5):439-441

[5] Birnbaum ZW, Saunders SC. 1969a. A new family of life distributions. Journal of Applied Probability 6(2):319-327

[6] Birnbaum ZW, Saunders SC. 1969b. Estimation for a family of life distributions with applications to fatigue. Journal of Applied Probability 6(2):328-347

[7] Box GEP, Tiao GC. 2011. Bayesian inference in statistical analysis. New York, USA: John Wiley & Sons.

[8] Broomandi P, Geng X, Guo W, Pagani A, Topping D, Kim JR. 2021. Dynamic complex network analysis of PM2.5 concentrations in the UK, using hierarchical directed graphs (V1.0.0). Sustainability 13(2201):1-14

[9] Chang DS, Tang LC. 1994. Percentile bounds and tolerance limits for the Birnbaum-Saunders distribution. Communications in Statistics-Theory and Methods 23(10):2853-2863

[10] Cox TF, Jaber K. 1985. Testing the equality of two normal percentiles. Communications in Statistics-Simulation and Computation 14(2):345-356

[11] Desmond AF. 1986. On the relationship between two fatigue-life models. IEEE Transactions on Reliability 35(2):167-169

[12] Engelhardt M, Bain LJ, Wright FT. 1981. Inference on the parameters of the Birnbaum-Saunders fatigue life distribution based on maximum likelihood estimation. Technometrics 23(3):251-255

[13] Galán-Madruga D, Broomandi P, Satyanaga A, Jahanbakhshi A, Bagheri M, Fathian A, Sarvestan R, Cárdenas-Escudero J, Cáceres JO, Kumar P, Kim JR. 2023. A methodological framework for estimating ambient PM2.5 particulate matter concentrations in the UK. Journal of Environmental Sciences (in press).

[14] Guo H, Krishnamoorthy K. 2005. Comparison between two quantiles: the normal and exponential cases. Communications in Statistics-Simulation and Computation 34(2):243-252

[15] Guo X, Wu H, Li G, Li Q. 2017. Inference for the common mean of several Birnbaum-Saunders populations. Journal of Applied Statistics 44(5):941-954

[16] Harrell FE, Davis CE. 1982. A new distribution-free quantile estimator. Biometrika 69(3):635-640

[17] Hasan MS, Krishnamoorthy K. 2018. Confidence intervals for the mean and a percentile based on zero-inflated lognormal data. Journal of Statistical Computation and Simulation 88(8):1499-1514

[18] Huang L-F, Johnson RA. 2006. Confidence regions for the ratio of percentiles. Statistics and Probability Letters 76(4):384-392

[19] Jayalath KP. 2021. Fiducial inference on the right censored Birnbaum-Saunders data via Gibbs sampler. Stats 4(2):385-399

[20] Kaigh WD, Lachenbruch PA. 1982. A generalized quantile estimator. Communications in Statistics-Theory and Methods 11(19):2217-2238

[21] Leiva V, Barros M, Paula GA, Sanhueza A. 2008. Generalized Birnbaum-Saunders distributions applied to air pollutant concentration. Environmetrics 19(3):235-249

[22] Li YL, Xu AC. 2016. Fiducial inference for Birnbaum-Saunders distribution. Journal of Statistical Computation and Simulation 86(9):1673-1685

[23] Lu MC, Chang DS. 1997. Bootstrap prediction intervals for the Birnbaum-Saunders distribution. Microelectronics Reliability 37(8):1213-1216

[24] MacKinnon JG, Smith JAA. 1998. Approximate bias correction in econometrics. Journal of Econometrics 85(2):205-230

[25] Marshall AW, Walsh JE. 1950. Some test for comparing percentage points of two arbitrary continuous populations.

[26] Navruz G, Özdemir AF. 2018. Quantile estimation and comparing two independent groups with an approach based on percentile bootstrap. Communications in Statistics-Simulation and Computation 47(2):2119-2138

[27] Ng HKT, Kundu D, Balakrishnan N. 2003. Modified moment estimation for the two-parameter Birnbaum-Saunders distribution. Computational Statistics and Data Analysis 43(3):283-298

[28] Niu C, Guo X, Xu W, Zhu L. 2014. Comparison of several Birnbaum–Saunders distributions. Journal of Statistical Computation and Simulation 84(12):2721-2733

[29] Padgett WJ, Tomlinson MA. 2003. Lower confidence bounds for percentiles of Weibull and Birnbaum-Saunders distributions. Journal of Statistical Computation and Simulation 73(6):429-443

[30] Price RM, Bonett DG. 2002. Distribution-free confidence intervals for difference and ratio of medians. Journal of Statistical Computation and Simulation 72(2):119-124

[31] Puggard W, Niwitpong S-A, Niwitpong S. 2022. Confidence intervals for the variance and difference of variances of Birnbaum-Saunders distributions. Journal of Statistical Computation and Simulation 92(13):2829-2845

[32] Thangjai W, Niwitpong S. 2020. Comparing particulate matter dispersion in Thailand using Bayesian confidence intervals for ratio of coefficients of variation. Statistics in Transition New Series 21(5):41-60

[33] Thangjai W, Niwitpong S-A. 2022. The relative potency of two drugs using the confidence interval for ratio of means of two normal populations with unknown coefficients of variation. Journal of Statistics Applications and Probability 11(1):1-14

[34] Thangjai W, Niwitpong S-A, Niwitpong S. 2018. Confidence intervals for variance and difference between variances of one-parameter exponential distributions. Advances and Applications in Statistics 53(3):259-283

[35] Wakefield JC, Gelfand AE, Smith AFM. 1991. Efficient generation of random variates via the ratio-of-uniforms method. Statistics and Computing 1(2):129-133