A new extension of Poisson distribution for asymmetric count data: theory, classical and Bayesian estimation with application to lifetime data

Abdullah Alomair; Muhammad Ahsan-ul-Haq

doi:10.7717/peerj-cs.1748

Abdullah Alomair ¹, Muhammad Ahsan-ul-Haq²

1Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia

2College of Statistical Sciences, University of the Punjab, Lahore, Pakistan

Published: 2023-12-15
Accepted: 2023-11-20
Received: 2023-07-19

Academic Editor: Kumer Das

Subject Areas: Data Science, Optimization Theory and Computation
Keywords: Poisson-mixture, Dispersion, Moments, Estimation, Bayesian, Censoring, Data analysis

Copyright: © 2023 Alomair and Ahsan-ul-Haq
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Alomair A, Ahsan-ul-Haq M. 2023. A new extension of Poisson distribution for asymmetric count data: theory, classical and Bayesian estimation with application to lifetime data. PeerJ Computer Science 9:e1748 https://doi.org/10.7717/peerj-cs.1748

The authors have chosen to make the review history of this article public.

Abstract

Several studies have stressed the importance of discrete data analysis and its relevance to real-world phenomena. The current work focuses on a new one-parameter discrete distribution derived using the Poisson mixing technique. The new distribution is named the Poisson Entropy-Based Weighted Exponential Distribution. It is useful for modeling asymmetric “right-skewed” data with “heavy” tails. Its failure rate function can describe situations with increasing failure rates. The statistical properties of the new distribution are expressed explicitly. The proposed model can accommodate under-, equal-, and over-dispersed datasets. The model parameter is estimated using the maximum likelihood method. We also consider parameter estimation based on right-censored data with a cure fraction. A further focus of the present study is the Bayesian estimation of the model parameters. Finally, three real-world datasets are used to show the value of the new distribution. In these applications, the new model outperforms other standard discrete models.

Introduction

Numerous studies have emphasized the relevance of count data modeling, which has attracted significant interest in fields such as medical research, earth science, physics, economics, and insurance. Various lifetime probability distributions have been used and investigated in reliability theory. The Poisson distribution is commonly used to analyze “symmetric” and “asymmetric” count datasets, but it cannot describe over-dispersed datasets because its mean and variance are equal. As a result, there has been considerable interest in the discretization of continuous probability distributions. Several techniques can be used to obtain the discrete analog of a continuous probability distribution. The Poisson mixing approach has received great attention from researchers and is among the most commonly used methods for generalizing or generating new probability distributions. It is described below.

If the Poisson parameter is a random variable with a parameterized distribution (P), then the resulting model is a discrete Poisson mixed model. The distribution P and its parameter vector Θ are referred to as prior distribution and hyperparameter, respectively. The resulting distribution of random variable X is stated as follows:

(1) $f_{X} (x) = \int_{0}^{\infty} f_{X | Λ} (x | λ) f_{Λ}^{P} (λ) d λ,$ where $X | Λ$ follows the Poisson distribution with parameter $λ$ :

(2) $f_{X | Λ} (x | λ) = \frac{e^{- λ} λ^{x}}{x!}, x = 0, 1, 2, 3, \dots .$

Here $f_{Λ}^{P} (λ)$ is a continuous density function and $Λ$ is the random variable representing the Poisson parameter $λ$ .

In the literature, many authors have compounded the standard Poisson parameter using standard lifetime distributions. The negative binomial distribution was derived by Greenwood & Yule (1920) by combining the Poisson and gamma distributions. Johnson, Kemp & Kotz (1992) combined the Poisson and exponential distributions to get the geometric distribution. Similarly, various authors introduced mixed Poisson distributions, some examples include the Poisson Lindley (Sankaran, 1970), Poisson Pseudo Lindley (Zeghdoudi & Nedjar, 2017), Poisson transmuted exponential (Bhati, Kumawat & Gómez-Déniz, 2017), Poisson Xgamma (Altun, Cordeiro & Ristić, 2021), Poisson Ailamujia (Hassan et al., 2020), Poisson Quasi-Lindley (Altun, 2019), Poisson XLindley (Ahsan-ul-Haq et al., 2022), Poisson moment exponential (Ahsan-ul-Haq, 2022), and Poisson Mirra (Maya et al., 2022).
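The mixing integral in Eq. (1) can be checked numerically for the Poisson-exponential case mentioned above, which should reproduce a geometric pmf. The following is a minimal sketch, assuming an exponential mixing density with rate 0.5 and a composite Simpson rule on a truncated range; the function names `poisson_pmf` and `mixed_poisson_pmf`, the truncation point, and the step count are illustrative choices, not part of the original study.

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf of Eq. (2), evaluated on the log scale for stability."""
    if lam == 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def mixed_poisson_pmf(x, mixing_pdf, upper=200.0, steps=20000):
    """Approximate Eq. (1) with the composite Simpson rule on [0, upper]."""
    h = upper / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        lam = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * poisson_pmf(x, lam) * mixing_pdf(lam)
    return total * h / 3.0

beta = 0.5  # illustrative rate of the exponential mixing density

def exp_pdf(lam):
    """Exponential mixing density with rate beta."""
    return beta * math.exp(-beta * lam)

for x in range(4):
    numeric = mixed_poisson_pmf(x, exp_pdf)
    geometric = beta / (1.0 + beta) ** (x + 1)  # closed-form geometric pmf
    print(x, round(numeric, 6), round(geometric, 6))
```

The two printed columns should agree to the displayed precision, which illustrates how a continuous mixing density turns the Poisson into a different discrete law.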

Al-Nasser, Rawashdeh & Talal (2022) introduced a new weighted exponential distribution, named the entropy-based weighted exponential distribution (EBWED). Let X be a continuous random variable that follows the EBWED with a single parameter $(β)$ . The probability density function (pdf) of the EBWED is

(3) $f (x, β) = \frac{β (β x - \ln (β))}{1 - \ln (β)} e^{- β x}; x > 0; \ln (β) \neq 1$

The cumulative distribution function (cdf) of the EBWED is

(4) $F (x, β) = 1 - \frac{(1 + β x - \ln (β))}{1 - \ln (β)} e^{- β x} .$

The innovation of this study is the derivation of a new Poisson mixed distribution for under-, equal-, and over-dispersed count datasets to address the above-mentioned issues. This study has the following goals:

The main objective is to introduce a new flexible Poisson entropy-based weighted exponential distribution, obtained by mixing the Poisson distribution with the entropy-based weighted exponential distribution. Its moments and associated measures can be calculated analytically, and it offers strong modeling capability compared with existing discrete distributions.
The model parameter is estimated using the maximum likelihood estimation (MLE) method. A comprehensive simulation is performed to assess the behavior of the ML estimates.
The new distribution is used to model “asymmetric” and “right-skewed” data in the presence of complete and right-censored observations.
We also take into account censored data with a cure fraction.
The Bayesian estimation approach is also used to estimate the model parameter.

The rest of the document is structured as follows: The derivation of the new discrete probability model is presented in “The PEBWE Distribution”. “Moments and Associated Measures” discusses its underlying mathematical characteristics. “Parameter Estimation” discusses the maximum likelihood estimation for the distribution parameter using complete, censored, and censored data with cure fraction. This section also discusses Bayesian estimation using the MCMC approach. Three examples are given in “Application” to illustrate the adaptability of the new distribution. In the end, concluding remarks and some future directions are given in “Conclusion”.

The PEBWE distribution

The following proposition introduces a new mixed-Poisson model by combining the Poisson and Entropy-Based Weighted Exponential distributions.

Proposition 1. Suppose that X follows the compound Poisson-EBWE distribution (PEBWED), which has the following stochastic representation:

$(X | λ) \sim P o i s s o n (λ)$

$(λ | β) \sim E B W E (β)$ where $λ > 0$ and $β > 0$ . Then, the pmf of X is given by

(5) $p (x; β) = Pr (X = x) = \frac{β ((1 + β) \ln (β) - (1 + x) β)}{{(1 + β)}^{2 + x} (\ln (β) - 1)}, x = 0, 1, 2, \dots$

The new model is denoted as $P E B W E D (β)$ , and we write $X \sim P E B W E D (β)$ to indicate that X follows the PEBWED with parameter $β$ .

Proof. The pmf of X can be obtained using the common mixing method shown below.

$p (x; β) = \int_{0}^{\infty} Pr (X = x | λ) f (λ | β) d λ,$

$= \int_{0}^{\infty} \frac{λ^{x} e^{- λ}}{x!} \frac{β (β λ - \ln (β)) e^{- β λ}}{1 - \ln (β)} d λ,$

$= \frac{1}{(1 - \ln (β)) x!} \left\{ β^{2} \int_{0}^{\infty} λ^{x + 1} e^{- λ} e^{- β λ} d λ - β \ln (β) \int_{0}^{\infty} λ^{x} e^{- λ} e^{- β λ} d λ \right\},$

$= \frac{1}{(1 - \ln (β)) x!} \left\{ \frac{β^{2} Γ (x + 2)}{{(β + 1)}^{x + 2}} - \frac{β \ln (β) Γ (x + 1)}{{(β + 1)}^{x + 1}} \right\},$

$= \frac{β ((1 + β) \ln (β) - (1 + x) β)}{{(1 + β)}^{2 + x} (\ln (β) - 1)}, x = 0, 1, 2, \dots$

The proof is completed.
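As a quick numerical check of Proposition 1, the closed-form pmf in Eq. (5) can be compared with the two Gamma-function terms from the proof and summed over its support. This is a minimal sketch, assuming β = 0.5 purely for illustration; the function names are hypothetical.

```python
import math

def pebwe_pmf(x, beta):
    """pmf of Eq. (5); requires beta > 0 and ln(beta) != 1."""
    log_b = math.log(beta)
    num = beta * ((1.0 + beta) * log_b - (1.0 + x) * beta)
    den = (1.0 + beta) ** (2 + x) * (log_b - 1.0)
    return num / den

def pebwe_pmf_gamma_form(x, beta):
    """Same pmf written as the two Gamma-function terms used in the proof."""
    log_b = math.log(beta)
    term1 = beta ** 2 * math.gamma(x + 2) / (1.0 + beta) ** (x + 2)
    term2 = beta * log_b * math.gamma(x + 1) / (1.0 + beta) ** (x + 1)
    return (term1 - term2) / ((1.0 - log_b) * math.factorial(x))

beta = 0.5  # illustrative parameter value
probs = [pebwe_pmf(x, beta) for x in range(200)]
total_mass = sum(probs)  # should be numerically indistinguishable from 1
max_gap = max(abs(pebwe_pmf(x, beta) - pebwe_pmf_gamma_form(x, beta))
              for x in range(20))
print(total_mass, max_gap)
```

Truncating the sum at 200 terms is harmless here because the tail decays geometrically in x.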

Figure 1 depicts the potential pmf plots of the proposed distribution.

Figure 1: Plots of pmf of the PEBWED.

DOI: 10.7717/peerj-cs.1748/fig-1

Remark: Treating x as continuous, the first derivative of the pmf is

$\frac{d p (x)}{d x} = - \frac{β (β + ((1 + β) \ln (β) - (1 + x) β) \ln (1 + β))}{{(1 + β)}^{2 + x} (\ln (β) - 1)},$ and setting it to zero gives

(6) $\hat{x} = \frac{β - β \ln (1 + β) + \ln (β) \ln (1 + β) + β \ln (β) \ln (1 + β)}{β \ln (1 + β)} .$

The second derivative is

$\frac{d^{2} p (x)}{d x^{2}} = \frac{β \ln (1 + β) (2 β + (- (1 + x) β + (1 + β) \ln (β)) \ln (1 + β))}{{(1 + β)}^{2 + x} (\ln (β) - 1)} .$

For $β > 0.6914$ , $\hat{x}$ is a critical point that maximizes $p (x; β)$ , and for $0 < β \leq 0.6914$ the pmf is a decreasing function of x.

Therefore, the mode of PEBWED is given by

$M o d e (X) = \begin{cases} \frac{β - β \ln (1 + β) + \ln (β) \ln (1 + β) + β \ln (β) \ln (1 + β)}{β \ln (1 + β)} & \text{for } β > 0.6914 \\ 0 & \text{otherwise} \end{cases}$
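The threshold near 0.6914 can be recovered numerically as the value of β at which the numerator of Eq. (6) changes sign. The sketch below uses plain bisection on the bracket [0.5, 0.9], an assumed search range; `mode_numerator`, `x_hat`, and `bisect` are illustrative names.

```python
import math

def mode_numerator(beta):
    """Numerator of x-hat in Eq. (6); its sign decides whether x-hat > 0."""
    log_b, log_1b = math.log(beta), math.log(1.0 + beta)
    return beta - beta * log_1b + (1.0 + beta) * log_b * log_1b

def x_hat(beta):
    """Critical point of Eq. (6)."""
    return mode_numerator(beta) / (beta * math.log(1.0 + beta))

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)

threshold = bisect(mode_numerator, 0.5, 0.9)
print(round(threshold, 4))
```

Below the threshold the critical point is negative, so the pmf is largest at x = 0; above it the critical point moves into the support.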

The cdf and survival function of the PEBWED are given by

(7) $F (x; β) = Pr (X \leq x) = 1 - \frac{(1 + β (2 + x) - (1 + β) \ln (β))}{{(1 + β)}^{2 + x} (1 - \ln (β))},$ and

(8) $S (x; β) = \frac{(1 + β (2 + x) - (1 + β) \ln (β))}{{(1 + β)}^{2 + x} (1 - \ln (β))} .$

The hazard function (hf) of the PEBWED is given by

(9) $h (x; β) = \frac{p (x; β)}{1 - F (x; β)} = \frac{β ((- 1 - x) β + (1 + β) \ln (β))}{(1 + β) \ln (β) - (2 + x) β - 1} .$

Proposition 2: The PEBWED hf increases as x increases.

Proof: Using the idea of Glaser (1980) and from the pmf of PEBWED

$ρ (x) = - \frac{p^{'} (x; β)}{p (x; β)}$

$= - \frac{β}{(1 + x) β - (1 + β) \ln (β)} + \ln (1 + β)$

It follows that

$ρ^{'} (x) = \frac{β^{2}}{{((1 + x) β - (1 + β) \ln (β))}^{2}} .$

As $ρ^{'} (x) > 0$ , the hf of the PEBWED is an increasing function.
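Proposition 2 can be illustrated numerically by evaluating the hazard as the ratio of Eq. (5) to Eq. (8) on a grid of x values. This is a minimal sketch, assuming β = 0.5 and the first 30 support points; the function names are hypothetical.

```python
import math

def pebwe_pmf(x, beta):
    """pmf of Eq. (5)."""
    log_b = math.log(beta)
    return (beta * ((1.0 + beta) * log_b - (1.0 + x) * beta)
            / ((1.0 + beta) ** (2 + x) * (log_b - 1.0)))

def pebwe_sf(x, beta):
    """Survival function of Eq. (8), S(x) = Pr(X > x)."""
    log_b = math.log(beta)
    return ((1.0 + beta * (2 + x) - (1.0 + beta) * log_b)
            / ((1.0 + beta) ** (2 + x) * (1.0 - log_b)))

def pebwe_hazard(x, beta):
    """Hazard of Eq. (9), defined in the text as p(x) / (1 - F(x))."""
    return pebwe_pmf(x, beta) / pebwe_sf(x, beta)

beta = 0.5  # illustrative parameter value
hazards = [pebwe_hazard(x, beta) for x in range(30)]
is_increasing = all(b > a for a, b in zip(hazards, hazards[1:]))
print(is_increasing, round(hazards[0], 4), round(hazards[-1], 4))
```

For this parameter value the hazard rises with x and levels off toward β, consistent with the increasing failure rate stated in the proposition.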

Furthermore, Fig. 2 shows the possible shapes of the hf of the PEBWED.

Figure 2: Plots of hf of the PEBWED.

DOI: 10.7717/peerj-cs.1748/fig-2

Moments and associated measures

In this section, moments, probability generating function, moment generating function, and their associated measures, mean, variance, dispersion index, skewness, and kurtosis are derived and discussed.

Proposition 3: The rth factorial moment of the PEBWED is given by

(10) $μ_{(r)} = E (X (X - 1) \dots (X - r + 1)) = \frac{Γ (1 + r) (\ln (β) - r - 1)}{β^{r} (\ln (β) - 1)} .$

Proof: The factorial moment can be calculated using the compound-Poisson theory as follows:

$μ_{(r)} = \frac{β}{1 - \ln (β)} \int_{0}^{\infty} λ^{r} (β λ - \ln (β)) e^{- β λ} d λ,$

$\begin{aligned} = \frac{β}{1 - \ln (β)} \left\{ β \int_{0}^{\infty} λ^{r + 1} e^{- β λ} d λ - \ln (β) \int_{0}^{\infty} λ^{r} e^{- β λ} d λ \right\}, \end{aligned}$

$\begin{aligned} = \frac{Γ (1 + r) (\ln (β) - r - 1)}{β^{r} (\ln (β) - 1)} , \end{aligned}$ which completes the proof.

By replacing r = 1, 2, 3, and 4 in Eq. (10), the first four factorial moments of the PEBWED can be derived.

That is,

$μ_{(1)} = \frac{\ln (β) - 2}{β (\ln (β) - 1)},$

$μ_{(2)} = \frac{2 (\ln (β) - 3)}{β^{2} (\ln (β) - 1)},$

$μ_{(3)} = \frac{6 (\ln (β) - 4)}{β^{3} (\ln (β) - 1)},$ and

$μ_{(4)} = \frac{24 (\ln (β) - 5)}{β^{4} (\ln (β) - 1)} .$

Now, using the general connection between factorial moments and moments about the origin, the first four moments about the origin of the PEBWED are obtained. We get

(11) $μ_{1}^{'} = E (X) = \frac{\ln (β) - 2}{β (\ln (β) - 1)},$

$μ_{2}^{'} = E (X^{2}) = \frac{- 2 (3 + β) + (2 + β) \ln (β)}{β^{2} (\ln (β) - 1)},$

$μ_{3}^{'} = E (X^{3}) = \frac{- 2 (12 + β (9 + β)) + (6 + β (6 + β)) \ln (β)}{β^{3} (\ln (β) - 1)},$

$μ_{4}^{'} = E (X^{4}) = \frac{- 2 (60 + β (72 + β (21 + β))) + (2 + β) (12 + β (12 + β)) \ln (β)}{β^{4} (\ln (β) - 1)} .$

Therefore, the variance of PEBWED is obtained as

(12) $V a r (X) = \frac{2 (1 + β) + \ln (β) (- 4 - 3 β + (1 + β) \ln (β))}{β^{2} {(\ln (β) - 1)}^{2}} .$

The dispersion Index (DI) of the PEBWED is given by

(13) $D I (X) = \frac{2 (1 + β) + \ln (β) (- 4 - 3 β + (1 + β) \ln (β))}{β (\ln (β) - 2) (\ln (β) - 1)} .$

To obtain explicit formulations for the skewness and kurtosis of the PEBWED, apply the following equations.

$S k e w n e s s (X) = \frac{E (X^{3}) - 3 E (X^{2}) E (X) + 2 {(E (X))}^{3}}{{[V a r (X)]}^{\frac{3}{2}}},$ and

$K u r t o s i s (X) = \frac{E (X^{4}) - 4 E (X^{3}) E (X) + 6 E (X^{2}) {(E (X))}^{2} - 3 {(E (X))}^{4}}{{[V a r (X)]}^{2}} .$
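The chain from factorial moments to the summary measures can be traced in a few lines. The sketch below evaluates Eq. (10) for r = 1 to 4, converts to raw moments with the standard factorial-to-raw relations, and then applies the skewness and kurtosis formulas above; β = 0.5 is an illustrative choice and the function names are hypothetical.

```python
import math

def factorial_moment(r, beta):
    """r-th factorial moment, Eq. (10)."""
    log_b = math.log(beta)
    return math.gamma(1 + r) * (log_b - r - 1) / (beta ** r * (log_b - 1))

def summary(beta):
    """Mean, variance, dispersion index, skewness and kurtosis."""
    f1, f2, f3, f4 = (factorial_moment(r, beta) for r in (1, 2, 3, 4))
    # Raw moments from factorial moments (Stirling numbers of the second kind).
    m1 = f1
    m2 = f2 + f1
    m3 = f3 + 3 * f2 + f1
    m4 = f4 + 6 * f3 + 7 * f2 + f1
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4) / var ** 2
    return {"mean": m1, "var": var, "di": var / m1, "skew": skew, "kurt": kurt}

stats = summary(0.5)
print({name: round(value, 4) for name, value in stats.items()})
```

Computing the variance as E(X²) minus the squared mean avoids relying on any single closed-form expression and gives an independent check on the tabulated values.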

Proposition 4: The probability generating function (pgf) of PEBWED is given by

(14) $G (s) = E (s^{X}) = \frac{β (- β + (1 - s + β) \ln (β))}{{(1 - s + β)}^{2} (\ln (β) - 1)}$ for $s \in (- 1, 1) .$

Proof: The pgf of the PEBWED is derived using the well-known compound-Poisson theory in the manner described below

$G (s) = \frac{β}{1 - \ln (β)} \int_{0}^{\infty} e^{λ (s - 1)} (β λ - \ln (β)) e^{- β λ} d λ,$

$= \frac{β}{1 - \ln (β)} \left\{ β \int_{0}^{\infty} e^{λ (s - 1)} λ e^{- β λ} d λ - \ln (β) \int_{0}^{\infty} e^{λ (s - 1)} e^{- β λ} d λ \right\},$

$= \frac{β}{1 - \ln (β)} \left\{ \frac{β}{{(1 - s + β)}^{2}} - \frac{\ln (β)}{(1 - s + β)} \right\},$

$= \frac{β (- β + (1 - s + β) \ln (β))}{{(1 - s + β)}^{2} (\ln (β) - 1)} ,$ which completes the proof.
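Two simple consistency checks on the pgf are that it matches the power series of the pmf and that its derivative at s = 1 returns the mean. The following sketch assumes β = 0.5 and s = 0.7 for illustration and uses a central finite difference for the derivative; the function names and step size are hypothetical choices.

```python
import math

def pebwe_pmf(x, beta):
    """pmf of Eq. (5)."""
    log_b = math.log(beta)
    return (beta * ((1.0 + beta) * log_b - (1.0 + x) * beta)
            / ((1.0 + beta) ** (2 + x) * (log_b - 1.0)))

def pebwe_pgf(s, beta):
    """pgf of Eq. (14)."""
    log_b = math.log(beta)
    a = 1.0 - s + beta
    return beta * (-beta + a * log_b) / (a ** 2 * (log_b - 1.0))

beta, s = 0.5, 0.7  # illustrative values
series = sum(s ** x * pebwe_pmf(x, beta) for x in range(400))

step = 1e-6  # finite-difference step for G'(1)
mean_fd = (pebwe_pgf(1.0 + step, beta) - pebwe_pgf(1.0 - step, beta)) / (2 * step)
mean_closed = (math.log(beta) - 2.0) / (beta * (math.log(beta) - 1.0))
print(series, pebwe_pgf(s, beta), mean_fd, mean_closed)
```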

The moment generating function (mgf) and characteristic function (cf) of the PEBWED are obtained from Eq. (14) when s is substituted by $e^{t}$ and $e^{i t}$ respectively. They are provided, respectively, by

(15) $M (t) = \frac{β (β - \ln (β) + e^{t} \ln (β) - β \ln (β))}{{(- 1 + e^{t} - β)}^{2} (1 - \ln (β))}$ for $t \leq 0,$ and

(16) $ϕ (t) = \frac{β (β - \ln (β) + e^{i t} \ln (β) - β \ln (β))}{{(- 1 + e^{i t} - β)}^{2} (1 - \ln (β))}$ for $t \in R .$

The mean, variance, DI, skewness, and kurtosis for the PEBWED are now shown numerically in Table 1 for various parameter choices.

Table 1:

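Some computational measures of the PEBWED (CS: coefficient of skewness; CK: coefficient of kurtosis; DI: dispersion index; CV: coefficient of variation).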

$β$	$E (X)$	$V a r (X)$	$C S (X)$	$C K (X)$	$D I (X)$	$C V (X)$
0.1	13.028	164.42	1.7974	7.6916	12.620	0.9842
0.5	3.1812	10.511	1.6455	6.8850	3.3040	1.0191
0.8	2.2720	5.3450	1.5602	6.4932	2.3526	1.0176
0.9	2.1163	4.5742	1.5307	6.3703	2.1614	1.0106
1.2	1.8525	3.2068	1.4354	6.0221	1.7311	0.9667
1.5	1.7880	2.4702	1.3414	5.7724	1.3815	0.8790
1.8	1.9033	1.8930	1.3640	5.9838	0.9946	0.7229
2.0	2.1294	1.3538	2.0214	7.3516	0.6358	0.5464

DOI: 10.7717/peerj-cs.1748/table-1

Parameter estimation

In this section, the model parameter is estimated using the maximum likelihood approach based on complete data, censored data, and censored data with a cure fraction. This section also covers parameter estimation using the Bayesian approach.

ML estimation based on complete data

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample obtained from a PEBWE distribution. The log-likelihood function is defined as follows

(17) $l (β | x) = n \log (β) - n \log (1 - \ln (β)) + \sum_{i = 1}^{n} \log (β x_{i} - \ln (β)) - β \sum_{i = 1}^{n} x_{i} .$

To obtain the maximum likelihood (ML) estimator of the parameter, differentiate Eq. (17) with respect to β:

(18) $\frac{\partial l (β | x)}{\partial β} = \frac{n}{β} + \frac{n}{β (1 - \ln (β))} + \sum_{i = 1}^{n} \frac{x_{i} - \frac{1}{β}}{β x_{i} - \ln (β)} - \sum_{i = 1}^{n} x_{i} .$

Equating Eq. (18) to zero and solving for β yields the ML estimator. The resulting equation has no closed-form solution, so numerical methods are required to obtain the ML estimate of the parameter.
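Because the score equation has no closed-form root, the estimate has to be found numerically. The sketch below maximizes a log-likelihood of the form of Eq. (17) for the COVID-19 counts listed later as Data I, taking the exponential term with the sign $- β \sum x_{i}$ implied by the factor $e^{- β x}$ in Eq. (3). The optimizer (a log-spaced grid followed by golden-section refinement), the search range (0.001, 0.99), and the names `loglik` and `maximize` are assumptions made for illustration; the supplemental R code may proceed differently.

```python
import math

DATA_I = [8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89,
          97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97,
          150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22,
          11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3, 5]

def loglik(beta, xs):
    """Log-likelihood in the form of Eq. (17), with -beta * sum(xs)."""
    log_b = math.log(beta)
    if 1.0 - log_b <= 0.0:
        return -math.inf
    total = len(xs) * (log_b - math.log(1.0 - log_b)) - beta * sum(xs)
    for x in xs:
        arg = beta * x - log_b
        if arg <= 0.0:
            return -math.inf
        total += math.log(arg)
    return total

def maximize(f, lo, hi, grid=400, tol=1e-10):
    """Log-spaced grid search followed by golden-section refinement."""
    pts = [lo * (hi / lo) ** (k / grid) for k in range(grid + 1)]
    best = max(range(grid + 1), key=lambda k: f(pts[k]))
    a, b = pts[max(best - 1, 0)], pts[min(best + 1, grid)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if f(c) < f(d):
            a = c  # maximum lies in [c, b]
        else:
            b = d  # maximum lies in [a, d]
    return 0.5 * (a + b)

beta_hat = maximize(lambda b: loglik(b, DATA_I), 1e-3, 0.99)
print(round(beta_hat, 4))
```

The grid stage guards against starting the refinement in a poor region; any standard one-dimensional optimizer could be substituted for it.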

ML estimation based on censored data

Given a random sample $(x_{i}, d_{i})$ of size $n, i = 1, \dots, n$ , the ith individual’s contribution to the likelihood function is given by

$L_{i} = {[f (x_{i})]}^{d_{i}} {[S (x_{i})]}^{1 - d_{i}},$ where $d_{i}$ is a censoring indicator variable, equal to one for an observed survival time and zero for a right-censored one. When the data follow a PEBWE distribution, the likelihood function for the model parameter is

(19) $L (β | x, d) = \prod_{i = 1}^{n} {[\frac{β (β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}]}^{d_{i}} {[\frac{(1 + β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}]}^{1 - d_{i}},$

The corresponding log-likelihood function is

(20) $\begin{aligned} l (β | x, d) = \log (β) \sum_{i = 1}^{n} d_{i} - \log (1 - \ln (β)) \sum_{i = 1}^{n} d_{i} + \sum_{i = 1}^{n} d_{i} \log (β x_{i} - \ln (β)) - \sum_{i = 1}^{n} d_{i} β x_{i} \\ + \sum_{i = 1}^{n} (1 - d_{i}) \log (1 + β x_{i} - \ln (β)) - \sum_{i = 1}^{n} (1 - d_{i}) \log (1 - \ln (β)) - \sum_{i = 1}^{n} (1 - d_{i}) β x_{i} . \end{aligned}$

Differentiating the log-likelihood function with respect to β gives

(21) $\begin{aligned} \frac{\partial l (β | x, d)}{\partial β} = \sum_{i = 1}^{n} \frac{d_{i}}{β} + \sum_{i = 1}^{n} \frac{\frac{d_{i}}{β}}{1 - \ln (β)} + \sum_{i = 1}^{n} \frac{d_{i} (x_{i} - \frac{1}{β})}{β x_{i} - \ln (β)} - \sum_{i = 1}^{n} d_{i} x_{i} \\ + \sum_{i = 1}^{n} \frac{(1 - d_{i}) (x_{i} - \frac{1}{β})}{1 + β x_{i} - \ln (β)} + \sum_{i = 1}^{n} \frac{(1 - d_{i}) (\frac{1}{β})}{1 - \ln (β)} - \sum_{i = 1}^{n} (1 - d_{i}) x_{i} . \end{aligned}$

Setting Eq. (21) to zero gives the corresponding score equation, and its numerical solution yields the ML estimator.
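The censored log-likelihood in Eq. (20) can be coded term by term and maximized in the same way. The sketch below applies it to the leukemia remission times listed later as Data II, coding the observations marked “+” as censored; the optimizer, the search range (0.001, 0.99), and the function names are illustrative assumptions rather than the authors' implementation.

```python
import math

# Remission times from Data II; status 1 = observed, 0 = right-censored ("+").
TIMES = [1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26,
         29, 31, 42, 45, 50, 57, 60, 71, 85, 91]
CENSORED = {31, 45, 50, 71, 85}  # each censored value occurs once in TIMES
STATUS = [0 if t in CENSORED else 1 for t in TIMES]

def loglik_censored(beta, times, status):
    """Log-likelihood of Eq. (20) for 0 < beta < 1."""
    log_b = math.log(beta)
    norm = math.log(1.0 - log_b)
    total = 0.0
    for x, d in zip(times, status):
        if d:  # observed time: log density
            total += log_b + math.log(beta * x - log_b)
        else:  # censored time: log survival
            total += math.log(1.0 + beta * x - log_b)
        total += -norm - beta * x
    return total

def maximize(f, lo, hi, grid=400, tol=1e-10):
    """Log-spaced grid search followed by golden-section refinement."""
    pts = [lo * (hi / lo) ** (k / grid) for k in range(grid + 1)]
    best = max(range(grid + 1), key=lambda k: f(pts[k]))
    a, b = pts[max(best - 1, 0)], pts[min(best + 1, grid)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if f(c) < f(d):
            a = c
        else:
            b = d
    return 0.5 * (a + b)

beta_hat = maximize(lambda b: loglik_censored(b, TIMES, STATUS), 1e-3, 0.99)
print(round(beta_hat, 4))
```

Restricting the search to β below 1 keeps every logarithm in Eq. (20) well defined for positive survival times.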

ML estimation based on censored data and a cure fraction

In survival analysis, a subset of individuals may be immune to the event of interest. In clinical trials, for example, some patients who respond to the treatment may experience prolonged symptom relief or even a complete recovery. The survival function of the standard mixture cure model is

$S (x) = η + (1 - η) S_{0} (x),$ where $η \in (0, 1)$ is the proportion of immunes, or cure fraction, and $S_{0} (x)$ is the baseline survival function for susceptible individuals. Given a random sample $(x_{i}, d_{i})$ of size $n, i = 1, \dots, n$ , the ith subject’s contribution to the likelihood function is given by

$L_{i} = {[f (x_{i})]}^{d_{i}} {[S (x_{i})]}^{1 - d_{i}} = {[(1 - η) f_{0} (x_{i})]}^{d_{i}} {[η + (1 - η) S_{0} (x_{i})]}^{1 - d_{i}},$ where $f_{0} (x)$ is the baseline pdf of the susceptible individuals and $d_{i}$ is a censoring indicator variable. The likelihood and log-likelihood functions for the parameters β and η are given below.

(22) $L (β, η | x, d) = \prod_{i = 1}^{n} {[(1 - η) \frac{β (β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}]}^{d_{i}} {[η + (1 - η) \frac{(1 + β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}]}^{1 - d_{i}},$ and

(23) $\begin{aligned} l (β, η | x, d) = & \sum_{i = 1}^{n} d_{i} \log ((1 - η) \frac{β (β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}) \\ + \sum_{i = 1}^{n} (1 - d_{i}) \log (η + (1 - η) \frac{(1 + β x_{i} - \ln (β))}{1 - \ln (β)} e^{- β x_{i}}) . \end{aligned}$

The ML estimators are obtained by differentiating the log-likelihood function with respect to β and η, setting the resulting derivatives to zero, and solving the resulting equations.
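The two-parameter log-likelihood in Eq. (23) can be evaluated directly from the baseline density and survival function. The sketch below does so for the pelvic tumor recurrence times listed later as Data III (status 0 marks the asterisked, censored times) and runs a coarse grid search over β and η. The grid limits, spacing, and function names are assumptions chosen for illustration; with only 21 observations the surface can be quite flat, so the grid maximizer is best read as a rough starting value for a proper optimizer.

```python
import math

# Recurrence times from Data III; status 1 = observed, 0 = censored ("*").
TIMES = [3, 7, 11, 18, 22, 25, 28, 32, 34, 35, 35, 36, 40, 40, 41, 54, 66,
         76, 84, 88, 92]
STATUS = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

def baseline_pdf(x, beta):
    """Baseline density used inside Eq. (22)."""
    log_b = math.log(beta)
    return beta * (beta * x - log_b) / (1.0 - log_b) * math.exp(-beta * x)

def baseline_sf(x, beta):
    """Baseline survival function used inside Eq. (22)."""
    log_b = math.log(beta)
    return (1.0 + beta * x - log_b) / (1.0 - log_b) * math.exp(-beta * x)

def loglik_cure(beta, eta, times, status):
    """Log-likelihood of Eq. (23) for 0 < beta < 1 and 0 <= eta < 1."""
    total = 0.0
    for x, d in zip(times, status):
        if d:
            total += math.log((1.0 - eta) * baseline_pdf(x, beta))
        else:
            total += math.log(eta + (1.0 - eta) * baseline_sf(x, beta))
    return total

# Coarse grid: beta in 0.005..0.295, eta in 0.00..0.98 (assumed ranges).
betas = [0.005 * k for k in range(1, 60)]
etas = [0.02 * k for k in range(0, 50)]
best_ll, best_beta, best_eta = max(
    (loglik_cure(b, e, TIMES, STATUS), b, e) for b in betas for e in etas
)
print(round(best_ll, 3), round(best_beta, 3), round(best_eta, 2))
```

Setting η to zero recovers the censored log-likelihood without a cure fraction, which gives a convenient sanity check on the implementation.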

Bayesian estimation

The Bayesian approach is widely used across many application domains. It is especially helpful in engineering, reliability, health sciences, epidemiology, and quality studies because it can incorporate prior information into the analysis. Under this approach, a prior distribution must be assigned to each parameter. For the PEBWE distribution, we consider the gamma distribution as the prior for the parameter $β$ and the beta distribution as the prior for the cure fraction parameter $η$ . The prior distributions are

$β \sim Γ (τ_{1}, λ_{1}), τ_{1}, λ_{1} > 0,$ and

$η \sim \mathrm{Beta} (τ_{2}, λ_{2}), τ_{2}, λ_{2} > 0,$ where $τ_{1}, λ_{1}, τ_{2}, λ_{2}$ are the hyperparameters.

The joint posterior is obtained by multiplying the likelihood function corresponding to Eq. (17) by the prior densities. To simulate samples from the posterior density, we used Markov chain Monte Carlo (MCMC) procedures such as Gibbs sampling. We generated 1,006,000 samples for each parameter. The first 6,000 simulated samples were discarded as a burn-in phase, which is often used to reduce the influence of starting values. The Bayesian estimates of the parameters were obtained as the means of the samples drawn from the joint posterior distribution. Traceplots and the Geweke diagnostic were used to monitor the convergence of the simulated samples. Further, the 95% highest posterior density (HPD) interval was obtained from the simulated posterior distributions.
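To make the sampling step concrete, the sketch below runs a random-walk Metropolis sampler for β on the log scale, using the Data I counts and a likelihood of the form of Eq. (17) with the $- β \sum x_{i}$ sign implied by Eq. (3). Several ingredients are assumptions, since the text does not specify them: the hyperparameters $τ_{1} = λ_{1} = 1$, the proposal standard deviation, the starting value, and the much shorter chain (6,000 draws with 1,000 burn-in) used to keep the example fast. It is a stand-in for, not a reproduction of, the sampler used in the study.

```python
import math
import random

DATA_I = [8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89,
          97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97,
          150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22,
          11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3, 5]

def loglik(beta, xs):
    """Log-likelihood in the form of Eq. (17), with -beta * sum(xs)."""
    log_b = math.log(beta)
    if 1.0 - log_b <= 0.0:
        return -math.inf
    total = len(xs) * (log_b - math.log(1.0 - log_b)) - beta * sum(xs)
    for x in xs:
        arg = beta * x - log_b
        if arg <= 0.0:
            return -math.inf
        total += math.log(arg)
    return total

def log_posterior(theta, xs, shape=1.0, rate=1.0):
    """Unnormalised log posterior of theta = log(beta), Gamma(shape, rate) prior."""
    beta = math.exp(theta)
    log_prior = (shape - 1.0) * theta - rate * beta
    return loglik(beta, xs) + log_prior + theta  # + theta is the log-Jacobian

def metropolis(xs, n_iter=6000, burn_in=1000, step=0.25, seed=1):
    """Random-walk Metropolis on log(beta); returns post-burn-in beta draws."""
    rng = random.Random(seed)
    theta = math.log(0.05)  # assumed starting value
    current = log_posterior(theta, xs)
    draws = []
    for it in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, xs)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if it >= burn_in:
            draws.append(math.exp(theta))
    return draws

draws = metropolis(DATA_I)
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 4))
```

Sampling on the log scale keeps every proposal positive, at the cost of the extra Jacobian term in the target density.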

Simulation

Here, we conduct a comprehensive simulation analysis to assess the maximum likelihood estimation approach using complete data. Random samples of the PEBWE distribution of sizes (n) 10, 20, 50, 100, and 200 were used considering different values of the parameter (β). All simulation results were based on N = 10,000 replications for the different sample sizes considered for each parameter setting. Table 2 shows the results of the average estimates, absolute bias (AB), mean relative error (MRE), and mean square error (MSE) of all parameter values.

Table 2:

Simulation results based on complete data.

Parameter	$n$	AB	MRE	MSE
$β = 0.1$	10	0.0141	0.1410	0.0024
	20	0.0064	0.0642	0.0008
	50	0.0024	0.0245	0.0003
	100	0.0010	0.0102	0.0001
	200	0.0007	0.0074	0.0001
$β = 0.5$	10	0.1533	0.3067	0.1955
	20	0.0660	0.1320	0.0531
	50	0.0180	0.0360	0.0106
	100	0.0106	0.0212	0.0046
	200	0.0049	0.0097	0.0021
$β = 0.8$	10	0.1794	0.2242	0.2393
	20	0.1020	0.1275	0.1153
	50	0.0480	0.0600	0.0420
	100	0.0219	0.0273	0.0175
	200	0.0117	0.0147	0.0078
$β = 0.9$	10	0.1597	0.1774	0.2351
	20	0.1032	0.1146	0.1186
	50	0.0522	0.0580	0.0502
	100	0.0239	0.0266	0.0237
	200	0.0112	0.0124	0.0107
$β = 1.5$	10	0.0309	0.0206	0.2654
	20	0.0326	0.0218	0.1008
	50	0.0280	0.0187	0.0433
	100	0.0152	0.0102	0.0223
	200	0.0108	0.0072	0.0110
$β = 1.8$	10	0.2553	0.1418	0.4160
	20	0.1028	0.0571	0.1776
	50	0.0012	0.0007	0.0192
	100	0.0035	0.0020	0.0030
	200	0.0012	0.0007	0.0014

DOI: 10.7717/peerj-cs.1748/table-2
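The simulation design above starts from random PEBWE samples. One simple way to generate them, sketched here under the assumption that β lies below 1 so that every pmf value is positive, is inversion of the closed-form cdf in Eq. (7). The function names, the seed, and the sample size are illustrative; the generator used for Table 2 is not described in the text.

```python
import math
import random

def pebwe_cdf(x, beta):
    """cdf of Eq. (7)."""
    log_b = math.log(beta)
    return 1.0 - ((1.0 + beta * (2 + x) - (1.0 + beta) * log_b)
                  / ((1.0 + beta) ** (2 + x) * (1.0 - log_b)))

def rpebwe(n, beta, rng):
    """Draw n variates by inversion: the smallest x with F(x) >= u."""
    sample = []
    for _ in range(n):
        u, x = rng.random(), 0
        while pebwe_cdf(x, beta) < u:
            x += 1
        sample.append(x)
    return sample

rng = random.Random(2023)  # fixed seed for reproducibility
beta = 0.5                 # illustrative parameter value
draws = rpebwe(20000, beta, rng)
sample_mean = sum(draws) / len(draws)
theory_mean = (math.log(beta) - 2.0) / (beta * (math.log(beta) - 1.0))
print(round(sample_mean, 3), round(theory_mean, 3))
```

Repeating such draws for each sample size and re-estimating β on every replicate would give Monte Carlo estimates of the bias and mean square error of the kind reported in Table 2.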

Application

In this section, the new model is applied to three over-dispersed, asymmetric, right-skewed datasets. We compare the fit of the PEBWE distribution with those of the Poisson Ailamujia (PA), discrete Burr Hatke (DBH), discrete inverted Topp-Leone (DITL), discrete moment exponential (DME), and Poisson distributions. The fitted models are compared using several model selection and goodness-of-fit criteria: the negative log-likelihood ( $- l$ ), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Kolmogorov-Smirnov (KS) test.

Data I: The first dataset is the number of daily deaths due to coronavirus in China from 23 January to 28 March 2020. The dataset is reported at https://www.worldometers.info/coronavirus/country/china/. The data are: 8, 16, 15, 24, 26, 26, 38, 43, 46, 45, 57, 64, 65, 73, 73, 86, 89, 97, 108, 97, 146, 121, 143, 142, 105, 98, 136, 114, 118, 109, 97, 150, 71, 52, 29, 44, 47, 35, 42, 31, 38, 31, 30, 28, 27, 22, 17, 22, 11, 7, 13, 10, 14, 13, 11, 8, 3, 7, 6, 9, 7, 4, 6, 5, 3 and 5. The MLEs, standard errors, and goodness-of-fit measures are presented in Table 3. PP plots of all considered distributions for the first dataset are given in Fig. 3.

Table 3:

The MLEs and model selection measures for the first dataset.

Statistic	PEBWE	PA	DBH	DITL	DME	Poisson
$\hat{β}$	0.02446	0.02010	0.99974	0.35393	25.121	49.742
SE	0.00305	0.00178	0.00185	0.04357	2.1865	0.86814
$- l$	324.30	329.99	461.02	366.91	330.52	1,409.8
AIC	650.60	661.97	924.04	735.81	663.03	2,821.6
BIC	652.79	664.16	926.23	738.00	665.22	2,823.8
KS	0.0876	0.1670	0.8120	0.3290	0.1720	0.4970
p-value	0.6900	0.0490	0.0000	0.0000	0.0410	0.0000

DOI: 10.7717/peerj-cs.1748/table-3
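The AIC and BIC entries follow directly from the negative log-likelihood, the number of estimated parameters k, and the sample size n. The short sketch below recomputes them for the PEBWE column of Table 3, taking k = 1 and n = 66 (the number of daily counts in Data I); the function names are illustrative.

```python
import math

def aic(neg_loglik, k):
    """Akaike information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k

def bic(neg_loglik, k, n):
    """Bayesian information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + k * math.log(n)

# PEBWE column of Table 3: -l = 324.30, one parameter, n = 66 observations.
pebwe_aic = aic(324.30, 1)
pebwe_bic = bic(324.30, 1, 66)
print(round(pebwe_aic, 2), round(pebwe_bic, 2))
```

Both values agree with the tabulated AIC and BIC up to rounding; smaller values indicate a better trade-off between fit and complexity.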

Figure 3: PP plots of all fitted models for the first dataset.

DOI: 10.7717/peerj-cs.1748/fig-3

The model parameter was next estimated using the Bayesian approach presented in “Bayesian estimation”. The posterior mean of the parameter $β$ is 0.0245, and the 95% HPD interval is 0.0186 to 0.0304. The posterior samples are presented in Fig. 4. The autocorrelation function (ACF) shows negligible autocorrelation among the posterior samples, and the traceplot shows the evolution of the MCMC samples over the iterations. The Geweke z-score (0.6071) also indicates satisfactory convergence of the drawn samples to a stable distribution.

Figure 4: Traceplot, density, and ACF plot for the first data.

DOI: 10.7717/peerj-cs.1748/fig-4

Data II: The second dataset consists of remission times (in weeks) for a group of 30 patients with leukemia who received similar treatment (Lawless, 2011). The observations are: 1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19, 24, 26, 29, 31+, 42, 45+, 50+, 57, 60, 71+, 85+, 91. Observations marked with “+” are censored times. Using the methodology outlined in “ML estimation based on censored data”, we compute the MLEs. Table 4 shows the ML estimates and goodness-of-fit measures. Figure 5 compares the PP plots for the PEBWE distribution and the competing discrete distributions. The results show that the PEBWE distribution provides the best fit to these data, while the PA distribution is the second-best model in terms of the KS statistic. The data are not well fit by the models based on the DBH, DITL, and Poisson distributions.

Table 4:

The MLEs and model selection measures for the second dataset.

Statistic	PEBWE	PA	DBH	DITL	DME	Poisson
$\hat{β}$	0.0402	0.0352	0.9992	0.3829	14.420	19.118
SE	0.0083	0.0050	0.0035	0.0766	1.6500	0.8940
$- l$	111.21	117.44	147.97	115.24	118.19	290.41
AIC	224.41	236.88	297.94	232.48	238.38	582.81
BIC	225.81	238.29	299.34	233.88	239.78	584.21
KS	0.1630	0.2490	0.7280	0.2650	0.2550	0.4160
p-value	0.4000	0.0480	0.0000	0.0300	0.0400	0.0000

DOI: 10.7717/peerj-cs.1748/table-4

Figure 5: PP plots of all fitted models for the second dataset.

DOI: 10.7717/peerj-cs.1748/fig-5

In the Bayesian estimation, as in the previous example, we used a gamma prior. The posterior mean is 0.0402, and the 95% HPD interval is 0.02502 to 0.0565. The posterior samples for the parameter are presented in Fig. 6. The autocorrelation function (ACF) shows negligible autocorrelation among the posterior samples, and the traceplot shows the evolution of the MCMC samples over the iterations. The Geweke z-score (−0.2249) also indicates satisfactory convergence of the drawn samples to a stable distribution.

Figure 6: Traceplot, density, and ACF plot for the second data set.

DOI: 10.7717/peerj-cs.1748/fig-6

Data III: The third dataset consists of survival data with a cure fraction. It comes from a study conducted between 2003 and 2013 at the Musculoskeletal Oncology Center of Sun Yat-Sen University’s First Affiliated Hospital in China (Wang et al., 2015). The study’s goal was to assess the efficacy of modular hemipelvis endoprosthesis reconstruction after pelvic tumor resection. Recurrence times for pelvic tumors with marginal or intracapsular margins were 3, 7, 11*, 18, 22*, 25, 28, 32*, 34*, 35, 35*, 36*, 40*, 40*, 41, 54*, 66*, 76*, 84*, 88*, and 92* months, with an asterisk (*) denoting a censored observation. We obtain the ML estimates using the approach described in “ML estimation based on censored data and a cure fraction”. Table 5 shows the ML estimates and goodness-of-fit measures. The PP plots for all competing distributions are given in Fig. 7. The PEBWE distribution provides the best fit.

Table 5:

The MLEs and model selection measures for the third dataset.

Statistic	PEBWE	PA	DBH	DITL	DME	Poisson
$\hat{β}$	0.0307	0.0350	0.9999	0.1130	14.369	21.922
SE	0.0264	0.0131	0.0144	0.0883	3.9690	1.7400
$\hat{η}$	0.5162	0.5799	0.6580	5.5e-07	0.5820	0.1290
SE	0.2279	0.1396	0.1056	0.6640	0.1340	0.1270
$- l$	40.503	41.079	54.416	42.969	41.091	50.960
AIC	85.006	86.158	112.83	89.938	86.182	105.92
BIC	87.095	88.247	114.92	92.028	88.271	108.01
KS	0.2350	0.3290	0.8410	0.6470	0.3350	0.6500
p-value	0.2000	0.0210	0.0000	0.0000	0.0180	0.0000

DOI: 10.7717/peerj-cs.1748/table-5

Figure 7: PP plots of all fitted models for the third dataset.

DOI: 10.7717/peerj-cs.1748/fig-7

As in the previous example, for the Bayesian estimation we used gamma and beta distributions as priors for the $β$ and $η$ parameters, respectively. The posterior means of the parameters are $\hat{β} = 0.083$ with a 95% HPD interval (0.0301–0.1401) and $\hat{η} = 0.6363$ with a 95% HPD interval (0.4131–0.8434). The posterior samples for the parameters are presented in Fig. 8. The autocorrelation function (ACF) shows negligible autocorrelation among the posterior samples, and the traceplot shows the evolution of the MCMC samples over the iterations. The Geweke z-score (0.6607) also indicates satisfactory convergence of the drawn samples to a stable distribution.

Figure 8: Traceplot, density, and ACF plot for the third data.

DOI: 10.7717/peerj-cs.1748/fig-8

Conclusion

Discrete probability models play an important role in the analysis of count datasets. A new one-parameter discrete distribution is proposed by mixing the Poisson and entropy-based weighted exponential distributions. Some important mathematical properties of the new model are derived. The model parameter is estimated using the maximum likelihood and Bayesian estimation methods. The Bayesian estimation was performed using the MCMC approach with the Metropolis-Hastings algorithm. The new probability model is applied to three datasets: the number of deaths due to COVID-19, remission times of leukemia patients, and recurrence times of pelvic tumor patients. The proposed distribution provides a better fit than all the competing distributions considered.

Supplemental Information

R codes.

DOI: 10.7717/peerj-cs.1748/supp-1

[1] Ahsan-ul-Haq M. 2022. On Poisson moment exponential distribution with applications. Annals of Data Science

[2] Ahsan-ul-Haq M, Al-bossly A, El-morshedy M, Eliwa MS. 2022. Poisson XLindley distribution for count data: statistical and reliability properties with estimation techniques and inference. Computational Intelligence and Neuroscience 2022(1):1-16

[3] Al-Nasser AD, Rawashdeh AI, Talal A. 2022. On using Shannon entropy measure for formulating new weighted exponential distribution. Journal of Taibah University for Science 16:1035-1047

[4] Altun E. 2019. A new model for over-dispersed count data: Poisson quasi-Lindley regression model. Mathematical Sciences 13(3):241-247

[5] Altun E, Cordeiro GM, Ristić MM. 2021. An one-parameter compounding discrete distribution. Journal of Applied Statistics 49(8):1935-1956

[6] Bhati D, Kumawat P, Gómez-Déniz E. 2017. A new count model generated from mixed Poisson transmuted exponential family with an application to health care data. Communications in Statistics - Theory and Methods 46(22):11060-11076

[7] Glaser RE. 1980. Bathtub and related failure rate characterizations. Journal of the American Statistical Association 75(371):667-672

[8] Greenwood M, Yule GU. 1920. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society 83(2):255-279

[9] Hassan A, Shalbaf GA, Bilal S, Rashid A. 2020. A new flexible discrete distribution with applications to count data. Journal of Statistical Theory and Applications 19(1):102-108

[10] Johnson NL, Kemp AW, Kotz S. 1992. Univariate discrete distributions. Hoboken: John Wiley & Sons.

[11] Lawless JF. 2011. Statistical models and methods for lifetime data. Hoboken: John Wiley & Sons.