Confidence intervals for the common coefficient of variation of rainfall in Thailand

- The initial submission of this article was received on February 17th, 2020 and was peer-reviewed by 3 reviewers and the Academic Editor.
- The Academic Editor made their initial decision on March 10th, 2020.
- The first revision was submitted on May 13th, 2020 and was reviewed by 3 reviewers and the Academic Editor.
- A further revision was submitted on July 23rd, 2020 and was reviewed by the Academic Editor.
- A further revision was submitted on August 29th, 2020 and was reviewed by the Academic Editor.
- The article was Accepted by the Academic Editor on August 30th, 2020.

Accept

Dear Dr. Niwitpong,

Thank you for attending to the reviewers' comments. The manuscript is now in much better shape and can be accepted for publication, especially since you decided, based on Reviewer 3's comments, to use the rainfall data instead of the PM2.5 data to demonstrate your approach.

Version 0.4, submitted Aug 29, 2020

Major Revisions

The manuscript has improved but still needs to be revised one more time to address the reviewers' comments. I expect a point-by-point response to all the comments raised by the reviewers.

Version 0.3, submitted Jul 23, 2020

Major Revisions

The revised manuscript still lacks focus and coherence. It is recommended that the manuscript be revised and turned into a shorter communication. There is no need to review basic information at such length in the introduction, in the methods, or even at the beginning of the results, including Table 1. English editing is also needed.

Thank you so much for the response. I have only one remaining comment. Could the authors check the following part of their R code (from the supplemental material)? As I understand it, qnorm() as called by the authors returns the lower-tail quantile, which is negative when alpha/2 < 0.5.

#-------------------------------------------------------------------------------------------------------------------
# Compute CI based on MOVER
#-------------------------------------------------------------------------------------------------------------------
z <- qnorm(alpha/2) # compute z at alpha/2

Please also check the results that depend on this value.
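A minimal sketch of the point about qnorm(), using assumed values: alpha = 0.05 and a hypothetical estimate and standard error that are not taken from the authors' code. It shows only the sign of the quantile; whether the sign matters depends on how z enters the later MOVER formulas (it has no effect wherever only z^2 is used).

```r
# Illustrative only: alpha = 0.05 is an assumed nominal level.
alpha <- 0.05

z_lower <- qnorm(alpha / 2)      # lower-tail quantile, negative
z_upper <- qnorm(1 - alpha / 2)  # upper-tail quantile, positive

# Hypothetical point estimate and standard error, for illustration only.
estimate <- 1.0
se       <- 0.2

# An interval written as (estimate - z * se, estimate + z * se) has its
# endpoints in the right order only when z is the positive quantile.
ci_with_lower_z <- c(estimate - z_lower * se, estimate + z_lower * se)
ci_with_upper_z <- c(estimate - z_upper * se, estimate + z_upper * se)

print(c(z_lower = z_lower, z_upper = z_upper))
print(rbind(ci_with_lower_z, ci_with_upper_z))
```

With the lower-tail quantile, the first endpoint comes out larger than the second, so any formula that assumes a positive z needs qnorm(1 - alpha/2) or the negated value.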

no

no

no

ok

na

ok

Thank you for doing such a very careful revision.

The statement "In meteorology, the coefficient of variation has been used to describe the dispersion of PM2.5" (lines 84-85) would benefit from a reference.

The work might also benefit from some further English editing.

In lines 90-95, the authors have added: "For measurements of PM2.5, physical dispersion models are able to solve for spaciotemporal of PM2.5 levels if the following three assumptions are satisfied: (i) PM2.5 measurements is a stochastic process that can be modeled as a log-normal distribution (ii) PM2.5 measurements across different areas are independent but have a common variance and (iii) PM2.5 measurements made in different time in different geographical locations are comparable." The authors have misunderstood my comment.

My comment was: "This work, and in order to better describe (forcibly atemporal) PM2.5 measurements, presupposes as true, underlying assumptions that need validation. These assumptions are: i) PM2.5 measurements is a stochastic process that can be modeled as a log-normal distribution, ii) PM2.5 measurements across different areas are independent but have a common variance, and iii) PM2.5 measurements made in different time periods in different geographical locations are comparable"

This work will only be valid if the authors can prove these assumptions are correct. It is my belief, however, that these assumptions do not hold. I will expand on the assumptions here:

i) For the authors' research to be valid, they should be able to prove that PM measurements are a stochastic process that can be modeled as a log-normal distribution. The authors cite Rumburg et al. (2001) as the sole example of using a log-normal distribution to fit PM measurements. However, Rumburg et al. (2001) estimated the distribution parameters for a single location based on hundreds of measurements over a period of several years. In this work, the authors use at most 25 measurements per location in 9 separate locations, without regard to the dates on which the measurements were taken.

ii) The authors assume that PM2.5 measurements across different areas are independent but have a common variance, whereas in practice they do not. Some PM sources are intermittent in nature and some are more constant. Therefore, depending on which sources affect a particular location, the variances will differ.

iii) Ambient PM concentrations are affected by a multitude of well-studied sources and sinks, and as such, measurements made at different times in different geographical locations are not necessarily comparable. Again, in Rumburg et al. (2001), the only similar work the authors cited, one location is taken into consideration and the results are only assumed valid for that particular location. In this work, the authors present as comparable distributions fitted from different locations, probably taken at different times, and with a very small sample size.

The authors propose four approaches for the confidence interval estimation of the common coefficient of variation of log-normal distributions (FGCI, MOVER, computational, and Bayesian). Given that the authors were not able to establish the validity of the necessary underlying assumptions, the validity and value of applying these approaches to PM measurements are not clear.

Version 0.2, submitted May 13, 2020

Major Revisions

Dear Dr. Niwitpong,

One of the reviewers has major concerns about the assumptions made in the analysis, and two others requested major revisions. Kindly take all the points raised by the reviewers into consideration and address them in a revised manuscript. Once resubmitted, the manuscript will be subject to a second round of reviews.

[# PeerJ Staff Note: Please ensure that all review comments are addressed in a rebuttal letter and any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate. It is a common mistake to address reviewer questions in the rebuttal letter but not in the revised manuscript. If a reviewer raised a question then your readers will probably have the same question so you should ensure that the manuscript can stand alone without the rebuttal letter. Directions on how to prepare a rebuttal letter can be found at: https://peerj.com/benefits/academic-rebuttal-letters/ #]

Comment to the Authors

Title: A common particulate matter dispersion in Thailand using confidence intervals for common coefficient of variation

This paper introduces confidence intervals (CIs) for the common coefficient of variation (CV) of log-normal (LN) distributions based on four methods: the fiducial generalized method, the adjusted method of variance estimates recovery (adjusted MOVER), a computational method, and a Bayesian method. All CIs were evaluated by simulation in terms of coverage probability (CP) and average length (AL). An example using PM2.5 data from part of Thailand is presented. I really appreciate that the authors use real data on air pollution (PM2.5), which is a current problem in many countries. I suggest changing the title of the paper to “Confidence interval for the common coefficient of variation of PM2.5 in Thailand”. My major comments are as follows.

1. The authors note in the abstract that the common CV is usually used to COMPARE the dispersion of data from different populations. I do not understand this. In fact, the common CV is a quantity used to ESTIMATE the true parameter shared by k populations, so this estimator cannot be used to compare the dispersion of several populations. Please clarify this point first.

Moreover, I suggest that the authors explain in the manuscript why the common CV is of interest, rather than the CV computed for each population. I say this because, in applications, it is usually the mean, variance, or CV of PM2.5 in a single area that is estimated. We can then compare the values (e.g., the CVs) obtained for each area to see which area has the highest dispersion. If we compute the common CV for k areas (which may be far apart, like the provinces used in this work), how should the result be interpreted? This should be explained clearly in the opening section and revisited in the conclusions drawn from the real data, rather than only reporting the values.

2. In the last paragraph of the introduction, the authors note that the CIs introduced in this manuscript allow for heterogeneous LN distributions. However, I think one method on page 7 is based on the homogeneous case, where the ML estimator for \theta has been derived under \theta_i = \theta for i = 1,2,…,k. This is not the heterogeneous situation. Could the authors explain? On the same page, please check the notation Y_i.

3. The details at the top of page 4 are well known. They have already been given in Applied Mathematical Sciences, Vol. 7, 2013, no. 77, 3805-3810 and in Journal of Statistical Theory and Applications, Vol. 16, No. 3 (September 2017) 345-353. I therefore recommend not presenting them as a theorem, but simply stating them as background information.

In this section, the common CV is denoted by \theta but is computed from the variance of population i, NOT from all available populations. If so, it is not the population common CV. Moreover, I do not agree that Eq. 7 (\hat{\theta}_i) is the estimator of the common CV for k populations, for the same reason. Also, the formula in Eq. 7 differs from the common CV (theta) used in the R code. This must be checked.
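As background for this point, the standard log-normal moments can be written out. This is a textbook property, not a formula taken from the manuscript, and the symbols \mu_i and \sigma_i^2 (mean and variance of \ln X_{ij} in population i) are assumed notation. If X_{ij} \sim LN(\mu_i, \sigma_i^2), then:

```latex
\begin{align*}
E(X_{ij}) &= \exp\!\left(\mu_i + \sigma_i^2/2\right), \\
\operatorname{Var}(X_{ij}) &= \left(\exp(\sigma_i^2) - 1\right)\exp\!\left(2\mu_i + \sigma_i^2\right), \\
\theta_i &= \frac{\sqrt{\operatorname{Var}(X_{ij})}}{E(X_{ij})} = \sqrt{\exp(\sigma_i^2) - 1}.
\end{align*}
```

The CV \theta_i therefore depends on \sigma_i^2 only. A common CV, \theta_1 = \dots = \theta_k = \theta, is equivalent to a common log-scale variance, while the \mu_i (and hence the variances on the original scale) may still differ. A quantity computed from the sample of population i alone estimates \theta_i; an estimator of the common \theta would be expected to combine information from all k samples.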

4. Please define \bar{X}_i^* and S_i^*.

5. The paper lacks the prior knowledge, background, and explanation needed before the CIs are derived and constructed. I think this is very important, and it gives good motivation when a new theory is proposed. The authors should provide more details about Eqs. 11, 12, and 15: their properties and the conditions for a fiducial generalized pivotal quantity. In the sentence “…Equation (12) satisfies two conditions …”, what are the two conditions? This is important. For the MOVER method, Eqs. 17-22 apply to two populations, not to the general k-sample case that is of interest in this paper. Furthermore, what is the rationale for the choice of prior and posterior distributions in the Bayesian method? Many equations in the statistical methodology are taken from previous papers without explanation.

6. Is \theta_{BS} in Eq. 37 the parameter of interest?

7. How were the simulation settings chosen? What are the true common CVs of the LN distributions used in the simulations? Do the situations considered in this work cover the possible values of the CV for LN distributions? Moreover, the authors claim that the results obtained from the real data in Example 1 and Example 2 confirm the simulation results. I think this can only be said if the simulations and the illustrative examples involve the same or related situations. Compare the sample sizes, standard deviations, and common CVs given in Tables 1-2 with those in Tables 3-4; they seem to be different. I would also like to see the true common CV, and please calculate the estimated common CV (point estimator) in the simulations.

In many simulation cases, the average CP is 1, although the nominal coverage level is set to 95%. The simulated CP is therefore much higher than the target value. Could the authors provide more details here and explain why the CI is still considered to perform well in such cases?
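The way CP and AL are obtained by simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' design: it uses a single log-normal population (not the k-sample common CV), the exact chi-square interval for the log-scale variance mapped to the CV scale, and hypothetical settings for n, mu, sigma, and the number of replications.

```r
# Assumed settings (hypothetical, not the manuscript's simulation design).
set.seed(1)
n     <- 25     # sample size
mu    <- 1      # log-scale mean
sigma <- 0.5    # log-scale standard deviation
alpha <- 0.05   # 1 - nominal coverage level
n_sim <- 5000   # number of Monte Carlo replications

# True CV of the log-normal distribution.
true_cv <- sqrt(exp(sigma^2) - 1)

# Exact chi-square interval for sigma^2 from the log data, mapped to the
# CV scale by the increasing function sqrt(exp(.) - 1).
cv_ci <- function(x, alpha) {
  y  <- log(x)
  df <- length(y) - 1
  var_ci <- df * var(y) / qchisq(c(1 - alpha / 2, alpha / 2), df)
  sqrt(exp(var_ci) - 1)
}

# One row per replication: column 1 = lower limit, column 2 = upper limit.
ci <- t(replicate(n_sim, cv_ci(rlnorm(n, mu, sigma), alpha)))

cp <- mean(ci[, 1] <= true_cv & true_cv <= ci[, 2])  # coverage probability
al <- mean(ci[, 2] - ci[, 1])                        # average length
print(c(CP = cp, AL = al))
```

For an interval with correct coverage, the simulated CP should sit close to the nominal 0.95, within Monte Carlo error. A CP of exactly 1 usually signals a conservative interval, which is why AL is reported alongside CP: an interval can cover the true value in every replication simply by being very wide.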

8. The data used in the application section are interesting. However, I wonder why the authors use data collected at different times and why the data do not extend to 2020. Is there a limitation? Could the authors explain? As a minor point, please rewrite the titles of the histograms and Q-Q plots. They should read “Histogram of PM2.5 in …… province”.

I understand that the lowest AIC means that the LN distribution fits the data better than the other distributions compared. However, to make sure that the data follow the LN distribution, please use a hypothesis test (a goodness-of-fit test).
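One way such a goodness-of-fit check could look is sketched below. The data are simulated stand-ins (the variable pm25 and its parameters are hypothetical), and the Shapiro-Wilk test on the log scale is only one possible choice of test, not necessarily the one the authors would use.

```r
# Simulated stand-in for one province's PM2.5 series (hypothetical values).
set.seed(2)
pm25 <- rlnorm(25, meanlog = 3, sdlog = 0.4)

# If pm25 is log-normal, log(pm25) is normal, so a normality test on the
# log scale serves as a goodness-of-fit test for the LN model.
sw <- shapiro.test(log(pm25))

print(sw$statistic)  # W statistic
print(sw$p.value)    # a small p-value is evidence against log-normality
```

With samples of around 25 observations, a test like this has limited power, so a non-significant result does not by itself establish log-normality. It complements the AIC comparison rather than replacing it.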

9. I would like the authors to check carefully the R code (provided separately from the main manuscript) for the MOVER method. The simulation results for this approach presented in the main text are not correct, because the R code is not correct. Please look carefully at this point, at all of your R code, and at your conclusions.

10. At the beginning of the last section, the sample size is denoted by n, not k. In your conclusions, the number of samples affects the coverage probability. This makes me think about cases with more than 6 samples. What happens to the performance of the CIs when k > 6, for example k = 10, 15, and 20?

NA

NA

NA

The paper makes an interesting contribution on confidence intervals for the common coefficient of variation of particulate matter as an indicator of air pollution in Thailand.

The data are routinely collected.

In general, the findings are of interest, and it is valuable to know what works and what does not.

There are many detailed comments on the annotated manuscript.

The paper needs to be revised. At the moment, the writing assumes that the reader already knows what this is about.

Annotated reviews are not available for download in order to protect the identity of reviewers who chose to remain anonymous.

The authors may be referring to smog instead of fog in lines 28-30.

All the statements in lines 28-38 would benefit from references.

Throughout the article, the authors use the term dispersion in its statistical sense (i.e. the spread of the probability distribution, the relative standard deviation). Such usage is confusing. The authors are encouraged to use a different term, given that "dispersion" of particulate matter is a well-described physical process, e.g. https://doi.org/10.1175/BAMS-D-14-00110.1 (see line 67 for a standalone example).

It is not clear if the authors are describing their work, or a pre-existing methodology in lines 65-66.

The authors may want to provide more relevant examples of using the log-normal distribution to model particulate matter, rather than the size of silver particles in a photographic emulsion, etc., in lines 44-49. The same comment applies to the examples of using the coefficient of variation in lines 57-64, where the authors cite usage in hematology and serology, finance, and medicine.

It is not evident why the authors go to such lengths to list the studies that report on confidence intervals for log-normal distributions (lines 50-56).

The authors state that the log-normal distribution is used to model right-skewed data (lines 39-40) without stating (or citing a source) that PM2.5 measurements are right-skewed.

In line 67, the authors assume that measurements of PM2.5 are samples collected from independent log-normal populations with a common coefficient of variation and different variances. It is not clear how and why the authors make these assumptions; they did not provide enough evidence to justify them in subsequent statements.

The authors did not clearly state how their work fills a knowledge gap. There is no previous literature on applying their methodology to PM2.5 concentration measurements, and the authors propose "novel approaches for the confidence interval estimation of the common coefficient of variation of lognormal distributions" without considering how the log-normality of PM2.5 concentration measurements is treated in the literature.

The dataset used is not referenced in time. It is simply a series of measurements (11 measurements in Sop Pat, 16 measurements in Ban Dong, 15 measurements in Mae Mo, and around 25 measurements in each of Chiang Mai, Lampang, Chiang Rai, Nan, Lamphun, and Phrae). Any inference made on such small datasets without taking the temporal spread into account is doubtful.

This work is better placed in statistics. It uses PM2.5 measurements as a dataset like any other to illustrate a statistical application.

Given measurements of PM2.5, physical dispersion models are able to solve for spatio-temporal PM2.5 levels. This work, and in order to better describe (forcibly atemporal) PM2.5 measurements, presupposes as true, underlying assumptions that need validation. These assumptions are:

- PM2.5 measurements is a stochastic process that can be modeled as a log-normal distribution

- PM2.5 measurements across different areas are independent but have a common variance

- PM2.5 measurements made in different time periods in different geographical locations are comparable
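The second assumption in the list above is one that can be examined directly from data. The sketch below is illustrative only: the data frame pm25, the area labels, and all parameter values are hypothetical stand-ins, and Bartlett's test is just one possible choice. It relies on the fact that, under log-normality, a common coefficient of variation is equivalent to a common variance of the log measurements.

```r
# Simulated stand-in data for three areas (hypothetical values).
set.seed(3)
pm25 <- data.frame(
  area  = factor(rep(c("A", "B", "C"), times = c(11, 16, 25))),
  value = c(rlnorm(11, 3.0, 0.4), rlnorm(16, 3.2, 0.4), rlnorm(25, 2.8, 0.4))
)

# Bartlett's test of equal variances of log(value) across areas.
bt <- bartlett.test(log(value) ~ area, data = pm25)

print(bt$parameter)  # degrees of freedom: number of areas minus 1
print(bt$p.value)    # a small p-value is evidence against a common CV
```

Bartlett's test is sensitive to departures from normality on the log scale, and it says nothing about independence between areas or about comparability across time, so the first and third assumptions would still need separate justification.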

Original Submission, submitted Feb 17, 2020

All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.