Did you account for the variance of low sampling?

I probably overlooked this in the article, but areas of high elevation are probably sparsely populated, which means smaller samples, and smaller samples mean greater variance. The correlation could therefore be an artifact of sampling variance, i.e., in some later year we might find that the regions of high elevation actually have increased rates of cancer. There are several known cases of reported correlations that were due to nothing other than small sample sizes, such as when the Gates Foundation donated millions of dollars towards an effort to break larger schools into smaller ones after finding that smaller schools were more strongly associated with good grades. It turned out that the correlation arose only because smaller schools have smaller sample sizes.
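To illustrate the concern, here is a minimal sketch that assumes a simple binomial model in which every region shares the same true rate. The rate and the population sizes are hypothetical values chosen for illustration, not figures from the article.

```python
import math


def rate_standard_error(true_rate, population):
    """Standard error of an observed rate under a binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / population)


# Hypothetical incidence rate, assumed identical in every region.
TRUE_RATE = 0.0005

# Hypothetical population sizes, from sparsely to densely populated.
for population in (5_000, 50_000, 500_000):
    se = rate_standard_error(TRUE_RATE, population)
    # Relative standard error: noise in the observed rate as a share of the true rate.
    print(f"population {population:>7,}: relative SE = {se / TRUE_RATE:.1%}")
```

Under this model the noise in the observed rate shrinks with the square root of the population, so the least populated regions are the most likely to show extreme rates by chance alone.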

1 Answer
Accepted answer

We took two steps to avoid the problems you mention related to observational error (also referred to as measurement uncertainty):

1. Counties with populations below 10,000 were excluded due to high missingness (values were missing for many of the variables) and observational error (values were present but subject to large margins of error, as evidenced by source-reported confidence intervals).

2. Counties were weighted by the square root of their population, up to a maximum population of 250,000, where measurement uncertainty leveled off to minimal levels. The weighting scheme accounted for increasing measurement uncertainty among low-population counties without granting heavily populated counties an overwhelming influence.
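As a rough sketch, the two steps above could be expressed as follows. This is one reading of the description, not the code used in the study: the function and constant names are hypothetical, and treating the 250,000 maximum as a cap applied to the population before taking the square root is an assumption.

```python
import math

# Thresholds taken from the description above.
MIN_POPULATION = 10_000
POPULATION_CAP = 250_000


def county_weight(population):
    """Return the analysis weight for a county, or None if it is excluded.

    Hypothetical helper: assumes the cap is applied to the population
    before the square root is taken.
    """
    if population < MIN_POPULATION:
        return None  # Step 1: drop counties below 10,000 residents.
    # Step 2: square-root weighting, capped at a population of 250,000.
    return math.sqrt(min(population, POPULATION_CAP))


# Illustrative populations only, not real counties.
for population in (8_000, 10_000, 100_000, 250_000, 1_000_000):
    print(population, county_weight(population))
```

Under this reading, a county of 1,000,000 people receives the same weight as one of 250,000, and only five times the weight of a county of 10,000.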

We believe these steps minimized any biases in cancer rates due to insufficient sample sizes. In our data release, we include the lower and upper bounds of the 95% confidence interval for cancer incidences. You can browse these intervals to get an idea of the margin of error for each county. We found that our filtering and weighting schemes either removed or downweighted the problematic counties.
