How many of the "significant" scatterplots remain so without strong assumptions?

From a quick glance at the code (it is great that it is available, although it could use more whitespace), it looks like the p-values are computed from a linear model assuming normal and homoscedastic errors (which is what makes them exact in small samples, at least). Without these assumptions (homoscedastic errors and normal errors), how many of the "positive" cases remain so? For example, which are still significant with a non-parametric test, with bootstrap or sandwich standard errors for OLS, or with a robust regression? It is unclear whether the instructions to students made these assumptions explicit.
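To make the comparison concrete, here is a minimal, stdlib-only Python sketch of two of the alternatives mentioned: a heteroskedasticity-robust (HC0 "sandwich") standard error for the OLS slope, judged against a normal reference as a large-sample approximation, and a permutation test of the slope, which does not assume normal or homoscedastic errors (it assumes exchangeability under the null of no association). The function names, the permutation count, and the toy data in the usage block are hypothetical and not taken from the authors' code.

```python
import math
import random


def ols_slope(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def hc0_slope_se(x, y):
    """HC0 (White) sandwich standard error of the OLS slope."""
    a, b = ols_slope(x, y)
    mx = sum(x) / len(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    # Each observation's squared residual is weighted by its squared x-deviation,
    # so no common error variance is assumed.
    meat = sum((xi - mx) ** 2 * (yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(meat) / sxx


def normal_two_sided_p(z):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_slope_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y)[1])
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any x-y association under H0
        if abs(ols_slope(x, y_perm)[1]) >= observed:
            hits += 1
    # Count the observed arrangement too, so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(0, 1) for _ in range(30)]
    # Hypothetical heteroskedastic errors whose spread grows with x.
    y = [0.3 * xi + rng.gauss(0, 0.2 + xi) for xi in x]
    _, b = ols_slope(x, y)
    print("slope:", b)
    print("HC0 sandwich p:", normal_two_sided_p(b / hc0_slope_se(x, y)))
    print("permutation p:", permutation_slope_p(x, y))
```

Running each scatterplot's data through functions like these, alongside the classical OLS p-value, would show how many "significant" cases survive once the normal, homoscedastic error assumptions are dropped.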


To elaborate on our survey, students were specifically asked:

“Suppose you fit a linear model relating the two variables displayed in the plot below. Would the P-value for the regression coefficient be significant at the 0.05 level (P less than 0.05)?”

The students had been exposed only to basic ordinary least squares (OLS) regression in this class, so we assumed that they would interpret the question in that context. The errors were also generated as independent and identically distributed draws from a normal distribution, so all of the necessary assumptions were met in this case.
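For readers who want to reproduce this setting, here is a minimal sketch assuming a simple model y = b0 + b1*x + e with e drawn i.i.d. from N(0, sigma^2), and computing the classical OLS t-test p-value for the slope. The sample size, coefficients, uniform x values, and function names are hypothetical and not taken from the authors' code. Because the Python standard library has no Student-t distribution, the two-sided p-value is obtained from the regularized incomplete beta function, evaluated with the continued fraction described in Numerical Recipes.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def ols_slope_test(x, y):
    """Classical OLS slope, t-statistic and p-value (assumes i.i.d. normal errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance has n - 2 df
    t = b1 / se
    return b1, t, t_two_sided_p(t, n - 2)


def simulate(n=20, b0=0.0, b1=0.5, sigma=1.0, seed=None):
    """Hypothetical data generator: uniform x, i.i.d. normal errors."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate(seed=42)
    b1, t, p = ols_slope_test(x, y)
    print(f"slope={b1:.3f} t={t:.3f} p={p:.4f} significant={p < 0.05}")
```

Under this data-generating process the classical t-test is exact, which is the sense in which the necessary assumptions are met.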

We agree that methods relying on weaker assumptions would tend to yield fewer significant results, but we focused on OLS regression because it is a canonical and commonly implemented method.
