Teaching stats for data science
- Subject Areas
- Data Science, Scientific Computing and Simulation
- Keywords
- data science, statistics, R language, education
- Copyright
- © 2017 Kaplan
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- Kaplan. 2017. Teaching stats for data science. PeerJ Preprints 5:e3205v1 https://doi.org/10.7287/peerj.preprints.3205v1
Abstract
The familiar mathematical topics of introductory statistics (means, proportions, t-tests, normal and t distributions, chi-squared, etc.) are a product of the first half of the 20th century. Naturally, they reflect the statistical conditions of that era: scarce data (e.g., n < 10) originating in benchtop or agricultural experiments, and algorithms communicated via algebraic formulas. Today, applied statistics operates in a different environment: software is the means of algorithmic communication, observational and "unplanned" data are interpreted for causal relationships, and data are large both in n and in the number of variables. This change in situation calls for a thorough rethinking of both the topics of statistics education and the approach to teaching them. This paper presents a set of ten organizing blocks for intro stats that are better suited to today's environment.
Author Comment
This is part of the 'Practical Data Science for Stats' Collection.