Teaching stats for data science

Daniel T Kaplan

doi:10.7287/peerj.preprints.3205v1

Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota, United States

Published: 2017-08-29
Accepted: 2017-08-29

Subject Areas: Data Science, Scientific Computing and Simulation
Keywords: data science, statistics, r language, education

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Kaplan DT. 2017. Teaching stats for data science. PeerJ Preprints 5:e3205v1 https://doi.org/10.7287/peerj.preprints.3205v1

Abstract

The familiar mathematical topics of introductory statistics (means, proportions, t-tests, normal and t distributions, chi-squared, etc.) are a product of the first half of the 20th century. Naturally, they reflect the statistical conditions of that era: scarce data (e.g. n < 10) originating in benchtop or agricultural experiments, and algorithms communicated via algebraic formulas. Today, applied statistics operates in a different environment: software is the means of algorithmic communication, observational and "unplanned" data are interpreted for causal relationships, and data are large both in n and in the number of variables. This change in situation calls for a thorough rethinking of both the topics of and the approach to statistics education. This paper presents a set of ten organizing blocks for intro stats that are better suited to today's environment.

Author Comment

This is part of the 'Practical Data Science for Stats' Collection
