Two perils of binary categorization: Why the study of concepts can’t afford true/false testing

Department of Neuroscience, Columbia University Medical Center, New York, New York, United States
Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
DOI
10.7287/peerj.preprints.688v1
Subject Areas
Animal Behavior, Psychiatry and Psychology
Keywords
Machine Learning, Categorization, Animal Cognition, Concepts, Comparative Cognition
Copyright
© 2014 Jensen et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Jensen G, Altschul D. 2014. Two perils of binary categorization: Why the study of concepts can’t afford true/false testing. PeerJ PrePrints 2:e688v1

Abstract

In this opinion piece, we outline two shortcomings in experimental design that limit the claims that can be made about concept learning in animals. On the one hand, most studies of concept learning train too few concepts in parallel to support general claims about the animals' capacity for subsequent abstraction. On the other hand, even studies that train many stimulus categories in parallel test only one or two stimuli at a time, allowing even a simplistic learning rule to succeed by making informed guesses. To demonstrate these shortcomings, we include simulations performed using an off-the-shelf image classifier. These simulations demonstrate that, when either training or testing is overly simplistic, a classification algorithm that is incapable of abstraction nevertheless yields levels of performance that have been described in the literature as proof of concept learning in animals.
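
The guessing arithmetic behind the second shortcoming is easy to reproduce. The toy model below (Python; a sketch of the argument, not the simulation code behind Figure 1) assumes a learner that genuinely classifies each stimulus with probability p and otherwise guesses uniformly among the response alternatives. The function and parameter names are illustrative.

```python
# A toy model of the guessing advantage in binary testing (illustrative;
# not the classifier simulation reported in Figure 1).

def trial_accuracy(p, n_stimuli, n_alternatives=2, guessing=True):
    """Probability that every stimulus in a trial is answered correctly.

    p              -- probability of a genuinely correct classification
    n_stimuli      -- stimuli that must all be correct for a 'correct' trial
    n_alternatives -- response options per stimulus (2 = true/false testing)
    guessing       -- if True, a failed classification is resolved by a
                      uniform guess among the response alternatives
    """
    per_stimulus = p + (1.0 - p) / n_alternatives if guessing else p
    return per_stimulus ** n_stimuli

if __name__ == "__main__":
    p = 0.55  # a learner with only modest genuine classification ability
    for n in (1, 2, 4, 8):
        print(f"n={n}: with guessing {trial_accuracy(p, n):.3f}, "
              f"without {trial_accuracy(p, n, guessing=False):.3f}")
```

With p = 0.55, a one-stimulus true/false test yields 0.775 observed accuracy, comfortably "above chance," whereas requiring eight simultaneous classifications drops the guessing-inflated figure to roughly 0.130, clearly separating informed guessing from genuine abstraction.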

Author Comment

This is a brief opinion piece, supported by a simulation demonstration. Its aim is to draw attention to the limits of currently accepted experimental methodology in the comparative study of concept learning.

Supplemental Information

Figure 1: Simulation Results

Performance of the bag-of-features classifier using 100 feature clusters. (Left) Classification accuracy given training on the ten largest categories in the Caltech 101 image set. Colored lines show accuracy for specific categories, while dashed black lines show overall accuracy at each level of training complexity. (Right) Accuracy of a classifier trained on 102 categories during a test in which n stimuli must be classified correctly for a trial to count as 'correct.' Performance is shown for the classifier's ten best (black) and ten worst (white) categories. Solid lines indicate cases in which every classification was made outright, while dashed lines indicate cases in which correct responses required at least one guess.

DOI: 10.7287/peerj.preprints.688v1/supp-1
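
For readers who wish to reproduce the general approach, the sketch below outlines a generic bag-of-features pipeline in Python with scikit-learn. Only the 100-cluster vocabulary is taken from the caption; the raw-patch features, patch size, sampling density, and linear SVM are assumptions for illustration rather than the settings of the off-the-shelf classifier used here.

```python
# A generic bag-of-features image classifier, sketched with scikit-learn.
# Illustrative only: the patch features and SVM are stand-ins for whatever
# descriptors the off-the-shelf tool used; only the 100-cluster vocabulary
# matches the Figure 1 caption.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.svm import LinearSVC

N_CLUSTERS = 100          # the '100 feature clusters' of Figure 1
PATCH_SIZE = (8, 8)       # assumed local-feature size
PATCHES_PER_IMAGE = 200   # assumed sampling density

def image_patches(img, seed=0):
    """Sample raw grayscale patches as local features (a stand-in for
    SIFT-style descriptors). `img` is a 2-D numpy array."""
    patches = extract_patches_2d(img, PATCH_SIZE,
                                 max_patches=PATCHES_PER_IMAGE,
                                 random_state=seed)
    return patches.reshape(len(patches), -1)

def train_bag_of_features(images, categories, seed=0):
    """Cluster local features into a visual vocabulary, then fit a linear
    classifier on each image's histogram of cluster assignments."""
    all_feats = np.vstack([image_patches(img, seed) for img in images])
    vocab = KMeans(n_clusters=N_CLUSTERS, random_state=seed).fit(all_feats)
    hists = np.array([np.bincount(vocab.predict(image_patches(img, seed)),
                                  minlength=N_CLUSTERS)
                      for img in images])
    return vocab, LinearSVC().fit(hists, categories)

def classify(vocab, clf, img, seed=0):
    """Score a new image purely by its histogram of local features."""
    hist = np.bincount(vocab.predict(image_patches(img, seed)),
                       minlength=N_CLUSTERS)
    return clf.predict(hist.reshape(1, -1))[0]
```

Nothing in this pipeline performs abstraction: a test image is scored solely on how its histogram of local features resembles those of the training images, which is precisely why above-chance performance from such a classifier should not, by itself, be read as concept learning.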