Implicit value updating explains transitive inference performance: The betasort model

Greg Jensen; Fabian Muñoz; Yelda Alkan; Vincent P Ferrera; Herbert S Terrace

doi:10.7287/peerj.preprints.954v1

Greg Jensen¹, Fabian Muñoz¹, Yelda Alkan¹, Vincent P Ferrera¹, Herbert S Terrace²

1 Department of Neuroscience, Columbia University, New York, NY, United States

2 Department of Psychology, Columbia University, New York, NY, United States

Published: 2015-04-02
Accepted: 2015-04-02

Subject Areas: Animal Behavior, Neuroscience, Psychiatry and Psychology
Keywords: reinforcement learning, transitive inference, cognition, rhesus macaques

Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.

Cite this article: Jensen G, Muñoz F, Alkan Y, Ferrera VP, Terrace HS. 2015. Implicit value updating explains transitive inference performance: The betasort model. PeerJ PrePrints 3:e954v1 https://doi.org/10.7287/peerj.preprints.954v1

Abstract

Transitive inference (the ability to infer that “B>D” given that “B>C” and “C>D”) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error (RPE) or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, the betasort algorithm, and Q-learning (an established RPE model). Of these, only Q-learning failed to respond above chance during critical test trials. Implications for cognitive/associative rivalries, as well as for the model-based/model-free dichotomy, are discussed.
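
The three ingredients named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not specify the update rules, so the class `BetaPositions`, the helper `simulate`, the unit increments, and the "consolidate at the current mean" rule are all assumptions chosen only to show how beta-distributed positions, asymmetric feedback, and updates to unseen stimuli might fit together.

```python
import random


class BetaPositions:
    """Toy position model: one Beta(upper, lower) distribution per stimulus.

    Hypothetical illustration only; the update rules are assumptions.
    """

    def __init__(self, n_stimuli, seed=0):
        # Beta(1, 1) is uniform on the unit span: no ordering knowledge yet.
        self.upper = [1.0] * n_stimuli
        self.lower = [1.0] * n_stimuli
        self.rng = random.Random(seed)

    def mean(self, i):
        """Expected position of stimulus i on the unit span."""
        return self.upper[i] / (self.upper[i] + self.lower[i])

    def choose(self, a, b):
        """Sample a position for each presented stimulus; pick the larger."""
        xa = self.rng.betavariate(self.upper[a], self.lower[a])
        xb = self.rng.betavariate(self.upper[b], self.lower[b])
        return a if xa >= xb else b

    def _consolidate(self, i, m):
        # Add one unit of evidence at the current mean: the mean is unchanged,
        # but the distribution narrows.
        self.upper[i] += m
        self.lower[i] += 1.0 - m

    def update(self, chosen, other, rewarded):
        """Update every stimulus, including those not shown on this trial."""
        means = [self.mean(i) for i in range(len(self.upper))]
        if rewarded:
            # Positive feedback (assumed rule): keep all positions, sharpen them.
            for i, m in enumerate(means):
                self._consolidate(i, m)
            return
        # Negative feedback (assumed rule): move the presented pair apart and
        # push unseen stimuli lying outside the pair further outward.
        lo, hi = sorted((means[chosen], means[other]))
        for i, m in enumerate(means):
            if i == chosen:
                self.lower[i] += 1.0   # chosen item was wrong: shift it down
            elif i == other:
                self.upper[i] += 1.0   # unchosen item was right: shift it up
            elif m < lo:
                self.lower[i] += 1.0
            elif m > hi:
                self.upper[i] += 1.0
            else:
                self._consolidate(i, m)


def simulate(n_stimuli=5, n_trials=2000, seed=0):
    """Train on adjacent pairs only; index 0 is the highest-ranked item."""
    model = BetaPositions(n_stimuli, seed)
    for _ in range(n_trials):
        a = model.rng.randrange(n_stimuli - 1)
        b = a + 1                      # adjacent pair; a outranks b
        chosen = model.choose(a, b)
        other = b if chosen == a else a
        model.update(chosen, other, rewarded=(chosen == a))
    return [model.mean(i) for i in range(n_stimuli)]
```

Calling `simulate()` trains the sketch on adjacent pairs only. Comparing the returned means at indices 1 and 3 then plays the role of a critical “B versus D” test, since that pair is never presented during training.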

Author Comment

This is the first version of a preprint.