Multi-token code suggestions using statistical language models

Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
DOI
10.7287/peerj.preprints.1597v1
Subject Areas
Data Mining and Machine Learning, Natural Language and Speech, Software Engineering
Keywords
naturalness, ngram, language models, atom text editor, code suggestion, code prediction, nlp
Copyright
© 2015 Santos et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Santos EA, Hindle A. 2015. Multi-token code suggestions using statistical language models. PeerJ PrePrints 3:e1597v1

Abstract

We present an application of the naturalness of software to provide multi-token code suggestions in GitHub's Atom text editor. We extended a simple n-gram prediction model using the "mean surprise" metric: the arithmetic mean of the surprisal of several successive single-token predictions. After an error-fraught evaluation, there is not enough evidence to conclude that our tool, Gamboge, significantly improves programmer productivity. We conclude by discussing several directions for future research in code suggestion and other applications of naturalness.
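The "mean surprise" of a multi-token suggestion can be sketched as follows. This is an illustrative reconstruction from the definition above, not the paper's implementation; the function names and example probabilities are assumptions.

```python
import math

def surprisal(probability):
    """Surprisal (self-information) of one predicted token, in bits."""
    return -math.log2(probability)

def mean_surprisal(token_probabilities):
    """Arithmetic mean of the surprisal of successive single-token
    predictions, as in the "mean surprise" metric described above."""
    return sum(surprisal(p) for p in token_probabilities) / len(token_probabilities)

# Hypothetical per-token probabilities an n-gram model might assign
# to a three-token suggestion; lower mean surprisal = more "natural".
probs = [0.5, 0.25, 0.125]
print(mean_surprisal(probs))  # (1 + 2 + 3) / 3 = 2.0 bits
```

A suggestion engine could rank candidate multi-token completions by ascending mean surprisal, preferring the sequences the language model finds least surprising.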

Author Comment

This is the paper submitted to my supervisor as part of my undergraduate directed studies. It is fraught with errors and rife with informal, non-academic language. That said, we believe the content to be informative regardless, especially the use of "mean surprise" and the numerous applications of NLP to software ("naturalness of software") that we have listed.