Multi-token code suggestions using statistical language models
- Subject Areas
- Data Mining and Machine Learning, Natural Language and Speech, Software Engineering
- Keywords
- naturalness, ngram, language models, atom text editor, code suggestion, code prediction, nlp
- Copyright
- © 2015 Santos et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- Santos et al. 2015. Multi-token code suggestions using statistical language models. PeerJ PrePrints 3:e1597v1 https://doi.org/10.7287/peerj.preprints.1597v1
Abstract
We present an application of the naturalness of software that provides multi-token code suggestions in GitHub’s Atom text editor. We extend the results of a simple n-gram prediction model using the "mean surprise" metric: the arithmetic mean of the surprisal of several successive single-token predictions. After an error-fraught evaluation, there is not enough evidence to conclude that Gamboge, the resulting Atom plugin, significantly improves programmer productivity. We conclude by discussing several directions for future research in code suggestion and other applications of the naturalness of software.
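As a rough formalization (our reading of the abstract's definition, not notation taken from the paper): if the n-gram model assigns probability $P(t_i \mid t_{i-N+1}, \dots, t_{i-1})$ to each token $t_i$ of a candidate multi-token suggestion $t_1, \dots, t_k$, then its mean surprise can be written as

$$\bar{S}(t_1, \dots, t_k) = \frac{1}{k} \sum_{i=1}^{k} -\log_2 P(t_i \mid t_{i-N+1}, \dots, t_{i-1}),$$

that is, the arithmetic mean of the per-token surprisal (negative log-probability), where a lower mean surprise marks a suggestion the model considers more natural.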
Author Comment
This is the paper submitted to my supervisor as part of my undergraduate directed studies. It is fraught with errors and rife with informal, non-academic language. That said, we believe the content to be informative regardless, especially the use of "mean surprise" and the numerous applications of NLP to software ("naturalness of software") that we have listed.