Multi-token code suggestions using statistical language models
Abstract
We present an application of the naturalness of software to provide multi-token code suggestions in GitHub’s Atom text editor. We extend the predictions of a simple n-gram model to multiple tokens using the "mean surprise" metric: the arithmetic mean of the surprisal of several successive single-token predictions. After an error-fraught evaluation, there is not enough evidence to conclude that our tool, Gamboge, significantly improves programmer productivity. We conclude by discussing several directions for future research in code suggestion and other applications of the naturalness of software.
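As a rough illustration of the "mean surprise" metric described above, the following is a minimal Python sketch; it is not the authors' implementation, and the function names and toy probabilities are purely illustrative. It averages the surprisal of each token in a candidate multi-token suggestion:

    import math

    def surprisal(probability):
        # Surprisal (self-information) of one predicted token, in bits.
        return -math.log2(probability)

    def mean_surprise(token_probabilities):
        # "Mean surprise": the arithmetic mean of the surprisal of
        # several successive single-token predictions.
        return sum(surprisal(p) for p in token_probabilities) / len(token_probabilities)

    # Toy example: a three-token suggestion whose tokens the model assigns
    # conditional probabilities 0.5, 0.25, and 0.125 in sequence.
    # Mean surprise = (1 + 2 + 3) / 3 = 2 bits; a lower value means the model
    # is less "surprised", so such a suggestion would rank higher.
    print(mean_surprise([0.5, 0.25, 0.125]))

Under these assumptions, candidate suggestions with lower mean surprise are the ones the language model considers more natural.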
Cite this as
Santos EA, Hindle A. 2015. Multi-token code suggestions using statistical language models. PeerJ PrePrints 3:e1597v1 https://doi.org/10.7287/peerj.preprints.1597v1
Author comment
This is the paper submitted to my supervisor as part of my undergraduate directed studies. It is fraught with errors and rife with informal, non-academic language. That said, we believe the content to be informative regardless, especially the use of "mean surprise" and the numerous applications of NLP to software ("naturalness of software") that we have listed.
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Eddie A Santos conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.
Abram Hindle conceived and designed the experiments, reviewed drafts of the paper, and provided mentorship, original research, methodology, and academic resources.
Data Deposition
The following information was supplied regarding data availability:
- atom-gamboge <https://github.com/eddieantonio/atom-gamboge>
- unnaturalcode <https://github.com/orezpraw/unnaturalcode>
Funding
The authors received no funding for this work.