Multi-token code suggestions using statistical language models
- Subject Areas
- Data Mining and Machine Learning, Natural Language and Speech, Software Engineering
- Keywords
- naturalness, ngram, language models, atom text editor, code suggestion, code prediction, nlp
- Copyright
- © 2015 Santos et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- Santos et al. 2015. Multi-token code suggestions using statistical language models. PeerJ PrePrints 3:e1597v1 https://doi.org/10.7287/peerj.preprints.1597v1
Abstract
We present an application of the naturalness of software that provides multi-token code suggestions in GitHub’s Atom text editor. We extend the results of a simple n-gram prediction model using the "mean surprise" metric: the arithmetic mean of the surprisal of several successive single-token predictions. After an error-fraught evaluation, there is not enough evidence to conclude that Gamboge, the resulting Atom plugin, significantly improves programmer productivity. We conclude by discussing several directions for future research in code suggestion and other applications of the naturalness of software.
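As a rough formalization (our reading of the abstract's definition, not notation taken from the paper): if the n-gram model assigns probability $P(t_i \mid t_{i-N+1}, \dots, t_{i-1})$ to each token $t_i$ of a candidate multi-token suggestion $t_1, \dots, t_k$, then its mean surprise can be written as

$$\bar{S}(t_1, \dots, t_k) = \frac{1}{k} \sum_{i=1}^{k} -\log_2 P(t_i \mid t_{i-N+1}, \dots, t_{i-1}),$$

that is, the arithmetic mean of the per-token surprisal (negative log-probability), where a lower mean surprise marks a suggestion the model considers more natural.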
Author Comment
This is the paper submitted to my supervisor as part of my undergraduate directed studies. It is fraught with errors and rife with informal, non-academic language. That said, we believe the content to be informative regardless, especially the use of "mean surprise" and the numerous applications of NLP to software ("naturalness of software") that we have listed.