Boa: a link between worlds

Department of Computing Science, University of Alberta, Edmonton, Canada
DOI: 10.7287/peerj.preprints.1947v1
Subject Areas: Software Engineering
Keywords: Traceability, Exploratory, Empirical, URI
Copyright: © 2016 Romansky et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: Romansky S, Charmchi S, Hindle A. 2016. Boa: a link between worlds. PeerJ Preprints 4:e1947v1

Abstract

The software-as-a-service and platform-as-a-service business models have contributed to developers' dependence on the Internet. With the hyperlink, developers can rapidly point each other and consumers to the newest software changes. Developers are not limited to referencing software changes through the web, however: other shared hypermedia might include links to Stack Overflow, Twitter, and issue trackers. This work explores the software traceability of Uniform Resource Locators (URLs) that software developers leave in commit messages and software repositories. Because URLs are easily extracted from commit messages and source code, they would be useful to researchers if they provide additional insight into project development. To assess traceability, manual topic labelling is evaluated against automated topic labelling on URL data sets. This work also shows differences between URL data collected from commit messages and URL data collected from source code. In addition, this work explores outlying software projects with many URLs, in case these projects do not provide meaningful software relationship information. Results from manual topic labelling show promise under evaluation, while automated topic labelling did not yield precise topics. Further investigation of manual and automated topic analysis would be useful.
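
As an illustration of how URLs might be pulled from a commit message, the following is a minimal sketch in Python. The regular expression, the function name extract_urls, and the sample commit message are assumptions made for this example; they are not the extraction method or data used in the study.

```python
import re

# Hypothetical pattern (not the one used in the study): matches http(s)
# URLs up to the next whitespace or common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the URLs found in text, in order of appearance."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


# Invented commit message, used only to exercise the function.
commit_message = (
    "Fix null check in parser, see https://example.com/issues/42. "
    "Approach taken from http://stackoverflow.com/questions/12345)"
)
urls = extract_urls(commit_message)
for url in urls:
    print(url)
```

A pattern this simple will miss URLs without an http or https scheme and may cut off URLs that legitimately contain a closing parenthesis, so a real pipeline would likely need a stricter grammar.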

Author Comment

This work was submitted to the 2016 MSR Challenge track.

Supplemental Information

GitHub Repository Listing Dataset: List of GitHub repositories this study tried to replicate

DOI: 10.7287/peerj.preprints.1947v1/supp-1

Git Commit Message URI Dataset: extracted from commit messages of GitHub projects in gitRepoNames.tar.gz

DOI: 10.7287/peerj.preprints.1947v1/supp-2

Extracted URIs from GitHub repositories over time part 2

DOI: 10.7287/peerj.preprints.1947v1/supp-3

Extracted URIs from GitHub repositories over time part 1

DOI: 10.7287/peerj.preprints.1947v1/supp-4

Extracted URIs from GitHub repositories over time part 3

DOI: 10.7287/peerj.preprints.1947v1/supp-5

Extracted URIs from GitHub repositories over time part 4

DOI: 10.7287/peerj.preprints.1947v1/supp-6