Cite landing pages or repo in light of Git{Hub|Lab} Pages

Dear Authors,

Thanks for providing and explaining these important principles. Regarding the "reasons that a commit reference together with a repository URL is not recommended for the purposes of software citation" I wonder about your opinions on the following counter-arguments that I believe have materialised since this publication.

1. Commit hashes can be guaranteed to be permanent by protecting branches against force-pushing and deletion; see https://docs.gitlab.com/ee/user/project/protected_branches.html for example. The GitLab instance at my institution even seems to have this enabled by default.

2. Given the popularity of GitLab as a self-hosted git platform, and their future strategy, might this recommendation change to "institutions, please provide such platforms and ensure long-term availability"?

3. Assuming that

a) the Pages features of GitHub and GitLab are becoming more popular to generate websites from a Git repo,

b1) landing page, documentation, issue tracker, change history, etc. are mostly just one click away from there (provided by the hosting software), whereas

b2) authors/developers might need to provide the links from a landing page back to these items themselves, and

c) most code hosts render a project's README right beside/below the repo,

the repo URL might become the central hub of a software project, and thus the preferred touch-point between software authors and interested users.

Therefore, what are your current opinions on the proposition that citing the repo URL could (maybe not yet, but soon) be more useful to readers who want to follow a reference to a piece of software than referring them to a landing page?

Thank you, and kind regards,

Katrin Leinweber

1 Answer
Accepted answer


Thanks for your question. This topic was the subject of much discussion among working group members as we finalized the first draft of the principles presented in this article.

We came to the decision that software version control systems are not intended for long-term archiving of software, regardless of the system or the host. It is possible that your institution has decided to make such a commitment with its GitLab instance, but even if so, this would not be typical.

I do not think that institutions will generally make this commitment, nor that doing so is the best solution, as opposed to using archives that are set up with a primary goal of preservation.

However, a major change since we wrote our article is the emergence of Software Heritage, which has as a goal the preservation of all source code. As I've written, this has the potential to change our future recommendations by linking software in repositories with an archival service. This will be discussed in the current working group; feel free to join if you are interested.

Finally, I believe that using any simple URL (such as a repository URL) in a reference is worse than using a permanent identifier, due to link rot. The landing page is more permanent, and it should also be one click away from the source code repository itself, which seems to me to be sufficiently close to serve both the purposes of an archive and a live development site.



Hello Dan, and thanks for your reply!

I have come to agree that a dedicated PID from an archiving entity is preferable, and not worth omitting just to shave one click off the path to the central hub of activity.

However, against "version control systems [not being] intended for long term archiving of software" I would posit that in practice they are quite capable of doing so; see the time range covered by https://github.com/git/git/commits/master/COPYING for example. Link rot is understandably a concern/risk, but surely no law of nature, especially since git web-apps "upgrade" their URLs with intrinsic identifiers derived from the VCS's content.
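
To illustrate what "intrinsic" means here, a minimal Python sketch of how Git derives the identifier of a file's content (a "blob") from the bytes alone. It assumes a repository using Git's default SHA-1 object format; the function name git_blob_id is only a hypothetical helper for this example, not part of any Git tooling.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID that Git assigns to a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    together with the raw bytes, so the ID depends only on the content,
    not on the host, the URL, or the file name.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes yield the same ID on any machine or hosting platform.
    print(git_blob_id(b""))
    print(git_blob_id(b"hello world\n"))
```

Because the identifier is computed from the content, anyone holding a copy can check that it matches the cited ID, even if the original URL has rotted. It does not, of course, guarantee that some host still serves that copy.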

I'm looking forward to SoftwareHeritage.org perfecting that, and maybe marrying it with the academic usage of DOIs :-)
