Where is all the research software? An analysis of software in UK academic repositories

PeerJ Computer Science
Query: https://share.osf.io/discover?q=%20tags%3A(%22Software%22)
Query: https://explore.openaire.eu/search/find/research-outcomes?type=%22software%22
Query: https://www.base-search.net/Search/Results?lookfor=country%3Auk+doctype%3A6&l=en&oaboost=1&ling=0&newsearch=1&refid=dcadven&name=
Query: https://v2.sherpa.ac.uk/cgi/search/repository/advanced?screen=Search&repository_name_merge=ALL&repository_name=&repository_org_name_merge=ALL&repository_org_name=&type=institutional&content_types=software&content_types_merge=ANY&content_subjects_merge=ANY&org_country_browse_merge=ALL&org_country_browse=United+Kingdom&satisfyall=ALL&order=preferred_name&_action_search=Search
Query: https://explore.openaire.eu/search/content-providers?datasourcetypename=Institutional%20Repository&datasourceodlanguages=English&country=GB


Introduction

‘Treat computer code like any other output of your research... Share your computer code like you would any other research output... Computer code should have a URL or a DOI (digital object identifier)... Always include these when citing the code, including information on the version you used.’ (Research Data Management Toolkit, 2021)

Background and wider context

Software: a definition

Software as research output

  1. Registration: the claim of precedence of a research finding is made.

  2. Certification: the validity of the registered claim is established e.g., peer review.

  3. Awareness: the registered claim is disseminated to others.

  4. Archiving: the record of the registered claim is preserved.

  5. Rewarding: the participants in the communication system somehow benefit from derived metrics.

‘The system should consider datasets, simulations, software, and dynamic knowledge representations as units of communication in their own right.’

“...it is a requirement for in-scope research articles to contain a data access statement. This informs readers where the underlying research materials associated with a paper are available... Underlying research materials are research data, as defined in the Concordat on Open Research Data, and can include code, software...” (UKRI Open Research Team, 2022, p. 10).

“...it is vital that the data supporting and underlying published research findings should, as far as possible, be made open by the time the findings are published and be preserved for an appropriate period. This could be achieved by depositing and providing access to relevant data and associated software (where possible) via a repository owned or operated by a discipline-specific research community and its funding bodies, a publisher, a research institution, a subject association, a learned society, national deposit libraries or a commercial organisation; or via other mechanisms that provide appropriate and sustainable services.” (UKRI Concordat on Research Data, Principle 8, p. 16)

“Host OA research on open infrastructure. Host and publish OA texts, data, metadata, code, and other digital research outputs on open, community-controlled infrastructure. Use infrastructure that minimizes the risk of future access restrictions or control by commercial organizations.” (BOAI, 2022)

Software as a REF-returnable item

Repositories

“...digital collections capturing and preserving the intellectual output of a single or multi-university community...” (Crow, 2002, p. 1)

Metadata & protocols

“In addition, we are starting a pending list of item types that we are considering for inclusion in the future if their usage becomes significant. We have been asked to consider the inclusion of ‘Code’ as an item type. None of our current participants uses that but some use the term ‘Software’ so we have added this to our pending list.” (Reed, 2014, p. 13)

The open archives initiative protocol for metadata harvesting (OAI-PMH)

Method

Preliminary search

Surveying academic institutional repositories

Data Sources

Querying IRs using OAI-PMH
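
As an illustration of the kind of query this heading refers to, the sketch below builds an OAI-PMH request and counts records whose Dublin Core type mentions software. It is a minimal sketch and not the authors' harvesting code (which is deposited at Zenodo, see Data Deposition). The verb `ListRecords`, the argument `metadataPrefix=oai_dc` and the two XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications; the endpoint URL, the function names and the sample response are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    return base_url + "?" + urlencode({"verb": verb, **params})


def count_software_records(xml_text):
    """Count records in a ListRecords response whose dc:type mentions 'software'."""
    root = ET.fromstring(xml_text)
    count = 0
    for record in root.iterfind(".//oai:record", NS):
        types = [t.text or "" for t in record.iterfind(".//dc:type", NS)]
        if any("software" in t.lower() for t in types):
            count += 1
    return count


# Hypothetical endpoint (assumption): real repositories publish their own base URL.
url = build_request(
    "https://repository.example.ac.uk/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)

# Hand-written response for illustration only; not data from the study.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example tool</dc:title><dc:type>Software</dc:type>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example paper</dc:title><dc:type>Article</dc:type>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

software_count = count_software_records(SAMPLE)
```

Matching on the text of `dc:type` is an assumption of this sketch: repositories label software in different ways, so a real survey would likely need a per-repository mapping of type names, and large result sets would also need OAI-PMH resumption tokens, which are omitted here.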

Additional variables

Results

The extent of software in UK Academic IRs

  1. Does not contain software: the repository offers software as an explicit entry type but holds no records of that type, i.e., it can hold software as a specified category but does not yet do so.

  2. Contains software: the repository offers software as an explicit entry type and holds one or more records of that type, i.e., it can and does hold defined software records.

  3. No direct software search capability: the repository has no entry type of software or similar, i.e., it cannot hold an explicit entry of type software. Note that this does not mean the repository holds no software, as a repository may have catch-all categories such as ‘other’ or ‘non-categorised’.
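
The three categories above can be expressed as a small decision rule. The following is a minimal sketch under stated assumptions: the function name `classify_repository` and its two inputs (whether the repository exposes a software entry type, and how many records of that type were found) are hypothetical names chosen for illustration, not taken from the authors' code.

```python
from typing import Optional


def classify_repository(has_software_type: bool, software_records: Optional[int]) -> str:
    """Assign a repository to one of the three categories listed above.

    has_software_type: True if 'software' (or similar) is an explicit entry type.
    software_records: number of records of that type, or None if unknown.
    """
    if not has_software_type:
        # No explicit type, so software cannot be searched for directly.
        return "No direct software search capability"
    if software_records is not None and software_records > 0:
        return "Contains software"
    # The type exists but no matching records were found.
    return "Does not contain software"


# Example inputs (invented for illustration, not results from the study).
examples = [
    classify_repository(True, 0),
    classify_repository(True, 12),
    classify_repository(False, None),
]
```

Checking the entry type first mirrors the note in the third category: a repository without a software type may still hold software under a catch-all category, so a record count alone cannot separate it from the other two groups.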

Software records per research information system type

Metadata formats

Russell Group membership

RSE Group at institution

Discussion

RQ1: to what extent is software published in academic repositories?

RQ2: are there any explanatory variables associated with whether software is included in a repository?

RQ3: are there barriers preventing repositories from storing records of software as distinct research outputs?

Technical infrastructure

Institutional

Academic culture

Conclusions

Recommendations and future work

Additional Information and Declarations

Competing Interests

The authors declare that there are no competing interests.

Author Contributions

Domhnall Carlin conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Austen Rainer conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

David Wilson conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Data Deposition

The following information was supplied regarding data availability:

The dataset is available at Zenodo and at the QUB Institutional Repository:

- Domhnall Carlin. (2023). A census of research software in 171 academic institutional repositories. (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7603444

- Carlin, D. (Creator) (03 Feb 2023). A census of research software in 171 academic institutional repositories. Queen’s University Belfast. submission_dataset_release_v1(.csv). https://doi.org/10.5281/zenodo.7603444

The code is available at Zenodo: https://doi.org/10.5281/zenodo.7974836

The dataset of manual queries is available at Zenodo: Carlin, Domhnall. (2023). institutional_repo_census (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7974836

Funding

This work was supported as part of Dr Carlin’s UKRI EPSRC Research Software Engineering Fellowship (EP/V052284/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
