Sci-Hub provides access to nearly all scholarly literature

Systems Pharmacology & Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States
Bidwise, Inc, Miami, Florida, United States
School of Information, University of Texas at Austin, Austin, Texas, United States
Department of Applied Bioinformatics, Goethe University Frankfurt, Frankfurt, Germany
DOI
10.7287/peerj.preprints.3100v2
Subject Areas
Bioinformatics, Legal Issues, Science and Medical Education, Statistics, Computational Science
Keywords
Sci-Hub, publishing, literature, piracy, data-science, LibGen, journals, copyright, paywalls, open-access
Copyright
© 2017 Himmelstein et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Himmelstein DS, Romero AR, McLaughlin SR, Greshake Tzovaras B, Greene CS. (2017) Sci-Hub provides access to nearly all scholarly literature. PeerJ Preprints 5:e3100v2

Abstract

The website Sci-Hub provides access to scholarly literature via full-text PDF downloads. The site enables users to access articles that would otherwise be paywalled. Since its creation in 2011, Sci-Hub has grown rapidly in popularity. However, until now, the extent of Sci-Hub’s coverage was unclear. As of March 2017, we find that Sci-Hub’s database contains 68.9% of all 81.6 million scholarly articles, which rises to 85.2% for those published in toll access journals. Coverage varies by discipline, with 92.8% coverage of articles in chemistry journals compared to 76.3% for computer science. Coverage also varies by publisher, with coverage of the largest publisher, Elsevier, at 97.3%. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. We find Sci-Hub preferentially covers popular, paywalled content, containing 96.2% of citations to toll access journals since 2015. For articles recently requested by Unpaywall users, oaDOI provided access to 48.8% whereas Sci-Hub contained 81.5%. Together, oaDOI and Sci-Hub covered 94.1%, demonstrating that gaps in Sci-Hub’s coverage, especially for open access articles, can be filled using licit services. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection. Sci-Hub’s scope suggests the subscription publishing model is becoming unsustainable.
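
The percentages above imply a few figures that the abstract does not state directly. The following is a minimal sketch, not part of the study's own analysis code: it assumes the three Unpaywall percentages (48.8%, 81.5%, 94.1%) share the same denominator, and the function and variable names are hypothetical. Because the inputs are rounded, the derived values are approximate.

```python
# Figures quoted in the abstract (rounded as reported).
TOTAL_ARTICLES = 81.6e6      # all scholarly articles considered
SCIHUB_OVERALL = 0.689       # share of all articles in Sci-Hub's database

# Shares of articles recently requested by Unpaywall users.
OADOI = 0.488
SCIHUB = 0.815
EITHER = 0.941               # covered by oaDOI, Sci-Hub, or both


def coverage_breakdown(a, b, union):
    """Split two overlapping coverage shares into disjoint parts.

    Assumes a, b, and union are fractions of the same set of articles.
    """
    both = a + b - union  # inclusion-exclusion
    return {
        "both": both,
        "only_a": a - both,
        "only_b": b - both,
        "neither": 1.0 - union,
    }


scihub_article_count = TOTAL_ARTICLES * SCIHUB_OVERALL
breakdown = coverage_breakdown(OADOI, SCIHUB, EITHER)

print(f"Implied Sci-Hub articles: {scihub_article_count / 1e6:.1f} million")
for part, share in breakdown.items():
    print(f"{part}: {share:.1%}")
```

Under these assumptions, Sci-Hub's database holds roughly 56.2 million articles. Of the Unpaywall-requested articles, about 36.2% were available from both services, about 12.6% from oaDOI only, about 45.3% from Sci-Hub only, and about 5.9% from neither.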

Author Comment

Version 2 of the Sci-Hub Coverage Study preprint contains major updates based on public feedback and newly available information (https://git.io/vdiUf). We incorporated suggestions provided by Thomas Munro and Ross Mounce via GitHub. We now use the term "toll access" rather than "closed access" to refer to paywalled literature. In addition, we revised our study based on Sci-Hub's Twitter comments. We corrected several sections to reflect that the Sci-Hub logs dataset contains access events, not request events. As a result, we cannot estimate Sci-Hub's fulfillment rate of requests using these logs. Therefore, we analyzed Sci-Hub's coverage of recent citations as well as articles requested by Unpaywall users. Furthermore, we compared Sci-Hub's coverage to that of oaDOI. We also break down Sci-Hub's coverage by category of access type using data from the State of OA study. Finally, the Sci-Hub Stats Browser now provides detailed pages for each journal.