Well done! Your latest preprint in PeerJ is very interesting. Please note our study on gold OA in the latest issue of Scientometrics:
Dorta-González, P., González-Betancor, S. M., & Dorta-González, M. I. (2017). Reconsidering the gold open access citation advantage postulate in a multidisciplinary context: an analysis of the subject categories in the Web of Science database 2009–2014. Scientometrics, 112(2), 877–901.
This is a very interesting and informative article. I'd like to make a few points about your definitions though.
1) In your bulleted list in the Literature Review section, you don't mention that gold OA involves immediate open access to the article. I think this is an important part of the definition.
2) In your list of definitions in the Methods > OA determination section you seem to be mixing categories. "Gold" refers to an article published under certain conditions (i.e. immediate OA from a publisher with re-use rights). Such an article may appear in a pure OA journal or in a hybrid. "Hybrid" is a type of journal (in which some articles are gold OA and others are paywalled), not a type of article. There is no inherent difference between a gold article appearing in a hybrid and a gold article appearing in an OA journal. They both have the same characteristics; they just appear in different venues.
Very nice work! There has also been interesting research demonstrating the amplification effects of open access to broader society via channels like Wikipedia.
Teplitskiy, M., Lu, G. and Duede, E. (2016), Amplifying the impact of open access: Wikipedia and the diffusion of science. Journal of the Association for Information Science and Technology. doi:10.1002/asi.23687
Thanks for a really interesting piece of work. The trends that you see across time and disciplines are encouraging, and it looks like your approach could be very valuable for studying how access to content progresses in the future. Given the different approaches to OA that are being taken in different parts of the world, I wonder if geographic trends could be studied using the resources that you’ve developed. Here are a few specific comments for consideration.
1) I was puzzled by the data in Figure A4 about licenses. You say that you only found a license on 14.8% of the open access articles. Does this mean that there was no license information at all on the bulk of the articles or that you weren’t able to detect a license? Maybe the problem is the lack of consistency in the way that publishers present this information. Either way, I think this might be better presented and discussed as part of the main text, given that the number seems very low. Also given that ‘having no license’ is a criterion for your bronze category, it could be that this category looks much bigger than it really is.
2) If a fully OA journal is not indexed by DOAJ but does use a CC license, where would that content end up in your categorisation? I wasn’t sure if your gold OA category requires the journal to be indexed in DOAJ. Might be worth clarifying.
3) Hybrid. As I understand it, I think this category might also include some delayed OA articles, where those articles are published with a license. An example is Rockefeller University Press who release all content after 6 months (unless the author pays for immediate OA), and use a CC-BY-NC-SA license for this delayed free access (http://www.rupress.org/content/permissions-and-licensing). I’m not sure how many other publishers combine delayed free access with a CC license. Again, might be worth mentioning.
Thanks for your brilliant work. Your approach to a more centralized and transparent way of large-scale Open Access monitoring will be of great use for the scientific community.
While reading your article as well as the comments and having a look at the underlying data, two points occurred to me that might be worth considering:
1. A closer look at the reported accuracy of oaDOI.
Following the links to the 43 reported false negative OA articles (pp. 7 f. and data in accuracy_analysis.xslx), many of them seem to fall into one or more of the following categories:
- miscellaneous document types (letters, editorials, news, tables of contents, etc.)
- old publications (10 years old or much older; for some copyright has even expired)
- ephemeral or incomplete PDFs (PDFs were not found or showed only an excerpt of the full text)
This short test indicates that oaDOI would probably have a much higher recall for a subset of more recent articles, reviews, proceedings and/or a stricter definition of false negative errors.
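The effect of restricting the sample can be illustrated with a minimal sketch. Recall here is the share of truly OA records that the tool also flagged as OA. The records, field names and the year cut-off below are made up for illustration and are not taken from the accuracy file:

```python
from typing import Iterable

def recall(records: Iterable[dict]) -> float:
    """Share of truly OA records that the tool also flagged as OA."""
    oa = [r for r in records if r["is_oa"]]
    if not oa:
        raise ValueError("no open access records in this subset")
    found = sum(1 for r in oa if r["detected"])
    return found / len(oa)

# Hypothetical records, for illustration only (not real accuracy data).
SAMPLE = [
    {"doc_type": "article",   "year": 2015, "is_oa": True,  "detected": True},
    {"doc_type": "article",   "year": 2014, "is_oa": True,  "detected": True},
    {"doc_type": "review",    "year": 2016, "is_oa": True,  "detected": True},
    {"doc_type": "editorial", "year": 2013, "is_oa": True,  "detected": False},
    {"doc_type": "letter",    "year": 1998, "is_oa": True,  "detected": False},
    {"doc_type": "article",   "year": 2001, "is_oa": True,  "detected": False},
    {"doc_type": "article",   "year": 2012, "is_oa": False, "detected": False},
]

# Assumed definition of the "core" subset: recent articles, reviews, proceedings.
CORE_TYPES = {"article", "review", "proceedings"}

def is_recent_core(record: dict, min_year: int = 2010) -> bool:
    """True for core document types published in or after min_year."""
    return record["doc_type"] in CORE_TYPES and record["year"] >= min_year

overall_recall = recall(SAMPLE)
core_recall = recall([r for r in SAMPLE if is_recent_core(r)])
print(f"overall recall: {overall_recall:.2f}, core subset recall: {core_recall:.2f}")
```

With real data, the same comparison would show how much of the reported false negative rate is driven by old or miscellaneous documents.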
2. Some thoughts on the OA categorization.
Hybrid. The way I see it, the categorization of Hybrid OA is perfectly fine without any addition and fits well within the established definition of the OA community. Hybridity in the context of OA refers to the coexistence of subscription-based and publication-based pricing schemes with regard to a certain entity, most commonly the journal. While one might equate an OA article in a hybrid journal with an article in a full OA journal based on their accessibility and license, it is not true to say that there is no inherent difference between them, as one of the commenters did. For example, there is a substantial difference in their underlying business models and pricing mechanisms, which makes their conceptual differentiation and empirical identification a vital contribution to OA monitoring.
Bronze. This novel category is interesting but, as also mentioned in the article, still a bit too blurred. The manual inspection of a small sample of Bronze articles shows that nearly half of the articles are akin to Gold but rather hidden (p. 13).
The rest sounds more like a sort of Grey OA to me. Given its characteristics (no OA licence, delayed access on commercial platforms, and questionable persistence of content that depends on unforeseeable restrictions imposed by publishers), it is, from a non-publisher perspective, actually quite similar to the content shared on academic social networks (disregarding the legal differences, of course). For the sake of consistency, this Grey OA should either be excluded from the oaDOI service or included in its entirety, complete with academic social network content. In order to develop oaDOI into a comprehensive OA monitoring tool, I would personally prefer the latter because it leaves ex post deletion of the Grey OA type at the discretion of the monitoring analyst.
This is a great piece of work because it demonstrates how studying open access publishing on a large scale can become more transparent and reusable. In particular, I was excited to see that the underlying code, datasets and analysis were also shared, which allows a better understanding of the methods used in this study.
After re-examining articles tagged as hybrid open access, I feel that more attention could have been drawn to the issue of delayed open access, as already suggested by Mark Patterson. Unfortunately, a statement seems to be missing that explains why the definition of hybrid open access used in this study differs from the one mentioned in the literature review. More specifically, it remains unclear why delayed open access was not explicitly excluded from the hybrid open access category, as many funders and libraries do.
At first, I thought this was simply because applying licenses with a start date later than the publication date, which indicates delayed open access, is not widespread and was thus not relevant. However, obtaining licensing metadata from Crossref for articles with evidence "hybrid (via crossref license)" from both the Crossref and the WoS samples revealed that around 70% of these articles were tagged with open licenses that came into effect only after a certain period following publication. Because of this large proportion, as well as the ongoing controversies about open access policies promoting hybrid open access, I think a revised version would benefit from a more detailed analysis and explanation of delayed content licensing by publishers.
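A check of this kind can be sketched roughly as follows. This is a minimal illustration, not the analysis itself. It assumes work records shaped like Crossref REST API responses, where each entry in `license` carries a `URL` and a `delay-in-days` value. Treating any Creative Commons URL as an "open license" is my simplifying assumption, and the DOIs and records are hypothetical:

```python
from typing import List

def open_licenses(work: dict) -> List[dict]:
    """License entries whose URL points to a Creative Commons license (assumed proxy for 'open')."""
    return [lic for lic in work.get("license", [])
            if "creativecommons.org" in lic.get("URL", "")]

def has_delayed_open_license(work: dict) -> bool:
    """True if every open license on the work starts later than publication."""
    licenses = open_licenses(work)
    return bool(licenses) and all(lic.get("delay-in-days", 0) > 0 for lic in licenses)

def delayed_share(works: List[dict]) -> float:
    """Among works with an open license, the share whose open license is delayed."""
    licensed = [w for w in works if open_licenses(w)]
    if not licensed:
        raise ValueError("no works with an open license")
    return sum(has_delayed_open_license(w) for w in licensed) / len(licensed)

# Hypothetical records for illustration; DOIs and values are made up.
WORKS = [
    {"DOI": "10.5555/example-1", "license": [
        {"URL": "https://creativecommons.org/licenses/by-nc-sa/4.0/", "delay-in-days": 180}]},
    {"DOI": "10.5555/example-2", "license": [
        {"URL": "https://creativecommons.org/licenses/by/4.0/", "delay-in-days": 0}]},
    {"DOI": "10.5555/example-3", "license": [
        {"URL": "https://publisher.example/terms", "delay-in-days": 0}]},
]

share = delayed_share(WORKS)
print(f"share of open-licensed works with a delayed license: {share:.0%}")
```

Works without any open license are left out of the denominator, so the share refers only to articles that would otherwise count as openly licensed.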
Following your encouraging example, here's the link to my analysis: