“Abstract: The website Sci-Hub enables users to download PDF versions of scholarly articles, including many articles that are paywalled at their journal’s site. Sci-Hub has grown rapidly since its creation in 2011, but the extent of its coverage was unclear. Here we report that, as of March 2017, Sci-Hub’s database contains 68.9% of the 81.6 million scholarly articles registered with Crossref and 85.1% of articles published in toll access journals. We find that coverage varies by discipline and publisher, and that Sci-Hub preferentially covers popular, paywalled content. For toll access articles, we find that Sci-Hub provides greater coverage than the University of Pennsylvania, a major research university in the United States. Green open access to toll access articles via licit services, on the other hand, remains quite limited. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection, suggesting the toll access business model may become unsustainable.”
“In principle, knowledge of an article title allows for rather easy retrieval of the associated DOI: it usually involves copy-pasting the title into a search engine, following the first link to a landing page or full-text PDF, and looking for the DOI there. While this is doable for a small number of data points, it quickly becomes infeasible for larger data sets, and 3000 articles is clearly too many to look up manually.”
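The manual lookup described above can be automated. What follows is a minimal sketch, not the quoted author's method: it assumes the public Crossref REST API (`https://api.crossref.org/works` with its `query.bibliographic` parameter) and simply takes the top-ranked hit for each title. The function names `build_query_url`, `extract_doi`, `fetch_json`, and `lookup_dois` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Crossref REST API works endpoint.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(title, rows=1):
    """Build a Crossref bibliographic search URL for one article title."""
    params = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    return f"{CROSSREF_WORKS}?{params}"


def extract_doi(payload):
    """Return the DOI of the top-ranked hit, or None if there are no hits."""
    items = payload.get("message", {}).get("items", [])
    if not items:
        return None
    return items[0].get("DOI")


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def lookup_dois(titles, fetch=fetch_json):
    """Map each title to the DOI of its best Crossref match.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    return {title: extract_doi(fetch(build_query_url(title))) for title in titles}
```

Calling `lookup_dois(["Some article title"])` would issue one request per title. The top hit is not guaranteed to be the right article, so for a set of 3000 titles it would be prudent to compare the returned title against the query before trusting the DOI, and to pace the requests.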
The analyses show that, of 956,050,193 references from journal articles stored at Crossref, 486,041,671 (50.84%) are now in the category “Open”, and are freely available for third parties to download and use for any purpose.
This is a significant milestone for the Initiative for Open Citations (I4OC, https://i4oc.org/), which since early 2017 has been campaigning for scholarly publishers to open their reference lists, and a major gain for the world of open scholarship.
“It’s now four months since we publicly announced the Initiative for Open Citations (I4OC). Since the beginning of this effort, almost half of indexed scholarly citation data have become freely accessible. We’ve also had some amazing initial press coverage and we continue to add new publishers and stakeholders.
Data unlocked by I4OC is already being used by a growing number of projects and platforms. OpenCitations imports citation data into a corpus which now includes more than 9 million citation links, a nearly 200% increase since the beginning of the year. Collaborative databases, such as Wikidata, are already using this data to connect and structure knowledge and to generate citation graphs. These examples provide just an early indication of the potential of open citation data and we would be delighted to hear about other efforts….”
Despite growing interest in Open Access (OA) to scholarly literature, there is an unmet need for large-scale, up-to-date, and reproducible studies assessing the prevalence and characteristics of OA. We address this need using oaDOI, an open online service that determines OA status for 67 million articles.
“I’ve described how CrossRef works – now I’ll show how ContentMine will use it for daily mining. ContentMine sets out to mine the whole scientific literature (“100 million facts”). Up till now we’ve been building the technical infrastructure, challenging for our rights, understanding the law, and ordering the kit. We’ve built and deployed a number of prototypes. But we are now ready to start indexing science in earnest. Since ContentMining has been vastly underused, and because publisher actions have often chilled researchers and libraries, we don’t know in detail what people want and how they would tackle it. We think there are many approaches – here are a few …”