Abstract: Wikipedia’s contents are based on reliable and published sources. To this date, little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive dataset of citations extracted from Wikipedia. A total of 29.3M citations were extracted from 6.1M English Wikipedia articles as of May 2020, and classified as being to books, journal articles or Web contents. We were thus able to extract 4.0M citations to scholarly publications with known identifiers — including DOI, PMC, PMID, and ISBN — and further labeled an extra 261K citations with DOIs from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI. Scientific articles cited from Wikipedia correspond to 3.5% of all articles with a DOI currently indexed in the Web of Science. We release all our code to allow the community to extend upon our work and update the dataset in the future.
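As a rough illustration of the identifier-based labeling the abstract describes, the following minimal sketch pulls DOI, PMID, PMC, and ISBN values out of a citation string and assigns a coarse book / journal / web label. It is not the authors' pipeline: the regular expressions, the function names `find_identifiers` and `classify`, and the rule "ISBN means book, any other identifier means journal, otherwise web" are all assumptions made for this example.

```python
import re

# Illustrative patterns only (assumptions); real Wikipedia citation
# templates are far messier than these expressions can handle.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s|}]+")
PMID_RE = re.compile(r"\bpmid\s*[=:]\s*(\d+)", re.IGNORECASE)
PMC_RE = re.compile(r"\bpmc\s*[=:]?\s*(?:PMC)?(\d+)", re.IGNORECASE)
ISBN_RE = re.compile(r"\bisbn\s*[=:]?\s*([0-9Xx][0-9Xx\- ]{8,16})", re.IGNORECASE)


def find_identifiers(citation: str) -> dict:
    """Return whichever of DOI, PMID, PMC, ISBN appear in a citation string."""
    found = {}
    doi = DOI_RE.search(citation)
    if doi:
        # Drop trailing punctuation that often follows a DOI in running text.
        found["doi"] = doi.group(0).rstrip(".,;")
    pmid = PMID_RE.search(citation)
    if pmid:
        found["pmid"] = pmid.group(1)
    pmc = PMC_RE.search(citation)
    if pmc:
        found["pmc"] = "PMC" + pmc.group(1)
    isbn = ISBN_RE.search(citation)
    if isbn:
        # Normalize by removing hyphens and spaces; keep only ISBN-10/13 lengths.
        digits = re.sub(r"[\- ]", "", isbn.group(1)).upper()
        if len(digits) in (10, 13):
            found["isbn"] = digits
    return found


def classify(citation: str) -> str:
    """Hypothetical rule-based label: 'book', 'journal', or 'web'."""
    ids = find_identifiers(citation)
    if "isbn" in ids:
        return "book"
    if ids:
        return "journal"
    return "web"
```

For example, `classify("{{cite web |url=https://example.org/page |title=Example}}")` falls through to the `"web"` label because no identifier is found. A citation carrying only a title would need a lookup against an external index such as Crossref, which this sketch does not attempt.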
“The Repository Dashboard is a free service for our data providers. The Repository Dashboard has been created in an effort to improve the quality and transparency of the harvesting process of open access content and to create a two-way collaboration between the CORE project and our data providers.
The Repository Dashboard provides an online interface offering valuable technical information and statistics to content providers. It is the tool you need to check that your repository is configured correctly to provide maximum visibility to your research outputs. Additional features include identifier enrichments, such as detecting missing DOIs for repository records. The tool also offers REF 2021 Open Access compliance monitoring functionality to UK HEIs, and a RIOXX metadata quality checker….”
“1. Reduce barriers to information access and use in order to increase the opportunity to create new knowledge by shifting the culture of scholarship towards open science and open education. Research libraries have done this by:
Creating and sustaining investment in global and national infrastructures to provide access to open scholarly information through partnerships (such as Confederation of Open Access Repositories, Global Sustainability Coalition for Open Science Services, Invest in Open Infrastructure, and OER Commons)
Making major investments in all forms of open access scholarly publishing, including journals (such as through BioOne and PLOS), monographs (such as through Knowledge Unlatched, Open Humanities Press, and TOME), and research data (such as through Dataverse)
Creating, curating, organizing, and promoting massive collections of open educational resources particularly at a time focused on education affordability
Creating and contributing to open repositories for depositing research outputs, including underlying data and code (such as PubMed)
Negotiating licensing agreements with commercial vendors to significantly increase barrier-free access to information (such as transformative agreements)
Leading (with the scholarly community) the adoption of open metadata standards and infrastructure (such as findable, accessible, interoperable, and reusable (FAIR) data) and persistent identifiers (such as digital object identifiers and ORCID IDs)…”
“We are pleased to announce the launch of the new persistent identifier (PID) services registry available at https://pidservices.org, a new service to find services built upon different PIDs from core technology providers and those who integrate from across a variety of disciplinary areas. This is a combined effort across multiple organizations as part of the EC-funded FREYA project grant (777523) with the aim of furthering discoverability of PIDs and the services that are built upon them….”
“Nature Communications encouraged rapid dissemination of results with the launch of Under Consideration in 2017. Today we take one more step by offering an integrated preprint deposition service to our authors as part of the submission process….
From today, our authors have the option to take advantage of In Review, a free preprint deposition service integrated with the submission process to our journal. The preprint of the author’s original submission will be posted (with a permanent DOI, under a CC-BY licence) on the multidisciplinary platform hosted by our partner Research Square at the same time as the submission is being considered by our editorial team….”
“The COVID-19 pandemic has not only accelerated the already rapid growth in submissions of preprints in the biological sciences, but has brought them to the public’s attention as never before.
For example, the medical sciences preprint server medRxiv has already posted more than 3,200 preprints related to the disease. In April, it recorded 10 million views from scientists and the general public.
Many authors in the biological and medical sciences are new to the format. Nature Index asked five experts for their advice on preprint etiquette and best practice….”
“Encourage the use of persistent identifiers or PIDs (for example, DOIs for datasets, ORCIDs for authors, RRIDs for reagents – more information can be found on the ORCID website here)
Engage with journal editors, learned societies and other domain leaders to work out what standards, identifiers and language are appropriate for the community. You could use the RDA policy framework as the outline for the conversation.
It is preferable to upload data to a repository, and include a link within a research article, rather than hosting via a supplementary material facility.
Sometimes data do need to be kept closed, but this doesn’t need to be the default situation. Ask the researcher/author why it should be closed rather than why it should be open.
Where possible, have some information (metadata) in front of any paywall to point to where underlying data can be found. See the following examples:…”
“Considering the wide range of communities and interests that our four speakers represented, there was a remarkable and reassuring level of agreement between them about what’s important, what’s working, and what needs to change. As a PID person myself, I was delighted that persistent identifiers came up (ahem) persistently as key to a robust and open research infrastructure. While there is certainly more work needed to harness their full potential, PIDs like DOIs and ORCID iDs are already enabling interoperability between the many systems and services researchers and their organizations are using. There was also agreement among the speakers that more work is needed to increase community awareness, adoption, and use of PIDs, and of the research infrastructure in general. Fragmentation of the infrastructure is an issue here as it’s virtually impossible for anyone to keep up with the myriad of researcher tools and services out there. Bianca Kramer and Jeroen Bosman’s ongoing work to track these tools makes for a scary read, and the thought of finding a way to make all of them sustainable is even scarier! But at the same time, as Karin pointed out, we need to ensure that the tools we do choose to use and support meet the needs of researchers across all disciplines — and that doesn’t mean just retrofitting tools that were originally developed for scientists, it means developing tools that take the requirements of different disciplines into account from the start….”
“So why should this highly successful national-level policy that could effectively achieve the 100% Open Access objective be an obstacle to a pragmatic approach to Open Science? Because it’s a Green Open Access policy based on the deposit of accepted manuscripts in institutional repositories with widespread embargo periods. Because despite the current and future progress in enhancing the visibility and discoverability of repository contents, the canonical way to reach a publication for an external stakeholder with little knowledge about the complex scholarly communications landscape (e.g. industry) remains and will remain the DOI issued by the publisher. Because a Green OA-based policy does not open the publications sitting behind those DOIs. And because the amount of effort involved in the implementation of the HEFCE policy as it is designed right now is so huge that research libraries lack the physical resources to adopt any other complementary Open Access implementation policy.
Enter Plan S with its highly pragmatic approach to Open Access implementation. Originally strongly based on Gold Open Access, APC payments where needed and deals with the publishers to address the double-dipping issue around hybrid journals, it’s only after considerable pressure has been exerted by the Green Open Access lobby that the zero-embargo Green Open Access policy has found a place in the Plan S implementation guidelines. But with the current scramble for ‘transformative’ deals that will allow most hybrid journals to become eligible under Plan S requirements, the size of the institutional Gold Open Access output pie will only grow in forthcoming years….”