Evolving our support for text-and-data mining – Crossref

“Many researchers want to carry out analysis and extraction of information from large sets of data, such as journal articles and other scholarly content. Methods such as screen-scraping are error-prone, place too much strain on content sites and may be unrepeatable or break if site layouts change. Providing researchers with automated access to the full-text content via DOIs and Crossref metadata reduces these problems, allowing for easy deduplication and reproducibility. Supporting text and data mining echoes our mission to make research outputs easy to find, cite, link, assess, and reuse.

In 2013 Crossref embarked on a project to better support Crossref members and researchers with Text and Data Mining requests and access. There were two main parts to the project:

To collect and make available full-text links and publisher TDM license links in the metadata.

To provide a service (TDM click-through service) for Crossref members to post their additional TDM terms and conditions and for researchers to access, review and accept these terms….

To date, 37.5 million works registered with Crossref have both full-text links and TDM license information. We continue to encourage all members to include full-text links and license information in the metadata they register to assist researchers with TDM. You can see how each member is doing via its participation report (e.g. Wiley’s)….

Members are also making subscription content available for text mining (temporarily or otherwise) for specific purposes, such as to help the research community with its response to COVID-19. Back in April we highlighted how this can be achieved by including:

A “free to read” element in the access indicators section of publisher metadata indicating that the content is being made available free-of-charge (gratis)

An assertion element indicating that the content being made available is available free-of-charge….”
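
As a rough illustration of how a researcher might use the full-text links and license links described in the excerpt above, here is a minimal Python sketch. It assumes a work record shaped like the JSON returned by the Crossref REST API, with a "link" list (entries carrying "URL", "content-type", and "intended-application") and a "license" list (entries carrying "URL"). Those field names reflect our understanding of that API rather than anything stated in the excerpt, and the sample record, its DOI, and the function name tdm_info are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical record, shaped like a Crossref REST API work item.
# The DOI and URLs are placeholders, not real registered content.
SAMPLE_WORK: Dict[str, Any] = {
    "DOI": "10.5555/example",
    "license": [
        {"URL": "https://example.org/tdm-license", "content-version": "tdm"},
        {"URL": "https://creativecommons.org/licenses/by/4.0/", "content-version": "vor"},
    ],
    "link": [
        {
            "URL": "https://example.org/fulltext.xml",
            "content-type": "application/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://example.org/fulltext.pdf",
            "content-type": "application/pdf",
            "intended-application": "similarity-checking",
        },
    ],
}


def tdm_info(work: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the full-text links flagged for text mining and all license URLs."""
    full_text_links: List[str] = [
        entry["URL"]
        for entry in work.get("link", [])
        if entry.get("intended-application") == "text-mining"
    ]
    license_urls: List[str] = [entry["URL"] for entry in work.get("license", [])]
    return {
        "doi": work.get("DOI"),
        "full_text_links": full_text_links,
        "license_urls": license_urls,
    }


if __name__ == "__main__":
    print(tdm_info(SAMPLE_WORK))
```

A record with no "link" or "license" entries simply yields empty lists, which makes it easy to skip works whose publishers have not yet registered this metadata.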

DOAJ to add Crossref compatibility – News Service

“In a series of metadata improvements, publishers will be able to upload XML in the Crossref format to us from 18th February 2020.

In 2018, we asked our publishers what would make their interaction with DOAJ easier and 46% said that they would like us to accept Crossref XML. Today we only accept XML formatted to our proprietary DOAJ format….”

Guest post: a technical update from our development team – News Service

“Here are some major bits of work that we have carried out:

Enhancements to our historical data management system. We track all changes to the body of publicly available objects (Journals and Articles) and we have a better process for handling that.
Introduced a more advanced testing framework for the source code. As DOAJ gains more features, the code becomes larger and more complex. To ensure that it is properly tested for before going into production, we have started to use parameterised testing on the core components. This allows us to carry out broader and deeper testing to ensure the system is defect free.
A weekly data dump of the entire public dataset (Journals and Articles) which is freely downloadable.
A major data cleanup on articles: a few tens of thousands of duplicates, from historical data or sneaking in through validation loopholes, were identified and removed. We closed the loopholes and cleaned up the data.
A complete new hardware infrastructure, using Cloudflare. This resulted in the significant increase in stability mentioned above and allows us to cope with our growing data set (increasing at a rate of around 750,000 records per year at this point).

And here are some projects we have been working on which you will see come into effect over the next few weeks:

A completely new search front-end. It looks very similar to the old one, but with some major improvements under-the-hood (more powerful, more responsive, more accessible), and gives us the capability to build better, cooler interfaces in the future.
Support for Crossref XML as an article upload format. In the future this may also be extended to the API and we may also integrate directly with Crossref to harvest articles for you. We support the current Crossref schema (4.7) and we will be supporting new versions as they come along….”
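
The DOAJ update above mentions parameterised testing of core components. The following is a small sketch of that general technique using Python's standard unittest module and its subTest context manager. It is not DOAJ's actual test code: the helper normalise_issn and the test cases are hypothetical and exist only to show one test body being run against many inputs.

```python
import unittest


def normalise_issn(raw: str) -> str:
    """Hypothetical helper: return an ISSN in the form NNNN-NNNC, upper-cased."""
    compact = raw.replace("-", "").strip().upper()
    if len(compact) != 8:
        raise ValueError(f"not an 8-character ISSN: {raw!r}")
    return compact[:4] + "-" + compact[4:]


class TestNormaliseIssn(unittest.TestCase):
    # Each tuple is one parameter set: (input, expected output).
    VALID_CASES = [
        ("1234-5678", "1234-5678"),
        ("12345678", "1234-5678"),
        (" 1234-567x ", "1234-567X"),
    ]
    INVALID_CASES = ["", "123", "1234-56789"]

    def test_valid_inputs(self) -> None:
        for raw, expected in self.VALID_CASES:
            # subTest reports each failing parameter set separately.
            with self.subTest(raw=raw):
                self.assertEqual(normalise_issn(raw), expected)

    def test_invalid_inputs(self) -> None:
        for raw in self.INVALID_CASES:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    normalise_issn(raw)
```

Saved to a file, this can be run with python -m unittest. Adding a new case means adding one tuple to the table rather than writing another test method, which is how this style gives broader coverage without a matching growth in test code.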

Plaudit · Open endorsements from the academic community

“Plaudit links researchers, identified by their ORCID, to research they endorse, identified by its DOI….

Because endorsements are publisher-independent and provided by known and trusted members of the academic community, they provide credibility for valuable research….

Plaudit is built on open infrastructure. We use permanent identifiers from ORCID and DOI, and endorsements are fed into CrossRef Event Data.

We’re open source, community-driven, and not for profit….”