An XML Repository of All bioRxiv Articles is Now Available for Text and Data Mining

“bioRxiv and medRxiv provide free and unrestricted access to all articles posted on their servers. We believe this should apply not only to human readers but also to machine analysis of the content. A growing variety of resources have been created to facilitate this access.

bioRxiv and medRxiv metadata are made available via a number of dedicated RSS feeds and APIs. Simplified summary statistics covering the content and usage are also available. For bioRxiv, this information is available here.

Bulk access to the full text of bioRxiv articles for the purposes of text and data mining (TDM) is available via a dedicated Amazon S3 resource. Click here for details of this TDM resource and how to access it….”
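
For readers who want to try the TDM resource programmatically, here is a minimal sketch using boto3. The bucket name and key prefix are assumptions and should be checked against bioRxiv's current TDM details page; the bucket is requester-pays, so the caller's AWS account is billed for the requests and data transfer.

    # Minimal sketch: list a few objects in the bioRxiv TDM bucket.
    # Bucket name and prefix are assumptions; check the bioRxiv TDM
    # page for the actual values. The bucket is requester-pays, so
    # valid AWS credentials are required.
    import boto3

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(
        Bucket="biorxiv-src-monthly",   # assumed bucket name
        Prefix="Current_Content/",      # assumed key layout
        RequestPayer="requester",
        MaxKeys=20,
    )
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])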

What is MEI?

“The Music Encoding Initiative (MEI) is a 21st century community-driven open-source effort to define guidelines for encoding musical documents in a machine-readable structure.

It brings together specialists from various music research communities, including technologists, librarians, historians, and theorists in a common effort to discuss and define best practices for representing a broad range of musical documents and structures. The results of these discussions are then formalized into the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents expressed as an eXtensible Markup Language (XML) schema. This schema is developed and maintained by the MEI Technical Team….”
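
To give a flavour of what MEI encoding looks like in practice, here is a minimal sketch that parses a tiny MEI-style fragment with Python's standard ElementTree. The element and attribute names (mei, measure, staff, layer, note with pname/oct/dur) follow the published MEI guidelines, but the fragment is purely illustrative and is not claimed to be valid against the full MEI schema.

    # Minimal sketch: a tiny MEI-style fragment, parsed with ElementTree.
    # Illustrative only; a real MEI document also carries a meiHead with
    # metadata and must validate against the MEI schema.
    import xml.etree.ElementTree as ET

    MEI_NS = "http://www.music-encoding.org/ns/mei"
    fragment = f"""
    <mei xmlns="{MEI_NS}">
      <music><body><mdiv><score>
        <section>
          <measure n="1">
            <staff n="1"><layer n="1">
              <note pname="c" oct="4" dur="4"/>
              <note pname="e" oct="4" dur="4"/>
            </layer></staff>
          </measure>
        </section>
      </score></mdiv></body></music>
    </mei>
    """

    root = ET.fromstring(fragment)
    for note in root.iter(f"{{{MEI_NS}}}note"):
        print(note.get("pname"), note.get("oct"), note.get("dur"))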

New business models for the open research agenda | Research Information

“The rise of preprints and the move towards universal open access are potential threats to traditional business models in scholarly publishing, writes Phil Gooch

Publishers have started responding to the latter with transformative agreements[1], but if authors can simply upload their research to a preprint server for immediate dissemination, comment and review, why submit to a traditional journal at all? Some journals are addressing this by offering authors frictionless submission direct from the preprint server. This tackles two problems at once: easing authors’ frustrations with existing journal submission systems[2], and providing a more direct route from the raw preprint to the richly linked, multiformat version of record that readers demand and accessibility standards require….

Dissemination of early-stage research as mobile-unfriendly PDF is arguably a technological step backwards. If preprints are here to stay, the reading experience needs to be improved. A number of vendors have developed native XML or LaTeX authoring environments which enable dissemination in richer formats….”

DOAJ to add Crossref compatibility – News Service

“In a series of metadata improvements, publishers will be able to upload XML in the Crossref format to us from 18th February 2020.

In 2018, we asked our publishers what would make their interaction with DOAJ easier and 46% said that they would like us to accept Crossref XML. Today we only accept XML formatted to our proprietary DOAJ format….”
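
For publishers preparing such uploads, a quick sanity check of a Crossref-format deposit file can be done in a few lines of Python. The element names (journal_article, title, doi) follow the Crossref deposit schema; the file name is hypothetical, and the namespace is handled generically here rather than tied to a particular schema version.

    # Minimal sketch: list the DOI and title of each journal_article in a
    # Crossref-format deposit file before uploading it. The file name is
    # hypothetical; namespaces are stripped rather than hard-coded.
    import xml.etree.ElementTree as ET

    def local(tag):
        """Return the tag name without its namespace prefix."""
        return tag.rsplit("}", 1)[-1]

    tree = ET.parse("crossref_deposit.xml")  # hypothetical file name
    for elem in tree.iter():
        if local(elem.tag) == "journal_article":
            title = doi = None
            for child in elem.iter():
                if title is None and local(child.tag) == "title":
                    title = (child.text or "").strip()
                if doi is None and local(child.tag) == "doi":
                    doi = (child.text or "").strip()
            print(doi, "-", title)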

CORE users can now read articles directly on our site – CORE

“We are happy to announce the release of CORE Reader, which provides a seamless experience for users wishing to read papers hosted by CORE. In this post, we provide an overview of what is new and we encourage you to follow this development as new functionalities in the reader are on our roadmap….

At the beginning of this project, we reflected that most open access services do not yet provide a rich user experience for reading research papers. Determined to change this, we originally started looking at whether CORE could render research papers as HTML, as has recently become trendy across publisher platforms. While such rendering remains one of the ultimate goals, we realised that this could only be achieved for a small fraction of documents in CORE; more specifically, those that the data provider offers in machine-readable formats, such as LaTeX or JATS XML. While we want to encourage more repositories to support such formats (and this remains a Plan S recommendation), we wanted to improve the reading experience for all of our users across all of our content….”
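
As a toy illustration of the JATS-to-HTML rendering CORE describes, the sketch below maps a handful of JATS elements onto HTML headings and paragraphs. A production renderer (typically an XSLT pipeline) covers far more of the vocabulary; the point is simply that machine-readable JATS makes an HTML reading view feasible in a way that PDF-only deposits do not.

    # Toy sketch: render a small JATS fragment as HTML. Real renderers
    # handle references, figures, tables, maths and much more.
    import xml.etree.ElementTree as ET

    JATS = """<article>
      <front><article-meta><title-group>
        <article-title>A sample preprint</article-title>
      </title-group></article-meta></front>
      <body>
        <sec><title>Introduction</title><p>Some opening text.</p></sec>
      </body>
    </article>"""

    root = ET.fromstring(JATS)
    html = ["<article>", f"<h1>{root.findtext('.//article-title')}</h1>"]
    for sec in root.find("body").iter("sec"):
        html.append(f"<h2>{sec.findtext('title')}</h2>")
        for p in sec.findall("p"):
            html.append(f"<p>{p.text}</p>")
    html.append("</article>")
    print("\n".join(html))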

Guest post: a technical update from our development team – News Service

“Here are some major bits of work that we have carried out:

Enhancements to our historical data management system. We track all changes to the body of publicly available objects (Journals and Articles) and we have a better process for handling that.
Introduced a more advanced testing framework for the source code. As DOAJ gains more features, the code becomes larger and more complex. To ensure that it is properly tested before going into production, we have started to use parameterised testing on the core components [a generic sketch follows this list]. This allows us to carry out broader and deeper testing to ensure the system is defect-free.
A weekly data dump of the entire public dataset (Journals and Articles), which is freely downloadable.
A major data cleanup on articles: a few tens of thousands of duplicates, from historical data or sneaking in through validation loopholes, were identified and removed. We closed the loopholes and cleaned up the data.
A completely new hardware infrastructure, using Cloudflare. This resulted in the significant increase in stability mentioned above and allows us to cope with our growing data set (increasing at a rate of around 750,000 records per year at this point).
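
The parameterised-testing item above refers to DOAJ's own codebase, which is not shown here; as a generic illustration of the technique, the pytest sketch below runs one test function over a table of cases, so input coverage grows without duplicating test bodies. The is_valid_issn helper is hypothetical.

    # Generic illustration of parameterised testing with pytest.
    # Not DOAJ's actual code; is_valid_issn is a hypothetical helper.
    import re
    import pytest

    ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")

    def is_valid_issn(issn: str) -> bool:
        """Check the basic NNNN-NNNC shape of an ISSN (no checksum)."""
        return bool(ISSN_PATTERN.match(issn))

    @pytest.mark.parametrize("issn,expected", [
        ("1234-5678", True),
        ("2049-3630", True),
        ("1234-567X", True),
        ("1234567", False),
        ("abcd-efgh", False),
    ])
    def test_is_valid_issn(issn, expected):
        assert is_valid_issn(issn) is expected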

And here are some projects we have been working on which you will see come into effect over the next few weeks:

A completely new search front-end. It looks very similar to the old one, but with some major improvements under the hood (more powerful, more responsive, more accessible), and gives us the capability to build better, cooler interfaces in the future.
Support for Crossref XML as an article upload format. In the future this may also be extended to the API and we may also integrate directly with Crossref to harvest articles for you. We support the current Crossref schema (4.7) and we will be supporting new versions as they come along….”
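
The Crossref harvesting mentioned in the last item could, for example, build on Crossref's public REST API. The sketch below asks api.crossref.org for a few works registered to a journal; the ISSN is a placeholder, and pagination, politeness headers and error handling are omitted.

    # Minimal sketch: fetch a few works for a journal from the public
    # Crossref REST API. The ISSN is a placeholder; a polite client
    # would also identify itself (e.g. a mailto in the User-Agent).
    import requests

    issn = "1234-5678"  # placeholder; substitute the journal's ISSN
    url = f"https://api.crossref.org/journals/{issn}/works"
    resp = requests.get(url, params={"rows": 5}, timeout=30)
    resp.raise_for_status()

    for item in resp.json()["message"]["items"]:
        title = (item.get("title") or ["(no title)"])[0]
        print(item.get("DOI"), "-", title)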

River Valley launch publishing platform, completing their end-to-end publishing solution – River Valley Technologies

“River Valley Technologies announce the launch of RVHost, the innovative content hosting platform, and the final component of their XML-based end-to-end scholarly publishing solution.  

The platform gives publishers unprecedented control over their content, allowing them to create a brand new journal within minutes or to schedule any publication for a precise date and time, e.g. timed with a press release. RVHost delivers full analytics, including an innovative graphical “history” of a publication, together with any citations. The system is fully customisable, multilingual and content agnostic, allowing it to host any form of data or multimedia.

RVHost will be launched in partnership with the forward-looking STM journal GigaScience, which will be using River Valley’s complete end-to-end publishing system. …

Kaveh Bazargan added: “RVHost fully supports Open Access publishing, ensuring compliance with Plan S. We ask all participating journals to adopt the COPE/DOAJ principles of transparency to ensure ethical publishing.” …”

Publishing at the speed of research | EurekAlert! Science News

“Today, the open-access, open-data journal GigaScience and the technology and publishing services company River Valley Technologies announce a new partnership to deliver a research publishing process that is extremely rapid, low-cost, and modular. As a pioneer of open data and open science publishing, GigaScience brings editorial expertise in publishing research that includes all components of the research process: data, source code, workflows, and more. River Valley Technologies, with 30 years of expertise in publishing production, delivers an end-to-end publishing solution, including manuscript submission, content management and hosting, using its collaborative online platforms. The collaboration is developing a publishing process that, in addition to providing on-the-fly article production, will create more interactive articles that can be versioned and forked….”

Open Journal Systems (OJS) sets new standards to achieve OpenAIRE compliance with JATS – OpenAIRE Blogs

“Open Journal Systems (OJS, https://pkp.sfu.ca/ojs/) is an open source journal management and publishing system, developed by the Public Knowledge Project (PKP, https://pkp.sfu.ca/). Around 10,000 journals worldwide and over a thousand journals published in Europe use Open Journal Systems. The latest major version, OJS 3, was released in 2016, and since then hundreds of OJS journals have upgraded, including large national journal platforms like Tidsskrift.dk and Journal.fi. Therefore, it is important to help the growing number of OJS 3 journals to become compliant with the OpenAIRE infrastructure in terms of comprehensive metadata descriptions of open access articles on research in Europe and beyond….”
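
OpenAIRE harvests journals over OAI-PMH, an interface OJS provides out of the box, so a minimal harvesting sketch looks like the one below. The base URL is hypothetical, and the metadataPrefix that exposes OpenAIRE-oriented or JATS metadata depends on which plugins the journal has installed; plain oai_dc is shown as the lowest common denominator.

    # Minimal sketch: harvest record titles from an OJS journal's OAI-PMH
    # endpoint. The base URL is hypothetical; oai_dc is used because every
    # compliant endpoint supports it, while richer formats depend on plugins.
    import requests
    import xml.etree.ElementTree as ET

    BASE = "https://journals.example.org/index.php/myjournal/oai"  # hypothetical
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

    resp = requests.get(BASE, params=params, timeout=30)
    resp.raise_for_status()

    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    root = ET.fromstring(resp.content)
    for record in root.findall(".//oai:record", ns):
        print(record.findtext(".//dc:title", default="(no title)", namespaces=ns))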
