Research Libraries Powering Sustainable Knowledge in the Digital Age: LIBER Europe Strategy 2018-2022

“The 2018-2022 LIBER Strategy, which will steer LIBER’s development over the next five years, will support LIBER libraries in facing coming changes in the European working environment such as the various initiatives in advancing Open Science. It will also enable research in LIBER organisations to be world class. The leading role of LIBER brings added value to the implementation of the Strategy at a European level. …The term Open Science is not mentioned specifically in the Strategy. Instead, we emphasise innovative scholarly communication and digital skills and services, as well as research infrastructures to enable sustainable knowledge in the digital age…. Our Vision for the research landscape in 2022 is that the role of research libraries will lie in Powering Sustainable Knowledge in the Digital Age:

• Open Access is the predominant form of publishing;

• Research Data is Findable, Accessible, Interoperable and Reusable (FAIR);

• Digital Skills underpin a more open and transparent research life cycle;

• Research Infrastructure is participatory, tailored and scaled to the needs of the diverse disciplines;

• The cultural heritage of tomorrow is built on today’s digital information….

Open Access of Research Publications: this theme will encompass developing innovative services on top of the repository network, developments regarding Open Access business models for journals and the role of libraries therein, and the possibilities for libraries as Open Access publishers and innovative publishing…Semantic Interoperability; Open and Linked Data: research libraries are experts in metadata and ontologies and need to take a leadership role and engage with other stakeholders to ensure interoperability and accessibility of content….”

An artificial future | Research Information

“One of the most exciting data projects we [Elsevier] are working on at the moment is with a UK based charity, Findacure. We are helping the charity to find alternative treatment options for rare diseases such as Congenital Hyperinsulinism by offering our informatics expertise, and giving them access to published literature and curated data through our online tools, at no charge.

We are also supporting The Pistoia Alliance, a not-for-profit group that aims to lower barriers to collaboration within the pharmaceutical and life science industry. We have been working with its members to collaborate and develop approaches that can bring benefits to the entire industry. We recently donated our Unified Data Model to the Alliance, with the aim of publishing an open and freely available format for the storage and exchange of drug discovery data. I am still proud of the work I did with them back in 2009 on the SESL project (Semantic Enrichment of Scientific Literature), and my involvement continues as part of the special interest group in AI….”

WarSampo: Publishing and Using Linked Open Data about the Second World War

“The WarSampo system 1) initiates and fosters large scale Linked Open Data (LOD) publication of WW2 data from distributed, heterogeneous data silos and 2) demonstrates and suggests its use in applications and DH research. WarSampo is to the best of our knowledge the first large scale system for serving and publishing WW2 LOD on the Semantic Web for machine and human users. Its knowledge graph metadata contains over 9 million associations (triples) between data items including, e.g., a complete set of over 95,000 death records of Finnish WW2 soldiers, 160,000 authentic photos taken during the war, 32,000 historical places on historical maps, 23,000 war diaries of army units, and 3,400 memoir articles written by the veterans after the war. WarSampo data comes from several Finnish organizations and sources, such as National Archives, Defense Forces, Land Survey of Finland, Wikipedia/DBpedia, textbooks, and magazines.

WarSampo has two separate components: 1) WarSampo Data Service for machines and 2) WarSampo Semantic Portal with various applications for human users.”
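The "associations (triples)" in the excerpt above are subject-predicate-object statements. The following is a minimal sketch in plain Python of how such statements link records from different silos. All identifiers (the `ex:` names and predicates) are hypothetical illustrations, not WarSampo's actual vocabulary or data, and a real deployment would use an RDF store and SPARQL rather than a Python list.

```python
# Hypothetical triples; the ex: identifiers are illustrative only,
# not taken from the WarSampo knowledge graph.
TRIPLES = [
    ("ex:person_1", "ex:memberOf", "ex:unit_7"),
    ("ex:person_1", "ex:diedAt", "ex:place_3"),
    ("ex:photo_9", "ex:depicts", "ex:place_3"),
    ("ex:diary_2", "ex:writtenBy", "ex:unit_7"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def linked_to(triples, node):
    """Return every resource directly connected to `node`, in either direction."""
    outgoing = {o for (_, _, o) in match(triples, s=node)}
    incoming = {s for (s, _, _) in match(triples, o=node)}
    return outgoing | incoming
```

Here `linked_to(TRIPLES, "ex:place_3")` gathers both the death record and the photo attached to the same place, which is the kind of cross-silo linking the excerpt describes.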

Science Beam – using computer vision to extract PDF data | Labs | eLife

“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own. 
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
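To illustrate what "store it in a more accessible, more reusable format such as XML" could look like, here is a rough sketch that serializes already-extracted fields into a simplified JATS-style fragment with Python's standard library. The input dictionary and the function name `to_jats_like` are assumptions made for this example; the output uses a few real JATS element names but is not validated against the JATS DTD, and this is not ScienceBeam's code.

```python
import xml.etree.ElementTree as ET


def to_jats_like(fields):
    """Serialize extracted fields into a simplified JATS-style XML string.

    `fields` is a hypothetical dict with keys "title", "authors"
    (a list of (surname, given_names) tuples) and "abstract".
    """
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = fields["title"]

    contribs = ET.SubElement(meta, "contrib-group")
    for surname, given_names in fields["authors"]:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names

    abstract = ET.SubElement(meta, "abstract")
    ET.SubElement(abstract, "p").text = fields["abstract"]

    return ET.tostring(article, encoding="unicode")


# Hypothetical extraction result, as a PDF-parsing step might produce it.
extracted = {
    "title": "A hypothetical title",
    "authors": [("Doe", "Jane"), ("Roe", "Richard")],
    "abstract": "A hypothetical abstract.",
}
xml_text = to_jats_like(extracted)
```

Once the fields are in XML, downstream tools can query them with ordinary path expressions instead of re-parsing page layout.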

Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

Abstract: Purpose: this paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, i.e. a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles, submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops.

Design: RASH has been developed in order to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow.

Findings: the evaluation study confirmed that RASH can already be adopted in workshops, conferences and journals and can be quickly learnt by researchers who are familiar with HTML.

Research limitations: the evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g. for enabling additional conversion from/to existing formats such as OpenXML.

Practical implications: RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.

Social implications: RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).

Value: RASH focuses strictly on writing the content of the paper (i.e., organisation of text + semantic annotations) and leaves all the issues about its validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.

Yewno Announces Partnerships With Top Publishers to Produce Additional Content Discoverable Through Yewno Platform | Business Wire

“Yewno, a provider of a new inference engine that mimics the human brain and increases knowledge discovery, today announced its partnership with top publishers and other research providers including Wiley, Harvard DASH, American Society for Microbiology and BioOne. Content from these distinguished publishers will produce new insights and inferences giving knowledge seekers access to important content across various verticals to enhance discovery….”

ODRL Community Group

“The W3C ODRL [Open Digital Rights Language] Community Group’s aim is to develop and promote an open international specification for Policy Language expressions. The ODRL Policy Language provides a flexible and interoperable information model to support transparent and innovative use of digital assets in the publishing, distribution and consumption of content, applications, and services across all sectors and communities. The ODRL Policy model is targeted to support the business models of open, educational, government, and commercial communities through Profiles that enhance the model to align to their requirements whilst providing a common semantic layer for interoperability….”
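As a rough illustration of a "Policy Language expression", the sketch below builds a small policy as a Python dictionary shaped like the JSON-LD serialization used in the ODRL Information Model, then checks it with a toy evaluator. The policy and asset URIs are hypothetical, and `is_permitted` is an assumption for this example: it ignores constraints, duties, and the action hierarchy, so it is not a conformant ODRL implementation.

```python
import json

# A policy shaped like an ODRL JSON-LD "Set"; the example.com URIs are hypothetical.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy/1",
    "permission": [
        {"target": "http://example.com/asset/article-42", "action": "reproduce"}
    ],
    "prohibition": [
        {"target": "http://example.com/asset/article-42", "action": "modify"}
    ],
}


def is_permitted(policy, target, action):
    """Toy check: allowed only if a permission matches and no prohibition does."""
    def hit(rules):
        return any(
            rule.get("target") == target and rule.get("action") == action
            for rule in rules
        )

    if hit(policy.get("prohibition", [])):
        return False
    return hit(policy.get("permission", []))


# The same dictionary can be exchanged between systems as plain JSON text.
policy_json = json.dumps(policy, indent=2)
```

Because the policy is ordinary JSON, a publisher and a consuming service can exchange it as text and each apply their own evaluation logic to the same statements.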

Improving interoperability using vocabulary linked data – Springer

Abstract:  The concept of Linked Data has been an emerging theme within the computing and digital heritage areas in recent years. The growth and scale of Linked Data has underlined the need for greater commonality in concept referencing, to avoid local redefinition and duplication of reference resources. Achieving domain-wide agreement on common vocabularies would be an unreasonable expectation; however, datasets often already have local vocabulary resources defined, and so the prospects for large-scale interoperability can be substantially improved by creating alignment links from these local vocabularies out to common external reference resources. The ARIADNE project is undertaking large-scale integration of archaeology dataset metadata records, to create a cross-searchable research repository resource. Key to enabling this cross search will be the ‘subject’ metadata originating from multiple data providers, containing terms from multiple multilingual controlled vocabularies. This paper discusses various aspects of vocabulary mapping. Experience from the previous SENESCHAL project in the publication of controlled vocabularies as Linked Open Data is discussed, emphasizing the importance of unique URI identifiers for vocabulary concepts. There is a need to align legacy indexing data to the uniquely defined concepts and examples are discussed of SENESCHAL data alignment work. A case study for the ARIADNE project presents work on mapping between vocabularies, based on the Getty Art and Architecture Thesaurus as a central hub and employing an interactive vocabulary mapping tool developed for the project, which generates SKOS mapping relationships in JSON and other formats. The potential use of such vocabulary mappings to assist cross search over archaeological datasets from different countries is illustrated in a pilot experiment. The results demonstrate the enhanced opportunities for interoperability and cross searching that the approach offers.
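The hub-based alignment described in the abstract can be sketched as follows: local vocabulary concepts are linked to a central hub concept with SKOS mapping properties, and a query on the hub concept is expanded to every aligned local concept. The JSON layout, the function names, and all `example.org` URIs are assumptions for illustration; they are not the output format of the ARIADNE mapping tool, and no real Getty AAT identifiers are used.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five mapping properties defined by SKOS.
MAPPING_TYPES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def make_mapping(source_uri, match_type, target_uri):
    """Build one mapping record linking a local concept to a hub concept."""
    if match_type not in MAPPING_TYPES:
        raise ValueError(f"not a SKOS mapping property: {match_type}")
    return {"source": source_uri, "predicate": SKOS + match_type, "target": target_uri}


def cross_search_terms(mappings, hub_uri):
    """Return the local concept URIs aligned to a hub concept, sorted."""
    return sorted(m["source"] for m in mappings if m["target"] == hub_uri)


# Hypothetical local vocabularies from two providers, aligned to one hub concept.
HUB = "http://example.org/hub/concept/burial-mound"
mappings = [
    make_mapping("http://example.org/vocab-nl/grafheuvel", "exactMatch", HUB),
    make_mapping("http://example.org/vocab-uk/barrow", "closeMatch", HUB),
    make_mapping("http://example.org/vocab-uk/hillfort", "exactMatch",
                 "http://example.org/hub/concept/hillfort"),
]

mappings_json = json.dumps(mappings, indent=2)
```

A search on the hub concept can then be expanded with `cross_search_terms(mappings, HUB)` to retrieve records indexed under either provider's local term, without the providers having to agree on a single vocabulary.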