Open data: growing pains | Research Information

“In its latest State of Open Data survey, Figshare revealed that a hefty 64 per cent of respondents made their data openly available in 2018.

The percentage, up four per cent from last year and seven per cent from 2016, indicates a healthy awareness of open data and for Daniel Hook, chief executive of Figshare’s parent company, Digital Science, it spells good news….

For example, the majority of respondents – 63 per cent – support national mandates for open data, an eight per cent rise from 2017. And, at the same time, nearly half of the respondents – 46 per cent – reckon data citations motivate them to make data openly available. This figure is up seven per cent from last year….

Yet, amid the data-sharing success stories, myriad worries remain. Top of the pile is the potential for data misuse….

Inappropriate sharing of data is another key concern….

Results indicated that a mighty 58 per cent of respondents felt they do not receive sufficient credit for sharing data, while only nine per cent felt they do….

Coko recently won funding from the Sloan Foundation to build DataSeer, an online service that will use Natural Language Processing to identify datasets that are associated with a particular article. …”

Project THOR – Technical and Human infrastructure for Open Research

“THOR is a 30 month project funded by the European Commission under the Horizon 2020 programme. It will establish seamless integration between articles, data, and researchers across the research lifecycle. This will create a wealth of open resources and foster a sustainable international e-infrastructure. The result will be reduced duplication, economies of scale, richer research services, and opportunities for innovation. Learn more about the THOR mission….”

X3ML mapping framework for information integration in cultural heritage and beyond | SpringerLink

Abstract:  The aggregation of heterogeneous data from different institutions in cultural heritage and e-science has the potential to create rich data resources useful for a range of different purposes, from research to education and public interests. In this paper, we present the X3ML framework, a framework for information integration that handles effectively and efficiently the steps involved in schema mapping, uniform resource identifier (URI) definition and generation, data transformation, provision and aggregation. The framework is based on the X3ML mapping definition language for describing both schema mappings and URI generation policies and has a lot of advantages when compared with other relevant frameworks. We describe the architecture of the framework as well as details on the various available components. Usability aspects are discussed and performance metrics are demonstrated. The high impact of our work is verified via the increasing number of international projects that adopt and use this framework.
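The steps the abstract lists (schema mapping, URI generation, data transformation) can be illustrated with a small Python sketch. This is not X3ML syntax and does not use the X3ML engine; the record, the `MAPPING` table, the `example.org` URIs and the function names are all hypothetical, chosen only to show how a declarative mapping and a URI generation policy might combine to turn a source record into subject-property-value triples.

```python
from urllib.parse import quote

# Hypothetical source record from an institution's database (illustrative only).
record = {"id": "42", "title": "Bronze statue", "maker": "Unknown sculptor"}

# Hypothetical schema mapping: source field -> target property URI.
MAPPING = {
    "title": "http://example.org/schema/has_title",
    "maker": "http://example.org/schema/produced_by",
}


def make_uri(base, kind, local_id):
    """Assumed URI generation policy: base + entity kind + percent-encoded id."""
    return f"{base}/{kind}/{quote(str(local_id), safe='')}"


def transform(rec, base="http://example.org"):
    """Apply the mapping to one record and return (subject, property, value) triples."""
    subject = make_uri(base, "object", rec["id"])
    triples = []
    for field, prop in MAPPING.items():
        if field in rec:  # fields missing from the source are simply skipped
            triples.append((subject, prop, rec[field]))
    return triples


triples = transform(record)
for triple in triples:
    print(triple)
```

Keeping the mapping table and the URI policy separate from the transformation loop mirrors the separation the abstract describes between mapping definitions and URI generation policies, although the real framework expresses both in its own definition language.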


“Exeley Inc. is a New York-based company established in 2015 that focuses on offering innovative publishing services to Open Access publications worldwide.

The company is run by Dawid Cecula, an experienced manager in the publishing industry. In the last decade, Dawid has built one of the world’s largest collections of Open Access journals. He gained his experience from working for leading international publishers and delivering professional publishing and consulting services to universities, research centers and societies based in Europe, America and Asia.

Exeley Inc. offers journal owners a well-designed and technologically advanced publishing platform that integrates publications with online content, social media, databases and libraries. Users benefit from such solutions as: allocation of DOI numbers and live reference links via cooperation with Crossref; articles enhanced by graphical abstracts and extra supplementary files (including videos, sound files and PowerPoint presentations); advanced article metrics powered by PlumX, and responsive web design….”


“The Scholix initiative is a high level interoperability framework for exchanging information about the links between scholarly literature and data. It aims to build an open information ecosystem to understand systematically what data underpins literature and what literature references data. The DLI Service is the first exemplar aggregation and query service fed by the Scholix open information ecosystem. The Scholix framework together with the DLI aggregation are designed to enable other 3rd party services (domain-specific aggregations, integrations with other global services, discovery tools, impact assessments etc).

Scholix is an evolving lightweight set of Guidelines to increase interoperability rather than a normative standard….”
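As a rough illustration of what "information about the links between scholarly literature and data" might look like in practice, here is a minimal Python sketch of a link record. It is a simplified, hypothetical structure and not the official Scholix schema; the class names, field names, identifiers and provider name are assumptions made for the example.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Resource:
    identifier: str   # e.g. a DOI (the values used below are placeholders)
    id_scheme: str    # identifier scheme, e.g. "doi"
    kind: str         # "literature" or "dataset"


@dataclass(frozen=True)
class Link:
    source: Resource
    target: Resource
    relationship: str  # how the source relates to the target
    provider: str      # who asserted the link


def inverse(link, inverse_name):
    """Return the same link seen from the other side (data -> literature)."""
    return Link(link.target, link.source, inverse_name, link.provider)


article = Resource("10.1234/example-article", "doi", "literature")
dataset = Resource("10.1234/example-dataset", "doi", "dataset")

link = Link(article, dataset, "References", "Example Hub")
back = inverse(link, "IsReferencedBy")

# Serialise to JSON so the record could be exchanged between services.
print(json.dumps(asdict(link), indent=2))
```

Storing a link as a directed, typed pair lets one record answer both questions the excerpt raises: which data a piece of literature references, and, via the inverse, which literature a dataset underpins.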

Hirmeos Project – High Integration of Research Monographs in the European Open Science infrastructure

“Several projects, especially in Europe, pursue the aim of publishing Open Access research monographs. However, not enough has been done yet to integrate Open Access monographs into the open science ecosystem in a systematic and coordinated fashion. That’s the main goal of the High Integration of Research Monographs in the European Open Science (HIRMEOS) project. The project addresses the particularities of academic monographs as a specific support for scientific communication in the Social Sciences and the Humanities and tackles the main obstacles of the full integration of monographs into the European Open Science Cloud. It aims at prototyping innovative services for monographs in support of Open Science infrastructure by providing additional data, links and interactions to the documents, at the same time paving the way to new potential tools for research assessment, which is still a major challenge in the Humanities and Social Sciences.

By improving already existing publishing platforms and repositories participating in the OpenAIRE infrastructure, the HIRMEOS project will increase its impact and help include more disciplines in the Open Science paradigm, widening its boundaries towards the Humanities and Social Sciences and reaching out to new fields that have so far been poorly integrated….”

Experiences in integrated data and research object publishing using GigaDB | SpringerLink

“In the era of computation and data-driven research, traditional methods of disseminating research are no longer fit-for-purpose. New approaches for disseminating data, methods and results are required to maximize knowledge discovery. The ‘long tail’ of small, unstructured datasets is well catered for by a number of general-purpose repositories, but there has been less support for ‘big data’. Outlined here are our experiences in attempting to tackle the gaps in publishing large-scale, computationally intensive research. GigaScience is an open-access, open-data journal aiming to revolutionize large-scale biological data dissemination, organization and re-use. Through use of the data handling infrastructure of the genomics centre BGI, GigaScience links standard manuscript publication with an integrated database (GigaDB) that hosts all associated data, and provides additional data analysis tools and computing resources. Furthermore, the supporting workflows and methods are also integrated to make published articles more transparent and open. GigaDB has released many new and previously unpublished datasets and data types, including urgently needed data to tackle infectious disease outbreaks, cancer and the growing food crisis. Other ‘executable’ research objects, such as workflows, virtual machines and software from several GigaScience articles have been archived and shared in reproducible, transparent and usable formats. With data citation producing evidence of, and credit for, its use in the wider research community, GigaScience demonstrates a move towards more executable publications. Here data analyses can be reproduced and built upon by users without coding backgrounds or heavy computational infrastructure in a more democratized manner.”


Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

Abstract:  Purpose: this paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, i.e. a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles, submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops.

Design: RASH has been developed in order to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; and support its adoption within the existing publishing workflow.

Findings: the evaluation study confirmed that RASH can already be adopted in workshops, conferences and journals and can be quickly learnt by researchers who are familiar with HTML.

Research limitations: the evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g. for enabling additional conversion from/to existing formats such as OpenXML.

Practical implications: RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.

Social implications: RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).

Value: RASH focuses strictly on writing the content of the paper (i.e., organisation of text + semantic annotations) and leaves all the issues about its validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.
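To make the idea of "semantic data extraction" from an HTML article more concrete, the following Python sketch pulls annotated values out of a small HTML fragment using only the standard library. It is not part of the RASH Framework. It assumes, for illustration, that annotations appear as RDFa-style `property` attributes on inline elements; the sample markup, the `ex:` property names and the `PropertyExtractor` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical article fragment with RDFa-style annotations (illustrative only).
SAMPLE = """
<section>
  <h1>Results</h1>
  <p>The sample size was <span property="ex:sampleSize">120</span> and
  the method was <span property="ex:method">survey</span>.</p>
</section>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements carrying a `property` attribute.

    Limitation of this sketch: it assumes every start tag has a matching end
    tag, so unclosed void elements such as <br> would unbalance the stack.
    """

    def __init__(self):
        super().__init__()
        self._stack = []   # one entry per open element: its property name or None
        self.found = []    # extracted (property, text) pairs

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Record text only when the innermost open element is annotated.
        if self._stack and self._stack[-1] and data.strip():
            self.found.append((self._stack[-1], data.strip()))


parser = PropertyExtractor()
parser.feed(SAMPLE)
print(parser.found)
```

Because the annotations live in the same HTML that readers see, a tool like this can recover structured values without a separate data file, which is the kind of division of labour between content and tooling that the abstract describes.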

Knowen – accelerating discovery

“Knowen is an experiment in progress that is being designed to organize, preserve, and expand human knowledge – placing it in a hierarchical knowledge base (graph) that makes it readily available to learners of all backgrounds and ages.

The rate of scientific and engineering innovation is rapidly increasing. With the advent of the Internet and powerful search engines, millions of research articles are accessible within seconds from anywhere in the world. Despite this, because of the complexity of a typical journal article and the redundancy among these, it is often as difficult today as it was 50 or 100 years ago to find a particularly relevant and verified piece of knowledge…. Knowen is completely free for academic and personal use. Please contact us if you are interested in a commercial license….”