Strengthening the Open Science Ecosystem Through Preprints

“When rapid and open sharing occurs, it is usually in venues (like scientific conferences or within networks of collaborators) accessible only to researchers from well-resourced and established institutions, creating additional barriers to researchers from emerging countries or under-resourced areas, preventing them from participating in the scientific discourse.

Preprints are poised to change this. In addition to enabling rapid sharing, preprints also 1) offer novel opportunities for feedback and peer review; 2) improve the overall quality, integrity, and reproducibility of research outputs; and 3) help prevent scooping and incentivize early collaboration.

These benefits can be dramatically enhanced by third-party services (authoring tools, commenting platforms, and machine extraction projects) that act as both inputs and outputs to preprints. As arXiv founder Paul Ginsparg envisioned in the early 1990s, preprints can provide “a relatively complete raw archive, unfettered by any unnecessary delays in availability” on top of which “any type of information could be overlayed… and maintained by any third parties,” including tools for validation, filtering, and communication….”

Europe PMC Integrates Smart Citations from scite – scite – Medium

“scite, an award-winning citation analysis platform, and Europe PMC, an open science discovery tool that provides access to a worldwide collection of life science publications, have partnered to display what scite calls smart citations on the Europe PMC platform.

Smart citations advance regular citations by providing more contextual information beyond the information that one study references another. Specifically, smart citations provide the excerpt of text surrounding the citation, the section of the article in which the reference is mentioned, and indicate whether the citing study provides supporting or contradicting evidence. As a result, one can evaluate a study of interest much faster….”

Is overlay peer review the future of scholarly communications? – COAR

“You may have seen the paper published recently by COAR presenting a distributed framework for open publishing services called Pubfair (version 2, after community input), also available in Spanish.

Pubfair is a conceptual model for a modular, distributed open source publishing framework, which builds on the content contained in the network of repositories to enable the dissemination and quality-control of a range of research outputs including publications, data, and more. 

This idea is not new. It is based on the vision outlined in the COAR Next Generation Repositories report and builds on earlier conceptual models developed by Paul Ginsparg, Herbert Van de Sompel and others. And there are already overlay journals on arXiv, such as Discrete Analysis and Advances in Combinatorics, and other platforms such as Episcience in France, that demonstrate that this can be done at a very high level of quality, for a low price.

We are proposing to expand on these initiatives by developing a highly distributed architecture for overlay services. With decentralization, comes tremendous power. It takes us beyond an environment with many silos, in which every organization maintains its own separate system; to a global, interoperable architecture for scholarly communication. This model can scale; respond to different needs and priorities related to language, region, and domain; and has the potential to set free scholarly communications….”

The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery | ACS Central Science

Abstract:  Despite rapid evolution in the area of microbial natural products chemistry, there is currently no open access database containing all microbially produced natural product structures. Lack of availability of these data is preventing the implementation of new technologies in natural products science. Specifically, development of new computational strategies for compound characterization and identification are being hampered by the lack of a comprehensive database of known compounds against which to compare experimental data. The creation of an open access, community-maintained database of microbial natural product structures would enable the development of new technologies in natural products discovery and improve the interoperability of existing natural products data resources. However, these data are spread unevenly throughout the historical scientific literature, including both journal articles and international patents. These documents have no standard format, are often not digitized as machine readable text, and are not publicly available. Further, none of these documents have associated structure files (e.g., MOL, InChI, or SMILES), instead containing images of structures. This makes extraction and formatting of relevant natural products data a formidable challenge. Using a combination of manual curation and automated data mining approaches we have created a database of microbial natural products (The Natural Products Atlas, www.npatlas.org) that includes 24,594 compounds and contains referenced data for structure, compound names, source organisms, isolation references, total syntheses, and instances of structural reassignment. This database is accompanied by an interactive web portal that permits searching by structure, substructure, and physical properties. The Web site also provides mechanisms for visualizing natural products chemical space and dashboards for displaying author and discovery timeline data.
These interactive tools offer a powerful knowledge base for natural products discovery with a central interface for structure and property-based searching and presents new viewpoints on structural diversity in natural products. The Natural Products Atlas has been developed under FAIR principles (Findable, Accessible, Interoperable, and Reusable) and is integrated with other emerging natural product databases, including the Minimum Information About a Biosynthetic Gene Cluster (MIBiG) repository, and the Global Natural Products Social Molecular Networking (GNPS) platform. It is designed as a community-supported resource to provide a central repository for known natural product structures from microorganisms and is the first comprehensive, open access resource of this type. It is expected that the Natural Products Atlas will enable the development of new natural products discovery modalities and accelerate the process of structural characterization for complex natural products libraries.

Are You Ready to ROR? An Inside Look at this New Organization Identifier Registry – The Scholarly Kitchen

“As a former full-time PID person (until recently I was ORCID’s Director of Communications), I am convinced of the important role that persistent identifiers (PIDs) play in supporting a robust, trusted, and open research information infrastructure. We already have open PIDs for research people (ORCID iDs) and research outputs (DOIs), but what about research organizations? While organization identifiers do already exist (Ringgold identifiers, for example, have been widely adopted; Digital Science’s GRID is still relatively new), until recently there has been no truly open equivalent. But that’s changing, as you will learn in this interview with the team behind the newly launched Research Organization Registry—ROR….”

 

Linked Research on the Decentralised Web

Abstract:  This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving.

The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems.

From a technical standpoint, this thesis takes a deep look at semantic structure of research artifacts, and how they can be stored, linked and shared in a way that is controlled by individual researchers, or delegated to trusted parties. Further, I find that the ecosystem was lacking a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation) which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination or collaboration scenarios, as well as for archival activities and scientific experiments.

Another core contribution of this work is a Web standards-based implementation of a clientside tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.

The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data.

Technical solutions alone do not suffice of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer-review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud).

Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.

Reaping the benefits of Open Data in public health

Abstract:  Open Data is part of a broad global movement that is not only advancing science and scientific communication but also transforming modern society and how decisions are made. What began with a call for Open Science and the rise of online journals has extended to Open Data, based on the premise that if reports on data are open, then the generated or supporting data should be open as well. There have been a number of advances in Open Data over the last decade, spearheaded largely by governments. A real benefit of Open Data is not simply that single databases can be used more widely; it is that these data can also be leveraged, shared and combined with other data. Open Data facilitates scientific collaboration, enriches research and advances analytical capacity to inform decisions. In the human and environmental health realms, for example, the ability to access and combine diverse data can advance early signal detection, improve analysis and evaluation, inform program and policy development, increase capacity for public participation, enable transparency and improve accountability. However, challenges remain. Enormous resources are needed to make the technological shift to open and interoperable databases accessible with common protocols and terminology. Amongst data generators and users, this shift also involves a cultural change: from regarding databases as restricted intellectual property, to considering data as a common good. There is a need to address legal and ethical considerations in making this shift. Finally, along with efforts to modify infrastructure and address the cultural, legal and ethical issues, it is important to share the information equitably and effectively. While there is great potential of the open, timely, equitable and straightforward sharing of data, fully realizing the myriad of benefits of Open Data will depend on how effectively these challenges are addressed.

Pubfair – A Framework for Sustainable, Distributed, Open Science Publishing Services

“This white paper provides the rationale and describes the high level architecture for an innovative publishing framework that positions publishing functionalities on top of the content managed by a distributed network of repositories. The framework is inspired by the vision and use cases outlined in the COAR Next Generation Repositories work, first published in November 2017 and further articulated in a funding proposal developed by a number of European partners.

By publishing this on Comments Press, we are seeking community feedback about the Pubfair framework in order to refine the functionalities and architecture, as well as to gauge community interest….

The idea of Pubfair is not to create another new system that competes with many others, but rather to leverage, improve and add value to existing institutional and funder investments in research infrastructures (in particular open repositories and open journal platforms). Pubfair positions repositories (and the content managed by repositories) as the foundation for a distributed, globally networked infrastructure for scholarly communication. It moves our thinking beyond the artificial distinction between green and gold open access by combining the strengths of open repositories with easy-to-use review and publishing tools for a multitude of research outputs….”
