“In the continuing quest to make my PhD research comply with the ideals of open science, I’m uploading my protocols to protocols.io. This will create a detailed, publicly available, citable methods record (with a DOI!) for my research which aids with transparency, peer review, replication and re-use.”
“In 2016, within the FP7 Post-Grant Open Access Pilot, a sub-project focused on Alternative Funding Mechanisms for APC-free Open Access Journals was launched. Approximately one year later, we would like to share the main results of this workline with the public – as we believe these findings can be of interest for other initiatives and publishing platforms.”
“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure…. Extracting key information from PDF files isn’t trivial. … It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research…. We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own.
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format…. To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
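To illustrate what "mineable" could mean in practice once a manuscript exists as JATS-style XML, here is a minimal Python sketch using only the standard library. The XML fragment is hand-written for illustration (it is not from any real article or from the project quoted above), and the `extract_metadata` helper is a hypothetical name, not part of any existing tool. The element names (`article-title`, `contrib`, `abstract`) follow common JATS usage.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written JATS-style fragment used only for illustration.
JATS_SAMPLE = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example manuscript</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
      <abstract><p>A short example abstract.</p></abstract>
    </article-meta>
  </front>
</article>
"""


def extract_metadata(jats_xml: str) -> dict:
    """Pull title, author names and abstract text out of a JATS-style string."""
    root = ET.fromstring(jats_xml)

    # The first <article-title> anywhere in the document.
    title = root.findtext(".//article-title", default="")

    # Keep only contributors explicitly marked as authors.
    authors = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        authors.append(f"{given} {surname}".strip())

    # Join the text of every paragraph inside <abstract>.
    paragraphs = root.findall(".//abstract/p")
    abstract = " ".join("".join(p.itertext()).strip() for p in paragraphs)

    return {"title": title, "authors": authors, "abstract": abstract}


metadata = extract_metadata(JATS_SAMPLE)
print(metadata)
```

The same few lines would not work against a PDF, which is the point the excerpt makes: with structured XML, downstream tools can address the title, authors and abstract by name rather than guessing from page layout.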
“Scholarly document creation continues to face various obstacles. Scholarly text production requires more complex word processors than other forms of texts because of the complex structures of citations, formulas and figures. The need for peer review, often single-blind or double-blind, creates needs for document management that other texts do not require. Additionally, the need for collaborative editing, security and strict document access rules means that many existing word processors are imperfect solutions for academics. Nevertheless, most papers continue to be written using Microsoft Word (Sadeghi et al. 2017). We here analyze some of the problems with existing academic solutions and then present an argument why we believe that running an open source academic writing solution for academic purposes, such as Fidus Writer, on a Network Attached Storage (NAS) server could be a viable alternative.”
“As a political scientist who regularly encounters so-called “open data” in PDFs, this problem is particularly irritating. PDFs may have “portable” in their name, making them display consistently on various platforms, but that portability means any information contained in a PDF is irritatingly difficult to extract computationally.”
Abstract: Purpose: this paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, i.e. a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles, submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops.
Design: RASH has been developed in order to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow.
Findings: the evaluation study confirmed that RASH can already be adopted in workshops, conferences and journals and can be quickly learnt by researchers who are familiar with HTML.
Research limitations: the evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g. for enabling additional conversion from/to existing formats such as OpenXML.
Practical implications: RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.
Social implications: RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).
Value: RASH focuses strictly on writing the content of the paper (i.e., organisation of text + semantic annotations) and leaves all the issues about its validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.
“As with all good innovators, Peter [Krautzberger, project lead for MathJax] is frustrated. He feels, for example, that advocates of open science focus heavily on sharing of supposedly neutral data, but are still not able to see beyond the PDF. For him open science should be more about how the Web can facilitate communications….”
“Our mission is to develop a universal format for scientific projects. A record of the scientific process that includes everything from preliminary ideas and research to methods and analysis, from laboratory notebooks and data to null results and proposed contributions. By integrating an array of software and data formats used in science, Guaana can provide a uniform digital footprint of the scientific process that is human and machine readable….”