“What happens to these social practices and imaginaries of quantification when readily available digital technologies facilitate the creation, analysis and reproduction of data by different publics? What kinds of shifts, dynamics, controversies, visions and programmes can be observed when data goes digital? One recent answer to these questions can be found in the phenomenon of open data, which can be understood as a set of ideas and conventions aiming to turn information into a re-usable public resource.”
“The release of the “Higher Education Student Statistics: UK, 2016/2017” (Statistical First Release 247) by HESA was accompanied around the sector by a series of sudden sharp intakes of breath in institutional data offices. It represents a brave and bold move into new ways of presenting and sharing data, and shows off a new format that will delight some and disappoint others. In this article I look at what has changed, and why.
The dash for designation. In applying for Designated Data Body status in England, HESA has made a move towards offering “open data”, suggesting that “From 2021 all of our publications will be available in open data format, allowing additional access to the information we enrich.” The Open Data Institute defines open data as “data that anyone can access, use or share,” which sounds like a pretty good thing. In many cases, though, open data has simply meant data that is available under an open (usually Creative Commons) licence – good to have legal clarity, but not at all the same as providing easily usable data. HESA should be lauded for making this move for SFR247, but it is only a starting point…”
“1. I support its call to move beyond PDFs. This is necessary to bypass publisher locks and facilitate reuse, text mining, access by the visually impaired, and access in bandwidth-poor parts of the world.
2. I applaud its recognition of no-fee or no-APC open-access journals, their existence, their value, and the fact that a significant number of authors will always depend on them.
3. I join its call for redirecting funds now spent on subscription journals to support OA alternatives.
4. I endorse its call to reform methods of research evaluation. If we want to assess quality, we must stop assuming that impact and prestige are good proxies for quality. If we want to assess impact, we must stop using metrics that measure it badly and create perverse incentives to put prestige ahead of both quality and access.
5. I support its call for infrastructures that are proof against privatization. No matter how good proprietary and closed-source platforms may initially be, they are subject to acquisition and harmful mutation beyond the control of the non-profit academic world. Even without acquisition, their commitment to OA is contingent on the market, and they carry a permanent risk of trapping rather than liberating knowledge. The research community cannot afford to entrust its research to platforms carrying that risk.
6. Finally I support what it terms bibliodiversity. While we must steer clear of closed-source infrastructure, subject to privatization and enclosure, we must also steer clear of platform monocultures, subject to rigidity, stagnation, and breakage. Again, no matter how good a monoculture platform may initially be, in the long run it cannot be better than an ecosystem of free and open-source, interoperable components, compliant with open standards, offering robustness, modularity, flexibility, freedom to create better modules without rewriting the whole system, freedom to pick modules that best meet local needs, and freedom to scale up to meet global needs without first overcoming centralized constraints or unresponsive decision-makers. …”
“In the continuing quest to make my PhD research comply with the ideals of open science, I’m uploading my protocols to protocols.io. This will create a detailed, publicly available, citable methods record (with a DOI!) for my research which aids with transparency, peer review, replication and re-use.”
“In 2016, within the FP7 Post-Grant Open Access Pilot, a sub-project focused on Alternative Funding Mechanisms for APC-free Open Access Journals was launched. Approximately one year later, we would like to share the main results of this line of work with the public, as we believe these findings can be of interest to other initiatives and publishing platforms.”
“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It would also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own.
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
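The modular pipeline idea described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the stage functions here are hypothetical placeholders standing in for real extraction tools, and only the final serialization to a minimal JATS-like `<article>` element uses a real library (Python's standard `xml.etree.ElementTree`).

```python
# Sketch of a modular PDF-to-XML pipeline. The extract_* stages are
# hypothetical placeholders; a real pipeline would delegate to dedicated
# PDF-extraction tools and merge their outputs.
import xml.etree.ElementTree as ET

def extract_metadata(doc):
    # Placeholder: a real stage would parse title/authors from the PDF.
    doc.setdefault("title", "Untitled manuscript")
    return doc

def extract_body(doc):
    # Placeholder: a real stage would recover paragraphs in reading order.
    doc.setdefault("paragraphs", [])
    return doc

def to_jats(doc):
    # Serialize the accumulated fields as a minimal JATS-like <article>.
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    ET.SubElement(front, "article-title").text = doc["title"]
    body = ET.SubElement(article, "body")
    for para in doc["paragraphs"]:
        ET.SubElement(body, "p").text = para
    return ET.tostring(article, encoding="unicode")

def run_pipeline(doc, stages=(extract_metadata, extract_body)):
    # Each stage enriches the shared document dict; serialization runs last.
    for stage in stages:
        doc = stage(doc)
    return to_jats(doc)

xml_out = run_pipeline({"title": "Example paper",
                        "paragraphs": ["First paragraph."]})
print(xml_out)
```

The point of the design is the one the excerpt makes: because each stage is an independent function over a shared document structure, individual tools can be swapped or combined to get a better overall conversion than any single tool alone.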
“Scholarly document creation continues to face various obstacles. Scholarly text production requires more complex word processors than other forms of text because of the complex structures of citations, formulas and figures. The need for peer review, often single- or double-blind, creates document-management requirements that other texts do not have. Additionally, the need for collaborative editing, security and strict document-access rules means that many existing word processors are imperfect solutions for academics. Nevertheless, most papers continue to be written in Microsoft Word (Sadeghi et al. 2017). Here we analyze some of the problems with existing academic solutions and then argue why we believe that running an open-source academic writing solution, such as Fidus Writer, on a Network Attached Storage (NAS) server could be a viable alternative.”