Abstract: There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
“Developed by the UK OpenPlant Synthetic Biology Research Centre and the BioBricks Foundation, OpenMTA honors the rights of researchers and promotes safe, responsible laboratory practices. In addition, the tool is designed to work within the practical realm of tech transfer and to be adaptable to the needs of multiple groups globally.
Goals for OpenMTA include:
- Free access to the tool, with no royalties or other fees except for appropriate and nominal fees for preparation and distribution;
- The ability for researchers to modify or repurpose materials available through OpenMTA;
- Unrestricted selling and sharing of materials, whether as part of a collaboration or a derivative work;
- Availability to all kinds of institutions, including academic, industrial, federal, and community research centers.
In its approach to tech transfer, OpenMTA is designed to reduce transaction costs, support research collaboration across institutions and even nations, and provide a way for researchers and their labs to be credited for the materials they share.”
“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own. 
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
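The modular-pipeline idea described above can be sketched in a few lines of standard-library Python. Everything here is illustrative: the extractor functions are hypothetical stand-ins for real tools (a production pipeline would wrap software such as GROBID or pdfminer), and the field names and JATS elements are a minimal subset chosen for the sketch, not the project's actual design.

```python
# Sketch of a modular PDF-to-XML pipeline: each extractor returns the
# partial metadata it can recover, the pipeline merges those partial
# outputs (first tool to recover a field wins), and the merged result
# is serialized into simple JATS-like XML.
from xml.etree import ElementTree as ET

def extractor_a(pdf_bytes):
    # Hypothetical layout-based tool: good at titles, misses abstracts.
    return {"title": "A Study of Reuse"}

def extractor_b(pdf_bytes):
    # Hypothetical text-mining tool: recovers the abstract, not the title.
    return {"abstract": "We examine data reuse...", "title": None}

def run_pipeline(pdf_bytes, extractors):
    # Merge partial outputs: keep the first non-empty value per field.
    merged = {}
    for extract in extractors:
        for field, value in extract(pdf_bytes).items():
            if value and field not in merged:
                merged[field] = value
    return merged

def to_jats(fields):
    # Emit a minimal JATS-like skeleton from the merged fields.
    article = ET.Element("article")
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    title = ET.SubElement(ET.SubElement(meta, "title-group"), "article-title")
    title.text = fields.get("title", "")
    ET.SubElement(meta, "abstract").text = fields.get("abstract", "")
    return ET.tostring(article, encoding="unicode")

fields = run_pipeline(b"%PDF-1.4 ...", [extractor_a, extractor_b])
print(to_jats(fields))
```

The merge step is what makes the pipeline modular: tools with different strengths can be added or reordered without changing the serialization code.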
“It is increasingly common for researchers to make their data freely available. This is often a requirement of funding agencies but also consistent with the principles of open science, according to which all research data should be shared and made available for reuse. Once data is reused, the researchers who have provided access to it should be acknowledged for their contributions, much as authors are recognised for their publications through citation. Hyoungjoo Park and Dietmar Wolfram have studied characteristics of data sharing, reuse, and citation and found that current data citation practices do not yet benefit data sharers, with little or no consistency in their format. More formalised citation practices might encourage more authors to make their data available for reuse.”
At ScienceOpen, we offer a range of next-generation indexing services. This includes a package designed especially for institutions, to help them gain maximum visibility and reuse for the articles their researchers publish.
“Learning ecosystems must be agile enough to support the practices of the future. In using tools and platforms like the learning management system (LMS), educators have a desire to unbundle all of the components of a learning experience to remix open content and educational apps in unique and compelling ways….While emerging technological developments such as digital courseware and open educational resources (OER) have made it easier to engage with learning resources, significant issues of access and equity persist among students from low-income, minority, and single-parent families, and other disadvantaged groups….”
“While there is significant progress with policy and a lively debate regarding the potential impact of open access publishing, few studies have examined academics’ behavior and attitudes to open access publishing (OAP) in scholarly journals. This article seeks to address this gap through an international and interdisciplinary survey of academics. Issues covered include: use of and intentions regarding OAP, and perceptions regarding advantages and disadvantages of OAP, journal article publication services, peer review, and reuse. Despite reporting engagement in OAP, academics were unsure about their future intentions regarding OAP. Broadly, academics identified the potential for wider circulation as the key advantage of OAP, and were more positive about its benefits than they were negative about its disadvantages. As regards services, rigorous peer review, followed by rapid publication were most valued. Academics reported strong views on reuse of their work; they were relatively happy with noncommercial reuse, but not in favor of commercial reuse, adaptations, and inclusion in anthologies. Comparing science, technology, and medicine with arts, humanities, and social sciences showed a significant difference in attitude on a number of questions, but, in general, the effect size was small, suggesting that attitudes are relatively consistent across the academic community.”
“Quantitative analysis of digitized text represents an exciting and challenging frontier of data science across a broad spectrum of disciplines. From the analysis of physicians’ notes to identify patients with diabetes, to the assessment of global happiness through the analysis of speech on Twitter, patterns in massive text corpora have led to important scientific advancements.
In this course we will cover several central computational and statistical methods for the analysis of text as data. Topics will include the manipulation and summarization of text data, dictionary methods of text analysis, prediction and classification with textual data, document clustering, text reuse measurement, and statistical topic models….”
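One of the methods the course lists, text reuse measurement, can be illustrated in its simplest form: compare two passages by the Jaccard overlap of their word 3-grams ("shingles"). This is a minimal sketch, not the course's actual material; real text reuse pipelines typically use sequence alignment on top of shingling, and the example passages below are invented.

```python
# Minimal text reuse measurement: Jaccard similarity over word 3-grams.
import re

def shingles(text, n=3):
    # Lowercase, tokenize on letter runs, and collect word n-grams.
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def reuse_score(a, b, n=3):
    # Jaccard index: shared n-grams over all distinct n-grams.
    sa, sb = shingles(a, n), shingles(b, n)
    if not (sa | sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "patterns in massive text corpora have led to important advances"
rewrite  = "patterns in massive text corpora often lead to important advances"
print(round(reuse_score(original, rewrite), 2))  # 0.33
```

A score of 1.0 means the passages share every 3-gram; light paraphrase, as here, still leaves a detectable overlap, which is why shingling is a common first pass before more expensive alignment methods.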
“QDR selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry. The repository develops and publicizes common standards and methodologically informed practices for these activities, as well as for the reusing and citing of qualitative data. Four beliefs underpin the repository’s mission: data that can be shared and reused should be; evidence-based claims should be made transparently; teaching is enriched by the use of well-documented data; and rigorous social science requires common understandings of its research methods….”
“Digital scholarly book files should be open and flexible. This is as much a design question as it is a business question for publishers and libraries. The working group returned several times to the importance of scholarly book files being available in nonproprietary formats that allow for a variety of uses and re-uses….

Another pointed out that the backlist corpus of scholarly books in the humanities and social sciences is an invaluable resource for text-mining, but the ability to carry out that research at scale means that the underlying text of the books has to be easy to extract. “It’s so important to be able to ‘scrape’ the text,” one participant said, using a common term for gathering machine-readable characters from a human-readable artifact (for example, a scanned page image)….

Whether a wider group of publishers and technology vendors will feel that they can enable these more expansive uses of a book file without upending the sustainability of the scholarly publishing system is a larger question than this project sought to answer….

Our working group also pointed to other challenges for the future of the monograph that have little to do with its visual representation in a user interface: for example, what might be a viable long-term business model for monographs, and whether a greater share of the publishing of monographs in a free-to-read, open-access model can be made sustainable….

As interest continues to grow in extending the open-access publishing model from journals to scholarly books, publishers and librarians are working to understand better the upfront costs that must be covered in order to operate a self-sustaining open-access monograph publishing program—costs that have been complicated to pin down because the production of any given scholarly book depends on partial allocations of staff time from many different staff members at a press, and different presses have different cost bases, as well….”