Uses and Reuses of Scientific Data: The Data Creators’ Advantage · Harvard Data Science Review

Abstract:  Open access to data, as a core principle of open science, is predicated on assumptions that scientific data can be reused by other researchers. We test those assumptions by asking where scientists find reusable data, how they reuse those data, and how they interpret data they did not collect themselves. By conducting a qualitative meta-analysis of evidence on two long-term, distributed, interdisciplinary consortia, we found that scientists frequently sought data from public collections and from other researchers for comparative purposes such as “ground-truthing” and calibration. When they sought others’ data for reanalysis or for combining with their own data, which was relatively rare, most preferred to collaborate with the data creators. We propose a typology of data reuses ranging from comparative to integrative. Comparative data reuse requires interactional expertise, which involves knowing enough about the data to assess their quality and value for a specific comparison such as calibrating an instrument in a lab experiment. Integrative reuse requires contributory expertise, which involves the ability to perform the action, such as reusing data in a new experiment. Data integration requires more specialized scientific knowledge and deeper levels of epistemic trust in the knowledge products. Metadata, ontologies, and other forms of curation benefit interpretation for any kind of data reuse. Based on these findings, we theorize the data creators’ advantage, that those who create data have intimate and tacit knowledge that can be used as barter to form collaborations for mutual advantage. Data reuse is a process that occurs within knowledge infrastructures that evolve over time, encompassing expertise, trust, communities, technologies, policies, resources, and institutions.

High Court narrowly backs Ordnance Survey in ‘address wars’ case | UKAuthority

“In particular, the judgment will give ammunition to businesses wishing to re-use datasets created under the EU INSPIRE directive and those published under the Open Government Licence.

The litigation began in 2016 when 77m, a small business registered in Surrey, sought a declaration from Ordnance Survey that a product called Matrix, which contains some 28 million residential and non-residential addresses, did not infringe any Ordnance Survey intellectual property rights.

OS responded with a defence and a counterclaim, claiming infringement of both copyright and database rights. The case was transferred from the Intellectual Property Enterprise Court to the High Court, where it was heard last summer by one of England and Wales’ most experienced patents judges, Sir Colin Birss (Mr Justice Birss)….”

Announcing the Journal of the Medical Library Association’s data sharing policy | Akers | Journal of the Medical Library Association

Abstract:  As librarians are generally advocates of open access and data sharing, it is a bit surprising that peer-reviewed journals in the field of librarianship have been slow to adopt data sharing policies. Starting October 1, 2019, the Journal of the Medical Library Association (JMLA) is taking a step forward and implementing a firm data sharing policy to increase the rigor and reproducibility of published research, enable data reuse, and promote open science. This editorial explains the data sharing policy, describes how compliance with the policy will fit into the journal’s workflow, and provides further guidance for preparing for data sharing.

 


Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study | Jones | Journal of Medical Internet Research

Abstract. Background: The literature abounds with increasing numbers of research studies using genomic data in combination with health data (eg, health records and phenotypic and lifestyle data), with great potential for large-scale research and precision medicine. However, concerns have been raised about social acceptability and risks posed for individuals and their kin. Although there has been public engagement on various aspects of this topic, there is a lack of information about public views on data access models.

Objective: This study aimed to address the lack of information on the social acceptability of access models for reusing genomic data collected for research in conjunction with health data. Models considered were open web-based access, released externally to researchers, and access within a data safe haven.

Methods: Views were ascertained using a series of 8 public workshops (N=116). The workshops included an explanation of benefits and risks in using genomic data with health data, a facilitated discussion, and an exit questionnaire. The resulting quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed for emerging themes.

Results: Respondents placed a high value on the reuse of genomic data but raised concerns including data misuse, information governance, and discrimination. They showed a preference for giving consent and use of data within a safe haven over external release or open access. Perceived risks with open access included data being used by unscrupulous parties, with external release included data security, and with safe havens included the need for robust safeguards.

Conclusions: This is the first known study exploring public views of access models for reusing anonymized genomic and health data in research. It indicated that people are generally amenable but prefer data safe havens because of perceived sensitivities. We recommend that public views be incorporated into guidance on models for the reuse of genomic and health data.

Credit data generators for data reuse

“Much effort has gone towards crafting mandates and standards for researchers to share their data. Considerably less time has been spent measuring just how valuable data sharing is, or recognizing the scientific contributions of the people responsible for those data sets. The impact of research continues to be measured by primary publications, rather than by subsequent uses of the data….

To incentivize the sharing of useful data, the scientific enterprise needs a well-defined system that links individuals with reuse of data sets they generate….

A system in which researchers are regularly recognized for generating data that become useful to other researchers could transform how academic institutions evaluate faculty members’ contributions to science….”

Internal Contradictions with Open Access Books – The Scholarly Kitchen

Knowledge Unlatched (KU) is back in the news. Founded as a not-for-profit open access (OA) book publisher by Dr. Frances Pinter, the organization has gone through a couple iterations until re-emerging as a for-profit company headed by Dr. Sven Fund. (Despite its for-profit status, KU continues to use its old URL, with a .org domain.) KU is now hard at work on developing its program, including its business model. A major piece of this, recently announced in an interview by Fund, is the Open Research Library (ORL), which aims to be a comprehensive collection of all OA books, of which there are now (according to KU) about 15,000-20,000, with approximately 4,000 more being added every year. KU can aggregate all these books, which have many publishers, because of the terms of their Creative Commons (CC) licenses, which encourage reuse and sharing. And that is what has set off a seismic disturbance.

 

Do Scientists Reuse Open Data? – Sage Bionetworks

“Briefly, these are my recommendations for those of you who are engaging in open data/data sharing efforts for the purpose of reuse:

  • First, no matter what data you collect, keep in mind that reuse is only one reason for data sharing. Data should be released for transparency as much as for reuse.
  • Give up on the idea that all the data you are collecting, curating, and releasing will be widely reused. Some will, some will not, and some will but in unexpected ways.
  • If you truly want to maximize reuse, first assess potential for reuse, then start data collection. Open datasets can be reused in many ways, by different sets of users. What can your data be reused for and by whom?
  • Hire or consult with data curators who understand the curation needs of your potential users, and (equally important) their science workflows, agendas, and interests.
  • Do not try to curate the data “for the entire world.” First, focus on the needs of your immediate users.
  • Facilitate the formation of a community of practice around your data. Once you have identified potential users, bring them together by promoting community norms, encouraging collaboration, and adopting ad hoc curation practices. But, remember that communities of practice are not built out of the blue. Potential users should share a pre-existing interest in a kind of data, or in a specific method, sub-discipline, process, etc.
  • Once you have identified which datasets might be reused for which goals, you can assign different levels of curation and access, accordingly.
  • Encourage collaboration (and co-authorship) between data creators, data curators, and data re-users….”

Sage Bionetworks Executive Urges Adoption of Standards to Create ‘Open Science’ | GenomeWeb

“Since All of Us is collecting samples and health data from 1 million people at healthcare facilities all over the country, the only way this information dissemination will work is because NIH and its partners are standardizing the results according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model. All of Us also is normalizing phenotypic information on the Substitutable Medical Apps, Reusable Technology (SMART) on FHIR framework, based on the Fast Healthcare Interoperability Resources (FHIR) standard….

In a keynote address to open the annual Bio-IT World Conference & Expo here yesterday, John Wilbanks, chief commons officer at Sage Bionetworks, was clear about his preference for those standards to promote interoperability.  

“Choose OMOP or SMART on FHIR and don’t choose anything else,” he said. The openness of standards and of data itself is key, according to Wilbanks, a longtime advocate of open data….

Sometimes that is because scientists tend to strip out many of the insights before they report results, but often it is due to the fact that researchers do not have or will not make the time to annotate their data in a way that would make their findings more useful to others.

“Until, in my opinion, we figure out how to get machine learning and [artificial intelligence] to do that annotation for us, it’s going to be really hard to have data get as reusable as open-source software is,” Wilbanks said. “But we will eventually get there.” …”