Credit data generators for data reuse

“Much effort has gone towards crafting mandates and standards for researchers to share their data. Considerably less time has been spent measuring just how valuable data sharing is, or recognizing the scientific contributions of the people responsible for those data sets. The impact of research continues to be measured by primary publications, rather than by subsequent uses of the data….

To incentivize the sharing of useful data, the scientific enterprise needs a well-defined system that links individuals with reuse of data sets they generate….

A system in which researchers are regularly recognized for generating data that become useful to other researchers could transform how academic institutions evaluate faculty members’ contributions to science….”

Internal Contradictions with Open Access Books – The Scholarly Kitchen

Knowledge Unlatched (KU) is back in the news. Founded as a not-for-profit open access (OA) book publisher by Dr. Frances Pinter, the organization has gone through a couple of iterations before re-emerging as a for-profit company headed by Dr. Sven Fund. (Despite its for-profit status, KU continues to use its old URL, with a .org domain.) KU is now hard at work on developing its program, including its business model. A major piece of this, recently announced by Fund in an interview, is the Open Research Library (ORL), which aims to be a comprehensive collection of all OA books, of which there are now (according to KU) about 15,000-20,000, with approximately 4,000 more being added every year. KU can aggregate all these books, which come from many publishers, because the terms of their Creative Commons (CC) licenses encourage reuse and sharing. And that is what has set off a seismic disturbance.

Do Scientists Reuse Open Data? – Sage Bionetworks

“Briefly, these are my recommendations for those of you who are engaging in open data/data sharing efforts for the purpose of reuse:

  • First, no matter what data you collect, keep in mind that reuse is only one reason for data sharing. Data should be released for transparency as much as for reuse.
  • Give up on the idea that all the data you are collecting, curating, and releasing will be widely reused. Some will, some will not, and some will but in unexpected ways.
  • If you truly want to maximize reuse, first assess potential for reuse, then start data collection. Open datasets can be reused in many ways, by different sets of users. What can your data be reused for and by whom?
  • Hire or consult with data curators who understand the curation needs of your potential users, and (equally important) their science workflows, agendas, and interests.
  • Do not try to curate the data “for the entire world.” First, focus on the needs of your immediate users.
  • Facilitate the formation of a community of practice around your data. Once you have identified potential users, bring them together by promoting community norms, encouraging collaboration, and adopting ad hoc curation practices. But remember that communities of practice do not spring up out of nowhere: potential users should share a pre-existing interest in a kind of data, or in a specific method, sub-discipline, process, etc.
  • Once you have identified which datasets might be reused for which goals, you can assign different levels of curation and access, accordingly.
  • Encourage collaboration (and co-authorship) between data creators, data curators, and data re-users….”

Sage Bionetworks Executive Urges Adoption of Standards to Create ‘Open Science’ | GenomeWeb

“Since All of Us is collecting samples and health data from 1 million people at healthcare facilities all over the country, this information dissemination will work only because NIH and its partners are standardizing the results according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model. All of Us is also normalizing phenotypic information on the Substitutable Medical Apps, Reusable Technology (SMART) on FHIR framework, based on the Fast Healthcare Interoperability Resources (FHIR) standard….

In a keynote address to open the annual Bio-IT World Conference & Expo here yesterday, John Wilbanks, chief commons officer at Sage Bionetworks, was clear about his preference for those standards to promote interoperability.  

“Choose OMOP or SMART on FHIR and don’t choose anything else,” he said. The openness of standards and of data itself is key, according to Wilbanks, a longtime advocate of open data….

Sometimes that is because scientists tend to strip out many of the insights before they report results, but often it is because researchers do not have, or will not make, the time to annotate their data in a way that would make their findings more useful to others.

“Until, in my opinion, we figure out how to get machine learning and [artificial intelligence] to do that annotation for us, it’s going to be really hard to have data get as reusable as open-source software is,” Wilbanks said. “But we will eventually get there.” …”
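As a concrete illustration of the interoperability Wilbanks is advocating, here is a minimal sketch of reading phenotypic observations over a SMART on FHIR interface. The base URL, token handling, and patient ID are hypothetical stand-ins; a real client would first complete the SMART OAuth2 launch sequence to obtain its access token.

```python
import requests

FHIR_BASE = "https://fhir.example.org/r4"  # hypothetical FHIR endpoint
TOKEN = "..."  # would be obtained via the SMART OAuth2 launch

def fetch_observations(patient_id):
    """Return all Observation resources for one patient as plain dicts."""
    resp = requests.get(
        f"{FHIR_BASE}/Observation",
        params={"patient": patient_id},
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/fhir+json",
        },
    )
    resp.raise_for_status()
    bundle = resp.json()  # FHIR search results arrive as a Bundle resource
    return [entry["resource"] for entry in bundle.get("entry", [])]

for obs in fetch_observations("example-patient-id"):
    code = obs.get("code", {}).get("text", "unknown")
    qty = obs.get("valueQuantity", {})
    print(code, qty.get("value"), qty.get("unit"))
```

Because every compliant server exposes the same resource shapes, the same few lines work against any SMART on FHIR deployment, which is exactly the interoperability argument for picking one standard and sticking to it.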

Assessment of Postdonation Outcomes in US Living Kidney Donors Using Publicly Available Data Sets. | Hypertension | JAMA Network Open | JAMA Network

Abstract:  Importance  There are limited resources describing the postdonation conditions that can occur in living donors (LDs) of solid-organ transplants. Consequently, it is difficult to visualize and understand possible postdonation outcomes in LDs.

Objective  To assemble an open access resource that is representative of the demographic characteristics in the US national registry, maintained by the Organ Procurement and Transplantation Network and administered by the United Network for Organ Sharing, but contains more follow-up information to help examine postdonation outcomes in LDs.

Design, Setting, and Participants  Cohort study in which the data for the resource and analyses stemmed from the transplant data set derived from 27 clinical studies from the ImmPort database, which is an open access repository for clinical studies. The studies included data collected from 1963 to 2016. Data from the United Network for Organ Sharing Organ Procurement and Transplantation Network national registry collected from October 1987 to March 2016 were used to determine representativeness. Data analysis took place from June 2016 to May 2018. Data from 20 ImmPort clinical studies (including clinical trials and observational studies) were curated, and a cohort of 11,263 LDs was studied, excluding deceased donors, LDs with 95% or more missing data, and studies without a complete data dictionary. The harmonization process involved the extraction of common features from each clinical study based on categories that included demographic characteristics as well as predonation and postdonation data.
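The harmonization step described here amounts to mapping heterogeneous study tables onto one shared schema. Below is a minimal sketch of that kind of feature extraction; the column names and mappings are invented for illustration, since the paper does not publish its actual schema.

```python
import pandas as pd

# Per-study maps from local column names to a common feature set.
# All names here are hypothetical.
COLUMN_MAPS = {
    "study_A": {"donor_age": "age", "sex": "sex", "htn_postdon": "hypertension"},
    "study_B": {"AGE_AT_DONATION": "age", "GENDER": "sex", "HTN": "hypertension"},
}
COMMON_FEATURES = ["age", "sex", "hypertension"]

def harmonize(frames):
    """Extract the common features from each study table and stack them."""
    pieces = []
    for study, df in frames.items():
        mapped = df.rename(columns=COLUMN_MAPS[study])
        piece = mapped.reindex(columns=COMMON_FEATURES).copy()  # absent columns become NaN
        piece["source_study"] = study  # keep provenance of every row
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)

harmonized = harmonize({
    "study_A": pd.DataFrame({"donor_age": [39, 45], "sex": ["F", "M"],
                             "htn_postdon": [1, 0]}),
    "study_B": pd.DataFrame({"AGE_AT_DONATION": [31], "GENDER": ["F"]}),
})
print(harmonized)
```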

Main Outcomes and Measures  Thirty-six postdonation events were identified, represented, and analyzed via a trajectory network analysis.

Results  The curated data contained 10,869 living kidney donors (median [interquartile range] age, 39 [31-48] years; 6175 [56.8%] women; and 9133 [86.6%] of European descent). A total of 9558 living kidney donors with postdonation data were analyzed. Overall, 1406 LDs (14.7%) had postdonation events. The 4 most common events were hypertension (806 [8.4%]), diabetes (190 [2.0%]), proteinuria (171 [1.8%]), and postoperative ileus (147 [1.5%]). Relatively few events (n = 269) occurred before the 2-year postdonation mark. Of the 1746 events that took place 2 years or more after donation, 1575 (90.2%) were nonsurgical; nonsurgical conditions tended to occur in the wide range of 2 to 40 years after donation (odds ratio, 38.3; 95% CI, 4.12-1956.9).

Conclusions and Relevance  Most events that occurred more than 2 years after donation were nonsurgical and could occur up to 40 years after donation. Findings support the construction of a national registry for long-term monitoring of LDs and confirm the value of secondary reanalysis of clinical studies.
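The trajectory network analysis named in the abstract can be pictured as a directed graph in which postdonation events are nodes and edge weights count observed transitions between events. The sketch below illustrates that general idea with invented donor trajectories (only the event names come from the abstract); it is not the authors' algorithm.

```python
import networkx as nx

# Each donor's postdonation events in order of occurrence (invented data).
trajectories = [
    ["hypertension", "proteinuria"],
    ["postoperative ileus"],
    ["hypertension", "diabetes", "proteinuria"],
]

G = nx.DiGraph()
for events in trajectories:
    G.add_nodes_from(events)
    # Count each consecutive pair of events as one observed transition.
    for src, dst in zip(events, events[1:]):
        if G.has_edge(src, dst):
            G[src][dst]["weight"] += 1
        else:
            G.add_edge(src, dst, weight=1)

for src, dst, data in G.edges(data=True):
    print(f"{src} -> {dst}: seen in {data['weight']} donor trajectories")
```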

Could Remixing Old MOOCs Give New Life to Free Online Education? | EdSurge News

“It’s common these days to hear that free online mega-courses, called MOOCs, failed to deliver on their promise of educating the masses. But one outcome of that push towards open online courses was plenty of high-quality teaching material.

Now, one of the first professors to try out MOOCs says he has a way to reuse bits and pieces of the courses created during that craze that might deliver on the initial promise.

The idea comes from Robert Lue, a biology professor at Harvard University who was the founding faculty director of HarvardX, the college’s effort to build MOOCs. He’s leading a new platform called LabXChange that aims to let professors, teachers or anyone mix together their own free online course from pieces of other courses….”

Blockchain and OECD data repositories: opportunities and policymaking implications | Library Hi Tech | Vol 37, No 1

Abstract:  The purpose of this paper is to employ the case of Organization for Economic Cooperation and Development (OECD) data repositories to examine the potential of blockchain technology in the context of addressing basic contemporary societal concerns, such as transparency, accountability and trust in the policymaking process. Current approaches to sharing data employ standardized metadata, in which the provider of the service is assumed to be a trusted party. However, derived data, analytic processes or links from policies are in many cases not shared in the same form, thus breaking the provenance trace and making it difficult to repeat analyses conducted in the past. Similarly, it becomes difficult to test whether the conditions that justified implemented policies still apply. A higher level of reuse would require a decentralized approach to sharing both data and analytic scripts and software. This could be supported by a combination of blockchain and decentralized file system technology.

Design/methodology/approach

The findings presented in this paper have been derived from the analysis of a case study, i.e., analytics using data made available by the OECD. The set of data the OECD provides is vast and is used broadly. The argument is structured as follows. First, current issues and topics shaping the debate on blockchain are outlined. Then, the main artifacts on which simple or convoluted analytic results are based are redefined for some concrete purposes. The requirements on provenance, trust and repeatability are discussed with regard to the proposed architecture, and a proof of concept using smart contracts is used for reasoning about relevant scenarios.

Findings

A combination of decentralized file systems and an open blockchain such as Ethereum supporting smart contracts can ascertain that the set of artifacts used for the analytics is shared. This enables the sequence underlying the successive stages of research and/or policymaking to be preserved. In turn, and ex post, it becomes possible to test whether the evidence supporting certain findings and/or policy decisions still holds. Moreover, unlike traditional databases, blockchain technology makes it possible to store immutable records. This means that the artifacts can be used for further exploitation or repetition of results. In practical terms, the use of blockchain technology creates the opportunity to enhance the evidence-based approach to policy design and policy recommendations that the OECD fosters. That is, it might enable stakeholders not only to use the data available in the OECD repositories but also to assess corrections to a given policy strategy or modify its scope.
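A minimal sketch of the provenance mechanism the paper proposes: each artifact (dataset, analytic script, result) is content-addressed, as a decentralized file system would do, and each analytic step is committed to an append-only, hash-chained log standing in for the Ethereum smart contract. All names and payloads here are illustrative, not the authors' implementation.

```python
import hashlib
import json
import time

def content_address(artifact: bytes) -> str:
    """Stand-in for an IPFS-style content identifier (CID)."""
    return hashlib.sha256(artifact).hexdigest()

class ProvenanceLog:
    """Append-only log; each record commits to the previous one, so any
    tampering with history invalidates every later record hash."""
    def __init__(self):
        self.records = []

    def append(self, kind: str, artifact: bytes) -> str:
        prev = self.records[-1]["record_hash"] if self.records else "0" * 64
        body = {
            "kind": kind,                      # "dataset", "script", "result", ...
            "cid": content_address(artifact),  # where the artifact lives
            "prev": prev,                      # hash link to the prior record
            "time": time.time(),
        }
        body["record_hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.records.append(body)
        return body["record_hash"]

log = ProvenanceLog()
log.append("dataset", b"OECD indicator table, hypothetical snapshot")
log.append("script", b"source code of the analytic script")
log.append("result", b"fitted model coefficients")

# Ex post verification: recompute every hash and check the linkage, which
# is what lets a reader test that the recorded evidence still holds.
prev = "0" * 64
for rec in log.records:
    body = {k: v for k, v in rec.items() if k != "record_hash"}
    assert rec["prev"] == prev
    assert rec["record_hash"] == hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    prev = rec["record_hash"]
```

On a real deployment the record hashes would be written to a smart contract so that the immutability guarantee comes from the blockchain rather than from whoever hosts the log.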
