COVID-19 and the boundaries of open science and innovation: Lessons of traceability from genomic data sharing and biosecurity | EMBO reports

“While conventional policies and systems for data sharing and scholarly publishing are being challenged and new Open Science policies are being developed, traceability should be a key function for guaranteeing socially responsible and robust policies. Full access to the available data and the ability to trace it back to its origins assure data quality and processing legitimacy. Moreover, traceability would be important for other agencies and organisations – funding agencies, database managers, institutional review boards and so on – for undertaking systematic reviews, data curation or process oversights. Thus, the term “openness” means much more than just open access to published data but must include all aspects of data generation, analysis and dissemination along with other organisations and agencies than just research groups and publishers. The COVID-19 crisis has highlighted the challenges and shortfalls of the current notions of openness and it should serve as an impetus to further advance towards real Open Science.”


Improving access and delivery of academic content – a survey of current & emerging trends | Musings about librarianship

“While allowing users to gain access to paywalled academic content (aka delivery services) is often seen as less sexy than discovery, it is still an important part of the researcher workflow that is worth looking at. In particular, I will argue that in the past few years we have seen a renewed interest in this part of the workflow, and we may potentially start to see some big changes in the way we provide access to academic content in the near future.

Note: The OA discovery and delivery front has changed a lot since 2017, with Unpaywall being a big part of the story, but for this blog post I will focus on delivery aspects of paywalled content.

1.0 Access and delivery – an age-old problem


1.1 RA21, Seamless Access and getFTR


1.2 Campus Activated Subscriber Access (CASA)

1.3 Browser extensions/”Access Brokers”

1.4 Content syndication partnership between Springer Nature and ResearchGate (new)

1.5 Is the sun slowly setting on library link resolvers?

1.6 The Sci-hub effect?

1.7 Privacy implications …”

A pseudonymisation protocol with implicit and explicit consent routes for health records in federated ledgers – IEEE Journals & Magazine

Abstract:  Healthcare data for primary use (diagnosis) may be encrypted for confidentiality purposes; however, secondary uses such as feeding machine learning algorithms require open access. Full anonymity, however, leaves no traceable identifiers with which to report diagnosis results. Moreover, implicit and explicit consent routes are of practical importance under recent data protection regulations (GDPR), translating directly into break-the-glass requirements. Pseudonymisation is an acceptable compromise when dealing with such orthogonal requirements and is an advisable measure to protect data. Our work presents a pseudonymisation protocol that is compliant with implicit and explicit consent routes. The protocol is constructed on a (t,n)-threshold secret sharing scheme and public key cryptography. The pseudonym is safely derived from a fragment of public information without requiring any data-subject’s secret. The method is proven secure under reasonable cryptographic assumptions and shown to be scalable in experimental results.
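The (t,n)-threshold idea at the core of such a protocol can be sketched as follows. This is an illustrative Shamir-style sharing of a pseudonymisation key, not the paper's actual scheme; all names, parameters, and the HMAC-based pseudonym derivation are assumptions made for the example.

```python
import hmac, hashlib, secrets

P = 2**127 - 1  # prime modulus for the finite field

def split(secret: int, n: int, t: int):
    """Split a secret into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the sharing polynomial at x = 0."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# A pseudonymisation key is split among 5 custodians; any 3 can
# recover it, e.g. for a break-the-glass access route.
key = secrets.randbelow(P)
shares = split(key, n=5, t=3)
assert reconstruct(shares[:3]) == key

def pseudonym(k: int, public_id: str) -> str:
    """Derive a stable pseudonym from public information only;
    no secret from the data subject is needed."""
    return hmac.new(k.to_bytes(16, "big"), public_id.encode(),
                    hashlib.sha256).hexdigest()[:16]
```

The same public identifier always maps to the same pseudonym under a given key, so records stay linkable for secondary use while direct identifiers stay hidden.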


Dataverse and OpenDP: Tools for Privacy-Protective Analysis in the Cloud | Mercè Crosas

“When big data intersects with highly sensitive data, both opportunity to society and risks abound. Traditional approaches for sharing sensitive data are known to be ineffective in protecting privacy. Differential Privacy, deriving from roots in cryptography, is a strong mathematical criterion for privacy preservation that also allows for rich statistical analysis of sensitive data. Differentially private algorithms are constructed by carefully introducing “random noise” into statistical analyses so as to obscure the effect of each individual data subject.

OpenDP is an open-source project for the differential privacy community to develop general-purpose, vetted, usable, and scalable tools for differential privacy, which users can simply, robustly and confidently deploy.
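As a toy illustration of the “random noise” idea (not OpenDP's actual API, which provides vetted implementations of such mechanisms), a counting query can be released with Laplace noise scaled to the query's sensitivity:

```python
import random

def laplace_noise(scale: float) -> float:
    # sample Laplace(scale) as the difference of two exponentials
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(data, predicate, epsilon: float) -> float:
    # counting queries have sensitivity 1: one person joining or
    # leaving the dataset changes the true count by at most 1, so
    # Laplace noise with scale 1/epsilon yields epsilon-DP
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1 / epsilon)

ages = [23, 37, 41, 52, 29, 66, 71, 34]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; the analyst trades accuracy for a formal guarantee that no individual's presence is revealed.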

Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.  A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).

This session examines ongoing efforts to realize a combined use case for these projects that will offer academic researchers privacy-preserving access to sensitive data. This would allow both novel secondary reuse and replication access to data that otherwise is commonly locked away in archives.  The session will also explore the potential impact of this work outside the academic world.”

OpenDP Hiring Scientific Staff | OpenDP

“The OpenDP project seeks to hire 1-2 scientists to work with faculty directors Gary King and Salil Vadhan and the OpenDP Community to formulate and advance the scientific goals of OpenDP and solve research problems that are needed for its success. Candidates should have a graduate-level degree (preferably a PhD), familiarity with differential privacy, and one or both of the following: 

Experience with implementing software for data science, privacy, and/or security, and an interest in working with software engineers to develop the OpenDP codebase.
Experience with applied statistics, and an interest in working with domain scientists to apply OpenDP software to data-sharing problems in their field. In particular, we are looking for a researcher to engage on an immediate project on Covid-19 and mobility and epidemiology.  See HDSI Fellow for more details….”

Computational social science: Obstacles and opportunities | Science

“An alternative has been to use proprietary data collected for market research (e.g., Comscore, Nielsen), with methods that are sometimes opaque and a pricing structure that is prohibitive to most researchers.

We believe that this approach is no longer acceptable as the mainstay of CSS, as pragmatic as it might seem in light of the apparent abundance of such data and limited resources available to a research community in its infancy. We have two broad concerns about data availability and access.

First, many companies have been steadily cutting back data that can be pulled from their platforms (5). This is sometimes for good reasons—regulatory mandates (e.g., the European Union General Data Protection Regulation), corporate scandal (Cambridge Analytica and Facebook)—however, a side effect is often to shut down avenues of potentially valuable research. The susceptibility of data availability to arbitrary and unpredictable changes by private actors, whose cooperation with scientists is strictly voluntary, renders this system intrinsically unreliable and potentially biased in the science it produces.

Second, data generated by consumer products and platforms are imperfectly suited for research purposes (6). Users of online platforms and services may be unrepresentative of the general population, and their behavior may be biased in unknown ways. Because the platforms were never designed to answer research questions, the data of greatest relevance may not have been collected (e.g., researchers interested in information diffusion count retweets because that is what is recorded), or may be collected in a way that is confounded by other elements of the system (e.g., inferences about user preferences are confounded by the influence of the company’s ranking and recommendation algorithms). The design, features, data recording, and data access strategy of platforms may change at any time because platform owners are not incentivized to maintain instrumentation consistency for the benefit of research.

For these reasons, research derived from such “found” data is inevitably subject to concerns about its internal and external validity, and platform-based data, in particular, may suffer from rapid depreciation as those platforms change (7). Moreover, the raw data are often unavailable to the research community owing to privacy and intellectual property concerns, or may become unavailable in the future, thereby impeding the reproducibility and replication of results….

Despite the limitations noted above, data collected by private companies are too important, too expensive to collect by any other means, and too pervasive to remain inaccessible to the public and unavailable for publicly funded research (8). Rather than eschewing collaboration with industry, the research community should develop enforceable guidelines around research ethics, transparency, researcher autonomy, and replicability. We anticipate that many approaches will emerge in coming years that will be incentive compatible for involved stakeholders….

Privacy-preserving, shared data infrastructures, designed to support scientific research on societally important challenges, could collect scientifically motivated digital traces from diverse populations in their natural environments, as well as enroll massive panels of individuals to participate in designed experiments in large-scale virtual labs. These infrastructures could be driven by citizen contributions of their data and/or their time to support the public good, or in exchange for explicit compensation. These infrastructures should use state-of-the-art security, with an escalation checklist of security measures depending on the sensitivity of the data. These efforts need to occur at both the university and cross-university levels. Finally, these infrastructures should capture and document the metadata that describe the data collection process and incorporate sound ethical principles for data collection and use….”

Patients grow more open with their health data during pandemic – Axios

“Americans are more willing in the wake of the coronavirus to share their medical data in order to take advantage of the benefits of telemedicine.

Why it matters: For telemedicine to succeed, patients have to be open to sharing possibly sensitive personal health information online — and the demands of the COVID-19 pandemic seem to have helped lower that bar….”

Responsible, practical genomic data sharing that accelerates research | Nature Reviews Genetics

Abstract:  Data sharing anchors reproducible science, but expectations and best practices are often nebulous. Communities of funders, researchers and publishers continue to grapple with what should be required or encouraged. To illuminate the rationales for sharing data, the technical challenges and the social and cultural challenges, we consider the stakeholders in the scientific enterprise. In biomedical research, participants are key among those stakeholders. Ethical sharing requires considering both the value of research efforts and the privacy costs for participants. We discuss current best practices for various types of genomic data, as well as opportunities to promote ethical data sharing that accelerates science by aligning incentives.


As data-sharing becomes more crucial, agencies say industry can help with privacy issues

“Agencies like the Census Bureau want better commercial off-the-shelf (COTS) technologies for protecting data privacy and computation, so they can securely link datasets and make predictions about the coronavirus pandemic….”


Assessing Open Access Audio – Full Text View

Abstract:  The medical encounter can be overwhelming in terms of the amount of information discussed, its technical nature, and the anxiety it can generate. Easy access to a secure audio recording from any internet-enabled device is an available low-cost technology that allows patients to “revisit the visit” either alone or sharing with caretakers and family. It has been introduced and tested outside the VA with evidence that it increases patient recall and understanding and may even improve physician performance. Little is known, however, about whether and to what extent these effects lead to better outcomes, such as improved treatment plan adherence and chronic disease self-management. This study is a randomized controlled trial designed to ascertain whether easy access to audio recordings of the medical visit improves patients’ perception that they understand and can manage their own care, and leads to a variety of improved outcomes, such as better blood pressure and diabetes control, and fewer emergency department visits and hospitalizations.