“Although the practice is in vogue, it is complicated and requires an understanding of legal, ethical and scientific considerations. Here are six ways to avoid common data-sharing mistakes….”
“If you’ve spent any time stoking your curiosity with the UC Berkeley Library’s new online Digital Collections website, you’ve likely discovered all types of treasures digitized from the Library’s collections. The Library has already scanned and made available a virtual mountain of materials, from a photo of folk icon Joan Baez singing in front of Sproul Hall in 1964, to (almost) the entire run of the Daily Californian student newspaper.
The effort is part of the Library’s moonshot goal of making an estimated 200 million items from its special collections (rare books, manuscripts, photographs, archives, and ephemera) available online for the world to discover and use. But there’s a catch: Before institutions can reproduce materials and publish them online for worldwide access, they have to sort out complicated legal and ethical questions — ones that often stop libraries and other cultural heritage organizations from being able to move forward in setting these treasures free.
The good news? It just got easier to navigate these challenges, thanks to newly released responsible access workflows developed by the Library, which stand to benefit not only UC Berkeley’s digitization efforts, but also those of cultural heritage institutions such as museums, archives, and libraries throughout the nation….”
“OpenSAFELY is a new secure analytics platform for electronic health records in the NHS, created to deliver urgent results during the global COVID-19 emergency. It is now successfully delivering analyses across more than 24 million patients’ full pseudonymised primary care NHS records, with more to follow shortly. All our analytic software is open for security review, scientific review, and re-use. OpenSAFELY uses a new model for enhanced security and timely access to data: we don’t transport large volumes of potentially disclosive pseudonymised patient data off-site; instead, trusted analysts can run large scale computation across live pseudonymised patient records inside the data centre of the electronic health records software company. This pragmatic and secure approach has allowed us to deliver our first analyses in just five weeks from project start.”
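The pseudonymisation model the OpenSAFELY authors describe can be illustrated with a keyed hash: each patient identifier is replaced by a stable, non-reversible token derived under a secret key that never leaves the secure environment, so analysts can link records for one patient without seeing the identifier. This is a minimal sketch of that general technique, not OpenSAFELY’s actual implementation; the key, field names, and sample values are invented for illustration.

```python
# Hypothetical sketch of pseudonymisation via a keyed hash (HMAC-SHA256).
# Illustrates the general idea only; not OpenSAFELY's actual code.
import hashlib
import hmac

SECRET_KEY = b"kept-inside-the-data-centre"  # never leaves the secure environment

def pseudonymise(nhs_number: str) -> str:
    """Map a patient identifier to a stable pseudonym.

    The same input always yields the same token, so records can be
    linked across tables, but without the key the mapping cannot be
    reversed by anyone outside the secure environment.
    """
    return hmac.new(SECRET_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()

# An analyst-facing record keeps clinical fields but swaps the direct
# identifier for the pseudonym.
record = {"nhs_number": "9434765919", "diagnosis": "J45.9"}
safe_record = {
    "patient_id": pseudonymise(record["nhs_number"]),
    "diagnosis": record["diagnosis"],
}
```

Because the token is deterministic, two rows for the same patient still join on `patient_id`; because it is keyed, an attacker cannot simply hash known NHS numbers to reverse it.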
“First, deductive disclosure—discerning an individual’s identity and associated information in a dataset—is a major concern that needs to be taken very seriously. In human biology, we often ask participants to volunteer potentially sensitive or embarrassing information…
Second, we have to be careful about imposing expectations for data sharing that become overly expensive or burdensome. In particular, I worry about the potential impacts on students and junior scholars who are often short on time and money. Of the data repositories recommended in the article, many impose user fees. Furthermore, we should not underestimate the amount of effort it takes to prepare and upload datasets, codebooks, summary statistics, and analysis files for each publication….”
“In increasingly knowledge-based societies and economies, data are a key resource. Enhanced access to publicly funded data enables research and innovation, and has far-reaching effects on resource efficiency, productivity and competitiveness, creating benefits for society at large. Yet these benefits must also be balanced against associated risks to privacy, intellectual property, national security and the public interest. This report presents current policy practice to promote access to publicly funded data for science, technology and innovation, as well as policy challenges for the future. It examines national policies and international initiatives, and identifies seven issues that require policy attention….”
“The Environmental Protection Agency’s (EPA) independent board of science advisers had harsh words for an agency plan to limit the types of studies it considers when crafting regulations, saying the EPA had failed to justify the need for the policy.
The policy was first proposed by former EPA Administrator Scott Pruitt in 2018 to battle “secret science.” He argued that in order to increase transparency, the agency should limit consideration of studies that don’t share their underlying data….
The SAB’s review is consistent with longstanding criticism of the proposal, as science and medical groups have argued it will lead the EPA to ignore important public health research that must protect the privacy of human subjects….”
“As you may be aware, we have been laying the groundwork to launch OpenDP. It is a community effort to build a suite of trustworthy tools for privacy-protective analysis of sensitive personal data, focused on an open-source library of algorithms for generating statistical releases with the strong protections of differential privacy.
On May 13th–15th, from 11 AM to 3 PM EDT each day, we will hold an online workshop to share detailed plans for OpenDP and obtain community feedback on them. We will cover topics such as the programming framework, governance, system integrations, use cases, statistical functionality, and collaborations.
A detailed agenda and a registration form for the workshop, breakout sessions, and the OpenDP mailing list are available at OpenDP registration. Please register by May 4….”
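The differential privacy that OpenDP implements can be illustrated by its most basic building block, the Laplace mechanism: a query whose answer changes by at most 1 when one person’s record is added or removed (a count) can be released with epsilon-differential privacy by adding Laplace noise of scale 1/epsilon. The sketch below is an illustration of that idea only, not OpenDP’s API; the function names are hypothetical.

```python
# Minimal sketch of the Laplace mechanism for a counting query.
# Illustrative only; this is not the OpenDP library's interface.
import math
import random

def laplace_sample(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (one record changes the count by
    at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

ages = [34, 51, 29, 62, 47, 38, 55]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; a real library like OpenDP additionally tracks the privacy budget spent across queries, which this sketch does not.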
“All the above facts and issues contribute to the long-standing reluctance towards data sharing, as in general nobody is tempted to invite severe critique. In addition, data sharing takes away ownership of the data from the individual scientist, which further dampens interest in sharing, even though the practice is in fact ethical, especially when data are generated using public or social security money.
This situation has recently worsened substantially as a result of the introduction of the General Data Protection Regulation (GDPR) in Europe, which is echoed by similar, though less detrimental, legal frameworks in other countries. The introduction of the GDPR, while presumably starting from the positive aim of protecting individuals from exploitation, has by now had a severe negative impact on science (and likely also on other areas, which are not the topic of this article). The GDPR has been (ab)used to refuse the sharing of raw data. As a result, interpretation of the actual data collected in a study is left solely to the scientists conducting it, and any re-evaluation or attempt by the scientific community to reproduce the results becomes impossible, e.g. [2,3]. A further negative side effect is that the data, even though generated with public funding, are not accessible to the public who actually paid for their generation in the first place. Consequently, similar data have to be generated again whenever they are required for a further experiment. Such an approach is inappropriate and should neither be tolerated by (public) funders of studies nor supported by publishers….”
“Academic medical journals that publish clinical studies or case reports may contain images from individual patients. In many cases, such as photographs, these permit patients to be identified. Twenty years ago, medical journals were available only in academic libraries, but nowadays almost all academic journals are available online and many use an open-access model of publishing, which means that the content is freely available. The use of licences such as the Creative Commons system (which is promoted by researchers, funders, policy makers and patient groups) also permits reuse of material, including images, on any other platform, which might include republication of patient photographs in a totally different context. For example, the CC BY licence allows anybody to reuse an image without permission, and for any purpose, so long as the source is acknowledged. Although clinicians and patients increasingly use the Internet to search for medical information, it is not clear whether they are aware that, once a clinical photograph is published under a CC BY licence (unlike under traditional copyright law), its reuse cannot be controlled. Although patients still have a high level of confidence in health professionals, the patient-doctor relationship has shifted from a paternalistic to a shared-decision model, in which health professionals have a responsibility to provide the best information to patients to permit them to make an informed choice. This applies not only to treatment choices, but also to participation in research, and to the publication of individual photographs. Therefore, journals should have clear policies and provide guidance for authors to respect essential ethical principles that preserve patients’ privacy, confidentiality and anonymity, and editors should make sure that those policies are implemented.
Patients’ consent for the publication of any individual images should be given freely and be based on appropriate information about how the images may be used. In addition, authors and editors must follow legal requirements such as the recently implemented EU General Data Protection Regulation (GDPR), which requires strict patient data protection….”
“Although issues such as data deidentification and the potential for unauthorized reidentification have prevented some from considering sharing and even collaborating, progress has been made in these areas. Analyses of distributed data sets require technical infrastructure and funding to support and maintain the compute environment. It should be possible to achieve meaningful data sharing with embedded research that encourages rather than discourages the growth of a learning health system.6 Although there is often a mismatch between the explicit motivations, unstated or implicit motivations, and the design of an actual data-sharing policy, this should not dissuade us from pursuing such agreements.7 As has been observed already, the shift from an aim of changing behavior to changing culture has both subtle and profound implications for policy design and implementation.8…”