“Global-south scientists say that an open-access movement led by wealthy nations deprives them of credit and undermines their efforts….
But a growing faction of scientists, mostly from wealthy nations, argues that sequences should be shared on databases with no gatekeeping at all. They say this would allow huge analyses combining hundreds of thousands of genomes from different databases to flow seamlessly, and therefore deliver results more rapidly.
The debate has caught the attention of the US National Institutes of Health (NIH) — which runs its own genome repository, called GenBank — and the Bill & Melinda Gates Foundation, which has considered encouraging grantees to share on sites without such strong protections, Nature has learnt.
But many researchers — particularly those in resource-limited countries — are pushing back. They tell Nature that they see potential for exploitation in this no-strings-attached approach — and that GISAID’s gatekeeping is one of its biggest attractions because it ensures that users who analyse sequences from GISAID acknowledge those who deposited them. The database also requests that users seek to collaborate with the depositors….
Fears of inequitable data use are amplified by the fact that only 0.3% of COVID-19 vaccines have gone to low-income countries. “Imagine Africans working so hard to contribute to a database that’s used to make or update vaccines, and then we don’t get access to the vaccines,” says Christian Happi, a microbiologist at the African Centre of Excellence for Genomics of Infectious Diseases in Ede, Nigeria. “It’s very demoralizing.” …”
Abstract: DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.
“genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis….”
“ARL is heartened to see Congress acknowledge the necessity of machine-readable data management plans (DMPs) and open repositories in supporting the academic research enterprise. At a National Science Foundation–funded conference on effective data practices in December 2019, ARL, along with the Association of American Universities, the Association of Public and Land-grant Universities, and the California Digital Library, convened stakeholders including university research officers, scientists, and librarians. Conference participants agreed that data management planning is important for sharing and use of research data and outputs. Participants suggested that the ability to update plans (“just in time”) across the project life cycle and as part of progress reporting would accelerate the value and adoption of DMPs among researchers, beyond what is required for compliance.
ARL encourages the development of a collaborative set of data repository criteria. Coordination among federal agencies will be necessary, as will stakeholder input from researchers, repository managers, librarians, and others. ARL looks forward to continuing these conversations and building upon work already underway within groups such as the Confederation of Open Access Repositories, the Research Data Alliance, and the World Data System….”
Abstract: A group of publishers came together to discuss how we could reduce the complexity and inconsistency in publishers’ advice to researchers when selecting an appropriate data repository. It is a shared goal among publishers and other stakeholders to increase repository use – which remains far from optimal – and we assume that helping researchers find a suitable repository more easily will help achieve this.
To address this, a list of features has been created. It is intended only as a framework within which publishers can make recommendations to researchers, not as a way to restrict which repositories researchers may choose for their data. Our intention is that the features we highlight will act to initiate engagement and collaboration among publishers, repositories and the RPOs, government and funders that ultimately make the policies around Open Research. As we start this conversation, it is important that we act together with other stakeholders to raise awareness of the challenges involved around FAIR data and to prevent any perverse consequences.
From the RDA FAIRsharing WG point of view, the ultimate objective is to map repository features across all existing initiatives, and to identify a common core set of metadata fields that all stakeholders want to see in a registry of repositories. The FAIRsharing registry in particular is agnostic as to the selection process of standards, repositories and policies, as part of its commitment to working with and for all stakeholder groups.
“As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support, considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository….
Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations….”
“PsychOpen CAMA enables accessing meta-analytic datasets, reproducing meta-analyses and dynamically updating evidence from new primary studies collaboratively….
A CAMA (Community Augmented Meta Analysis) is an open repository for meta-analytic data that provides meta-analytic analysis tools….
PsychOpen CAMA enables easy access and automated reproducibility of meta-analyses in psychology and related fields. This has several benefits for the research community:
Evidence can be kept updated by adding new studies published after the meta-analysis.
Researchers with special research questions can use subsets of the data or rerun meta-analyses using different moderators.
Flexible analyses with the datasets enable the application of new statistical procedures or different graphical displays.
The cumulated evidence in the CAMA can be used to get a quick overview of existing research gaps. This may indicate which study designs or moderators would be especially informative in future studies, so that limited research resources are used where they most strengthen the evidence.
Given existing meta-analytic evidence, the necessary sample size of future studies to detect an effect of a reasonable size can be estimated. Moreover, the effect of possible future studies on the results of the existing meta-analytic evidence can be simulated.
PsychOpen CAMA offers tutorials to better understand the reasoning behind meta-analyses and to learn the basic steps of conducting a meta-analysis to empower other researchers to contribute to our project for the benefit of the research community….”
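As a rough illustration of two points in the excerpt above (updating pooled evidence when a new study appears, and estimating the sample size a future study would need), here is a minimal Python sketch. It is not PsychOpen CAMA's implementation: the fixed-effect inverse-variance model, the normal-approximation sample-size formula, the function names, and all effect sizes and variances are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect and its standard error.

    Assumes a fixed-effect model; a real meta-analysis would often
    use a random-effects model instead.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means with standardized effect size d
    (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Hypothetical standardized effects and sampling variances (not real data).
effects = [0.30, 0.50, 0.40]
variances = [0.04, 0.09, 0.05]
pooled, se = pool_fixed_effect(effects, variances)

# A new primary study is published: append it and pool again.
effects_updated = effects + [0.20]
variances_updated = variances + [0.02]
pooled_updated, se_updated = pool_fixed_effect(effects_updated, variances_updated)

# Plan a future study sized to detect the updated pooled effect.
n_required = n_per_group(pooled_updated)

print(f"pooled before update: {pooled:.3f} (SE {se:.3f})")
print(f"pooled after update:  {pooled_updated:.3f} (SE {se_updated:.3f})")
print(f"per-group n for 80% power: {n_required}")
```

Adding a study can only shrink the pooled standard error under this model, which is why the updated estimate is more precise. The sample-size step treats the pooled effect as the true effect, a simplification that a real planning exercise would want to relax.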
Abstract: Over the past three years, “Data Repository Selection-Criteria That Matter” – “a set of criteria for the identification and selection of those data repositories that accept research data submissions” – were developed by a group of publishers facilitated by the FAIRsharing initiative. Throughout this time, a large number of organizations and individuals have formulated responses and expressed concern about the criteria and the process through which the criteria were developed. Collectively, our organizations consider that the “Data Repository: Selection Criteria that Matter” recommendations – as currently conceived – will act as an impediment to achieving these aims. As such, we are issuing this Joint Position Statement to highlight the community’s concerns and request that the authors of these criteria respond with specific actions.
“Cryogenic electron microscopy (cryo-EM) methods began to be used in the mid-1970s to study thin and periodic arrays of proteins. Following a half-century of development in cryo-specimen preparation, instrumentation, data collection, data processing and modeling software, cryo-EM has become a routine method for solving structures from large biological assemblies to small biomolecules at near to true atomic resolution. This review explores the critical roles played by the Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB) in partnership with the community to develop the necessary infrastructure to archive cryo-EM maps and associated models. Public access to cryo-EM structure data has in turn facilitated better understanding of structure-function relationships and advancement of image processing and modeling tool development. The partnership between the global cryo-EM community and PDB and EMDB leadership has synergistically shaped the standards for metadata, one-stop deposition of maps and models, and validation metrics to assess the quality of cryo-EM structures. The advent of cryo-electron tomography (cryo-ET) for in situ molecular cell structures at a broad resolution range and their correlations with other imaging data introduces new data archival challenges in terms of data size and complexity in the years to come.”
Abstract: There is a growing expectation, or even requirement, for researchers to deposit a variety of research data in data repositories as a condition of funding or publication. This expectation recognizes the enormous benefits of data collected and created for research purposes being made available for secondary uses, as open science gains increasing support. This is particularly so in the context of big data, especially where health data is involved. There are, however, also challenges relating to the collection, storage, and re-use of research data. This paper gives a brief overview of the landscape of data sharing via data repositories and discusses some of the key ethical issues raised by the sharing of health-related research data, including expectations of privacy and confidentiality, the transparency of repository governance structures, access restrictions, as well as data ownership and the fair attribution of credit. To consider these issues and the values that are pertinent, the paper applies the deliberative balancing approach articulated in the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of Openness in Big Data and Data Repositories. Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.