Wide-Open: Accelerating public data release by automating detection of overdue datasets

Abstract

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parsing query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
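
The two steps described in the abstract (find dataset references in article text, then check whether the repository still reports them as private) can be illustrated with a minimal Python sketch. This is not the Wide-Open implementation. The accession patterns, the "is currently private" marker string, and the `fetch_result` callback are all assumptions made for illustration; a real system would need to match the actual wording each repository returns.

```python
import re

# Assumed accession patterns: GEO series (GSE + digits) and SRA studies (SRP + digits).
ACCESSION_RE = re.compile(r"\b(GSE\d+|SRP\d+)\b")

# Assumed marker text for a private dataset; the real repository wording may differ.
PRIVATE_MARKER = "is currently private"


def find_accessions(article_text):
    """Return unique accession IDs mentioned in the text, in order of first appearance."""
    seen = []
    for accession in ACCESSION_RE.findall(article_text):
        if accession not in seen:
            seen.append(accession)
    return seen


def is_private(query_result_text):
    """Return True if the repository's response contains the assumed private marker."""
    return PRIVATE_MARKER in query_result_text.lower()


def overdue_datasets(article_text, fetch_result):
    """List accessions cited in a published article whose query result looks private.

    fetch_result is a hypothetical callable mapping an accession ID to the text
    returned by the repository; it is injected so the sketch makes no network calls.
    """
    return [acc for acc in find_accessions(article_text) if is_private(fetch_result(acc))]
```

Passing the repository lookup in as a function keeps the text-mining and parsing logic separate from any particular repository interface, so the same sketch could be pointed at GEO, SRA, or a stub used for testing.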

The Center for Open Science Releases Another Branded Preprint Service With LawArXiv

“The Center for Open Science (COS) is pleased to announce that it has added another branded service to its open source preprints service, OSF Preprints. The new service, called LawArXiv, provides free, open access, open source archives for legal research. LawArXiv is an open access legal repository supported and maintained by members of the scholarly legal community.”

Open Science Prize announces epidemic tracking tool as grand prize winner | National Institutes of Health (NIH)

“A prototype online platform that uses real-time visualization and viral genome data to track the spread of global pathogens such as Zika and Ebola is the grand prize winner of the Open Science Prize. The international team competition is an initiative by the National Institutes of Health, in collaboration with the Wellcome Trust and the Howard Hughes Medical Institute (HHMI). The winning team, Real-time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation, created its nextstrain.org prototype to pool data from researchers across the globe, perform rapid phylogenetic analysis, and post the results on the platform’s website. The winning team will receive $230,000 to fully develop their prototype with NIH awarding $115,000 to the U.S. members of the winning team, and the Wellcome Trust and HHMI also contributing $115,000 to the winning team.”