Access to research results, immediately and without restriction, has always been at the heart of PLOS’ mission and the wider Open Access movement. However, without similar access to the data underlying the findings, the article can be of limited use. For this reason, PLOS has always required that authors make their data available to other academic researchers who wish to replicate, reanalyze, or build upon the findings published in our journals.
In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.
What do we mean by data?
“Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances.” Examples could include spreadsheets of original measurements (of cells, of fluorescent intensity, of respiratory volume), large datasets such as next-generation sequence reads, verbatim responses from qualitative studies, software code, or even image files used to create figures. Data should be in the form in which they were originally collected, before summarizing, analyzing or reporting.
What do we mean by publicly available?
All data must be in one of three places:
- the body of the manuscript; this may be appropriate for studies where the dataset is small enough to be presented in a table
- in the supporting information; this may be appropriate for moderately-sized datasets that can be reported in large tables or as compressed files, which can then be downloaded
- in a stable, public repository that provides an accession number or digital object identifier (DOI) for each dataset; there are many repositories that specialize in specific data types, and these are particularly suitable for very large datasets
Do we allow any exceptions?
Yes, but only in specific cases. We are aware that it is not ethical to make some datasets fully public, such as private patient data or specific information relating to endangered species. Some authors also obtain data from third parties and therefore do not have the right to make that dataset publicly available. In such cases, authors must state that “Data is available upon request”, and identify the person, group or committee to whom requests should be submitted. The authors themselves should not be the only point of contact for requesting data.
Where can I go for more information?
The revised data sharing policy, along with more information about the issues associated with public availability of data, can be reviewed in full at:
Image: Open Data stickers by Jonathan Gray
Last month PLOS ONE attended the ISMB/ECCB 2013 conference in Berlin on Intelligent Systems for Molecular Biology. More than 1,500 delegates attended what is the largest conference on computational biology in the world to discuss the latest developments in computational methods that address biological questions.
The opening keynote from PLOS ONE Academic Editor Gil Ast focused on alternative splicing, a mechanism by which several mRNA transcripts are generated from the same mRNA precursor, thus enhancing transcriptome and proteome diversity. He mentioned a paper his group published earlier this year in PLOS ONE, in which they showed that pre-mRNA splicing influences nucleosome organization, suggesting that there is a bi-directional interplay between chromatin organization and splicing. While it is widely accepted that chromatin organization and DNA modification regulate transcription, it is intriguing that splicing can in turn affect chromatin organization, and this may constitute an additional layer of regulation of gene expression. He also presented exciting recent findings showing how pre-mRNA splicing and the creation of new exons in the human genome may be linked to certain genetic disorders and types of cancers.
Understanding the biology of complex human disease is also one of the objectives of Goncalo Abecasis, winner of the ISCB 2013 Overton Prize. Specifically, he is interested in better understanding genetic variation and its connections to human diseases using computational methods and statistical tools. In his talk, he emphasized that the identification and characterization of the genetic variants that affect human traits may be achieved by examining the link between these traits and the complete genome sequences of thousands of individuals. To collect DNA from as many people as possible, he wondered whether we should make use of social media to call for volunteers to send their DNA samples. Are Facebook and Twitter the key to understanding human genetics?
One topic that generated much discussion at the meeting was data sharing. In her talk, Carole Goble called for all scientists to share their data widely so as to enable reproducibility, a principle underpinning the scientific method. Several journals, including PLOS ONE, require that all data (including all relevant raw data) described in the manuscript be made freely available to any scientist wishing to use them for the purpose of academic, non-commercial research. Well-established and widely supported public repositories already exist for certain types of data such as nucleic acid sequences, and in cases where an appropriate repository does not exist, there are also general data repositories such as Dryad. Assigned accession numbers or digital object identifiers (DOIs) facilitate data citation and ensure accountability. An increasing number of research funding agencies also now support data sharing in the life sciences. Whilst there is indeed increasing discussion about making primary data from published research publicly available, Goble mentioned a paper by Ioannidis and colleagues showing that a substantial proportion of articles published in high-impact journals do not comply (or only weakly comply) with data availability requirements. According to Goble, a lack of data sharing, and thus reproducibility, could lead to an increase in retracted scientific papers.
She also urged the computational biology community to release their “dark data”, i.e. data that is not published and remains hidden on various USB drives and computers. Her point was that, if shared, these results could be used by more people, increasing visibility, accountability and reproducibility. As highlighted by a recent study, data sharing is not an end in itself, but rather a crucial form of scientific knowledge dissemination.
Keren-Shaul H, Lev-Maor G, Ast G (2013) Pre-mRNA Splicing Is a Determinant of Nucleosome Organization. PLoS ONE 8(1): e53506. doi:10.1371/journal.pone.0053506
Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9): e24357. doi:10.1371/journal.pone.0024357
Wallis JC, Rolando E, Borgman CL (2013) If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology. PLoS ONE 8(7): e67332. doi:10.1371/journal.pone.0067332
Wikimedia by Angelineri
Modified from Schwartz S, Oren R, Ast G (2011) Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads. PLoS ONE 6(1): e16685. doi:10.1371/journal.pone.0016685
Music may be the newest addition to a science communicator’s toolbox. A PLOS ONE paper published today describes an algorithm that represents terabytes of microbial and environmental data in tunes that sound remarkably like modern jazz.
“Microbial bebop”, as the authors describe it, is created using five years’ worth of consecutive measurements of ocean microbial life and environmental factors like temperature, dissolved salts and chlorophyll concentrations. These diverse, extensive data are only a subset of what scientists have been recording at the Western Channel Observatory since 1903.
As first author Larsen explained to the Wired blogs, “It’s my job to take complex data sets and find ways to represent that data in a way that makes the patterns accessible to human observations. There’s no way to look at 10,000 rows and hundreds of columns and intuit what’s going on.”
Each of the four compositions in the paper is derived from the same set of data, but highlights different relationships between the environmental conditions of the ocean and the microbes that live in these waters.
“There are certain parameters like sunlight, temperature or the concentration of phosphorus in the water that give a kind of structure to the data and determine the microbial populations. This structure provides us with an intuitive way to use music to describe a wide range of natural phenomena,” explains Larsen in an Argonne National Laboratory article.
Speaking to Living on Earth, Larsen describes how their music highlights the relationship between different kinds of data. “In most of the pieces that we have posted, the melody is derived from a numerical measurement, such that the lowest measure is the lowest note and the highest measure is the highest note. The other component is the chords. And the chords map to a different component of the data.”
As a result, the music generated from microbial abundance data played to chords generated from phosphorus concentration data will sound quite different from the same microbial data played to chords derived from temperature data.
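To make the mapping Larsen describes a little more concrete, here is a minimal Python sketch. It is not the authors' algorithm. It only illustrates the two ideas quoted above: a melody in which the lowest measurement becomes the lowest note and the highest measurement the highest note, and chords chosen from a second data series. The MIDI note range, the four-chord palette, the function names and the sample numbers are all assumptions made up for illustration.

```python
def scale_to_notes(values, low_note=60, high_note=84):
    """Linearly map measurements onto integer MIDI note numbers.

    The smallest value lands on low_note and the largest on high_note.
    The default range (MIDI 60 to 84) is an arbitrary choice for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series carries no shape, so hold a single note.
        return [low_note for _ in values]
    span = high_note - low_note
    return [low_note + round((v - lo) / (hi - lo) * span) for v in values]


# Hypothetical chord palette; the paper's actual chord rules are not reproduced here.
CHORDS = ["C", "F", "G", "Am"]


def values_to_chords(values, chords=CHORDS):
    """Pick one chord per time step by binning a second data series."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [chords[0] for _ in values]
    n = len(chords)
    # Split the value range into n equal bins; the maximum goes in the last bin.
    return [chords[min(int((v - lo) / (hi - lo) * n), n - 1)] for v in values]


# Made-up sample series, one entry per time step (not real observatory data).
abundance = [120, 300, 210, 480, 90]
temperature = [8.5, 10.1, 14.2, 16.0, 9.0]
phosphorus = [0.6, 0.4, 0.1, 0.2, 0.5]

melody = scale_to_notes(abundance)
chords_from_temperature = values_to_chords(temperature)
chords_from_phosphorus = values_to_chords(phosphorus)

print(melody)
print(chords_from_temperature)
print(chords_from_phosphorus)
```

In this toy version the melody is identical in both cases because it depends only on the abundance series, while the chord sequence changes with the series it is paired with, which is why the same microbial data can sound so different.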
“Songs themselves probably are never going to actively replace, you know, the bar graph for data analysis, but I think that this kind of translation of complex data into a very accessible format is an opportunity to lead people who probably aren’t highly aware of the importance of microbial ecology in the ocean, and give them a very appealing entry into this kind of data”, explained Larsen in the same interview with Living on Earth.
Though their primary intent was to create a novel way to symbolize the interactions of microbes in the ocean, the study also suggests that microbial bebop may eventually have applications in crowd-sourcing solutions to complex environmental issues.
For further reading, a PLOS ONE paper in 2011 demonstrated that the metaphors used to explain a problem could have a powerful impact on people’s thoughts and decisions when designing solutions. Could re-phrasing complex environmental data in music lead to solutions we haven’t heard yet? As you ponder the question, listen to some microbial bebop!
Citations: Larsen P, Gilbert J (2013) Microbial Bebop: Creating Music from Complex Dynamics in Microbial Ecology. PLoS ONE 8(3): e58119. doi:10.1371/journal.pone.0058119
Thibodeau PH, Boroditsky L (2011) Metaphors We Think With: The Role of Metaphor in Reasoning. PLoS ONE 6(2): e16782. doi:10.1371/journal.pone.0016782
Image: sheet music by jamuraa on Flickr