On the 12 Days of Science, PLOS ONE Gave to Me…

It’s starting to get colder in San Francisco, and the year-end holidays are soon to be upon us. This has made all of us on the PLOS ONE team excited to spend some time with

Pumas, Wolves, and Eagles, Oh My! Early Captive Carnivore Remains Found in Ancient Mexican Ruins

From Roman gladiatorial combat to Egyptian animal mummies, capturing and manipulating wild carnivores has long been a way for humans to demonstrate state or individual power. Historians and scientists alike have attempted to determine when

“Elementary, My Dear Watson!” Clues Revealed About an Ancient Case of Leprosy

Unidentified remains have been found in the English countryside, and all signs point to the untimely death of a young man. Researchers examined the bones of the supposed victim, which showed signs of leprosy, to search for clues about the arrival of …

The post “Elementary, My Dear Watson!” Clues Revealed About an Ancient Case of Leprosy appeared first on EveryONE.

Announcing the PLOS Text Mining Collection


Post authored by Casey M. Bergman, Lawrence E. Hunter, Andrey Rzhetsky

Text mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text. Over the last few decades, interest in text-mining research has grown because of the commercial and academic benefits this technology might enable. However, as with the promises of many new technologies, the benefits of text mining are still not clear to most academic researchers.

This situation is now poised to change for several reasons. First, the rate of growth of the scientific literature has now outstripped the ability of individuals to keep pace with new publications, even in a restricted field of study. Second, text-mining tools have steadily increased in accuracy and sophistication to the point where they are now suitable for widespread application. Finally, the rapid increase in availability of digital text in an Open Access format now permits text-mining tools to be applied more freely than ever before.

To acknowledge these changes and the growing body of work in text-mining research, today PLOS launches the Text Mining Collection, a compendium of major reviews and recent highlights on text mining published in the PLOS family of journals. Because PLOS is one of the major publishers of Open Access scientific literature, it is perhaps no coincidence that text-mining research is flourishing in PLOS journals. As noted above, the widespread application and societal benefits of text mining are most easily achieved under an Open Access model of publishing, where the barriers to obtaining published articles are minimized and the ability to remix and redistribute data extracted from text is explicitly permitted. Furthermore, PLOS is one of the few publishers actively promoting text-mining research by providing an open Application Programming Interface (API) for mining its journal content.
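An open API of this kind is typically queried over plain HTTP. The minimal sketch below only builds a query URL and does not contact any server. It assumes a Solr-style search endpoint at `http://api.plos.org/search` that accepts the standard Solr parameters `q`, `fl`, `rows` and `wt`; the endpoint, the field names `id`, `title_display` and `abstract`, and any usage terms are assumptions to check against PLOS's own API documentation, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# ASSUMPTION: a Solr-style endpoint; verify against PLOS's API documentation.
BASE_URL = "http://api.plos.org/search"


def build_query_url(query, fields=("id", "title_display"), rows=10):
    """Build a search URL from the standard Solr parameters q, fl, rows, wt.

    The field names used as defaults are assumptions, not confirmed API fields.
    """
    params = {
        "q": query,               # Solr query string, e.g. field:"phrase"
        "fl": ",".join(fields),   # which fields to return for each hit
        "rows": rows,             # maximum number of results
        "wt": "json",             # request a JSON response
    }
    return BASE_URL + "?" + urlencode(params)


url = build_query_url('abstract:"text mining"', rows=5)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) and parsing the JSON response are left out here, since the response layout is not confirmed by this post.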

Text Mining in PLOS

Over the years, PLOS has published several reviews, opinions, tutorials and dozens of primary research articles in this area in PLOS Biology, PLOS Computational Biology and, increasingly, PLOS ONE. Because of the large number of text mining papers in PLOS journals, we are only able to highlight a subset of these works in the first instance of the PLOS Text Mining Collection. These include major reviews and tutorials published over the last decade [1-6], plus a selection of research papers from the last two years [7-19] and three new papers arising from the call for papers for this collection [20-22].

The research papers included in the collection at launch provide important overviews of the field and reflect many exciting contemporary areas of research in text mining, such as:

  • methods to extract textual information from figures [7];
  • methods to cluster [8] and navigate [15] the burgeoning biomedical literature;
  • integration of text-mining tools into bioinformatics workflow systems [9];
  • use of text-mined data in the construction of biological networks [10];
  • application of text-mining tools to non-traditional textual sources such as electronic patient records [11] and social media [12];
  • generating links between the biomedical literature and genomic databases [13];
  • application of text-mining approaches in new areas such as the Environmental Sciences [14] and Humanities [16-17];
  • named entity recognition [18];
  • assisting the development of ontologies [19];
  • extraction of biomolecular interactions and events [20-21]; and
  • assisting database curation [22].
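Tasks such as clustering [8] and navigating [15] the literature rest on some measure of how similar two documents are. As a minimal sketch, and not the method of any paper in the collection, one of the simplest such measures, TF-IDF weighting combined with cosine similarity, can be written in plain Python. The three "abstracts" are invented toy strings, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical toy abstracts, invented for illustration.
abstracts = [
    "gene expression in yeast cells",
    "yeast gene regulation and expression",
    "coral reef ecosystems and fish diversity",
]
vecs = tfidf_vectors(abstracts)
print(round(cosine(vecs[0], vecs[1]), 3))  # two related abstracts
print(round(cosine(vecs[0], vecs[2]), 3))  # abstracts sharing no terms
```

The unsmoothed inverse document frequency `log(N / df)` is only one common choice; with it, a term that occurs in every document gets zero weight and cannot contribute to similarity.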

Looking Forward

As this is a living collection, it is worth discussing two issues we hope to see addressed in articles added to the PLOS Text Mining Collection in the future: scaling up and opening up. While applying text-mining tools to the abstracts of all biomedical papers in the MEDLINE database is increasingly common, remarkably few efforts have applied text mining to the entire body of full-text articles in a given domain, even in the biomedical sciences [4][23]. Therefore, we hope to see more text-mining applications scaled up to use the full text of all Open Access articles. Scaling up will not only maximize the utility of text-mining technologies and their uptake by end users, but also demonstrate that the text-mining and wider academic communities want access to full-text articles.

Likewise, we hope to see more text-mining software systems made freely or openly available in the future. As an example of the state of affairs in the field, only 25% of the research articles highlighted in the PLOS Text Mining Collection at launch provide source code or executable software of any kind [13, 16, 19, 21]. The lack of software or source code accompanying published research articles is, of course, not unique to text mining. It is a general problem limiting progress and reproducibility in many fields of science, and one that authors, reviewers and editors have a duty to address. Making the release of open-source software the rule, rather than the exception, should further catalyze advances in text mining, as it has in other fields of computational research that have made extremely rapid progress in recent decades (such as genome bioinformatics).
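The 25% figure can be checked by counting: the research articles highlighted at launch are references [7] to [22], and the four that provide code or software are [13, 16, 19, 21].

```latex
\frac{\lvert\{13, 16, 19, 21\}\rvert}{\lvert\{7, 8, \ldots, 22\}\rvert}
  = \frac{4}{22 - 7 + 1} = \frac{4}{16} = 25\%
```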

By opening up the code base in text mining research, and deploying text-mining tools at scale on the rapidly growing corpus of full-text Open Access articles, we are confident this powerful technology will make good on its promise to catalyze scholarly endeavors in the digital age.

To view all the articles or read more about this collection, please visit: The PLOS Text Mining Collection (2013)


1. Dickman S (2003) Tough mining: the challenges of searching the scientific literature. PLoS Biol 1: e48. doi:10.1371/journal.pbio.0000048.

2. Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from Text—Is Text Mining Ready to Deliver? PLoS Biol 3: e65. doi:10.1371/journal.pbio.0030065.

3. Cohen B, Hunter L (2008) Getting started in text mining. PLoS Comput Biol 4: e20. doi:10.1371/journal.pcbi.0040020.

4. Bourne PE, Fink JL, Gerstein M (2008) Open access: taking full advantage of the content. PLoS Comput Biol 4: e1000037. doi:10.1371/journal.pcbi.1000037.

5. Rzhetsky A, Seringhaus M, Gerstein M (2009) Getting Started in Text Mining: Part Two. PLoS Comput Biol 5: e1000411. doi:10.1371/journal.pcbi.1000411.

6. Rodriguez-Esteban R (2009) Biomedical Text Mining and Its Applications. PLoS Comput Biol 5: e1000597. doi:10.1371/journal.pcbi.1000597.

7. Kim D, Yu H (2011) Figure text extraction in biomedical literature. PLoS ONE 6: e15338. doi:10.1371/journal.pone.0015338.

8. Boyack K, Newman D, Duhon R, Klavans R, Patek M, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6: e18029. doi:10.1371/journal.pone.0018029.

9. Kolluru B, Hawizy L, Murray-Rust P, Tsujii J, Ananiadou S (2011) Using workflows to explore and optimise named entity recognition for chemistry. PLoS ONE 6: e20181. doi:10.1371/journal.pone.0020181.

10. Hayasaka S, Hugenschmidt C, Laurienti P (2011) A network of genes, genetic disorders, and brain areas. PLoS ONE 6: e20907. doi:10.1371/journal.pone.0020907.

11. Roque F, Jensen P, Schmock H, Dalgaard M, Andreatta M, et al. (2011) Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol 7: e1002141. doi:10.1371/journal.pcbi.1002141.

12. Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7: e1002199. doi:10.1371/journal.pcbi.1002199.

13. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman C (2011) pubmed2ensembl: a resource for mining the biological literature on genes. PLoS ONE 6: e24716. doi:10.1371/journal.pone.0024716.

14. Fisher R, Knowlton N, Brainard R, Caley J (2011) Differences among major taxa in the extent of ecological knowledge across four major ecosystems. PLoS ONE 6: e26556. doi:10.1371/journal.pone.0026556.

15. Hossain S, Gresock J, Edmonds Y, Helm R, Potts M, et al. (2012) Connecting the dots between PubMed abstracts. PLoS ONE 7: e29509. doi:10.1371/journal.pone.0029509.

16. Ebrahimpour M, Putniņš TJ, Berryman MJ, Allison A, Ng BW-H, et al. (2013) Automated authorship attribution using advanced signal classification techniques. PLoS ONE 8: e54998. doi:10.1371/journal.pone.0054998.

17. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8: e59030. doi:10.1371/journal.pone.0059030.

18. Groza T, Hunter J, Zankl A (2013) Mining Skeletal Phenotype Descriptions from Scientific Literature. PLoS ONE 8: e55656. doi:10.1371/journal.pone.0055656.

19. Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8: e55674. doi:10.1371/journal.pone.0055674.

20. Van Landeghem S, Bjorne J, Wei C-H, Hakala K, Pyysalo S, et al. (2013) Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization. PLoS ONE 8: e55814. doi:10.1371/journal.pone.0055814.

21. Liu H, Hunter L, Keselj V, Verspoor K (2013) Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLoS ONE 8: e60954. doi:10.1371/journal.pone.0060954.

22. Davis A, Wiegers T, Johnson R, Lay J, Lennon-Hopkins K, et al. (2013) Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLoS ONE 8: e58201. doi:10.1371/journal.pone.0058201.

23. Bergman CM (2012) Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central? http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/.



A geologist finds his way in a fossil forest (and makes a map for the rest of us)


The rich colors and textures of Petrified Forest National Park represent millions of years of geological time, climate change, and peculiar plants and animals. Now you can find your way through these historic layers with the first digitized map of the region, based largely on research published in a 2010 PLOS ONE study by William Parker and Jeffrey Martz.

In the winter of 2001, Parker had just found the remains of a large dinosaur under a sandstone ledge in the park. The trouble was establishing when it had lived, because existing geological maps were unclear about which period of time the surrounding rocks represented. Maps available at that time divided the park into two parts separated by a layer of sandstone called the Sonsela Member, but researchers disagreed about which bits of sandstone belonged to this unit. Although several previous studies had tried to improve these maps, the changes they made on paper didn’t always match the actual distances and measurements that field researchers encountered.

Parker and his colleague Jeffrey Martz set out to map the Sonsela Member as accurately as possible, walking over large sections of the park to take their measurements. As Parker told the National Parks Traveler, Martz “literally wore his boots down to ‘sandals’” before they finished the project. The results were published in 2010 as one of the first purely geological studies to appear in PLOS ONE.

PLOS ONE academic editor Andy Farke noted on his blog that their paper helped resolve several questions about the geological events that shaped the Sonsela Member. The research also helped explain a layer in the geological record that marks a sudden extinction of plants and animals, and it has implications for further research on this major event.

Perhaps most importantly, however, anyone interested can check whether their results are correct.

“One of the things we have tried to do with the PLOS ONE paper is to make our study completely reproducible”, Parker explained in his blog post. “To this end we have provided (and advocate that all future studies also do this) GPS coordinates as well as photos of all measured outcrops. (…) Furthermore, any proposed mistakes in our work can be easily verified or refuted by future workers by using the map. Very important!”

Their 2010 study formed the foundation for a now-completed geological map that covers 93,000 acres of the park and is freely available on the Arizona Geological Survey website. According to the National Parks Traveler, the new map is a “rock star” in geological circles, with more than 1,100 recorded visitors to the site in a week. Follow the trail back to where it began by reading the PLOS ONE study here. But if you still find yourself getting lost, it might not hurt to carry a map on your next hike.

Citation: Martz JW, Parker WG (2010) Revised Lithostratigraphy of the Sonsela Member (Chinle Formation, Upper Triassic) in the Southern Part of Petrified Forest National Park, Arizona. PLoS ONE 5(2): e9329. doi:10.1371/journal.pone.0009329

Image: Owl Rock Member, Chinle Formation, Petrified Forest National Wilderness Area. (NPS) by PetrifiedForestNPS on Flickr