Science Beam – using computer vision to extract PDF data | Labs | eLife

“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own. 
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
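The approach eLife describes (detect regions of a rendered PDF page, classify them, and emit structured XML) can be sketched in miniature. This is not eLife's pipeline: the geometric rules below are a trivial stand-in for the trained neural network, and the region names, thresholds, and JATS elements chosen are illustrative assumptions.

```python
# Minimal sketch: detect candidate regions on a rendered PDF page,
# classify each one, and emit JATS-like XML.
# The classifier is a toy geometric stand-in for a trained CNN.
from dataclasses import dataclass
from xml.etree import ElementTree as ET

@dataclass
class Region:
    # normalised page coordinates (0..1), plus OCR'd text
    x: float
    y: float
    w: float
    h: float
    text: str

def classify(region: Region) -> str:
    """Stand-in for a CNN: guess a JATS element from position/size."""
    if region.y < 0.1 and region.h < 0.1:
        return "article-title"   # short block at the top of the page
    if region.h > 0.3:
        return "p"               # large block: body paragraph
    return "caption"             # small block elsewhere on the page

def regions_to_jats(regions: list[Region]) -> str:
    root = ET.Element("article")
    for r in regions:
        ET.SubElement(root, classify(r)).text = r.text
    return ET.tostring(root, encoding="unicode")

page = [
    Region(0.1, 0.05, 0.8, 0.05, "Deep learning for PDF extraction"),
    Region(0.1, 0.15, 0.8, 0.5, "Body text..."),
]
print(regions_to_jats(page))
```

A real system would replace `classify` with a model trained on the PDF/XML pairs the project mentions, where the XML supplies the labels for regions found in the matching PDF.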

VizioMetrics

“Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this project, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across field and topic. We find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda – viziometrics – to study the organization and presentation of visual information in the scientific literature….”
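The core step the abstract describes, assigning each figure to one of five types and tallying the distribution, can be sketched with a toy classifier. The five category names, the two image features, and the centroid values below are all fabricated for illustration; the actual VizioMetrics system trains a convolutional model on millions of PubMed figures.

```python
# Toy sketch of viziometric figure classification: map simple image
# features to one of five figure types and tally their distribution.
# Feature values and centroids are invented, not learned.
from collections import Counter
import math

# (edge_density, colourfulness) centroids per figure type (illustrative)
CENTROIDS = {
    "diagram":  (0.30, 0.10),
    "plot":     (0.20, 0.25),
    "photo":    (0.10, 0.80),
    "table":    (0.45, 0.05),
    "equation": (0.60, 0.02),
}

def classify_figure(features):
    """Nearest-centroid stand-in for the trained classifier."""
    return min(CENTROIDS, key=lambda k: math.dist(features, CENTROIDS[k]))

# A tiny pretend corpus of extracted figure features
figures = [(0.29, 0.12), (0.11, 0.78), (0.21, 0.24)]
print(Counter(classify_figure(f) for f in figures))
```

With per-paper counts like these in hand, the correlation the authors report (more diagrams in higher-impact papers) reduces to a standard association test between figure-type counts and a citation-based impact measure.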

Open Academic Search

“Open Academic Search (OAS) is a working group aiming to advance scientific research and discovery, promote technology that assists the scientific and academic communities, and make research available worldwide for the good of all humanity….Our core principles: [1] Collaboration drives innovation in academic search. [2] AI plays a unique role in surfacing and analyzing information in millions of research papers and academic journals. [3] Our core mission is advancing the pace of research and aiding breakthroughs in critical research areas….”

What might peer review look like in 2030?

“‘What might peer review look like in 2030’ examines how peer review can be improved for future generations of academics and offers key recommendations to the academic community. The report is based on the lively and progressive sessions at the SpotOn London conference held at Wellcome Collection Conference centre in November 2016.

It includes a collection of reflections on the history of peer review and current issues such as sustainability and ethics, while also casting a look into the future, including advances such as preprint servers and AI applications. The contributions cover perspectives from researchers, librarians, publishers and others. …”

Peer Review Has Its Shortcomings, But AI Is a Risky Fix | WIRED

“Imagine for a moment that publishers…embrace AI in peer review. AI performance will increase precisely where human editors today invest most of their time: choosing reviewers and judging whether to publish a manuscript. I don’t see why learning algorithms couldn’t manage the entire review from submission to decision by drawing on publishers’ databases of reviewer profiles, analyzing past streams of comments by reviewers and editors, and recognizing the patterns of change in a manuscript from submission to final editorial decision. What’s more, disconnecting humans from peer review would ease the tension between the academics who want open access and the commercial publishers who are resisting it….”

Opening Meta – Hypothesis

“A serious piece of scholarly infrastructure is being made open, free and effectively non-profit. Meta has built a cutting edge system to mine scholarly papers new and old, and allow the data to be employed in diverse ways–predicting discoveries before they’re made, projecting the future impact of papers just hours old, and unlocking the potential for innumerable applications applying computation at scale across scientific literature. In what must have taken extraordinary patience, persistence and a lot of finesse, they managed to secure access to some of the most strategic closed content in the scholarly world.”

RSA: Eric Schmidt shares deep learning on AI

“In the area of AI, [Schmidt] wants to see the industry push to make sure research stays out in the open and not controlled by military labs. Addressing the hall packed with security professionals, Schmidt made the case for open research, noting that historically companies never want to share anything about their research. ‘We’ve taken the opposite view to build a large ecosystem that is completely transparent because it will get fixed faster,’ he said. ‘Maybe there are some weaknesses, but I would rather do it that way because there are thousands of you who will help plug it….’”

PLOS ONE: Advanced Online Survival Analysis Tool for Predictive Modelling in Clinical Data Science

Abstract: “One of the prevailing applications of machine learning is the use of predictive modelling in clinical survival analysis. In this work, we present our view of the current situation of computer tools for survival analysis, stressing the need of transferring the latest results in the field of machine learning to biomedical researchers. We propose a web based software for survival analysis called OSA (Online Survival Analysis), which has been developed as an open access and user friendly option to obtain discrete time, predictive survival models at individual level using machine learning techniques, and to perform standard survival analysis. OSA employs an Artificial Neural Network (ANN) based method to produce the predictive survival models. Additionally, the software can easily generate survival and hazard curves with multiple options to personalise the plots, obtain contingency tables from the uploaded data to perform different tests, and fit a Cox regression model from a number of predictor variables. In the Materials and Methods section, we depict the general architecture of the application and introduce the mathematical background of each of the implemented methods. The study concludes with examples of use showing the results obtained with public datasets.”
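The discrete-time survival modelling the abstract refers to can be illustrated with a small self-contained example: expand (time, event) records into per-interval risk sets, estimate a hazard for each interval, and chain the hazards into a survival curve. OSA fits an artificial neural network on such person-period data; the simple empirical hazard below is a stand-in for that model, and the toy dataset is invented.

```python
# Minimal sketch of discrete-time survival estimation:
# S(t) = product over intervals u <= t of (1 - hazard_u),
# where hazard_u = deaths in interval u / subjects at risk in u.
def discrete_survival(records, max_t):
    """records: list of (observed_time, event), event=1 death, 0 censored."""
    curve, s = [], 1.0
    for t in range(1, max_t + 1):
        at_risk = sum(1 for time, _ in records if time >= t)
        deaths = sum(1 for time, ev in records if time == t and ev == 1)
        hazard = deaths / at_risk if at_risk else 0.0
        s *= 1.0 - hazard          # chain the per-interval survival factors
        curve.append((t, round(s, 3)))
    return curve

# (time, event): deaths at t=1 and two at t=3; censoring at t=2 and t=4
data = [(1, 1), (2, 0), (3, 1), (3, 1), (4, 0)]
print(discrete_survival(data, 4))
```

An ANN-based version, as described for OSA, would replace the empirical `hazard` with a model predicting each subject's per-interval hazard from covariates, which is what makes the survival predictions individual-level.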
