“The industry and government have spent many years collecting data, and now there is finally a tool [AI] to derive insight from it. The panel [at a House Oversight and Government Reform committee hearing Wednesday] agreed that open data policies are needed so the industry can begin making use of it, but it won’t be an easy process. Most of the government’s data is still unstructured and needs to be organized in a meaningful way.”
“We’ve pulled over 40 million scientific papers from sources like PubMed, Nature, and ArXiv….Our AI analyzes research papers and pulls out authors, references, figures, and topics….We link all of this information together into a comprehensive picture of cutting-edge research….What if a cure for an intractable cancer is hidden within the results of thousands of clinical studies? We believe that in 20 years’ time, AI will be able to connect the dots between studies to identify hypotheses and suggest experiments that would otherwise be missed. That’s why we’re building Semantic Scholar and making it free and open to researchers everywhere.
Semantic Scholar is a project at the Allen Institute for Artificial Intelligence (AI2). AI2 was founded to conduct high-impact research and engineering in the field of artificial intelligence. We’re funded by Paul Allen, Microsoft co-founder, and led by Dr. Oren Etzioni, a world-renowned researcher and professor in the field of artificial intelligence….”
“One of the most exciting data projects we [Elsevier] are working on at the moment is with a UK based charity, Findacure. We are helping the charity to find alternative treatment options for rare diseases such as Congenital Hyperinsulinism by offering our informatics expertise, and giving them access to published literature and curated data through our online tools, at no charge.
We are also supporting The Pistoia Alliance, a not-for-profit group that aims to lower barriers to collaboration within the pharmaceutical and life science industry. We have been working with its members to collaborate and develop approaches that can bring benefits to the entire industry. We recently donated our Unified Data Model to the Alliance, with the aim of publishing an open and freely available format for the storage and exchange of drug discovery data. I am still proud of the work I did with them back in 2009 on the SESL project (Semantic Enrichment of Scientific Literature), and my involvement continues as part of the special interest group in AI….”
“This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publication’s full text. We analyse a range of features that have been previously used in this task. Our experimental results confirm that the number of in-text references is highly predictive of influence. Contrary to the work of Valenzuela et al., we find abstract similarity one of the most predictive features. Overall, we show that many of the features previously described in literature are not particularly predictive. Consequently, we discuss challenges and potential improvements in the classification pipeline, provide a critical review of the performance of individual features and address the importance of constructing a large scale gold standard reference dataset.”
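The two features this abstract singles out, the number of in-text references and abstract similarity, can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' pipeline: the function names, the author-year citation pattern, and the bag-of-words cosine similarity are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def count_in_text_references(full_text: str, surname: str, year: int) -> int:
    """Count author-year mentions such as 'Smith et al. (2015)' or 'Smith, 2015'.

    Hypothetical helper: real papers use many citation styles (numeric,
    footnote, and so on), so this pattern is an assumption.
    """
    # Surname, then at most 20 characters, then the publication year.
    pattern = re.compile(rf"\b{re.escape(surname)}\b.{{0,20}}?\b{year}\b")
    return len(pattern.findall(full_text))


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def abstract_similarity(abstract_a: str, abstract_b: str) -> float:
    """Cosine similarity between two abstracts using raw term counts.

    Assumed stand-in for whatever similarity measure a real system uses.
    """
    vec_a, vec_b = _bag_of_words(abstract_a), _bag_of_words(abstract_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm = math.sqrt(sum(c * c for c in vec_a.values())) * math.sqrt(
        sum(c * c for c in vec_b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    citing_text = (
        "Smith et al. (2015) introduced the method. We extend Smith et al. (2015). "
        "Unlike Jones (2012), Smith, 2015 scales well."
    )
    print(count_in_text_references(citing_text, "Smith", 2015))
    print(abstract_similarity("citation importance", "citation influence"))
```

In a classifier along the lines the abstract describes, values like these would be two columns of a feature matrix, with human-labelled citation importance as the target.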
“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own. 
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
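As a rough illustration of the target end of such a pipeline, the sketch below serializes already-extracted metadata into a small JATS-style XML fragment using Python's standard library. It covers only the output step, not PDF parsing, and it is not the project's code: the function name and the input fields are hypothetical, and the fragment uses only a handful of JATS element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `contrib-group`, `contrib`, `name`, `surname`, `given-names`) rather than a complete, schema-valid document.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def to_jats_like_xml(title: str, authors: List[Tuple[str, str]]) -> str:
    """Build a minimal JATS-style fragment from extracted metadata.

    `authors` is a list of (given_names, surname) pairs. Both the function
    and its inputs are assumptions made for this example.
    """
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    contrib_group = ET.SubElement(meta, "contrib-group")
    for given_names, surname in authors:
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names

    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(to_jats_like_xml("An Example Manuscript", [("Ada", "Lovelace")]))
```

Once metadata is in a structured form like this, downstream tools can query it by element name instead of re-parsing page layout, which is the reuse the passage above argues for.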
“Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this project, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across field and topic. We find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda – viziometrics – to study the organization and presentation of visual information in the scientific literature….”
“Open Academic Search (OAS) is a working group aiming to advance scientific research and discovery, promote technology that assists the scientific and academic communities, and make research available worldwide for the good of all humanity….Our core principles: Collaboration drives innovation in academic search. AI plays a unique role in surfacing and analyzing information in millions of research papers and academic journals. Our core mission is advancing the pace of research and aiding breakthroughs in critical research areas….”
“‘What might peer review look like in 2030’ examines how peer review can be improved for future generations of academics and offers key recommendations to the academic community. The report is based on the lively and progressive sessions at the SpotOn London conference held at Wellcome Collection Conference centre in November 2016.
It includes a collection of reflections on the history of peer review, current issues such as sustainability and ethics, while also casting a look into the future including advances such as preprint servers and AI applications. The contributions cover perspectives from the researcher, a librarian, publishers and others. …”
“Imagine for a moment that publishers…embrace AI in peer review. AI performance will increase precisely where human editors today invest most of their time: choosing reviewers and judging whether to publish a manuscript. I don’t see why learning algorithms couldn’t manage the entire review from submission to decision by drawing on publishers’ databases of reviewer profiles, analyzing past streams of comments by reviewers and editors, and recognizing the patterns of change in a manuscript from submission to final editorial decision. What’s more, disconnecting humans from peer review would ease the tension between the academics who want open access and the commercial publishers who are resisting it….”