“In principle, knowledge of an article title allows for rather easy retrieval of the associated DOI: it usually involves copy-pasting the title into a search engine, following the first link to a landing page or full-text PDF, and looking for the DOI there. While this is doable for a small number of data points, it quickly becomes infeasible for larger sets of data, and 3,000 articles are clearly too many to look up manually.”
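The manual loop described above can be automated. Below is a minimal sketch that assumes the public Crossref REST API (the `api.crossref.org/works` endpoint and its `query.bibliographic` and `rows` parameters) is used as the search backend in place of a general web search engine. The helper names and the 0.9 similarity threshold are illustrative choices and are not part of the quoted workflow.

```python
import difflib
import json
import urllib.parse
import urllib.request
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(title: str, rows: int = 5) -> str:
    """Build a Crossref query that searches bibliographic fields for a title."""
    params = {"query.bibliographic": title, "rows": rows}
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't affect matching."""
    return " ".join(text.lower().split())


def best_doi(title: str, response: dict, threshold: float = 0.9) -> Optional[str]:
    """Return the DOI of the closest-titled candidate, or None if none is similar enough."""
    best_score, best = 0.0, None
    for item in response.get("message", {}).get("items", []):
        for candidate in item.get("title", []):
            score = difflib.SequenceMatcher(
                None, normalize(title), normalize(candidate)
            ).ratio()
            if score > best_score:
                best_score, best = score, item.get("DOI")
    return best if best_score >= threshold else None


def lookup_doi(title: str) -> Optional[str]:
    """Fetch candidates from Crossref (a live network call) and pick the best title match."""
    with urllib.request.urlopen(build_query_url(title), timeout=30) as resp:
        return best_doi(title, json.load(resp))
```

For a set of 3,000 titles, you would call `lookup_doi` in a loop with a short pause between requests. Titles that come back as `None` can then be checked by hand. The threshold trades recall for precision: a lower value matches more titles but risks attaching the wrong DOI.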
“The Publishing group at the California Digital Library is pleased to announce the launch of a major redesign of eScholarship, the University of California’s Open Access repository and publishing platform.
With this release, eScholarship now offers a robust consortial repository solution, with custom access layers and a strong brand identity for each of our ten UC campuses and 70+ academic journals. The new eScholarship site is designed to meet the WCAG 2.0 AA standard for ADA accessibility, scales automatically for mobile and tablet devices, and features a flexible, modular design that allows for multiple content display options and customizable landing pages.
The eScholarship redesign represents a significant departure from previous technology approaches: away from custom builds and toward more widely adopted, open source technology solutions used both inside and outside the academic library domain, including Node.js for the server-side code and React for the front-end framework. The code behind the new eScholarship site is hosted on GitHub. Post-release, the team will turn its attention to creating a public API. …”
“At rOpenSci we are creating packages that allow access to data repositories through the R statistical programming environment that is already a familiar part of the workflow of many scientists. Our tools not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers….We develop open source R packages that provide programmatic access to a variety of scientific data, the full text of journal articles, and repositories that provide real-time metrics of scholarly impact. …Use our packages to acquire data (both your own and from various data sources), analyze it, add in your narrative, and generate a final publication in any one of several widely used formats such as Word, PDF, or LaTeX. Combine our tools with the rich ecosystem of existing R packages….”
“There’s a vast trove of science out there locked inside the PDF format. From preprints to peer-reviewed literature and historical research, millions of scientific manuscripts today can only be found in a print-era format that is effectively inaccessible to the web of interconnected online services and APIs that are increasingly becoming the digital scaffold of today’s research infrastructure….Extracting key information from PDF files isn’t trivial. …It would therefore certainly be useful to be able to extract all key data from manuscript PDFs and store it in a more accessible, more reusable format such as XML (of the publishing industry standard JATS variety or otherwise). This would allow for the flexible conversion of the original manuscript into different forms, from mobile-friendly layouts to enhanced views like eLife’s side-by-side view (through eLife Lens). It will also make the research mineable and API-accessible to any number of tools, services and applications. From advanced search tools to the contextual presentation of semantic tags based on users’ interests, and from cross-domain mash-ups showing correlations between different papers to novel applications like ScienceFair, a move away from PDF and toward a more open and flexible format like XML would unlock a multitude of use cases for the discovery and reuse of existing research….We are embarking on a project to build on these existing open-source tools, and to improve the accuracy of the XML output. One aim of the project is to combine some of the existing tools in a modular PDF-to-XML conversion pipeline that achieves a better overall conversion result compared to using individual tools on their own. 
In addition, we are experimenting with a different approach to the problem: using computer vision to identify key components of the scientific manuscript in PDF format….To this end, we will be collaborating with other publishers to collate a broad corpus of valid PDF/XML pairs to help train and test our neural networks….”
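The quote does not say how the existing tools will be combined. One simple reading of a "modular PDF-to-XML conversion pipeline" is to wrap each tool as a function that returns the fields it extracted, then take each field from whichever tool is ranked most reliable for it. The sketch below assumes that design. The extractor names, the per-field ranking, and the three fields are hypothetical. The output is only a minimal JATS-style skeleton, not a validated JATS document, and the computer-vision approach is not modelled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical wrapper type: each existing tool takes raw PDF bytes
# and returns whatever fields it managed to recover.
Extractor = Callable[[bytes], Dict[str, str]]


def run_pipeline(pdf: bytes, extractors: Dict[str, Extractor],
                 preference: Dict[str, List[str]]) -> Dict[str, str]:
    """Run every tool, then take each field from the highest-ranked tool that produced it."""
    outputs = {name: extract(pdf) for name, extract in extractors.items()}
    merged: Dict[str, str] = {}
    for field_name, ranked_tools in preference.items():
        for tool in ranked_tools:
            value = outputs.get(tool, {}).get(field_name)
            if value:  # skip missing or empty results and fall through to the next tool
                merged[field_name] = value
                break
    return merged


def to_jats(fields: Dict[str, str]) -> str:
    """Serialize merged fields into a minimal JATS-style <article> skeleton."""
    article = ET.Element("article")
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    # JATS places article-id before title-group, and abstract after both.
    if "doi" in fields:
        ET.SubElement(meta, "article-id", {"pub-id-type": "doi"}).text = fields["doi"]
    if "title" in fields:
        group = ET.SubElement(meta, "title-group")
        ET.SubElement(group, "article-title").text = fields["title"]
    if "abstract" in fields:
        ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = fields["abstract"]
    return ET.tostring(article, encoding="unicode")
```

Keeping the per-field ranking separate from the extractors is what makes the pipeline modular. A new tool, or a better one for a single field such as references, can be slotted in by editing the ranking alone.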
“Poster presented at OAI10, University of Geneva, 21–23 June 2017.”
“The number of scholarly research papers being published is gradually growing; it is estimated that approximately 1.5 million research papers are produced each year and about 4% of them are offered via Open Access journals. The high volume of scientific papers introduces new opportunities for content discoverability and facilitates growth in various scientific disciplines via text and data mining (TDM). One of the greatest barriers to TDM is the difficulty of programmatically accessing open access content from a wide range of publishers…”
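Taking the two quoted estimates at face value, the open access share works out to roughly 60,000 papers a year:

```latex
0.04 \times 1.5 \times 10^{6} = 6 \times 10^{4} \approx 60{,}000 \text{ papers per year}
```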
“Open Academic Society is formed by a group of institutions to create a shared, open and expanding knowledge graph of research and education-focused entities and relationships. With the initial contributions from the Microsoft Academic by Microsoft Research and the AMiner graph from Tsinghua University, the reach and depth of the knowledge graph will come through the Society members’ contributions. The data set is available through a freely accessible cloud API, and the society will organize workshops, challenges, and data sharing activities for the benefit of the larger computer science community….”
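This is a rough illustration of a shared graph of "entities and relationships" that grows through several members' contributions. It is a toy in-memory triple store that records which contributor supplied each fact. It is a hypothetical sketch: the entity IDs, relation names, and member labels are invented. It also assumes that contributors already agree on entity identifiers. In practice, merging sources such as Microsoft Academic and AMiner needs a separate entity-linking step. Nothing here models the Society's actual cloud API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A fact in the graph: (subject entity, relationship, object entity).
Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Toy triple store that tracks which contributor supplied each fact."""

    def __init__(self) -> None:
        self.provenance: Dict[Triple, Set[str]] = defaultdict(set)
        # Index of outgoing edges for fast lookup by (subject, relationship).
        self._out: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def contribute(self, source: str, triples: Iterable[Triple]) -> None:
        """Add a member's facts; a fact already present simply gains another source."""
        for s, r, o in triples:
            self.provenance[(s, r, o)].add(source)
            self._out[(s, r)].add(o)

    def objects(self, subject: str, relation: str) -> Set[str]:
        """Entities linked from `subject` by `relation` (empty set if none)."""
        return set(self._out.get((subject, relation), set()))

    def sources(self, triple: Triple) -> Set[str]:
        """Contributors that asserted a given fact."""
        return set(self.provenance.get(triple, set()))
```

Recording provenance per fact lets overlapping contributions reinforce rather than duplicate each other. It also lets downstream users filter the graph by contributor.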
“Finally, Ravel Law’s access to the Harvard case law content and PDF images of original case opinions will enrich the already expansive case law collection available from LexisNexis. LexisNexis is committed to continuing Ravel Law’s open access to this historical collection, giving the American public, and anyone with an internet connection, access to this vital collection of legal information….”
“LexisNexis Legal & Professional has acquired legal research and litigation analytics firm Ravel Law, and will integrate Ravel’s data visualization and profiling technology into LexisNexis services….In the next few months, the team will complete its project with Harvard University to digitize the school’s case law library, and Lewis notes that LexisNexis will consider ways to support the effort. ‘We’ll continue to provide public access and expand it with APIs,’ Lewis says, referring to the application program interfaces that developers use to distribute information….”
“On May 8, 2017, several regional and national repository networks and stakeholder groups, including the Association of Research Libraries (ARL), formally endorsed an international accord that will lead to greater alignment of repository networks around the world. The aim of the accord is to improve cooperation between national and regional repository networks by identifying common principles and areas of collaboration that will lead to the development of global services. The accord was developed by COAR, the Confederation of Open Access Repositories, a global organization of which ARL is a member….”