Abstract: While institutional repositories (IRs) often include a built-in search tool and/or are indexed by web search engines, some patrons go directly to the online library catalog with their information need. Rather than hope that users will stumble on the IR from the library website or assume that they will start their research with a Google search, librarians can enhance IR discoverability and usage by integrating its content into the library catalog. With strong teamwork, good communication, and a shared vision, this endeavor transforms the IR and library catalog from separate, siloed platforms into a more cohesive collections package. At the University of San Diego, librarians and administrators across three departments came together to share information and work in concert to explore the benefits of auto-harvesting IR content into the library catalog. Driven by a vision of enhancing discoverability and access, as well as promoting the IR and enriching the catalog, the team members worked cooperatively to identify specific IR collections appropriate for harvest, investigate technical logistics, consult outside vendors (including Innovative Interfaces, Inc./III and bepress), and experiment with implementation.
Category Archives: oa.harvesting
Releasing 1.8 million open access publications from publisher systems for text and data mining
“Text and data mining offers an opportunity to improve the way we access and analyse the outputs of academic research. But the technical infrastructure of the current scholarly communication system is not yet ready to support TDM to its full potential, even for open access outputs. To address this problem, Petr Knoth, Nancy Pontika and Lucas Anastasiou have developed the CORE Publisher Connector, a toolkit service designed to assist text miners in accessing content through a single machine interface. The Connector aims to solve the heterogeneity among publisher APIs and assist text miners with data collection, provide a centralised point of access to all openly available scientific publications, and provide a high-performance, constantly updated access interface.”
Harvesting the Academic Landscape: Streamlining the Ingestion of Professional Scholarship Metadata into the Institutional Repository
“Although librarians initially hoped institutional repositories (IRs) would grow through researcher self-archiving, practice shows that growth is much more likely through library-directed deposit. Libraries must then find efficient ways to ingest material into their IR to ensure growth and relevance.”
LA Referencia
“LA Referencia gives visibility to the scientific production of higher education and research institutions in Latin America, promotes open and free access to the full text, with special emphasis on publicly financed results….We are a network of repositories of open access to science in Latin America.”
EBSCO Open Dissertations – Libraries and Universities
“EBSCO and BiblioLabs understand that institutions are seeking to balance the desire to share open research produced on their campus with transparency and choice for their students. EBSCO Open Dissertations is free for authors of ETDs as well as the participating institutions and is meant to increase traffic to individual IRs….
The project is open for metadata submissions from research universities and libraries around the world. In its initial developmental phase, OpenDissertations.org includes ETD metadata from the British Library’s EThOS Service, the University of Florida, the University of Michigan, Michigan State University and the University of Kentucky. More than 20 libraries are expected to participate in the initial product phase….
There are three steps to adding ETDs to EBSCO Open Dissertations:
1. Your ETD metadata is harvested via OAI and integrated into EBSCO’s platform, where pointers send traffic to your IR.
2. EBSCO integrates this data into their current subscriber environments and makes the data available on the open web via opendissertations.org.
3. EBSCO sends you monthly reports on record views and outbound traffic to your IR….”
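The first step above, harvesting ETD metadata via OAI, can be sketched roughly as follows. This is a minimal illustration of a generic OAI-PMH ListRecords exchange, not EBSCO's actual harvester: the base URL `https://ir.example.edu/oai`, the set name `etd`, the sample response, and the helper names are all assumptions made up for the example. The response is parsed from an inline string so the sketch runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier",
                                   default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # Keep only dc:identifier values that look like links back to the IR.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS)
                      if e.text and e.text.startswith("http")],
        })
    token = root.findtext(".//oai:resumptionToken",
                          default="", namespaces=NS).strip()
    return records, (token or None)


# Invented sample response for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ir.example.edu:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Thesis</dc:title>
          <dc:identifier>https://ir.example.edu/etd/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would keep requesting `build_list_records_url(base, resumption_token=next_token)` until a page comes back with no token. The `links` values are what an aggregator could use as the pointers that send traffic back to the IR.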
Digital Public Library of America » Blog Archive » DPLA Launches Open-Source Spark OAI Harvester
“The DPLA is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses a Spark distributed processing engine to speed up and scale up the harvesting operation, and to perform complex analysis of the harvested data. It is helping us improve our internal workflows and provide better service to our hubs. The Spark OAI Harvester is freely available and we hope that others working with interoperable cultural heritage or science data will find uses for it in their own projects.”
ScienceOpen is a resource for the community – ScienceOpen Blog
“We harvest content from across platforms like PubMed Central, arXiv, SciELO and bring it all together in one place
One of the main features of ScienceOpen is that we are a research aggregator. We don’t select what we index based on discipline, publisher, or geography, as that just creates another silo. Enough of those exist already. What we need, and what we do, is to bring together research articles from across publishers and other platforms and into one space, where it is all treated in exactly the same way….”
Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
Abstract: Providing local access to locally produced content is a primary goal of the Institutional Repository (IR). Guidelines, requirements, and workflows are among the ways in which institutions attempt to ensure this content is deposited and preserved, but some content is always missed. At Los Alamos National Laboratory, the library implemented a service called LANL Research Online (LARO), to provide public access to a collection of publicly shareable LANL researcher publications authored between 2006 and 2016. LARO exposed the fact that we have full text for only about 10% of eligible publications for this time period, despite a review and release requirement that ought to have resulted in a much higher deposition rate. This discovery motivated a new effort to discover and add more full text content to LARO. Autoload attempts to locate and harvest items that were not deposited locally, but for which archivable copies exist. Here we describe the Autoload pipeline prototype and how it aggregates and utilizes Web services including Crossref, SHERPA/RoMEO, and oaDOI as it attempts to retrieve archivable copies of resources. Autoload employs a bootstrapping mechanism based on the ResourceSync standard, a NISO standard for resource replication and synchronization. We implemented support for ResourceSync atop the LARO Solr index, which exposes metadata contained in the local IR. This allowed us to utilize ResourceSync without modifying our IR. We close with a brief discussion of other uses we envision for our ResourceSync-Solr implementation, and describe how a new effort called Signposting can replace cumbersome screen scraping with a robust autodiscovery path to content which leverages Web protocols.
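To make the ResourceSync bootstrapping idea more concrete, here is a small sketch of how a client might read a ResourceSync Resource List (a Sitemap-based XML document) and pick out resources modified after a given time. It is not the Autoload or LARO code: the sample document, the `example.org` URLs, and the function names are assumptions for illustration, and a real pipeline would fetch the list over HTTP rather than from a string.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Sitemap namespace plus the ResourceSync terms namespace.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def parse_w3c_datetime(value):
    """Parse timestamps such as 2017-06-01T12:00:00Z into aware datetimes."""
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def resources_modified_since(resource_list_xml, since):
    """Return the URLs in a Resource List whose lastmod is after `since`.

    Entries without a lastmod are included, on the assumption that a
    harvester would rather re-check them than miss a change.
    """
    root = ET.fromstring(resource_list_xml)
    changed = []
    for url in root.iterfind("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc:
            continue
        if not lastmod or parse_w3c_datetime(lastmod) > since:
            changed.append(loc)
    return changed


# Invented Resource List for illustration only.
SAMPLE_RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-07-01T00:00:00Z"/>
  <url>
    <loc>https://repo.example.org/record/1</loc>
    <lastmod>2016-03-15T08:30:00Z</lastmod>
  </url>
  <url>
    <loc>https://repo.example.org/record/2</loc>
    <lastmod>2017-06-01T12:00:00Z</lastmod>
  </url>
</urlset>"""

cutoff = datetime(2017, 1, 1, tzinfo=timezone.utc)
to_check = resources_modified_since(SAMPLE_RESOURCE_LIST, cutoff)
```

In a pipeline like the one described, each URL in `to_check` would point to a metadata record that later steps could look up in services such as Crossref or oaDOI.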
Harvesting Repositories: DPLA, Europeana, & Other Case Studies | The Lone Wolf Librarian
FASTR to be Considered by Senate Committee | SPARC
“After a month of intense conversations and negotiations, the Senate Homeland Security and Governmental Affairs Committee (HSGAC) will bring the ‘Fair Access to Science and Technology Research (FASTR) Act’ up for mark-up on Wednesday, July 29th. The language that will be considered is an amended version of FASTR, officially known as the ‘Johnson-Carper Substitute Amendment,’ which was officially filed by the HSGAC leadership late on Friday afternoon, per committee rules. There are two major changes from the original bill language to be particularly aware of. Specifically, the amendment: replaces the six month embargo period with ‘no later than 12 months, but preferably sooner’ as anticipated; and provides a mechanism for stakeholders to petition federal agencies to ‘adjust’ the embargo period if the 12 months does not serve ‘the public, industries, and the scientific community.’ We understand that these modifications were made in order to accomplish a number of things: satisfy the requirement of a number of Members of HSGAC that the language more closely track that of the OSTP Directive; meet the preference of the major U.S. higher education associations for a maximum 12 month embargo; ensure that, for the first time, a number of scientific societies will drop their opposition to the bill; and ensure that any petition process an agency may enable is focused on serving the interests of the public and the scientific community …”