From archive to analysis: accessing web archives at scale through a cloud-based interface | SpringerLink

Abstract:  This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally involve using the command line and writing code. This access gap means that subject-matter experts, as opposed to developers and programmers, have few options to directly work with web archives beyond the page-by-page paradigm of the Wayback Machine. Drawing on first-hand research and analysis of how scholars use web archives, we present the interface design and underpinning architecture of the Archives Unleashed Cloud. We also discuss the sustainability implications of providing a cloud-based service for researchers to analyze their collections at scale.

cOAlition S and repositories (part I) | Plan S

“When Plan S was launched in 2018, it gained a reputation as a Gold Open Access (OA) initiative focused on paying for Article Processing Charges (APC). See for example “Plan S: A mandate for Gold OA with lots of strings attached.” That may have been fair criticism at the time. However, Plan S was revised in 2019 in response to community feedback (>600 responses). As a result, greater emphasis was given to Open Access via repositories.

Put simply, Plan S has as its core principle that the results from research funded by its organisations must be published in immediate OA with a public open licence. That can be achieved via one of three routes. One of those three routes to OA is publication in a subscription journal with a copy of the peer reviewed work (Author’s Accepted Manuscript – AAM) made immediately available in a repository. This is commonly referred to as ‘Green’ OA. Plan S also states that “cOAlition S strongly encourages the deposition of all publications in a repository, irrespective of the chosen route to compliance. Several cOAlition S members require deposition of all attributed research articles in a repository.” …”

[2012.13117] Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Abstract:  Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.

Marygrove College Library : Free Texts : Free Download, Borrow and Streaming : Internet Archive

“When Marygrove College closed in 2019, the Board of Trustees donated the library to the Internet Archive for digitization and preservation. With more than 70,000 books and nearly 3,000 journal volumes, the Geschke Library is a well-curated, world class collection with strengths in the humanities, education, and social justice.  Video about the reopening online. …”

Library’s Web Archiving: COVID-19 Challenges | Library of Congress Blog

“The COVID-19 pandemic has presented challenges to the Library’s web archiving program not seen since the terrorist attacks against the U.S. on Sept. 11, 2001. The program had just begun in 2000, and the Library rushed to pull together online material from all across the country after the attacks. The resulting archive is part of the Library’s permanent collection.

Since then, the web archiving program has collected an enormous amount of materials (more than two petabytes of data and over 21 billion files) primarily in event or theme-based collections that are proposed, approved and set up in a process that can take several weeks to complete….

The team has been highly selective regarding new nominations, with a primary focus on the U.S. The team is also planning for the eventual public launch of the collection, which has a working title of the “Coronavirus Web Archive.” Since the Library’s web archives program observes a one-year embargo on harvested content, that collection will likely be made fully available in the latter half of 2021. Small parts of it will be available before the full launch….”

‘Our history is contained there’: loss of archive threatens Native American tribes | Native Americans | The Guardian

“In 1969, a clerical error resulted in the Samish Indian Nation in Washington state suddenly being dropped from the federal government’s list of recognized tribes. It took almost three decades of wading through piles of historical documents and painstaking litigation before its members were able to regain that recognition, along with the federal benefits and protections that come with it….

But the archive, which sits on a 10-acre site at the edge of Lake Washington, is under threat. It is among a dozen federal properties across the US expected to be put up for sale next year after being identified as “high value assets”, a move that could deprive the Native American community in the Pacific north-west of access to critical resources….

In a statement sent to the Guardian, the National Archives said it was committed to digitising its records in Seattle so they are available free no matter where a person is located.

Records that have not yet been digitized can be scanned and sent to people unable to visit in person at a cost of 80 cents per page, explained Susan Karren, the director of the National Archives at Seattle….”

NRPF grant awarded to digitize ISU lectures | University Library | Iowa State University

“Iowa State University has received $15,000 in grant funding from the National Recording Preservation Foundation (NRPF) to digitize 991 audio recordings of University Lectures….

The ISU Special Collections and University Archives will utilize the NRPF funds to outsource the digitization of 259 reel-to-reel audiotapes and 732 audiocassettes to Preserve South. The ISU Library will match the funds received to outsource captioning to Rev.com, create metadata and provide open access to the digitized files. To aid in discoverability and accessibility, copies will be added to the Special Collections and University Archives YouTube channel. Items will be added into the ISU Library’s digital collections platform, as well as Aviary for full-text searching and syncing of captions….”

Government Information: Readily Accessible yet Also Grey Literature: The Serials Librarian: Vol 0, No 0

Abstract:  Government information is a unique subset of grey literature. Often governments are the only source of information because they are the only entities that collect or create specific information, such as census data. Government information is usually categorized by level of government (local, state, federal, international), as well as by agency, and is often in the form of serials, such as annual reports. For the larger jurisdictions, there are often repositories or depository programs that index publications, but local government information often must be actively acquired. Currently, government information is published online. Some agencies are conscientious of their historical information and digitize and post older materials, but other agencies focus on access to current, born-digital information, and may not be archiving older material. There are several library community initiatives to combat grey government information. A couple of examples are the Federal Depository Library Program’s Lost Docs Reporting mechanism, and the End of Term Archive that collects federal websites at the change of each presidential term. These are at the federal level, and more needs to be done to index and preserve state, and especially local, government information. This is because issues such as copyright affect the accessibility and preservation of non-federal government information.

Cyber AI firm helps Vatican digitize its library archives – Axios

“A cybersecurity firm is working with the Vatican to defend its priceless collection of digitized writings from hacking efforts.

Why it matters: Digitizing library archives can provide an invaluable backup should the originals be lost or destroyed, but they’re also vulnerable to cyberattacks. Without stout defenses, digital libraries can be looted or even vandalized….”