Publishers Sue Internet Archive over Open Library

“Is the Internet Archive’s Open Library a vital channel that democratizes information access, or is it a large-scale digital piracy operation? That’s the question raised in a lawsuit filed by four major book publishers against the nonprofit information vault’s Open Library online-lending project.

The Internet Archive is perhaps best known for its Wayback Machine®, which allows users to go back in time and access a 10-petabyte collection of internet history—that’s over 330 billion web pages. For lawyers, the website and its records have been a unique source of information in some legal disputes, as they enable users to see web history records dating back to 1996.
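The Wayback Machine exposes a public "availability" API for exactly this kind of lookup: given a URL and an optional target date, it returns the closest archived snapshot. A minimal sketch in Python (the endpoint is real; the helper names and the offline handling of the response are illustrative assumptions):

```python
from urllib.parse import urlencode

# Public endpoint documented by the Internet Archive.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build a query URL for the Wayback Machine availability API.

    `timestamp` is an optional YYYYMMDD[hhmmss] string; when given, the
    API returns the archived snapshot closest to that moment.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Extract the closest snapshot from a decoded JSON response.

    Returns a (snapshot_url, timestamp) pair, or None if the page has
    no available archived capture.
    """
    snap = response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"], snap["timestamp"]
    return None
```

Fetching `availability_query("example.com", "19961101")` with any HTTP client and passing the decoded JSON to `closest_snapshot` yields the capture nearest to November 1996, if one exists.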

The Internet Archive’s Open Library project scans libraries’ collections and allows users to digitally borrow books under a system of Controlled Digital Lending (CDL). CDL limits simultaneous digital loans to the number of physical copies owned and puts users on a waiting list if a book is already checked out.
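The "owned-to-loaned" ratio at the heart of CDL can be modeled as a simple counter plus a first-in, first-out waiting list. The toy class below is a hypothetical illustration of the policy, not the Internet Archive's actual implementation:

```python
from collections import deque


class CdlTitle:
    """Toy model of Controlled Digital Lending for a single title.

    At most `owned_copies` digital loans may be outstanding at once,
    mirroring the number of physical copies held; additional borrowers
    join a FIFO waiting list until a copy is returned.
    """

    def __init__(self, owned_copies):
        self.owned_copies = owned_copies
        self.borrowers = set()
        self.waitlist = deque()

    def borrow(self, user):
        """Loan a copy if one is free; otherwise waitlist the user."""
        if len(self.borrowers) < self.owned_copies:
            self.borrowers.add(user)
            return "loaned"
        self.waitlist.append(user)
        return "waitlisted"

    def return_copy(self, user):
        """Return a copy and pass it to the next waitlisted user, if any."""
        self.borrowers.discard(user)
        if self.waitlist and len(self.borrowers) < self.owned_copies:
            self.borrowers.add(self.waitlist.popleft())
```

Under this model, the National Emergency Library change described below amounts to suspending the `owned_copies` cap so concurrent loans are unlimited.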

In March 2020, the Internet Archive temporarily eased Open Library’s lending restrictions amid the COVID-19 pandemic as part of its National Emergency Library project. The change enabled multiple people to check out the same digital copy of a book at the same time while physical libraries were shuttered. In response, Hachette, Penguin Random House, Wiley and HarperCollins® filed a copyright infringement lawsuit in New York federal court on June 1 against the Internet Archive, calling both the regular Open Library and the National Emergency Library “digital piracy on an industrial scale.” The Internet Archive ended the Emergency Library project on June 16, but the lawsuit remains in place.

The publishers allege that the Internet Archive’s business model involves freely disseminating scanned copies of physical books through its website, a practice the complaint calls “parasitic and illegal” because it exploits the work of authors and publishers without paying any of the costs associated with creating the books. The complaint asks the court for damages for infringement of the publishers’ copyrighted works, and for both a preliminary and a permanent injunction against the Internet Archive’s digitization and lending processes. It also asks the court to order the destruction of all unlawful copies and derivative works—more than 1.5 million volumes.

In its response to the lawsuit, the Internet Archive denies it has violated copyright laws and says its CDL program is fundamentally the same as traditional library lending and is protected by U.S. copyright law’s fair use doctrine because it serves the public interest in preservation, access and research. And in a blog post, Internet Archive founder Brewster Kahle called on the publishers to drop the lawsuit and to work with his group to “help solve the pressing challenges to access to knowledge during this pandemic.”

While the lawsuit only focuses on the Internet Archive’s Open Library and doesn’t take issue with the Wayback Machine or digitization of materials in the public domain, the fear is that a victory for the publishers could financially harm the Internet Archive, and thus destroy the Wayback Machine….”

Vanished open access journals | Zenodo

“This dataset provides data on 192 scholarly journals that were identified to have vanished without having a currently active website for providing access to their content. Also included is a list of 984 OA journals that were identified as inactive. Appended is brief documentation covering the contents of the various data points. A description of the data collection method and analysis will be provided in a separate research article.”

Here’s a preprint of the companion article.

Open is not forever: a study of vanished open access journals

Abstract:  The preservation of the scholarly record has been a point of concern since the beginning of knowledge production. With print publications, the responsibility rested primarily with librarians, but the shift towards digital publishing and, in particular, the introduction of open access (OA) have caused ambiguity and complexity. Consequently, the long-term accessibility of journals is not always guaranteed, and they can even disappear from the web completely. The purpose of this exploratory study is to systematically study the phenomenon of vanished journals, something that has not been done before. For the analysis, we consulted several major bibliographic indexes, such as Scopus, Ulrichsweb, and the Directory of Open Access Journals, and traced the journals through the Internet Archive’s Wayback Machine. We found 192 OA journals that vanished from the web between 2000 and 2019, spanning all major research disciplines and geographic regions of the world. Our results raise vital concern for the integrity of the scholarly record and highlight the urgency to take collaborative action to ensure continued access and prevent the loss of more scholarly knowledge. We encourage those interested in the phenomenon of vanished journals to use the public dataset for their own research.


Executive Director – CLOCKSS

“CLOCKSS (Controlled LOCKSS) is a not-for-profit joint venture between the world’s leading scholarly publishers and academic libraries, whose mission is to build and operate a sustainable, geographically distributed digital preservation service with which to ensure the long-term survival of web-based scholarly publications for the benefit of the greater global research community. CLOCKSS is an international, mission-driven, partnership organization with technical support from the LOCKSS team at Stanford University.

The role of Executive Director is crucial to CLOCKSS. In this role you will run the organization, reporting to the Board via its Co-Chairs, and be responsible for continuing to establish the organization as a major international scholarly archive and as a valuable collaborative community of scholarly publishers and academic libraries….”

Final Report and Recommendations of the Data Rescue Project at the National Agricultural Library

“The National Agricultural Library (NAL) identified a need for a framework of guidance to support rapid appraisal and processing for scientific researchers’ collections after being offered collections of scientific data and data-rich materials that required immediate appraisal before acquisition. To this end, the NAL partnered with the University of Maryland’s College of Information Studies (iSchool) to support two Data Rescue Digital Curation Fellows to investigate processes for efficiently identifying, appraising, and processing scientific data out of legacy collections, to support data use and reuse….

The data being ‘rescued’ is intended for inclusion in the USDA’s Agricultural Research Service (ARS) open access data repository, Ag Data Commons….”

Destroyed Ancient Temple Now Open for Virtual Exploration

“Five years after its destruction, the ancient Temple of Bel in Palmyra, Syria has been digitally reconstructed by the UC San Diego Library’s Digital Media Lab (DML) using cutting-edge 3D methods and artificial intelligence (AI) applications. Inspired by a past collaboration between the Library and UC San Diego’s Levantine Archaeology Laboratory, this project has resulted in the digital preservation of more than a dozen lost reliefs, sculptures, frescos and paintings, all made publicly available on the Library’s Digital Collections website….”

[2008.04541] Comprehensiveness of Archives: A Modern AI-enabled Approach to Build Comprehensive Shared Cultural Heritage

Abstract:  Archives play a crucial role in the construction and advancement of society. Humans place a great deal of trust in archives and depend on them to craft public policies and to preserve languages, cultures, self-identity, views and values. Yet, there are certain voices and viewpoints that remain elusive in the current processes deployed in the classification and discoverability of records and archives.

In this paper, we explore the ramifications and effects of centralized, due-process archival systems on marginalized communities. There is strong evidence of the need for progressive design and technological innovation in the pursuit of comprehensiveness, equity and justice. Intentionality and comprehensiveness are our greatest opportunity when it comes to improving archival practices and the advancement and thrive-ability of societies at large today. Intentionality and comprehensiveness are achievable with the support of technology and the Information Age we live in today. Reopening, questioning and/or purposefully including others’ voices in archival processes is the intention we present in our paper.

We provide examples of marginalized communities who continue to lead “community archive” movements in efforts to reclaim and protect their cultural identity, knowledge, views and futures. In conclusion, we offer design and AI-dominant technological considerations worth further investigation in efforts to bridge systemic gaps and build robust archival processes.

On Planetary in 2020: curatorial activism and open sourcing in service of digital preservation – Fresh and New

“Perhaps the most experimental aspect of Planetary’s acquisition was the fact that the museum released the source code online with an open license, allowing anyone to copy the code and modify it and adapt their copy to suit their interests. The intention of open-sourcing the code was to open the door to passionate fans of Planetary so they could aid in its long-term preservation and maintenance….

The longer-term value of open sourcing the code, rather than inheriting the default closed-source model (which is the case with almost all software acquisitions into museum collections), also lies in the clarity it provides for future generations….

During the acquisition process of Planetary, substantial work was done with the Smithsonian’s General Counsel and the developers, formerly Bloom LLC, to enable the open sourcing, which has overall benefits for future preservation activities. The generosity of the developers and their efforts to prepare the source for release cannot be overestimated. Defaulting to open source at the time of acquisition means that future preservation or presentation activities, emulation or other efforts cannot be stymied by more conservative legal counsel, curators, or conservators at the museum….”

GitHub Archive Program: the journey of the world’s open source code to the Arctic – The GitHub Blog

“At GitHub Universe 2019, we introduced the GitHub Archive Program along with the GitHub Arctic Code Vault. Our mission is to preserve open source software for future generations by storing your code in an archive built to last a thousand years.

On February 2, 2020, we took a snapshot of all active public repositories on GitHub to archive in the vault. Over the last several months, our archive partner, Piql, wrote 21TB of repository data to 186 reels of piqlFilm (digital photosensitive archival film). Our original plan was for our team to fly to Norway and personally escort the world’s open source code to the Arctic, but as the world continues to endure a global pandemic, we had to adjust our plans. We stayed in close contact with our partners, waiting for the time when it was safe for them to travel to Svalbard. We’re happy to report that the code was successfully deposited in the Arctic Code Vault on July 8, 2020. …”