Introducing Lawvocado: The Caselaw Access Project Newsletter | Library Innovation Lab

“Today we’re sharing Lawvocado, our newsletter from the Caselaw Access Project.

Delivered right to your inbox, Lawvocado will be the source for news and developments from the Caselaw Access Project and stories in our orbit.

Subscribe and catch up with our first issue….”

The In/Visible, In/Audible Labor of Digitizing the Public Domain

Abstract: In this article I call for more recognition of and scholarly engagement with public, volunteer digital humanities projects, using the example of LibriVox.org to consider what public, sustainable, digital humanities work can look like beyond the contexts of institutional sponsorship. Thousands of volunteers are using LibriVox to collaboratively produce free audiobook versions of texts in the US public domain. The work of finding, selecting, and preparing texts to be digitized and published in audio form is complex and slow, and not all of this labor is ultimately visible, valued, or rewarded. Drawing on an ethnographic study of 12 years of archived discourse and documentation, I interrogate digital traces of the processes by which several LibriVox versions of Anne of Green Gables have come into being, watching for ways in which policies and infrastructure have been influenced by variously visible and invisible forms of work. Making visible the intricate, unique, archived experiences of the crowdsourcing community of LibriVox volunteers and their tools adds to still-emerging discussions about how to value extra-institutional, public, distributed digital humanities work.

Open Access 2.0: Rethinking Open Access

“The open access movement has empowered museums to connect with their audiences by providing unprecedented access to digital collections. Now that a number of museums have had an open access policy for the better part of a decade, how have their policies stood the test of time? How have their policies made an impact on their institutions and communities? Have standards of ‘openness’ changed? How can policies be updated to address changes in community practice? What lessons can those still advocating for an initial open access policy at their institution learn from early innovators? Representatives from several museums with open access policies will share how their policies are evolving and lessons learned from their experiences implementing open access, and a representative from Creative Commons will give an update on the work the OpenGLAM community is doing to support open access policies….

Key Outcomes: After attending this session, participants from institutions with open access policies will be ready to review their policies for areas that may need updating. Participants who are still lobbying for open access at their museum will come away with strategies for gaining institutional support for open access and crafting a policy that reflects current practice.”

Expanding Access to U.S. Law: Harvard’s Caselaw Access Project

“For more than six years a team at Harvard University Law School’s Library Innovation Lab has been busy working on the Caselaw Access Project (CAP), an initiative to digitize a collection of 360 years’ worth of United States court cases dating from 1658 to 2018. The project was initiated in an effort to make case law freely and easily available to legal scholars and the public. Last month, the fruits of the team’s labors were realized with the official launch of CAP. The published CAP corpus comprises 6.4 million unique cases and over 40 million pages of U.S. federal, state, and territorial case law documents from the Law School library.

CAP was funded and made possible by Harvard Law School and, in part, through a partnership with legal research and analytics startup Ravel. The new digital repository will help lower the cost of accessing historical court cases, and it opens up new opportunities for legal scholars and programmers to process large sets of legal data via the CAP API and bulk data service. The CAP API enables users to browse and download cases using a few short commands, and through its ‘bulk data’ feature users can download whole zip files of content.

In the interview below, Kelly Fitzpatrick, Research Associate at Harvard University’s Berkman Klein Center for Internet & Society, discusses how CAP got started and the goals of the project….”
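The excerpt describes the bulk data feature only as delivering whole zip files of content. As a minimal sketch, assuming (not confirmed here) that a downloaded archive stores cases as JSON Lines files, one case per line, either plain (`*.jsonl`) or xz-compressed (`*.jsonl.xz`), the cases could be streamed with Python's standard library alone. The `iter_cases` name and the `name_abbreviation` field are illustrative assumptions; check the actual archive layout before relying on them.

```python
import io
import json
import lzma
import zipfile
from typing import IO, Iterator, Union


def iter_cases(zip_source: Union[str, IO[bytes]]) -> Iterator[dict]:
    """Yield one case (as a dict) per JSON line found in a bulk-data zip archive.

    ASSUMPTION: the archive stores cases as JSON Lines files, either plain
    (*.jsonl) or xz-compressed (*.jsonl.xz); every other member is skipped.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.endswith((".jsonl", ".jsonl.xz")):
                continue  # skip READMEs, metadata files, directory entries
            with archive.open(member) as handle:
                # Decompress on the fly so large files are streamed, not loaded whole.
                stream = lzma.open(handle) if member.endswith(".xz") else handle
                for raw_line in stream:
                    line = raw_line.decode("utf-8").strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)


if __name__ == "__main__":
    # Toy archive built in memory to exercise the reader; the record is made up.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/data.jsonl", '{"name_abbreviation": "A v. B"}\n')
        archive.writestr("README.txt", "not case data")
    buffer.seek(0)
    for case in iter_cases(buffer):
        print(case["name_abbreviation"])
```

Streaming line by line keeps memory use flat even for a large jurisdiction's archive, which matters more for bulk downloads than for the few cases returned by an API call.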

Data-mining reveals that 80% of books published 1924-63 never had their copyrights renewed and are now in the public domain / Boing Boing

“But there’s another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they quickly became public domain unless that registration was renewed. The problem has been to figure out which of these works were in the public domain, because the US Copyright Office’s records were not organized in a way that made it possible to easily cross-check a work with its registration and renewal.

For many years, the Internet Archive has hosted an archive of registration records, which were partially machine-readable.

Enter the New York Public Library, which employed a group of people to encode all these records in XML, making them amenable to automated data-mining.

Now, Leonard Richardson has done the magic data-mining work to affirmatively determine which of the 1924-63 books are in the public domain, which turns out to be 80% of those books; what’s more, many of these books have already been scanned by the HathiTrust (which uses a limitation in copyright to scan university library holdings for use by educational institutions, regardless of copyright status)….”
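At its core, cross-checking a work against its registration and renewal is a join between two record sets: a registration from 1924-63 with no matching renewal is a public-domain candidate. The sketch below is a simplified illustration, not Richardson's actual method. It assumes each record has already been reduced to a title, an author, and the original registration year; the `Record` type, its fields, and the exact-key matching are hypothetical stand-ins for the much richer NYPL XML data and the fuzzier matching a real pipeline would need.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """HYPOTHETICAL simplified entry; real registration/renewal records have many more fields."""
    title: str
    author: str
    year: int  # year of the original registration


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def match_key(record: Record) -> tuple:
    """Key used to pair a renewal with the registration it renews."""
    return (normalize(record.title), normalize(record.author), record.year)


def likely_public_domain(registrations, renewals, first_year=1924, last_year=1963):
    """Return registrations in [first_year, last_year] that have no matching renewal.

    A naive exact-key join: the output is a list of candidates for review,
    not a legal determination of copyright status.
    """
    renewed = {match_key(r) for r in renewals}
    return [
        reg
        for reg in registrations
        if first_year <= reg.year <= last_year and match_key(reg) not in renewed
    ]
```

With toy inputs, a registration for "The Example Novel" by "Doe, Jane" (1930) is matched by a renewal spelled "the example novel" / "Doe, Jane." thanks to `normalize`, so only unrenewed in-range registrations are returned. Exact keys keep the sketch short; transcription differences in real catalog data would call for fuzzy matching.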

HLS Caselaw Access Project helps researchers draw new connections between ideas, people and organizations – Harvard Law Today

“Through the Caselaw Access Project, Harvard Law School has made millions of legal decisions more accessible to researchers than ever before. On campus last week, the inaugural Caselaw Research Summit, hosted by the Harvard Library Innovation Lab, brought to light the diversity of research that the project is making possible.

The Caselaw Access Project (CAP) was the result of five years of work by the Harvard Library Innovation Lab at Harvard Law School. Between 2013 and 2018, the HLS Library digitized more than 40 million pages covering 6.5 million individual cases, creating the most comprehensive database of American law available anywhere outside the Library of Congress. But unlike the latter, it gives researchers nationwide free, immediate access to judicial decisions from each of the 50 states, dating back to their founding. Tweaks are still being made to CAP, notably a new Historical Trends app that traces the number of times a word was used in legal cases over the years, with a timeline pointing to the relevant cases.

The day-long summit at Milstein West brought together research teams from as far away as Oxford, England, who presented on how they were using the dataset to enhance research already underway with faster, more comprehensive access to data. Presenters explored the contents of court opinions and the evolution of language, and examined themes like link rot and connecting legal data with other digital collections….”
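The Historical Trends app is described only at a high level: counts of a word over the years, plus a timeline pointing to the relevant cases. As a rough sketch of that idea, not the app's actual implementation, the per-year tally and the list of matching cases could be built in one pass. The `term_timeline` function and the `name`, `decision_date`, and `text` fields are hypothetical.

```python
import re
from collections import Counter, defaultdict


def term_timeline(cases, term):
    """Map each decision year to (occurrences of `term`, names of cases using it).

    ASSUMPTION: each case is a dict with hypothetical "name", "decision_date"
    (an ISO "YYYY-MM-DD" string), and "text" fields.
    """
    # Whole-word, case-insensitive match; re.escape treats the term literally.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    cases_by_year = defaultdict(list)
    for case in cases:
        hits = len(pattern.findall(case["text"]))
        if hits:
            year = int(case["decision_date"][:4])
            counts[year] += hits
            cases_by_year[year].append(case["name"])
    # Sort chronologically so the result reads like a timeline.
    return {year: (counts[year], cases_by_year[year]) for year in sorted(counts)}
```

Because of the word boundaries, searching for "railroad" does not count "railroads"; a real trends tool would likely need stemming or explicit variant lists, and raw counts should usually be normalized by how many cases each year contains.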