Team Awarded Grant to Help Digital Humanities Scholars Navigate Legal Issues of Text Data Mining – UC Berkeley Library Update

“We are thrilled to share that the National Endowment for the Humanities (NEH) has awarded a $165,000 grant to a UC Berkeley-led team of legal experts, librarians, and scholars who will help humanities researchers and staff navigate complex legal questions in cutting-edge digital research….

Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine that researchers need to scrape content about Egyptian artifacts from online sites or databases, or download videos about Egyptian tomb excavations, in order to conduct their automated analysis. Then imagine that the researchers also want to share these content-rich data sets with others to encourage research reproducibility or to enable other researchers to query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethical questions if, say, indigenous knowledge or cultural heritage materials are at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” to their ability to select appropriate texts for text data mining.

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid decision-making about rights-protected data. They use texts that have entered into the public domain or use materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit their research to such sources, it is inevitably skewed, leaving important questions unanswered, and rendering resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to and exacerbated bias in developing artificial intelligence tools. …

On the limitations of recent lawsuits against Sci-Hub, OMICS, ResearchGate, and Georgia State University – Manley – Learned Publishing – Wiley Online Library

“Key points

The 2017 Sci-Hub judgement has, to date, proven unenforceable, and it appears that enforcing the 2019 OMICS judgement will similarly prove challenging.
Business developments and changing expectations over sharing digital content may also undermine the impact of the ongoing cases against ResearchGate and Georgia State University.
Stakeholders should consider these limitations when deciding how to resolve scholarly publishing disputes….”

The In/Visible, In/Audible Labor of Digitizing the Public Domain

Abstract:  In this article I call for more recognition of and scholarly engagement with public, volunteer digital humanities projects, using the example of LibriVox.org to consider what public, sustainable, digital humanities work can look like beyond the contexts of institutional sponsorship. Thousands of volunteers are using LibriVox to collaboratively produce free audiobook versions of texts in the US public domain. The work of finding, selecting, and preparing texts to be digitized and published in audio form is complex and slow, and not all of this labor is ultimately visible, valued, or rewarded. Drawing on an ethnographic study of 12 years of archived discourse and documentation, I interrogate digital traces of the processes by which several LibriVox versions of Anne of Green Gables have come into being, watching for ways in which policies and infrastructure have been influenced by variously visible and invisible forms of work. Making visible the intricate, unique, archived experiences of the crowdsourcing community of LibriVox volunteers and their tools adds to still-emerging discussions about how to value extra-institutional, public, distributed digital humanities work.

The Impact of Open Access on Teaching—How Far Have We Come? – Publications

Abstract:  This article seeks to understand how far the United Kingdom higher education (UK HE) sector has progressed towards open access (OA) availability of the scholarly literature it requires to support courses of study. It uses Google Scholar, Unpaywall and Open Access Button to identify OA copies of a random sample of articles copied under the Copyright Licensing Agency (CLA) HE Licence to support teaching. The quantitative data analysis is combined with interviews of, and a workshop with, HE practitioners to investigate four research questions. Firstly, what is the nature of the content being used to support courses of study? Secondly, do UK HE establishments regularly incorporate searches for open access availability into their acquisition processes to support teaching? Thirdly, what proportion of content used under the CLA Licence is also available on open access and appropriately licenced? Finally, what percentage of content used by UK HEIs under the CLA Licence is written by academics and thus has the potential for being made open access had there been support in place to enable this? Key findings include the fact that no interviewees incorporated OA searches into their acquisitions processes. Overall, 38% of articles required to support teaching were available as OA in some form but only 7% had a findable re-use licence; just 3% had licences that specifically permitted inclusion in an ‘electronic course-pack’. Eighty-nine percent of journal content was written by academics (34% by UK-based academics). Of these, 58% were written since 2000 and thus could arguably have been made available openly had academics been supported to do so.

2019:GLAM/Public Domain Awareness Project: enhancing use of CC’s Public Domain tools to serve the needs of GLAM institutions and reusers – Wikimania

“Making assessments about the copyright status of a work remains a challenge notwithstanding the tools that CC has developed over the years, such as the Public Domain Mark and CC0. It is also hard to communicate to end users about the laws that apply to their particular use of a work. Copyright is jurisdiction based, which means each country has its own copyright and public domain rules. These differing laws present challenges for digitizers of content and reusers of digital online surrogates.

Several efforts and projects offer partial solutions for these challenges; however they tend to serve single jurisdiction or regional needs, are loosely coordinated, and are not integrated into a unified solution that works starting from the moment of digitization and continuing through to the public that encounters them over the Internet. Ideally, the public domain is the easiest part of the knowledge commons to assess and reuse, but the current environment makes it challenging at each stage in the process of getting that content to a public.

Creative Commons and other key stakeholders such as Wikimedia brought forth this Project for initial discussion with our community and stakeholders at the CC 2019 Global Summit in Lisbon. The outcomes of the four-hour session at the Summit can be found here.

At this session, we expect to be able to follow up on some of the data modelling challenges related to the Help:Copyrights page on Wikidata. We want to gather feedback and input from the community working at the intersection of GLAM institutions and Wikidata.

Creative Commons will bring some of its legal expertise on copyright and open licensing, and we expect to engage more with the Wikidata community to leverage the different languages and community needs, and better refine our initial project….”

Freies Wissen: EU-Kommission stellt ihre Publikationen unter offene Lizenzen – netzpolitik.org

From Google’s English: “The EU Commission is placing its content under Creative Commons licenses and is supporting the organization in translating the license texts. It is thus setting a good example, well ahead of the German federal government….

Since the beginning of this year, many of the EU Commission’s publications and other content have been placed under two standard Creative Commons licenses. Both allow largely free use of that content, which can now be remixed, shared, and commercially reused almost at will.

At the end of February, the EU Commission announced that it would place most of the knowledge it produces under a “CC BY 4.0” license. Everyone is therefore free to share, modify, and use such content for any purpose as long as the author is credited. For metadata, raw data, and “other documents of a similar nature,” the EU Commission goes one step further and places them under the even more liberal CC public domain dedication….”

Where to Download the Millions of Free eBooks that Secretly Entered the Public Domain

“Everyone is paying for books when they don’t have to. There’s so many ways to read almost anything ever published, for free, that it borders on the obscene. Libraries: They’re good! Sure, if you want the latest release from your favorite author you either have to pay or wait for a copy from the library, but for millions of older books, you can get a digital version, legally, for free. One secret of the publishing industry is that most American books published before 1964 never extended their copyright, meaning they’re in the public domain today….”

Elsevier sends copyright threat to site for linking to Sci-Hub / Boing Boing

“Sci-Hub (previously) is a scrappy, nonprofit site founded in memory of Aaron Swartz, dedicated to providing global access to the world’s scholarship — journal articles that generally report on publicly-funded research, which rapacious, giant corporations acquire for free, and then charge the very same institutions that paid for the research millions of dollars a year to access.

In a field of giant, corrupt monopolists, Elsevier is still notable for its rapacious conduct, so it’s not surprising to learn that the company has sent a copyright threat to Citationsy, a service that helps scholars and others create citations to scientific and scholarly literature, alleging that merely linking to Sci-Hub is a copyright infringement.

Citationsy points out that Elsevier owns one of its competitors, the “very mediocre” Mendeley….”