Team Awarded Grant to Help Digital Humanities Scholars Navigate Legal Issues of Text Data Mining – UC Berkeley Library Update

“We are thrilled to share that the National Endowment for the Humanities (NEH) has awarded a $165,000 grant to a UC Berkeley-led team of legal experts, librarians, and scholars who will help humanities researchers and staff navigate complex legal questions in cutting-edge digital research….

Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine that researchers need to scrape content about Egyptian artifacts from online sites or databases, or download videos about Egyptian tomb excavations, in order to conduct their automated analysis. And then imagine the researchers also want to share these content-rich data sets with others to encourage research reproducibility or enable other researchers to query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethical concerns when, say, indigenous knowledge or cultural heritage materials are plausibly at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” to their ability to select appropriate texts for text data mining.

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid decision-making about rights-protected data. They use texts that have entered the public domain or materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit their research to such sources, it is inevitably skewed, leaving important questions unanswered and rendering the resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to and exacerbated bias in developing artificial intelligence tools. …

The In/Visible, In/Audible Labor of Digitizing the Public Domain

Abstract:  In this article I call for more recognition of and scholarly engagement with public, volunteer digital humanities projects, using the example of LibriVox.org to consider what public, sustainable, digital humanities work can look like beyond the contexts of institutional sponsorship. Thousands of volunteers are using LibriVox to collaboratively produce free audiobook versions of texts in the US public domain. The work of finding, selecting, and preparing texts to be digitized and published in audio form is complex and slow, and not all of this labor is ultimately visible, valued, or rewarded. Drawing on an ethnographic study of 12 years of archived discourse and documentation, I interrogate digital traces of the processes by which several LibriVox versions of Anne of Green Gables have come into being, watching for ways in which policies and infrastructure have been influenced by variously visible and invisible forms of work. Making visible the intricate, unique, archived experiences of the crowdsourcing community of LibriVox volunteers and their tools adds to still-emerging discussions about how to value extra-institutional, public, distributed digital humanities work.

Decolonizing International Research Groups: Prototyping a Digital Audio Repository from South to North

Abstract:  This article reflects on what it means to create a digital humanities (DH) project in the “Global South,” while it ponders some lessons it can offer to DH practitioners across the world, particularly from English-speaking academia. As a case study, it considers the Digital Audio Repository for Latin American Sound Art and Poetry, an initiative coordinated by PoéticaSonora, a research group formed by faculty members and students from Universidad Nacional Autónoma de México (UNAM, Mexico City) and Concordia University (Montreal). The prototyping process has prompted some reflections on the correlation between access and participation through information and communication technologies (here termed “knowledge democratization”), which the article uses to expound PoéticaSonora’s theoretical-political positioning, drawing not only from decolonial thinkers and their critics but also from feminist, new materialist, and border studies on technology, art, and society. It then discusses how the coloniality of knowledge pervades the international distribution of labour in the digital world and academic milieus, particularly through what Leanne Simpson calls “cognitive extractivism.” After proposing some strategies to avoid an extractivist workflow while designing a DH project, it finishes by offering three lessons learned from the PoéticaSonora prototype: online access does not equal universal access; well-intended digital projects are not beneficial per se for the target community; and we must bring back to discussion the political dimension of digital labor and the social practices around it.

Open Islamicate Texts Initiative (OpenITI)

“The written heritage of the “Islamicate” cultures that stretch from modern Bengal to Spain is as vast as it is understudied and underrepresented in the digital humanities. The sheer volume and diversity of the surviving works produced in Persian and Arabic by denizens of these lands in the premodern period make this body of texts ideal for computational forms of analysis. Efforts to utilize these new digital forms of analysis, however, have been stymied by poor OCR technology for Arabic-script languages and the lack of an open-access, standards-compliant Islamicate corpus.

The Open Islamicate Texts Initiative (OpenITI) is a multi-institutional effort to construct the first machine-actionable scholarly corpus of premodern Islamicate texts. Led by researchers at the Aga Khan University (AKU), Universität Wien (UW), and the Roshan Institute for Persian Studies at the University of Maryland (College Park) and an interdisciplinary advisory board of leading digital humanists and Islamic, Persian, and Arabic studies scholars, OpenITI aims to develop the digital infrastructure necessary to achieve this goal, including improved Arabic-script OCR, Arabic-script standards for OCR output and text encoding, and platforms for collaborative corpus creation (e.g., CorpusBuilder). In the process, OpenITI will enable new synergies between Digital Humanities and the inter-related Islamicate fields of Islamic, Persian, and Arabic Studies….”

Open access ‘seems such a seismic change’ | Research Information

“There isn’t a single challenge that runs evenly across all of the disciplines, but the biggest one we’re facing is how we can make open access work in a way that preserves what’s good about current scholarly publishing activities, and is also sustainable and allows for innovation. It’s very difficult to move past open access at the moment. It seems such a seismic change in how we think about the way we publish. 

In the UK open access has largely been implemented through hybrid journals, and the recent Plan S announcement is very firmly positioned against hybrid journals – so the system is still clearly being shaken up. There may have been a sense that journal publishing had settled down into this hybrid model, but it didn’t deliver entirely on the promise of open access and allowed publishers to preserve what they were doing without having to innovate quite so much. We’re going to have to find ways of working around that. 

A particular concern for people like me, a historian working in digital humanities, is how we accommodate books in all of this. The business models for book publishing are not really there yet, although there are some interesting experiments. It’s also the case that digital and open book content is largely excluded from ways of measuring usage. The price of a lot of academic books is an issue as well. Are there ways that we can work together to try to bring costs down? That’s not an easy problem to fix either, but it’s an ongoing challenge in terms of recommending books to students and inequalities of access to this material….”

DARIAH | Digital Research Infrastructure for the Arts and Humanities

DARIAH is an ERIC (European Research Infrastructure Consortium), a pan-European infrastructure for arts and humanities scholars working with computational methods. It supports digital research as well as the teaching of digital research methods.

How does DARIAH work?
DARIAH is a network. It connects several hundred scholars and dozens of research facilities in currently 17 European countries, the DARIAH member countries. In addition, DARIAH has several cooperating partner institutions in countries that are not DARIAH members, and strong ties to many research projects across Europe. People in DARIAH provide digital tools and share data as well as know-how. They organize learning opportunities for digital research methods, such as workshops and summer schools, and offer training materials for Digital Humanities.

Working groups
The DARIAH community also works together in working groups, with subjects ranging from the assessment of digital tools to the development of standards and the long-term accessibility of research materials. Their activities may vary, but they all share one common goal: providing services to scholars in the arts and humanities and thereby helping them do their research at its best.

Want to become part of the network?
DARIAH is open to everyone. Whether you would like to participate in one of DARIAH’s working groups, work towards your country becoming a DARIAH partner, see your institution cooperate with DARIAH, or are simply looking for someone to share know-how with and to support your research project, get in touch with us: info@dariah.eu….”

News – Liverpool University Press journal, Francosphères, to flip to full open-access with funding from the Open Library of Humanities

“We are extremely pleased to announce that our international library partners have voted to accept Francosphères’ application to join the Open Library of Humanities. This is part of our partnership with Liverpool University Press, and is the second journal – following Quaker Studies last year – that has moved from a subscription model to our full open-access model. Francosphères is a highly respected journal devoted to transcultural and intercultural French Studies edited by an international team based in Paris, Oxford and London. Established in 2012 to support recent advances in postcolonial and gender theory, the journal has been publishing articles in English and French that seek to explore and interrogate the presence of French language and culture across frontiers and borders and how this is legitimated in ‘Francophone’ culture.”

Cultural Observatory – Culturomics

The Cultural Observatory at Harvard is working to enable the quantitative study of human culture across societies and across centuries. We do this in three ways:

  • Creating massive datasets relevant to human culture
  • Using these datasets to power wholly new types of analysis
  • Developing tools that enable researchers and the general public to query the data …”

Building Capacity for Digital Humanities: A Framework for Institutional Planning | EDUCAUSE

Abstract:  A growing number of researchers in the humanities are using computational tools and methods that are more typically associated with social and scientific research. These tools and techniques enable researchers to pursue new forms of inquiry and new questions and bring more attention to—and cultivate broader interest in—traditional humanities and humanities data. This paper from ECAR and the Coalition for Networked Information (CNI) outlines a practical framework for capacity building to develop institutional digital humanities support for IT staff, librarians, administrators, and faculty with administrative responsibilities.