Team Awarded Grant to Help Digital Humanities Scholars Navigate Legal Issues of Text Data Mining – UC Berkeley Library Update

“We are thrilled to share that the National Endowment for the Humanities (NEH) has awarded a $165,000 grant to a UC Berkeley-led team of legal experts, librarians, and scholars who will help humanities researchers and staff navigate complex legal questions in cutting-edge digital research….

Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine the researchers needed to scrape content about Egyptian artifacts from online sites or databases, or download videos about Egyptian tomb excavations, in order to conduct their automated analysis. And then imagine the researchers also want to share these content-rich data sets with others to encourage research reproducibility or enable other researchers to query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethics if there are issues of, say, indigenous knowledge or cultural heritage materials plausibly at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” in their ability to select appropriate texts for text data mining. 

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid decision-making about rights-protected data. They use texts that have entered into the public domain or use materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit their research to such sources, it is inevitably skewed, leaving important questions unanswered, and rendering resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to and exacerbated bias in developing artificial intelligence tools. …

Digital Content vs. Digital Access | Digital Tweed

“All good, it would seem. These strategies suggest lower cost, “fresher” (or constantly improving) curricular content along with better options for Day One access. After all, textbook prices are the low-hanging fruit (and publishers the villains) in one component of the continuing public anger and angst about college costs. So strategies that promise to reduce costs and enhance Day One access are good things.

And yet, going digital or digital first strategies may actually disadvantage large numbers of low-income, full- and part-time undergraduates, primarily enrolled in community colleges or public four-year comprehensives, who are the intended beneficiaries of these initiatives. As shown below, there is consistent and significant concern from faculty, from provosts/Chief Academic Officers, and from CIOs, about digital access as a key issue in the process of going digital….”

The Right to Read is the Right To Mine: But Not When Blocked by Technical Protection Measures – LIBER

“Our Copyright & Legal Matters Working Group is working with LACA to gather evidence about what happens when Technical Protection Measures (TPMs) block researchers from accessing content because they have attempted text and data mining. 

The survey asks questions related to the type of content blocked, how the issue was solved and how long it took for access to return to business as usual. …”

Roundtable on Aligning Incentives for Open Science

“In order to increase the contribution of open science to producing better science, the National Academies of Sciences, Engineering, and Medicine’s Roundtable on Aligning Incentives for Open Science convenes critical stakeholders to discuss the effectiveness of current incentives for adopting open science practices, current barriers of all types, and ways to move forward in order to align reward structures and institutional values. The Roundtable convenes two times per year and creates a venue for the exchange of ideas and joint strategic planning among key stakeholders. Each Roundtable meeting has a theme. The diverse themes target slightly different audiences but the core audience will consist of universities, government agencies, foundations, and other groups doing work related to open science. The Roundtable aims to improve coordination among stakeholders and increase awareness of current and future efforts in the broader open science community. The Roundtable will also convene one symposium per year, which may produce National Academies proceedings in brief….”

Repository Ouroboros – Ruth Kitchin Tillman

“The library is going to adopt a new repository and you just got hired to make it happen. You may be fresh out of library school with a few metadata projects under your belt. Perhaps you did metadata work on a two-year digitization grant and are looking forward to getting out of the spreadsheet mines. Or maybe you worked in a similar job at your last institution—running a turnkey repository like DSpace, CONTENTdm, or BePress. Perhaps you’ve moved into the role at an angle, from something more traditional like cataloging….

[After a long struggle] But you feel like a fraud. You feel so discouraged. You are sure everyone else is ahead of you. You do not yet see that you are just one more person riding the Repository Ouroboros.”

The Economic Impacts of Open Science: A Rapid Evidence Assessment | HTML

Abstract:  A common motivation for increasing open access to research findings and data is the potential to create economic benefits—but evidence is patchy and diverse. This study systematically reviewed the evidence on what kinds of economic impacts (positive and negative) open science can have, how these come about, and how benefits could be maximized. Use of open science outputs often leaves no obvious trace, so most evidence of impacts is based on interviews, surveys, inference based on existing costs, and modelling approaches. There is indicative evidence that open access to findings/data can lead to savings in access costs, labour costs and transaction costs. There are examples of open science enabling new products, services, companies, research and collaborations. Modelling studies suggest higher returns to R&D if open access permits greater accessibility and efficiency of use of findings. Barriers include lack of skills capacity in search, interpretation and text mining, and lack of clarity around where benefits accrue. There are also contextual considerations around who benefits most from open science (e.g., sectors, small vs. larger companies, types of dataset). Recommendations captured in the review include more research, monitoring and evaluation (including developing metrics), promoting benefits, capacity building and making outputs more audience-friendly.

Why Are So Many Scholarly Communication Infrastructure Providers Running a Red Queen’s Race? | Educopia Institute

“A few weeks ago, we at Educopia published the first project deliverable for the “Mapping Scholarly Communication Infrastructure” project, which we’re working on with Middlebury College and TrueBearing Consulting. The deliverable is a report and a set of data visualizations based on our deep dive into the organizational and technical infrastructures of “Scholarly Communication Resources,” (SCRs) or the tools, platforms, and services that undergird and support today’s digital knowledge infrastructures.

The report details our project team’s findings from the Census of Scholarly Communication Infrastructure Providers—a survey we ran this spring (and have recently reopened with IOI) to which 45 programs and organizations willingly gave hours of their time and scads of information about their technical development and design, their fiscal models, their revenue streams and expenditures, their documentation, and their governance and community engagement work….”

Academic review promotion and tenure documents promote a view of open access that is at odds with the wider academic community | Impact of Social Sciences

“Overall, the results of our survey give reason to be optimistic: the majority of faculty understand that OA is about making research accessible and available. However, they also point to persistent misconceptions about OA, like necessarily high costs and low quality. This raises questions: How might these misconceptions be affecting RPT [review, promotion, and tenure] evaluations? How should researchers who want to prioritise the public availability of their work guard against the potential that their peers hold one of these negative associations? And, as a community, how can we better communicate the complexities of OA without further diluting the central message of open access? Perhaps we can begin by adequately representing and incentivising the basic principles of openness in our RPT documents.”

Journal impact factor: a bumpy ride in an open space | Journal of Investigative Medicine

Abstract:  The journal impact factor (IF) is the leading method of scholarly assessment in today’s research world. An important question is whether or not this is still a constructive method. For a specific journal, the IF is the number of citations for publications over the previous 2 years divided by the number of total citable publications in these years (the citation window). Although this simplicity works to an advantage of this method, complications arise when answers to questions such as ‘What is included in the citation window’ or ‘What makes a good journal impact factor’ contain ambiguity. In this review, we discuss whether or not the IF should still be considered the gold standard of scholarly assessment in view of the many recent changes and the emergence of new publication models. We will outline its advantages and disadvantages. The advantages of the IF include promoting the author meanwhile giving the readers a visualization of the magnitude of review. On the other hand, its disadvantages include reflecting the journal’s quality more than the author’s work, the fact that it cannot be compared across different research disciplines, and the struggles it faces in the world of open access. Recently, alternatives to the IF have been emerging, such as the SCImago Journal & Country Rank, the Source Normalized Impact per Paper and the Eigenfactor Score, among others. However, all alternatives proposed thus far are associated with their own limitations as well. In conclusion, although IF contains its cons, until there are better proposed alternative methods, IF remains one of the most effective methods for assessing scholarly activity.
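
The IF definition quoted in the abstract above reduces to a single division. As a rough illustration only, here is a minimal Python sketch of that arithmetic. The function name `impact_factor` and the journal figures are hypothetical and do not come from the article, and the sketch ignores the questions the abstract raises about what counts as a citable item.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations to a journal's publications from the previous two years,
    divided by the number of citable publications in those two years."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical journal: 150 citable publications over the previous two
# years, which together received 450 citations.
example_if = impact_factor(citations=450, citable_items=150)
print(f"Impact factor: {example_if:.2f}")
```

Because the result is a plain ratio, any change in which items are counted in the denominator shifts the value directly, which is the ambiguity the abstract points to.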