Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers Through the Landscape of Computational Text Analysis

“Imagine you are working with two digital humanities scholars studying post-WWII poetry, both of whom are utilizing a single group of copyright-protected works. The first scholar has collected dozens of these poems to closely analyze artistic approach within a literary framework. The second has built a personal database of the poems to apply automated techniques and statistical methods to identify patterns in the poems’ syntax. This latter methodology—in which previously unknown patterns, trends, or relationships are extracted from a collection of textual documents—is an example of “computational text analysis” (CTA), also commonly referred to as “text mining” or “text data mining.” …”
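As a rough illustration of what the second scholar's approach might look like in practice, here is a minimal Python sketch. It is not taken from the chapter: the function name `extract_features`, the simple tokenizing rule, and the two-line stand-in "poems" are all assumptions made for the example. The point it tries to show is the non-consumptive idea, in which the analysis keeps only aggregate counts derived from the texts rather than the readable texts themselves.

```python
import re
from collections import Counter


def extract_features(text):
    """Return aggregate features for one document (hypothetical helper)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "token_count": len(tokens),       # total words
        "type_count": len(set(tokens)),   # distinct words
        "top_tokens": Counter(tokens).most_common(3),
    }


# Invented stand-in corpus; a real project would load the protected works here.
corpus = {
    "poem_a": "the sea the sky the sea",
    "poem_b": "a road and a road and a field",
}

# Only the derived counts are retained, not the original wording or word order.
features = {name: extract_features(text) for name, text in corpus.items()}

for name, feats in features.items():
    print(name, feats)
```

Whether sharing derived data of this kind is legally safe depends on the circumstances that the chapter discusses; the sketch only shows the mechanics of separating features from source text.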

New Resource on Law and Literacy in Non-Consumptive Text Mining | Authors Alliance

“Scholars are increasingly using text data mining to uncover previously unknown patterns, trends, or relationships from a collection of textual documents. In doing so, many of these researchers may be accessing, building, working with, and sharing materials without understanding the legal implications of their actions. In their newly released chapter, Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers Through the Landscape of Computational Text Analysis (in Copyright Conversations: Rights Literacy in a Digital World), Rachael G. Samberg and Cody Hennesy analyze the legal issues that can arise when researchers are engaged in text data mining and provide guidance on how to approach these issues….”

Kumsal Bayazit, Elsevier CEO, shares her vision for building a better future in research

“First and foremost, I want to be very clear: Elsevier fully supports open access….

In fact, my professional background is in applying technology to content to help professionals make better decisions. For example, working in the part of RELX that serves legal professionals, I’ve seen the powerful benefits of analytical services that are built on top of freely available content, such as case law. This is why I’m excited by the potential to create value for researchers by applying text-mining and artificial intelligence technologies to the entire corpus of peer-reviewed content. I understand and appreciate the role that open access can play in delivering that vision.

The question is not whether open access is desirable or beneficial — the question is how we get there. My takeaway from my discussions on the topic is that there are many points of view. Publishers are often blamed for not making enough progress, which I think is fair. But it would also be unfair not to recognize the lack of alignment within our communities about the best way forward, which is understandable as this is a multi-dimensional issue that requires substantial problem-solving and action to make progress.

I am a pragmatist, and I commit to working pragmatically with libraries and other stakeholders to achieve shared open access goals. Part of this means acknowledging obstacles where they exist and discussing them openly and objectively so that we can find solutions to overcome them. If we don’t, progress will continue to be slow. I feel optimistic given the extent of commitment to make progress. In that spirit, please allow me to share some of the obstacles that I have learned about in the last nine months….”

Publishers’ Responsibilities in Promoting Data Quality and Reproducibility | SpringerLink

Abstract:  Scholarly publishers can help to increase data quality and reproducible research by promoting transparency and openness. Increasing transparency can be achieved by publishers in six key areas: (1) understanding researchers’ problems and motivations, by conducting and responding to the findings of surveys; (2) raising awareness of issues and encouraging behavioural and cultural change, by introducing consistent journal policies on sharing research data, code and materials; (3) improving the quality and objectivity of the peer-review process by implementing reporting guidelines and checklists and using technology to identify misconduct; (4) improving scholarly communication infrastructure with journals that publish all scientifically sound research, promoting study registration, partnering with data repositories and providing services that improve data sharing and data curation; (5) increasing incentives for practising open research with data journals and software journals and implementing data citation and badges for transparency; and (6) making research communication more open and accessible, with open-access publishing options, permitting text and data mining and sharing publisher data and metadata and through industry and community collaboration. This chapter describes practical approaches being taken by publishers, in these six areas, their progress and effectiveness and the implications for researchers publishing their work.

MIT framework for negotiating with scholarly publishers gains wide support

“Who should own and control the dissemination of research? Not academic publishers, according to a new framework developed by library leaders at the Massachusetts Institute of Technology.

The framework, published this week, asserts that control of scholarship and the way in which it is distributed should reside with scholars and their institutions. The document contains six core principles that will be used by MIT as a starting point for future contract negotiations with academic publishers.

The principles aim to ensure that research is available openly and appropriately archived. They also call for fair and transparent pricing of publisher services and say that no author should be forced to give up a copyright in order to publish their work. Instead, authors should be provided with “generous reuse rights,” the framework says….”

Journal practices (other than OA) promoting Open Science goals | Zenodo

“Journal practices (other than OA) promoting Open Science goals (relevance, reproducibility, efficiency, transparency)

Early, full and reproducible content

preregistration – use preregistrations in the review process
registered reports – apply peer review to preregistration prior to the study and publish results regardless of outcomes
preprint policy – liberally allow preprinting in any archive without license restrictions
data/code availability – foster or require open availability of data and code for reviewers and readers
TDM allowance – allow unrestricted TDM of full text and metadata for any use
null/negative results – publish regardless of outcome

Machine readable ecosystem

data/code citation – promote citation and use standards
persistent IDs – e.g. DOI, ORCID, ROR, Open Funder Registry, grant IDs
licenses (in Crossref) – register (open) licenses in Crossref
contributorship roles – credit all contributors for their part in the work
open citations – make citation information openly available via Crossref

Peer review

open peer review – e.g. open reports and open identities
peer review criteria – evaluate methodological rigour and reporting quality only or also judge expected relevance or impact?
rejection rates – publish rejection rates and reconsider high selectivity
post-publication peer review – publish immediately after sanity check and let peer review follow that?

Diversity

author diversity – age, position, gender, geography, ethnicity, colour
reviewer diversity – age, position, gender, geography, ethnicity, colour
editor diversity – age, position, gender, geography, ethnicity, colour

Metrics and DORA

DORA: journal metrics – refrain from promoting
DORA: article metrics – provide a range and use responsibly…”

MIT announces framework to guide negotiations with publishers | MIT News

“The MIT Libraries, together with the MIT Committee on the Library System and the Ad Hoc Task Force on Open Access to MIT’s Research, announced that it has developed a principle-based framework to guide negotiations with scholarly publishers. The framework emerges directly from the core principles for open science and open scholarship articulated in the recommendations of the Task Force on Open Access to MIT’s Research, which released its final report to the MIT community on Oct. 18.

The framework affirms the overarching principle that control of scholarship and its dissemination should reside with scholars and their institutions. It aims to ensure that scholarly research outputs are openly and equitably available to the broadest possible audience, while also providing valued services to the MIT community….”

MIT Framework for Publisher Contracts | Scholarly Publishing – MIT Libraries

“The core principles of an MIT Framework for publisher contracts are:

No author will be required to waive any institutional or funder open access policy to publish in any of the publisher’s journals.
No author will be required to relinquish copyright, but instead will be provided with options that enable publication while also providing authors with generous reuse rights.
Publishers will directly deposit scholarly articles in institutional repositories immediately upon publication or will provide tools/mechanisms that facilitate immediate deposit.
Publishers will provide computational access to subscribed content as a standard part of all contracts, with no restrictions on non-consumptive, computational analysis of the corpus of subscribed content.
Publishers will ensure the long-term digital preservation and accessibility of their content through participation in trusted digital archives.
Institutions will pay a fair and sustainable price to publishers for value-added services, based on transparent and cost-based pricing models….”

Text mining for clinical support | Hartmann | Journal of the Medical Library Association

Abstract:  Background: In 2013, the Dahlgren Memorial Library (DML) at the Georgetown University Medical Center began using text mining software to enable its clinical informationists to quickly retrieve specific, relevant information from MEDLINE abstracts while on patient rounds.

Description: In 2013, DML licensed the use of the Linguamatics I2E text-mining program, and DML’s clinical informationist began using it to text mine MEDLINE abstracts on patient rounds. In 2015, DML installed I2E on a server at Georgetown and negotiated with Elsevier to obtain the right to download and text mine the full text of clinical journals in ScienceDirect to support clinical decision making. In 2016, the license agreements for the New England Journal of Medicine and the BMJ platform were modified to allow text mining. In 2018, PubMed Central open access content was added to the Linguamatics license.

Results: DML’s informationists found that they were able to quickly find useful information that was not retrievable by traditional methods, and clinicians reported the information was valuable.

Conclusion: The ability to text mine MEDLINE abstracts and selected journal articles on patient rounds has allowed DML’s clinical informationists to quickly search large amounts of medical literature that can be used to answer physicians’ clinical questions. DML plans to acquire additional journal articles from selected publishers in the future, which should increase the usefulness of the project.

Virtual Projects are published on an annual basis in the Journal of the Medical Library Association (JMLA) following an annual call for virtual projects in MLAConnect and announcements to encourage submissions from all types of libraries. An advisory committee of recognized technology experts selects project entries based on their currency, innovation, and contribution to health sciences librarianship.
