“So what, as a thought experiment, might it look like to rethink copyright? What would I suggest if we could get new primary legislation in the UK to change research and copyright arrangements?
I would make it so that research produced by employees at publicly funded research universities could not be placed under copyright (i.e. it would be committed to the public domain). A downstream provision could be included that would mean that no new copyright could be placed on such work by dint of design, typography etc.
I would abolish the implementation of EU Directive 2001/29/EC, at least for academic researchers. This directive makes it a criminal offence to break Digital Rights Management/Technical Protection Measures on digital files. Without the modification or abolition of this criminal directive, even public-domain work can be unusable for text mining.
I would allow academic researchers to re-use and to re-publish material, even that in copyright, that is necessary for their work. In other words, I would absolve academic researchers and institutions of copyright offences that are necessary to conducting their work. This would include distributing in-copyright articles and books to colleagues; publishing in-copyright images and videos that are necessary for work. I would include a clause that such re-use must include attribution credit.
I would extend the current copyright exemptions for text and data mining to a blanket non-commercial research exemption. I would add an allowance to circumvent any API rate limiting or other technological protection measure for the purposes of mining material for research purposes….”
“In increasingly knowledge-based societies and economies, data are a key resource. Enhanced access to publicly funded data enables research and innovation, and has far-reaching effects on resource efficiency, productivity and competitiveness, creating benefits for society at large. Yet these benefits must also be balanced against associated risks to privacy, intellectual property, national security and the public interest. This report presents current policy practice to promote access to publicly funded data for science, technology and innovation, as well as policy challenges for the future. It examines national policies and international initiatives, and identifies seven issues that require policy attention….”
“bioRxiv and medRxiv provide free and unrestricted access to all articles posted on their servers. We believe this should apply not only to human readers but also to machine analysis of the content. A growing variety of resources have been created to facilitate this access.
bioRxiv and medRxiv metadata are made available via a number of dedicated RSS feeds and APIs. Simplified summary statistics covering the content and usage are also available. For bioRxiv, this information is available here.
Bulk access to the full text of bioRxiv articles for the purposes of text and data mining (TDM) is available via a dedicated Amazon S3 resource. Click here for details of this TDM resource and how to access it….”
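As an illustration of machine access to this metadata, here is a minimal Python sketch. It assumes the bioRxiv "details" endpoint takes the form `https://api.biorxiv.org/details/<server>/<start>/<end>/<cursor>` and returns JSON with a `collection` list whose items carry a `doi` field. The excerpt does not spell out either detail, so check both against the current API documentation. The helper names and the sample record are hypothetical, and no network request is made.

```python
# Sketch only: the endpoint shape and field names are assumptions, not taken from the excerpt.
BIORXIV_API = "https://api.biorxiv.org/details"


def details_url(server, start, end, cursor=0):
    """Build a metadata URL for preprints posted between two ISO dates (YYYY-MM-DD)."""
    return f"{BIORXIV_API}/{server}/{start}/{end}/{cursor}"


def dois_from_page(page):
    """Collect DOIs from one decoded JSON page, skipping records without a DOI."""
    return [item["doi"] for item in page.get("collection", []) if "doi" in item]


# Hypothetical page, standing in for a decoded JSON response.
sample_page = {
    "collection": [
        {"doi": "10.1101/0000.00.00.000001", "title": "Placeholder preprint A"},
        {"title": "Record with no DOI"},
        {"doi": "10.1101/0000.00.00.000002", "title": "Placeholder preprint B"},
    ]
}

url = details_url("biorxiv", "2020-03-01", "2020-03-31")
dois = dois_from_page(sample_page)
```

In real use the URL would be fetched page by page, advancing the cursor until no records come back. Bulk full text is served separately through the S3 resource mentioned above.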
“The COVID-19 pandemic, and the global scientific effort to develop treatments and vaccines, is the latest large-scale event to show the power and urgency of collaboration and data-sharing to solve society’s greatest challenges. Research libraries and librarians play a critical role in data management, education, and policy, empowering researchers to use data more effectively….
The Academic Data Science Alliance (ADSA) —a community of leaders, practitioners, educators, and librarians—came together to expand the cumulative experience of the cross-disciplinary Moore-Sloan Data Science Environments to other institutions. ADSA holds virtual events on scaling data-science capacity. Libraries and librarians are involved in data science as data curators, trainers, tool builders, and more. To meet this moment, ADSA has also amassed COVID-19 data-science resources and is crowdsourcing expansion of those resources….
In January 2020, the Library Copyright Alliance (LCA) filed public comments with the US Patent and Trademark Office on “Intellectual Property Protection for Artificial Intelligence Innovation.” The LCA explained how the right of fair use in US copyright law clears the way for much of the data processing—often involving large volumes of copyrighted material—that makes machine learning possible. …
Text and data mining are also critical tools in the digital humanities, and require “legal literacy,” or the knowledge and confidence of finding and using sources for this work. Funded by the US National Endowment for the Humanities, a team of librarians, legal experts, and scholars are building an open educational curriculum called “Building Legal Literacies for Text Data Mining.” …”
“The Covid-19 crisis is leading to a “sea change” in the way that researchers are collating and analysing research in a bid to keep up with the “phenomenal” growth in scholarship on the topic, experts have suggested.
According to one search portal for coronavirus research, as of 3 April more than 6,000 papers, including preprints, have been published on the topic and related areas since the beginning of the year….
He added that the fact that many publishers were making Covid-19 research open access also meant that scholars could get around the overwhelming nature of dealing with such a vast amount of information by using sophisticated search techniques such as text mining….”
“This page aims to support researchers and interested individuals by providing tools and data sets related to the Coronavirus disease 2019 (COVID-19) outbreak and the SARS-CoV-2 virus.
Please contact us if you need help or have suggestions for further tools or data sets. We are experienced in bioinformatical data analysis, text mining, data visualization, (FAIR) research data management as well as in hosting information services….”
“The Crossref API can be used for locating the full text of published articles and preprints for the purpose of text mining.
Crossref members who have subscription-access content and who want to make some of their content available for text mining need to take the following steps.
The Crossref schema supports the NISO Access and License Indicators (ALI) section, and, normally, the free_to_read functionality of ALI would be the recommended mechanism for indicating that content is available for free (e.g. “gratis”, not “open”). However, the ALI free_to_read element is not currently exposed through our REST API filters.
But we have defined a workaround that allows members to both register the ALI free_to_read element and an equivalent assertion that will work with the REST API and which will allow researchers to locate content that has been flagged as “free.”…”
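For the researcher's side of this, here is a minimal Python sketch of locating full-text links. It assumes the Crossref REST API `works` route, its `has-full-text` filter, and `link` entries tagged with `intended-application`. It does not implement the free-to-read workaround described above, whose details the excerpt leaves out. The helper names and the sample record are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Sketch only: route, filter, and field names should be checked against the Crossref REST API docs.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_fulltext_query(terms, rows=20):
    """Build a works query restricted to records that register full-text links."""
    params = {"query": terms, "filter": "has-full-text:true", "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def text_mining_links(work):
    """Return the URLs in a work record that are flagged for text mining."""
    return [
        link["URL"]
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


# Hypothetical work record, standing in for one item of a decoded API response.
sample_work = {
    "DOI": "10.5555/12345678",
    "link": [
        {
            "URL": "https://example.org/fulltext/10.5555/12345678.xml",
            "content-type": "application/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://example.org/check/10.5555/12345678",
            "content-type": "unspecified",
            "intended-application": "similarity-checking",
        },
    ],
}

query_url = build_fulltext_query("coronavirus", rows=5)
links = text_mining_links(sample_work)
```

A registered link does not guarantee access: subscription content may still need credentials or a licence check when the URL is fetched.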
“This project aims to use modern tools, especially Wikidata (and Wikipedia), R, Java, text mining, with semantic tools to create a modern integrated resource of all current published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge.
The world faces (and will continue to face) viral epidemics which arise suddenly and where scientific/medical knowledge is a critical resource. Despite over 100 billion USD spent on medical research worldwide, much knowledge is behind publisher paywalls and only available to rich universities. Moreover, it is usually badly published and dispersed, without coherent knowledge tools. It particularly disadvantages the Global South….”
“Today, researchers and leaders from the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) at the National Institutes of Health released the COVID-19 Open Research Dataset (CORD-19) of scholarly literature about COVID-19, SARS-CoV-2, and the Coronavirus group.
Requested by The White House Office of Science and Technology Policy, the dataset represents the most extensive machine-readable Coronavirus literature collection available for data and text mining to date, with over 29,000 articles, more than 13,000 of which have full text.
Now, The White House joins these institutions in issuing a call to action to the Nation’s artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19….”
“As higher education and most of daily life has withdrawn from physical spaces and shifted to a fully online environment to mitigate the spread of COVID-19, librarians and library collectives have focused their already strong communication and advocacy networks to ensure continuous access to information for scholars and all others who depend on it now.
Viewing this moment with scholars at the center, the coronavirus pandemic is revealing both what works and what doesn’t in scholarly communication. The scale of this disease presents a dire use case for open science—the rapid sharing and evaluation of research, and an emphasis on machine readability and computability to handle volume and speed….”