A detailed open access model of the PubMed literature | Scientific Data

Abstract:  Portfolio analysis is a fundamental practice of organizational leadership and is a necessary precursor of strategic planning. Successful application requires a highly detailed model of research options. We have constructed a model, the first of its kind, that accurately characterizes these options for the biomedical literature. The model comprises over 18 million PubMed documents from 1996–2019. Document relatedness was measured using a hybrid citation analysis + text similarity approach. The resulting 606.6 million document-to-document links were used to create 28,743 document clusters and an associated visual map. Clusters are characterized using metadata (e.g., phrases, MeSH) and over 20 indicators (e.g., funding, patent activity). The map and cluster-level data are embedded in Tableau to provide an interactive model enabling in-depth exploration of a research portfolio. Two example usage cases are provided, one to identify specific research opportunities related to coronavirus, and the second to identify research strengths of a large cohort of African American and Native American researchers at the University of Michigan Medical School.
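
To make the idea of a hybrid relatedness measure concrete, here is a minimal Python sketch. The abstract does not say how the citation and text signals are combined, so everything below is an assumption for illustration: the citation signal is taken to be the Jaccard overlap of reference lists, the text signal is a bag-of-words cosine similarity, and the two are mixed with a hypothetical weight `alpha`. None of these names come from the paper.

```python
import math
from collections import Counter


def jaccard(refs_a, refs_b):
    """Overlap of two reference lists (assumed stand-in for the citation signal)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(text_a, text_b):
    """Bag-of-words cosine similarity (assumed stand-in for the text signal)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_relatedness(refs_a, refs_b, text_a, text_b, alpha=0.5):
    """Weighted mix of the two signals; alpha is a hypothetical tuning weight."""
    return alpha * jaccard(refs_a, refs_b) + (1 - alpha) * cosine(text_a, text_b)


if __name__ == "__main__":
    score = hybrid_relatedness(
        ["p1", "p2", "p3"], ["p2", "p3", "p4"],
        "coronavirus spike protein", "coronavirus spike protein",
    )
    print(f"hybrid relatedness: {score:.2f}")
```

In a pipeline of this kind, pairwise scores above some threshold would become the document-to-document links that a clustering step then groups; the threshold and the clustering method are likewise not specified in the abstract.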

 

ANN: A platform to annotate text with Wikidata IDs | Zenodo

Abstract:  Report of the work done by the Ann team at the eLife Sprint 2020. 

It describes the effort pursued towards a system for universal annotation of biomedical articles using the collaborative knowledge graph of Wikidata.  

The project is currently active at https://github.com/lubianat/ann. 

DECLARATION TO IMPROVE BIOMEDICAL & HEALTH RESEARCH

“We are an international group of researchers and patients who believe that:

it is ethically untenable to remain complicit in the crises that undermine science,

there are simple measures which can improve the quality and openness, and

the public and patients have a right to full access of the research they fund….”

ODDPub – a Text-Mining Algorithm to Detect Data Sharing in Biomedical Publications

Abstract:  Open research data are increasingly recognized as a quality indicator and an important resource to increase transparency, robustness and collaboration in science. However, no standardized way of reporting Open Data in publications exists, making it difficult to find shared datasets and assess the prevalence of Open Data in an automated fashion.

We developed ODDPub (Open Data Detection in Publications), a text-mining algorithm that screens biomedical publications and detects cases of Open Data. Using English-language original research publications from a single biomedical research institution (n = 8689) and randomly selected from PubMed (n = 1500) we iteratively developed a set of derived keyword categories. ODDPub can detect data sharing through field-specific repositories, general-purpose repositories or the supplement. Additionally, it can detect shared analysis code (Open Code).
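
The keyword-category approach described above can be sketched in a few lines of Python. This is not ODDPub's implementation (ODDPub is an R package, and the abstract does not list its keyword categories); the category names, the regular expressions, and the rule that a repository mention must co-occur with an availability word in the same sentence are all assumptions chosen for illustration.

```python
import re

# Hypothetical keyword categories; the real ODDPub categories are not given in the abstract.
CATEGORIES = {
    "field_specific_repository": re.compile(
        r"\b(genbank|gene expression omnibus|arrayexpress|proteomexchange)\b", re.I
    ),
    "general_purpose_repository": re.compile(
        r"\b(zenodo|figshare|dryad|open science framework)\b", re.I
    ),
    "supplement": re.compile(
        r"\bsupplementa(?:l|ry) (?:data|table|file|material)s?\b", re.I
    ),
    "open_code": re.compile(r"\b(github|gitlab|bitbucket)\b", re.I),
}

# Assumed rule: a category only counts if the sentence also signals availability.
AVAILABILITY = re.compile(r"\b(available|deposited|accessible|uploaded)\b", re.I)


def split_sentences(text):
    """Naive sentence splitter: break on whitespace after ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect_sharing(text):
    """Return a dict mapping each category to True if a matching sentence was found."""
    hits = {name: False for name in CATEGORIES}
    for sentence in split_sentences(text):
        if not AVAILABILITY.search(sentence):
            continue
        for name, pattern in CATEGORIES.items():
            if pattern.search(sentence):
                hits[name] = True
    return hits


if __name__ == "__main__":
    sample = (
        "Raw sequencing reads were deposited in GenBank. "
        "Analysis scripts are available on GitHub. "
        "We thank our colleagues."
    )
    print(detect_sharing(sample))
```

Requiring an availability word in the same sentence is one simple way to cut false positives from papers that merely mention a repository; a production tool would need a far richer set of patterns.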

To validate ODDPub, we manually screened 792 publications randomly selected from PubMed. On this validation dataset, our algorithm detected Open Data publications with a sensitivity of 0.73 and specificity of 0.97. Open Data was detected for 11.5% (n = 91) of publications. Open Code was detected for 1.4% (n = 11) of publications with a sensitivity of 0.73 and specificity of 1.00. We compared our results to the linked datasets found in the databases PubMed and Web of Science.
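
For readers unfamiliar with the validation metrics, the following Python sketch shows how sensitivity and specificity are computed from manual labels and algorithm predictions. The labelled example is made up for illustration and is not the paper's data; only the final consistency check uses figures from the abstract (91 and 11 detections among 792 screened publications).

```python
def confusion_counts(manual, predicted):
    """Count true/false positives/negatives from two parallel lists of booleans."""
    pairs = list(zip(manual, predicted))
    tp = sum(m and p for m, p in pairs)
    fn = sum(m and not p for m, p in pairs)
    fp = sum((not m) and p for m, p in pairs)
    tn = sum((not m) and (not p) for m, p in pairs)
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Share of truly positive publications that the algorithm flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly negative publications that the algorithm left unflagged."""
    return tn / (tn + fp)


def detection_rate(detected, total):
    """Percentage of screened publications flagged, rounded to one decimal."""
    return round(100 * detected / total, 1)


if __name__ == "__main__":
    # Hypothetical labels: 10 publications with Open Data, 90 without.
    manual = [True] * 10 + [False] * 90
    predicted = [True] * 8 + [False] * 2 + [True] * 5 + [False] * 85
    tp, fn, fp, tn = confusion_counts(manual, predicted)
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")

    # Consistency check against the figures quoted in the abstract.
    print(detection_rate(91, 792), detection_rate(11, 792))
```

The last line reproduces the 11.5% and 1.4% rates reported above from the raw counts.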

Our algorithm can automatically screen large numbers of publications for Open Data. It can thus be used to assess Open Data sharing rates on the level of subject areas, journals, or institutions. It can also identify individual Open Data publications in a larger publication corpus. ODDPub is published as an R package on GitHub.

 

Two years into the Brazilian Reproducibility Initiative: reflections on conducting a large-scale replication of Brazilian biomedical science

Abstract:  Scientists have increasingly recognised that low methodological and analytical rigour combined with publish-or-perish incentives can make the published scientific literature unreliable. As a response to this, large-scale systematic replications of the literature have emerged as a way to assess the problem empirically. The Brazilian Reproducibility Initiative is one such effort, aimed at estimating the reproducibility of Brazilian biomedical research. Its goal is to perform multicentre replications of a quasi-random sample of at least 60 experiments from Brazilian articles published over a 20-year period, using a set of common laboratory methods. In this article, we describe the challenges of managing a multicentre project with collaborating teams across the country, as well as its successes and failures over the first two years. We end with a brief discussion of the Initiative’s current status and its possible future contributions after the project is concluded in 2021.

 

Statement on Final NIH Policy for Data Management and Sharing | National Institutes of Health (NIH)

“The extraordinary effort to speed the development of treatments and vaccines in response to the COVID-19 pandemic has put into sharp relief the need for the global science community to share scientific data openly. As the world’s largest funder of biomedical research, NIH is addressing this need with a new NIH Policy for Data Management and Sharing. This policy requires researchers to plan prospectively for managing and sharing scientific data generated with NIH funds. This policy also establishes the baseline expectation that data sharing is a fundamental component of the research process, which is in line with NIH’s longstanding commitment to making the research it funds available to the public….”

Building capacity through open approaches: Lessons from developing undergraduate electrophysiology practicals

Abstract:  Electrophysiology has a wide range of biomedical research and clinical applications. As such, education in the theoretical basis and hands-on practice of electrophysiological techniques is essential for biomedical students, including at the undergraduate level. However, offering hands-on learning experiences is particularly difficult in environments with limited resources and infrastructure. In 2017, we began a project to design and incorporate electrophysiology laboratory practicals into our Biomedical Physics undergraduate curriculum at the Universidad Nacional Autónoma de México. We describe some of the challenges we faced, how we maximized resources to overcome some of these challenges, and in particular, how we used open scholarship approaches to build both educational and research capacity. The use of open tools, open platforms, and open licenses was key to the success and broader impact of our project. We share examples of our practicals and explain how we use these activities to strengthen interdisciplinary learning, namely the application of concepts in physics to understanding functions of the human body. Our goal is to provide ideas, materials, and strategies for educators working in similar resource-limited environments.