What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers | Fantastic Anachronism

[Some recommendations:]

“Ignore citation counts. Given that citations are unrelated to (easily-predictable) replicability, let alone any subtler quality aspects, their use as an evaluative tool should stop immediately.
Open data, enforced by the NSF/NIH. There are problems with privacy, but I would be tempted to go as far as possible with this. Open data helps detect fraud. And let’s have everyone share their code, too—anything that makes replication/reproduction easier is a step in the right direction.
Financial incentives for universities and journals to police fraud. It’s not easy to structure this well because on the one hand you want to incentivize them to minimize the frauds published, but on the other hand you want to maximize the frauds being caught. Beware Goodhart’s law!
Why not do away with the journal system altogether? The NSF could run its own centralized, open website; grants would require publication there. Journals are objectively not doing their job as gatekeepers of quality or truth, so what even is a journal? A combination of taxonomy and reputation. The former is better solved by a simple tag system, and the latter is actually misleading. Peer review is unpaid work anyway, so it could continue as is. Attach a replication prediction market (with the estimated probability displayed in gargantuan neon-red font right next to the paper title) and you’re golden. Without the crutch of "high-ranked journals" maybe we could move to better ways of evaluating scientific output. No more editors refusing to publish replications. You can’t shift the incentives: academics want to publish in "high-impact" journals, and journals want to selectively publish "high-impact" research. So just make it impossible. Plus, as a bonus side effect, this would finally sink Elsevier….”
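
To make the proposal above concrete, here is a minimal sketch of what a paper record on such a centralized, NSF-run site might hold, assuming a simple tag taxonomy and a replication prediction market. The field names and example values are hypothetical illustrations, not part of the original post.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PaperRecord:
    """Hypothetical record for a centralized, grant-mandated publication site."""
    title: str
    authors: list[str]
    grant_id: str                                    # publication required as a grant condition
    tags: set[str] = field(default_factory=set)      # simple tag taxonomy instead of journal venues
    replication_market_prob: Optional[float] = None  # market-estimated probability of replication

    def display_header(self) -> str:
        # The post suggests showing the replication estimate right next to the title.
        if self.replication_market_prob is None:
            label = "no market yet"
        else:
            label = f"{self.replication_market_prob:.0%} predicted to replicate"
        return f"{self.title}  [{label}]"

paper = PaperRecord(
    title="An Example Social Science Finding",
    authors=["A. Researcher"],
    grant_id="NSF-0000000",
    tags={"social-psychology", "replication"},
    replication_market_prob=0.37,
)
print(paper.display_header())
# An Example Social Science Finding  [37% predicted to replicate]
```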

Knowledge Infrastructure and the Role of the University · Commonplace

“As open access to research information grows and publisher business models adapt accordingly, knowledge infrastructure has become the new frontier for advocates of open science. This paper argues that the time has come for universities and other knowledge institutions to assume a larger role in mitigating the risks that arise from ongoing consolidation in research infrastructure, including the privatization of community platforms, commercial control of analytics solutions, and other market-driven trends in scientific and scholarly publishing….

The research community is rightfully celebrating more open access and open data, yet there is growing recognition in the academic community that pay-to-publish open access is not the panacea people were hoping for when it comes to affordable, sustainable scholarly and scientific publishing. Publication is, after all, only one step in a flow of research communication activities that starts with the collection and analysis of research data and ends with assessment of research impact. Open science is the movement towards open methods, data, and software, to enhance reproducibility, fairness, and distributed collaboration in science. The construct covers such diverse elements as the use of open source software, the sharing of data sets, open and transparent peer review processes, open repositories for the long-term storage and availability of both data and articles, as well as the availability of open protocols and methodologies that ensure the reproducibility and overall quality of research. How these trends can be reconciled with the economic interests of the publishing industry as it is currently organized remains to be seen, but the time is ripe for greater multi-stakeholder coordination and institutional investment in building and maintaining a diversified open infrastructure pipeline.”

Viral Science: Masks, Speed Bumps, and Guard Rails: Patterns

“With the world fixated on COVID-19, the WHO has warned that the pandemic response has also been accompanied by an infodemic: an overabundance of information, ranging from demonstrably false to accurate. Alas, the infodemic phenomenon has extended to articles in scientific journals, including prestigious medical outlets such as The Lancet and NEJM. The rapid reviews and publication speed for COVID-19 papers have surprised many, including practicing physicians, for whom the guidance is intended….

The Allen Institute for AI (AI2) and Semantic Scholar launched the COVID-19 Open Research Dataset (CORD-19), a growing corpus of papers related to past and present coronaviruses (currently 130,000 abstracts plus full-text papers, in use by multiple research groups).

Using this data, AI2, working with the University of Washington, released a tool called SciSight, an AI-powered graph visualization tool enabling quick and intuitive exploration of associations between biomedical entities such as proteins, genes, cells, drugs, diseases, and patient characteristics, as well as between different research groups working in the field. It helps foster collaborations and discovery as well as reduce redundancy….

The research community and scientific publishers working together need to develop and make accessible open-source software tools to permit the dual-track submission discussed above. Repositories such as Github are a start….”
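
As a toy illustration of the association-graph idea behind SciSight (not the actual SciSight implementation, whose entity extraction and visualization are far more sophisticated), the sketch below links biomedical entities that co-occur in the same paper. The paper records and entity annotations are invented for the example.

```python
from itertools import combinations
import networkx as nx

# Invented paper metadata with entity annotations (a real CORD-19 pipeline would
# extract these automatically from abstracts and full text).
papers = [
    {"id": "paper-1", "entities": ["ACE2", "spike protein", "hypertension"]},
    {"id": "paper-2", "entities": ["ACE2", "remdesivir", "viral load"]},
    {"id": "paper-3", "entities": ["ACE2", "spike protein", "neutralizing antibody"]},
]

graph = nx.Graph()
for paper in papers:
    # Connect every pair of entities mentioned in the same paper; repeated
    # co-mentions across papers strengthen the edge weight.
    for a, b in combinations(sorted(set(paper["entities"])), 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

# Entities most strongly associated with ACE2, ranked by co-mention count.
for entity, attrs in sorted(graph["ACE2"].items(), key=lambda kv: -kv[1]["weight"]):
    print(f"ACE2 -- {entity} (co-mentions: {attrs['weight']})")
```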

Wellcome and Ripeta partner to assess dataset availability in funded research – Digital Science

“Ripeta and Wellcome are pleased to announce a collaborative effort to assess data and code availability in the manuscripts of funded research projects.

The project will analyze papers funded by Wellcome from the year before it established a dedicated Open Research team (2016) and from the most recent calendar year (2019). It supports Wellcome’s commitment to maximising the availability and re-use of results from its funded research.

Ripeta, a Digital Science portfolio company, aims to make better science easier by identifying and highlighting the important parts of research that should be transparently presented in a manuscript and other materials.

The collaboration will leverage Ripeta’s natural language processing (NLP) technology, which scans articles for reproducibility criteria. For both data availability and code availability, the NLP will produce a binary yes-no response for the presence of availability statements. Those with a “yes” response will then be categorized by the way that data or code are shared….”
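
Ripeta’s NLP is proprietary, but a naive keyword baseline conveys the shape of the task described above: return a binary yes/no for the presence of an availability statement, then categorize how the materials are shared. The patterns and category names below are illustrative assumptions, not Ripeta’s actual criteria.

```python
import re

# Crude cues for availability statements; a production system would use trained models.
AVAILABILITY_CUES = re.compile(
    r"(data|code) (availability|are available|is available|can be obtained)", re.I)
SHARING_CHANNELS = {
    "repository": re.compile(r"\b(github|zenodo|osf|figshare|dryad)\b", re.I),
    "supplement": re.compile(r"supplementary (material|information)", re.I),
    "on request": re.compile(r"(upon|on) (reasonable )?request", re.I),
}

def assess(manuscript_text: str) -> dict:
    # Step 1: binary yes/no for the presence of an availability statement.
    has_statement = bool(AVAILABILITY_CUES.search(manuscript_text))
    # Step 2: if present, categorize the way data or code are shared.
    channels = [name for name, pattern in SHARING_CHANNELS.items()
                if pattern.search(manuscript_text)] if has_statement else []
    return {"availability_statement": "yes" if has_statement else "no",
            "sharing_channels": channels}

print(assess("Data availability: all code is available at https://github.com/example/repo."))
# {'availability_statement': 'yes', 'sharing_channels': ['repository']}
```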

Sharing and organizing research products as R packages | SpringerLink

Abstract: A consensus on the importance of open data and reproducible code is emerging. How should data and code be shared to maximize the key desiderata of reproducibility, permanence, and accessibility? Research assets should be stored persistently in formats that are not software-restrictive, and documented so that others can reproduce and extend the required computations. The sharing method should be easy to adopt by already busy researchers. We suggest the R package standard as a solution for creating, curating, and communicating research assets. The R package standard, with extensions discussed herein, provides a format for assets and metadata that satisfies the above desiderata and facilitates reproducibility, open access, and sharing of materials through online platforms like GitHub and the Open Science Framework. We discuss a stack of R resources that help users create reproducible collections of research assets, from experiments to manuscripts, in the RStudio interface. We created an R package, vertical, to help researchers incorporate these tools into their workflows, and discuss its functionality at length in an online supplement. Together, these tools may increase the reproducibility and openness of psychological science.
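
For readers unfamiliar with the convention the abstract relies on, the sketch below scaffolds a standard R package layout (DESCRIPTION, R/, data/, vignettes/, man/) as a container for research assets. It is a minimal illustration of the general R package standard, not the vertical package or the authors’ specific workflow, and the project metadata is invented.

```python
from pathlib import Path

def scaffold_research_package(root: str, package: str, author: str) -> None:
    """Create a minimal R-package-style skeleton for holding data, code, and a manuscript."""
    base = Path(root) / package
    for subdir in ["R", "data", "data-raw", "vignettes", "man", "inst/extdata"]:
        (base / subdir).mkdir(parents=True, exist_ok=True)
    # DESCRIPTION carries the package metadata (fields follow the R package convention).
    (base / "DESCRIPTION").write_text(
        f"Package: {package}\n"
        f"Title: Data, Code, and Manuscript for an Example Study\n"
        f"Version: 0.1.0\n"
        f"Authors@R: person(\"{author}\")\n"
        f"License: CC BY 4.0\n"
    )
    (base / "R" / "analysis.R").write_text("# analysis functions, documented with roxygen2\n")
    (base / "vignettes" / "manuscript.Rmd").write_text("# R Markdown manuscript lives here\n")

scaffold_research_package("/tmp", "examplestudy", "A. Researcher")
```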

Welcome to a new ERA of reproducible publishing | Labs | eLife

“Since 2017, we have been working on the concept of computationally reproducible papers. The open-source suite of tools that started life as the Reproducible Document Stack is now live on eLife as ERA, the Executable Research Article, delivering a truly web-native format for taking published research to a new level of transparency, reproducibility and interactivity.

From today, authors with a published eLife paper can register their interest to enrich their published work with the addition of live code blocks, programmatically-generated interactive figures, and dynamically generated in-line values, using familiar tools like R Markdown and Jupyter in combination with Stencila Hub’s intuitive asset management and format conversion interface. The resulting new ERA publication will be presented as a complement to the original published paper. Very soon, a Google Docs plugin will also be made available to let authors insert executable code and data blocks into their documents using the cloud service.

Readers of ERA publications will be able to inspect the code, modify it, and re-execute it directly in the browser, enabling them to better understand how a figure is generated. They will be able to change a plot from one format to another, alter the data range of a specific analysis, and much more. All changes are limited to an individual’s browsing session and do not affect the published article, so anyone can experiment safely. Readers can also download the ERA publication – with all embedded code and data preserved – and use it as a basis for further study or derivative works….”
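
As a rough illustration of the kind of cell an executable article embeds (eLife ERAs are authored with R Markdown or Jupyter via Stencila Hub, not with this exact snippet), the example below generates a figure programmatically from an editable parameter, which is what allows a reader to alter a data range and re-execute in the browser.

```python
import numpy as np
import matplotlib.pyplot as plt

dose = np.linspace(0, 10, 100)             # reader-editable data range
response = 1 / (1 + np.exp(-(dose - 5)))   # toy dose-response curve for the example

# The figure is produced from the data at execution time rather than shipped as a
# static image, so changing the parameters above changes the published figure.
fig, ax = plt.subplots()
ax.plot(dose, response)
ax.set_xlabel("Dose")
ax.set_ylabel("Response")
ax.set_title("Programmatically generated dose-response curve")
fig.savefig("figure1.png")
```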

What can the humanities do for data science? | The Alan Turing Institute

“The paper outlines recommendations in seven areas across two themes to support and further interdisciplinary research in data science and the humanities, including:

Research Process 

Methodological frameworks and epistemic cultures: Develop common methodological frameworks/terminology and encourage wider use of shared research protocols in these areas. 
Best practices in the use and evaluation of computational tools: Adopt practices that ensure transparency and openness in research, along with training programmes that help researchers choose the most suitable computational tools for humanities research. 
Reproducible and open research: Promote transparent and reproducible research in the humanities, including data, code, workflows, computational environments, methods, and documentation. …”

Publishing computational research – a review of infrastructures for reproducible and transparent scholarly communication | Research Integrity and Peer Review | Full Text

Abstract:  Background

The trend toward open science increases the pressure on authors to provide access to the source code and data they used to compute the results reported in their scientific papers. Since sharing materials reproducibly is challenging, several projects have developed solutions to support the release of executable analyses alongside articles.

Methods

We reviewed 11 applications that can assist researchers in adhering to reproducibility principles. The applications were found through a literature search and interactions with the reproducible research community. An application was included in our analysis if it (i) was actively maintained at the time the data for this paper was collected, (ii) supported the publication of executable code and data, and (iii) was connected to the scholarly publication process. By investigating the software documentation and published articles, we compared the applications across 19 criteria, such as deployment options and features that support authors in creating, and readers in studying, executable papers.

Results

Of the 11 applications, eight allow publishers to self-host the system for free, whereas three provide paid services. Authors can submit an executable analysis using Jupyter Notebooks or R Markdown documents (10 applications support these formats). All approaches provide features to assist readers in studying the materials, e.g., one-click reproducible results or tools for manipulating the analysis parameters. Six applications allow for modifying materials after publication.

Conclusions

The applications predominantly support authors in publishing reproducible research through literate programming. For readers, most applications provide user interfaces to inspect and manipulate the computational analysis. The next step is to investigate the gaps identified in this review, such as the costs publishers should expect when hosting an application, the handling of sensitive data, and the impacts on the review process.