[2011.07571] Software must be recognised as an important output of scholarly research

Abstract:  Software now lies at the heart of scholarly research. Here we argue that as well as being important from a methodological perspective, software should, in many instances, be recognised as an output of research, equivalent to an academic paper. The article discusses the different roles that software may play in research and highlights the relationship between software and research sustainability and reproducibility. It describes the challenges associated with the processes of citing and reviewing software, which differ from those used for papers. We conclude that whilst software outputs do not necessarily fit comfortably within the current publication model, there is a great deal of positive work underway that is likely to make an impact in addressing this.


Introducing Pew Research Center’s Python libraries | by Patrick van Kessel | Pew Research Center: Decoded | Medium

“Over the past five years, Pew Research Center’s Data Labs team has worked hard to steadily advance the Center’s data science capabilities. From text analysis to computer vision, we’ve applied a variety of computational methods to study important social issues in new ways and expand the scope of what’s possible for the Center. In doing so, we’ve written a lot of code.

In the spirit of our commitment to transparency and our desire to provide methodological resources to the public, we’re excited to release a collection of Python tools that we’ve found ourselves returning to again and again.

If you’ve ever been frustrated by wrangling a bunch of files or cleaning up text documents, we’re hoping these tools will help make your life a little easier. We’ve split this release into two packages on the Center’s GitHub page: one for utilities that can be applied to any programming project, and another with tools that are specifically catered to data processing and analysis….”

Making data open, accessible for researchers and scholars | University of Arizona Libraries

“A new service created by the University of Arizona Libraries is helping researchers and students amplify their individual or cross-departmental work, while taking our commitment to open to the next level.

ReDATA, a free research data repository that stores and shares datasets produced by University of Arizona researchers, was recently launched by the Libraries’ Office of Digital Innovation & Stewardship.

In addition to addressing the growing number of funding agencies and journal publishers that require open access to underlying research data, the team that developed ReDATA identified an opportunity to tackle a strategic gap on campus. …

The service, which aligns with the Libraries’ mission to reduce barriers to accessing and sharing information, also allows researchers to receive credit and track the impact of their work. The platform looks at embedded download and citation counts, as well as altmetrics, which count all of the mentions tracked for an individual research output.

Traditional scholarly outputs include journal articles, books, conference proceedings, and monographs. Over the last decade, there has been an increase in expectations from the research community to provide supporting data and software alongside the original publication.

ReDATA accepts and archives all types of data, including spreadsheets, binary files, software and scripts, audiovisual content, and presentations….”

GitHub preserves its open-source software code deep in the arctic for future generations – SiliconANGLE

“GitHub Inc. said today it has delivered a copy of all of the open-source software code stored on its website to a data repository at the Arctic World Archive, which is a very long-term archival facility buried 250 meters deep in the permafrost of an Arctic mountain.

The operation is part of the GitHub Archive Program, which is a project announced last year that aims to preserve today’s open-source software for future generations. To do that, GitHub said, it will store its code in an archive called the GitHub Arctic Code Vault, which it says has been built to last for a thousand years….”

New arXivLabs feature provides instant access to code | arXiv.org blog

“Today, arXivLabs launched a new Code tab, a shortcut linking Machine Learning articles with their associated code. arXivLabs provides a conduit for collaboration that invites community participation while allowing arXiv developers to focus on core services. This Code feature was developed by Papers with Code, a free resource for researchers and practitioners to find and follow the latest Machine Learning papers and code.

When a reader activates the Code tool on the arXiv abstract record page, the author’s implementation of the code will be displayed in the tab, if available, as well as links to any community implementations. This instant access allows researchers to use and build upon the work quickly and easily, increasing code accessibility and accelerating the speed of research….”

bjoern.brembs.blog » How academic institutions neglect their duty

“As the technology for such an infrastructure is available off the shelf and institutions are spending multiple amounts of what would be required on legacy publishers, there remain only social obstacles as to why academic institutions keep neglecting their researchers. Given that institutions have now failed for about 30 years to overcome these obstacles, it is straightforward to propose that mandates and policies be put in place to force institutions (and not researchers!) to change their ways and implement such a basic infrastructure.”

What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers | Fantastic Anachronism

[Some recommendations:]

“Ignore citation counts. Given that citations are unrelated to (easily-predictable) replicability, let alone any subtler quality aspects, their use as an evaluative tool should stop immediately.
Open data, enforced by the NSF/NIH. There are problems with privacy but I would be tempted to go as far as possible with this. Open data helps detect fraud. And let’s have everyone share their code, too—anything that makes replication/reproduction easier is a step in the right direction.
Financial incentives for universities and journals to police fraud. It’s not easy to structure this well because on the one hand you want to incentivize them to minimize the frauds published, but on the other hand you want to maximize the frauds being caught. Beware Goodhart’s law!
Why not do away with the journal system altogether? The NSF could run its own centralized, open website; grants would require publication there. Journals are objectively not doing their job as gatekeepers of quality or truth, so what even is a journal? A combination of taxonomy and reputation. The former is better solved by a simple tag system, and the latter is actually misleading. Peer review is unpaid work anyway, it could continue as is. Attach a replication prediction market (with the estimated probability displayed in gargantuan neon-red font right next to the paper title) and you’re golden. Without the crutch of “high ranked journals” maybe we could move to better ways of evaluating scientific output. No more editors refusing to publish replications. You can’t shift the incentives: academics want to publish in “high-impact” journals, and journals want to selectively publish “high-impact” research. So just make it impossible. Plus as a bonus side-effect this would finally sink Elsevier….”

Knowledge Infrastructure and the Role of the University · Commonplace

“As open access to research information grows and publisher business models adapt accordingly, knowledge infrastructure has become the new frontier for advocates of open science. This paper argues that the time has come for universities and other knowledge institutions to assume a larger role in mitigating the risks that arise from ongoing consolidation in research infrastructure, including the privatization of community platforms, commercial control of analytics solutions, and other market-driven trends in scientific and scholarly publishing….

The research community is rightfully celebrating more open access and open data, yet there is growing recognition in the academic community that pay-to-publish open access is not the panacea people were hoping for when it comes to affordable, sustainable scholarly and scientific publishing. Publication is, after all, only one step in a flow of research communication activities that starts with the collection and analysis of research data and ends with assessment of research impact. Open science is the movement towards open methods, data, and software, to enhance reproducibility, fairness, and distributed collaboration in science. The construct covers such diverse elements as the use of open source software, the sharing of data sets, open and transparent peer review processes, open repositories for the long-term storage and availability of both data and articles, as well as the availability of open protocols and methodologies that ensure the reproducibility and overall quality of research. How these trends can be reconciled with the economic interests of the publishing industry as it is currently organized remains to be seen, but the time is ripe for greater multi-stakeholder coordination and institutional investment in building and maintaining a diversified open infrastructure pipeline.”

Viral Science: Masks, Speed Bumps, and Guard Rails: Patterns

“With the world fixated on COVID-19, the WHO has warned that the pandemic response has also been accompanied by an infodemic: overabundance of information, ranging from demonstrably false to accurate. Alas, the infodemic phenomenon has extended to articles in scientific journals, including prestigious medical outlets such as The Lancet and NEJM. The rapid reviews and publication speed for COVID-19 papers have surprised many, including practicing physicians, for whom the guidance is intended….

The Allen Institute for AI (AI2) and Semantic Scholar launched the COVID-19 Open Research Dataset (CORD-19), a growing corpus of papers (currently 130,000 abstracts plus full-text papers being used by multiple research groups) that are related to past and present coronaviruses.

Using this data, AI2, working with the University of Washington, released a tool called SciSight, an AI-powered graph visualization tool enabling quick and intuitive exploration of associations between biomedical entities such as proteins, genes, cells, drugs, diseases, and patient characteristics as well as between different research groups working in the field. It helps foster collaborations and discovery as well as reduce redundancy….

The research community and scientific publishers working together need to develop and make accessible open-source software tools to permit the dual-track submission discussed above. Repositories such as GitHub are a start….”