Deep Learning in Mining Biological Data | SpringerLink

Abstract:  Recent technological advancements in data acquisition tools allowed life scientists to acquire multimodal data from different biological application domains. Categorized in three broad types (i.e. images, signals, and sequences), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, especially its different architectures, has contributed and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates different DL architectures’ applications to these data. This is followed by an exploration of available open access data sources pertaining to the three data types along with popular open-source DL tools applicable to these data. Also, comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.



Survey on standards for open knowledge exchange now open

“The purpose of this survey is to ascertain the support for such a standard and to identify blockers to implementation. The focus of this work is on interoperability and standards that enable and further open exchange of information, knowledge and data across systems and technologies …”

[2012.13117] Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Abstract:  Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.


Caltech Open-Sources AI for Solving Partial Differential Equations

“Researchers from Caltech’s DOLCIT group have open-sourced Fourier Neural Operator (FNO), a deep-learning method for solving partial differential equations (PDEs). FNO outperforms other existing deep-learning techniques for solving PDEs and is three orders of magnitude faster than traditional solvers….”

Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure / Ford Foundation

“In this extensive report, published with support from the Ford Foundation in 2016, writer and investor Nadia Eghbal explores the lack of institutional support for public code. She unpacks how the system currently functions and the unique challenges it faces, and provides recommendations for how to address the problem.

As Eghbal outlines, digital infrastructure should be treated as a necessary public good. Free public source code makes it exponentially cheaper and easier for companies to build software, and makes technology more accessible across the globe. However, there is a common misconception that the labor for open source projects is well-funded. In reality, it is largely created and maintained by volunteers who do it to build their reputations, out of a sense of obligation, or simply as a labor of love. The decentralized, non-hierarchical nature of the public coding community makes it difficult to secure pay for coders, yet the work that emerges from it is the foundation for a digital capitalist economy. Increasingly, developers are using shared code without contributing to its maintenance, leaving this infrastructure strained and vulnerable to security breaches….

Eghbal emphasizes that because open source thrives on human rather than financial resources, money alone won’t fix the problem. A nuanced understanding of open source culture, and an approach of stewardship rather than control over digital infrastructure are required. She recommends that efforts to fund and support digital infrastructure embrace decentralization, work with existing software communities, and provide long-term, proactive and holistic support. Increasing awareness of the challenges of sustaining digital infrastructure, making it easier for institutions to contribute time and money, expanding and diversifying the pool of open source contributors, and developing best practices and policies across infrastructure projects will all go a long way in building a healthy and sustainable ecosystem.”


Accessing early scientific findings | Early Evidence Base

“Early Evidence Base (EEB) is an experimental platform that combines artificial intelligence with human curation and expert peer-review to highlight results posted in preprints. EEB is a technology experiment developed by EMBO Press and SourceData.

Preprints provide the scientific community with early access to scientific evidence. For experts, this communication channel is an efficient way to access research without delay and thus to accelerate scientific progress. But for non-experts, navigating preprints can be challenging: in the absence of peer review and journal certification, interpreting the data and evaluating the strength of the conclusions is often impossible; finding specific and relevant information in the rapidly accumulating corpus of preprints is becoming increasingly difficult.

The current COVID-19 pandemic has made this tradeoff even more visible. The urgency in understanding and combatting SARS-CoV-2 viral infection has stimulated an unprecedented rate of preprint posting. It has, however, also revealed the risk resulting from the misinterpretation of preliminary results shared in preprints and from the amplification or perpetuation of premature claims by non-experts or the media.

To experiment with ways in which technology and human expertise can be combined to address these issues, EMBO has built the EEB. The platform prioritizes preprints in complementary ways:

Refereed Preprints are preprints that are associated with reviews. EEB prioritizes such preprints and integrates the content of the reviews as well as the authors’ response, when available, to provide rich context and in-depth analyses of the reported research.
To highlight the importance of experimental evidence, EEB automatically highlights and organizes preprints around scientific topics and emergent areas of research.
Finally, EEB provides an automated selection of preprints that are enriched in studies that were peer reviewed, may bridge several areas of research and use a diversity of experimental approaches….”