Introducing TLDRs on Semantic Scholar | by Semantic Scholar | AI2 Blog | Nov, 2020 | Medium

“TLDRs (Too Long; Didn’t Read) are super-short summaries of the main objective and results of a scientific paper, generated using expert background knowledge and the latest GPT-3 style NLP techniques. This new feature is now available in beta for nearly 10 million computer science papers and counting in Semantic Scholar.

Staying up to date with scientific literature is a key part of any researcher’s workflow, and parsing a long list of papers from various sources by reading paper abstracts is time-consuming. The new TLDR feature in Semantic Scholar puts single-sentence, automatically-generated paper summaries right on the search results and author pages, allowing you to quickly locate the right papers and spend your time reading what matters to you….”

Building the Mathematical Library of the Future | Quanta Magazine

“Every day, dozens of like-minded mathematicians gather on an online forum called Zulip to build what they believe is the future of their field.

They’re all devotees of a software program called Lean. It’s a “proof assistant” that, in principle, can help mathematicians write proofs. But before Lean can do that, mathematicians themselves have to manually input mathematics into the program, translating thousands of years of accumulated knowledge into a form Lean can understand.

To many of the people involved, the virtues of the effort are nearly self-evident….”

Research Square Launches Beta Testing of Ripeta’s Open Science Assessment Tool – ripeta

“The first 200 authors who opt in can use this manuscript improvement tool at no cost

Research Square has launched a beta trial of its new automated Open Science Assessment tool, which can help authors enhance the quality of their research and the robustness of their scientific reporting.

This opt-in tool, powered by Ripeta and currently in the beta testing phase, is available at no cost for authors who upload their preprints to the Research Square platform….

Ripeta’s natural language processing technology targets several critical elements of a scientific manuscript, including purpose, data and code availability statements, funding statements, and more to gauge the level of responsible reporting in authors’ scientific papers and suggest improvements….”

The Data Nutrition Project

“A “nutrition label” for datasets.

The Data Nutrition Project aims to create a standard label for interrogating datasets for measures that will ultimately drive the creation of better, more inclusive algorithms.

Our current prototype includes a highly-generalizable interactive data diagnostic label that allows for exploring any number of domain-specific aspects in datasets. Similar to a nutrition label on food, our Dataset Nutrition Label aims to highlight the key ingredients in a dataset such as meta-data and populations, as well as unique or anomalous features regarding distributions, missing data, and comparisons to other ‘ground truth’ datasets. We are currently testing our label on several datasets, with an eye towards open sourcing this effort and gathering community feedback.

The design utilizes a ‘modular’ framework that can be leveraged to add or remove areas of investigation based on the domain of the dataset. For example, Dataset Nutrition Labels for data about people may include modules about the representation of race and gender, while Nutrition Labels for data about trees may not require that module.

To learn more, check out our live prototype built on the Dollars for Docs dataset from ProPublica. A first draft of our paper can be found here….”
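The ‘modular’ framework described in the excerpt above can be pictured with a small sketch. This is not the Data Nutrition Project’s code: the function names (`missing_data_module`, `representation_module`, `build_label`) and the toy records are assumptions made up for illustration. It only shows the idea that a label is assembled from whichever modules suit the dataset’s domain.

```python
from collections import Counter

# Hypothetical sketch of a modular dataset label; not the project's actual implementation.

def missing_data_module(rows):
    """Return the fraction of missing (None) values for each column."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    return {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }

def representation_module(column):
    """Build a module that counts how often each non-missing value of `column` occurs."""
    def module(rows):
        values = (row.get(column) for row in rows)
        return dict(Counter(v for v in values if v is not None))
    return module

def build_label(rows, modules):
    """Run each selected module over the rows and collect the results by module name."""
    return {name: module(rows) for name, module in modules.items()}

# Data about people: include a representation module.
people = [
    {"age": 34, "gender": "F"},
    {"age": None, "gender": "M"},
    {"age": 51, "gender": "F"},
]
people_label = build_label(people, {
    "missing_data": missing_data_module,
    "gender_representation": representation_module("gender"),
})

# Data about trees: the representation module is simply left out.
trees = [
    {"height_m": 12.5, "species": "oak"},
    {"height_m": None, "species": "pine"},
]
tree_label = build_label(trees, {"missing_data": missing_data_module})
```

Adding or removing an area of investigation is then a matter of adding or removing an entry in the `modules` dictionary.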

Golden Raises $14.5M Series A – Crunchbase News

“Golden, a startup aiming to map all human knowledge, raised $14.5 million for its Series A round, the company announced Wednesday….

Golden uses artificial intelligence, machine learning and humans to collect public information on various topics. Most of the information is fragmented, according to CEO Jude Gomila, and Golden compiles that information from sources like news articles and databases.

“The big vision is we’re trying to build a big database of knowledge and this is across everything eventually,” Gomila said, adding that the company is initially focusing on science, technology and companies….”

Opscidia – Free and open access scholarly publishing

“Opscidia is a novel platform for free and Open Access scholarly communication. 

The principle of our platform is to host scientific journals led by an academic editorial committee. Hence, the journal is run by its editorial board while Opscidia provides the software infrastructure, hosts the journal, and assists with the communication of the journal free of charge….”

Artificial Intelligence for Data Discovery and Reuse (AIDR) Symposium 2020

“AIDR (Artificial Intelligence for Data Discovery and Reuse) aims to find innovative solutions to accelerate the dissemination and reuse of scientific data in the data revolution. The explosion in the volume of scientific data has made it increasingly challenging to find data scattered across various platforms. At the same time, increasing numbers of new data formats, greater data complexity, and the lack of consistent data standards across disciplines, metadata, or links between data and publications make it even more challenging to evaluate data quality, reproduce results, and reuse data for new discoveries. Last year, supported by the NSF scientific data reuse initiative, the inaugural AIDR 2019 attracted AI/ML researchers, data professionals, and scientists from biomedicine, the technology industry, high performance computing, astronomy, seismology, library and information science, archaeology, and more, to share innovative AI tools, algorithms and applications to make data more discoverable and reusable, and to discuss mutual challenges in data sharing and reuse.

This year, we are following up with a one-day, virtual AIDR Symposium that provides a place for the community to continue having these conversations and work together to build a healthy data ecosystem. The program will feature invited speakers and panel discussions from a variety of disciplines, including a focused session on COVID-19 data. Audience members are highly encouraged to join the conversation by submitting a poster, joining the panel discussions and social hours, chatting on Slack, and participating in collaborative note-taking.”

ripeta – responsible science

“Ripeta is a credit review for scientific publications. Similar to a financial credit report, which reviews the fiscal health of a person, Ripeta assesses the responsible reporting of the scientific paper. The Ripeta suite identifies and extracts the key components of research reporting, thus drastically shortening and improving the publication process; furthermore, Ripeta’s ability to extract data makes these pieces of text easily discoverable for future use….

Researchers: Rapidly check your pre-print manuscripts to improve the transparency of reporting your research.

Publishers: Improve the reproducibility of the articles you publish with an automated tool that helps evidence-based science.

Funders: Evaluate your portfolio by checking your manuscripts for robust scientific reporting.”

Wellcome and Ripeta partner to assess dataset availability in funded research – Digital Science

“Ripeta and Wellcome are pleased to announce a collaborative effort to assess data and code availability in the manuscripts of funded research projects.

The project will analyze papers funded by Wellcome from the year prior to it establishing a dedicated Open Research team (2016) and from the most recent calendar year (2019). It supports Wellcome’s commitment to maximising the availability and re-use of results from its funded research.

Ripeta, a Digital Science portfolio company, aims to make better science easier by identifying and highlighting the important parts of research that should be transparently presented in a manuscript and other materials.

The collaboration will leverage Ripeta’s natural language processing (NLP) technology, which scans articles for reproducibility criteria. For both data availability and code availability, the NLP will produce a binary yes-no response for the presence of availability statements. Those with a “yes” response will then be categorized by the way that data or code are shared….”
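The two-stage output described in the excerpt above (a yes-no answer, then a category for each “yes”) can be sketched as follows. The excerpt does not describe Ripeta’s actual models or criteria, so the keyword patterns, the category names, and the `assess` function are all assumptions used only to show the shape of the result; simple regular expressions like these would, for example, wrongly accept “no data are available”.

```python
import re

# Hypothetical cue phrases; the real tool's NLP is not described in the excerpt.
AVAILABILITY_CUES = {
    "data": re.compile(
        r"\b(data|datasets?)\b[^.]*\b(available|deposited|accessible)\b", re.I
    ),
    "code": re.compile(
        r"\b(code|scripts?|software)\b[^.]*\b(available|deposited|accessible)\b", re.I
    ),
}

# Hypothetical sharing categories, checked in order.
SHARING_CATEGORIES = [
    ("repository", re.compile(r"\b(github|zenodo|figshare|dryad|osf|repository)\b", re.I)),
    ("on request", re.compile(r"\b(up)?on (reasonable )?request\b", re.I)),
    ("supplementary", re.compile(r"\bsupplementa(ry|l)\b", re.I)),
]

def assess(text, kind):
    """Return (found, category) for an availability statement of `kind` ("data" or "code").

    Stage 1 is the binary check; stage 2 categorizes only sentences that passed stage 1.
    """
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if AVAILABILITY_CUES[kind].search(sentence):
            for label, pattern in SHARING_CATEGORIES:
                if pattern.search(sentence):
                    return True, label
            return True, "unspecified"
    return False, None

example = "The data are available on Zenodo. Code is available upon request."
data_result = assess(example, "data")
code_result = assess(example, "code")
```

Returning the category only alongside a positive first-stage result mirrors the description: a “no” carries no category at all.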

[2008.04541] Comprehensiveness of Archives: A Modern AI-enabled Approach to Build Comprehensive Shared Cultural Heritage

Abstract: Archives play a crucial role in the construction and advancement of society. Humans place a great deal of trust in archives and depend on them to craft public policies and to preserve languages, cultures, self-identity, views and values. Yet, there are certain voices and viewpoints that remain elusive in the current processes deployed in the classification and discoverability of records and archives.

In this paper, we explore the ramifications and effects of centralized, due process archival systems on marginalized communities. There is strong evidence to prove the need for progressive design and technological innovation while in the pursuit of comprehensiveness, equity and justice. Intentionality and comprehensiveness are our greatest opportunity when it comes to improving archival practices and for the advancement and thrive-ability of societies at large today. Intentionality and comprehensiveness are achievable with the support of technology and the Information Age we live in today. Reopening, questioning and/or purposefully including others’ voices in archival processes is the intention we present in our paper.

We provide examples of marginalized communities who continue to lead “community archive” movements in efforts to reclaim and protect their cultural identity, knowledge, views and futures. In conclusion, we offer design and AI-dominant technological considerations worth further investigation in efforts to bridge systemic gaps and build robust archival processes.