Constellate

“Learn how to text mine or improve your skills using our self-guided lessons for all experience levels. Each lesson includes video instruction and your own Jupyter notebook — think of it like an executable textbook — ready to run in our Analytics Lab….

Teach text analytics to all skill levels using our library of open education resources, including lesson plans and our suite of Jupyter notebooks. Eliminate setup time by hosting your class in our Analytics Lab….

Create a ready-to-analyze dataset with point-and-click ease from over 30 million documents, including primary and secondary texts relevant to every discipline and perfect for learning text analytics or conducting original research….

Find patterns in your dataset with ready-made visualizations, or conduct more sophisticated text mining in our Analytics Lab using Jupyter notebooks configured for a range of text analytics methods….”

How data sharing is accelerating railway safety research

“André’s dataset was shortlisted for the Mendeley Data FAIRest Datasets Award, which recognizes researchers who make their data available for the research community in a way that exemplifies the FAIR Data Principles – Findable, Accessible, Interoperable, Reusable. The dataset was applauded for a number of reasons, not least the provision of clear steps to reproduce the data. What’s more, the data was clearly catalogued and stored in sub folders, with additional links to Blender and GitHub, making the dataset easily available and reproducible for all….”

Library Leaders Forum 2020: Community : Internet Archive : Free Download, Borrow, and Streaming : Internet Archive

“Video recording from the Library Leaders Forum: Community session. October 13, 2020.

A community of practice has emerged around Controlled Digital Lending, and its utility for libraries and educators has been amply demonstrated during library and school closures due to COVID-19. There are now hundreds of libraries that are participating in Controlled Digital Lending programs and using the library practice to reach their patrons while service is disrupted. In this session you’ll learn from librarians, educators, and technologists who are developing next generation library tools that incorporate and build upon Controlled Digital Lending….”


Exploration Engines – the koodos collective

“Serendipitous use of the internet is slowly going extinct as we replace link-hopping with the algorithmic-feed. Ranked results and recommendations have become the dominant mode of exploring information online. In this experiment, we break away from this paradigm, and present Wikigraph – our project for Interhackt. While a “search engine” returns a ranked list of results, Wikigraph returns the most relevant sub-graph of pages. Such an application we term an “exploration engine.”…”

Visualizing Altmetric data with VOSviewer – Altmetric

“Visualizations can make data come alive, uncover new insights and capture the imagination in a way that a spreadsheet never can.

Join Mike Taylor, Data Insights & Customer Analytics at Altmetric, and Fabio Gouveia, Public Health Technologist at Oswaldo Cruz Foundation in Brazil, for a demonstration of the exciting ways in which you can create compelling stories to explain the broader impact of academic work using the free-to-download VOSviewer from CWTS Leiden and data from Altmetric.

This actionable webinar will include an introduction to creating network diagrams with VOSviewer with your own data, extracting data from Altmetric tools and adapting it to be imported….”

citizenscience, Twitter, 11/5/2020 4:27:37 AM, 239488

“The graph represents a network of 3,914 Twitter users whose tweets in the requested range contained “citizenscience”, or who were replied to or mentioned in those tweets. The network was obtained from the NodeXL Graph Server on Thursday, 05 November 2020 at 04:07 UTC.

The requested start date was Thursday, 05 November 2020 at 01:01 UTC and the maximum number of days (going backward) was 14.

The maximum number of tweets collected was 7,500.

The tweets in the network were tweeted over the 13-day, 18-hour, 29-minute period from Thursday, 22 October 2020 at 01:42 UTC to Wednesday, 04 November 2020 at 20:11 UTC.

Additional tweets that were mentioned in this data set were also collected from prior time periods. These tweets may expand the complete time period of the data.

There is an edge for each “replies-to” relationship in a tweet, an edge for each “mentions” relationship in a tweet, and a self-loop edge for each tweet that is not a “replies-to” or “mentions”.

The graph is directed.

The graph’s vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.

The graph was laid out using the Harel-Koren Fast Multiscale layout algorithm….”
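The edge rules quoted above (a directed edge per "replies-to" relationship, one per "mentions" relationship, and a self-loop for tweets with neither) can be sketched in plain Python. The tweet fields `author`, `reply_to`, and `mentions` are illustrative assumptions, not NodeXL's actual schema:

```python
# Minimal sketch of NodeXL-style edge construction from tweets.
# Field names are assumptions for illustration only.

def build_edges(tweets):
    """Return a list of directed (source, target, kind) edges."""
    edges = []
    for t in tweets:
        has_relation = False
        if t.get("reply_to"):
            edges.append((t["author"], t["reply_to"], "replies-to"))
            has_relation = True
        for user in t.get("mentions", []):
            edges.append((t["author"], user, "mentions"))
            has_relation = True
        if not has_relation:
            # self-loop for a tweet that neither replies nor mentions
            edges.append((t["author"], t["author"], "self-loop"))
    return edges

tweets = [
    {"author": "alice", "reply_to": "bob", "mentions": []},
    {"author": "bob", "reply_to": None, "mentions": ["carol", "dave"]},
    {"author": "erin", "reply_to": None, "mentions": []},
]
print(build_edges(tweets))
```

Clustering (Clauset-Newman-Moore) and layout (Harel-Koren) would then run on the resulting directed graph.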

The Linked Commons 2.0: What’s New?

This is part of a series of posts introducing the projects built by open source contributors mentored by Creative Commons during Google Summer of Code (GSoC) 2020 and Outreachy. Subham Sahu was one of those contributors and we are grateful for his work on this project.


The CC Catalog data visualization—the Linked Commons 2.0—is a web application which aims to showcase and establish a relationship between the millions of data points of CC-licensed content using graphs. In this blog, I’ll discuss the motivation for this visualization and explore the latest features of the newest edition of the Linked Commons.

Motivation

The number of websites using CC-licensed content is enormous, and snowballing. The CC Catalog collects and stores these millions of data points, and each node (a unit in a data structure) contains information about the URL of the websites and the licenses used. It’s possible to do rigorous data analysis in order to understand fully how these are interconnected and to identify trends, but this would be exclusive to those with a technical background. However, by visualizing the data, it becomes easier to identify broad patterns and trends.

For example, by identifying other websites that link to your content, you can run a targeted outreach program or collaborate with them. Out of the billions of webpages on the web, this lets you focus efficiently on the pages where you are most likely to see growth.

Latest Features

Let’s look at some of the new features in the Linked Commons 2.0.

  • Filtering based on the node name

The Linked Commons 2.0 allows users to search for their favorite node and then explore all of that node’s neighbors among the thousands present in the database. Both the links joining neighbors to the root node and the neighbors themselves are color-coded according to how they connect to the root. This makes it easy for users to sort the neighbors into the two categories at a glance.
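The filter itself amounts to extracting a node's neighbourhood from the full link list. A minimal sketch, assuming links are stored as `{"source", "target"}` pairs (an illustrative shape, not the project's actual schema):

```python
# Sketch of "filter by node name": keep only the searched node's
# direct links and collect its neighbours. Data shape is assumed.

def neighbourhood(links, root):
    """Return the links touching `root` and the set of its neighbours."""
    kept = [l for l in links if root in (l["source"], l["target"])]
    neighbours = {l["target"] if l["source"] == root else l["source"]
                  for l in kept} - {root}
    return kept, neighbours

links = [
    {"source": "flickr.com", "target": "creativecommons.org"},
    {"source": "wikipedia.org", "target": "creativecommons.org"},
    {"source": "flickr.com", "target": "wikipedia.org"},
]
kept, nbrs = neighbourhood(links, "creativecommons.org")
print(sorted(nbrs))  # ['flickr.com', 'wikipedia.org']
```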

  • A sleek and revamped design

The Linked Commons 2.0 has a sleek design, with a clean and refreshing look along with both a light and dark theme.

The Linked Commons new design

  • Tools for smooth interaction with the canvas

The Linked Commons 2.0 ships with a few tools that allow the user to zoom in, zoom out, and reset zoom with just one tap. It is especially useful to users who are on touch devices or using a trackpad.

The Linked Commons toolbox

  • Autocomplete feature

The current database of the Linked Commons 2.0 contains around 240 thousand nodes and 4.14 million links. Some node names are obscure and lengthy, so to spare users the tedium of typing complete node names, this version ships with an autocomplete feature: on every keystroke, matching node names appear that suggest what the user might be looking for.

The Linked Commons autocomplete
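Prefix autocomplete over a couple of hundred thousand names can be served with a sorted list and binary search. A minimal sketch, assuming nothing about the project's actual backend (the domain names are made up):

```python
# Prefix autocomplete via binary search on a sorted name list.
# Because the list is sorted, all matches are contiguous.
import bisect

def autocomplete(sorted_names, prefix, limit=5):
    """Return up to `limit` names starting with `prefix`."""
    start = bisect.bisect_left(sorted_names, prefix)
    out = []
    for name in sorted_names[start:start + limit]:
        if not name.startswith(prefix):
            break
        out.append(name)
    return out

names = sorted(["wikipedia.org", "wikimedia.org", "wikidata.org",
                "flickr.com", "creativecommons.org"])
print(autocomplete(names, "wiki"))
# ['wikidata.org', 'wikimedia.org', 'wikipedia.org']
```

The dedicated server mentioned later in the post could answer each keystroke with a lookup like this in effectively constant time.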

What’s next for the Linked Commons?

In the current version, there are some nodes which are very densely connected. For example, the node “Wikipedia” has around 89k neighbouring nodes and 102k links, which is too many for web browsers to render smoothly. Therefore, we need a way to reduce this to a more reasonable number.
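One possible reduction, assuming each neighbour's own degree is known, is to render only the k best-connected neighbours of a dense node. This is a hypothetical approach, not the project's chosen solution:

```python
# Keep only the k highest-degree neighbours of a dense node.
import heapq

def top_neighbours(neighbour_degrees, k):
    """Return the k neighbours with the highest degree."""
    return heapq.nlargest(k, neighbour_degrees, key=neighbour_degrees.get)

degrees = {"a.org": 5, "b.com": 50, "c.net": 2, "d.org": 9}
print(top_neighbours(degrees, 2))  # ['b.com', 'd.org']
```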

During preprocessing, we dropped more than 3 million nodes which didn’t have CC license information. In general, the current version shows only those nodes which are well linked with other domains and whose license information is available. However, to provide a more complete picture of the CC Catalog, the Linked Commons needs additional filtering methods and other tools. These potentially include:

  • filtering based on Top-Level domain
  • filtering based on the number of web links associated with a node 
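Both proposed filters are straightforward to sketch. The node and link shapes below are assumptions for illustration, not the project's actual data model:

```python
# Sketches of the two proposed filters: by top-level domain,
# and by the number of links associated with a node.
from collections import Counter

def filter_by_tld(nodes, tld):
    """Keep nodes whose name ends in the given top-level domain."""
    return [n for n in nodes if n.rsplit(".", 1)[-1] == tld]

def filter_by_degree(nodes, links, min_links):
    """Keep nodes with at least `min_links` associated links."""
    degree = Counter()
    for l in links:
        degree[l["source"]] += 1
        degree[l["target"]] += 1
    return [n for n in nodes if degree[n] >= min_links]

nodes = ["wikipedia.org", "flickr.com", "europeana.eu"]
links = [{"source": "wikipedia.org", "target": "flickr.com"},
         {"source": "wikipedia.org", "target": "europeana.eu"}]
print(filter_by_tld(nodes, "org"))        # ['wikipedia.org']
print(filter_by_degree(nodes, links, 2))  # ['wikipedia.org']
```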

Contributing

We plan to continue working on the Linked Commons. You can follow the project development by visiting our GitHub repo. We encourage you to contribute to the Linked Commons, by reporting bugs, suggesting features or by helping us write code. The new Linked Commons makes it easy for anyone to set up the development environment.

The project consists of a dedicated server which powers the filtering by node name and query autocompletion. The frontend is built using ReactJS, for smooth rendering performance. So, it doesn’t matter whether you’re a frontend developer, a backend developer, or a designer: there is some part of the Linked Commons that you can work on and improve. We look forward to seeing you on board with sparkling ideas!

We are extremely proud and grateful for the work done by Subham Sahu throughout his 2020 Google Summer of Code internship. We look forward to his continued contributions to the Linked Commons as a project core committer in the CC Open Source Community! 

Please consider supporting Creative Commons’ open source work on GitHub Sponsors.

The post The Linked Commons 2.0: What’s New? appeared first on Creative Commons.

COVID-19 and the boundaries of open science and innovation: Lessons of traceability from genomic data sharing and biosecurity: EMBO reports: Vol 0, No 0

“While conventional policies and systems for data sharing and scholarly publishing are being challenged and new Open Science policies are being developed, traceability should be a key function for guaranteeing socially responsible and robust policies. Full access to the available data and the ability to trace it back to its origins assure data quality and processing legitimacy. Moreover, traceability would be important for other agencies and organisations – funding agencies, database managers, institutional review boards and so on – for undertaking systematic reviews, data curation or process oversights. Thus, the term “openness” means much more than just open access to published data but must include all aspects of data generation, analysis and dissemination along with other organisations and agencies than just research groups and publishers. The COVID?19 crisis has highlighted the challenges and shortfalls of the current notions of openness and it should serve as an impetus to further advance towards real Open Science.”