Exploring Canadiana: A Use Case for Wikidata – Hanging Together

“My colleagues Jean Godby, Karen Smith-Yoshimura, and Bruce Washburn, along with a host of partners, have just released Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, a fascinating account of their experiences working with a customized instance of Wikibase to create resource descriptions in the form of linked data. In the spirit of their report, I’d like to offer a modest yet illustrative use case showing how access to the relationships and properties of the linked data in another Wikibase environment – Wikidata – smoothed the way for OCLC Research’s recent study of the Canadian presence in the published record.

Maple Leaves: Discovering Canada Through the Published Record is the latest in a series of OCLC Research studies that explore national contributions to the world’s accumulated body of published materials. A national contribution is defined as materials published in, about, and/or by the people of that country. The last category presents a special challenge: how to assemble a list of entities – people and organizations – associated with a particular country from which authors, musicians, filmmakers, and other creators of published works can be identified?…”

2019:GLAM/Public Domain Awareness Project: enhancing use of CC’s Public Domain tools to serve the needs of GLAM institutions and reusers – Wikimania

“Making assessments about the copyright status of a work remains a challenge notwithstanding the tools that CC has developed over the years, such as the Public Domain Mark and CC0. It is also hard to communicate to end users about the laws that apply to their particular use of a work. Copyright is jurisdiction based, which means each country has its own copyright and public domain rules. These differing laws present challenges for digitizers of content and reusers of digital online surrogates.

Several efforts and projects offer partial solutions for these challenges; however, they tend to serve single-jurisdiction or regional needs, are loosely coordinated, and are not integrated into a unified solution that works from the moment of digitization through to the public that encounters the content over the Internet. Ideally, the public domain would be the easiest part of the knowledge commons to assess and reuse, but the current environment makes it challenging at each stage in the process of getting that content to the public.

Creative Commons and other key stakeholders such as Wikimedia brought forth this Project for initial discussion with our community and stakeholders at the CC 2019 Global Summit in Lisbon. The outcomes of the four-hour session at the Summit can be found here.

At this session, we expect to follow up on some of the data modelling challenges related to the Help:Copyrights page on Wikidata. We want to gather feedback and input from the community that is working at the intersection of GLAM institutions and Wikidata.

Creative Commons will bring some of its legal expertise on copyright and open licensing, and we expect to engage more with the Wikidata community to leverage the different languages and community needs, and better refine our initial project….”

Wikidata:From “an” Identifier to “the” Identifier

Abstract:  Library catalogues may be connected to the linked data cloud through various types of thesauri. For name authority thesauri in particular I would like to suggest a fundamental break with the current distributed linked data paradigm: to make a transition from a multitude of different identifiers to using a single, universal identifier for all relevant named entities, in the form of the Wikidata identifier. Wikidata (https://wikidata.org) seems to be evolving into a major authority hub that is lowering barriers to access the web of data for everyone. Using the Wikidata identifier of notable entities as a common identifier for connecting resources has significant benefits compared to traversing the ever-growing linked data cloud. When the use of Wikidata reaches a critical mass, for some institutions, Wikidata could even serve as an authority control mechanism.
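The “authority hub” idea above can be made concrete: a single Wikidata item carries statements linking out to many existing authority files, so one QID can stand in for a bundle of institution-specific identifiers. Below is a minimal Python sketch of that lookup, operating on a trimmed, hand-made sample shaped like the claims section of a Wikidata `wbgetentities` API response; the property IDs P214 (VIAF ID) and P244 (Library of Congress authority ID) are real Wikidata properties, but the identifier values shown are illustrative.

```python
# Hand-made sample mimicking the "claims" structure returned by the
# Wikidata wbgetentities API for a person entity (heavily trimmed).
SAMPLE_CLAIMS = {
    "P214": [{"mainsnak": {"datavalue": {"value": "113230702"}}}],   # VIAF ID
    "P244": [{"mainsnak": {"datavalue": {"value": "n80076765"}}}],   # LC authority ID
}

# Authority-file properties we care about, mapped to readable labels.
AUTHORITY_PROPS = {"P214": "VIAF", "P244": "LC/NACO"}

def external_identifiers(claims: dict) -> dict:
    """Collect external authority-file identifiers from an entity's claims."""
    found = {}
    for prop, label in AUTHORITY_PROPS.items():
        for statement in claims.get(prop, []):
            value = statement["mainsnak"]["datavalue"]["value"]
            found.setdefault(label, []).append(value)
    return found

print(external_identifiers(SAMPLE_CLAIMS))
```

In a live workflow the `claims` dictionary would come from a call to the Wikidata API for a given QID; the point of the sketch is that one Wikidata identifier resolves to every linked authority identifier at once, rather than requiring a traversal across separate linked-data endpoints.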

RightsStatements in Wikidata

“We are pleased to report that the volunteer community behind Wikidata – the freely licensed structured database of information, sister to Wikipedia – has recently approved the creation of a dedicated metadata Property for RightsStatements. P6426 to be precise. This will increase the chances that accurate, understandable, and precise rights-labelling information about cultural heritage works will be findable by end-users.

Here Liam Wyatt explains how this change came about, and what it means for cultural heritage organisations around the world who contribute items to Wikidata….”

An International Knowledge Base for all Heritage Institutions (Part 1*) – SocietyByte

Heritage institutions are places in which works of art, historical records, and other objects of cultural or scientific interest are sheltered and made accessible to the public. The equivalent in the digital world is already taking shape, through digitization and the sharing of born-digital or digitized objects on online platforms. In this article we shed light on how the issue of structured data about heritage institutions is being tackled by Wikipedia and its sister Wikidata, through their “Sum of All GLAM” project.[1]

Access to these objects, and information about them, is provided and mediated both through platforms maintained by the heritage sector itself and through more general-purpose platforms, which often serve as a first point of entry for the wider public. These platforms include Google, Facebook, YouTube, and Wikipedia, which also happen to be among the most visited websites on the Web. In this emerging data and platform ecosystem, Wikipedia and related Wikimedia projects play a special role as they are community-driven, non-profit endeavours. Moreover, these projects are working hard to make data and information available in a free, connected and structured manner, for anybody to re-use.

There are various layers of information about heritage institutions, ranging from descriptions of institutions themselves and descriptions of their collections, to descriptions of individual items. There may be digital representations of these items, and in some cases even searchable content within the items. Figure 1 illustrates how the top four layers of data and information are currently addressed in Wikipedia, with Wikidata and Wikimedia Commons increasingly focussing on providing structured and linked data alongside the unstructured or semi-structured encyclopaedic information contained in Wikipedia articles….”

Enriching Bibliographic Data by Combining String Matching and the Wikidata Knowledge Graph to Improve the Measurement of International Research Collaboration

Abstract:  Measuring international research collaboration is necessary when evaluating, for example, the efficacy of policy meant to increase cooperation between countries, but is currently very difficult as bibliographic records contain only affiliation data from which there is no standard method to identify the relevant countries. In this paper we describe a method to address this difficulty, and evaluate it using both general and domain-specific data sets.
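The string-matching half of the method described above can be illustrated with a toy sketch: scan each free-text affiliation string for cue words drawn from a country gazetteer. The gazetteer entries below are illustrative and tiny; the paper’s actual approach additionally consults the Wikidata knowledge graph to resolve institutions whose names never mention a place or country.

```python
# Toy gazetteer mapping lowercase cue words (countries, cities) to
# ISO-style country codes. Entries are illustrative only.
GAZETTEER = {
    "canada": "CA", "toronto": "CA",
    "germany": "DE", "berlin": "DE",
    "japan": "JP", "tokyo": "JP",
}

def countries_from_affiliation(affiliation: str) -> set:
    """Return the country codes whose cue words appear in the affiliation."""
    tokens = affiliation.lower().replace(",", " ").split()
    return {code for word, code in GAZETTEER.items() if word in tokens}

print(countries_from_affiliation("University of Toronto, Canada"))   # {'CA'}
print(countries_from_affiliation("Freie Universität Berlin"))        # {'DE'}
```

The hard cases are affiliations like “MIT” or “ETH” that match no cue word at all, which is where linking the institution name to a Wikidata item (and reading off its country statement) comes in.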

Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata

Abstract:  Knowledge workers like researchers, students, journalists, research evaluators or funders need tools to explore what is known, how it was discovered, who made which contributions, and where the scholarly record has gaps. Existing tools and services of this kind are not available as Linked Open Data, but Wikidata is. It has the technology, active contributor base, and content to build a large-scale knowledge graph for scholarship, also known as WikiCite. Scholia visualizes this graph in an exploratory interface with profiles and links to the literature. However, it is just a working prototype. This project aims to “robustify Scholia” with back-end development and testing based on pilot corpora. The main objective at this stage is to attain stability in challenging cases such as server throttling and handling of large or incomplete datasets. Further goals include integrating Scholia with data curation and manuscript writing workflows, serving more languages, generating usage stats, and documentation.