Linked Research on the Decentralised Web

Abstract:  This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving.

The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems.

From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts, and how they can be stored, linked, and shared in a way that is controlled by individual researchers or delegated to trusted parties. Further, I find that the ecosystem lacks a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination and collaboration scenarios, as well as in archival activities and scientific experiments.
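The Linked Data Notifications flow has two steps: a sender discovers a resource's inbox (for example from an HTTP `Link` header with `rel="http://www.w3.org/ns/ldp#inbox"`), then POSTs a JSON-LD payload to it. The sketch below, in Python, illustrates both steps under assumed example URLs; the ActivityStreams `Announce` shape is one common payload, not the only one the protocol permits.

```python
import re

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

def parse_inbox_link(link_header):
    # Extract the inbox target URL from an HTTP Link header, e.g.
    # '<https://example.org/inbox/>; rel="http://www.w3.org/ns/ldp#inbox"'
    for part in link_header.split(","):
        m = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if m and LDP_INBOX in m.group(2).split():
            return m.group(1)
    return None

def make_announce(actor, obj, target):
    # A minimal JSON-LD notification body (ActivityStreams 2.0 Announce);
    # the URLs supplied by the caller are illustrative.
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": actor,
        "object": obj,
        "target": target,
    }
```

A sender would then POST the serialised body to the discovered inbox with `Content-Type: application/ld+json`; a conforming receiver responds `201 Created` with the URL of the stored notification.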

Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.
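Annotations in this kind of tooling follow the W3C Web Annotation Data Model: a JSON-LD resource with a body and a target. The sketch below builds such a minimal annotation in Python; the function name and example URLs are illustrative, not dokieli's actual API.

```python
import json

def make_annotation(body_text, target_url, creator):
    # Minimal Web Annotation (JSON-LD). "bodyValue" is the spec's
    # shortcut for a plain-text body; richer bodies use "body".
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": creator,
        "bodyValue": body_text,
        "target": target_url,
    }

anno = make_annotation("Interesting claim.",
                       "https://example.org/article#para-3",
                       "https://example.org/profile#me")
payload = json.dumps(anno)  # ready to store at a location the annotator controls
```

Because the annotation is a standalone resource identified by a URL, it can live on the annotator's own server rather than inside the annotated article's platform, which is what makes the interaction decentralised.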

The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data.

Technical solutions alone do not suffice, of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud).

Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.

Reaping the benefits of Open Data in public health

Abstract:  Open Data is part of a broad global movement that is not only advancing science and scientific communication but also transforming modern society and how decisions are made. What began with a call for Open Science and the rise of online journals has extended to Open Data, based on the premise that if reports on data are open, then the generated or supporting data should be open as well. There have been a number of advances in Open Data over the last decade, spearheaded largely by governments. A real benefit of Open Data is not simply that single databases can be used more widely; it is that these data can also be leveraged, shared and combined with other data. Open Data facilitates scientific collaboration, enriches research and advances analytical capacity to inform decisions. In the human and environmental health realms, for example, the ability to access and combine diverse data can advance early signal detection, improve analysis and evaluation, inform program and policy development, increase capacity for public participation, enable transparency and improve accountability. However, challenges remain. Enormous resources are needed to make the technological shift to open and interoperable databases accessible with common protocols and terminology. Amongst data generators and users, this shift also involves a cultural change: from regarding databases as restricted intellectual property, to considering data as a common good. There is a need to address legal and ethical considerations in making this shift. Finally, along with efforts to modify infrastructure and address the cultural, legal and ethical issues, it is important to share the information equitably and effectively. While there is great potential in the open, timely, equitable and straightforward sharing of data, fully realizing the myriad benefits of Open Data will depend on how effectively these challenges are addressed.

Pubfair – A Framework for Sustainable, Distributed, Open Science Publishing Services

“This white paper provides the rationale and describes the high level architecture for an innovative publishing framework that positions publishing functionalities on top of the content managed by a distributed network of repositories. The framework is inspired by the vision and use cases outlined in the COAR Next Generation Repositories work, first published in November 2017 and further articulated in a funding proposal developed by a number of European partners.

By publishing this on CommentPress, we are seeking community feedback about the Pubfair framework in order to refine the functionalities and architecture, as well as to gauge community interest….

The idea of Pubfair is not to create another new system that competes with many others, but rather to leverage, improve and add value to existing institutional and funder investments in research infrastructures (in particular open repositories and open journal platforms). Pubfair positions repositories (and the content managed by repositories) as the foundation for a distributed, globally networked infrastructure for scholarly communication. It moves our thinking beyond the artificial distinction between green and gold open access by combining the strengths of open repositories with easy-to-use review and publishing tools for a multitude of research outputs….”

Workflow systems turn raw data into scientific knowledge

“Finn is head of the sequence-families team at the European Bioinformatics Institute (EBI) in Hinxton, UK; Meyer is a computer scientist at Argonne National Laboratory in Lemont, Illinois. Both run facilities that let researchers perform a computationally intensive process called metagenomic analysis, which allows microbial communities to be reconstructed from shards of DNA. It would be helpful, they realized, if they could try each other’s code. The problem was that their analytical ‘pipelines’ — the carefully choreographed computational steps required to turn raw data into scientific knowledge — were written in different languages. Meyer’s team was using an in-house system called AWE, whereas Finn was working with nearly 9,500 lines of Python code.

“It was a horrible Python code base,” says Finn — complicated, and difficult to maintain. “Bits had been bolted on in an ad hoc fashion over seven years by at least four different developers.” And it was “heavily tied to the compute infrastructure”, he says, meaning it was written for specific computational resources and a particular way of organizing files, and thus essentially unusable outside the EBI. Because the EBI wasn’t using AWE, the reverse was also true. Then Finn and Meyer learnt about the Common Workflow Language (CWL).

CWL is a way of describing analytical pipelines and computational tools — one of more than 250 systems now available, including such popular options as Snakemake, Nextflow and Galaxy. Although they speak different languages and support different features, these systems have a common aim: to make computational methods reproducible, portable, maintainable and shareable. CWL is essentially an exchange language that researchers can use to share pipelines, whichever system they use. For Finn, that language brought sanity to his codebase, reducing it by around 73%. Importantly, it has made it easier to test, execute and share new methods, and to run them on the cloud….”
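To make the idea of an exchange language concrete, here is a sketch (not from the article) of a minimal CWL `CommandLineTool` description, built as a plain Python dict and serialised to JSON — CWL documents may be written in either YAML or JSON. The tool simply wraps `echo`.

```python
import json

# Minimal CWL description of a command-line tool: what to run
# (baseCommand), what inputs it takes, and what it outputs.
echo_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    "outputs": {
        "out": {"type": "stdout"},
    },
}

with open("echo-tool.cwl", "w") as f:
    json.dump(echo_tool, f, indent=2)
```

Any CWL-aware engine can then execute the same description, e.g. `cwltool echo-tool.cwl --message "hello"` — which is precisely what lets pipelines move between groups like Finn's and Meyer's.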

Crossing the Borders: Re-Use of Smart Learning Objects in Advanced Content Access Systems

Abstract:  Researchers in many disciplines are developing novel interactive smart learning objects like exercises and visualizations. Meanwhile, Learning Management Systems (LMS) and eTextbook systems are also becoming more sophisticated in their ability to use standard protocols to make use of third-party smart learning objects. But at this time, educational tool developers do not always make best use of the interoperability standards and need exemplars to guide and motivate their development efforts. In this paper we present a case study where two large educational ecosystems use the Learning Tools Interoperability (LTI) standard to allow cross-sharing of their educational materials. At the end of our development process, Virginia Tech’s OpenDSA eTextbook system became able to import materials from Aalto University’s ACOS smart learning content server, such as Python programming exercises and Parsons problems. Meanwhile, University of Pittsburgh’s Mastery Grids (which already uses the ACOS exercises) was made to support CodeWorkout programming exercises (a system already used within OpenDSA). Thus, four major projects in CS Education became interoperable.
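The interoperability the abstract describes rests on the LTI 1.1 launch, in which the tool consumer signs a form POST with OAuth 1.0 HMAC-SHA1 so the content provider can verify its origin. The Python sketch below shows only that signing step; the URL, parameter names, and the assumption of no token secret are illustrative, not taken from the paper.

```python
import base64
import hashlib
import hmac
import urllib.parse

def lti_signature(url, params, consumer_secret, method="POST"):
    # OAuth 1.0 HMAC-SHA1 signature as used by LTI 1.1 launches.
    enc = lambda s: urllib.parse.quote(str(s), safe="")
    # Signature base string: METHOD & encoded URL & encoded, sorted params.
    pairs = sorted((enc(k), enc(v)) for k, v in params.items())
    param_str = "&".join(f"{k}={v}" for k, v in pairs)
    base = "&".join([method.upper(), enc(url), enc(param_str)])
    key = enc(consumer_secret) + "&"  # basic launch: no token secret
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The provider recomputes the same signature from the received parameters and its copy of the shared secret; a match authorises the launch without any shared user database, which is what allows systems like OpenDSA, ACOS, Mastery Grids, and CodeWorkout to embed one another's exercises.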

African Principles for Open Access in Scholarly Communication – AfricArXiv

“1) Academic research and knowledge from and about Africa should be freely available to all who wish to access, use or reuse it while at the same time being protected from misuse and misappropriation.

2) African scientists and scientists working on African topics and/or territory will make their research achievements, including underlying datasets, available in a digital Open Access repository or journal, with an explicit Open Access license applied.

3) African research output should be made available in the principal common language of the global science community as well as in one or more local African languages – at least in summary.

4) It is important to take indigenous and traditional knowledge, in its various forms, into consideration in these discussions.

5) It is necessary to respect the diverse dynamics of knowledge generation and circulation by discipline and geographical area.

6) It is necessary to recognise, respect and acknowledge the regional diversity of African scientific journals, institutional repositories and academic systems.

7) African Open Access policies and initiatives promote Open Scholarship, Open Source and Open Standards for interoperability purposes.

8) Multi-stakeholder mechanisms for collaboration and cooperation should be established to ensure equal participation across the African continent.

9) Economic investment in Open Access is consistent with its benefit to societies on the African continent – therefore institutions and governments in Africa provide the enabling environment, infrastructure and capacity building required to support Open Access.

10) African Open Access stakeholders and actors keep up close dialogues with representatives from all world regions, namely Europe, the Americas, Asia, and Oceania….”