Computational social science: Obstacles and opportunities | Science

“An alternative has been to use proprietary data collected for market research (e.g., Comscore, Nielsen), with methods that are sometimes opaque and a pricing structure that is prohibitive to most researchers.

We believe that this approach is no longer acceptable as the mainstay of CSS, as pragmatic as it might seem in light of the apparent abundance of such data and limited resources available to a research community in its infancy. We have two broad concerns about data availability and access.

First, many companies have been steadily cutting back data that can be pulled from their platforms (5). This is sometimes for good reasons—regulatory mandates (e.g., the European Union General Data Protection Regulation), corporate scandal (Cambridge Analytica and Facebook)—however, a side effect is often to shut down avenues of potentially valuable research. The susceptibility of data availability to arbitrary and unpredictable changes by private actors, whose cooperation with scientists is strictly voluntary, renders this system intrinsically unreliable and potentially biased in the science it produces.

Second, data generated by consumer products and platforms are imperfectly suited for research purposes (6). Users of online platforms and services may be unrepresentative of the general population, and their behavior may be biased in unknown ways. Because the platforms were never designed to answer research questions, the data of greatest relevance may not have been collected (e.g., researchers interested in information diffusion count retweets because that is what is recorded), or may be collected in a way that is confounded by other elements of the system (e.g., inferences about user preferences are confounded by the influence of the company’s ranking and recommendation algorithms). The design, features, data recording, and data access strategy of platforms may change at any time because platform owners are not incentivized to maintain instrumentation consistency for the benefit of research.

For these reasons, research derived from such “found” data is inevitably subject to concerns about its internal and external validity, and platform-based data, in particular, may suffer from rapid depreciation as those platforms change (7). Moreover, the raw data are often unavailable to the research community owing to privacy and intellectual property concerns, or may become unavailable in the future, thereby impeding the reproducibility and replication of results….

Despite the limitations noted above, data collected by private companies are too important, too expensive to collect by any other means, and too pervasive to remain inaccessible to the public and unavailable for publicly funded research (8). Rather than eschewing collaboration with industry, the research community should develop enforceable guidelines around research ethics, transparency, researcher autonomy, and replicability. We anticipate that many approaches will emerge in coming years that will be incentive compatible for involved stakeholders….

Privacy-preserving, shared data infrastructures, designed to support scientific research on societally important challenges, could collect scientifically motivated digital traces from diverse populations in their natural environments, as well as enroll massive panels of individuals to participate in designed experiments in large-scale virtual labs. These infrastructures could be driven by citizen contributions of their data and/or their time to support the public good, or in exchange for explicit compensation. These infrastructures should use state-of-the-art security, with an escalation checklist of security measures depending on the sensitivity of the data. These efforts need to occur at both the university and cross-university levels. Finally, these infrastructures should capture and document the metadata that describe the data collection process and incorporate sound ethical principles for data collection and use….”

Dear Colleague Letter

“We would like to inform you about an upcoming major transition for the Journal of Field Robotics.

After 15 years of service, the publisher, John Wiley and Sons, has decided not to renew the contract of the Editor in Chief (Sanjiv Singh) and the Managing Editor (Sanae Minick), and hence our term will expire at the end of 2020.

This comes after two years of discussions in which new Wiley representatives and the Editorial Board failed to converge on a common set of principles and procedures by which the journal should operate. The Editorial Board has unanimously decided to resign….

While this moment calls for creativity and collaboration with the scholarly community to find new models, Wiley is intent on making broad changes to the way that the Journal of Field Robotics is operated, guided mostly by an economic calculation to increase revenue and decrease costs. To do this, they have unilaterally decided to change the terms of a contract that had remained constant since the JFR was started in 2005. Wiley has confronted a similar case (European Law Journal) with a similar effect: the entire editorial board resigned in January 2020….”

Los Alamos National Laboratory Jobs – Digital Library Infrastructure Engineer (Software Developer 2/3) in Los Alamos, New Mexico, United States

“The Research Library ( https://www.lanl.gov/library/ ) seeks a Digital Library Infrastructure Engineer to help imagine, create, and sustain its digital library infrastructure. We support the Laboratory’s paramount mission to solve national security challenges through scientific excellence by delivering essential knowledge services. The durable value of these services depends on a foundation of effective and efficient software infrastructure, for which this role is instrumental.

This Software Engineer will work on a variety of projects supporting management, curation, discovery, dissemination, and preservation of institutional scientific content. Current initiatives involve upgrading specialized content discovery platforms, re-engineering data pipelines, modernizing core repository services, and adapting Agile software development and DevOps practices to our local context….”

Publishing computational research – a review of infrastructures for reproducible and transparent scholarly communication | Research Integrity and Peer Review

Abstract

Background

The trend toward open science increases the pressure on authors to provide access to the source code and data they used to compute the results reported in their scientific papers. Since sharing materials reproducibly is challenging, several projects have developed solutions to support the release of executable analyses alongside articles.

Methods

We reviewed 11 applications that can assist researchers in adhering to reproducibility principles. The applications were found through a literature search and interactions with the reproducible research community. An application was included in our analysis if it (i) was actively maintained at the time the data for this paper were collected, (ii) supported the publication of executable code and data, and (iii) was connected to the scholarly publication process. By investigating the software documentation and published articles, we compared the applications across 19 criteria, such as deployment options and features that support authors in creating, and readers in studying, executable papers.

Results

Of the 11 applications, eight allow publishers to self-host the system for free, whereas three provide paid services. Authors can submit an executable analysis using Jupyter Notebooks or R Markdown documents (10 applications support these formats). All approaches provide features to assist readers in studying the materials, e.g., one-click reproducible results or tools for manipulating the analysis parameters. Six applications allow for modifying materials after publication.
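
To make these results concrete, here is an invented sketch of the kind of executable analysis an author might submit as a single Jupyter Notebook cell; the data and the exposed threshold parameter are hypothetical, not drawn from the review:

import statistics  # standard library only, so the cell runs anywhere

# Toy data bundled with the (hypothetical) paper.
measurements = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4]
# An exposed parameter a reader could change before re-running the cell.
threshold = 2.0

above = [m for m in measurements if m > threshold]
print(f"mean = {statistics.mean(measurements):.2f}")
print(f"{len(above)} of {len(measurements)} measurements exceed {threshold}")

Re-running such a cell with a different threshold is exactly the kind of one-click reproduction and parameter manipulation the reviewed applications aim to support.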

Conclusions

The applications support authors in publishing reproducible research, predominantly through literate programming. For readers, most applications provide user interfaces to inspect and manipulate the computational analysis. The next step is to investigate the gaps identified in this review, such as the costs publishers should expect when hosting an application, the handling of sensitive data, and impacts on the review process.

ACM Signs New Open Access Agreements with Four Leading Universities | MIT Libraries News

“ACM, the Association for Computing Machinery, entered into transformative open access agreements with several of its largest institutional customers, including the University of California (UC), Carnegie Mellon University (CMU), Massachusetts Institute of Technology (MIT), and Iowa State University (ISU). The agreements, which run for three-year terms beginning January 1, 2020, cover both access to and open access publication in ACM’s journals, proceedings and magazines for these universities, and represent the first transformative open access agreements for ACM….”

About ACM’s Decision to Sign Letters Regarding OSTP’s Proposal to Mandate Zero Embargo of Research Articles

“There have been some strong reactions to ACM’s decision to sign on to letters to the White House Office of Science and Technology Policy (OSTP) as a response to a new directive that OSTP is preparing to issue. That directive would eliminate the current 12-month embargo period for opening U.S. federally funded research publications.

ACM both supports and enables open access models and has worked to support a long and growing list of open access initiatives (see https://www.acm.org/publications/openaccess), doing so in a responsible and sustainable way. For the past decade, all ACM authors have had the right to post accepted versions of their articles in pre-print servers, personal websites, funder websites, and institutional repositories with a zero embargo. More recently, for example, ACM has introduced the OpenTOC service that enables free full-text downloads from links on conference websites immediately upon publication.

It is important to understand why ACM opted to sign the letters opposed to the OSTP zero-embargo directive. A long dialogue between OSTP and scholarly publishers led to broad agreement on the current policy (from 2013) of a 12-month embargo for digital libraries. However, due process was not followed for the proposed change to zero embargo. The new directive fails to take into account the significant progress that ACM and other societies have made with respect to open access publication since 2013, and there was no dialogue with stakeholders prior to proposing the change.”

Free Machine Learning Repository Increases Accessibility in Genome Research | Technology Networks

“Although the importance of machine learning methods in genome research has grown steadily in recent years, researchers have often had to resort to using obsolete software. Scientists in clinical research often did not have access to the most recent models. This will change with the new free, open-access repository: Kipoi.

Kipoi enables an easy exchange of machine learning models in the field of genome research. The repository was created by Julien Gagneur, Assistant Professor of Computational Biology at the TUM, in collaboration with researchers from the University of Cambridge, Stanford University, the European Bioinformatics Institute (EMBL-EBI) and the European Molecular Biology Laboratory (EMBL)….”
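
As a rough illustration of what such an exchange looks like in practice, the sketch below assumes Kipoi's Python client; the model name and dataloader argument are hypothetical placeholders, since each Kipoi model documents its own required inputs:

import kipoi  # Kipoi's Python client

# Fetch a published model by name; weights are downloaded on first use.
# "SomeModel" is a placeholder, not a specific entry in the repository.
model = kipoi.get_model("SomeModel")

# Run the model's bundled dataloader-plus-predict pipeline on local input;
# the argument name is illustrative and varies from model to model.
predictions = model.pipeline.predict({"fasta_file": "sequences.fa"})

Retrieving a model by name, with its dataloader shipped alongside the trained weights, is what makes models directly reusable across labs rather than tied to one group's software environment.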

arXiv Update – January 2019 – arXiv public wiki

“In 2018, the repository received 140,616 new submissions, a 14% increase from 2017. The subject distribution is evolving, with Computer Science representing about 26% of overall submissions and Math 24%. There were about 228 million downloads from all over the world. arXiv is truly a global resource, with almost 90% of supporting funds coming from sources other than Cornell and 70% of institutional use coming from countries other than the U.S….”

Could This Search Engine Save Your Life? – The Chronicle of Higher Education

“One of the Allen Institute’s priorities is an academically oriented search engine, established in 2015, called Semantic Scholar (slogan: “Cut through the clutter”). The need is great, with more than 34,000 peer-reviewed journals publishing 2.5 million articles a year. “What if a cure for an intractable cancer is hidden within the tedious reports on thousands of clinical studies?” Etzioni once said.

Although Semantic Scholar has focused so far on computer and biomedical sciences, Etzioni says that the engine will soon push into the social sciences and the humanities as well. The Chronicle spoke with him about information overload, impact factors’ imperfect inevitability, and the promise and perils of AI….”

Open Access for Impact: How Michael Nielsen Reached 3.5M Readers – SPARC

“Michael Nielsen recognizes that Open Access is often argued about in the abstract. To help the discussion move from the conceptual to the concrete, he recently decided to openly share his experience of writing an open-access book, “Neural Networks and Deep Learning” (http://neuralnetworksanddeeplearning.com/chap1.html), to illustrate the positive impact and far reach of online publishing….”