“Mariel Borowitz’s new book, Open Space: The Global Effort for Open Access to Environmental Satellite Data, traces the history of environmental satellite data-sharing policies, offering a model of data-sharing policy development, case studies, and practical recommendations for increasing global data sharing. Below, she writes about why some countries have adopted an open data policy while others have not.”
“Techdirt has been writing about the (slow but steady) rise of open access for a decade. That’s as long as the Annual International Open Access Week has been running. Cambridge University came up with quite a striking way to join in the celebrations: Stephen Hawking’s PhD thesis, ‘Properties of expanding universes’, has been made freely available to anyone, anywhere in the world, after being made accessible via the University of Cambridge’s Open Access repository, Apollo. The 1966 doctoral thesis by the world’s most recognisable scientist is the most requested item in Apollo, with the catalogue record alone attracting hundreds of views per month. In just the past few months, the University has received hundreds of requests from readers wishing to download Professor Hawking’s thesis in full. The idea has been quite a hit, literally: demand for Hawking’s thesis was so great on Monday that it hit the Apollo server hard enough to take it offline for a while. The Guardian reported:”
In November 2014, in a move that was both unexpected and unprecedented for the field of particle physics, the Compact Muon Solenoid (CMS) experiment (one of the main detectors at the world’s largest particle accelerator, the Large Hadron Collider) released an immense amount of data to the public through a website called the CERN Open Data Portal.
The data, recorded and processed in 2010, amounted to about 29 terabytes of information drawn from 300 million individual collisions of high-energy protons within the CMS detector. The release marked the first time any major particle collider experiment had made such a cache of data available to the general public.
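As a rough sense of scale, the two figures above imply an average of roughly 100 kilobytes per collision. The sketch below is only an order-of-magnitude estimate: it assumes decimal units (1 TB = 10^12 bytes) and that the quoted ~29 TB is spread evenly over the ~300 million collisions, whereas real per-event sizes vary.

```python
# Back-of-the-envelope estimate of the average data volume per collision in the
# 2010 CMS open-data release. Assumptions: decimal units (1 TB = 10**12 bytes)
# and an even spread of the quoted total over all collisions.
TOTAL_BYTES = 29 * 10**12      # "about 29 terabytes"
N_COLLISIONS = 300 * 10**6     # "300 million individual collisions"


def bytes_per_collision(total_bytes: int, n_collisions: int) -> float:
    """Return the mean number of bytes stored per collision event."""
    return total_bytes / n_collisions


avg = bytes_per_collision(TOTAL_BYTES, N_COLLISIONS)
print(f"{avg / 1000:.1f} kB per collision")  # prints "96.7 kB per collision"
```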
A new study by Jesse Thaler, an associate professor of physics at MIT and a long-time advocate for open access in particle physics, and his colleagues now demonstrates the scientific value of this move. In a paper published in Physical Review Letters, the researchers used the CMS data to reveal, for the first time, a universal feature within jets of subatomic particles, which are produced when high-energy protons collide. Their effort represents the first independent, published analysis of the CMS open data.
“In our field of particle physics, there isn’t the tradition of making data public,” says Thaler. “To actually get data publicly with no other restrictions — that’s unprecedented.”
Part of the reason groups at the Large Hadron Collider and other particle accelerators have kept a proprietary hold over their data is the concern that such data could be misinterpreted by people who may not fully understand the physical detectors and how their various complex properties may influence the data produced.
“The worry was, if you made the data public, then you would have people claiming evidence for new physics when actually it was just a glitch in how the detector was operating,” Thaler says. “I think it was believed that no one could come from the outside and do those corrections properly, and that some rogue analyst could claim existence of something that wasn’t really there.”
“This is a resource that we now have, which is new in our field,” Thaler adds. “I think there was a reluctance to try to dig into it, because it was hard. But our work here shows that we can understand in general how to use this open data, that it has scientific value, and that this can be a stepping stone to future analysis of more exotic possibilities.”
Thaler’s co-authors are Andrew Larkoski of Reed College, Simone Marzani of the State University of New York at Buffalo, and Aashish Tripathee and Wei Xue of MIT’s Center for Theoretical Physics and Laboratory for Nuclear Science.
Seeing fractals in jets
When the CMS collaboration publicly released its data in 2014, Thaler sought to apply new theoretical ideas to analyze the information. His goal was to use novel methods to study jets produced from the high-energy collision of protons.
Protons are essentially accumulations of even smaller subatomic particles called quarks and gluons, which are bound together by interactions known in physics parlance as the strong force. One feature of the strong force that has been known to physicists since the 1970s describes the way in which quarks and gluons repeatedly split and divide in the aftermath of a high-energy collision.
This feature can be used to predict the energy imparted to each particle as it cleaves from a mother quark or gluon. In particular, physicists can use an equation, known as an evolution equation or splitting function, to predict the pattern of particles that spray out from an initial collision, and therefore the overall structure of the jet produced.
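For readers who want to see such an equation, the textbook leading-order example is a quark splitting into a quark and a gluon. This is standard QCD rather than something stated in the article: z is the fraction of the parent's momentum carried by the emitted gluon, θ is the opening angle of the splitting, α_s is the strong coupling, and C_F = 4/3 is the quark color factor. The 1/z and dθ/θ factors make soft and collinear splittings the most likely ones, which is what drives the repeated, self-similar branching.

```latex
\mathrm{d}P_{q \to qg} \simeq \frac{\alpha_s}{\pi}\, P_{q \to qg}(z)\, \mathrm{d}z\, \frac{\mathrm{d}\theta}{\theta},
\qquad
P_{q \to qg}(z) = C_F\, \frac{1 + (1 - z)^2}{z},
\qquad
C_F = \frac{4}{3}
```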
“It’s this fractal-like process that describes how jets are formed,” Thaler says. “But when you look at a jet in reality, it’s really messy. How do you go from this messy, chaotic jet you’re seeing to the fundamental governing rule or equation that generated that jet? It’s a universal feature, and yet it has never directly been seen in the jet that’s measured.”
In 2014, the CMS collaboration released a preprocessed form of the detector’s raw 2010 data that contained an exhaustive listing of “particle flow candidates”: the types of subatomic particles most likely to have been produced, given the energies measured in the detector after a collision.
The following year, Thaler published a theoretical paper with Larkoski and Marzani, proposing a strategy to more fully understand a complicated jet in a way that revealed the fundamental evolution equation governing its structure.
“This idea had not existed before,” Thaler says. “That you could distill the messiness of the jet into a pattern, and that pattern would match beautifully onto that equation — this is what we found when we applied this method to the CMS data.”
To apply his theoretical idea, Thaler examined 750,000 individual jets that were produced from proton collisions within the CMS open data. He looked to see whether the pattern of particles in those jets matched with what the evolution equation predicted, given the energies released from their respective collisions.
Taking each collision one by one, his team looked at the most prominent jet produced and used previously developed algorithms to trace back and disentangle the energies emitted as particles cleaved again and again. The primary analysis work was carried out by Tripathee, as part of his MIT bachelor’s thesis, and by Xue.
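The article does not name the algorithms used to trace a jet's splittings back. One widely used option, assumed here, is to recluster the jet into a binary tree (for example with the Cambridge/Aachen algorithm, not shown) and then walk down that tree with soft-drop declustering: at each branching, discard the softer branch if it carries less than a fraction z_cut of the momentum, and stop at the first branching that passes, recording that branch's momentum fraction z_g. The sketch uses a hypothetical `JetNode` tree that tracks only transverse momentum, and the angle-independent (β = 0) variant with z_cut = 0.1; it illustrates the technique and is not the authors' or the collaboration's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JetNode:
    """Hypothetical binary clustering tree for a jet.

    A leaf is a single particle; an internal node is the merger of its two
    children. Only transverse momentum (pt, in GeV) is tracked.
    """
    pt: float
    left: Optional["JetNode"] = None
    right: Optional["JetNode"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None or self.right is None


def soft_drop_zg(jet: JetNode, z_cut: float = 0.1) -> Optional[float]:
    """Soft-drop declustering with beta = 0; return z_g, or None.

    Walk down the tree from the root. At each branching, compute the softer
    branch's momentum fraction z. If z < z_cut, drop the softer branch and
    continue down the harder one; otherwise stop and return z (this is z_g).
    Returns None if only a single particle remains.
    """
    node = jet
    while not node.is_leaf:
        a, b = node.left, node.right
        soft, hard = (a, b) if a.pt <= b.pt else (b, a)
        z = soft.pt / (soft.pt + hard.pt)
        if z >= z_cut:
            return z
        node = hard  # groom away the soft branch and follow the harder one
    return None


# Example: a soft 5 GeV branch (z = 0.05) is groomed away, then a
# 30 / 65 GeV branching (z ≈ 0.316) passes the cut and is recorded.
jet = JetNode(100.0,
              left=JetNode(5.0),
              right=JetNode(95.0, left=JetNode(30.0), right=JetNode(65.0)))
print(soft_drop_zg(jet))
```

Discarding soft branches before recording z_g is what makes the measured fraction less sensitive to soft, wide-angle radiation that has little to do with the hard splitting of interest.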
“We wanted to see how this jet came from smaller pieces,” Thaler says. “The equation is telling you how energy is shared when things split, and we found when you look at a jet and measure how much energy is shared when they split, they’re the same thing.”
The team was able to reveal the splitting function, or evolution equation, by combining information from all 750,000 jets they studied, showing that the equation — a fundamental feature of the strong force — can indeed predict the overall structure of a jet and the energies of particles produced from the collision of two protons.
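The article does not spell out how the 750,000 per-jet measurements are combined. A natural reading, assumed here, is to histogram the momentum fractions z_g from all jets and compare the normalized histogram with a splitting-function prediction. At leading order and fixed coupling, for quark jets groomed with angle-independent soft drop at threshold z_cut, that prediction is the q → qg splitting function symmetrized under z ↔ 1 − z (because z_g is defined as the softer branch's share) and normalized on [z_cut, 1/2]. The function names are hypothetical, and the sketch ignores effects such as detector response, the quark/gluon mix, and higher-order corrections.

```python
from typing import Callable, Iterable, List, Optional

C_F = 4.0 / 3.0  # QCD color factor for quarks


def p_q_to_qg(z: float) -> float:
    """Leading-order q -> qg splitting function; z = gluon momentum fraction."""
    return C_F * (1.0 + (1.0 - z) ** 2) / z


def p_symmetrized(z: float) -> float:
    """z_g is the softer branch's share, so both orderings of the split count."""
    return p_q_to_qg(z) + p_q_to_qg(1.0 - z)


def make_zg_density(z_cut: float = 0.1, n_steps: int = 10_000) -> Callable[[float], float]:
    """Return the predicted, unit-normalized z_g density on [z_cut, 1/2]."""
    width = (0.5 - z_cut) / n_steps
    # Midpoint-rule integral of the symmetrized splitting function.
    norm = width * sum(p_symmetrized(z_cut + (i + 0.5) * width) for i in range(n_steps))

    def density(z: float) -> float:
        return p_symmetrized(z) / norm if z_cut <= z <= 0.5 else 0.0

    return density


def normalized_histogram(zg_values: Iterable[Optional[float]],
                         z_cut: float = 0.1, n_bins: int = 8) -> List[float]:
    """Bin per-jet z_g values into a density that integrates to 1 on [z_cut, 1/2]."""
    width = (0.5 - z_cut) / n_bins
    counts = [0] * n_bins
    for z in zg_values:
        if z is None or not (z_cut <= z <= 0.5):
            continue  # skip jets where no branching passed the cut
        counts[min(int((z - z_cut) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / (total * width) for c in counts]


# Prediction at the centers of 8 bins of width 0.05, to compare bin by bin
# with normalized_histogram(...) applied to the measured z_g values.
prediction = make_zg_density()
centers = [0.1 + (k + 0.5) * 0.05 for k in range(8)]
expected = [prediction(c) for c in centers]
```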
While this result may not surprise most physicists, the study represents the first time this equation has been seen so clearly in experimental data.
“No one doubts this equation, but we were able to expose it in a new way,” Thaler says. “This is a clean verification that things behave the way you’d expect. And it gives us confidence that we can use this kind of open data for future analyses.”
Thaler hopes his and others’ analysis of the CMS open data will spur other large particle physics experiments to release similar information, in part to preserve their legacies.
“Colliders are big endeavors,” Thaler says. “These are unique datasets, and we need to make sure there’s a mechanism to archive that information in order to potentially make discoveries down the line using old data, because our theoretical understanding changes over time. Public access is a stepping stone to making sure this data is available for future use.”
This research was supported, in part, by the MIT Charles E. Reed Faculty Initiatives Fund, the MIT Undergraduate Research Opportunities Program, the U.S. Department of Energy, and the National Science Foundation.
Abstract: Contemporary scholarly discourse follows many alternative routes in addition to the three-century-old tradition of publication in peer-reviewed journals. The field of High-Energy Physics (HEP) has explored alternative communication strategies for decades, initially via the mass mailing of paper copies of preliminary manuscripts, then via the inception of the first online repositories and digital libraries.
This field is uniquely placed to answer recurrent questions raised by the current trends in scholarly communication: is there an advantage for scientists to make their work available through repositories, often in preliminary form? Is there an advantage to publishing in Open Access journals? Do scientists still read journals or do they use digital repositories?
The analysis of citation data demonstrates that free and immediate online dissemination of preprints creates an immense citation advantage in HEP, whereas publication in Open Access journals presents no discernible advantage. In addition, the analysis of clickstreams in the leading digital library of the field shows that HEP scientists seldom read journals, preferring preprints instead.
“The SCOAP3 consortium (Sponsoring Consortium for Open Access Publishing in Particle Physics), which aims to convert journals in high energy physics to open access, has chosen two Springer journals to participate in the initiative. They are the Journal of High Energy Physics, published for the International School for Advanced Studies (SISSA – Trieste, Italy), and the European Physical Journal C, published with Società Italiana di Fisica. The selection is the result of an open and transparent tender process run by CERN for the benefit of SCOAP3, in which journal quality, price and publishing services were taken into account….”
“After lengthy negotiations, CERN, the European Organization for Nuclear Research, and the American Physical Society (APS) have now signed an Open Access Agreement for SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics): from January 2018, all articles on high energy physics in the three leading APS journals ‘Physical Review C’, ‘Physical Review D’ and ‘Physical Review Letters’ will be published Gold Open Access, meaning that such articles will be freely accessible from the moment of publication. In 2014 and 2015, APS articles accounted for around 44 per cent of all articles on high energy physics published throughout the world. As a result of the agreement concluded between CERN and APS, the number of high energy physics articles published open access through SCOAP³ will virtually double from 2018. This signifies a major success for the SCOAP³ project, which, as a result, includes almost 90 per cent of journal articles in the field of high energy physics.
With the involvement of 3,000 libraries and research institutions from 44 countries and the support of eight research promotion organisations, SCOAP³ is the biggest Open Access initiative in the world. Since the SCOAP³ repository was launched in 2014, 15,000 articles by around 20,000 academics from 100 countries have been made freely accessible for all to read. The publishing fees for SCOAP³ articles are paid out of a central fund, financed by the participating institutions, meaning that no costs are incurred by the authors themselves….”
“Observing orbits around a black hole would take a career’s worth of measurements and, frankly, who has the time? It is also a rare benefactor who will fund a couple of decades’ worth of telescope time. Luckily, telescopes have been collecting data for a while, and some of that happens to include the vicinity of some black holes. Recently, some scientists decided to dig up the data and test general relativity in the vicinity of a supermassive black hole….This silent revolution is spreading to every branch of science, but we are only really scratching the surface of what might be hidden in the vast reams of digitized data. Scientists can now imagine conducting experiments that, a decade ago, might have taken an entire career of observations for one data point. Today, the data may already exist and, most importantly, be accessible. In this respect, the open data movement is probably one of the more important recent developments in science.
In astronomy, the number of eyes pointed at the heavens is increasing. The sensitivity of those eyes is getting better. Once the observations are consistently documented, we will have a treasure trove of data for future generations. We will be able to test our theories of the Universe with exquisite precision….”
“CERN, DESY, Fermilab and SLAC have built the next-generation High Energy Physics (HEP) information system, INSPIRE. It combines the successful SPIRES database content, curated at DESY, Fermilab and SLAC, with the Invenio digital library technology developed at CERN. INSPIRE is run by a collaboration of CERN, DESY, Fermilab, IHEP, and SLAC, and interacts closely with HEP publishers, arXiv.org, NASA-ADS, PDG, HEPDATA and other information resources.
INSPIRE represents a natural evolution of scholarly communication, built on successful community-based information systems, and provides a vision for information management in other fields of science….”
In translation: “SCOAP³ provides open access to specialist journals in high-energy physics research. SCOAP³, the Sponsoring Consortium for Open Access Publishing in Particle Physics, is an international consortium that published 13,400 articles open access in its first funding period (2014 to 2016). Sixty per cent of all downloads came from two SpringerNature journals, and 28 per cent from two Elsevier journals.
Both publishers have now announced that the number of downloads from these journals has doubled since they joined SCOAP³ on 1 January 2014….”