First open-access data from large collider confirm subatomic particle patterns

In November of 2014, in a first, unexpected move for the field of particle physics, the Compact Muon Solenoid (CMS) experiment — one of the main detectors in the world’s largest particle accelerator, the Large Hadron Collider — released to the public an immense amount of data, through a website called the CERN Open Data Portal.

The data, recorded and processed throughout the year 2010, amounted to about 29 terabytes of information, yielded from 300 million individual collisions of high-energy protons within the CMS detector. The sharing of these data marked the first time any major particle collider experiment had released such an information cache to the general public.

A new study by Jesse Thaler, an associate professor of physics at MIT and a long-time advocate for open access in particle physics, and his colleagues now demonstrates the scientific value of this move. In a paper published in Physical Review Letters, the researchers used the CMS data to reveal, for the first time, a universal feature within jets of subatomic particles, which are produced when high-energy protons collide. Their effort represents the first independent, published analysis of the CMS open data.

“In our field of particle physics, there isn’t the tradition of making data public,” says Thaler. “To actually get data publicly with no other restrictions — that’s unprecedented.”

Part of the reason that groups at the Large Hadron Collider and other particle accelerators have kept their data proprietary is a concern that outsiders could misinterpret them without a complete understanding of the physical detectors and of how the detectors' many complex properties shape the data they produce.

“The worry was, if you made the data public, then you would have people claiming evidence for new physics when actually it was just a glitch in how the detector was operating,” Thaler says. “I think it was believed that no one could come from the outside and do those corrections properly, and that some rogue analyst could claim existence of something that wasn’t really there.”

“This is a resource that we now have, which is new in our field,” Thaler adds. “I think there was a reluctance to try to dig into it, because it was hard. But our work here shows that we can understand in general how to use this open data, that it has scientific value, and that this can be a stepping stone to future analysis of more exotic possibilities.”

Thaler’s co-authors are Andrew Larkoski of Reed College, Simone Marzani of the State University of New York at Buffalo, and Aashish Tripathee and Wei Xue of MIT’s Center for Theoretical Physics and Laboratory for Nuclear Science.

Seeing fractals in jets

When the CMS collaboration publicly released its data in 2014, Thaler sought to apply new theoretical ideas to analyze the information. His goal was to use novel methods to study jets produced from the high-energy collision of protons.

Protons are essentially accumulations of even smaller subatomic particles called quarks and gluons, which are bound together by interactions known in physics parlance as the strong force. One feature of the strong force that has been known to physicists since the 1970s describes the way in which quarks and gluons repeatedly split and divide in the aftermath of a high-energy collision.

This feature can be used to predict the energy imparted to each particle as it cleaves from a mother quark or gluon. In particular, physicists can use an equation, known as an evolution equation or splitting function, to predict the pattern of particles that spray out from an initial collision, and therefore the overall structure of the jet produced.
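For reference, the leading-order splitting functions of quantum chromodynamics (the Altarelli-Parisi, or DGLAP, kernels) are shown below in their standard unregularized form, valid for 0 < z < 1. The article does not say which splitting channels the analysis focused on, so all three are listed. For the quark case, z is the momentum fraction kept by the quark. The 1/(1-z) and 1/z poles in the first two functions make soft, lopsided gluon emissions very likely, and this is what drives the repeated, fractal-like branching inside a jet.

```latex
\begin{aligned}
P_{q\to qg}(z) &= C_F\,\frac{1+z^2}{1-z}, \\
P_{g\to gg}(z) &= 2C_A\left[\frac{z}{1-z} + \frac{1-z}{z} + z(1-z)\right], \\
P_{g\to q\bar q}(z) &= T_R\left[z^2 + (1-z)^2\right],
\end{aligned}
\qquad C_F = \tfrac{4}{3},\quad C_A = 3,\quad T_R = \tfrac{1}{2}.
```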

“It’s this fractal-like process that describes how jets are formed,” Thaler says. “But when you look at a jet in reality, it’s really messy. How do you go from this messy, chaotic jet you’re seeing to the fundamental governing rule or equation that generated that jet? It’s a universal feature, and yet it has never directly been seen in the jet that’s measured.”

Collider legacy

In 2014, the CMS released a preprocessed form of the detector’s 2010 raw data that contained an exhaustive listing of “particle flow candidates,” or the types of subatomic particles that are most likely to have been released, given the energies measured in the detector after a collision.

The following year, Thaler published a theoretical paper with Larkoski and Marzani, proposing a strategy to more fully understand a complicated jet in a way that revealed the fundamental evolution equation governing its structure.

“This idea had not existed before,” Thaler says. “That you could distill the messiness of the jet into a pattern, and that pattern would match beautifully onto that equation — this is what we found when we applied this method to the CMS data.”

To apply his theoretical idea, Thaler examined 750,000 individual jets produced from proton collisions within the CMS open data. He checked whether the pattern of particles in those jets matched what the evolution equation predicted, given the energies released in their respective collisions.

Taking each collision one by one, his team looked at the most prominent jet produced and used previously developed algorithms to trace back and disentangle the energies emitted as particles cleaved again and again. The primary analysis work was carried out by Tripathee, as part of his MIT bachelor’s thesis, and by Xue.
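The article does not name the algorithms used to trace the splittings back. As a hedged illustration only: as we understand it, the 2015 Larkoski-Marzani-Thaler strategy builds on "soft drop" declustering, which walks back through a jet's clustering history and discards softer branches until it finds a splitting whose momentum sharing z_g exceeds a threshold z_cut. The sketch below assumes the clustering tree has already been built (real analyses typically use the Cambridge/Aachen algorithm, for example through FastJet). The `Branch` class, the `soft_drop_zg` function, and the default z_cut = 0.1 are hypothetical choices for this example, and the angular term of the general soft drop condition is omitted (the beta = 0 case).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical node of a jet's clustering tree (leaves are particles)."""
    pt: float                        # transverse momentum of this branch
    left: Optional[Branch] = None    # one daughter branch (None for a leaf)
    right: Optional[Branch] = None   # the other daughter branch


def soft_drop_zg(jet: Branch, z_cut: float = 0.1) -> Optional[float]:
    """Soft drop declustering with beta = 0 (no angular term).

    Starting from the full jet, undo one clustering step at a time. If the
    softer daughter carries a momentum fraction z <= z_cut, drop it and
    follow the harder daughter; otherwise stop and return z as z_g.
    Returns None if the jet is groomed all the way down to one particle.
    """
    node = jet
    while node.left is not None and node.right is not None:
        pt1, pt2 = node.left.pt, node.right.pt
        z = min(pt1, pt2) / (pt1 + pt2)  # momentum sharing of this splitting
        if z > z_cut:
            return z  # first splitting that passes the condition
        # Failed: discard the softer daughter and keep the harder one.
        node = node.left if pt1 >= pt2 else node.right
    return None


if __name__ == "__main__":
    # Hypothetical jet: a soft 5 GeV emission, then a 60/35 split.
    jet = Branch(100.0, Branch(95.0, Branch(60.0), Branch(35.0)), Branch(5.0))
    # The 5/100 splitting fails z_cut = 0.1, so z_g = 35/95.
    print(soft_drop_zg(jet))
```

Setting beta = 0 keeps the grooming condition purely about energy sharing, which matches the article's framing of measuring "how much energy is shared when they split."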

“We wanted to see how this jet came from smaller pieces,” Thaler says. “The equation is telling you how energy is shared when things split, and we found when you look at a jet and measure how much energy is shared when they split, they’re the same thing.”

The team was able to reveal the splitting function, or evolution equation, by combining information from all 750,000 jets they studied, showing that the equation — a fundamental feature of the strong force — can indeed predict the overall structure of a jet and the energies of particles produced from the collision of two protons.
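Combining many jets amounts to histogramming their measured momentum-sharing values and comparing the result with a prediction. Below is a minimal sketch under loud assumptions. It uses only the leading-order q → qg splitting function, whereas real samples mix quark and gluon jets and need detector corrections. It also assumes the leading-approximation result, as we understand it from the 2015 theory paper, that for beta = 0 soft drop the z_g distribution is the symmetrized splitting function P(z) + P(1 - z), normalized over z_cut ≤ z_g ≤ 1/2. All function names are hypothetical, and no CMS data or results are reproduced here.

```python
import math
from typing import Iterable, List, Tuple

C_F = 4.0 / 3.0  # quark color factor; it cancels in the normalized shape


def p_qg(z: float) -> float:
    """Leading-order q -> q g splitting function (unregularized, 0 < z < 1),
    where z is the momentum fraction kept by the quark."""
    return C_F * (1.0 + z * z) / (1.0 - z)


def p_bar(z: float) -> float:
    """Symmetrized splitting function: either daughter may be the softer one."""
    return p_qg(z) + p_qg(1.0 - z)


def norm(z_cut: float) -> float:
    """Closed-form integral of p_bar over [z_cut, 1/2]:
    C_F * [3 z_cut - 3/2 + 2 ln((1 - z_cut) / z_cut)]."""
    return C_F * (3.0 * z_cut - 1.5 + 2.0 * math.log((1.0 - z_cut) / z_cut))


def predicted_density(z: float, z_cut: float = 0.1) -> float:
    """Assumed leading-approximation z_g density for beta = 0 soft drop."""
    if not (z_cut <= z <= 0.5):
        return 0.0
    return p_bar(z) / norm(z_cut)


def normalized_histogram(
    zg_values: Iterable[float], z_cut: float = 0.1, n_bins: int = 8
) -> Tuple[List[float], float]:
    """Combine per-jet z_g values into a density histogram on [z_cut, 1/2].
    Returns (densities, bin_width); the densities integrate to 1."""
    width = (0.5 - z_cut) / n_bins
    counts = [0] * n_bins
    for z in zg_values:
        if z_cut <= z <= 0.5:
            i = min(int((z - z_cut) / width), n_bins - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins, width
    return [c / (total * width) for c in counts], width
```

Because C_F cancels in the normalization, the predicted shape has no free parameters once z_cut is fixed. Comparing each histogram density with `predicted_density` at the corresponding bin center is a toy version of the check the article describes.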

While this result may not surprise most physicists, the study marks the first time this equation has been seen so clearly in experimental data.

“No one doubts this equation, but we were able to expose it in a new way,” Thaler says. “This is a clean verification that things behave the way you’d expect. And it gives us confidence that we can use this kind of open data for future analyses.”

Thaler hopes his and others’ analysis of the CMS open data will spur other large particle physics experiments to release similar information, in part to preserve their legacies.

“Colliders are big endeavors,” Thaler says. “These are unique datasets, and we need to make sure there’s a mechanism to archive that information in order to potentially make discoveries down the line using old data, because our theoretical understanding changes over time. Public access is a stepping stone to making sure this data is available for future use.”

This research was supported, in part, by the MIT Charles E. Reed Faculty Initiatives Fund, the MIT Undergraduate Research Opportunities Program, the U.S. Department of Energy, and the National Science Foundation.

Repository Librarian, CERN

“Are you a skilled repository librarian, with a profound knowledge of scholarly communication, who likes to work in a dynamic environment to ensure open access to scientific results? Then you can apply your skills to improve the user experience by enriching the material made available to the community via platforms such as Inspire and the CERN Document Server (CDS). Our service helps 50,000 scientists, mainly within the field of high-energy physics, worldwide every day to find information across a million scientific articles, seamlessly populate their scientific profile and explore connections between ideas through our graph of tens of millions of citations. CERN, Take part! …”

CERN and American Physical Society sign Open Access Agreement for SCOAP³ – Technische Informationsbibliothek (TIB)

“After lengthy negotiations, CERN, the European Organization for Nuclear Research, and the American Physical Society (APS) have now signed an Open Access Agreement for SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics): from January 2018, all articles on high energy physics in the three leading APS journals “Physical Review C”, “Physical Review D” and “Physical Review Letters” will be published Gold Open Access, meaning that such articles will be freely accessible from the very first publication. In 2014 and 2015, APS articles accounted for around 44 per cent of all articles on high energy physics published throughout the world. As a result of the agreement concluded between CERN and APS, the number of articles on high energy physics will virtually double from 2018. This signifies a major success for the SCOAP³ project, which, as a result, includes almost 90 per cent of journal articles in the field of high energy physics.

Thanks to the agreement between CERN and APS, SCOAP³ now includes almost 90 per cent of journal articles in the field of high energy physics. With the involvement of 3,000 libraries and research institutions from 44 countries and the support of eight research promotion organisations, SCOAP³ is the biggest Open Access initiative in the world. Ever since the SCOAP³ repository was launched in 2014, 15,000 articles by around 20,000 academics from 100 countries have been made freely accessible for all to read. The publishing fees for SCOAP³ articles are paid out of a central fund, financed by the participating institutions, meaning that no costs are incurred by the authors themselves….”

SCOAP3 and APS: What You Need to Know

“The APS Board of Directors voted on April 23 to enter into an agreement with the European Organization for Nuclear Research (CERN) to participate in the Sponsoring Consortium for Open Access Publishing in Particle Physics (SCOAP3). Here’s what it means for you as a member, author, and researcher.”

INSPIRE-HEP

“CERN, DESY, Fermilab and SLAC have built the next-generation High Energy Physics (HEP) information system, INSPIRE. It combines the successful SPIRES database content, curated at DESY, Fermilab and SLAC, with the Invenio digital library technology developed at CERN. INSPIRE is run by a collaboration of CERN, DESY, Fermilab, IHEP, and SLAC, and interacts closely with HEP publishers, arXiv.org, NASA-ADS, PDG, HEPDATA and other information resources.

INSPIRE represents a natural evolution of scholarly communication, built on successful community-based information systems, and provides a vision for information management in other fields of science….”

SCOAP3 journals double downloads – SCOAP3

“Elsevier announced that downloads to their two journals, Physics Letters B and Nuclear Physics B have doubled since they became Open Access at the start of SCOAP3 in January 2014. This increase is remarkable as SCOAP3 covers the most recent 3,500 articles in the journals, while most of the historic content of over 77,000 articles is available to subscribers.

SpringerNature announced that since January 2014 they have observed a doubling of downloads across their two learned-society journals participating in SCOAP3: European Physical Journal C and the Journal of High Energy Physics.”

Current situation

“A chronological overview of important Dutch open access and open science successes….”

Open Science Librarian: THOR

“The THOR project is financed by the European Commission H2020 program. It focuses on the Technical and Human Infrastructure for Open Research. It started in June 2015 and will run through November 2017. It is a cooperation of CERN, the British Library, ORCID, DataCite, Dryad, EMBL-EBI, PANGAEA, Australian National Data Service (ANDS), PLoS and Elsevier.

THOR builds on the DataCite and ORCID initiatives to uniquely identify scholarly artefacts (beyond articles, such as data and software) and attribute them to researchers through 'persistent identifiers'. THOR project partners aim to support Open Science by facilitating discovery and re-use of scientific artefacts, and to deploy enhanced metrics to assess their impact. THOR partners design and deploy services both in general, across the ORCID and DataCite infrastructures, and in partnership with data repositories and emerging publishers’ solutions as well as concrete examples in High-Energy Physics (at CERN), Humanities and Social Sciences, Life Sciences and Geosciences….

The successful candidate will join the team working on Open Science services for the High-Energy Physics community, including the CERN Open Data portal, INSPIRE and HEPData. In collaboration with all THOR partners, the successful candidate will participate in the design and delivery of services to uniquely identify scholarly artefacts in the field (such as data, but also software) across several platforms through persistent identifiers, and attribute them uniquely to researchers by using the ORCID services. The successful candidate will collaborate with the entire international and multidisciplinary THOR team and contribute to R&D for interoperability solutions across scientific communities….”

Tribute to Timbl (Tim Berners-Lee)

From Stevan Harnad’s tribute to Tim Berners-Lee: “Nor can we remind ourselves enough, that although, because of today’s absurd intellectual property and patent laws, Tim’s uniqueness might have been that he became the world’s richest man, he has instead opened his contribution to every one of us, and to all future generations, opening access to the web, world-wide, opening the door to open science, open data, open knowledge, on a scale for which the only analogy in human history is the advent of language itself.”