PHYSICS AND BIG DATA

The increase in the sensitivity and efficiency of physics experiments, combined with the use of increasingly advanced electronics, has led to an explosion in the quantity of data collected during experiments in recent decades. While this naturally benefits the precision of measurements, it also requires advanced solutions for managing and analysing enormous datasets. At the same time, the “big data” paradigm influences the design of the experiments themselves and the computational tools they require. In particular, simulating complex phenomena means generating large quantities of virtual data, so that experimental results can be confronted with detailed simulations.
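To give a flavour of what “generating virtual data” means in practice, the short Python sketch below builds a toy Monte Carlo sample of simulated events and compares its histogram with an entirely invented set of “measured” counts; every model, number and variable name in it is an illustrative assumption, not taken from any real experiment.

import numpy as np

rng = np.random.default_rng(seed=42)

# Toy Monte Carlo: generate a large sample of simulated "events", each carrying
# one invented observable (say, an energy in GeV) drawn from an assumed physics
# model and smeared by an assumed detector resolution.
N_SIMULATED = 1_000_000
true_energy = rng.exponential(scale=50.0, size=N_SIMULATED)    # assumed spectrum
smearing = rng.normal(loc=0.0, scale=5.0, size=N_SIMULATED)    # assumed resolution
observed = true_energy + smearing

# Histogram the simulated observable so it can be compared, bin by bin, with
# the histogram of real measured events (faked here, since this is only a sketch).
bins = np.linspace(0.0, 300.0, 61)
sim_counts, _ = np.histogram(observed, bins=bins)
data_counts = rng.poisson(sim_counts * 0.001)   # placeholder for experimental data

# Simple chi-square-style comparison between the scaled simulation and the "data".
scale = data_counts.sum() / sim_counts.sum()
expected = sim_counts * scale
mask = expected > 0
chi2 = np.sum((data_counts[mask] - expected[mask]) ** 2 / expected[mask])
print(f"toy chi^2 between simulation and data: {chi2:.1f} over {mask.sum()} bins")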

A significant example of the impact of big data on contemporary physics is the LHC particle accelerator at CERN in Geneva, which can be considered, without exaggeration, the largest information factory in the world. When fully operational, the machine produces approximately 25 proton-proton collisions every 25 billionths of a second, amounting to some 600 million collisions per second. Dedicated selection software discards more than 90% of the data produced in these collisions as not being of interest; only a small, scientifically significant fraction is saved and then studied. Yet even this stored portion corresponds to a quantity of information comparable to the entire telephone traffic of Europe. To analyse such a quantity of data, neither the supercomputers at CERN nor the European supercomputing centres are sufficient. It was necessary to build a global network, the Worldwide LHC Computing Grid, comprising 1.4 million computer cores and 1.5 exabytes of data storage capacity spread across 42 countries. INFN is one of the main promoters of the Grid project and hosts one of the global network’s 11 Tier-1 sites at CNAF in Bologna.
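The online selection mentioned above can be pictured with a deliberately simplified sketch: events are generated with a falling spectrum of an invented observable, and only those above a hypothetical threshold are kept, so the vast majority are discarded before storage. Real trigger systems are far more sophisticated; the numbers below are purely illustrative.

import numpy as np

rng = np.random.default_rng(seed=7)

# Toy "trigger": each simulated collision event carries a transverse momentum
# value (GeV) drawn from an assumed, steeply falling spectrum; only events above
# a hypothetical threshold are kept for storage, the rest are discarded on the fly.
N_EVENTS = 5_000_000
pt = rng.exponential(scale=10.0, size=N_EVENTS)

PT_THRESHOLD = 40.0          # invented cut, standing in for real trigger logic
kept = pt[pt > PT_THRESHOLD]

fraction_kept = kept.size / N_EVENTS
print(f"kept {kept.size} of {N_EVENTS} events ({fraction_kept:.2%} stored)")
# With these made-up numbers roughly 98% of the events are dropped, echoing the
# idea that only a small, interesting fraction of the collision data is saved.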

CERN Data Center (© Bennett, Sophia Elizabeth)