Last updated: 20 Jan 2018
Matteo Magoni
Sex: M
Experiment: ATLAS
Type: Bachelor's degree (Laurea Triennale)
Destination after graduation: Master's degree (abroad)
University: SISSA
Title: Development of a model to predict ATLAS data popularity using machine learning techniques
Abstract: The ATLAS experiment at the Large Hadron Collider (LHC), operating at a centre-of-mass energy of 13 TeV at CERN (Switzerland), is one of the world's largest particle detectors. Every year it collects several petabytes of data, which are stored in more than 150 computer centres spread around the world, according to the Grid project. Prompt access to the LHC data is key to many successful ATLAS achievements, including the discovery of the Higgs boson in 2012. However, difficulties in accessing the different data centres, due to local hardware problems or multiple user accesses, may delay the fulfilment of requests and therefore the achievement of crucial physics discoveries. This thesis aims to discuss in detail the problem of data access and to propose an original algorithm based on popularity metrics to dynamically create multiple copies of data on the ATLAS computer centres. This algorithm includes the implementation of a model able to predict the time evolution of the popularity of ATLAS datasets with the aid of machine learning techniques, and to apply such predictions to the replication of popular datasets. In the first chapter, the Worldwide LHC Computing Grid (WLCG), namely the LHC distributed computing infrastructure, and its arrangement in Tiers are presented. Furthermore, an explanation of the current algorithm for ATLAS Data Management is given, underlining its main predictive performance and limits. A valid alternative to this algorithm can be implemented using machine learning techniques, that is, models that can dynamically learn how to improve their performance by means of appropriate training samples. The second chapter deals with the three main kinds of learning, known as unsupervised learning, reinforcement learning and supervised learning. After that, two machine learning algorithms, decision trees and neural networks, are explained in detail, together with some recent theoretical developments.
In the third chapter, a particular machine learning algorithm, called the AdaBoost Decision Tree, is used to build a model able to dynamically predict ATLAS data popularity. This result is achieved in three steps, which are described in detail, showing the outcomes of each of them. Despite being just a first prototype, the developed algorithm provides promising results, already similar to those achieved by the current static algorithm for ATLAS Data Management. It is certainly open to adjustments and modifications to achieve the optimal, desired results. In the conclusions, I outline some possible future developments in this direction. This algorithm was originally developed in collaboration with scientists working in the ATLAS Distributed Computing group, thanks to the financial support offered by Collegio Ghislieri in association with the Stiftung Maximilianeum for my studentship at Ludwig-Maximilians-Universität, München (DE).
Enrollment year: 2013
Graduation date: 29 Sep 2017
Place of graduation: Università di Pavia
D. Rebuzzi, F. Legger (LMU)