Determination of impact parameter in high-energy heavy-ion collisions via deep learning

In this study, Au+Au collisions with the impact parameter of $0 \leq b \leq 12.5$ fm at $\sqrt{s_{NN}} = 200$ GeV are simulated by the AMPT model to provide the preliminary final-state information. After transforming these information into appropriate input data (the energy spectra of final-state charged hadrons), we construct a deep neural network (DNN) and a convolutional neural network (CNN) to connect final-state observables with impact parameters. The results show that both the DNN and CNN can reconstruct the impact parameters with a mean absolute error about $0.4$ fm with CNN behaving slightly better. Then, we test the neural networks for different beam energies and pseudorapidity ranges in this task. It turns out that these two models work well for both low and high energies. But when making test for a larger pseudorapidity window, we observe that the CNN shows higher prediction accuracy than the DNN. With the method of Grad-CAM, we shed light on the `attention' mechanism of the CNN model.


I. INTRODUCTION
As the unique means to generate quark-gluon plasma (QGP) on earth, high-energy heavy-ion collision experiments provide us opportunities to study this kind of extremely hot and dense matter. With more research on the collective behaviour of quarks and gluons, both the deep structure of a nucleus and the state of the universe at a few microsecond after the Big Bang have been brought to light. For heavy-ion collisions, besides the collision energy, the impact parameter (denoted by b) is another crucial quantity which determines the initial geometry of a collision. Numerous quantities have essential correlations with the impact parameter. For example, the elliptic flow of hadrons is very sensitive to the impact parameter [1]. The electromagnetic (EM) fields in heavyion collisions roughly satisfy e|Field| ∝ √ sf (b/R A ) where f (b/R A ) is a function of b/R A with R A the nucleus radius [2,3]. When studying the EM properties of QGP, dilepton production is a significant probe. For lepton pair production, not only the cross section but the azimuthal asymmetry has a strong dependence on impact parameter [4]. The recently observed hyperon spin polarization increases with increasing impact parameter of the collisions [5,6]. However, the impact parameter of a single collision cannot be measured directly in heavy-ion experiments. Usually, it is estimated by particular final-state observables sensitive to it, such as the chargedparticle multiplicity. By introducing the concept of centrality, which is defined as classes classified by b, and comparing experimental data with simulation results by, e.g., the Glauber model, one can determine the rough interval of the impact parameter of an event [7].
Due to the powerful ability to establish a reliable map between the input data and the target value without that much prior knowledge, deep learning (DL) methods are widely used in not only science research, but also our daily life. When applying DL algorithms to face recognition tasks, a machine can identify one's ID with his/her facial information [8]. For * huangxuguang@fudan.edu.cn heavy ion collisions, the impact parameter can be viewed as one of the IDs of an event. Several works have proved the effectiveness of DL methods on impact parameter 'recognition' [9][10][11][12][13][14][15][16][17]. From a simple neural network [9] to a PointNet model [13] and to boosted decision trees [17], with the development of machine learning algorithms, more and more appropriate learning methods have been proposed to improve the performance of 'recognition' and to satisfy experimental requests. However, most of these researches only involve collisions at low or intermediate energies [9][10][11][12][13][14][15][16]. Though a recent work [17] considered LHC energies, the adopted machine learning model is not a deep neural network.
Here, we investigate RHIC Au+Au collisions at √ s N N = 200 GeV and choose final-state charged hadrons as probes. After transforming these particles' momentum information into energy spectra as input data of learning models, we use a deep neural network (DNN) model and a convolutional neural network (CNN) model, respectively, to find a map between the energy spectrum and the impact parameter. Then, we analyze the influences of beam energy and the range of pseudorapidity on the accuracy of predictions in this task. Furthermore, we examine the interpretability of the CNN serving as a regression machine. An 'attention' map of the CNN model is obtained by the Grad-Cam algorithm. All of the collisions are simulated by A Multi-Phase Transport (AMPT) model [18].

II. DEEP LEARNING ALGORITHMS
The discovery of Higgs Boson in 2012 completes the jigsaw of elementary particles according to the current Standard Model. Not long from that, CERN announced the Higgs boson machine learning challenge [19] to the public. The goal of this challenge is to help experimentalists to distinguish the signal of Higgs boson decay from background noise better by machine learning (ML) methods. It turned out that ML was extremely effective in this task.
As a branch of the field of artificial intelligence, ML technology has promoted intelligentization in many areas of industry, such as autonomous driving, smart mobile electronic arXiv:2112.03824v2 [hep-ph] 8 Dec 2021 devices, internet industry, and so on. Due to the same request of dealing with a large amount of data or obtaining information which is hard to fetch by traditional methods, scientists have applied ML techniques to fundamental science research, and physics is no exception. On the whole, the applications of ML in physics can be divided into two categories. One is the replacement of physical models with ML models if the latter ones are more effective for specific problems. By training ML models with a big amount of data, one can construct a mapping between two or more physical quantities. For instance, by training neural networks to imitate wave functions approximately, we can construct a map between the potential and one particle's energy without solving the Schrödinger equation [20], or solve the quantum many-body problem [21]. The other one is to recognize the target signal (such as a physical phenomenon) from the background with noise. For example, deep neural networks can help distinguish Higgs boson or other exotic particles of interest (signal) from other particles (background) [22]. In short, ML algorithms can be used to deal with regression and classification problems in physics. As a branch of ML, DL becomes the most popular AI method in physics research. In the field of relativistic heavy ion collision, DL has been applied to the problems of QCD phase transition [23][24][25], relativistic hydrodynamics [26], study of jet structure [27,28], the search of chiral magnetic effect [29], recognition of initial clustering structure in nuclei [30], etc. Among various DL algorithms, the deep neural network and the convolutional neural network are two common models.
The deep neural network (DNN) is one of the early DL models. Due to its remarkable ability to realize nonlinear mapping with a comparatively simple structure, the DNN is still a preferred tentative model for most regression tasks. A DNN (Fig.1), also known as multilayer perceptron, is composed of an input layer, enough hidden layers to be 'deep', and an output layer. The convolutional neural network (CNN) commonly appears in 2D image-related tasks. A typical CNN consists of the input layer, convolutional layers, pooling layers, fully connected layers, and the output layer. Our CNN architecture is shown in Fig.2.
As a supervised learning regression task, impact parameter determination aims at constructing a mapping between the 25 25 25 + Input 0 (25 × 25) Zero padding (1) Input 1 (26 ×26) Convolutional (CN) layer 1 (64) Kernel (3 ×3) Stride (1) CN layer 2 (64) Average pooling (2 ×2) Fully connected layers Output input observables and a single value, i.e. the impact parameter of an event. Thus, it is appropriate to choose the mean squared error (MSE) as the loss function serving as examining the performance of the learning models. It is defined as whereŷ true i is the true value of impact parameter of an event among a batch of events whose size is N batch , and y pred i is the output of the DNN/CNN model as corresponding prediction value.

III. SIMULATION OF EVENTS AND SELECTION OF OBSERVABLES
We consider Au+Au collisions at √ s N N = 200 GeV and use A Multi-Phase Transport (AMPT) model to perform the simulation. The AMPT model [18] is a hybrid transport model which contains four basic stages: the initial condition, partonic scattering, hadronization, and hadronic interaction. The initial condition is generated by HIJING model [31]. Scatterings among partons are modeled by Zhang's Parton Cascade (ZPC) model [32]. Differing from each other in the process of hadronization, two versions of AMPT model are available.
One is the default model in which partons are recombined with their parent strings and the Lund string fragmentation model is used to turn partons into hadrons. The other one contains a string melting model. It combines partons into hadrons via a quark coalescence model. Finally, the rescatterings of the hadronic matter are performed by A Relativistic Transport (ART) [33]. Here, we choose the one with string melting as the simulator.
The selection of features serving as input information is the first step of a deep learning process. Considering the experimental observability, we choose momenta of the freezed-out charged hadrons with pseudorapidity η satisfying −1 ≤ η ≤ 1 as observables.
As shown in Sec. II, the input data of a DNN model and that of a CNN model have different forms. Here, for the CNN model, the observables above are transformed into 2dimensional energy spectra in (p x , p y ) space. Fig.3 illustrates the case of an event with b = 1.65 fm. The x, y components of momenta are cut by the interval [−1.5, 1.5] and the (p x , p y ) space are cut by a 25 × 25 grid. Every cell in this grid contains the sum of energies of charged particles whose p x and p y are within corresponding intervals. Note that the energy contains not only the momentum but also the mass of the particle. Thus, an energy spectrum carries more physical information than a multiplicity spectrum. Arrange the pixels in a 2D energy spectrum into an 1D chain, the data is generated as an input for the DNN model.
FIG. 3. The energy spectrum in (px, py) space of final-state charged particles of an event with b = 1.65 fm. It is cut by a 25 × 25 grid. Every unit in the grid represents the sum of energies of the enclosed particles. This kind of energy spectrum is used as the input data fed to DL models.

IV. RESULTS AND DISCUSSION
We generate 28,000 events per centrality and split them into two parts: 20,000 events for training and 8,000 events for validation. The interval of impact parameter b ∈ [0, 12.5] fm is divided to 9 centrality classes [34]. Thus, the total training dataset is composed of 180,000 events and the total validation dataset contains 72,000 events. After training the model into an appropriate one, 20,000 events (satisfying a differential distribution ∝ bdb in impact parameter) are fed to the model to test its prediction accuracy.
In Fig.4, we show the errors of the predicted impact parameters comparing to the true values. Both the DNN and CNN models show high prediction accuracy for semicentral and semi-peripheral events. The mean error of the CNN model for events with impact parameter satisfying 2 fm ≤ b ≤ 11 fm ranges from −0.06 fm to 0.05 fm and that of the DNN model ranges from −0.08 fm to 0.08 fm. The mean absolute prediction error is 0.

A. Robustness Over Collision Energies
In heavy ion collisions, the beam energy is another crucial quantity which largely determines the bulk properties of the matter created in an event. Low energy collisions are different from high energy ones in many aspects. With higher beam energy, colliding nuclei generate more partons and finally more varieties of hadrons freeze out. To illustrate this, we generate dataset for events of collision energies √ s N N = 7.7, 39.0, 62.4, 130.0, 200.0 GeV corresponding to the beam energy scan program performed at RHIC. Then we analyze their differences in multiplicity and composition of final state charged particles. As shown in Fig.5, the multiplicity of charged particles increases with the beam energy. In addition, the fraction of protons in charged hadrons is higher at lower collision energy but the fraction of charged pions is not monotonic in collision energy. We thus test whether the differences in multiplicity and composition of produced hadrons at lower collision energies affect the ability of the DL models. Therefore, we train and test the DL models to the cases of √ s N N = 7.7, 39.0, 62.4, 130.0, 200.0 GeV. The results are shown in Fig.6 and Fig.7 for DNN model and CNN model, respectively. The DL models perform well for low and intermediate energy cases as well. Thus we find that the DL models are very robust against different collisions energies.

B. Influence of Pseudorapidity Cut
In most of relativistic heavy ion collision experiments, the greatest attention is focused on the mid-rapidity region, i.e. −1 ≤ η ≤ 1, due to coverage limit of most of the detectors. Actually, the mid-rapidity region just covers a small part of final-state charged particles [35], see Fig.8. Therefore, it is certain that the energy spectrum contains more information if we can observe a larger region in pseudorapidity. In fact, the octagon detector in the PHOBOS experiment can accept charged particles with |η| < 3.2 [36], the recent upgrade of the inner Time Projection Chamber (iTPC) detector at RHIC can extend the rapidity acceptance to −1.5 ≤ η ≤ 1.5 [37]. Thus, we take the regions of −3 ≤ η ≤ −1 and 1 ≤ η ≤ 3 into consideration. Now, we expand the training and detection region to −3 ≤ η ≤ 3. Because the particles with positive pseudorapidity move in the opposite direction relative to those with negative pseudorapidity, we add two extra channels into the input layer of the CNN. Just like inputting a colored picture with 3 channels 'RGB' into the CNN, we choose the region of −3 ≤ η ≤ −1 as the channel 'R', −1 ≤ η ≤ 1 as the channel 'G', and 1 ≤ η ≤ 3 as the channel 'B' (Fig.8). Then, we feed the energy spectra with 3 channels above to the CNN we trained with only data in 'G' region. The CNN shows higher prediction accuracy than the case of 1 channel (Fig.9). Now, the mean absolute prediction error of the CNN for 2 ≤ b ≤ 12.5 fm is 0.35 fm. This value is 0.40 f m in the case of CNN with only 1 input channel. Thus, extending the pseudorapidity window can truly improve the performance of regression.

C. Comparison with Multiplicity Method
As we mentioned before, the impact parameter of a single event cannot be measured directly experimentally. And the situation is the same for geometrical quantities, such as the participant number N part and binary collision number N coll . Instead, one can introduce the quantity 'centrality', which is usually expressed as a percentage of the total nuclear interaction cross section [7] and strongly correlated with the impact parameter b, to estimate an event's b range. In heavy ion collision experiments, usually the multiplicity of final state charged particles (N ch ) is chosen to be the main observable to classify events' centrality. It is assumed that the average mul- tiplicity of charged particles decreases monotonically when the impact parameter increases. With the Glauber model and Monte Carlo method, we can establish the centrality classes for heavy ion collisions [38]. By comparing the chargedparticle multiplicity of an event measured by experiments with the results given by the MC-Glauber model, one can determine this event's centrality and further estimate it's impact parameter.
Here, we propose a scheme to compare the uncertainty of b determined by multiplicity method with the CNN method. Based on the above assumption that the impact parameter is monotonically related to charged-particle multiplicity, if we pick out a batch of events with the same multiplicity (N ch ), their impact parameters are supposed to satisfy a Gaussianlike distribution. We select the standard deviation (σ mult b ) of these impact parameters as the quantity to characterize the uncertainty of mapping N ch to the corresponding impact parameter, which can be taken as the center of the Gaussian-like distribution, i.e. , we can evaluate the uncertainty of impact parameter determination by multiplicity method or by CNN method.
Due to the event-by-event fluctuation in nucleon distribution in the nuclei before the collision and in the complex evolution process after the collision, events with the same impact parameter ought to have different charged-particle multiplicities and different energy spectra. Thus, evaluating the uncertainty of these two methods above is equivalent to measuring the degree of single-valued correspondence between impact parameter and the multiplicity or the energy spectrum. From the results shown in Fig.10, it can be concluded that the charged-particle energy spectrum with CNN as identifier of impact parameter behaves better than the multiplicity method.

V. FETCHING PHYSICAL INFORMATION FROM THE CNN
Deep learning algorithms have shown their strong capability to construct a map between the input data and the target. As a result, we can succeed in various regression or classification tasks without prior knowledge. However, most of the widely used DL models are not interpretable. Due to their 'complexity' and 'dimensionality', it is hard to understand how these models work and fetch instructive information from them [39]. So these DL models are viewed as 'black boxes' in most cases. More and more effort has been made to open the 'black boxes' of DL algorithms. By analyzing the DNNs in the Information Plane [40], one can have an understanding of the training and learning processes and furthermore the internal representations of the DNNs [41]. In addition, for CNNs which are usually used in visual recognition tasks, there has In the CAM method, the class activation map for class c (in our case, c is trivial and can be supressed since we treat a regression instead of a classification problem) is defined as M c : Here, f k (x, y) represents the activation of unit k in the last convolutional layer at spatial location (x, y) and ω c k is the weight measuring the importance of unit k for class c. In Grad-CAM, the ω c k is defined as the result of performing global average pooling to the gradient of the score for class c with respect to the activations f k : where (1/Z) x y means the operation of global average pooling. Then the gradient-weighted class activation map is given as: CAM and Grad-CAM have succeeded in classification problems. However, we need some adjustments in the definitions of the maps in order to apply them to our CNN for regression. When Grad-CAM meets classification tasks, only the regions correlated positively with the class of interest should be preserved. Thus, a ReLU operation is performed on the linear combination of maps. But for regression problems, both the positively-correlated and the negativelycorrelated features ought to be considered. Consequently, we redefine an activation map as the absolute value of the linear combination of maps:  We obtain average 'attention' maps for impact parameters on the interval [2,11] fm where our CNN behaves well in prediction (see Fig.11). It turns out that compared with the cases of central collision, the CNN turns its 'attention' to the regions of larger transverse momentum for peripheral collisions. The CNN gives high marks to charged particles with small transverse momentum when it tries to 'recognize' a central event's impact parameter. With the increase of b, the peripheral area of the energy spectrum begins to attract the CNN's attention and becomes more and more important for distinguishing peripheral events from central ones, which means, the CNN focuses on the particles with larger transverse momentum for peripheral events.

VI. SUMMARY
In this paper, we investigate the feasibility of applying the method of deep learning to determining a single event's impact parameter in relativistic heavy-ion collisions. By constructing proper DNN model and CNN model, we establish a map between the energy spectrum of final-state charged particles and the impact parameter for a single collision event.
Simulated by the AMPT model, 180,000 events of Au + Au collision at √ s N N = 200 GeV are generated for training the DL models. In addition, 72,000 events compose the validation dataset aiming at measuring the performance of the DL models. After selecting the best one from the trained models, 20,000 events are used to test its prediction accuracy. The DNN model and CNN model all show high accuracy in the cases of semi-central and semi-peripheral collisions, i.e. with impact parameter 2 − 10 fm. The mean prediction error of the CNN model for events with 2 ≤ b ≤ 11 fm ranges from -0.06 fm to 0.05 fm and that of the DNN model ranges from -0.08 fm to 0.08 fm. The mean absolute prediction error on the b interval [2, 12.5] fm is 0.40 f m for the CNN model. But these two DL models work worse for central and peripheral collisions.
To investigate the influence of beam energy in this task, we apply these two DL models with the same architectures to the cases of √ s N N = 7.7, 39.0, 62.4, 130.0 as well. The DL models turn out to be effective for both low energy and high energy collisions. By extending the pseudorapidity cut, the performance of the CNN is improved. And now it reconstructs the impact parameter satisfying 2 ≤ b ≤ 12.5 fm with the mean absolute error 0.35 fm. Then, we propose a scheme to compare the uncertainty of b determination by multiplicity method with the CNN method. The charged-particle energy spectrum is proved to be a better observable than the multiplicity to realize this kind of impact parameter determination. By modifying the Grad-CAM algorithm for classification tasks to the one available for regression tasks, we obtain the 'attention' maps for the CNN model. When meeting central collisions, the CNN has a tendency to focus on the particles with small transverse momentum and the situation is opposite for peripheral events.