A neural network clustering algorithm for the ATLAS silicon pixel detector

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise position and error estimates of the clusters, which improves both the transverse and longitudinal impact parameter resolution.


Introduction
Track and vertex finding is one of the most challenging tasks in reconstructing events from proton-proton collisions recorded by the ATLAS detector [1] at the Large Hadron Collider (LHC) at CERN. ATLAS is a general-purpose detector designed to make precision measurements of known physics processes and to probe new physics at the energy frontier of the LHC. At the centre of the detector is an optimised, multi-technology tracking detector embedded in a 2 T axial magnetic field produced by a solenoid. This central tracking detector, called the inner detector (ID), is designed to reconstruct the trajectories (also called tracks) of charged particles. It has a silicon pixel detector at the innermost radii, surrounded by a silicon microstrip detector, the semiconductor tracker (SCT), and a straw-tube detector called the transition radiation tracker (TRT), which also identifies particle types using transition radiation. The ID provides full φ coverage for particles produced within |η| < 2.5¹ (|η| < 2.0 in the case of the TRT) in the LHC beam interaction region. The ID is surrounded by electromagnetic and hadronic calorimeters. The outermost detector is the Muon Spectrometer (MS), a stand-alone tracking detector with a toroidal magnetic field. Trajectories in the MS are matched to ID trajectories to reconstruct combined muon candidates [2].
Precisely measuring the trajectories of charged particles is fundamental to many analyses. Tracks are used to reconstruct primary and secondary vertices. Primary vertices are found within the beam interaction region and provide information about the number and positions of the individual proton-proton collisions. Secondary vertices are found outside the beam interaction region and are used to identify decays of heavy-flavour and long-lived particles.
¹ The ATLAS reference system is a Cartesian right-handed coordinate system, with the nominal collision point at the origin and the z-axis along the beam direction. The positive x-axis is defined as pointing from the collision point to the centre of the LHC ring and the positive y-axis points upwards. The azimuthal angle, φ, is measured around the beam axis, and the polar angle, θ, is measured with respect to the z-axis. The pseudorapidity is defined as η = −ln(tan(θ/2)).
Track reconstruction algorithms use local and global pattern recognition to identify detector measurements that were produced by a charged particle. A track fitting procedure is used to estimate the compatibility of measurements with the track hypothesis under the assumptions of a track model. The term "measurement" is used to describe clusters from the pixel detector and the SCT, and drift circles from the TRT, when assigned to a track. The formation of clusters from individual hits is explained in section 2. The ID track reconstruction consists of several sequences with different strategies, as described in ref. [3]. The main sequence is referred to as inside-out track finding, which starts from track seeds formed through combinatorial grouping of clusters in nearby layers. Track candidates are obtained by searching, with a combinatorial filter [4], for additional measurements within a road following the direction of the seed. These track candidates must satisfy rather loose requirements on the number of associated measurements, the number of missing measurements and the quality of the fit. At this stage, multiple track candidates can be found for a single particle and each measurement can be associated with several track candidates. Such multiply-assigned measurements will be referred to as shared. Track candidates are then examined by an ambiguity processor, which runs on this ensemble of tracks and aims to resolve the ambiguities. This suppresses fake tracks built from random combinations of measurements and removes duplicate tracks, which would otherwise occur when multiple track candidates with almost identical measurement content pass the ambiguity processing. The tracks are ranked by a scoring scheme that analyses the measurements assigned to each track, in particular the shared measurements and holes, and that also takes into account the fit quality of the individual track candidates. Holes are defined as active detector modules where a measurement would be expected by the track prediction but none was assigned.
The requirement that a track candidate has a low fraction of shared measurements is a very strong handle to suppress duplicate and fake tracks. However, in signatures with many collimated particles, such as the core of very energetic jets or the decay of boosted τ leptons into multiple hadrons, shared measurements can occur because of cluster merging, when the detector granularity is insufficient to resolve close-by particles. Therefore, tracks passing the ambiguity-resolving stage are allowed to have a small fraction of shared measurements, to avoid compromising the track reconstruction efficiency. Shared measurements typically provide a worse position estimate and hence degrade the quality of the reconstructed track. Correct modelling of shared measurements in Monte Carlo simulation is important for controlling the systematic uncertainties on the track reconstruction efficiency.
In this paper, a novel technique to identify and split merged clusters in the ATLAS pixel detector using a set of artificial neural networks (NN), referred to as NN clustering, is introduced. This leads to a reduction of approximately a factor of three in the number of measurements shared between tracks. It also significantly improves the precision of the cluster position estimate and the track performance, as will be discussed in section 4.
The data used in this document were collected by the ATLAS experiment in 2011 with the

For primary charged particles, originating from beam collisions and with p_T > 400 MeV, clusters vary in size between 1.4 and 3 pixels in the transverse direction and between 1 and approximately 3.5 pixels in the longitudinal direction. Illustrations of different scenarios of charged particles traversing a silicon sensor are shown in figure 2. In figure 2(a), a single charged particle traverses the silicon sensor with incident angle φ and produces a cluster with a width of two pixels. The charge drifts under the influence of the electric and magnetic fields in the direction given by the Lorentz angle θ_L. Figure 2(b) illustrates a merged cluster produced by two close-by particles that both deposit charge in the second pixel. The fourth pixel is not read out, as its charge is below the read-out threshold. In figure 2(c), a large cluster created by a single charged particle is shown. In this case, the cluster is large due to the emission of a δ-ray. δ-rays can also change the amount of collected charge without changing the size of the cluster. When a δ-ray is emitted, the cluster position is biased with respect to the true intersection point of the track.
A connected component analysis (CCA) [7] is used to find groups of neighbouring pixels. The CCA uses eight-cell connectivity: two pixels that share at least a common corner are considered neighbours, so a single pixel is surrounded by at most eight neighbouring cells. A group of connected pixels is referred to as a cluster, and the information from all of its pixels is used to estimate the cluster position using a linear approximation.
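The eight-cell connectivity rule can be illustrated with a short breadth-first search. This is a minimal sketch, not the ATLAS implementation: it assumes that the fired pixels of one module are given as a Python dict mapping hypothetical (row, col) indices to collected charge, with pixels below the read-out threshold already dropped.

```python
from collections import deque


def cca_clusters(hits):
    """Group fired pixels into clusters using eight-cell connectivity.

    hits: dict mapping (row, col) -> collected charge (hypothetical format).
    Returns a list of clusters, each a dict {(row, col): charge}.
    """
    unvisited = set(hits)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = {seed: hits[seed]}
        queue = deque([seed])
        while queue:
            row, col = queue.popleft()
            # Check all eight neighbours, including those sharing only a corner.
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        cluster[neighbour] = hits[neighbour]
                        queue.append(neighbour)
        clusters.append(cluster)
    return clusters


# Two diagonal pixels form one cluster; the isolated pixels stay separate.
print(sorted(len(c) for c in cca_clusters({(0, 0): 1.0, (1, 1): 2.0, (0, 3): 1.5, (5, 5): 3.0})))  # prints [1, 1, 2]
```

With four-cell connectivity, the two diagonal pixels in the example would instead form two separate clusters.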
Measurements are refined during the track fit using the incident angle and the predicted point of incidence on the pixel module of the track associated with the cluster. Such calibrations include a correction for module distortions [8], which cause silicon modules to deviate from the perfectly planar shape assumed in the detector description used for Monte Carlo simulation and event reconstruction.
Tracks including all components of the ID are designed to provide a transverse impact parameter resolution of $\sigma_{d_0} = 140\,\mu\mathrm{m\,GeV}/p_T \oplus 10\,\mu\mathrm{m}$ and a longitudinal impact parameter resolution of $\sigma_{z_0}\sin\theta = 209\,\mu\mathrm{m\,GeV}/p_T \oplus 91\,\mu\mathrm{m}$ [1], with p_T in GeV. The symbol ⊕ denotes the sum in quadrature of the two components.

The charge collected in each of the pixels of the cluster is used to refine the estimate of where the particle intersects the module using a charge interpolation technique, which significantly improves the resolution compared to the purely geometrical limit determined by the pixel size. This technique starts from the geometrical centre of the cluster, defined as (x_centre, y_centre) in the local reference frame of the sensor surface. The charge measurements in the first and last rows and columns of the cluster are used to perform the interpolation. The particle intersection is estimated using the following equations:

$$x_{\mathrm{cluster}} = x_{\mathrm{centre}} + \Delta_x(\phi, N_{\mathrm{row}}) \left[\Omega_x - \tfrac{1}{2}\right], \qquad y_{\mathrm{cluster}} = y_{\mathrm{centre}} + \Delta_y(\theta, N_{\mathrm{col}}) \left[\Omega_y - \tfrac{1}{2}\right] \qquad (2.1)$$

where Ω_x(y) is defined as

$$\Omega_{x(y)} = \frac{q_{\mathrm{last\ row(col)}}}{q_{\mathrm{first\ row(col)}} + q_{\mathrm{last\ row(col)}}} \qquad (2.2)$$

and q denotes the charge collected in the pixels of the first or last row (column) of the cluster. The parameter Δ_x (Δ_y) is a function of the projected incident angle with respect to the sensor surface, φ (θ), and of the number of pixels in the cluster in the x (y) direction, N_row (N_col). These parameters are extracted from either Monte Carlo simulations or data [9]. The charge interpolation technique was the default ATLAS clustering approach until mid-2011 and will be referred to as CCA clustering.
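For illustration, the interpolation can be written as a small function. This sketch assumes the form x = x_centre + Δ_x(Ω_x − 1/2), with Ω_x = q_last/(q_first + q_last) built from the charges of the first and last rows (and analogously in y). The calibration functions Δ_x(φ, N_row) and Δ_y(θ, N_col) are passed in as plain numbers, and the values in the usage line are hypothetical.

```python
def interpolate_position(x_centre, y_centre,
                         q_first_row, q_last_row,
                         q_first_col, q_last_col,
                         delta_x, delta_y):
    """Charge-interpolated impact point on the sensor (local frame).

    delta_x and delta_y stand in for the calibrated Delta_x(phi, N_row) and
    Delta_y(theta, N_col); in practice they come from simulation or data.
    """
    # Omega is the fraction of the edge charge collected in the last row/column.
    omega_x = q_last_row / (q_first_row + q_last_row)
    omega_y = q_last_col / (q_first_col + q_last_col)
    # Equal edge charges (Omega = 1/2) leave the geometrical centre unchanged.
    x = x_centre + delta_x * (omega_x - 0.5)
    y = y_centre + delta_y * (omega_y - 0.5)
    return x, y


# Hypothetical cluster: more charge in the last row shifts x towards it.
print(interpolate_position(0.0, 0.0, 1.0, 3.0, 2.0, 2.0, 0.05, 0.4))
```

Because Ω − 1/2 lies between −1/2 and +1/2, the correction can move the estimate by at most half of Δ in either direction from the geometrical centre.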
In dense environments, such as the cores of energetic jets, there is an increased risk that pixels traversed by different particles are merged by the CCA, or even that charge induced by several particles is collected in a single pixel. This problem occurs increasingly often as the spatial separation between the particles on the sensor plane approaches the pixel size. This cluster merging is depicted in figure 3, which illustrates an event in which the charge induced by three particles is reconstructed as a single cluster.
Figure 4 shows the average separation in the transverse (δ_x^min) and longitudinal (δ_y^min) directions of the two closest stable charged particles in jets, at the radius of the innermost pixel layer in the barrel. For the transverse (longitudinal) separation, only particle pairs separated by less than one pixel in the longitudinal (transverse) direction are included. A sample of simulated dijet events based on the PYTHIA [10] Monte Carlo generator with leading-jet p_T greater than 800 GeV was used. Jets were reconstructed from stable generator-level particles using the anti-k_t jet algorithm [11] with a radius parameter of 0.4. The figure illustrates that shared measurements already appear in jets of relatively moderate momentum, since cluster merging starts before the particle separation reaches the pixel size. In the worst case, when cluster merging occurs in pixel layers beyond the innermost one and the number of shared measurements on a track exceeds the given threshold, the track candidate is rejected entirely to avoid the creation of duplicate tracks. This leads to an inefficiency in finding both tracks. The limit at which two close-by tracks can still be reconstructed separately is often referred to as the double-track resolution. With the CCA clustering, no attempt is made to identify or split these merged clusters.

Pixel cluster splitting
If merged clusters from several charged particles are split into sub-clusters, one for each particle, they can appear as individual measurements on tracks. This improves the double-track resolution and reduces the number of measurements shared between tracks. Even identifying such merged clusters without splitting them can improve track quality, because it allows a dedicated treatment in the measurement calibration or in the ambiguity-resolving process (see section 3.3).
Both the cluster size and the charge collection pattern of the cluster can be exploited when attempting to split the cluster.Assumptions about the particle origin and direction are required to determine the predicted cluster shape.

A neural network for cluster splitting
Artificial neural networks are powerful tools for solving complex pattern recognition problems characterised by significant non-linearities.The increasing CPU power available for event reconstruction in high-energy physics makes them attractive for problems with many degrees of freedom.A novel approach to clustering based on artificial neural networks is presented.A single NN is used to estimate the probability that a cluster was created by one or many particles and to split the cluster when possible.Two sets of NNs are used to estimate cluster positions and uncertainties, containing three and six NNs respectively.This approach allows the NN clustering algorithm to also improve the cluster position estimation and hence the resolution of the track parameters.
Clusters are first created by running the CCA algorithm. The NN clustering algorithm then runs over all the resulting clusters and obtains an estimate of:
1. the number N of charged particles traversing the cluster;
2. the two-dimensional positions P_i (i = 1, …, N) of the impact points of the particles on the pixel sensor and the corresponding uncertainties.
Only clusters that are contained in a rectangular matrix of 7×7 pixels are included: if any pixel lies further than three pixels from the centroid, the cluster is not processed by the NN clustering algorithm. Fewer than 1% of clusters fall into this category, and they are most likely due to particles not originating from the beam collisions. In data there is an increased rate of large clusters from beam backgrounds and low-momentum particles; the latter are more frequent in data because such particles are below the threshold in the simulation. Neglecting large clusters in the NN clustering algorithm does not affect the high-p_T tracking performance, as these clusters originate from charged particles with p_T < 0.4 GeV, which are below the track reconstruction threshold. No attempt is made to assign individual pixels of the original cluster to the split clusters, because the pixel content of a cluster has no influence on the track reconstruction once it has been used to determine the cluster position and error.
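The containment requirement can be expressed as a simple check. The sketch below assumes a cluster stored as a hypothetical dict of (row, col) pixel indices to charge. It interprets "three pixels from the centroid" as at most three pixel indices from the pixel containing the charge-weighted centroid, in each direction, and it ignores the different physical pixel sizes.

```python
import math


def centroid_pixel(cluster):
    """Indices of the pixel containing the charge-weighted centroid."""
    total = sum(cluster.values())
    row = sum(r * q for (r, _), q in cluster.items()) / total
    col = sum(c * q for (_, c), q in cluster.items()) / total
    # Pixel k covers [k - 0.5, k + 0.5) in index units.
    return math.floor(row + 0.5), math.floor(col + 0.5)


def fits_in_7x7(cluster, half_width=3):
    """True if every pixel lies within half_width pixels of the centroid pixel."""
    row0, col0 = centroid_pixel(cluster)
    return all(abs(r - row0) <= half_width and abs(c - col0) <= half_width
               for (r, c) in cluster)


print(fits_in_7x7({(0, c): 1.0 for c in range(7)}))  # prints True
print(fits_in_7x7({(0, c): 1.0 for c in range(8)}))  # prints False
```

Under these assumptions, a uniformly charged cluster seven pixels long is accepted, while one eight pixels long would not be passed to the neural networks.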
In the first pass of the NN clustering algorithm, particles are assumed to have been produced at the centre of the beam interaction region, which is estimated using the primary vertices of collisions from a sub-sample of events analysed during a rapid first pass of the reconstruction algorithms. The incident angle of the particles is calculated using a straight line from the centre of the beam interaction region to the cluster position. The uncertainty on the incident angle in the longitudinal direction can be large, because the beam interaction region is extended along the beam axis. Clusters in pixel modules at small radii and covering a region of small |η| are affected more than clusters in modules at large radii or at large |η|. The neural network exploits this dependence to reduce the uncertainty. Once track candidates have been identified, the algorithm is rerun to benefit from the significantly refined estimate of the incident angle. Different neural networks are trained and used in the two cases. No additional attempt to split clusters is made at this stage, because the initial pattern recognition stage of the track reconstruction is run only once, so this additional information cannot be used to identify further track candidates.
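The first-pass angle estimate can be sketched as follows. The module axes, the convention of measuring the projected angles from the module normal, and the geometry in the usage lines (in mm) are assumptions chosen for illustration. The example also shows why a displacement of the true vertex along the beam axis changes the longitudinal angle more for a module at small radius.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def incidence_angles(beam_spot, cluster_pos, normal, local_x, local_y):
    """Projected incidence angles of the straight line beam spot -> cluster.

    All vectors are global (x, y, z) coordinates; normal, local_x and local_y
    are assumed to be orthonormal module axes. Each angle is measured from the
    module normal within the (normal, local_x) or (normal, local_y) plane.
    """
    d = [c - b for c, b in zip(cluster_pos, beam_spot)]
    d_n = _dot(d, normal)  # atan2 only needs ratios, so d is not normalised
    phi = math.atan2(_dot(d, local_x), d_n)
    theta = math.atan2(_dot(d, local_y), d_n)
    return phi, theta


# Hypothetical barrel modules at y = 0, facing the beam along +x.
axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
for radius in (50.0, 120.0):
    _, nominal = incidence_angles((0.0, 0.0, 0.0), (radius, 0.0, 0.0), *axes)
    _, shifted = incidence_angles((0.0, 0.0, -50.0), (radius, 0.0, 0.0), *axes)
    print(radius, math.degrees(shifted - nominal))
```

In this toy geometry, moving the vertex by 50 mm along z changes the longitudinal angle by 45° for the module at r = 50 mm, but only by about 23° at r = 120 mm.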
A multi-layer perceptron, which is a feed-forward artificial neural network, is used. This is equivalent to a set of functions F_i (the output nodes) of the input variables (nodes) x_k. It has two intermediate hidden layers and can be written as:

$$F_i(\mathbf{x}) = h\left(\chi_i^{(2)} + \sum_j \omega_{ij}^{(2)}\, g\left(\chi_j^{(1)} + \sum_l \omega_{jl}^{(1)}\, g\left(\chi_l^{(0)} + \sum_k \omega_{lk}^{(0)}\, x_k\right)\right)\right) \qquad (3.1)$$

where ω^(0,1,2) and χ^(0,1,2) are the weight and threshold parameters that are adjusted during the training process, with the upper index denoting the layer number, and g(x) and h(x) are the activation functions used to compute the values of the nodes in the intermediate and output layers, respectively.
Here g(x) is chosen to be a sigmoid function, which maps any real argument into the interval (0, 1). The choice of h(x) depends on the network. For the estimation of the number of particles traversing a cluster, h(x) takes the same form as g(x), which confines the output nodes to values between 0 and 1, while for the estimation of the cluster position, a linear function is chosen (h(x) = x). All networks are trained using the JetNet package [12]. This package minimises the sum of the mean squared errors of the network output nodes with respect to their target values, where the sum is taken over all patterns in the training set.
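A forward pass through such a two-hidden-layer perceptron takes only a few lines. The sketch below is a pure-Python illustration, not JetNet code. The logistic form 1/(1 + e^(−x)) for g(x), the convention that the thresholds χ are added to the weighted sums, and the parameter layout are all assumptions.

```python
import math


def sigmoid(x):
    """Logistic function (assumed form of g); numerically stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def identity(x):
    return x


def dense(inputs, weights, thresholds, activation):
    """One layer: node j = activation(chi_j + sum_k w_jk * input_k)."""
    return [activation(chi + sum(w * x for w, x in zip(row, inputs)))
            for row, chi in zip(weights, thresholds)]


def mlp(x, params, output_activation):
    """Two hidden layers with g = sigmoid, then the output layer with h.

    params = [(w0, chi0), (w1, chi1), (w2, chi2)]; each w is a list of rows,
    one row per node of the layer being computed.
    """
    (w0, chi0), (w1, chi1), (w2, chi2) = params
    hidden1 = dense(x, w0, chi0, sigmoid)
    hidden2 = dense(hidden1, w1, chi1, sigmoid)
    return dense(hidden2, w2, chi2, output_activation)
```

Passing `sigmoid` as `output_activation` gives outputs in (0, 1), as for the number-of-particles network; passing `identity` gives the linear outputs used for the position networks.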

Number of particles per cluster
The number, N, of particles in a cluster is estimated using a neural network dedicated to classification. It is assumed that a cluster may be split into parts stemming from at most three particles; thus the number of output nodes is chosen to be three, and the target is chosen pattern by pattern to be {100}, {010} or {001} for the cases of one, two or three traversing particles, respectively. This ensures that the output nodes F_i can be interpreted as the posterior Bayesian probabilities for a pattern with inputs x to be of type i, provided the training converged to the global minimum. A higher number of particles per cluster is not considered by the NN clustering algorithm. The Bayesian probabilistic interpretation relies on both the description of the observed cluster properties and the composition of the training sample. The posterior probability of having either two or three particles traversing the cluster is referred to as P_shared.
In the simplest approach, the candidate cluster properties would be condensed into a few input variables to the NN, which would then be trained to distinguish between the different output types. Instead, a more aggressive approach is followed here by feeding the neural network input nodes x_k with the full available information:
• a fixed-size 7×7 matrix of the charge collected in each pixel of the candidate cluster,
• a fixed-size vector of seven elements with the longitudinal size of the pixels in the 7×7 matrix, to label the long pixels,
• the layer number (from 0 to 2) and layer type (barrel or endcap),
• the angles of incidence φ and θ of the candidate charged particle with respect to the sensor surface,
• η of the pixel module (only if no track candidate is available yet).
In total, this corresponds to 60 (or 61) input nodes. If a pixel in the charge matrix has charge below the readout threshold, the matrix entry is set to zero. The matrix is centred on the pixel containing the charge-weighted centroid of the cluster.
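As an illustration of how these inputs could be flattened into a single vector, the sketch below uses a hypothetical cluster format (a dict of (row, col) indices to charge for pixels above threshold) and a hypothetical ordering of the inputs. It reproduces the count of 49 + 7 + 2 + 2 = 60 nodes, or 61 when η is included.

```python
import math


def build_inputs(cluster, pixel_lengths, layer_number, is_endcap, phi, theta, eta=None):
    """Flatten one cluster into the NN input vector (60 values, or 61 with eta).

    cluster: {(row, col): charge} for pixels above the read-out threshold.
    pixel_lengths: longitudinal sizes of the seven pixel columns in the window.
    eta: module eta, supplied only when no track candidate is available yet.
    """
    if len(pixel_lengths) != 7:
        raise ValueError("expected seven pixel lengths")
    total = sum(cluster.values())
    row0 = math.floor(sum(r * q for (r, _), q in cluster.items()) / total + 0.5)
    col0 = math.floor(sum(c * q for (_, c), q in cluster.items()) / total + 0.5)
    # 7x7 charge matrix centred on the centroid pixel, row by row; pixels
    # absent from the cluster (below threshold) contribute zero.
    matrix = [cluster.get((row0 + dr, col0 + dc), 0.0)
              for dr in range(-3, 4) for dc in range(-3, 4)]
    inputs = matrix + list(pixel_lengths)
    inputs += [float(layer_number), float(is_endcap), phi, theta]
    if eta is not None:
        inputs.append(eta)
    return inputs


print(len(build_inputs({(0, 0): 1.0}, [0.4] * 7, 0, False, 0.1, 0.2)))  # prints 60
```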
Clusters with P_shared > 0.5 are split, while clusters that are not split but have P_shared > 0.02 are labelled as merged clusters. The optimisation of the cut used to split clusters is discussed in section 4.
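The two cuts can be summarised in a small decision function. The sketch assumes that the three output nodes (F_1, F_2, F_3) of the number-of-particles network are available as a tuple and that P_shared is computed as F_2 + F_3. The function and label names are hypothetical.

```python
def p_shared(outputs):
    """Posterior probability of two or three particles: F_2 + F_3."""
    _f1, f2, f3 = outputs
    return f2 + f3


def cluster_decision(outputs, split_cut=0.5, merged_cut=0.02):
    """Apply the splitting and merged-labelling cuts quoted above."""
    p = p_shared(outputs)
    if p > split_cut:
        return "split"
    if p > merged_cut:
        return "merged"  # kept whole, but flagged as possibly shared
    return "single"


for outputs in [(0.30, 0.50, 0.20), (0.97, 0.02, 0.01), (0.99, 0.005, 0.005)]:
    print(outputs, cluster_decision(outputs))
```

The very low merged cut means that almost any hint of a second particle allows the cluster to be shared, while splitting requires the multi-particle hypothesis to be the more probable one.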

Cluster position and error estimate
The cluster position is estimated using a second set of neural networks configured for interpolation. In this case, neural networks are used to obtain a function whose outputs are as close as possible to one or more continuous target values, by exploiting the dependence of these targets on the input variables. Three different neural networks are used depending on the number of particles N (from 1 to 3) traversing the candidate cluster, and the corresponding training sets are selected based on the true number of particles in the simulation. All networks use the same input variables as the classification network, while the 2 × N output nodes are trained to estimate the two-dimensional positions P_i of the impact points of the N particles. For each pattern, the targets in the training are set to the desired continuous values.
A third set of neural networks is used to estimate the probability density function (pdf) describing the residual of the estimated point of impact, ∆P = P − P_true, i.e. the difference between the reconstructed (P) and true (P_true) points of impact. This pdf depends on all cluster properties encoded in the input nodes. The impact point position(s) relative to the cluster centroid are used as additional inputs for training. Separate neural networks, NN_err^x and NN_err^y, are used to estimate the uncertainties in the transverse and longitudinal directions, respectively, and different networks are trained depending on the number of traversing particles N (from 1 to 3). This leads to six neural networks in total. The possible continuous values of ∆P_x (∆P_y) are subdivided into intervals and one output node is mapped to each interval, reducing the problem of estimating the pdf to a classification problem. If a training pattern has a deviation ∆P that falls in interval j, then the target values of the output nodes are {0...010...0}, with the one in the j-th position. This is extended to multiple particles by considering N sets of intervals, each set corresponding to one of the N particles, resulting in N × N_bins output nodes. As a consequence, for each pattern, N of the output nodes are set to one. The output nodes therefore yield the posterior Bayesian probability that, given the observed cluster, the estimated position deviates from the true position by an amount within the interval represented by that specific bin.
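The construction of the training targets for the error networks can be sketched as follows. The bin edges are hypothetical, and assigning residuals outside the binned range to the first or last bin is an assumption, since the text does not say how such patterns are treated.

```python
import bisect


def residual_targets(residuals, edges):
    """Target vector for one pattern of an error network.

    residuals: the N deviations Delta P (one per particle, one direction).
    edges: increasing bin edges shared by all particles, N_bins = len(edges) - 1.
    Returns N * N_bins values, with exactly one 1 in each particle's block.
    """
    n_bins = len(edges) - 1
    targets = [0.0] * (len(residuals) * n_bins)
    for i, r in enumerate(residuals):
        j = bisect.bisect_right(edges, r) - 1
        j = min(max(j, 0), n_bins - 1)  # clamp under/overflow (assumption)
        targets[i * n_bins + j] = 1.0
    return targets


# Two particles, three bins each: residuals fall in bins 1 and 2 (0-based).
print(residual_targets([0.0, 0.02], [-0.03, -0.01, 0.01, 0.03]))
```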
This set of neural networks yields a cluster-by-cluster estimate of the binned pdf for the deviation of the estimated position(s) from the true one(s). In the track reconstruction, cluster measurements are assumed to follow a Gaussian distribution, and therefore only a single value for the position uncertainty is needed. This is derived for each traversing particle as the square root of the second moment of the distribution. In the future, for specific applications such as b-tagging or τ reconstruction, the full error pdf could be used as input to more complex track fitting approaches, such as the Gaussian Sum Filter [13], which allow the assumption of Gaussian-distributed measurements to be dropped.
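Reducing the binned pdf to a single uncertainty per particle can be sketched as below. Two choices are assumptions of this sketch: each bin is represented by its centre, and the second moment is taken about zero (that is, about the true position, so that any bias is included). The bin edges and output values in the example are hypothetical.

```python
import math


def sigma_from_binned_pdf(probs, edges):
    """Square root of the second moment of a binned residual pdf.

    probs: output-node values for one particle's bins (normalised here).
    edges: bin edges; each bin is represented by its centre.
    """
    total = sum(probs)
    centres = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    second_moment = sum(p * c * c for p, c in zip(probs, centres)) / total
    return math.sqrt(second_moment)


def split_blocks(outputs, n_bins):
    """Cut the N * N_bins outputs into one block of N_bins values per particle."""
    return [outputs[i:i + n_bins] for i in range(0, len(outputs), n_bins)]


edges = [-3.0, -1.0, 1.0, 3.0]
outputs = [0.5, 0.0, 0.5, 0.0, 1.0, 0.0]  # two particles, three bins each
print([sigma_from_binned_pdf(block, edges) for block in split_blocks(outputs, 3)])  # prints [2.0, 0.0]
```

Using the central second moment instead would exclude a systematic offset of the residual from the quoted uncertainty.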

Neural network training
Ten neural networks are needed for up to N = 3 particles: NN_par estimates the number of particles traversing a cluster, and the remaining nine estimate the positions (NN_pos) and errors (NN_err^x and NN_err^y). The training data are a mixed sample of top-antitop pair production (tt̄) and highly energetic dijet events (140 < p_T^jet < 560 GeV). The tt̄ events were generated using the MC@NLO [14] generator interfaced with HERWIG [15] to model the parton shower, and the dijet events were generated using PYTHIA. Both samples were passed through the standard GEANT4 [16] simulation of the ATLAS detector [17] and the pixel digitisation model. The detector simulation includes effects such as ionisation loss, the production of knock-out shell electrons, δ-rays, and hadronic spallation from the interactions of charged particles in the pixel modules. The digitisation model accounts for the charge drift to the readout surface, including the Lorentz angle and effects from thermal diffusion, cross-talk between neighbouring channels, and the modelling of the time-over-threshold (ToT) shape and its 8-bit readout precision. Time-dependent radiation effects are not included at this stage, but may need to be considered in the future once the detector has suffered significant radiation damage.
All networks are trained on a number of training patterns that exceeds the number of network parameters by at least a factor of a thousand, in order to avoid problems due to the finite size of the training sample. All network layouts include two hidden layers. The inputs and continuous outputs are shifted and normalised to an average value of zero and a standard deviation of one to improve the efficiency of the neural network training. The data used to train the neural network are subdivided into a training and a testing sample. The training of each network is stopped once the sum of the mean squared errors of the network outputs with respect to the target values, computed on the statistically independent testing sample, reaches a minimum. A typical training takes from 12 to 24 hours on a single CPU.
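The input normalisation and the stopping criterion can be illustrated with the sketch below. It does not reproduce JetNet. It assumes that the normalisation constants are computed on the training sample and that the network state is saved after every training epoch, so the epoch with the lowest testing-sample error can be selected afterwards. The error values are illustrative numbers.

```python
import statistics


def standardise(train_values, values):
    """Shift and scale values to zero mean and unit standard deviation,
    using the statistics of the training sample."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return [(v - mean) / std for v in values]


def best_epoch(test_errors):
    """Index of the epoch with the smallest error on the testing sample."""
    return min(range(len(test_errors)), key=test_errors.__getitem__)


print(standardise([0.0, 2.0], [0.0, 1.0, 2.0]))  # prints [-1.0, 0.0, 1.0]
print(best_epoch([0.90, 0.52, 0.41, 0.44, 0.60]))  # prints 2
```

Selecting the minimum on the independent testing sample, rather than on the training sample, guards against over-fitting to statistical fluctuations in the training patterns.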

Changes to the track reconstruction
The NN clustering algorithm allows many previously merged clusters from two nearby particles to be resolved. Consequently, the track-finding strategy and the candidate selection cuts in the ambiguity processing are modified to take advantage of the improved double-track resolution.
Prior to the NN clustering, track candidates were found using a progressive filter that iteratively selected the best matching measurement on each layer, starting from an initial track seed. With the NN clustering, the progressive filter is not optimal, as it does not consider all possible nearby clusters in a pixel module; therefore a combinatorial filter is used instead. The combinatorial filter considers more than one measurement on a predicted measurement layer when building the track candidate, which may lead to several track candidates per track seed. The extrapolation of low-momentum tracks has large uncertainties due to multiple scattering. Therefore, all split clusters from the original cluster found by the CCA are considered in the combinatorial filter if their separation is within the extrapolation uncertainty. In order to reduce the CPU overhead, seeds that are already contained in previously reconstructed candidates are not used to seed new candidates.
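The difference between the two filters at a single layer can be sketched as follows. The compatibility window of three standard deviations, the purely local two-dimensional geometry (in mm), and the function names are assumptions for illustration only; the real filters operate on full track states.

```python
def _chi2(cluster, predicted, sigma):
    """Squared distance in units of the extrapolation uncertainty."""
    return sum(((c - p) / s) ** 2 for c, p, s in zip(cluster, predicted, sigma))


def progressive_choice(predicted, sigma, clusters):
    """Progressive filter: keep only the best-matching cluster on the layer."""
    return min(clusters, key=lambda c: _chi2(c, predicted, sigma))


def combinatorial_choices(predicted, sigma, clusters, n_sigma=3.0):
    """Combinatorial filter: every cluster inside the window starts a branch."""
    return [c for c in clusters if _chi2(c, predicted, sigma) <= n_sigma ** 2]


# Two split sub-clusters within the window and one far-away cluster.
clusters = [(0.05, 0.0), (-0.10, 0.0), (1.0, 1.0)]
print(progressive_choice((0.0, 0.0), (0.1, 0.1), clusters))
print(combinatorial_choices((0.0, 0.0), (0.1, 0.1), clusters))
```

Here the progressive filter keeps only one of the two sub-clusters, whereas the combinatorial filter lets each of them seed its own track candidate.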
The track selection in the subsequent ambiguity processing also benefits from the significantly reduced rate of shared measurements, which allows fake candidates to be rejected efficiently. After running the NN clustering algorithm, a track can share a measurement with another track only if the cluster is not split and the NN output is compatible with a possible merged cluster. There is, however, the possibility that the algorithm falsely splits clusters, which raises the risk of creating fake or duplicate tracks. The fractions of correctly and falsely split clusters were studied using Monte Carlo simulations, in which the number of particles contributing to each cluster is known. Figure 6 shows the interdependence between the rate of falsely split clusters created by a single particle and the rate of clusters created by two particles that were not split. The chosen working point of the NN clustering correctly splits about 71% of the clusters that arise from two particles. On the other hand, 7.5% of the clusters that arise from only one particle are incorrectly split into two; this working point is indicated in the figure by a star.
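The two rates that define a working point can be computed from truth-labelled simulation as in the sketch below; scanning the cut traces out a curve of the kind described for figure 6. The P_shared values and cuts are illustrative, and the function name is hypothetical.

```python
def working_point(p_single, p_double, cut):
    """Splitting rates for a given P_shared cut.

    p_single: P_shared of clusters created by one particle (from MC truth).
    p_double: P_shared of clusters created by two particles.
    Returns (fraction of two-particle clusters split,
             fraction of one-particle clusters falsely split).
    """
    efficiency = sum(p > cut for p in p_double) / len(p_double)
    false_rate = sum(p > cut for p in p_single) / len(p_single)
    return efficiency, false_rate


single = [0.10, 0.60, 0.20, 0.30]
double = [0.90, 0.40, 0.70, 0.80]
for cut in (0.25, 0.5, 0.75):
    print(cut, working_point(single, double, cut))
```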

Performance of the neural network clustering algorithm
The cluster-splitting performance can be significantly improved when the incident angles from reconstructed tracks are used instead of estimates based on the centre of the luminous region. This is also seen in figure 6, which shows that using track information reduces the rate of incorrectly split clusters by approximately a factor of two at 71% efficiency. However, in the current ID track reconstruction strategy, no additional track finding is run after the track fitting stage, and therefore an additional pass of the splitting algorithm cannot be used to resolve shared clusters or recover previously lost tracks. Therefore, the NN_par evaluated with the track information is only used to improve the resolution and error estimates.
Merged clusters, when assigned to a track, usually yield a degraded position estimate. This results in a larger spread of the track-to-cluster distance, the so-called cluster residual. Figure 7 compares the residuals in the local x direction for clusters with a width of three or four pixels in the transverse direction, obtained with the NN clustering algorithm and with the CCA algorithm. A clear improvement can be seen in both cases and is particularly visible in the four-pixel-wide cluster category. The layout of the ATLAS pixel detector minimises the probability that a reconstructable charged particle originating from the luminous region traverses more than three pixels in the transverse direction, as discussed in section 2. Thus, most of these clusters stem from multiple particles or δ-rays. Both cases lead to a double-peak structure reflecting the 50 µm pitch in the transverse direction. This double-peak structure completely vanishes with the NN clustering algorithm. Some non-Gaussian tails are present in the residuals for the NN clustering. These originate from large-angle scattering, δ-rays (which are not included in the labelling of the number of particles per cluster in the NN training sample) and clusters on the edges of modules, which typically have skewed distributions.

Performance in data and simulation
The MC simulation was used to obtain the pattern of charge deposition for the training set. Good performance of the NN clustering algorithm, as well as good agreement between its performance in data and simulation, depends on how well the interaction of particles with the silicon and the signal collection are modelled by the detector simulation and digitisation. Cluster merging depends on the local charged-particle density, the Lorentz drift and the incident angles of the traversing particles; effects from charge collection and channel cross-talk are negligible. Clusters in the barrel and endcaps are thus treated in the same way, with the detector region given as an input to the NN, and clusters are classified by their size for the performance comparison. Figure 8 compares the root mean square (RMS) of the measurement residuals for the CCA clustering and the NN clustering algorithm in data and simulation, in the transverse and longitudinal directions, for the different cluster categories. The majority of three- and four-pixel-wide clusters in the transverse direction are due to close-by particles and δ-rays. In the longitudinal direction, clusters of this size are geometrically possible due to the shallower incidence angle. The improvement shown in figure 8 (left) can thus be mostly attributed to actual cluster splitting, which includes splitting components from δ-rays, while in figure 8 (right) a sizeable contribution to the improvement is caused by the non-linear charge interpolation of the NN clustering algorithm. Discrepancies between data and Monte Carlo simulation can arise from imperfections of the detector, such as module misalignment or deformations that are not present in the simulated detector geometry, as well as from limitations in the detector simulation and digitisation model, which include several complex components as described in section 3.2. Discrepancies are seen in figure 8 for the longitudinal direction. These are most likely due to limitations in the modelling of the longitudinal charge sharing. Nonetheless, the relative improvement obtained by the NN clustering algorithm compared to the CCA clustering algorithm is largely consistent between data and Monte Carlo simulations.
The improvement from the non-linear charge interpolation and δ-ray handling of the NN clustering can be tested on isolated tracks, because these have no other close-by particles from the beam collision. Pairs of oppositely charged combined muons with $p_{\mathrm{T}} > 25$ GeV that form a Z boson candidate with invariant mass $m_{\mu\mu} > 50$ GeV were selected. A combined muon is a muon reconstructed using information from both the inner detector and the muon spectrometer. The impact parameter resolution with respect to the primary vertex in data is shown in figure 9. Only the inner detector component of the combined track is used to extract the impact parameter distribution, and only tracks with a measurement in the innermost pixel layer (B-layer) are considered in this comparison. The resolution in the core of the distribution is estimated with an iterative procedure that computes the RMS within 3σ around the mean of the distribution. The NN clustering algorithm improves the $z_0$ ($d_0$) resolution by 7% (3%) compared to the CCA clustering algorithm. This improvement varies significantly as a function of η, because the incident angle of the tracks with respect to the module changes with η. At central η, the neural network algorithm gives no improvement in the $z_0$ resolution: at normal incidence the clusters are only one pixel long in the longitudinal direction, so no interpolation is possible.

In addition to improving the track resolution, the NN clustering algorithm suppresses clusters that are assigned to several track candidates. Figure 10 shows the average number of shared measurements in the innermost pixel layer for tracks in jets reconstructed from calorimeter information with $500 < p_{\mathrm{T}} < 600$ GeV, as a function of the distance $\Delta R$ to the centre of the jet. This distance is the angular separation $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. In the jet core, the average number of shared measurements is reduced by a factor of three. The simulation accurately describes both the rate of shared measurements and the reduction obtained with the neural network.

The NN clustering algorithm is approximately six times slower than the CCA clustering algorithm. However, relative to the full event reconstruction, the NN splitting, the re-evaluation of the splitting during the track fit and the additional combinations from extra track candidates together increase the per-event execution time by at most 5 percent. This was estimated under the highest pile-up conditions experienced during normal physics data taking in 2012.
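The core-resolution estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code: it assumes the iteration starts from the mean and RMS of all entries, keeps only entries within 3σ of the current mean, and stops once the RMS changes by less than a relative tolerance. The starting point and stopping rule are not given in the text, so `core_rms`, `tol` and `max_iter` are hypothetical choices.

```python
import numpy as np


def core_rms(values, n_sigma=3.0, max_iter=100, tol=1e-6):
    """Iteratively estimate the mean and RMS of the core of a distribution.

    Hypothetical sketch: start from the mean and RMS of all entries, then
    repeatedly keep only the entries within n_sigma * RMS of the current
    mean and recompute both, until the RMS changes by less than `tol`
    (relative) or `max_iter` iterations have been done.
    """
    x = np.asarray(values, dtype=float)
    mean, rms = x.mean(), x.std()
    for _ in range(max_iter):
        # Use <= so that a zero-width distribution keeps all its entries.
        window = x[np.abs(x - mean) <= n_sigma * rms]
        new_mean, new_rms = window.mean(), window.std()
        converged = abs(new_rms - rms) <= tol * rms
        mean, rms = new_mean, new_rms
        if converged:
            break
    return mean, rms
```

Applied to a Gaussian core with broad tails (for example, impact parameters affected by poorly measured tracks), the estimate stays close to the width of the core. The plain standard deviation, by contrast, is dominated by the tails.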

Conclusion
A new method using a set of neural networks to identify and split clusters created by multiple charged particles in the ATLAS silicon pixel detector is presented. The algorithm reduces the number of measurements assigned to multiple tracks by up to a factor of three, in particular in the core of highly energetic jets.
An additional set of neural networks was trained to estimate cluster positions and their uncertainties. The non-linear behaviour of the neural networks significantly improves the impact parameter resolution, even for isolated tracks. Good agreement in neural network performance is observed between Monte Carlo simulation and experimental data from proton-proton collisions at the LHC.

Figure 1. Geometry of the ATLAS pixel detector. There are three concentric cylindrical barrel layers and three discs on each side, all equipped with identical pixel modules.

Figure 2. Illustration of cluster sizes in the transverse direction for selected scenarios: (a) a single charged particle traversing the silicon sensor at incident angle φ, producing a cluster two pixels wide; because of the electric and magnetic fields, the charge drifts at the Lorentz angle $\theta_L$; (b) a merged cluster produced by two close-by particles ($A_1$, $A_2$) that both deposit charge in the second pixel, where the fourth pixel is not read out because its charge is below the read-out threshold; (c) a large cluster created by a single charged particle because of a δ-ray produced in the silicon.

Figure 4.
Figure 3. Illustration of charge deposited by multiple particles in the dense core of a jet in a layer of the pixel detector. The pixel size is not drawn to scale. The arrows indicate the passage of charged particles through the pixel sensor, and the pixels are shaded according to which particle deposited charge in them. The dashed lines indicate the paths traversed by the particles in the silicon, and the solid line shows the single cluster obtained by the eight-cell CCA.
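Figure 3 shows how the eight-cell CCA merges the charge from several particles into a single cluster. The Python below is a minimal sketch of an eight-connected component search over fired pixels. It assumes each hit is identified only by integer (column, row) indices after the read-out threshold has been applied. It ignores the deposited charge and other details of the ATLAS implementation, and the name `cca_clusters` is hypothetical.

```python
from collections import deque


def cca_clusters(hits):
    """Group fired pixels into clusters with an eight-cell connected component analysis.

    `hits` is an iterable of (column, row) integer pixel indices that passed
    the read-out threshold. Two pixels belong to the same cluster if they touch
    along an edge or at a corner (8-neighbourhood). Returns a list of clusters,
    each a sorted list of (column, row) tuples; cluster order is arbitrary.
    """
    remaining = set(hits)
    clusters = []
    while remaining:
        # Start a new cluster from any pixel not yet assigned.
        seed = remaining.pop()
        cluster = [seed]
        queue = deque([seed])
        # Breadth-first search over the 8 neighbours of each pixel in the cluster.
        while queue:
            col, row = queue.popleft()
            for d_col in (-1, 0, 1):
                for d_row in (-1, 0, 1):
                    neighbour = (col + d_col, row + d_row)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        queue.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters
```

Pixels (0, 0) and (1, 1) touch only at a corner, yet they still end up in the same cluster. In a dense jet core, this generous connectivity readily merges charge from several particles into one cluster.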

Figure 5 illustrates a successful split by the NN clustering algorithm of an example cluster created by two close-by particles in Monte Carlo simulation. In this figure, the black arrows indicate the paths taken by the true particles through the silicon. The black circle surrounded by a dotted ellipse indicates the single cluster position and error estimate.

Figure 5. Example of a merged cluster created by two particles, from Monte Carlo simulation. The two arrows show the paths of the particles through the silicon, and the black squares indicate their true intersections with the mid-sensor plane. The black dot shows the non-split cluster position, while the two stars show the estimated cluster positions after splitting. The ovals indicate the position error estimates, and $p(N = i)$ denotes the probability, as estimated by the neural network, that the cluster was created by $i$ particles. Effects caused by the Lorentz angle in the silicon sensor were removed in this illustration.

Figure 6. The fraction of non-split two-particle clusters versus the fraction of incorrectly split one-particle clusters in simulation. The cut value of the NN varies along the solid and dashed lines: a tight cut corresponds to the lower right corner of the plot and a loose cut corresponds to the upper left corner. The distribution is shown both for the NN clustering algorithm using only the cluster and luminous region information (solid line) and for the NN clustering algorithm that additionally uses the track information (dashed line). The chosen working point is marked with a star only for the setup without track information. Given the configuration of the track reconstruction algorithms used by ATLAS, this is the only pass of the NN algorithm that is used to split clusters.
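The trade-off in Figure 6 comes from scanning a single cut on the network output. The Python below sketches such a scan under stated assumptions. Each simulated cluster is assumed to carry one hypothetical score, for example $1 - p(N = 1)$ from the network, and a cluster is split when its score reaches the cut. The exact quantity that ATLAS cuts on is not specified here, and `working_point_scan` is an illustrative name.

```python
import numpy as np


def working_point_scan(score_1p, score_2p, cuts):
    """Scan a cut on a per-cluster splitting score.

    Hypothetical sketch: a cluster is split when its score is >= cut.
    `score_1p` holds scores of clusters truly created by one particle,
    `score_2p` those of clusters truly created by two particles.
    Returns, for each cut, (fraction of two-particle clusters left unsplit,
    fraction of one-particle clusters incorrectly split).
    Raising the cut (a tighter selection) splits fewer clusters of either kind.
    """
    s1 = np.asarray(score_1p, dtype=float)
    s2 = np.asarray(score_2p, dtype=float)
    cuts = np.asarray(cuts, dtype=float)
    # Broadcast: rows are cut values, columns are clusters.
    not_split_2p = (s2[None, :] < cuts[:, None]).mean(axis=1)
    wrongly_split_1p = (s1[None, :] >= cuts[:, None]).mean(axis=1)
    return not_split_2p, wrongly_split_1p
```

Plotting the two returned arrays against each other traces a curve like those in Figure 6. The working point is then a single cut value chosen on that curve.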

Figure 7. The cluster residual in the local x direction for clusters with a width of three (left) or four (right) pixels in the x direction, reconstructed with the CCA clustering algorithm (dashed line) and the NN clustering algorithm (solid line).

Figure 8. Measurement residual resolution in the transverse (left) and longitudinal (right) directions in data and simulation for the CCA and NN clustering algorithms. The lower panels show the ratio of the residual resolution of the NN clustering algorithm to that of the CCA clustering algorithm. Error bars are included for the data but are smaller than the markers.

Figure 9. The transverse (left) and longitudinal (right) impact parameter resolution with respect to the primary vertex for isolated muons from Z boson decays with the NN (solid) and the CCA (open) clustering algorithms. The ratio of the resolutions obtained with the NN and CCA clustering algorithms is shown in the lower panels. Error bars are included for the data but are smaller than the markers.

Figure 10. The average number of shared measurements in the B-layer on tracks associated with anti-$k_t$ jets with $500 < p_{\mathrm{T}} < 600$ GeV in data and simulation, reconstructed with the CCA and NN clustering algorithms, as a function of the distance of the track from the centre of the jet. The ratio of the average number of shared measurements in data and simulation is shown for both the NN (solid line) and the CCA (dashed line) clustering algorithms.
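The quantity in Figure 10 is a binned average of shared measurements versus $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. The Python below is a hypothetical sketch of that computation. It assumes each track carries its η, φ and number of shared B-layer measurements, and that $\Delta R$ is measured to the jet axis. The φ difference is wrapped into [-π, π) so that tracks on either side of φ = ±π are treated correctly. The names `delta_r` and `mean_shared_vs_dr` are illustrative and do not come from the ATLAS software.

```python
import numpy as np


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    d_eta = np.asarray(eta1, dtype=float) - np.asarray(eta2, dtype=float)
    d_phi = np.asarray(phi1, dtype=float) - np.asarray(phi2, dtype=float)
    d_phi = (d_phi + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(d_eta, d_phi)


def mean_shared_vs_dr(track_eta, track_phi, n_shared, jet_eta, jet_phi, bin_edges):
    """Average number of shared B-layer measurements per track in bins of Delta R
    to the jet axis (hypothetical sketch). Empty bins are returned as NaN."""
    dr = delta_r(track_eta, track_phi, jet_eta, jet_phi)
    n_shared = np.asarray(n_shared, dtype=float)
    # Number of tracks per bin and summed shared measurements per bin.
    counts, _ = np.histogram(dr, bins=bin_edges)
    sums, _ = np.histogram(dr, bins=bin_edges, weights=n_shared)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

Computing this separately for tracks reconstructed with each clustering algorithm and taking the ratio bin by bin gives the kind of comparison shown in the figure.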