Numerical optimization for Artificial Retina Algorithm

High-energy physics experiments rely on reconstructing the trajectories of particles produced at the interaction point. This is a challenging task, especially in the high track multiplicity environment generated by p-p collisions at LHC energies. A typical event includes hundreds of signal examples (interesting decays) and a significant amount of noise (uninteresting examples). This work describes a modification of the Artificial Retina algorithm for fast track finding: numerical optimization methods are adopted for fast local track search. This approach considerably reduces the total computational time per event. Test results on a simplified simulated model of the LHCb VELO (VErtex LOcator) detector are presented. The approach is also well suited to parallel implementations such as GPGPU, which look very attractive in the context of upcoming detector upgrades.


Introduction
Track reconstruction naturally arises in many high-energy physics experiments: events produced by p-p collisions at LHC energies typically include hundreds of signal examples (interesting decays) and a significant amount of noise (uninteresting examples). This makes track reconstruction a challenging task. The substantial increase in collision energy, which increases the number of produced tracks, calls for more sophisticated event selection and reconstruction techniques that rely heavily on track finding procedures. The high computational cost of event reconstruction methods gives an advantage to algorithms designed for massively parallel architectures (e.g. GPU or custom hardware). One such algorithm is the Artificial Retina [1], a pattern-matching algorithm inspired by the receptive-field structure of low-level visual recognition areas in mammals [2]. One of its advantages is an extremely high capacity for parallelization, which makes it well suited to track finding in high track multiplicity environments [3].
In this work, we study a modification of the Artificial Retina algorithm: it is reformulated as an optimization problem, and well-known methods for global optimization in continuous space are adopted. This approach allows a more flexible trade-off between computational cost and track finding performance, and leads to a considerable reduction of the total computational time.
A grid-based method and the proposed method are compared on a simplified model of the LHCb VELO (VErtex LOcator) detector. The model qualitatively reflects the physics in the VELO; its parameters are inspired by those of the Monte-Carlo simulation with the Run 3 upgrade design of the VELO [4].

Artificial Retina algorithm
The Artificial Retina (AR) algorithm was proposed as a fast, massively parallel track reconstruction method [1], inspired by the low-level structure of the line and edge detection areas of the visual cortex in the mammalian brain [2]. The algorithm introduces a grid of units (or, continuing the biological analogy, 'cells' or 'neurons'), each corresponding to a particular track configuration (pattern) defined by parameters such as position and angle. For each new observation (a set of hits), each cell computes a measure of correspondence between its pattern and the observation.
In the simple case of straight line detection in 2D space, a line can be represented by two parameters θ = (α, β) ∈ ℝ². The units of the AR can thus be arranged into a two-dimensional grid, each unit corresponding to a pattern with parameters θ_ij = (α_i, β_j). Given a set of hits with coordinates {(x_k, y_k)}, k = 1, ..., N, the activation (response) R_ij of a unit is typically defined by (1), where:
• s(x, θ) is the distance between hit x and the line defined by parameters θ;
• σ is the bandwidth parameter.
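As a rough illustration of how a grid of units responds to a set of hits, the sketch below assumes a Gaussian kernel exp(−s²/(2σ²)) summed over hits (a common choice for retina-style responses), a line model y = αx + β, and the residual along y as the distance s. The kernel normalization, the line model, the distance and all function names are assumptions made for this example only.

```python
import math

def response(hits, alpha, beta, sigma):
    """Response of one unit: sum over hits of an (assumed) Gaussian kernel."""
    total = 0.0
    for x, y in hits:
        s = y - (alpha * x + beta)  # assumed distance: residual along y
        total += math.exp(-s * s / (2.0 * sigma ** 2))
    return total

def response_grid(hits, alphas, betas, sigma):
    """Responses R[i][j] for every unit (alpha_i, beta_j) of the grid."""
    return [[response(hits, a, b, sigma) for b in betas] for a in alphas]

# Five hits lying exactly on y = 0.5 x + 1, plus one distant noise hit.
hits = [(x, 0.5 * x + 1.0) for x in range(5)] + [(2.0, 10.0)]
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
betas = [0.0, 0.5, 1.0, 1.5, 2.0]
grid = response_grid(hits, alphas, betas, sigma=0.1)

# The unit with the largest response is the one matching the track.
_, i_best, j_best = max(
    (grid[i][j], i, j) for i in range(len(alphas)) for j in range(len(betas))
)
best_alpha, best_beta = alphas[i_best], betas[j_best]
print(best_alpha, best_beta, grid[i_best][j_best])
```

With this kernel each hit on the pattern contributes 1 and distant hits contribute almost nothing, so the response of the matching unit is close to the number of hits on the track.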
As can be seen from (2), the activation R_ij roughly corresponds to the number of hits that agree with the pattern's parameters (α_i, β_j). Typically, a set of hits aligned along a line activates a cluster of units, with the maximal activation in the unit whose pattern parameters are closest to the track's parameters.
The bandwidth parameter σ controls the smoothness of the response function (2) and should be set according to the grid step and the hit position errors, e.g. σ ∼ Δα = α_{i+1} − α_i. As shown in figure 2a, high values of σ may cause two close tracks to merge, whereas low values of σ, when uncertainties in hit coordinates are present, may lead to several clusters being activated by a single track.
Most of the known AR algorithms rely on computing the response R_ij over the whole grid. The usual steps of such algorithms are: (i) define the track model, parameter grid and distance measure; (ii) compute responses (2) for each unit of the grid; (iii) select clusters of activated units; (iv) for each cluster, estimate the track parameters.
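Steps (ii) to (iv) can be sketched as follows for the 2D line case. The sketch assumes a Gaussian kernel, a line model y = αx + β, and a deliberately crude cluster selection: a unit is reported when its response exceeds a threshold and is strictly larger than all of its grid neighbours. The function names and the threshold value are hypothetical; real implementations typically refine the estimate within each cluster.

```python
import math

def response(hits, alpha, beta, sigma):
    """Assumed Gaussian-kernel response of the unit (alpha, beta)."""
    return sum(
        math.exp(-(y - alpha * x - beta) ** 2 / (2.0 * sigma ** 2))
        for x, y in hits
    )

def grid_search_tracks(hits, alphas, betas, sigma, threshold):
    # Step (ii): compute the response of every unit of the grid.
    R = [[response(hits, a, b, sigma) for b in betas] for a in alphas]
    found = []
    for i in range(len(alphas)):
        for j in range(len(betas)):
            if R[i][j] < threshold:
                continue
            # Step (iii): keep units that dominate their 3x3 neighbourhood.
            neighbours = [
                R[p][q]
                for p in range(max(0, i - 1), min(len(alphas), i + 2))
                for q in range(max(0, j - 1), min(len(betas), j + 2))
                if (p, q) != (i, j)
            ]
            if all(R[i][j] > v for v in neighbours):
                # Step (iv): use the unit's own pattern as the estimate.
                found.append((alphas[i], betas[j]))
    return found

# Two synthetic tracks: y = 0.5 x + 1 and y = -0.5 x.
hits = [(x, 0.5 * x + 1.0) for x in range(6)] + [(x, -0.5 * x) for x in range(6)]
alphas = [-1.0 + 0.25 * i for i in range(9)]
betas = [-1.0 + 0.25 * j for j in range(13)]
tracks = grid_search_tracks(hits, alphas, betas, sigma=0.1, threshold=3.0)
print(tracks)
```

The cost of this approach is one response evaluation per grid unit, regardless of how many tracks the event contains.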
One may notice that the AR algorithm can be reformulated as an optimization problem: finding all local maxima of the response function (1) with respect to the track parameters. From this perspective, the AR algorithms described above employ a brute-force approach: grid search in parameter space. In this work we examine one family of methods that can substitute for grid search: first- and second-order optimization procedures. One crucial observation is that computing the gradient and Hessian matrix of (1) with respect to the track parameters imposes a relatively small overhead, which may bring significant benefits to methods that can use this information, e.g. gradient descent.
Figure 1. (a) … and 20 uniformly distributed noise hits; hits are denoted by dots, true tracks by dashed lines, detector planes by solid lines. (b) Response of the Artificial Retina grid for σ = 2·10⁻²; the track is parametrized by the angle θ to the horizontal line and the offset x_0; two local maxima are close to the true track parameters. The distance between the track and a hit is defined as the Euclidean distance in the corresponding detector plane.
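The small overhead of derivative computations can be made concrete for the 2D line case. The sketch below assumes a Gaussian kernel exp(−s²/(2σ²)) with s = y − αx − β (both assumptions, as is the function name); under these assumptions the response, its gradient and its Hessian all reuse the same exponential per hit, so the derivatives add only a few multiplications.

```python
import math

def response_grad_hess(hits, alpha, beta, sigma):
    """Response, gradient and Hessian w.r.t. (alpha, beta) in a single pass."""
    inv = 1.0 / sigma ** 2
    R = 0.0
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in hits:
        s = y - alpha * x - beta
        w = math.exp(-0.5 * s * s * inv)     # the only expensive call per hit
        c1 = w * s * inv                     # first-derivative factor
        c2 = w * (s * s * inv * inv - inv)   # second-derivative factor
        R += w
        g[0] += c1 * x
        g[1] += c1
        H[0][0] += c2 * x * x
        H[0][1] += c2 * x
        H[1][1] += c2
    H[1][0] = H[0][1]  # the Hessian is symmetric
    return R, g, H

hits = [(0.0, 1.0), (1.0, 1.6), (2.0, 1.9), (3.0, 2.6)]
R, g, H = response_grad_hess(hits, alpha=0.45, beta=1.05, sigma=0.3)
print(R, g, H)
```

A finite-difference comparison is a convenient sanity check for analytic derivatives of this kind when a different track model or distance function is used.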
However, the problem of finding all local maxima of the response function (1) is intrinsically non-convex and non-local, hence global optimization strategies must be adopted. One such strategy is the multi-start algorithm [5], which allocates q initial guesses drawn from a prior distribution P_θ and then sequentially updates each of them. In this study the number of updates for each initial guess is fixed.
Pseudo-code for the proposed method is shown in listing 1; the function update denotes the selected optimization procedure.
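A minimal Python sketch of a multi-start procedure of this kind is given here for the 2D line case. It is not the listing referred to in the text: it assumes a Gaussian-kernel response with s = y − αx − β, uses a plain gradient-ascent step as a stand-in for update, and adds two hypothetical post-processing parameters (min_response and merge_tol) to discard seeds that ended in flat regions and to merge seeds that converged to the same maximum.

```python
import math
import random

def response(hits, theta, sigma):
    a, b = theta
    return sum(math.exp(-(y - a * x - b) ** 2 / (2.0 * sigma ** 2)) for x, y in hits)

def gradient(hits, theta, sigma):
    a, b = theta
    ga = gb = 0.0
    for x, y in hits:
        s = y - a * x - b
        c = math.exp(-s * s / (2.0 * sigma ** 2)) * s / sigma ** 2
        ga += c * x
        gb += c
    return ga, gb

def update(hits, theta, sigma, lr):
    """One gradient-ascent step (stand-in for the selected optimizer)."""
    ga, gb = gradient(hits, theta, sigma)
    return theta[0] + lr * ga, theta[1] + lr * gb

def multi_start(hits, seeds, sigma, n_steps, lr, min_response, merge_tol):
    found = []
    for theta in seeds:
        for _ in range(n_steps):  # fixed number of updates per seed
            theta = update(hits, theta, sigma, lr)
        if response(hits, theta, sigma) < min_response:
            continue  # the seed did not reach a populated maximum
        if all(math.hypot(theta[0] - t[0], theta[1] - t[1]) > merge_tol for t in found):
            found.append(theta)
    return found

# Two synthetic tracks, y = 0.5 x + 1 and y = -0.5 x - 1, six hits each.
hits = [(0.2 * k, 0.1 * k + 1.0) for k in range(6)]
hits += [(0.2 * k, -0.1 * k - 1.0) for k in range(6)]

rng = random.Random(0)  # seeds drawn from an assumed uniform prior
seeds = [(rng.uniform(-1.0, 1.0), rng.uniform(-2.0, 2.0)) for _ in range(50)]
sigma = 0.2
tracks = multi_start(hits, seeds, sigma, n_steps=300, lr=0.1 * sigma ** 2,
                     min_response=3.0, merge_tol=0.05)
print(tracks)
```

Each seed is processed independently of the others, which is what makes the scheme attractive for parallel hardware. A second-order update would need far fewer steps than the gradient-ascent step used here.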

Simplified LHCb VELO model and experiment
We illustrate the application of the Artificial Retina to tracking on a simplified model of the LHCb Vertex Locator (VELO) detector. This simplified model (sVELO) is inspired by the VELO upgrade Technical Design Report [4], and aims to capture all VELO details that are essential from the tracking point of view.
At LHCb two crossing beams result in proton-proton collisions which produce numerous secondary particles. The collision point (called the primary vertex) is far from the magnet, so the trajectories of secondary particles can be considered straight lines. The VELO detector surrounds the interaction region and consists of N_l layers perpendicular to the beam axis (z-axis). Each layer consists of a number of silicon detectors (pixels) that react to charged particles crossing the material.
An event consists of N_e coordinates (x, y, z) of triggered pixels (hits), each activated either by a secondary particle or by noise.
In sVELO we assume that:
• each layer is a disk with outer radius r_outer = 42 mm and inner radius r_inner = 8 mm;
• layers are equally spaced within 700 mm along the z-axis;
• particles travel in straight lines;
• the pseudo-rapidity of particles, η = −ln tan(θ/2), is distributed uniformly: η ∼ U[1, 6];
• the angle in the transverse plane φ is distributed uniformly: φ ∼ U[0, 2π];
• each particle has a probability p_hit = 0.5 of interacting with a detector layer;
• particles that leave fewer than N_min = 2 hits are considered undetectable and their hits are marked as noise;
• errors of the (x, y) coordinate measurements are distributed normally: ε_x, ε_y ∼ N(0, 10⁻²);
• N_noise uniformly distributed hits are introduced in each event, with N_noise ∼ Poisson(250).
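The assumptions listed above translate fairly directly into a toy event generator. The sketch below is only one possible reading of them: the number of layers (26), their placement at z = 700·(i+1)/26 mm, the primary vertex at the origin, the interpretation of 10⁻² as a variance in mm², and noise hits spread uniformly over the layer area are all assumptions, and every name in the code is hypothetical.

```python
import math
import random

R_INNER, R_OUTER = 8.0, 42.0   # mm
P_HIT, N_MIN = 0.5, 2
ERR_STD = math.sqrt(1e-2)      # assumes N(0, 1e-2) denotes the variance
N_LAYERS = 26                  # hypothetical number of layers
LAYER_Z = [700.0 * (i + 1) / N_LAYERS for i in range(N_LAYERS)]  # assumed placement

def poisson(rng, lam):
    """Knuth's multiplication method for a Poisson-distributed integer."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def generate_event(rng, n_tracks, noise_mean=250.0):
    """Return a list of hits (x, y, z, label); label -1 marks noise."""
    hits = []
    for track_id in range(n_tracks):
        eta = rng.uniform(1.0, 6.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        polar = 2.0 * math.atan(math.exp(-eta))  # inverts eta = -ln tan(polar / 2)
        track_hits = []
        for z in LAYER_Z:
            r = z * math.tan(polar)  # radius at this layer, vertex at the origin
            if R_INNER <= r <= R_OUTER and rng.random() < P_HIT:
                x = r * math.cos(phi) + rng.gauss(0.0, ERR_STD)
                y = r * math.sin(phi) + rng.gauss(0.0, ERR_STD)
                track_hits.append((x, y, z))
        # Tracks with too few hits are undetectable: relabel their hits as noise.
        label = track_id if len(track_hits) >= N_MIN else -1
        hits.extend((x, y, z, label) for x, y, z in track_hits)
    for _ in range(poisson(rng, noise_mean)):
        r = math.sqrt(rng.uniform(R_INNER ** 2, R_OUTER ** 2))  # uniform in area
        a = rng.uniform(0.0, 2.0 * math.pi)
        hits.append((r * math.cos(a), r * math.sin(a), rng.choice(LAYER_Z), -1))
    return hits

event = generate_event(random.Random(1), n_tracks=100)
print(len(event), "hits,", len({h[3] for h in event if h[3] >= 0}), "detectable tracks")
```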
In the experiment, the number of reconstructible tracks was varied between 50 and 350. A track is parametrized by two angles θ and φ: x(t) = t sin θ; y(t) = t cos θ sin φ; z(t) = t cos θ cos φ.
The distance function s is defined as the Euclidean distance between the hit and the intersection of the track with the hit's detector layer.
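For the parametrization above, the intersection with a layer at z = z_k follows from solving z(t) = z_k for t, which gives t = z_k / (cos θ cos φ). The sketch below computes the resulting in-plane distance; the function names are hypothetical, and the track is assumed to pass through the origin, as the parametrization implies.

```python
import math

def track_point_at_layer(theta, phi, z_layer):
    """(x, y) where the track crosses the plane z = z_layer."""
    t = z_layer / (math.cos(theta) * math.cos(phi))
    return t * math.sin(theta), t * math.cos(theta) * math.sin(phi)

def distance(hit, theta, phi):
    """Euclidean distance, within the hit's layer, between hit and track."""
    x_hit, y_hit, z_hit = hit
    x_trk, y_trk = track_point_at_layer(theta, phi, z_hit)
    return math.hypot(x_hit - x_trk, y_hit - y_trk)

theta, phi, t = 0.05, 0.02, 100.0
on_track = (t * math.sin(theta),
            t * math.cos(theta) * math.sin(phi),
            t * math.cos(theta) * math.cos(phi))
shifted = (on_track[0] + 3.0, on_track[1] + 4.0, on_track[2])
print(distance(on_track, theta, phi), distance(shifted, theta, phi))
```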
A track is considered to be reconstructed if the algorithm reports an estimate within ε = 10⁻³ radians of the true track's parameters (which is comparable to the angular size of a VELO pixel).
In the experiment we found that, for this distance function, the routine computing the AR response together with its gradient and Hessian matrix takes less than 3 times longer than computing the response alone (T_0); a normalization constant C = 3 is used to account for the time difference between the routines. To show the reduction in total computational time relative to plain grid search, we set the number of initial guesses so that the computational resources are a fraction α of those required by grid search, where:
• n_grid is the number of cells required for plain grid search to provide the required resolution;
• q = 3 is the number of optimization steps;
• C_0 is the time required by each optimization step, normalized by T_0.
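One way to read this budget, stated here as an assumption, is that the total cost of all optimization steps over all initial guesses, n_seeds · q · C_0 · T_0, should not exceed α · n_grid · T_0. The arithmetic is sketched below with a hypothetical grid size; the values q = 3 and C_0 = 30 and the fraction α = 1/3 are taken from the text, while the function name and n_grid are illustrative.

```python
def seeds_for_budget(alpha, n_grid, n_steps, step_cost):
    """Largest number of seeds whose total cost fits in alpha * n_grid evaluations."""
    return int(alpha * n_grid / (n_steps * step_cost))

n_grid = 1_000_000  # hypothetical number of grid cells
n_seeds = seeds_for_budget(alpha=1 / 3, n_grid=n_grid, n_steps=3, step_cost=30)
print(n_seeds)
```

Under this reading, each seed costs the equivalent of 90 response evaluations, so the number of seeds is roughly n_grid / 270 for α = 1/3.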
Among all methods examined during the experiment, the Truncated Newton method [6,7] was found to yield the best results. For this method the normalization constant is C_0 ≈ 30. During the experiments we found that a slight improvement of the results can be achieved by updating the bandwidth parameter σ at each optimization step; the sequence σ_1 = 0.3, σ_2 = 0.175, σ_3 = 0.05 was used in this study.
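A possible intuition for a decreasing σ sequence, offered here as an interpretation rather than a result of the study, is that a wide kernel gives distant seeds a usable gradient while a narrow kernel sharpens the maximum once a seed is close. The sketch below evaluates the gradient magnitude contributed by a single hit under an assumed Gaussian kernel exp(−s²/(2σ²)); the residual values are hypothetical.

```python
import math

def gradient_magnitude(s, sigma):
    """|d/ds exp(-s^2 / (2 sigma^2))| for a single hit at residual s."""
    return abs(s) / sigma ** 2 * math.exp(-s * s / (2.0 * sigma ** 2))

for residual in (0.5, 0.02):  # a distant seed and a nearly converged one
    for sigma in (0.3, 0.175, 0.05):
        value = gradient_magnitude(residual, sigma)
        print(f"s={residual}, sigma={sigma}: |dR/ds| = {value:.3e}")
```

For the larger residual the gradient under σ = 0.05 is numerically negligible, whereas for the smaller residual the narrow kernel gives the steepest slope.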
Results are shown in figure 3. Generally, the efficiency of the algorithm is high. Figure 3b clearly shows that the efficiency decreases as the number of initial seeds approaches the number of tracks. Nevertheless, with enough initial seeds (figure 3a) the efficiency is close to 1, while the whole procedure requires only one third of the resources required by the grid-search method.

Conclusion
In this work we examined a modification of the Artificial Retina algorithm that adopts continuous-space optimization methods and a multi-start procedure. The high convergence rates of these methods outweigh the computational cost of the gradient and Hessian matrix, which results in an overall reduction of the total computation time.
Experiments on a simplified model of the LHCb VErtex LOcator detector showed that it is possible to keep the track reconstruction efficiency above 95% while reducing the computational time by a factor of 3 compared to the grid-search based Artificial Retina algorithm.