Helium identification with LHCb

The identification of helium nuclei at LHCb is achieved using a method based on measurements of ionisation losses in the silicon sensors and timing measurements in the Outer Tracker drift tubes. The background from photon conversions is reduced using the RICH detectors and an isolation requirement. The method is developed using pp collision data at √s = 13 TeV recorded by the LHCb experiment in the years 2016 to 2018, corresponding to an integrated luminosity of 5.5 fb⁻¹. A total of around 10⁵ helium and antihelium candidates are identified with negligible background contamination. The helium identification efficiency is estimated to be approximately 50% with a corresponding background rejection rate of up to O(10¹²). These results demonstrate the feasibility of a rich programme of measurements of QCD and astrophysics interest involving light nuclei.


Introduction
The observation of antihelium in cosmic rays (CR) could be a signature for physics beyond the standard model, such as dark matter annihilation [1] or the existence of antimatter domains in space [2]. The AMS-02 experiment [3] aboard the International Space Station (ISS) has reported several antihelium candidates of unclear origin [4]. Light nuclei can also be produced in interactions of CR with the interstellar medium via coalescence of secondary baryons, where the CR and the interstellar gas are mostly protons and around 10% helium [5]. The coalescence of antibaryons into antinuclei has been studied at colliders, for example at the LHC in pp and PbPb collisions (ALICE [6]), at the RHIC in AuAu collisions (STAR [7], PHENIX [8], BRAHMS [9]), and at the SPS in fixed-target experiments (NA49 [10]).
The LHCb detector [11,12] covers the forward pseudorapidity region (2 < η < 5), as opposed to other experiments that have helium identification capabilities only at central pseudorapidity (small |η|). The LHCb experiment can therefore measure helium production in a region that is unexplored by other experiments. It is a single-arm forward spectrometer that includes a high-precision charged-particle reconstruction (tracking) system consisting of: a silicon-strip vertex detector (VELO) [13,14] that surrounds the pp interaction region, a large-area silicon-strip detector (TT) [15] located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors (IT) [16,17] and straw drift tubes (OT) [18,19] placed downstream of the magnet. The tracking system measures the momentum p of charged particles with a relative uncertainty that ranges from 0.5% at low momentum to 1.0% at 200 GeV. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p_T) µm, where p_T is the component of the momentum transverse to the beam, in GeV. Different types of charged hadrons are distinguished from one another using information from two ring-imaging Cherenkov (RICH) detectors [20,21].
The method presented in this paper exploits information from energy losses through ionisation in the silicon sensors, alongside information from the OT and RICH to improve the separation power between helium and minimum-ionising particles of charge Z = 1. It is assumed throughout this paper that all observed helium is ³He, due to the suppression by a factor O(10³) predicted by coalescence for each additional baryon [22]. A dedicated study of ⁴He production in pp collisions, exploring the particle identification capabilities of the LHCb RICH, is foreseen in the future.

Energy-loss measurements
The average energy loss per path length of a particle of charge Z passing through a silicon sensor (dE/dx) is proportional to Z², as modelled by the Bethe-Bloch formula [23]. Figure 1 illustrates the energy deposited by helium and Z = 1 particles. In silicon, these energy losses are converted into around 80 electron-hole pairs per micron, and the movement of this charge in the electric field of the depleted region induces a signal on the readout strips. (Natural units where ℏ = c = 1 are used throughout.) This signal can appear in one or a few strips, depending on the electric field, incidence angle, and couplings between strips. The strips in LHCb silicon sensors have different pitch, width, length and shapes, thus leading to different electric fields, total capacitances, and couplings. The signal from each strip is registered by the front-end electronics and measured by the readout system, which is equipped with 7-bit analogue-to-digital converters. This limits the dynamic range to 127 counts per strip, and given that one count corresponds to ∼440 collected electrons, signals above ∼56 000 electrons saturate. Neighbouring signal strips are typically combined into clusters of up to four strips. The counts from the strips in a cluster are summed together to obtain the cluster amplitude (ADC). The saturated strips lead to an underestimation of the cluster ADC, an effect that is partially compensated by the increasing cluster size (CLS) for Z = 2 particles. The distribution of the median ADC of all clusters per track, corrected for the incidence angle, is shown on the right-hand side of Fig. 1. The sensors are calibrated such that the signal from Z = 1 particles peaks at ADC ∼ 40, whilst the helium signal is found to peak at ADC ∼ 135. This is as expected from the Z² dependence, taking into account saturation effects. One such effect is the sharp feature at ADC = 127, which corresponds to CLS = 1 clusters that saturate. The distribution of electrons from photon conversions is discussed in Sect. 4.3.
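The saturation arithmetic above can be checked with a minimal Python sketch. The constants (7-bit ADC, about 440 electrons per count, Z = 1 peak near 40 counts) are taken from the text; the function names and the 70/30 charge sharing between two strips are hypothetical choices made only for illustration.

```python
ELECTRONS_PER_COUNT = 440      # from the text: one ADC count ~ 440 collected electrons
MAX_COUNTS = 2**7 - 1          # 7-bit ADC: at most 127 counts per strip


def saturation_threshold_electrons():
    """Charge (in electrons) above which a single strip saturates."""
    return MAX_COUNTS * ELECTRONS_PER_COUNT


def cluster_adc(strip_charges_e):
    """Sum the per-strip counts of a cluster, clipping each strip at 127 counts."""
    return sum(min(q // ELECTRONS_PER_COUNT, MAX_COUNTS) for q in strip_charges_e)


# A Z = 1 particle peaking at ~40 counts deposits ~17 600 electrons.
z1_charge = 40 * ELECTRONS_PER_COUNT
# Naive Z^2 scaling for helium: four times the Z = 1 charge (160 counts if unsaturated).
he_charge = 4 * z1_charge

one_strip = cluster_adc([he_charge])                 # all charge on one strip: clipped
two_strips = cluster_adc([he_charge * 7 // 10,       # hypothetical 70/30 sharing
                          he_charge * 3 // 10])
print(saturation_threshold_electrons(), one_strip, two_strips)
```

In this toy picture a single-strip helium cluster is clipped at 127 counts, whereas sharing the same charge over two strips recovers the full amplitude, which is the sense in which a larger cluster size partially compensates for saturation.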

Sample selection
The results presented in this paper are obtained with the pp-collision data collected during the years 2016, 2017, and 2018, at a centre-of-mass energy √s = 13 TeV. The total integrated luminosity of 5.5 fb⁻¹ is composed of 1.6 fb⁻¹ collected in 2016, 1.7 fb⁻¹ in 2017, and 2.2 fb⁻¹ in 2018. The combined output of all LHCb physics trigger lines [24] is used. Two data subsets, called "preselection 1" and "preselection 2", are defined in a data reduction stage that takes place after the trigger and full data reconstruction.
In addition, minimum-bias data are used to study the performance of the identification method. This sample consists of 1.7 × 10⁸ events with 2.1 × 10⁹ tracks, recorded with random triggers and selected using a small prescaling factor. Reference samples of singly charged particles are also obtained from Λ → pπ⁻ and D*⁺ → D⁰(K⁻π⁺)π⁺ decays that are processed into dedicated calibration data [25]. They contain around 10⁹ tracks in total and correspond to an integrated luminosity of 3 fb⁻¹. These samples are supplemented by kaons and pions from B⁰ → K*⁰(K⁺π⁻)J/ψ(µ⁺µ⁻) candidates selected from data corresponding to 5.5 fb⁻¹.
The response of the LHCb detector to ³He is simulated using a simplified Monte Carlo simulation where a single particle is generated at the nominal pp interaction point and traced through the LHCb detector using the Geant4 toolkit [26]. The simulated events are processed by the same reconstruction software as for the data. An additional sample of simulated B⁰ → K*⁰(K⁺π⁻)J/ψ(µ⁺µ⁻) decays is used for comparison with the matching data sample.
Selected tracks are required to have passed through the VELO, TT, and the three tracking stations downstream of the magnet. A requirement is placed on the significance of the match between upstream and downstream track segments to suppress fake tracks and particles produced in interactions with the detector material. The rigidity of each track, p/|Z|, is required to be larger than 2.5 GV and the transverse component of the rigidity must be at least 0.3 GV. Each track must also be of good quality and have a sufficient number of hits in the silicon detectors to enable the identification techniques discussed below. Tracks that fulfil these requirements are considered well-reconstructed.
In the preselection stage two simple, robust, and independent variables are used to define two subsets of data enriched in helium. Together, they ensure redundancy in the selection, and enable efficiency comparisons. The first subset, referred to as "preselection 1", requires at least one track with the number of overflows greater than 3. This quantity is defined as the total number of saturated strips in VELO clusters along the track that have CLS < 4. Its distribution is shown in Fig. 2, where a clear separation between simulated helium and Z = 1 particles from calibration data is observed. The background rejection rate of this requirement is O(10⁵). The second subset, called "preselection 2", is obtained by requiring median ADC > 115. Additional kinematic and track-quality requirements are applied, and pions are rejected using loose particle identification cuts. The efficiencies of these two preselections are estimated by means described in Sect. 4.1.
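The two preselection variables can be sketched in a few lines of Python. This is a minimal illustration, assuming each VELO cluster is available as a list of per-strip counts and that the cluster amplitudes have already been corrected for the incidence angle; the function names are hypothetical, and the additional kinematic, track-quality and particle-identification requirements of preselection 2 are deliberately left out.

```python
from statistics import median

SATURATION_COUNTS = 127  # 7-bit ADC ceiling per strip


def number_of_overflows(velo_clusters):
    """Count saturated strips over all VELO clusters on the track with CLS < 4."""
    return sum(
        sum(1 for counts in strips if counts >= SATURATION_COUNTS)
        for strips in velo_clusters
        if len(strips) < 4
    )


def passes_preselection_1(velo_clusters):
    """Preselection 1: more than three overflows on the track."""
    return number_of_overflows(velo_clusters) > 3


def passes_preselection_2(corrected_cluster_adcs):
    """Preselection 2 (amplitude part only): median cluster ADC above 115."""
    return median(corrected_cluster_adcs) > 115


# Toy helium-like track: several saturated strips and large cluster amplitudes.
toy_clusters = [[127], [127, 30], [127, 127, 12], [90, 60]]
toy_adcs = [127, 157, 266, 150]
print(number_of_overflows(toy_clusters),
      passes_preselection_1(toy_clusters),
      passes_preselection_2(toy_adcs))
```

A track is kept at this stage if either preselection accepts it, which is what allows the efficiency of one to be measured with the sample selected by the other.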

Helium identification
The VELO, IT, and TT silicon sensors have different geometries, with the most intricate one being the VELO. It has 42 modules (21 per side), each consisting of two different types of sensors with strips oriented to make measurements of the radial and azimuthal position. They are referred to as R and ϕ sensors, respectively. The TT sensors have a pitch of 183 µm and four different strip lengths, ranging from 94 mm to 378 mm. The IT sensors that occupy the central part of the tracking stations downstream of the magnet have two different strip lengths, 108 mm and 216 mm, with 198 µm pitch, and two different sensor thicknesses, 320 µm and 400 µm. The thicker sensors produce larger signals. To account for different sensor geometries, VELO R and VELO ϕ sensors are considered separately, as are the thin and thick IT sensors; no significant improvement is found when considering TT sensors of different strip length.
Figure 3 shows the distribution of the cluster amplitude in VELO R sensors, separately for each cluster size, as obtained from simulated helium tracks, and Z = 1 particles from data. It can be seen that helium is characterised by larger CLS, higher average ADC values, and more frequent saturation. The sharp features at ADC = 127, 254, 381 and 508 correspond to saturation of 1, 2, 3 and 4 strips, respectively.

Likelihood estimators from silicon detectors
Each cluster is a separate, independent measurement of the energy deposit. This information is combined into a likelihood estimator for each type of sensor. To construct this likelihood estimator, two-dimensional probability density distributions (PDD) are derived from the ADC distributions, as exemplified in Fig. 3 for VELO R sensors. One dimension of the PDD is the cluster size and the other is the ADC. To derive the likelihoods of the helium (He) and background (bkg) particle hypotheses, the PDDs are used as look-up tables that are evaluated for each of the n clusters on the track under each hypothesis X ∈ {He, bkg}. Helium and background tracks are separated using the log-likelihood ratio of the two hypotheses, denoted Λ_LD. The likelihoods are evaluated for VELO R, VELO ϕ, TT, IT thick and IT thin sensors.
The estimator for the VELO R sensors is shown as an example in Fig. 4 on the left-hand side, where the ³He simulation is compared to the calibration data described in Sect. 3. Significant separation power between helium and Z = 1 particles is observed in the likelihood estimators from all subdetectors. However, none of them would be sufficient by itself to produce a pure helium sample from an expected helium-to-background ratio of 1 in O(10⁸). On the right-hand side of Fig. 4, the distribution of Λ_LD^{VELO R} in samples of pions and kaons from B⁰ → K*⁰(K⁺π⁻)J/ψ(µ⁺µ⁻) decays is compared between data and simulation. The shape is found to be described well by the simulation; however, the tail would lead to overestimation of the background contamination in the signal region if simulation were used for such estimates. The differences between the distribution of helium candidates and of simulated helium are attributed to imperfect modelling of the signal in the silicon sensors.
For the VELO, a weighted mean of the VELO R and VELO ϕ likelihood estimators is used to define
    Λ_LD^VELO = (n_{VELO R} Λ_LD^{VELO R} + n_{VELO ϕ} Λ_LD^{VELO ϕ}) / (n_{VELO R} + n_{VELO ϕ}).
The weights n_{VELO R} and n_{VELO ϕ} are the number of clusters on the track in R and ϕ sensors, respectively. The thin and thick IT estimators are combined in the same way to obtain Λ_LD^IT.
[…] tracks in the minimum-bias data, this indicates a separation power of at least 1 in 10⁸. Roughly 20% of these tracks are in the IT acceptance and Fig. 5 (right) shows that they are predominantly distributed at large Λ_LD^IT values.
The roughly 50 helium candidates are used to estimate the efficiencies of the two preselections discussed in Sect. 3. As shown on the left-hand side of Fig. 6, the overlap of the selected tracks is (24 ± 6)%. In addition, both preselections are found to be approximately 50% efficient, as shown on the right-hand side of Fig. 6 for the case of preselection 1.
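The look-up-table construction described in this section can be illustrated with a minimal Python sketch. The ADC binning, the floor applied to empty bins, the toy cluster lists, and in particular the per-cluster averaging of the log-likelihood are assumptions made for illustration and are not taken from the LHCb implementation; only the combination of two estimators as a mean weighted by the number of clusters follows the text directly.

```python
import numpy as np

ADC_EDGES = np.arange(0, 513, 8)  # hypothetical binning: 64 bins covering 0..512


def adc_bin(adc):
    """Index of the ADC bin, clamped to the table range."""
    j = int(np.searchsorted(ADC_EDGES, adc, side="right")) - 1
    return min(max(j, 0), len(ADC_EDGES) - 2)


def build_pdd(clusters, floor=1e-6):
    """Build a (CLS, ADC) look-up table from (cls, adc) pairs with cls in 1..4."""
    hist = np.zeros((4, len(ADC_EDGES) - 1))
    for cls, adc in clusters:
        hist[cls - 1, adc_bin(adc)] += 1
    # Floor empty bins so that the logarithm stays finite (an assumed regularisation).
    return np.maximum(hist / hist.sum(), floor)


def mean_log_likelihood(pdd, track_clusters):
    """Per-cluster average of log PDD over the clusters of one track (assumed form)."""
    return float(np.mean([np.log(pdd[cls - 1, adc_bin(adc)])
                          for cls, adc in track_clusters]))


def lambda_ld(pdd_he, pdd_bkg, track_clusters):
    """Log-likelihood difference between the helium and background hypotheses."""
    return (mean_log_likelihood(pdd_he, track_clusters)
            - mean_log_likelihood(pdd_bkg, track_clusters))


def combine(lam_a, n_a, lam_b, n_b):
    """Mean of two estimators weighted by their numbers of clusters."""
    return (n_a * lam_a + n_b * lam_b) / (n_a + n_b)


# Toy templates: helium-like clusters are wide and near saturation, Z = 1 clusters are not.
pdd_he = build_pdd([(1, 127), (2, 135), (2, 150), (3, 200)])
pdd_bkg = build_pdd([(1, 38), (1, 40), (2, 42), (1, 45)])
lam_helium_like = lambda_ld(pdd_he, pdd_bkg, [(2, 135), (1, 127), (3, 200)])
lam_z1_like = lambda_ld(pdd_he, pdd_bkg, [(1, 40), (2, 42)])
print(lam_helium_like, lam_z1_like)
```

With these toy templates the helium-like track obtains a positive estimator and the Z = 1-like track a negative one, mirroring the sign convention in which the signal region lies at positive values.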

Track-time measurements from the OT
As depicted in Fig. 7 on the left-hand side, the VELO and the TT stations measure the dE/dx of tracks upstream of the magnet. However, the IT silicon detector at the centre of the T-stations downstream of the magnet does not cover the full detector acceptance. The rest of the tracks lie in the acceptance of the OT, which consists of gaseous drift tubes that measure the drift time of ionisation clusters produced by passing charged particles; the time resolution is around 1 ns. The OT is calibrated such that the reference track time (t_0) is zero for Z = 1 particles [19]. However, helium produces on average four times more ionisation in the gas, thus triggering the readout discriminator earlier. Therefore, helium is characterised by negative t_0 values at high momentum (p > 10 GeV), as shown in Fig. 7. Across the entire kinematic range, the shift is influenced by the momentum dependence of dE/dx. At low momentum, the time measurement is sensitive to the mass of ³He, leading to an upwards shift of around 4 ns at a momentum of 5 GeV.
The momentum dependence of t_0 is parameterised by a smooth function depicted in Fig. 7 on the right-hand side. This function is used to calculate the expected time of a given track (t_fit). As part of the selection of helium candidates, tracks passing through the OT are required to have Δt_OT ≡ |t_0 − t_fit| < 1 ns. This requirement significantly reduces the background with minimal signal loss.
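The size of the low-momentum shift can be cross-checked with a simple time-of-flight estimate. The sketch below assumes a flight path of 8.5 m from the interaction point to the OT and a ³He mass of 2.809 GeV; neither number is given in the text, and the function names are hypothetical. It computes the delay of a massive particle relative to one travelling at the speed of light, together with the Δt_OT requirement quoted above.

```python
import math

C_M_PER_NS = 0.299792458     # speed of light in metres per nanosecond
M_HE3_GEV = 2.809            # approximate 3He mass (assumption, not from the text)
OT_PATH_M = 8.5              # assumed flight path to the OT (not from the text)


def tof_delay_ns(p_gev, mass_gev=M_HE3_GEV, path_m=OT_PATH_M):
    """Arrival delay with respect to a beta = 1 particle over the same path."""
    beta = p_gev / math.hypot(p_gev, mass_gev)
    return path_m / C_M_PER_NS * (1.0 / beta - 1.0)


def passes_delta_t_ot(t0_ns, t_fit_ns, max_ns=1.0):
    """Requirement |t_0 - t_fit| < 1 ns applied to tracks traversing the OT."""
    return abs(t0_ns - t_fit_ns) < max_ns


print(round(tof_delay_ns(5.0), 2), round(tof_delay_ns(50.0), 3))
```

Under these assumptions the delay at 5 GeV comes out close to 4 ns and becomes negligible at high momentum, which is compatible with the behaviour described for the reference track time.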
The VELO and TT estimators provide helium identification upstream of the magnet. The combination of the IT and OT provides similar information downstream of the magnet, which is essential to discriminate the signal from photon conversions, as discussed in the following section.

Rejection of photon conversions
A photon traversing the LHCb detector may produce an electron-positron pair with a small opening angle; the electron and positron may lose energy in the same silicon sensors, thus leading to a cluster whose amplitude is on average twice as large as expected from a Z = 1 particle, as can be seen on the right-hand side of Fig. 1. Because of this, e⁺e⁻ pairs from photon conversions in the VELO are not distinguished from helium as easily as other Z = 1 particles. The electrons and positrons are deflected in opposite directions by the magnet but the resulting distinct tracks share the same segment in the VELO. Therefore, conversions are suppressed by accepting only tracks that do not share their VELO segment with another track.
In addition, the RICH is used to construct log-likelihoods corresponding to different mass hypotheses for each track in the event. These are standard variables used in LHCb measurements, with the pion hypothesis being the default given that pions are the most abundantly produced particles.
Helium nuclei must have momenta of at least 53 GeV to produce Cherenkov rings in the RICH. Since most detected helium track candidates are below this threshold, the response of the RICH detectors to tracks in the signal sample is expected to be dominated by Cherenkov photons from other tracks in the event. However, electrons below the pion momentum threshold may be separated using the difference between the RICH log-likelihoods constructed assuming the associated track is either a pion or an electron. This quantity, denoted by Λ_{e−π}^RICH, is used to reject electrons from photon conversions, as illustrated in Fig. 8. It depicts the distributions of Λ_{e−π}^RICH in a helium-enriched and a background-enriched sample. The former consists of tracks required to have Λ_LD^VELO > 0 and Λ_LD^TT > 1, whilst the latter comprises tracks selected using the cuts Λ_LD^VELO > 0 and Λ_LD^TT < −5. It can be seen that requiring Λ_{e−π}^RICH < 2 is expected to reject a substantial amount of background, whilst being highly efficient on the signal. The separation power is diluted at high momentum, where Cherenkov rings from helium are indistinguishable from those produced by other particles. This low-momentum background separation technique is therefore complementary to the high-momentum one provided by the OT.
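The 53 GeV threshold quoted above follows from the Cherenkov condition β > 1/n, which gives a threshold momentum p = m / √(n² − 1). The short Python sketch below evaluates it under the assumption of a refractive index n = 1.0014, a nominal value for the C4F10 radiator that is not stated in the text, and with approximate particle masses; the names are hypothetical.

```python
import math

N_C4F10 = 1.0014          # assumed nominal refractive index (not given in the text)
MASSES_GEV = {            # approximate masses in GeV
    "electron": 0.000511,
    "pion": 0.13957,
    "helium3": 2.809,
}


def cherenkov_threshold_gev(mass_gev, n=N_C4F10):
    """Momentum above which a particle of the given mass radiates in a medium of index n."""
    return mass_gev / math.sqrt(n * n - 1.0)


for name, mass in MASSES_GEV.items():
    print(f"{name}: {cherenkov_threshold_gev(mass):.3g} GeV")
```

With this assumed index, the ³He threshold evaluates to about 53 GeV and the pion threshold to a few GeV, while electrons radiate at essentially all reconstructed momenta, which is why the electron-pion likelihood difference is useful below the pion threshold.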
The helium sample
Table 1: Helium selection criteria quantified throughout this paper. The logical or is required between the two preselection criteria. Given the complementary acceptances of OT and IT, the logical or is also required between the downstream requirements.
Type    Track property    Selection
Acceptance, kinematics    […]

The data selected by applying the requirements described in the previous sections are shown in Figs. 9 and 10. Additionally, tracks that are incompatible with originating from a primary vertex are rejected by means described in the following section. The selection criteria quantified thus far are summarised in Table 1. A large, well-separated population of helium candidates is observed for Λ_LD^VELO > 0 and Λ_LD^TT > −1 (region A). Approximately 1.1 × 10⁵ candidates are selected, and Fig. 5 shows that their location is consistent with the ∼50 candidates found in minimum-bias data.
Singly charged particles are located at low values of Λ_LD^VELO and Λ_LD^TT (region D). They peak at two different Λ_LD^VELO values, one for each of the preselections presented in Sect. 3. The median ADC requirement of preselection 2 biases the Λ_LD^VELO distribution towards larger values with respect to preselection 1. Electrons from converted photons are highly collinear in the VELO but are separated by the magnetic field. This means that they cannot be separated by Λ_LD^VELO alone; however, they can be separated by Λ_LD^TT and are therefore found predominantly in region B.
The Λ_LD^VELO, Λ_LD^TT, and Λ_LD^IT estimators are expected to have a mass, a momentum and an incidence-angle dependence, due to the Bethe-Bloch formula. This is studied in detail, and is found to be barely resolvable with the 7-bit ADC resolution of the LHCb silicon trackers. As a result, the obtained distributions are continuous and nearly independent of these quantities. This is illustrated for Λ_LD^VELO in Fig. 11. The material in the LHCb detector is inhomogeneously distributed in pseudorapidity and azimuthal angle; however, no significant dependence of the separation power on these quantities is observed.
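The requirements quoted in the running text can be gathered into a single predicate, sketched below in Python. The thresholds are those stated in the text and figure captions; the `Track` container, its field names, and the use of Λ_LD^IT > −1 as the IT-side downstream requirement (taken from the caption of Fig. 5) are assumptions for illustration, and the track-quality and matching requirements that are not quantified in the text are omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Track:
    rigidity_gv: float                     # p/|Z|
    pt_rigidity_gv: float                  # transverse component of the rigidity
    n_overflows: int                       # saturated strips in VELO clusters with CLS < 4
    median_adc: float                      # median angle-corrected VELO cluster ADC
    shares_velo_segment: bool              # True if another track uses the same VELO segment
    lambda_rich_e_pi: float                # RICH electron-pion log-likelihood difference
    lambda_velo: float
    lambda_tt: float
    ln_chi2_ip: float
    delta_t_ot_ns: Optional[float] = None  # None if the track does not traverse the OT
    lambda_it: Optional[float] = None      # None if the track does not traverse the IT


def is_prompt_helium_candidate(t: Track) -> bool:
    kinematics = t.rigidity_gv > 2.5 and t.pt_rigidity_gv >= 0.3
    preselection = t.n_overflows > 3 or t.median_adc > 115          # logical or
    in_ot = t.delta_t_ot_ns is not None and t.delta_t_ot_ns < 1.0
    in_it = t.lambda_it is not None and t.lambda_it > -1.0          # assumed threshold
    downstream = in_ot or in_it                                     # logical or
    conversion_veto = (not t.shares_velo_segment) and t.lambda_rich_e_pi < 2.0
    region_a = t.lambda_velo > 0.0 and t.lambda_tt > -1.0
    prompt = t.ln_chi2_ip < 2.0
    return all([kinematics, preselection, downstream, conversion_veto, region_a, prompt])


good = Track(rigidity_gv=6.0, pt_rigidity_gv=0.8, n_overflows=5, median_adc=130.0,
             shares_velo_segment=False, lambda_rich_e_pi=-3.0, lambda_velo=4.0,
             lambda_tt=2.5, ln_chi2_ip=0.3, delta_t_ot_ns=0.4)
print(is_prompt_helium_candidate(good))
print(is_prompt_helium_candidate(replace(good, shares_velo_segment=True)))
```

Keeping each group of requirements as a named boolean makes the two logical-or combinations explicit and makes it straightforward to switch off a single group when studying its efficiency.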

Helium production in the detector material
Particles produced in collisions may interact with the detector material or the beam pipe, thus producing secondary particles, including helium. Antihelium is not produced in this way, thus inducing an asymmetry in distributions of variables that are different for prompt and non-prompt particles. One such variable is the change in the χ² of the PV when it is reconstructed with or without the track of interest. This quantity, denoted by χ²_IP, is very close to the square of the ratio between the impact parameter of the track and its uncertainty.

Figure 12: Distribution of ln χ²_IP for helium and antihelium tracks, combining data from both preselections. The requirement that separates prompt and displaced helium is shown by the vertical dashed line.

The distribution of ln χ²_IP in selected helium tracks is shown in Fig. 12. An enhancement is observed at large values in the helium candidate sample, but not in the antihelium one. This enhancement is consistent with the production of helium in the detector material. Therefore, prompt and displaced helium tracks are separated by a cut on χ²_IP. The upper tail in the antihelium distribution at ln χ²_IP > 2 is a sign of non-prompt contributions to the sample, as expected from hypertriton or Λ_b^0 decays. Only tracks with ln χ²_IP < 2 are considered for the results presented throughout Sect. 5.
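To give a feeling for the scale of the ln χ²_IP < 2 requirement, the following Python sketch approximates χ²_IP by the squared ratio of the impact parameter to its uncertainty, and takes the uncertainty from the resolution (15 + 29/p_T) µm quoted in the Introduction. Both steps are simplifying assumptions: the quantity used in the analysis comes from refitting the primary vertex, and the function names are hypothetical.

```python
import math


def ip_resolution_um(pt_gev):
    """Impact-parameter resolution in microns for a transverse momentum in GeV."""
    return 15.0 + 29.0 / pt_gev


def approx_ln_chi2_ip(ip_um, pt_gev):
    """Approximate ln(chi2_IP) as ln((IP / sigma_IP)^2); requires ip_um > 0."""
    return 2.0 * math.log(ip_um / ip_resolution_um(pt_gev))


def is_prompt(ip_um, pt_gev, cut=2.0):
    """Prompt requirement ln(chi2_IP) < 2, i.e. IP below e (about 2.7) standard deviations."""
    return approx_ln_chi2_ip(ip_um, pt_gev) < cut


print(ip_resolution_um(1.0), is_prompt(100.0, 1.0), is_prompt(200.0, 1.0))
```

In this approximation the cut keeps tracks whose impact parameter is within roughly 2.7 standard deviations of the primary vertex, so a track with p_T = 1 GeV is treated as prompt up to an impact parameter of about 120 µm.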

Estimation of residual contamination
The projections onto the Λ_LD^VELO axis of the 2D distributions displayed in Fig. 10 are used to estimate the level of residual background and are shown in Fig. 13. The distribution of an independent background sample is scaled to match the size of the population found in data at negative Λ_LD^VELO values. The procedure is performed separately on the preselection 1 subset, which contains around 75% of the tracks in region A, and on data from the preselection 2 sample, because the requirements for preselection 2 bias the background towards larger values, as explained in Sect. 3. The background sample is required to have Λ_LD^TT < −2. In the case of preselection 2, the shape of the background is shifted by +0.5 to account for correlations between Λ_LD^TT and Λ_LD^VELO, which can be seen in Fig. 9. This shift is determined by minimising the χ² between the signal and background histograms at negative Λ_LD^VELO values. Figure 13 shows that the background in the signal region, i.e. at Λ_LD^VELO > 0, is negligible.
The signal region of preselection 1 contains 8.7 × 10⁴ tracks, of which 15 are expected to be background. Together with the observation of 54 helium tracks in minimum-bias data, the number of background tracks in the signal region in minimum-bias data that pass the same requirements can be estimated to be 6 × 10⁻³. Given the total number of 1.2 × 10⁹ well-reconstructed tracks in the minimum-bias sample, this leads to a misidentification probability for such a background track to pass the helium identification of O(10⁻¹²).
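The order of magnitude of the misidentification probability can be reproduced from the numbers quoted in this section. The Python lines below only repeat that arithmetic; the variable names are hypothetical and no additional inputs are assumed.

```python
n_signal_region = 8.7e4        # tracks in the preselection 1 signal region
n_background_expected = 15     # expected background among them
n_bkg_minbias = 6e-3           # estimated background tracks in the minimum-bias signal region
n_minbias_tracks = 1.2e9       # well-reconstructed tracks in the minimum-bias sample

background_fraction = n_background_expected / n_signal_region
misid_probability = n_bkg_minbias / n_minbias_tracks

print(f"background fraction in signal region: {background_fraction:.1e}")
print(f"misidentification probability:        {misid_probability:.1e}")
```

The background fraction in the signal region is of order 10⁻⁴, and dividing the estimated 6 × 10⁻³ background tracks by 1.2 × 10⁹ tracks gives 5 × 10⁻¹², consistent with the quoted O(10⁻¹²).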

Summary
Employing techniques that are almost entirely data driven, helium and antihelium are observed for the first time at the LHCb experiment. This is accomplished using dE/dx measurements in the silicon sensors (VELO, TT and IT), alongside information from the RICH and OT subdetectors.
A total of 1.1 × 10⁵ prompt helium and antihelium candidates are identified with negligible background contamination in the LHCb pp collision data collected in the years 2016 to 2018. The efficiency of the helium identification method is estimated to be approximately 50% with a corresponding background rejection factor of up to O(10¹²).
The identification method will be applied to other LHCb Run 2 datasets, such as proton-ion, ion-ion, and SMOG collision data. Compared to the ALICE experiment, which covers the central rapidity region |y| < 0.5, the LHCb results will extend the available measurements to the so far experimentally unexplored forward region (2 < η < 5). This identification technique, new at the LHCb experiment, demonstrates the feasibility of a rich programme of measurements of QCD and astrophysics interest involving light nuclei.

Figure 2: Distributions of the number of overflows for simulated helium signal and Z = 1 particles from calibration data. The vertical line indicates the preselection requirement.

Figure 3: Distributions of the cluster amplitudes, for different cluster sizes, in the VELO R sensors for (left) simulated helium signal and (right) calibration data.

Figure 5: Distributions of the log-likelihood estimators from the LHCb VELO (Λ_LD^VELO), TT (Λ_LD^TT) and IT (Λ_LD^IT) silicon trackers from the minimum-bias data. Helium candidates are indicated by green boxes. Left: the tracks are required to pass the selection described in Sect. 4.2 or to have Λ_LD^IT > −1. Right: the tracks must traverse the IT and have Λ_LD^VELO > 0. The full selection described in Sects. 4.2, 4.3 and 5.1 is applied.

Figure 6: Left: Efficiency matrix of the two preselections determined from the helium candidates selected in minimum-bias data. Right: Distribution of the number of overflows in helium candidate tracks from minimum-bias data and Z = 1 particles from calibration data. The latter is normalised to 25% of the former. The requirement used in the preselection is shown by the vertical dashed line.

Figure 7: Left: Sketch of the LHCb tracking detectors and magnet; adapted from Ref. [12]. Right: Distribution of the reference track time versus the momentum for tracks with Λ_LD^VELO > 0 and Λ_LD^TT > 1. In each momentum slice, the distribution is normalised to unity and fitted with a Gaussian function. The positions of the means are indicated by the black dots. These are fitted as described in the main body, resulting in the shape depicted by the green continuous line. The dashed green lines indicate the ±1 ns interval.

Figure 8: Distribution of the RICH electron-pion separation estimator (Λ_{e−π}^RICH) for helium signal and background below the rigidity thresholds for Cherenkov photon production by ³He in the two LHCb RICH detectors. The vertical dashed line indicates the selection requirement.

Figure 9: Distributions of the log-likelihood estimators from the LHCb VELO (Λ_LD^VELO), TT (Λ_LD^TT) and IT (Λ_LD^IT) silicon trackers. Left: the tracks are required to pass the downstream selection. Right: the tracks must traverse the IT and have Λ_LD^VELO > 0. On the left, the signal region is denoted by A, whilst regions B, C, and D correspond to background.

Figure 10: Distributions corresponding to that on the left-hand side of Fig. 9, but separated into tracks of (left) positive and (right) negative charge.

Figure 11: Distribution of Λ_LD^VELO as a function of momentum (left) and pseudorapidity (right), in data from both preselections combined. Tracks are required to pass the selection in Sect. 4.2, as well as Λ_LD^TT > −1. Each vertical slice is normalised to unity.

Figure 13: Distribution of Λ_LD^VELO in (left) helium and (right) antihelium samples from the (upper) preselection 2 and (lower) preselection 1 data samples. The distribution from an independent background sample is shown in blue, scaled to match the size of the populations found in data at negative Λ_LD^VELO values. To visualise the background in the preselection 1 sample, the inset of each bottom-row plot shows a magnification of the region Λ_LD^VELO ∈ [−10, 1]. Each inset's horizontal axis matches that of the containing plot.
Figure 1: Left: Distributions of the deposited energy in a VELO sensor for simulated helium and protons. Right: Distributions of the median ADC of all VELO clusters per track, corrected for the incidence angle, for Z = 1 particles (blue), e⁺e⁻ pairs from photon conversions (purple), and helium (orange). The helium selection, described in the text, is applied but no VELO requirements are imposed.