Evidence for the rare decay Σ⁺ → pμ⁺μ⁻ at LHCb

A search for the rare decay Σ⁺ → pμ⁺μ⁻ is performed using pp collision data recorded by the LHCb experiment at centre-of-mass energies √s = 7 and 8 TeV, corresponding to an integrated luminosity of 3 fb⁻¹. An excess of events is observed with respect to the background expectation, with a signal significance of 4.0 standard deviations. No significant structure is observed in the dimuon invariant mass distribution.


Introduction
The Σ⁺ → pμ⁺μ⁻ decay is a flavour-changing neutral-current process, allowed only at loop level in the standard model (SM). The process is dominated by long-distance contributions, with a predicted branching fraction of B(Σ⁺ → pμ⁺μ⁻) ∈ [1.6, 9.0] × 10⁻⁸ [1], while the short-distance SM contributions are suppressed, at a branching fraction of about 10⁻¹². Evidence for this decay was seen by the HyperCP experiment [2], with a measured branching fraction B(Σ⁺ → pμ⁺μ⁻) = (8.6 +6.6/−5.4 ± 5.5) × 10⁻⁸, which is compatible with the SM. Remarkably, the three observed decays have almost the same dimuon invariant mass, m_X⁰ = 214.3 ± 0.5 MeV/c², which lies close to the kinematic limit. Such a distribution, if confirmed, would point towards a decay with an intermediate particle coming from the Σ⁺ baryon and decaying into two muons, i.e. a Σ⁺ → pX⁰(→ μ⁺μ⁻) decay, which would constitute evidence of physics beyond the SM (BSM). Various BSM explanations have been proposed for the HyperCP result. The intermediate X⁰ particle could be, for example, a light pseudoscalar Higgs boson [3, 4] in an NMSSM or 2HDM model, or a sgoldstino [5, 6] in various supersymmetric models. Other interpretations and implications can be found in Refs. [7–13]; in general, a pseudoscalar particle is favoured over a scalar one, and a lifetime of order 10⁻¹⁴ s is estimated for the former case. This lifetime would correspond to a prompt X⁰ signal, decaying at the same vertex as the Σ⁺ baryon, in any present search for this particle. Attempts to confirm the presence of this X⁰ particle have been made by different experiments in various initial and final states without finding any signal [14–21], and in LHCb through the final states B⁰_(s) → μ⁺μ⁻μ⁺μ⁻ [22] and B⁰ → K*⁰μ⁺μ⁻ [23].
However, the search for the Σ⁺ → pμ⁺μ⁻ decay has not been repeated, mainly owing to the absence of experiments with large hyperon production and to experimental difficulties.
Hyperons are produced copiously in high-energy proton-proton collisions at the Large Hadron Collider (LHC). A search for Σ⁺ → pμ⁺μ⁻ decays at the LHCb experiment could therefore confirm or disprove the HyperCP evidence, and measure the branching fraction of the decay. This search was also suggested in Ref. [24]. In this report, the search for the Σ⁺ → pμ⁺μ⁻ decay is presented, as performed using pp collision data recorded by the LHCb experiment at centre-of-mass energies √s = 7 and 8 TeV, corresponding to an integrated luminosity of 3 fb⁻¹. The measurement described here is detailed in Ref. [25].

Detector and data samples
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks, and is detailed elsewhere [26, 27]. The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by two software stages, the first performing a preliminary event reconstruction based on partial information and the second applying a full event reconstruction. Each of the three trigger stages is divided into several parallel trigger lines dedicated to different kinds of signals. The decay products involved in this analysis often fail to fire one or more trigger stages, owing mainly to their soft transverse momenta. Nevertheless, since Σ⁺ baryons are copiously produced in the pp collisions recorded by LHCb, the present search can be performed on the events already recorded. In the offline processing, trigger decisions are associated with reconstructed candidates. A trigger decision on a particular line or on the full trigger can thus be ascribed to the reconstructed candidate, to the rest of the event, or to both; such events are defined respectively as triggered on signal (TOS), triggered independently of signal (TIS), or triggered on both. The trigger efficiency is difficult to estimate when no specific trigger path is selected, so a different strategy is adopted here: all the candidates passing the selection are used in the search for Σ⁺ → pμ⁺μ⁻ decays, while only the TIS candidates are used to convert the event yield into a branching fraction. This ensures a partial cancellation of the TIS trigger efficiency between signal and normalisation channels. Furthermore, high-statistics control channels can be exploited to estimate the trigger efficiency directly on data by measuring the overlap of events which are TIS and TOS simultaneously [28].
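The TIS/TOS overlap idea can be illustrated with a minimal sketch. It assumes that the TIS and TOS decisions are uncorrelated, so that the TIS efficiency is approximated by the fraction of TOS events that are also TIS. The function name and the event counts are hypothetical and are not taken from the analysis.

```python
import math


def tis_efficiency(n_tos, n_tis_and_tos):
    """Estimate eps_TIS = N(TIS and TOS) / N(TOS) with a simple binomial uncertainty.

    Assumption: TIS and TOS decisions are uncorrelated in the sample used.
    """
    if n_tos <= 0 or not 0 <= n_tis_and_tos <= n_tos:
        raise ValueError("need n_tos > 0 and 0 <= n_tis_and_tos <= n_tos")
    eff = n_tis_and_tos / n_tos
    err = math.sqrt(eff * (1.0 - eff) / n_tos)
    return eff, err


# Hypothetical counts from a control sample, for illustration only.
eff, err = tis_efficiency(n_tos=20000, n_tis_and_tos=1000)
print(f"eps_TIS = {eff:.3f} +/- {err:.4f}")
```

In practice such an estimate is usually made in bins of the candidate kinematics, since correlations between the signal and the rest of the event can bias the simple ratio.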

Analysis strategy and selection
After the trigger, a loose selection is applied based on geometric and kinematic variables. Candidates are then selected by means of a multivariate selection based on a boosted decision tree (BDT) algorithm [29, 30]. The final search datasets are obtained by rejecting the background with a cut on the BDT output and on particle identification variables. The signal yield is obtained from a fit to the pμ⁺μ⁻ invariant mass and is converted into a branching fraction by normalising to the Σ⁺ → pπ⁰ control channel. To avoid experimenter bias, candidates in the signal regions were not examined until the analysis procedure had been finalised.
The analysis is designed to search for possible peaks in the dimuon invariant mass, which would point towards unknown intermediate particles. The resolution on the dimuon invariant mass as a function of the mass itself is shown in Figure 1(a). The selection is devised such that no fake structures are induced in the dimuon invariant mass distribution of signal candidates. The signal efficiency varies as a function of the dimuon invariant mass and is shown in Figure 1(b). The efficiency is larger at lower dimuon mass owing to the larger recoil momentum against the proton, which gives a larger minimum transverse momentum to the two muons and thus a better tracking efficiency.
Together with the signal Σ⁺ → pμ⁺μ⁻ decay, the following channels are also selected in data: the Σ⁺ → pπ⁰ decay as normalisation channel, the Σ⁺ → pμ⁺μ⁺ candidate decay as control channel for the combinatorial background, and the K⁺ → π⁺π⁻π⁺ decay as a control channel for different parts of the efficiency evaluation. Candidate Σ⁺ → pμ⁺μ⁻ decays are selected by combining two good-quality oppositely charged tracks with muon identification with a third track with proton identification. The three tracks are required to form a good-quality secondary vertex (SV), displaced from any pp interaction vertex (PV) by requiring a measured Σ⁺ decay time greater than 6 ps. Only Σ⁺ candidates with transverse momentum p_T > 0.5 GeV/c, χ²_IP < 36 and a cosine of the angle between the flight direction and the reconstructed momentum larger than 0.9 are retained. Candidate Σ⁺ → pμ⁺μ⁻ decays are considered only if the pμ⁺μ⁻ invariant mass lies within a window around m_Σ⁺, where m_Σ⁺ is the known mass of the Σ⁺ particle [31]. A large background component is present in data due to Λ⁰ → pπ⁻ decays, where the pion is misidentified as a muon and the two tracks are combined with a third one. This background is vetoed by discarding candidates with a pμ⁻ pair mass within 5 MeV/c² of the known Λ⁰ mass when calculated with the pπ⁻ mass hypothesis. Possible backgrounds from exclusive decays peaking in the pμ⁺μ⁻ invariant mass have been examined, including K⁺ → π⁺π⁻π⁺ and K⁺ → π⁺μ⁻μ⁺ decays and various hyperon decays, and none has been found to contribute. When an event contains multiple candidates, all are retained in the selection. Multiple candidates are present in 5% of the events after the initial selection, while none is present after the final selection. The selection for the control channel Σ⁺ → pμ⁺μ⁺ is identical to that of the signal, but considers same-sign dimuon pairs.
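The Λ⁰ veto described above amounts to recomputing the mass of the proton and negative-muon candidate pair with the pion mass assigned to the muon candidate. The following is a minimal sketch, assuming momenta are given as (px, py, pz) tuples in MeV/c; the function names are hypothetical, and the particle masses are rounded world-average values rather than numbers quoted in this text.

```python
import math

M_PROTON = 938.272    # MeV/c^2
M_PION = 139.570      # MeV/c^2, charged pion
M_LAMBDA = 1115.683   # MeV/c^2
VETO_HALF_WIDTH = 5.0  # MeV/c^2, as quoted in the text


def energy(p, mass):
    """Energy of a particle with three-momentum p (MeV/c) and the given mass."""
    return math.sqrt(sum(c * c for c in p) + mass * mass)


def p_pi_mass(p_proton, p_muon_candidate):
    """Invariant mass of the pair, assigning the pion mass to the muon candidate."""
    e_tot = energy(p_proton, M_PROTON) + energy(p_muon_candidate, M_PION)
    p_tot = [a + b for a, b in zip(p_proton, p_muon_candidate)]
    m2 = e_tot * e_tot - sum(c * c for c in p_tot)
    return math.sqrt(max(m2, 0.0))


def passes_lambda_veto(p_proton, p_muon_candidate):
    """True if the candidate is kept, i.e. the p-pi mass is outside the veto window."""
    return abs(p_pi_mass(p_proton, p_muon_candidate) - M_LAMBDA) > VETO_HALF_WIDTH
```

A pair that really comes from a Λ⁰ → pπ⁻ decay reconstructs close to the Λ⁰ mass under this hypothesis and is rejected, whatever mass it has under the muon hypothesis.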
Candidate Σ⁺ → pπ⁰ decays are selected by combining one good-quality track with proton identification with a π⁰ reconstructed in the π⁰ → γγ mode from two clusters in the calorimeter. The selection of this decay is similar to that of the signal, but places tighter requirements on the proton identification and on the transverse momenta of the daughters in order to reduce the high combinatorial background. Finally, candidate K⁺ → π⁺π⁻π⁺ decays are selected from three good-quality tracks, one of which is oppositely charged with respect to the other two; the selection is similar to, but tighter than, that of the signal to cope with the high level of combinatorial background. See Ref. [25] for further details on the selection criteria of the different channels.
The sample of Σ⁺ → pμ⁺μ⁻ candidates in data after the selection is dominated by combinatorial background, part of which is due to misidentified particles. This background is rejected by cuts on the BDT output and on multivariate particle identification variables [27] for the muons and the proton. The BDT combines geometric and kinematic variables chosen so that its dependence on the pμ⁺μ⁻ invariant mass and on the dimuon invariant mass is linear and small, to avoid biases. The BDT is optimised using simulated samples of Σ⁺ → pμ⁺μ⁻ events for the signal and candidates from the sidebands of the Σ⁺ → pμ⁺μ⁺ selection in data for the background. The final cut values were chosen to optimise the sensitivity for evidence of a signal at the smallest possible branching fraction [32]. No BDT selection is applied to the normalisation and control channels.
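A cut optimisation of this kind can be sketched with a Punzi-style figure of merit, ε / (a/2 + √B), where ε is the signal efficiency, B the expected background and a the target significance in standard deviations. That this is the figure of merit used in the analysis is an assumption here, as are the choice a = 3 and all the working-point numbers, which are hypothetical.

```python
import math


def punzi_fom(signal_eff, n_background, n_sigma=3.0):
    """Punzi-style figure of merit: eff / (n_sigma / 2 + sqrt(B))."""
    return signal_eff / (n_sigma / 2.0 + math.sqrt(n_background))


# Hypothetical working points: (BDT cut, signal efficiency, expected background).
working_points = [
    (0.20, 0.90, 400.0),
    (0.50, 0.70, 60.0),
    (0.80, 0.40, 9.0),
    (0.95, 0.10, 1.0),
]

# Pick the cut that maximises the figure of merit.
best_cut, best_eff, best_bkg = max(
    working_points, key=lambda wp: punzi_fom(wp[1], wp[2])
)
print(f"best cut: {best_cut}, FoM = {punzi_fom(best_eff, best_bkg):.4f}")
```

A figure of merit of this form does not depend on the assumed signal branching fraction, which is convenient when the signal rate is poorly known.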

Normalisation
The number of signal candidates in the TIS sample is converted into a branching fraction with the formula
$$ \mathcal{B}(\Sigma^+ \to p\mu^+\mu^-) = \frac{\varepsilon_{\Sigma^+ \to p\pi^0}}{\varepsilon_{\Sigma^+ \to p\mu^+\mu^-}} \, \frac{N_{\Sigma^+ \to p\mu^+\mu^-}}{N_{\Sigma^+ \to p\pi^0}} \, \mathcal{B}(\Sigma^+ \to p\pi^0) = \alpha \, N_{\Sigma^+ \to p\mu^+\mu^-}, $$
where ε, N and B are the efficiency, candidate yield and branching fraction of the corresponding channel, respectively, and α is the single event sensitivity. The ratio of signal and normalisation channel efficiencies, which includes the acceptance, the reconstruction efficiency of the final-state particles and the selection efficiency, is computed with samples of simulated events corrected to take into account known differences between data and simulation. The reconstruction efficiency for the π⁰ is calibrated using the ratio of B⁺ → J/ψK*⁺(→ K⁺π⁰) and B⁺ → J/ψK⁺ decays reconstructed in data. The particle identification efficiency of protons and muons is calibrated using control channels in data. Residual differences between data and simulation are treated as sources of systematic uncertainty. The signal and normalisation channels are required to be TIS at all trigger levels. The trigger efficiency ratio is thus expected to be unity. However, small differences in the average kinematics of the rest of the event are present between the two samples, which cause the ratio to differ from unity. The ratio of trigger efficiencies is thus evaluated with data-driven techniques [28] exploiting the large sample of K⁺ → π⁺π⁻π⁺ decays. A systematic uncertainty is assigned for the applicability of this method to the relatively soft events of this analysis.
The invariant mass distribution of the K⁺ → π⁺π⁻π⁺ control channel candidates in data is shown in Figure 2(a). A binned extended maximum-likelihood fit is performed to the invariant mass distribution. The signal is described by a Hypatia function [33], while the background is described by a second-order polynomial. A total of (966 ± 2) × 10³ K⁺ → π⁺π⁻π⁺ candidates is measured.
Figure 2: (a) Invariant mass distribution of K⁺ → π⁺π⁻π⁺ candidates, superimposed with the fit to data; (b) distribution of the corrected mass m^corr_Σ for Σ⁺ → pπ⁰ candidates, superimposed with the fit to data.
The observed number of Σ⁺ → pπ⁰ candidates is (1711 ± 9) × 10³, as obtained from a binned extended maximum-likelihood fit to the corrected invariant mass distribution. The corrected invariant mass is defined as
$$ m^{\mathrm{corr}}_{\Sigma} = m_{p\gamma\gamma} - m_{\gamma\gamma} + m_{\pi^0}, $$
where m_pγγ and m_γγ are the invariant masses of the pγγ and γγ combinations and m_π⁰ is the known π⁰ mass; this corrects for the π⁰ mass reconstructed from the two photons. The Σ⁺ → pπ⁰ distribution is described by a Gaussian function with a power-law tail on the right side, while the background is described by a modified ARGUS function [34]. The invariant mass distribution is shown in Figure 2(b), superimposed with the fit. The single event sensitivity is α = (1.1 ± 0.6) × 10⁻⁸, where the uncertainty is dominated by the aforementioned systematic uncertainties. This corresponds to about 4 × 10¹¹ Σ⁺ particles produced in the LHCb acceptance in the full dataset in TIS events. For a branching fraction of (5 ± 4) × 10⁻⁸, chosen to cover the SM predicted range, 4.6 ± 4.2 Σ⁺ → pμ⁺μ⁻ candidates are expected.
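The expected yield quoted above follows from dividing the assumed branching fraction by the single event sensitivity. The sketch below redoes that arithmetic with a naive, uncorrelated error propagation; the propagation is an assumption and is not necessarily the procedure used in the analysis. Because it starts from the rounded published value of α, it reproduces the quoted numbers only approximately.

```python
import math


def expected_yield(branching_fraction, alpha):
    """Expected number of candidates: N = B / alpha."""
    return branching_fraction / alpha


def relative_quadrature(*relative_errors):
    """Combine uncorrelated relative uncertainties in quadrature."""
    return math.sqrt(sum(r * r for r in relative_errors))


alpha, alpha_err = 1.1e-8, 0.6e-8  # single event sensitivity from the text
bf, bf_err = 5e-8, 4e-8            # assumed branching fraction from the text

n_exp = expected_yield(bf, alpha)
n_err = n_exp * relative_quadrature(bf_err / bf, alpha_err / alpha)
print(f"expected yield = {n_exp:.1f} +/- {n_err:.1f}")
```

The large relative uncertainty reflects both the width of the SM predicted range and the systematic uncertainty on α.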
The observed number of signal Σ⁺ → pμ⁺μ⁻ events is obtained with a fit to the pμ⁺μ⁻ invariant mass distribution in the range 1149.6 < m_pμ⁺μ⁻ < 1409.6 MeV/c². The signal distribution is described by a Hypatia function [33]. The mass resolution and scale are calibrated using the control channel K⁺ → π⁺π⁻π⁺ and comparing the distributions in data and simulation. No bias is seen in the peak position, while a 25% correction has to be applied to the resolution to match the one observed in data. A resolution of 4.28 ± 0.19 MeV/c² is used as the width of the signal Σ⁺ → pμ⁺μ⁻ distribution. The resolution is allowed to vary in the fit, but is constrained to the central value with a Gaussian constraint. The combinatorial background is described by a modified ARGUS function, with all parameters left free with the exception of the threshold, which is fixed to the kinematic limit.
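A Gaussian constraint of this kind is commonly implemented as a penalty term added to the negative log-likelihood of the fit. The following is a minimal sketch of that term only, using the resolution value quoted above; the function names are hypothetical, and the data part of the likelihood is represented by a placeholder number.

```python
RES_MEAN = 4.28   # MeV/c^2, calibrated resolution from the text
RES_SIGMA = 0.19  # MeV/c^2, its uncertainty from the text


def gaussian_constraint_nll(value, mean, sigma):
    """Negative log-likelihood penalty, up to a constant, for a constrained parameter."""
    return 0.5 * ((value - mean) / sigma) ** 2


def constrained_nll(data_nll, resolution):
    """Total NLL: the data term plus the Gaussian constraint on the resolution."""
    return data_nll + gaussian_constraint_nll(resolution, RES_MEAN, RES_SIGMA)


# Moving the resolution by one standard deviation costs 0.5 units of NLL.
print(constrained_nll(data_nll=100.0, resolution=RES_MEAN + RES_SIGMA))
```

The fit can therefore pull the resolution away from the calibrated value only if the data term improves by more than the penalty.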
While the normalisation is available for the TIS sample, further studies are needed to calculate the trigger efficiency for the full sample. Since the TIS dataset is a sub-sample of the full sample, the single event sensitivity of the latter will be equal to or better than that of the former.

Results and discussion
The invariant mass distribution of the Σ⁺ → pμ⁺μ⁻ candidates in data is shown in Figure 3(a) and (b) for the full and TIS datasets, respectively. A significant signal is present in the full dataset. The significance of the signal is 4.0σ, obtained by comparing the likelihood value of the full fit with that of the background-only fit, and includes the relevant systematic uncertainties as Gaussian constraints on the likelihood. A total of 12.9 +5.1/−4.2 signal candidates is observed. The signal in the TIS sample is found not to be significant. This is because the observed signal candidates were themselves needed for the corresponding events to fire at least one of the LHCb trigger stages. From the absence of a significant signal in the TIS dataset, an upper limit on the Σ⁺ → pμ⁺μ⁻ branching fraction is derived using the CLs method [35]: B(Σ⁺ → pμ⁺μ⁻) < 6.3 × 10⁻⁸ at 95% confidence level (CL). Considering candidates in the full selection, a scan for possible signals in the dimuon invariant mass is performed, restricted to candidates within two times the resolution in the Σ⁺ → pμ⁺μ⁻ invariant mass around the known Σ⁺ mass. The scan is performed considering a single Gaussian function for a putative X⁰ resonance (signal) and a linear distribution for the remaining candidates (background). The result of the scan as a function of the dimuon mass is shown in Figure 4(a). No significant signal is found. In Figure 4(b) the fit corresponding to a mass of 214.3 MeV/c² is shown. The fitted number of events for this hypothesis is 1.6 ± 1.9, corresponding to a fraction 0.078 ± 0.092 of the considered candidates.
In summary, a search for the rare decay Σ⁺ → pμ⁺μ⁻ is performed by the LHCb experiment at centre-of-mass energies √s = 7 and 8 TeV, corresponding to an integrated luminosity of 3 fb⁻¹. Evidence for the Σ⁺ → pμ⁺μ⁻ decay is found in the full dataset, but not when requiring the events to be triggered independently of the signal decay.
The observed signal candidates show a dimuon invariant mass distribution consistent with phase space; no significant peak consistent with an intermediate particle is found in the dimuon invariant mass distribution. The upper limit on the branching fraction of the Σ⁺ → pμ⁺μ⁻ decay is 6.3 × 10⁻⁸ at 95% CL for an SM-like signal.
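The significance quoted in the results comes from comparing the likelihood of the full fit with that of the background-only fit. A minimal sketch of such a comparison is given below, assuming the asymptotic approximation Z = √(2 Δln L) for a single free signal parameter. The text does not state the exact prescription used, and the negative log-likelihood values in the example are hypothetical.

```python
import math


def significance_from_nll(nll_background_only, nll_signal_plus_background):
    """Asymptotic significance Z = sqrt(2 * delta NLL), with negative values clipped to 0."""
    delta = nll_background_only - nll_signal_plus_background
    return math.sqrt(2.0 * max(delta, 0.0))


def one_sided_p_value(z):
    """One-sided Gaussian tail probability for a significance z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical NLL values: a difference of 8 units corresponds to 4 standard deviations.
z = significance_from_nll(108.0, 100.0)
p = one_sided_p_value(z)
print(f"Z = {z:.1f} sigma, p = {p:.2e}")
```

In this approximation, a 4.0σ significance corresponds to a likelihood difference of 8 units between the two fits.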