Identification of beauty and charm quark jets at LHCb

Identification of jets originating from beauty and charm quarks is important for measuring Standard Model processes and for searching for new physics. The performance of algorithms developed to select $b$- and $c$-quark jets is measured using data recorded by LHCb from proton-proton collisions at $\sqrt{s}=7$ TeV in 2011 and at $\sqrt{s}=8$ TeV in 2012. The efficiency for identifying a $b(c)$ jet is about 65%(25%) with a probability for misidentifying a light-parton jet of 0.3% for jets with transverse momentum $p_{\rm T}>20$ GeV and pseudorapidity $2.2<\eta<4.2$. The dependence of the performance on the $p_{\rm T}$ and $\eta$ of the jet is also measured.


Introduction
Identification of jets that originate from the hadronization of beauty (b) and charm (c) quarks is important for studying Standard Model (SM) processes and for searching for new physics. For example, the ability to efficiently identify b jets with minimal misidentification of c and light-parton jets is crucial for the measurement of top-quark production. The study of $t\bar{t}$ production in the forward region probes the structure of the proton [1] and can be used to search for physics beyond the SM [2]. Measuring charge asymmetries in di-b-jet production also probes physics beyond the SM [3,4]. Furthermore, identification of c jets is important for probing the structure of the proton, e.g. in W+c production.
The signature of a b or c jet is the presence of a long-lived b or c hadron that carries a sizable fraction of the jet energy. The LHCb detector was designed to identify b and c hadrons, and so is expected to perform well at identifying, or tagging, b and c jets. This paper describes two algorithms for identifying b and c jets, one designed to identify both b and c jets offline, and another initially designed to identify b-hadron decays in the trigger. The performance of each algorithm is measured using several subsamples of the 3 fb$^{-1}$ of proton-proton collision data collected at $\sqrt{s}=7$ TeV in 2011 and at 8 TeV in 2012 by the LHCb detector. The distributions of observable quantities used to discriminate between b, c and light-parton jets are compared between data and simulation.

The LHCb detector
The LHCb detector [5,6] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region [7], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes [8] placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV (c = 1 throughout this paper). The minimum distance of a track to a primary vertex, the impact parameter, is measured with a resolution of $(15 + 29/p_{\rm T})$ µm, where $p_{\rm T}$ is the component of the momentum transverse to the beam, in GeV. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. The electromagnetic and hadronic calorimeters have an energy resolution of $\sigma(E)/E = 10\%/\sqrt{E} \oplus 1\%$ and $\sigma(E)/E = 69\%/\sqrt{E} \oplus 9\%$ (with E in GeV), respectively. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [9].
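As a numerical illustration of the resolution formulas quoted above, the following minimal sketch evaluates the impact-parameter resolution and the two calorimeter resolutions, taking ⊕ to denote a sum in quadrature. The function names are chosen here for illustration and do not come from any LHCb software.

```python
import math

def ip_resolution_um(pt_gev):
    """Impact-parameter resolution in micrometres, (15 + 29/pT) with pT in GeV."""
    return 15.0 + 29.0 / pt_gev

def calo_relative_resolution(energy_gev, stochastic, constant):
    """sigma(E)/E: stochastic/sqrt(E) and constant terms added in quadrature."""
    return math.hypot(stochastic / math.sqrt(energy_gev), constant)

def ecal_resolution(energy_gev):
    # Electromagnetic calorimeter: 10%/sqrt(E) (+) 1%
    return calo_relative_resolution(energy_gev, 0.10, 0.01)

def hcal_resolution(energy_gev):
    # Hadronic calorimeter: 69%/sqrt(E) (+) 9%
    return calo_relative_resolution(energy_gev, 0.69, 0.09)
```

For example, a track with $p_{\rm T} = 1$ GeV has an impact-parameter resolution of 44 µm under this parametrization, approaching 15 µm at high $p_{\rm T}$.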
The trigger [10] consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. This analysis requires that either a high-$p_{\rm T}$ muon or a (b, c)-hadron candidate satisfies the trigger requirements. Events recorded due to the presence of a high-$p_{\rm T}$ muon are required to have a muon candidate with $p_{\rm T} > 10$ GeV. Events recorded due to the presence of a (b, c)-hadron decay are required to have at least one track with $p_{\rm T} > 1.7$ GeV and $\chi^2_{\rm IP}$ with respect to any primary interaction greater than 16, where $\chi^2_{\rm IP}$ is defined as the difference in $\chi^2$ of a given primary pp interaction vertex (PV) reconstructed with and without the considered track. Decays of b hadrons are inclusively identified by requiring a two-, three- or four-track secondary vertex (SV) with a large sum of $p_{\rm T}$ of the tracks and a significant displacement from the PV. A specialized boosted decision tree (BDT) [11] algorithm is used for the identification of SVs consistent with the decay of a b hadron [12]. This inclusive trigger algorithm is called the topological trigger (TOPO) and is studied as a b-jet tagger in this paper. Decays of long-lived c hadrons are identified either exclusively using decay modes with large branching fractions, or in $D^{*}(2010)^{\pm} \to D^{0}\pi^{\pm}$ decays where the $D^{0}$ is selected inclusively by the presence of a two-track SV.
In the simulation, pp collisions are generated using Pythia [13] with a specific LHCb configuration [14]. Decays of hadronic particles are described by EvtGen [15], in which final-state radiation is generated using Photos [16]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [17] as described in Ref. [18].

Jet identification algorithms
Jets are clustered using the anti-$k_{\rm T}$ algorithm [19] with a distance parameter 0.5, as implemented in Fastjet [20]. Information from all the detector sub-systems is used to create charged and neutral particle inputs to the jet algorithm using a particle flow approach [21]. During 2011 and 2012, LHCb collected data with a mean number of pp collisions per crossing of about 1.7. To reduce contamination from multiple pp interactions, charged particles reconstructed within the vertex detector may only be clustered into a jet if they are associated to the same PV. The identification of (b, c) jets is performed using SVs from the decays of (b, c) hadrons. The choice of using SVs and not single-track or other non-SV-based jet properties, e.g. the number of particles in the jet, is driven by the need for a small misidentification probability of light-parton jets in the analyses performed at LHCb. Furthermore, the properties of SVs from (b, c)-hadron decays are known to be well modeled in LHCb simulation.

The SV tagger
The tracks used as inputs to the SV-tagger algorithm are required to have $p_{\rm T} > 0.5$ GeV and $\chi^2_{\rm IP} > 16$. The $\chi^2_{\rm IP}$ requirement is rarely satisfied by tracks reconstructed from particles originating directly from the PV. All tracks are assigned a pion mass, and hadronic particle identification is not used. In contrast to many other jet-tagging algorithms, tracks are not required to have $\Delta R \equiv \sqrt{\Delta\eta^2 + \Delta\phi^2} < 0.5$, where ∆η (∆φ) is the difference in pseudorapidity (azimuthal angle) between the track momentum and jet axis, since for low-$p_{\rm T}$ jets the use of tracks outside of the jet cone helps to discriminate between c and b jets.
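The angular distance ∆R defined above can be computed as in the following minimal sketch. The only subtlety is that the azimuthal difference must be wrapped into the interval (-π, π] before it is squared; the function names are illustrative.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(delta_eta^2 + delta_phi^2) between two directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

Without the wrapping step, two directions at φ = 0.1 and φ = 2π - 0.1 would appear to be far apart, although they are separated by only 0.2 in azimuth.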
All possible two-track SVs are built using pairs of the input tracks such that the distance of closest approach between the tracks is less than 0.2 mm, the vertex fit $\chi^2 < 10$ and the two-body mass is in the range 0.4 GeV < M < M(B), where M(B) is the nominal $B^0$ mass [22]. Since all particles are assigned a pion mass, the upper mass requirement rarely removes SVs from any long-lived b hadrons. The lower mass requirement removes SVs from most strange-particle decays, including the Λ baryon whose computed mass is always below 0.4 GeV when the proton is assigned a pion mass. At this stage tracks are allowed to belong to multiple SVs. Next, all two-track SVs with ∆R < 0.5 relative to the jet axis, where the direction of flight is taken as the PV to SV vector, are collected as candidates for a so-called linking procedure. This procedure involves merging SVs that share tracks until none of the remaining SVs with ∆R < 0.5 share tracks. The SV position is taken to be the weighted average of the two-body SV positions using the inverse of the two-body vertex $\chi^2$ values as the weights.
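The linking procedure described above can be sketched as follows. This is an illustrative reading of the text, not the LHCb implementation: the `TwoTrackSV` container and the function names are hypothetical, and the inputs are assumed to have already passed the two-track requirements and the ∆R < 0.5 requirement.

```python
from dataclasses import dataclass

@dataclass
class TwoTrackSV:
    tracks: frozenset   # identifiers of the two tracks (hypothetical container)
    position: tuple     # (x, y, z) vertex position in mm
    chi2: float         # two-body vertex-fit chi2, assumed > 0

def link_svs(svs):
    """Merge two-track SVs that share tracks until no two groups share a track."""
    groups = []
    for sv in svs:
        keep, merged = [], [sv]
        for group in groups:
            if any(sv.tracks & other.tracks for other in group):
                merged.extend(group)   # this group shares a track with sv
            else:
                keep.append(group)
        groups = keep + [merged]
    return groups

def linked_tracks(group):
    """All tracks of a linked n-track SV."""
    return frozenset().union(*(sv.tracks for sv in group))

def linked_position(group):
    """Average of the two-body positions, weighted by 1/chi2."""
    weights = [1.0 / sv.chi2 for sv in group]
    total = sum(weights)
    return tuple(
        sum(w * sv.position[i] for w, sv in zip(weights, group)) / total
        for i in range(3)
    )
```

Each returned group corresponds to one linked n-track SV. Two-track SVs with a small vertex $\chi^2$ dominate the averaged position, which is the intent of the inverse-$\chi^2$ weighting.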
The linking procedure can produce SVs that contain any number of tracks. The linked n-track SVs are required to have $p_{\rm T} > 2$ GeV, significant spatial separation from the PV, and to contain at most one track with ∆R > 0.5 relative to the jet axis. If the SV has only two tracks and a mass consistent with that of the $K^0_{\rm S}$ [22], the SV is rejected. Interactions with material, and strange-particle decays, are suppressed by requiring that the flight distance divided by the momentum of the SV is less than 1.5 mm/GeV; this quantity serves as a proxy for the hadron lifetime. The SV position is also required to be within a restricted region consistent with that of (b, c)-hadron decays.
An important quantity for discriminating between hadron types is the so-called corrected mass, defined as

$$M_{\rm cor} = \sqrt{M^2 + p^2 \sin^2\theta} + p \sin\theta, \qquad (1)$$

where M and p are the invariant mass and momentum of the particles that form the SV and θ is the angle between the momentum and the direction of flight of the SV. The corrected mass is the minimum mass that the long-lived hadron can have that is consistent with the direction of flight. The linked n-track SVs are required to have $M_{\rm cor} > 0.6$ GeV to remove any remaining kaon or hyperon decays. A few percent of jets contain multiple SVs that pass all requirements; in such cases the SV with the highest $p_{\rm T}$ is chosen. The fraction of multi-SV-tagged jets is consistent in data and simulation.
Two BDTs are used to identify b and c jets: BDT(bc|udsg) trained to separate (b, c) jets from light-parton jets and BDT(b|c) trained to separate b jets from c jets. Both BDTs are trained on simulated samples of b, c and light-parton jets. The inputs to both BDTs are as follows:
• the SV mass M;
• the SV corrected mass $M_{\rm cor}$;
• the transverse flight distance of the two-track SV closest to the PV;
• the fraction of the jet $p_{\rm T}$ carried by the SV, $p_{\rm T}({\rm SV})/p_{\rm T}({\rm jet})$;
• ∆R between the SV flight direction and the jet;
• the number of tracks in the SV;
• the number of SV tracks with ∆R < 0.5 relative to the jet axis;
• the net charge of the tracks that form the SV;
• the flight distance $\chi^2$;
• the sum of all SV track $\chi^2_{\rm IP}$.
For jets that contain an SV passing all of the requirements, the two BDT responses are used to identify the jet as either b, c or light-parton.
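The corrected mass used as a BDT input can be evaluated as in the sketch below. It assumes the usual form $M_{\rm cor} = \sqrt{M^2 + p^2\sin^2\theta} + p\sin\theta$, in which $p\sin\theta$ is the SV momentum component transverse to the flight direction; the function name and argument conventions are illustrative.

```python
import math

def corrected_mass(mass, momentum, theta):
    """Corrected mass of an SV.

    mass, momentum: invariant mass and momentum of the SV tracks (GeV).
    theta: angle (rad) between the SV momentum and its direction of flight.
    """
    # Momentum transverse to the flight direction, attributed to
    # unreconstructed (e.g. neutral) decay products.
    pt_missing = momentum * math.sin(theta)
    return math.sqrt(mass ** 2 + pt_missing ** 2) + pt_missing
```

When the SV momentum points exactly along the flight direction (θ = 0) the corrected mass reduces to the visible mass M; any misalignment increases it.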

The topological trigger
The topological trigger algorithm uses SVs that satisfy similar criteria to those used in the SV-tagger algorithm to build two-, three- and four-track SVs. The TOPO SVs are required to have $p_{\rm T}$ and flight-distance significance consistent with the decay of a b hadron.
The TOPO provides an efficient trigger option for generic b-jet events, as the SV used by the TOPO to trigger recording of the event can also be used to tag a b jet. The BDT used in the TOPO algorithm uses the following inputs:
• the SV mass;
• the SV corrected mass;
• the sum of the $p_{\rm T}$ of the SV tracks;
• the maximum distance of closest approach between the SV tracks;
• the $\chi^2_{\rm IP}$ of the SV formed using the momentum of the tracks that form the SV and SV position;
• the flight distance $\chi^2$ of the SV from the PV;
• the minimum $p_{\rm T}$ of the SV tracks.
To ensure stability during data-taking the TOPO BDT uses discretized inputs as described in detail in Ref. [12]. Further details about the TOPO algorithm and its performance on b-hadron decays as measured in LHCb data can be found in Ref. [10].
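The idea of discretized inputs can be illustrated with a short sketch: each continuous input is replaced by the index of the interval it falls in, so that small shifts in a measured quantity usually leave the BDT input unchanged. The bin edges below are hypothetical placeholders and are not the values used in Ref. [12].

```python
from bisect import bisect_right

def discretize(value, edges):
    """Index of the interval of the sorted list `edges` that contains `value`.

    Returns 0 below the first edge and len(edges) at or above the last edge.
    """
    return bisect_right(edges, value)

# Hypothetical bin edges for an SV corrected mass in GeV (illustration only).
MCOR_EDGES = [2.0, 4.0, 6.0]
```

With these placeholder edges, corrected masses of 4.1 GeV and 5.9 GeV map to the same discrete value, so the classifier response is insensitive to variations within a bin.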

Performance in simulation
Figure 1 shows the SV-tagger BDT distributions obtained from simulated W+jet events for each jet type. The distributions in the two-dimensional BDT plane of SV-tagged b, c, and light-parton jets are clearly distinguishable. Figure 2 shows the performance of the SV-tagger algorithm obtained from simulated events for a BDT(bc|udsg) > 0.2 requirement that is about 90% efficient on SV-tagged (b, c) jets and highly suppresses light-parton jets. The (b, c)-jet efficiencies are nearly uniform for jet $p_{\rm T} > 20$ GeV and for 2.2 < η < 4.2, but are lower for low-$p_{\rm T}$ jets and for jets near the edges of the detector. The misidentification probability of light-parton jets is less than 0.1% for low-$p_{\rm T}$ jets and increases to about 1% at 100 GeV. Figure 3 shows the (b, c)-jet efficiencies versus the mistag probability of light-parton jets obtained by increasing the BDT(bc|udsg) cut.
Figure 2 shows the performance of the TOPO algorithm, obtained from simulated events. In the trigger, a looser BDT requirement is applied to SVs that contain muons. In Ref. [23], this same looser BDT requirement was applied to tag a second jet in the event. The nominal trigger BDT requirement strongly suppresses c and light-parton jets, with the misidentification probability of light-parton jets being 0.01% for low-$p_{\rm T}$ jets. Such a strong suppression is required during online running due to output rate limitations.
The jet-tagging performance is measured in simulated events with one pp collision and two or more pp collisions and found to be consistent. The tagging performance is also studied in simulation using different event types, e.g. top-quark and QCD di-jet events, with only small changes in the tagging efficiencies and BDT templates observed for (b, c) jets. The mistag probability of light-parton jets is found to be higher for high-$p_{\rm T}$ jets in events that also contain (b, c) jets. This is discussed in detail in Sec. 5.

Efficiency measurements in data
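The tagging efficiencies for b and c jets are measured in data and compared with expectations from simulation. To measure the tagging efficiencies in a given data sample, both the tagged and total (b, c)-jet yields must be determined. The tagged yields are obtained from fits to the BDT distributions, and the total yields from fits to the $\chi^2_{\rm IP}$ distribution of the highest-$p_{\rm T}$ track in the jet. The efficiency is the ratio of the tagged over total (b, c)-jet yields.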
An alternative approach employed by other experiments (see, e.g. Ref. [24]) is to measure the efficiency using the subsample of jets that contain a muon. This approach has the advantage that the (b, c)-jet content is enhanced due to the presence of muons from the semileptonic decays of (b, c) hadrons; however, the disadvantage is that this method assumes that mismodeling of the tagging performance is the same for semileptonic and inclusive decays. Both the highest-$p_{\rm T}$ track and muon-jet methods are used in this analysis to study the jet-tagging performance.
Combined fits of several data samples enriched in (b, c) jets are performed to obtain the tagging efficiencies. It is important to include the systematic uncertainties on both the tagged and total (b, c)-jet yields for each data sample in the combined fits.
This section is arranged as follows: the data samples used are described in Sec. 4.1; the BDT fits used to obtain the tagged (b, c)-jet yields are given in Sec. 4.2; the highest-$p_{\rm T}$-track $\chi^2_{\rm IP}$ fits used to obtain the total (b, c)-jet yields are described in Sec. 4.3; the muon-jet subsample method is discussed in Sec. 4.4; the systematic uncertainties on the tagged and total (b, c)-jet yields are presented in Sec. 4.5; and the (b, c)-tagging efficiency results are given in Sec. 4.6.

Data samples
Events that contain either a high-$p_{\rm T}$ muon or a fully reconstructed (b, c) hadron, referred to here as an event-tag, are used to measure the jet-tagging efficiencies in data. The highest-$p_{\rm T}$ jet in the event that does not have any overlap with the event-tag is chosen as the test jet. Each event-tag is required to have satisfied specific trigger requirements and to have ∆φ > 2.5 relative to the test-jet axis to reduce the possibility of contamination of the jet from the event-tag. Therefore, all events used to measure the (b, c)-tagging efficiency have passed the trigger independently of the presence of the test jet, which ensures that the trigger does not bias the efficiency measurement. The following event-tags are used (labeled by the data-set identifier):
• (W+jet) a prompt isolated high-$p_{\rm T}$ muon indicative of W+jet events, a sample that consists of about 95% light-parton jets.
The first three samples are used to measure the (b, c)-jet identification efficiencies and properties. The final sample is used to study misidentification of light-parton jets. In all samples the event-tag and test jet are required to originate from the same PV. The range 10 < $p_{\rm T}$(jet) < 100 GeV is considered since the data samples are not large enough to measure the efficiency for jet $p_{\rm T} > 100$ GeV.
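The test-jet selection described in this subsection can be sketched as follows. The jet representation (a dictionary with `pt`, `phi` and an `overlaps_tag` flag) is a hypothetical stand-in, and how overlap with the event-tag is decided is left outside the sketch because the text does not specify it.

```python
import math

def abs_delta_phi(phi1, phi2):
    """Absolute azimuthal separation in [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi

def select_test_jet(jets, tag_phi, min_dphi=2.5):
    """Return the test jet of an event, or None if the event is not used.

    jets: list of dicts with keys 'pt' (GeV), 'phi' (rad) and 'overlaps_tag'.
    tag_phi: azimuthal angle of the event-tag.
    """
    candidates = [jet for jet in jets if not jet["overlaps_tag"]]
    if not candidates:
        return None
    test_jet = max(candidates, key=lambda jet: jet["pt"])
    # The event-tag must be back-to-back with the test jet.
    if abs_delta_phi(test_jet["phi"], tag_phi) <= min_dphi:
        return None
    return test_jet
```

Because the event is selected by the event-tag alone, the returned jet is unbiased by the trigger, which is the property the efficiency measurement relies on.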

Tagged-jet yields
The presence of an SV and its kinematic properties are used to discriminate between b, c and light-parton jets. As described in Section 3, the SV-tagger algorithm uses two BDTs while the TOPO uses one BDT for each SV. The tagged yields for each algorithm are obtained by fitting to data BDT templates obtained from simulation for b, c and light-parton jets. In all fits the template shapes are fixed and only the yields of each jet type are free to vary. Figures 4-6 show the results of fits performed to the two-dimensional SV-tagger BDT distributions in the B+jet, D+jet and µ(b, c)+jet data samples. The b and c jets are clearly distinguishable in the two-dimensional BDT distributions: b jets are mostly found in the upper right corner, while c jets are found in the center-right and lower-right regions. The light-parton jets cluster near the origin but are difficult to see due to the low SV-tag probability of light-parton jets. The BDT templates for b, c and light-parton jets describe the data well. A dedicated study of the modeling of the light-parton-jet BDT distributions is discussed in Sec. 5.
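A template fit with fixed shapes and free yields can be illustrated with the following minimal sketch. It is not the fitter used in the analysis: it maximizes a binned Poisson likelihood with a simple expectation-maximization iteration, chosen here because it needs no external libraries. A two-dimensional BDT histogram is assumed to have been flattened into a single list of bins, and the function and argument names are illustrative.

```python
def fit_yields(data, templates, n_iter=2000):
    """Fit the yield of each jet type with fixed template shapes.

    data: list of observed bin contents.
    templates: dict mapping a jet type to its bin fractions (each sums to 1).
    Returns a dict mapping each jet type to its fitted yield.
    """
    names = list(templates)
    total = float(sum(data))
    yields = {name: total / len(names) for name in names}  # flat starting point
    for _ in range(n_iter):
        # Expected content of each bin for the current yields.
        expected = [
            sum(yields[name] * templates[name][i] for name in names)
            for i in range(len(data))
        ]
        # Reassign each bin's observed content in proportion to the
        # contribution of each jet type to the expectation.
        yields = {
            name: sum(
                d * yields[name] * templates[name][i] / mu
                for i, (d, mu) in enumerate(zip(data, expected))
                if mu > 0
            )
            for name in names
        }
    return yields
```

The fixed point of this iteration satisfies the likelihood equations for the yields, so for well-separated templates it recovers the same central values as a conventional minimizer, although it does not provide uncertainties.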
A simple cross-check on the b, c and light-parton yields is performed by fitting only two of the BDT inputs: the corrected mass defined in Eq. 1 and the number of tracks in the SV. The corrected mass provides the best discrimination between c jets and other jet types due to the fact that $M_{\rm cor}$ peaks near the D meson mass for c jets. The number of tracks in the SV identifies b jets well since b-hadron decays often produce many displaced tracks. Figure 7 shows the results of a two-dimensional fit to these two SV properties.
The absolute fractions of b, c and light-parton SV tags agree with the BDT fit results to within 1-2%. The corrected mass has been previously used in LHCb jet analyses for determining the c-jet yield [23] and for extracting the b-jet yield [25]. The clear peaking structure for c jets, which relies on the excellent vertex resolution of the LHCb detector, makes them easily identifiable. Figure 8 shows the results of fitting the TOPO BDT distributions in the various data samples using b, c and light-parton jet template shapes obtained from simulation. The ratios of SV-tagger to TOPO SV-tagged b, c and light-parton jets are each consistent with expectations from simulation. Modeling of both the SV-tagger and TOPO SV properties is sufficient to allow the SV-tagged content to be accurately determined.

Efficiency measurement using highest-p T tracks
To determine the jet-tagging efficiency, the jet composition prior to applying the SV tag must be determined. This is necessarily more difficult than determining the SV-tagged composition. The $\chi^2_{\rm IP}$ distribution of the highest-$p_{\rm T}$ track in the jet is used for this task. For light-parton jets the highest-$p_{\rm T}$ track will mostly originate from the PV, while for (b, c) jets the highest-$p_{\rm T}$ track will often originate from the decay of the (b, c) hadron. To avoid possible issues with modeling of soft radiation, only the subset of jets for which the highest-$p_{\rm T}$ track satisfies $p_{\rm T}({\rm track})/p_{\rm T}({\rm jet}) > 10\%$, which is about 95% of all jets, is used.
Since the W+jet sample is dominantly composed of light-parton jets, the $\chi^2_{\rm IP}$ detector response can be obtained in a data-driven way using this data sample. First, the two-dimensional SV-tagger BDT response is fitted to determine the SV-tagged b, c and light-parton jet yields. The tagging efficiencies obtained in simulation for b and c jets are used to estimate the total number of b and c jets in the W+jet data sample. Since the b and c jets combined make up only 5% of the total data sample, any mismodeling of the SV-tagging efficiency will have negligible impact on this study. The IP resolution is studied by comparing the observed $\chi^2_{\rm IP}$ distributions in data with templates obtained from simulation in bins of jet $p_{\rm T}$. The resolution in data is found to be about 10% worse than in the simulation, which is consistent with previous LHCb studies of the IP resolution [7]. Figure 9 shows that the data-driven templates describe the data well. The difference in the detector response between data and simulation is assumed to be universal and is applied to correct the $\chi^2_{\rm IP}$ templates for b and c jets.
Figure 10 shows the results of fitting the $\chi^2_{\rm IP}$ distributions in the B+jet, D+jet and µ(b, c)+jet data samples. Each sample consists of mostly light-parton jets prior to applying an SV tag. While these data samples require that an event-tag containing a (b, c) quark is reconstructed, the associated (b, c) quarks produced in hard scattering processes are often not produced within the LHCb acceptance. Furthermore, for the (B, D)+jet samples, the event-tags often have low $p_{\rm T}$ so that the associated (b, c) quarks may be within the LHCb acceptance but do not form a high-$p_{\rm T}$ jet. The light-parton-jet $\chi^2_{\rm IP}$ template has a long tail out to large values which arises due to hyperon and kaon decays. In the $\chi^2_{\rm IP}$ fits, the $\log\chi^2_{\rm IP} > 3$ component of the light-parton template is allowed to vary independently to allow for different s-quark content from the W+jet calibration sample. Apart from this, all $\chi^2_{\rm IP}$ templates are fixed in shape. The efficiency for tagging a jet originating from a quark of type q is determined as

$$\epsilon_q = \frac{N_q({\rm SV})}{N_q(\chi^2_{\rm IP})}, \qquad (2)$$

i.e. it is the ratio of the yield determined from fits to the SV-tagged BDT distributions, either for the SV-tagger or TOPO algorithm, to the yield obtained from fits to the $\chi^2_{\rm IP}$ distributions.
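The efficiency defined above, the ratio of the SV-tagged yield to the yield from the $\chi^2_{\rm IP}$ fit, can be evaluated as in the sketch below. The uncertainty propagation treats the two yields as uncorrelated, which is a simplifying assumption made here for illustration; the analysis itself determines the efficiencies from combined fits. The numbers in the usage note are invented.

```python
import math

def tagging_efficiency(n_tagged, n_total, err_tagged=0.0, err_total=0.0):
    """Efficiency N(SV)/N(chi2_IP) with a simple uncertainty estimate.

    Assumes both yields are non-zero and their uncertainties uncorrelated.
    Returns (efficiency, uncertainty).
    """
    eff = n_tagged / n_total
    # Relative uncertainties of a ratio add in quadrature.
    rel = math.hypot(err_tagged / n_tagged, err_total / n_total)
    return eff, eff * rel
```

For instance, `tagging_efficiency(650, 1000, 20, 150)` gives an efficiency of 0.65 with an uncertainty of about 0.10, dominated by the 15% uncertainty assumed on the total yield.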

Efficiency measurement using muon jets
The approach described in the previous subsection has the advantage that it involves measuring the efficiency on almost all of the jets in the data sample; however, its disadvantage is the large light-parton-jet content, which results in 10-20% uncertainties on the pre-SV-tag jet content. An approach used by other experiments is to measure the efficiency on the subset of jets that contain muons. The tagging efficiencies are also obtained using Eq. 2 for the muon-jet subsamples. Figures 11-13 show the SV-tagger BDT and $\chi^2_{\rm IP}$ fit results for the muon-jet subsample of each data set. In these subsamples the $\chi^2_{\rm IP}$ is that of the highest-$p_{\rm T}$ muon in the jet. The muon is required to satisfy $p_{\rm T}(\mu)/p_{\rm T}({\rm jet}) > 10\%$. The initial light-parton-jet content is greatly reduced in these data subsamples; however, this approach only uses about 10% of the jets and it is possible that mismodeling of the jet-tagging performance in semileptonic decays is not the same as for other decays.

Systematic uncertainties
The systematic uncertainty on $N_{(b,c)}({\rm SV})$ is estimated using the difference between the (b, c) SV-tagged yields obtained from two different fits: the fit to the BDT distributions and the fit to the $M_{\rm cor}$ versus track multiplicity distributions. The latter approach removes jet quantities such as jet $p_{\rm T}$ from the yield determination. While the absolute uncertainty on the SV-tagged quark content as determined by the difference in these two methods is only a few percent, the relative uncertainty is large for cases where a given jet type makes up a small fraction of the SV-tagged data sample. For example, the relative uncertainty on the c-jet yield in the B+jet data sample is large. As a further cross-check the (B, D)+jet data samples are used to obtain data-driven BDT templates. The difference in (b, c) yields obtained by fitting the W+jet data sample using the data-driven and simulation templates is found to be negligible.
The systematic uncertainty on $N_{(b,c)}(\chi^2_{\rm IP})$ has several components. The nominal $\chi^2_{\rm IP}$ fits allow the large-IP component of the light-parton-jet template to vary. The $\chi^2_{\rm IP}$ fits are repeated fixing this component to that observed in W+jet data, with the difference in (b, c)-jet yields assigned as a systematic uncertainty. This uncertainty is sizable for the case of high-$p_{\rm T}$ c jets, whose $\chi^2_{\rm IP}$ template is less distinct from that of light-parton jets, which has a variable large-IP component in the fit. Possible dependence of the mismodeling of the IP resolution on the origin point of the particle is studied and found to be negligible. For the case of muon jets, the misidentification probability of hadrons as muons and the jet track multiplicity must be modeled properly to obtain an accurate $\chi^2_{\rm IP}$ distribution. Mismodeling of these properties does not lead to large uncertainty on $N_b(\chi^2_{\rm IP})$, since the vast majority of reconstructed muons in b jets are truly muons that arise due to semileptonic decays. For c jets, however, mismodeling of these properties can produce sizable shifts in $N_c(\chi^2_{\rm IP})$ due to the smaller fraction of c jets that contain muons from semileptonic decays. A comparison between W+jet data and simulation of the jet fraction that satisfies the muon-jet requirements, in bins of jet $p_{\rm T}$, is used to obtain an estimate of the probability of misidentifying a jet as a muon jet. Based on this study a 5% relative uncertainty is assigned to $N_b(\chi^2_{\rm IP})$ and 20% to $N_c(\chi^2_{\rm IP})$ for muon jets. Another possible way of misidentifying muon jets is if the semileptonic decay of a b hadron outside of the jet produces a muon reconstructed as part of the jet. The ∆R distribution between the SV direction of flight and jet axis for all muons found in an SV is used to conclude that this effect is at the per mille level; it is taken to be negligible.
Jets produced in different types of events can have different properties. The b-tag efficiency is found to agree to about 1% in simulated W+b, top and QCD multi-jet events. The BDT shapes are studied in simulated single-jet b and di-jet $b\bar{b}$ events and found to be consistent for low-$p_{\rm T}$ jets but to show small discrepancies for large jet $p_{\rm T}$. For example, the absolute difference in efficiency of requiring BDT(bc|udsg) > 0.2 for b jets is less than 1% up to a jet $p_{\rm T}$ of 50 GeV but reaches about 3% at a jet $p_{\rm T}$ of 100 GeV. In the data samples considered in this study, such effects are negligible as using BDT templates from different event types results in differences in the SV-tagged yields of less than 1%. Events where multiple b hadrons are produced could affect the SV BDT shapes. The fraction of SVs that contain a track with ∆R > 0.5 relative to the jet axis is studied in data with the back-to-back requirement for the event-tag and test jet removed. The fraction of SVs that contain such a track is found to vary by at most a few percent as a function of ∆R between the event-tag and test jet. This could indicate percent-level cross-talk between multiple b jets or could be due to changes in the jet composition. For the efficiency measurements presented in this paper the effect of (b, c)-hadron decays outside of the jet is negligible; however, such decays could have an important impact on the tagging performance in some event types, e.g. in four b-jet events.
Gluon splitting to $b\bar{b}$ or $c\bar{c}$ can produce jets that contain multiple (b, c) hadrons, which have a higher tagging efficiency. The requirement that a (b, c)-hadron-decay signature is back-to-back with the test jet suppresses gluon-splitting contributions. The fraction of jets that contain multiple SVs in data is a few percent, which agrees to about 1% in all bins with simulated jets that contain only a single (b, c) hadron. The systematic uncertainty due to jets that contain multiple (b, c) hadrons from $g \to (b\bar{b}, c\bar{c})$ is taken to be 1%. Finally, there is no evidence in simulation of dependence on the number of pp interactions in the event, so the uncertainty due to mismodeling of the number of pp interactions is taken to be negligible. The systematic uncertainties are summarized in Table 1.

Results
A combined fit to the B+jet, D+jet and µ(b, c)+jet data samples, including the systematic uncertainties in Table 1, is performed to obtain the (b, c)-jet tagging efficiencies. In these fits, both $N_{(b,c)}({\rm SV})$ and $N_{(b,c)}(\chi^2_{\rm IP})$ are determined simultaneously under the constraint that the (b, c)-tagging efficiency in a given jet $p_{\rm T}$ and η region must be the same in each data sample. The highest-$p_{\rm T}$ track and muon-jet subsamples are fitted independently since the scale factors between data and simulation could be different for semileptonic and inclusive decays. The scale factors for b and c jets are allowed to vary independently since these may be different for different jet types. The misidentification probability of light-parton jets is allowed to vary freely in each data sample, although the results obtained are all consistent and agree with simulation.
The scale factors for the SV-tagger algorithm are measured versus jet $p_{\rm T}$ in the region 2.2 < η < 4.2, where the efficiencies are expected to be nearly uniform versus η, and in the region 2 < η < 2.2 for jet $p_{\rm T} > 20$ GeV, where the efficiencies are nearly uniform versus jet $p_{\rm T}$ (there are not sufficient statistics to measure the efficiencies in the η > 4.2 region). The results versus jet $p_{\rm T}$ are shown in Fig. 14 and are summarized as follows:
• The scale factors obtained from the highest-$p_{\rm T}$ track approach are all consistent with unity at the ±20% level. They show no trend in $p_{\rm T}$ for b or c jets.
• The scale factors for muon jets are found to be consistent, albeit with large uncertainties, with those obtained using the highest-$p_{\rm T}$ track approach. The results are combined assuming that the scale factors are the same for semileptonic and inclusive (b, c)-hadron decays (see Fig. 14) and are summarized in Table 2. The scale factors are consistent with unity for jet $p_{\rm T} > 20$ GeV, but 10-20% below unity for low-$p_{\rm T}$ jets.
• The scale-factor results obtained from the global fits are strongly anti-correlated between b and c jets. It is likely that the true scale factors are similar between b and c jets since many of the contributing factors, e.g. mismodeling of the SV position resolution, are expected to affect b and c jets in a similar manner. The highest-$p_{\rm T}$ track fits are repeated assuming that the scale factors are the same for b and c jets (see Fig. 14) and summarized in Table 2. The results for jet $p_{\rm T} > 20$ GeV are consistent with unity at about the 5% level, while at low jet $p_{\rm T}$ the scale factor is again less than unity by about 10%. The muon jet results are not combined for b and c jets since the b-jet results are much more precise.
Neither of the assumptions made in the combinations has to be completely valid; however, they should each be a good approximation. Overall, the efficiencies measured in data are consistent with those in simulation for jet $p_{\rm T} > 20$ GeV with a conservative systematic uncertainty estimate of 10%. At low jet $p_{\rm T}$ the scale factors are about 0.9 for b jets and 0.8 for c jets. Using the difference in central values obtained from the highest-$p_{\rm T}$ track, combined highest-$p_{\rm T}$ track and muon jet, and combined b and c jet results, produces a conservative systematic uncertainty estimate of 10%. The absolute efficiencies measured assuming the scale factors are the same for b and c jets are given in Table 2. For jet $p_{\rm T} > 20$ GeV and 2.2 < η < 4.2, the mean SV-tagging efficiency is about 65% for b jets and 25% for c jets. Finally, the TOPO algorithm efficiencies are measured in data and found to be consistent with simulation to about 5% for b jets and 20% for c jets (see Fig. 15). The absolute efficiencies measured using the TOPO for b jets are: 21 ± 1% for 10-20 GeV; 44 ± 4% for 20-30 GeV; 60 ± 5% for 30-50 GeV; and 66 ± 6% for 50-100 GeV.
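The notion of a data-to-simulation scale factor, and of combining two measurements under the assumption that they share a common true value, can be illustrated with the sketch below. It uses an inverse-variance weighted mean, which is a simplified stand-in for the combined fits performed in the analysis and ignores correlations between the inputs; the function names and the numbers in the usage note are illustrative.

```python
def scale_factor(eff_data, eff_sim):
    """Ratio of the tagging efficiency in data to that in simulation."""
    return eff_data / eff_sim

def combine(measurements):
    """Inverse-variance weighted mean of (value, uncertainty) pairs.

    Assumes uncorrelated measurements of the same underlying quantity.
    Returns (mean, uncertainty).
    """
    weights = [1.0 / (err * err) for _, err in measurements]
    total = sum(weights)
    mean = sum(w * val for w, (val, _) in zip(weights, measurements)) / total
    return mean, (1.0 / total) ** 0.5
```

For example, `combine([(1.0, 0.1), (0.8, 0.1)])` returns a value of 0.9 with an uncertainty of about 0.07. A measurement with a much smaller uncertainty would dominate such a combination.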

Light-parton jet misidentification
Light-parton jets contain SVs due to any of the following: (1) misreconstruction of prompt particles as displaced tracks; (2) decays of long-lived strange particles; or (3) interactions with material. Type (1) can be studied in data using jets that contain an SV whose inverted direction of flight lies in the jet cone (referred to as a backward SV). Types (2) and (3) can be studied using SVs for which the ratio of the SV flight distance divided by the SV momentum is too large for the decay of a (b, c) hadron (referred to as a too-long-lived SV). The mistag probability for simulated light-parton jets using backward and too-long-lived SVs is consistent with the nominal mistag probability at the 20% level (the nominal mistag probability is shown in Fig. 2). Furthermore, the SV BDT distributions obtained using backward and too-long-lived SVs are similar to the nominal light-parton-jet BDT distributions. Therefore, the mistag probability of light-parton jets and SV properties can be studied in data using backward and too-long-lived SV-tagged jets.
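The two control categories defined above can be sketched as follows. The sketch assumes that the too-long-lived boundary coincides with the 1.5 mm/GeV requirement of the nominal SV selection and that the jet cone has radius 0.5; both choices are assumptions made for illustration, as are the function names.

```python
import math

def eta_phi(x, y, z):
    """Pseudorapidity and azimuthal angle of a direction vector.

    Undefined for vectors exactly along the beam (z) axis.
    """
    r = math.sqrt(x * x + y * y + z * z)
    return math.atanh(z / r), math.atan2(y, x)

def is_backward_sv(flight, jet_eta, jet_phi, cone=0.5):
    """True if the inverted PV-to-SV flight direction lies in the jet cone."""
    eta, phi = eta_phi(-flight[0], -flight[1], -flight[2])
    dphi = (phi - jet_phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta - jet_eta, dphi) < cone

def is_too_long_lived(flight_distance_mm, momentum_gev, max_ratio=1.5):
    """True if flight distance / momentum exceeds the assumed boundary (mm/GeV)."""
    return flight_distance_mm / momentum_gev > max_ratio
```

An SV that points away from a jet at η = 3 has an inverted flight direction at η ≈ 3 and is therefore classified as backward, whereas a genuine forward-going SV inside the jet is not.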
Such a study is complicated by the fact that prompt tracks in (b, c) jets can also be misreconstructed as displaced, and that (b, c) jets also produce strange particles and material interactions. Therefore, both backward and too-long-lived SVs are also found in (b, c) jets. The W+jet data sample, which is dominantly composed of light-parton jets, is used to mitigate effects from mistagged (b, c) jets. Figure 16 shows the BDT distributions from backward and too-long-lived SVs observed in data compared to simulation. The backward and too-long-lived BDT templates are similar for all jet types. The (b, c) yields here are fixed by fitting the nominal SV-tagged data to obtain the total (b, c)-jet content and then taking the backward and too-long-lived SV-tag probabilities for (b, c) jets from simulation. The distributions in data and simulation are consistent, which demonstrates that the SV properties are well modeled for light-parton jets. The total light-parton-jet composition of this sample, without applying any SV-tagging algorithm, is found to be 95%, by fitting the nominal SV-tagged BDT distributions and applying the data-driven (b, c)-tagging efficiencies from the previous section. The mistag probability of light-parton jets is obtained as the ratio of the number of SV-tags for those jets (obtained by fitting the SV BDT distributions) to the total number of light-parton jets. The ratio of this probability in data to that in simulation is shown in Fig. 17; data and simulation agree at about the ±30% level integrated over jet $p_{\rm T}$. A detailed study of W+jet production in LHCb using the SV-tagger algorithm introduced in this paper, in which the jets are required to satisfy $p_{\rm T} > 20$ GeV and $2.2 < \eta < 4.2$, finds that the nominal light-parton-jet mistag probability is 0.3%, which is consistent with simulation [26]. The same ratio for the TOPO algorithm is also shown in Fig. 17.
The performance of any tagging algorithm on light-parton jets can be affected by the presence of (b, c) jets in the event. The misidentification probability of light-parton jets is studied in simulated di-b-jet events and compared to the performance obtained in simulated events that contain no (b, c) jets. The absolute difference in the fraction of light-parton jets that are SV-tagged and have BDT(bc|udsg) > 0.2 is found to be at the per mille level for low-$p_{\rm T}$ jets, but increases to about 1% for jet $p_{\rm T}$ of 50 GeV and to about 2-3% at 100 GeV. The BDT shapes are distorted relative to those obtained in events that contain no (b, c) jets, but there is still significant discrimination between the light-parton and (b, c) distributions. The difference is largely due to particles originating from a b-hadron decay and produced with $\Delta R < 0.5$ relative to the light-parton-jet axis. These tracks may then form SVs with misreconstructed prompt tracks in the light-parton jets.
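The arithmetic behind the light-parton-jet mistag probability and its data-to-simulation ratio can be sketched as below. The function names and all yields are hypothetical and chosen only to illustrate the ratio described in this section; only the 95% light-parton-jet fraction is taken from the text.

```python
def mistag_probability(n_tagged_light, n_jets_total, light_fraction):
    """Fraction of light-parton jets that receive an SV tag.

    n_tagged_light : fitted number of SV-tagged light-parton jets
    n_jets_total   : number of jets before any SV tagging
    light_fraction : light-parton-jet fraction of the untagged sample
    """
    return n_tagged_light / (n_jets_total * light_fraction)

def data_to_simulation_ratio(p_data, p_sim):
    """Ratio of the mistag probability in data to that in simulation."""
    return p_data / p_sim

# Hypothetical yields, for illustration only.
p_data = mistag_probability(n_tagged_light=285, n_jets_total=100_000,
                            light_fraction=0.95)
ratio = data_to_simulation_ratio(p_data, p_sim=0.003)
print(f"mistag probability: {p_data:.2%}, data/simulation: {ratio:.2f}")
```

In practice the tagged yield comes from a template fit and carries its own uncertainty, which this sketch does not propagate.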

Summary
The LHCb collaboration has developed several algorithms that efficiently identify jets that arise from the hadronization of b and c quarks. The performance of these algorithms has been studied in data and is found to agree with that in simulation at about the 10% level for (b, c) jets, and at the 30% level for light-parton jets. The SV properties of all jet types are found to be well modeled by LHCb simulation. The efficiency for identifying a b(c) jet is about 65%(25%) with a probability of misidentifying a light-parton jet of 0.3% for jets with transverse momentum $p_{\rm T} > 20$ GeV and pseudorapidity $2.2 < \eta < 4.2$.

Figure 2: Efficiencies and mistag probabilities obtained from simulation for the SV-tagger and TOPO algorithms for (top) b, (middle) c and (bottom) light-parton jets. The left plots show the dependence on $p_{\rm T}$ for $2.2 < \eta < 4.2$, while the right plots show the dependence on $\eta$ for $p_{\rm T} > 20$ GeV (see text for details). The "loose" label for the TOPO refers to the BDT requirement used in the trigger for SVs that contain muon candidates.
• (B+jet) a fully reconstructed b-hadron decay which enriches the b-jet content of the test-jet sample;
• (D+jet) a fully reconstructed c-hadron decay which enriches the c-jet and b-jet content of the test-jet sample (due to b → c decays);
• (µ(b, c)+jet) a displaced high-$p_{\rm T}$ muon which enriches the c-jet and b-jet content of the test-jet sample;

Figure 4: SV-tagger BDT fit results for the B+jet data sample with $10 < p_{\rm T}({\rm jet}) < 100$ GeV: (top left) distribution in data; (top right) two-dimensional template-fit result; and (bottom) projections of the fit result with the b, c, and light-parton contributions shown as stacked histograms.

Figure 9: Results of $\chi^2_{\rm IP}$ calibration using W+jet data for $10 < p_{\rm T}({\rm jet}) < 100$ GeV. The tail out to large $\chi^2_{\rm IP}$ values in the light-parton-jet sample is largely due to strange particle decays.

Figure 12: Same as Fig. 11 but for the D+muon-jet data sample.

Figure 14: Efficiencies of the SV-tagger algorithm measured in data relative to those obtained from simulation for $2.2 < \eta < 4.2$: (top left) results from the (closed markers) highest-$p_{\rm T}$-track and (open markers) muon-jet samples; (top right) the combined results assuming the scale factors are the same for semileptonic and inclusive (b, c)-hadron decays; (bottom left) the combined results for (b, c) jets using the highest-$p_{\rm T}$-track approach assuming the scale factors are the same for b and c jets; and (bottom right) the absolute efficiencies corresponding to the combined (b, c)-jet results.

Table 2: SV-tagger algorithm (b, c)-tagging efficiencies measured in data compared to those obtained in simulation. The b and c results are obtained by combining the highest-$p_{\rm T}$-track and muon-jet results under the assumption that the scale factors are the same for semileptonic and inclusive (b, c)-hadron decays. The (b, c) results are obtained by fitting the highest-$p_{\rm T}$-track sample under the assumption that the scale factors are the same for b and c jets. The absolute efficiencies observed in data are provided using the "(b, c) jets" results.

Figure 15: TOPO algorithm (b, c)-tagging efficiencies, using the "loose" BDT requirement, in data relative to those obtained in simulation.

Figure 16: SV-tagger algorithm BDT distributions for backward and too-long-lived SVs in the W+jet data sample: (top left) distribution in data; (top right) two-dimensional template-fit result; and (bottom) projections of the fit result with the b, c, and light-parton contributions shown as stacked histograms.

Figure 17: Ratio of light-parton-jet mistag probabilities observed in data to those in simulation for the (left) SV-tagger and (right) TOPO algorithms.

Table 1: Summary of relative systematic uncertainties (− denotes negligible). Systematic uncertainties that depend on jet type and $p_{\rm T}$ are marked by a * (see text for details).