A new calibration method for charm jet identification validated with proton-proton collision events at $\sqrt{s}$ = 13 TeV

Many measurements at the LHC require efficient identification of heavy-flavour jets, i.e. jets originating from bottom (b) or charm (c) quarks. An overview of the algorithms used to identify c jets is described and a novel method to calibrate them is presented. This new method adjusts the entire distributions of the outputs obtained when the algorithms are applied to jets of different flavours. It is based on an iterative approach exploiting three distinct control regions that are enriched with either b jets, c jets, or light-flavour and gluon jets. Results are presented in the form of correction factors evaluated using proton-proton collision data with an integrated luminosity of 41.5 fb$^{-1}$ at $\sqrt{s}$ = 13 TeV, collected by the CMS experiment in 2017. The closure of the method is tested by applying the measured correction factors on simulated data sets and checking the agreement between the adjusted simulation and collision data. Furthermore, a validation is performed by testing the method on pseudodata, which emulate different miscalibration conditions. The calibrated results enable the use of the full distributions of heavy-flavour identification algorithm outputs, e.g. as inputs to machine-learning models. Thus, they are expected to increase the sensitivity of future physics analyses.


Introduction
Quarks and gluons are abundantly produced in proton-proton (pp) collisions at the CERN LHC and an energetic interaction typically yields collimated streams of particles in the detector, referred to as a jet. The particles in a jet arise from the showering, fragmentation, and hadronisation of an initially coloured particle and constitute a colour-neutral final state. Several measurements of standard model processes, as well as searches for beyond the standard model physics, rely on the identification of the flavour of the coloured particle that initiates a jet. The jets originating from bottom (b) and charm (c) quarks are identified using heavy-flavour tagging, which distinguishes them from jets initiated by light flavour quarks (up, down, strange) or gluons [1,2]. These algorithms exploit the hard fragmentation, long lifetimes, and relatively high masses of b and c hadrons to identify heavy-flavour jets. Although b jet identification algorithms have been deployed for several decades, the more challenging task of identifying c jets at CMS was addressed for the first time during the preparation for the data taking with the first 13 TeV centre-of-mass energy pp collisions [2,3].
The heavy-flavour tagging identification process calculates the probability (discriminator) that a jet is initiated by a specific quark flavour. Such tagging typically uses machine-learning classification algorithms to exploit the vast information available for a jet, such as the properties of the individual constituents as well as the jet as a whole. Currently the CMS Collaboration uses two such algorithms, called DeepCSV [2] and DeepJet [4,5], which are based on the use of deep neural networks (DNNs) [6]; these algorithms differ in the amount of input information, as well as the internal structure of the network, as explained in Section 5. Simulated pp collisions are used to train these algorithms, and care must be taken to avoid spurious effects resulting from mismodelling along the simulation chain. Discrepancies between simulation and data can arise from a variety of sources ranging from the imperfect simulation of the detector to the matrix element calculation, which has limitations in the modelling of the parton shower, hadronisation and fragmentation processes. When such algorithms are applied in physics analyses, a calibration of the heavy-flavour tagging output probabilities in simulated events is needed to match those in actual collision data.
Traditional approaches of flavour tagging usually involve labelling each jet in an event as "tagged" or "untagged" depending on whether the discriminator output for the jet is higher or lower than a fixed threshold (working point), thereby making it possible to count the number of b-or c-tagged jets in each event. To ensure agreement of this number between simulation and data, on average, the efficiency of selecting (rejecting) each tagged (untagged) jet in a simulated event is adjusted by a scale factor (SF), that quantifies the difference in selection (rejection) efficiencies of various flavours of jets between simulation and data for the working point being used. Such SFs are measured for each flavour of jet for various working points using different selections of jets in data. SFs for b jets are derived from top quark pair (tt) production and/or from quantum chromodynamics (QCD) multijet events where a jet contains a low-energy muon. A selection of events in which a W boson is produced in association with a charm jet (W+c) is used to derive SFs for c jets. Finally, light-flavour and gluon jet SFs are derived using QCD multijet events [2].
However, additional information can also be gained from the full output distribution (discriminator distribution shapes) of the heavy-flavour tagging discriminators, e.g. by using the discriminator output for each jet as an input to a machine-learning classifier or by performing a fit of the discriminator distributions to data. This gives rise to the need for a second type of calibration that maps the full simulated distribution to the one observed in data. Such a shape calibration technique has already been developed for the identification of b jets, and has been successfully applied in the observation of the Higgs boson H → bb decay mode [7]. This paper presents for the first time a calibration method to correct the differential c tagging discriminator distribution shapes. This novel technique is based on an iterative fit procedure that exploits three control regions that are enriched in c, b, and light-flavour or gluon jets. The first successful attempt to perform such a charm tagging shape calibration was reported in a study of tt production with additional c jets [8]. The present paper describes a more advanced strategy that uses higher-purity control regions and an optimised granularity in the discriminator binning to derive shape calibration SFs. A similar technique has been demonstrated to work in a search for associated production of a Higgs boson with a vector boson, where the Higgs boson decays into a pair of charm quarks [9].
In Sections 2 to 4, the CMS detector, details of the data and simulated events, and the object reconstruction are discussed. The c tagging algorithms are introduced in Section 5, followed by a description of the event selection used to derive three control regions in Section 6. The iterative fit strategy is outlined in Section 7, and the relevant sources of systematic uncertainties are discussed in Section 8. The results are discussed in Section 9, the validation tests are documented in Section 10, and the paper is summarised in Section 11.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid with an internal diameter of 6 m, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. The silicon tracker consists of 1440 silicon pixel and 15 148 silicon strip detector modules and its new inner part was installed in 2017 as a part of the Phase-1 pixel detector upgrade. The upgraded high-efficiency and low-mass detector with four barrel layers provides four-hit pixel coverage to efficiently detect charged particles within the pseudorapidity range |η| < 2.5 [10]. Transverse impact parameter resolution of each track ranges from 20 to 75 µm depending on the transverse momentum (p T ) and η of the track [11]. Forward calorimeters, made of steel and quartz fibres, extend the η coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionisation chambers embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [12].
Events of interest are selected using a two-tiered trigger system [13]. The first level [14], composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of less than 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimised for fast processing, and reduces the event rate to around 1 kHz before data storage.

Data and simulated samples
The data used to derive the calibration of the c tagger discriminator distributions were collected from pp collisions recorded at a centre-of-mass energy of 13 TeV with the CMS detector in 2017, and correspond to an integrated luminosity of 41.5 fb −1 [15]. The 2017 data set is the first data collected after the Phase-1 upgrade of the CMS pixel tracker [10] and is expected to represent the current heavy-flavour tagging performance of CMS. The collision events are selected using a set of single-lepton and dilepton triggers. The single-electron (single-muon) trigger requires the presence of at least one isolated electron (muon) with a p T above 32 (27) GeV. The dielectron (dimuon) trigger requires at least two isolated electrons (muons), one with p T > 23 (17) GeV and another with p T > 12 (8) GeV. Finally, the electron-muon trigger requires the presence of at least one electron and at least one muon, where the lepton with the largest p T is required to have p T > 23 GeV and the one with the second largest p T to have p T > 12 (8) GeV if it is an electron (muon). These trigger requirements are also imposed on the simulated collision events.
The simulated events are produced using Monte Carlo (MC) generators, which provide a fixedorder perturbative QCD calculation of up to four noncollinear high-p T hard partons, supplemented with parton showering (PS) and typical underlying event particles. For all the simulated events, the PS is simulated using PYTHIA v8.230 [16], using the CP5 underlying event tune [17] with the NNPDF 3.1 [18] parton distribution function set.
The matrix element (ME) generation of the tt simulation is performed with POWHEG (v2) [19][20][21][22] at next-to-leading order (NLO) accuracy in QCD, and its cross section is scaled to a theoretical prediction at next-to-NLO (NNLO) in QCD, including resummation of next-to-nextto-leading logarithmic soft-gluon terms. The ME generation of the W + jets and Drell-Yan (DY) processes is performed with MADGRAPH5 aMC@NLO 2.4.2 [23] at leading order (LO) precision, with MLM jet matching [24] and the cross sections normalised to NNLO calculations [25]. The single top quark production in the s-channel was also simulated using MAD-GRAPH5 aMC@NLO in the four-flavour scheme, whereas the single top quark production in the t-channel was simulated using POWHEG also in the four-flavour scheme. The tW channel was simulated using POWHEG in the five-flavour scheme [26,27], with its cross section normalised to the NLO calculations [28]. The diboson samples are simulated at LO using PYTHIA for the ME generation. Their cross sections are normalised to those calculated at NLO for WZ and ZZ [29], and NNLO for WW [30].
The interactions between particles and the material of the CMS detector are simulated using GEANT4 [31]. The effect of additional pp interactions within the same or nearby bunch crossings (pileup) on top of the hard scattering processes is modelled by additional minimum bias collisions generated with PYTHIA.

Object reconstruction
The particle-flow (PF) event algorithm [32] reconstructs and identifies each particle (physicsobject) in an event with an optimised combination of all subdetector information. In this process, the identification of the particle type (photon, electron, muon, charged and neutral hadrons) plays an important role in the determination of the particle direction and energy. Photons (e.g. coming from π 0 decays) are identified as ECAL energy clusters not linked to the extrapolation of any charged-particle trajectory to the ECAL. Electrons are identified as a primary charged-particle track and potentially many ECAL energy clusters corresponding to this track and to possible bremsstrahlung photons. Muons are identified as tracks in the central tracker consistent with either a track or several hits in the muon system, and associated with calorimeter deposits compatible with the muon hypothesis. Charged hadrons are identified as charged-particle tracks identified neither as electrons, nor as muons. Finally, neutral hadrons are identified as HCAL energy clusters not linked to any charged-hadron trajectory, or as a combined ECAL and HCAL energy excess with respect to the expected charged-hadron energy deposit.
The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the track momentum at the main interaction vertex, the corresponding ECAL cluster energy, and the energy sum of all bremsstrahlung photons attached to the track. The energy of muons is obtained from the corresponding track curvature. The energy of charged hadrons is determined from a combination of the track curvature and the corresponding ECAL and HCAL energies, corrected by the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
The missing transverse momentum vector, p miss T , is defined as the negative vector p T sum of all the PF candidates in an event, and its magnitude is denoted as p miss T [33]. The p miss T is modified to correct the energy scale of the reconstructed jets in the event.
The candidate vertex with the largest value of summed physics-object p 2 T is the primary vertex (PV) of the pp interaction. These physics objects are the jets, the leptons, and the p miss T . The jets are reconstructed by the jet-finding algorithm [34,35] with the tracks assigned to candidate vertices as inputs.
To identify and reconstruct the prompt leptonic decay modes of vector bosons, electrons (muons) are used, and are required to be well isolated from jet activity within a cone of radius ∆R = (∆η) 2 + (∆φ) 2 = 0.3 (0.4), where φ is the azimuthal angle in radians. The relative isolation is defined as the scalar p T sum of the PF candidates within the cone divided by the lepton p T . Additionally, a set of quality requirements are imposed on these leptons based on the quality of the track reconstruction, hit multiplicities in the tracking and muon subdetector layers, and the displacement of these particles with respect to the PV. Electrons and muons that arise from (semi)leptonic decay modes of hadrons typically have a lower momentum and are surrounded by hadronic activity of the underlying jet in which these hadrons are created. The branching fraction of the (semi)leptonic decay modes of heavy hadrons gives rise to the presence of an electron or a muon inside 20 (10)% of b (c) jets [2]. A set of relaxed kinematic and inverted isolation criteria is imposed to identify such low-energy (soft) leptons inside jets.
For each event, jets are clustered from these reconstructed particles using the infrared-and collinear-safe anti-k T algorithm [34,35] with a distance parameter of 0.4. Jet momentum is determined as the vector sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5-10% of the true momentum over the entire p T spectrum and detector acceptance. To mitigate the effect of pileup contributing additional tracks and calorimetric energy depositions to the jet momentum, the charged particles identified as originating from pileup vertices are discarded, and an offset correction is applied to correct for remaining contributions. Jet energy corrections are derived from simulations to bring the measured average response of jets to that of generator-level jets. In situ measurements of the momentum balance in dijet, γ + jet, Z + jet, and multijet events are used to correct any residual differences in jet energy scale in data and simulation [36]. The jet energy resolution amounts typically to 15-20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [36]. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures. Jets overlapping with isolated electrons or muons within ∆R < 0.4 are disregarded to remove isolated charged leptons reconstructed as jets. Since heavy-flavour tagging algorithms strongly rely on the track reconstruction, only jets within the tracker acceptance (|η| < 2.5) are used throughout this work. Additionally, a lower threshold on the jet p T at 20 GeV is imposed, and jets are required to pass tight identification as well as tight pileup rejection criteria [37].
For simulated jets, a generator-level definition of the jet flavour is needed to train the heavy-flavour tagging algorithms, as well as to calibrate them using data. This definition is based on a procedure referred to as ghost-matching [38], which involves reclustering the generator-level jet constituents after adding the intermediately decayed b or c hadrons to the list of particles used for the clustering, as already used in Ref. [2]. In case there are multiple b (c) hadrons in a decay chain, only the last b (c) hadron in the chain, i.e., the b (c) hadron that does not further decay into another b (c) hadron, is reclustered. The moduli of the four-momenta of these generated hadrons are rescaled to a very small number (resulting in so-called ghost hadrons), to ensure that they do not affect the reconstructed jet momentum, and only their directional information is kept. If at least one ghost b hadron has been clustered inside the jet, it is referred to as a bottom jet. If no b hadron is found, but instead at least one c ghost hadron is clustered inside the jet, it is referred to as a charm jet. In all other cases the jet is categorised as a lightflavour jet, and hence this category includes jets originating from light quarks (uds), as well as gluon-initiated (g) jets. Particles from pileup interactions are not included in the reclustering.
Displaced secondary vertices (SV) from the decays of b or c hadrons are reconstructed using the inclusive secondary vertex finding algorithm [39]. This algorithm does not a priori assume any matching of the jets to a SV, but instead takes as input all reconstructed tracks in the event with p T > 0.8 GeV, and having a distance of closest approach from PV to track, projected onto the direction of the beam axis, below 0.3 cm. After the iterative procedure of track clustering, the final set of tracks is matched to the reconstructed jets if the distance between the direction of the SV displacement (pointing from PV to SV) and the jet axis satisfies ∆R < 0.4.

The c tagging algorithms
The identification of c jets relies on the long lifetime and the mass of the c hadron. The average lifetime of such charmed hadrons in the rest frame is typically a picosecond, resulting in a displacement of the decay vertex in the detector frame by a few millimetres to a centimetre from the position of the primary vertex. The reconstructed invariant mass, computed from the four-vectors of the particles that are assigned to such a secondary vertex, is strongly correlated with the c hadron mass. The existing heavy-flavour tagging algorithms exploit these properties by combining the features of tracks inside the jet (momentum, displacement, multiplicity) with the features of reconstructed SVs (mass and both the direction and the magnitude of the SV displacement from the PV).
These are the same properties that are used to identify b jets. Since the average lifetime and mass of b hadrons is larger than that of c hadrons, the signatures of b jets are more easily distinguishable from those of light-flavour jets than those of c jets. The fact that the discriminating properties of c jets are distributed midway between those of light-flavour and b jets requires the definition of two discriminating variables; one to distinguish c jets from light-flavour jets (CvsL) and one to distinguish c jets from b jets (CvsB). Historically, the first c jet identification algorithm used in the CMS Collaboration was based on a combination of two boosted decision trees (BDTs), each trained for one of the above mentioned discrimination purposes [3]. The state-of-the-art heavy-flavour tagging algorithms are currently based on multi-class deep neural network architectures and predict separate probabilities for each jet to originate from a given quark flavour (or gluon). Through an appropriate combination of the probability values computed by these algorithms, these taggers can function either as a bottom jet or as a charm jet identification algorithm.
The DNN architecture of the DeepCSV algorithm is composed of five fully connected hidden layers with 100 nodes in each layer. It takes as input a set of 66 reconstructed observables related to the charged-particle tracks and secondary vertices that are assigned to a given jet, and outputs four probabilities, P(b), P(bb), P(c) and P(udsg), that denote the probability of a jet to originate from one b quark, two b quarks merged into the same jet, one or more c quarks or a light-flavour quark or gluon, respectively. The DeepJet algorithm uses an architecture composed of subsequent convolutional, recurrent and fully connected hidden layers. Its input is composed of a set of up to 613 observables related to charged and neutral PF candidates (without a priori selection criteria and without explicitly classifying charged PF candidates as charged hadron or leptons, and neutral PF candidates as photons or neutral hadrons) as well as the SVs that are assigned to the jet. Apart from the fact that DeepJet exhibits a higherdimensional input space and a more complex architecture, it further subdivides the output classes into additional categories. In addition to the DeepCSV output categories, P(b lep ) is added to identify leptonic b hadron decays and P(udsg) is split further into P(uds) and P(g) with the goal of separately identifying jets originating from light quarks and gluons, respectively. More detailed information on the inputs, architecture, and training of these algorithms can be found in Refs. [2,4,5]. Table 1: Summary of the heavy-flavour tagging definitions for both b and c tagging using the DeepCSV and DeepJet taggers. P(a) represents the probability of having an a-type jet (see text).

Tagger
BvsC  These output probabilities can be appropriately combined to define a set of b and c tagging discriminators as summarised in Table 1. For b jet identification, a discriminant is defined to distinguish b jets from either c or light-flavour jets using one single discriminator (BvsC/L). For c jet identification, two distinct discriminators are defined as the ratios in the second and third columns in Table 1. The normalised distributions of these discriminators for both algorithms are shown in Fig. 1 for jets with p T > 20 GeV and |η| < 2.5 from simulated hadronically decaying tt events. The performance of these algorithms can be assessed by evaluating the selection efficiency for c jets as a function of the misidentification rate for either b or lightflavour jets for different selection thresholds on the discriminator values. These result in a socalled receiver operating characteristic (ROC) curve which is shown in Fig. 2. It can be seen that the DeepJet algorithm has a lower mistagging efficiency than the DeepCSV algorithm in both CvsL and CvsB discrimination. Furthermore, the selection efficiency for c jets can be evaluated as a simultaneous function of b and light-flavour jet misidentification rates, which produces a two-dimensional (2D) ROC contour plot as shown in Fig. 3. This plot shows that the DeepJet algorithm outperforms the DeepCSV algorithm in simultaneous CvsL and CvsB discrimination over the entire 2D phase space, as well. The DeepCSV algorithm itself has already shown a significantly improved performance over the original c tagging algorithm, which was based on a combination of two BDTs [2], demonstrating the significant advancements that have been made in heavy-flavour identification over the last five years.
The 2D (normalised) distributions of the CvsL and CvsB discriminators of both algorithms are shown in Fig. 4 for different jet flavours. In this 2D phase space spanned by the CvsL and CvsB discriminators, light-flavour jets are situated almost exclusively in the upper left corner, whereas c jets have a significant fraction along the right edge, and b jets are distributed largely towards the lower right corner, as expected.  An a priori track selection is applied to define the collection of tracks considered as input to the DeepCSV algorithm. This selection is optimised to identify tracks from b and c hadrons, while rejecting tracks from pileup interactions and poorly reconstructed tracks that do not match any genuine particle passing through the detector. It is therefore possible that a given jet has no tracks that pass this preselection and therefore no information is present to calculate a heavyflavour tagging discriminant. These jets are assigned a default output value of −1. Given the known differences in track reconstruction efficiency and the poorly reconstructed track rate between the simulation and collision data, it is important to calibrate the rate of jets with such default discriminator values. No such preselection criteria are used in the DeepJet algorithm, which instead takes as input the entire collection of PF candidates associated with a given jet. Therefore, no default values are expected to appear in the output probabilities of the DeepJet algorithm. Nevertheless, because of the definition of the CvsL and CvsB discriminators shown in Table 1, an undefined discriminator value can appear if the denominator becomes zero. This situation is observed when the b jet probability, P(b)+P(bb)+P(b lep ), is evaluated to be exactly 1 by the DeepJet algorithm, resulting by construction in an output value of 0 for all other output probabilities.
Consequently, this appears almost exclusively for b jets. Whenever such a situation appears for a given jet, its c tagging discriminator values, CvsL and CvsB, are evaluated to be undefined and 0, respectively, and hence constitute a special category of jets separated from the continuous jet distribution in the 2D CvsL-CvsB plane. For simpler representation, both CvsL and CvsB values of these jets are defaulted to a value of −1 in this paper.

Mismodelling of the simulated c tagging discriminators
The need for calibrating the heavy-flavour tagging discriminants arises from possible mismodelling of the simulated inputs. Important discriminating properties such as the track displacement or the SV displacement are not very well modelled in simulation and are subject to changes in the detector alignment. As already evident from other heavy-flavour calibration methods [2], the performance of the heavy-flavour tagging algorithms is overestimated in the simulation, resulting in higher simulated b and c tagging efficiencies and lower misidentification probabilities compared with those observed in the data. Rather than correcting the efficiency of a selection in the discriminator, the full simulated differential shape of that discriminator can be calibrated to match the shape observed in data. Such a strategy has been  The left-hand column shows the discriminators of the DeepCSV algorithm, whereas the righthand column shows those of the DeepJet algorithm.
developed in the past for the b tagging discriminator distribution [2]. For the first time this paper presents a method of calibrating the differential c tagging discriminator shapes. Since c jet identification relies on both the CvsL and CvsB discriminators simultaneously, the calibration of the c tagger is performed as functions of both the CvsL and CvsB distributions. Jets with defaulted values of CvsL and CvsB are calibrated as a separate category. The rest of this paper is devoted to the discussion of this novel calibration method.

Event selections for calibration
The strategy for calibrating the DeepCSV-and DeepJet-based c taggers involves identifying three samples of jets in the data that are enriched respectively in c, b, and light-flavour jets. This has resulted in the definition of three jet samples enriched either in W+c events (c enriched), tt events (b enriched), and DY + jet events (udsg enriched). The flavour tagging algorithms being calibrated are not used in the construction of these selections to keep the resulting calibration free from potential biases. However, the sampled c and b jets are required to contain a soft muon inside the jet cone, to enrich the samples with jets containing semileptonically decaying c and b hadrons, which increases signal purity. Since soft muons within jets are treated as any other charged PF candidate in both DeepCSV and DeepJet trainings, as opposed to training the network with explicit soft muon information, the bias in the discriminator responses arising from using muon-containing c and b jets as a proxy for all c and b jets, is expected to be minimal. In Section 10.2 we further demonstrate the applicability of the correction factors to inclusive samples despite the nonuniversality of the c and b jets used for calibration.
The SM physics processes, namely, W+c, tt and DY + jet, chosen to yield the three jet samples, were experimentally studied with great precision in previous publications [40][41][42][43]. These papers did not observe any significant deviation from the SM expectations. Therefore, effects of BSM physics, if any, on these samples are expected to be insignificant when compared with the statistical and systematic uncertainties associated with the data and simulation. Hence, the distributions of the c tagger outputs for these jet samples in data are expected to agree with the corresponding calibrated distributions in simulation within uncertainties.
In this section, the event selection criteria to obtain each of these three topologies are outlined, followed by a discussion on the purity of the different jet flavours after each selection.

The c jet enriched selection
Charm jets in data are selected from events with a jet produced in association with a W boson, following the strategy discussed in the CMS W+c cross section analysis [40] and the c jet identification SF derivation [3]. The relevant events involve a leptonically decaying W boson (W → lν, l = e, µ) and a jet originating from the hadronisation of a charm quark. These charm jets are identified using the semileptonic decay of the c hadron, which produces a soft muon within the jet in the final state. Charm hadron decays into electrons are not considered, because of the lower efficiency of electron reconstruction at low energies [44]. Thus, the targeted final state consists of an isolated charged lepton (electron or muon) from the W boson decay, p miss T due to the presence of a neutrino from the W boson decay, and at least one jet with a soft, nonisolated muon inside it.
The selected events are divided into two categories:  Figure 5: Feynman diagrams showing production of charm quarks in association with a W boson (left and middle) and the major background (right).
• Same-sign (SS) events, where the soft muon inside the jet and the isolated lepton from the W boson have similar charges.
If there are multiple soft muons inside the jet, the muon with the highest p T determines the sign of the charge.
The major background for the W+c selection, as described above, is the production of a W boson associated with the radiation of a virtual gluon that decays into a pair of b or c quarks, with one or more of the resulting jets in the final state having a soft muon inside it. These backgrounds have 50%-50% probabilities of being tagged as OS or SS. In contrast, the leading-order contributions to the production of a W boson with one c jet always result in an OS signature, as illustrated in Fig. 5. Hence, an OS-SS subtraction largely reduces the major background and yields a distribution enriched in W+c events. All distributions related to W+c selection in this paper are OS-SS subtracted.
A few details of the selection are described as follows: Isolated lepton: A charged lepton (electron or muon) from the W boson decay that originates close to the PV and has a relative isolation smaller than 0.05 is required. The events are categorised into electron and muon channels, depending on the flavour of each charged lepton. The electron channel requires exactly one isolated electron with a transverse momentum of at least 34 GeV and no isolated muon, and the muon channel requires exactly one isolated muon with a p T of at least 30 GeV and no isolated electron. The events are triggered by a single-electron and/or single-muon HLT. The W bosons decaying into τ leptons are not explicitly excluded, and may enter the selection through their leptonic decay products.

W boson candidate: The W boson transverse mass is defined as
where p lep T is the p T of the isolated lepton and φ lep (φ p miss T ) is the φ of the isolated lepton (missing momentum). An m T of at least 50 GeV is required to ensure the presence of a neutrino produced from a W boson decay.

Jets:
Events are required to have at least one and at most three jets lying inside the tracker coverage (|η| < 2.5) and with p T > 20 GeV. At least one of these jets is required to contain a low-energy muon with p T < 25 GeV and a relative isolation of at least 0.2 (0.5) in the electron (muon) channel. If there are multiple such jets in an event, the jet with the highest p T (referred to as the muon jet) is selected. Only jets separated from the isolated lepton (from W boson decay) by ∆R > 0.5 are considered.

DY suppression:
To discard events where a prompt muon is misidentified as the muon jet, an upper threshold of 0.4 (0.6) is applied to the muon energy fraction of the muon jet in the muon (electron) channel. For only the muon channel, the sum of the muon and the neutral electromagnetic energy fractions of the muon jet is required to be smaller than 0.7. These two requirements heavily suppress the DY background in the muon channel by rejecting events where one of the prompt muons is misidentified as the muon jet, e.g. when the prompt muon undergoes final-state photon radiation. Furthermore, the invariant mass of the muon inside the jet and the isolated muon in the muon channel cannot match that of the Z boson (in between 80 and 100 GeV) or other dimuon resonances (below 12 GeV).
The muon jets selected in these events are the c jet candidates. The OS-SS distributions of the CvsL and CvsB variables of these candidates are presented in Fig. 6 for both DeepCSV and DeepJet taggers. These plots, as well as all other jet distribution plots in this paper, are plotted after scaling the simulated jets such that the total number of simulated jets matches the number of jets in data in every channel of every selection. This ensures that any remaining discrepancies between data and simulation, in terms of cross sections of the contributing processes, are largely reduced.

The b jet enriched selection
The b jet enriched sample is obtained from tt events by selecting one of the b jets originating from the decay of the two top quarks. The selection includes both the semileptonic and dileptonic decay channels of the tt process, each of which is discussed separately below.

Semileptonic tt selection
The strategy for the semileptonic tt selection is to select one of the b jets produced in a semileptonic tt decay by identifying a leptonically decaying W boson, a jet with a soft muon from a bottom hadron decay chain inside it, and several additional jets from the other b quark and the hadronically decaying W boson. The selection is similar to that of the W+c discussed in the previous subsection and is kept orthogonal to the latter by requiring at least four jets in the event.

Dileptonic tt selection
The dileptonic tt selection chooses one of the two b jets in dileptonic tt events. The target final state consists of two isolated, oppositely charged leptons from the two W boson decays, and a minimum of two jets, at least one of which has a soft muon inside it. The selection can be subdivided into three channels depending on the flavour of the isolated leptons, namely dielectron, dimuon and electron-muon channels. The events in those channels are triggered using the dielectron, dimuon and electron-muon HLTs, respectively. Although the muon-electron channel thus selected is quite pure in b jets, the two other channels are contaminated by the DY background. Therefore, in the dielectron and dimuon channels, the event is required to have a p miss T of at least 40 GeV, and the invariant mass of the dilepton candidate is required to be incompatible with that of the Z boson (i.e. outside the range 75-105 GeV) or other dimuon resonances (below 12 GeV), to suppress the DY contamination.
The muon jet selected in each semileptonic or dileptonic tt event is the b jet candidate. In case an event has multiple muon jets, the jet with the highest p T is selected. The CvsL and CvsB distributions of these candidates combined from both semileptonic and dileptonic decays are presented in Fig. 7 for both DeepCSV and DeepJet taggers. We have verified that the c tagger discriminator shapes of individual flavours agree quite well between semileptonic and dileptonic decay channels for both of the taggers.

Light-flavour jet enriched selection
Jets arising from light quarks (uds) and gluons (g) are selected in events containing one or more jets produced in association with a leptonically decaying Z boson (DY + jet events). The selection is split into two channels (electron and muon), depending on the flavour of the charged leptons from the Z boson. The Z bosons decaying into τ leptons are not explicitly considered, but may enter the selection through the leptonic τ decays.
The final state in the electron (muon) channel consists of exactly two isolated, oppositely charged electrons (muons) having p T of at least 27 and 15 (20 and 12) GeV and at least one jet. Each of the two leptons is also required to have a relative isolation less than 0.15 and their invariant mass is required to be within 10 GeV of the Z boson mass. The events are triggered with the dielectron and dimuon HLTs, respectively. The requirement of a soft muon inside the jet is not imposed for light-flavour jets. In case there is more than one jet in the event, the jet with the highest p T is considered as the light-flavour jet candidate. The CvsL and CvsB distributions of these candidates are presented in Fig. 8

Summary of selections
To summarise, W+c events (OS-SS subtracted), tt events, and DY + jet events are used to probe the c tagger distributions of jet samples in data enriched in c jets, b jets and light-flavour jets, respectively. The jet yield and percentage of jets of each flavour in each of these samples are summarised in Table 2. These selections yield a c jet purity of 92.9%, b jet purity of 81.0% and light-flavour jet purity of 86.1% in their respective samples.

Iterative calibration of the c tagging discriminators
A calibration of the full discriminator shape requires deriving SFs to correct for the datasimulation discrepancy at every value of the discriminator. Since c tagging requires using both CvsL and CvsB values of a jet simultaneously, this translates into deriving data-to-simulation SFs as functions of both these parameters to simultaneously adjust the discriminator shape in the 2D plane of CvsL and CvsB. In addition, mismodelling in simulation is expected to be different for light-flavour, c, and b jets. This entails deriving different sets of SFs for different jet flavours.
The general idea of deriving SFs employed in this paper is to perform a fit iteratively on three flavour-enriched jet samples allowing the three flavour components (light, c and b) in each to float freely until the difference between the distribution of jets in data and simulation in all the samples is minimised. The simultaneous usage of three samples with different flavour compositions ensures the existence of a solution, while an iterative fitting approach helps in converging to a physical value. After motivating two distinct choices of binning in the 2D phase space of the CvsL and CvsB discriminators below, the algorithm employed in the iterative fit method is discussed in detail. Finally, an interpolation scheme is discussed that combines the SF results obtained in two different binning schemes into one set of SFs with finer binning.

Adaptive binning
Ideally, a perfect calibration would require deriving the flavour-wise correction factors in infinitesimally small areas in the CvsL-CvsB plane. However, the smaller the bin area, the lower the yield of jets in that bin, and hence the larger the statistical uncertainty in the SF. A large bin area, on the other hand, results in largely discontinuous SFs, which could be unrepresentative and could result in an incorrect calibration. To find a trade-off, an adaptive binning approach is adopted with the aim of having smaller bin areas for SFs of a given flavour in the part of the 2D plane where the yield of jets of that flavour is large. This results in a more continuous correction in high jet count regions, and ensures optimised statistical uncertainties over the entire phase space.
In practice, the binning is kept constant along one axis and made adaptive along the other in the first run, and then the choice is reversed in the second. The 2D plane of SFs for all three flavours is first divided into ten equal-width bin-slices along one axis (the "fixed axis"). The algorithm treats each of these slices independently and proceeds to determine the binning along the other axis (the "variable" axis) on a per-bin-slice basis and independently for each flavour. Thus, for each bin slice of the SF map of a given flavour, the binning along the variable axis is sequentially increased starting from a minimum threshold, b min , in steps of b min , until at least one of the following two conditions is fulfilled for each bin: • the resulting SF for that flavour in that bin has a relative statistical uncertainty smaller than max , or, • the bin width along the variable axis is equal to b max .
A dedicated scan of these three parameters (b min , b max , and max ) is performed and values of b min = 0.02, b max = 0.10, and max = 2% are chosen as an optimal compromise between the granularity of the binning and the statistical uncertainties.
After the SF results are derived using such a binning scheme, the choice of fixed and variable axes is reversed to yield another binning scheme. For example, at first, the CvsB axis could be chosen as the fixed axis and CvsL as the variable axis to yield a binning scheme and perform the iterative fit. Next, the CvsL axis could be chosen as the fixed axis and the CvsB axis as the variable one. This would yield a second binning scheme using which the iterative fit can be performed.

Iterative fit
The iterative fit method minimises a global χ 2 value that quantifies the data-to-simulation discrepancy in the projected CvsL distribution over a given (fixed-width) CvsB bin slice, or vice versa. The method proceeds by iteratively evaluating and applying SFs for all three jet flavours in the corresponding selected topology in which they are enriched.
The algorithm begins by scaling the total number of simulated jets across all contributing processes in each channel (i.e. muon, electron, or electron-muon, as applicable) of each sample (i.e. c, b or light-flavour jet enriched) to match the number of jets selected in data in that channel. This removes any remaining data-simulation discrepancies in terms of the total jet yield, without altering the discriminator shape, as explained in Section 6, and simultaneously ensures that the SFs thus derived correct only the discriminator shape without significantly altering the jet yield.
Then for each fixed bin slice: • The three samples are ranked by the purity of the samples for the given bin slice.
The purity of a sample is defined as the percentage contribution of the jet flavour that the sample is meant to enrich, e.g. fraction of c jets in the W+c selection (see Table 2).
• Starting with the purest sample, "signal" is defined as the CvsL (CvsB) distribution of simulated jets of the flavour f 1 that this sample aims to enrich, when CvsB (CvsL) is chosen as the fixed axis. Similarly, "background" is defined as the sum of the templates of the other two flavour components of the same sample in simulation. "Signal in data" is defined as the difference between the simulated background template and the data distribution. The ratio of the "signal in data" to "signal", evaluated as a function of the quantity on the variable axis, gives the SF for f 1 in this bin slice in this iteration. The ratio (an array of SFs) is calculated in discrete intervals following the binning scheme determined as explained in Section 7.1. If the yield of simulated jets of flavour f 1 in the enriched sample in any of these bins is less than one, or if the purity of the sample in any bin is less than a threshold (10%), the resulting ratio in this bin may have an unphysical central value with a large statistical uncertainty. In all such bins, a default value of 1 with a 100% statistical uncertainty is assigned as the SF.
In practice, the array of SFs is initialised with the value 1 for all bins and all flavours with the goal of updating them to the current best estimate of the SF values after every ratio evaluation. When the difference between the current and previous best estimates after a ratio evaluation in a given bin is larger than 0.002 (0.005), a change of only 0.002 (0.005) is allowed in that bin for DeepCSV (DeepJet). This ensures a smooth convergence of the χ 2 parameter and prevents convergence to a local minimum.
• Moving to the second purest sample for this bin slice, the SFs for f 1 are applied to the simulated template of f 1 in this sample with the same discrete binning used to determine the SFs. Again, "signal" is defined as the template of the flavour f 2 that this distribution enriches and thus the array of SFs for f 2 is evaluated in the same way. The binning of these SFs can, in general, be different from that of f 1 , and will depend on the distribution of jets of flavour f 2 in this bin slice.
• Proceeding to the third sample, the SFs for f 1 and f 2 are applied to the respective simulated templates following their respective binnings, and the SFs for f 3 corresponding to this sample are evaluated. This completes one full iteration for this bin slice.
• The next iteration begins by applying all three SF arrays to the respective simulated templates of the first sample. The SFs for f 1 are then updated. Proceeding to the next samples, f 2 and f 3 are updated likewise.
• The iterations are continued until a global χ 2 is minimised and no longer improves with further iterations. The χ 2 for a given sample, "sel" (sel = Wc, tt, DY), in this bin slice is defined as where n is the number of bins of width b min in the distribution along the variable axis, N sim,i = ∑ f =c,b,udsg SF f ,i N f ,i is the SF-adjusted simulated jet count in the ith bin along the variable axis, N data,i is the jet count in data in the i-th bin, with σ 2 sim,i and σ 2 data,i as the corresponding statistical uncertainties in simulation and data, respectively. This quantity is always well defined since the width of any given bin is a multiple of b min by construction, as explained in Section 7.1. The global χ 2 of a bin slice is defined as the sum of the three χ 2 sel values corresponding to the three samples in this bin slice, and is updated after every iteration.
The same procedure is used for every bin slice and for both DeepCSV and DeepJet taggers. Thus, three 2D maps of SFs corresponding to three different flavours are obtained for each tagger. Next, the entire procedure is repeated with the fixed and variable axes interchanged. This gives an additional set of SFs for the same phase space, but with a different binning. The method of combining the two sets of results is presented in Section 7.3.

Treatment of jets with default discriminator values
Jets that are assigned a default value of −1, because they either do not have sufficient information to be processed by the tagging algorithm (DeepCSV) or have an undefined value for at least one of the c tagging discriminators (DeepJet), need to be treated separately in the calibration.

DeepCSV
Jets that have no tracks passing the DeepCSV preselection criteria, and hence have been assigned default discriminator values CvsL = CvsB = −1, remain excluded from the iterative fit treatment described above. Therefore, the iterative fit procedure is repeated in the same way for the jets with defaulted discriminator values in the three samples. However, only the jets from the electron channel of the W+c selection are used as the c jet enriched sample in this particular fit, because such jet candidates in the muon channel of W+c are heavily con-taminated with prompt muons from DY events misidentified as jets and hence are unsuitable for use in a three-flavour fit. The χ 2 for a sample in this case is defined as: , and the global χ 2 , defined as the sum of the χ 2 sel values of the three samples, is minimised iteratively. This yields three correction factors corresponding to the three flavours. These SFs for the bin containing jets with default DeepCSV values, when combined with the 2D maps of SFs for the CvsL-CvsB plane, give the full set of SFs required for the calibration of the DeepCSV c taggers.

DeepJet
Since the CvsL = CvsB = −1 bin of DeepJet consists almost entirely of b jets, a lack of data for light-flavour and c jets in this bin makes it both impossible and unnecessary to perform a three-flavour iterative fit to find DeepJet SFs for udsg and c. Therefore, in this bin, c and udsg SFs are both set to 1 ± 1, and the b SF is evaluated as: where N represents the jet counts of the respective flavours in the bin containing jets with default discriminator value in the tt sample.

Interpolation
For each algorithm, an interpolation of the SF values in the 2D plane is performed on a perflavour basis to find more representative corrections on a finer grid. Such an interpolation is feasible because the SFs have a fairly smooth distribution, as evident to a first degree of approximation from the data-to-simulation ratios in Figs. 6-8. Moreover, combining the two sets of SFs, which are derived using two different binning schemes with such an interpolation approach, is expected to improve the SF values, given the fact that the two sets of SFs potentially contain complementary information about rapidly varying SF values at certain parts of the CvsL-CvsB plane. For example, the SF set derived with an adaptive binning scheme along the CvsB axis could potentially reflect sharp SF variations along the CvsB axis for a certain flavour of jets in parts of the plane enriched in jets of that flavour.
At first, a nodal point for the interpolation (a point where the functional value is assumed to be known exactly) is defined for each bin in the 2D plane for each flavour in each SF set. The coordinates of this nodal point are defined as the means of the coordinates of all the jets of the corresponding flavour that are contained in the corresponding bin. Thus, the nodal point (x, y) of a bin in an SF map for a given flavour containing N jets of that flavour, each with a (CvsL, CvsB) value of (x i , y i ), is given by: The value of the SF corresponding to this bin is assigned to this nodal point and is assumed to be known exactly at this point for the purpose of the interpolation.
For each flavour, the nodal points obtained from the two SF sets are merged together to form an irregular grid of points in the 2D plane; an SF value is assigned to each from the respective SF set. When the same nodal point is found in both the SF sets, only one is used to form the grid.
A bivariate interpolation is performed for each flavour using the SF values at the nodes of the corresponding irregular grid using the INTERPOLATE subpackage of SCIPY [45]. The algorithm first triangulates the nodal points in the 2D plane using QHULL [46] and then constructs a piecewise cubic interpolating Bézier polynomial [47] on each triangle using a Clough-Tocher scheme [48]. Then the interpolant is constructed by choosing gradients that approximately minimize the curvature of the interpolating surface [49,50].
However, such an interpolant is defined only inside the convex polygon (convex hull) containing all the nodal points. Furthermore, the interpolant is seen to have unphysical properties in some parts of the plane inside the convex hull but outside the concave hull of the nodal points. Therefore, a nearest-neighbour extrapolation (assigning the functional value of the nodal point nearest to a given point) is performed to assign values to all points in the 2D plane that lie outside the concave hull of the nodal points. The interpolant along with the extrapolant together define SF values at all points of the 2D plane (called interpolated SFs hereafter, for simplicity). Even though this allows for a continuous correction over all the plane, the interpolated SF values are computed at 50 discrete bins along each axis for convenience, which is a good approximation for a continuous correction.
We have verified that interpolating using nodal points from two sets of SFs derived with two different binning schemes provides a final set of SFs that exhibits a better closure of the method than SFs derived with interpolation using nodal points from any one set alone, in case of both DeepCSV and DeepJet algorithms. Furthermore, we also verified that the piecewise-cubic bivariate interpolant used here outperforms other methods of interpolation, such as piecewiselinear bivariate interpolation and smoothing bivariate spline approximation.

Uncertainty estimation
The statistical uncertainty in the SF for a given flavour in a given bin is evaluated from the statistical uncertainty in the number of jets in both data and simulation in that bin in the selection enriched in that flavour. In this regard, the per-bin uncertainty in simulation, as well as data, of the W+c OS-SS selection accounts for the individual statistical uncertainties in the OS and SS selections, and is hence propagated to the statistical uncertainties in the c jet SFs.
For systematic uncertainty estimation, all selections are performed with different sources of systematic uncertainties shifted by ±1 standard deviation independently, and the whole iterative fit procedure is performed for every systematic variation. This gives modified SF maps for every source of uncertainty. The sources of systematic uncertainties considered are: • Lepton identification: Observed differences in electron and muon identification criteria [44,51,52] between data and simulation are included by p T -and η-dependent SFs and the corresponding uncertainties are propagated to the calibration of the c tagger discriminator, separately for electrons and muons.
• Pileup modelling: A weighting is applied to the simulated pileup vertex multiplicity to match the profile observed in data. This weighting is based on the total inelastic pp cross section of 69.2 mb [53], whereas the corresponding uncertainty is computed by varying this inelastic cross section by ±4.6%. • Jet energy resolution: The jet energy resolution (JER) [33,36] is observed to be worse in data than in simulation. To mimic more closely the resolution that is observed in data, an additional smearing is applied to the simulated jet energy. The uncertainties related to this smearing are a source of systematic uncertainties in the calibration of the c tagger discriminator.
• Jet energy scale: Similarly, differences in the observed jet energy scale (JES) [33,36] between data and simulation result in corrections to the simulated jet four-momenta. The corresponding uncertainties related to these adjusted four-momenta are included in the calibration of the c tagger discriminator.
• Factorisation and renormalisation scales: The choice of the renormalisation and factorisation scales during the ME calculation may affect kinematical distributions of the final-state objects and therefore also the heavy-flavour tagging discriminants. Uncertainties in these scales are estimated by varying each of the two scales independently at the ME level by factors 0.5 and 2, simultaneously across all contributing processes.
• Initial-and final-state radiation in the parton shower: During the parton shower, the uncertainty in the value of the strong coupling at a given energy scale is estimated by varying the renormalisation scale of QCD emissions separately in the initial-state radiation (ISR) and final-state radiation (FSR) up and down by factors of 2 and 0.5, respectively. The resulting uncertainty of these variations in the c tagger calibration is included in the measurement.
• Cross sections of various physics processes: The cross sections of the different relevant processes are varied within the uncertainties of the corresponding theoretical predictions [25][26][27][28][29][30], up to the order in perturbation theory to which they are scaled in the simulation (see Section 3). Such variations are expected to alter the signal-tobackground ratios in the selections and hence may affect the overall shapes of the heavy-flavour tagging discriminants in simulation.
• b fragmentation: The uncertainties in the internal parameters of the Bowler-Lund model [54][55][56] that is used to parametrise the b quark fragmentation in PYTHIA, are obtained experimentally from the ALEPH [57], DELPHI [58], OPAL [59], and SLD [60] experiments. The effect of varying this parametrisation within the uncertainties in the kinematics of the b jet is covered by a weighting of the "transfer function", at the generator level. The effect of this on the c tagging discriminator shapes, in turn, can be parametrised as a 2D linear function of CvsL and CvsB values of b jets, separately for DeepCSV and DeepJet. The up and down variations in this quantity are hence propagated to the b jet template of each selection.
• Flavour composition of V+jet processes: Since the method can potentially be sensitive to the flavour composition of the background jets, additional uncertainties are assigned to the b and c jet yields in the DY+jet process and the c jet yield in the W+c process. These uncertainties are estimated to be 2.5 and 9%, respectively, for b and c jets produced in association with a Z boson, and 8% for c jets produced in association with a W boson; these results are based on the uncertainties in the experimental measurement of Z+b/c processes [43] and the experimental uncertainty in the measurement of the W+c cross section [40].
• Interpolation: Although there is no straightforward method to estimate the bounds of the interpolation uncertainties at non-nodal points in the 2D plane in case of a piecewise-cubic bivariate interpolant, an approximate order-of-magnitude estimate is attempted. For each non-nodal point (probe point) in the concave hull of the SF nodal points for a given flavour, the interpolation uncertainty is quantified as the sum of 10 terms corresponding to the 10 control points of the cubic Bézier triangle that contains the point. Each of these terms is proportional to (a) the maximum absolute deviation of the directional derivative of the interpolant at each point between the control and probe points, from the average slope of the interpolant between the control and probe points, and (b) the linear distance between the control point and the probe point. This quantity is expected to be larger in parts of the plane where the interpolant has a high curvature, and at points farther away from the nodal points. In all cases, the quantity is computed approximately using numerical methods. This uncertainty is set to zero for points outside the concave hull of the nodal points.
• Extrapolation: The extrapolation uncertainty at a point outside the concave hull of the SF nodal points for a given flavour, is defined as the difference between the SF value obtained from a bicubic bivariate smoothing spline extrapolation [45] and the nominal SF value assigned (using nearest neighbour extrapolation). This uncertainty is set to zero for points inside the concave hull.
An estimation of the total uncertainty is made by adding statistical and all systematic uncertainties in quadrature, assuming no correlation. An estimation of the relative contributions of each source of uncertainty is presented in Section 9.

Scale factors
The shape calibration SFs, obtained using the iterative fit method for both DeepCSV-and DeepJet-based c taggers, are shown in Figs. 9, 10, and 11 for c, b, and light-flavour jets, respectively. These figures also show the statistical and total systematic uncertainties associated with each SF value. Tabulated results are provided in the HEPData record for this measurement [61].
Additionally, the effect of each source of systematic uncertainty in these SFs, when projected onto each of the 1D CvsL and CvsB distributions, is demonstrated by applying the nominal SFs separately to c, b, and light-flavour jets taken from a simulated hadronically decaying tt sample, and then measuring the relative changes in the jet yield distributions by applying each systematic variation of the SFs. The relative contributions of all systematic uncertainty sources thus projected for both the taggers are shown in Figs. 12 (DeepCSV) and 13 (DeepJet). Interpolation and extrapolation uncertainties have the highest contribution in the SFs in case of c jets. This is a result of the sparse binning of the nodal points in the c jet SFs all over the 2D plane, which, in turn, is a result of high statistical uncertainties in the W+c OS-SS selection that predominantly constrains the c jet SFs. Uncertainties in the factorisation scale alone dominate among the light-flavour SF uncertainties, especially in parts of the phase space having low yields for the corresponding flavour of jets. The uncertainties in the ISR and FSR in the PS have the highest contribution to the SF uncertainties for b jets, since these variations are relatively larger in the tt selections from which the SFs for b jets are predominantly constrained.

Adjusted c tagging performance in data
The c tagging shape calibration SFs are used to adjust the tagger discriminator distributions of simulated jets of each flavour to obtain an estimate of the corresponding distributions of jets in data. This is performed using an independent sample of jets taken from simulated hadronically decaying tt events and the adjusted distributions are then used to plot the ROC curves to estimate c tagging performance in data. Furthermore, the statistical and systematic uncertainties in the SFs are suitably propagated to the ROC curves. The adjusted ROC curves for , denotes the SF for b jets with the default discriminator value, along with the statistical (first term) and systematic (second term) uncertainties.     Individual contributions from each source of uncertainty are also quantified by the change in the area under the ROC curve from that of the central ROC curve. The relative contributions, including that of statistical uncertainties, evaluated using the ROC curve variations as described here are presented graphically in Fig. 16. The approach of presenting uncertainties as a function of discriminator values in Figs. 12 and 13 is essentially different from this interpretation, because the former disregards the relative abundance of jets at different discriminator values, whereas the latter includes the absolute change in the jet yields in different parts of the phase space and could therefore be more representative of the corresponding effect in a physics analysis. In this approach, uncertainties in the factorisation scale are the dominant contributions to the overall uncertainties in CvsL discrimination, whereas uncertainties in b fragmentation and in the ISR and FSR in the PS are among the highest contributors to the uncertainties in the CvsB discriminator, owing to their large contributions to the b jet SFs.

Closure test
A closure test is performed by applying the derived shape calibration SFs to the CvsL and CvsB distributions of the same jet selections for which the SFs were derived. The effect of applying the DeepCSV (DeepJet) SFs on the DeepCSV (DeepJet) c tagger distributions is shown in Fig. 17 (18). A good agreement between the c tagging discriminator distributions of simulated jets and those of jets in data establishes the closure of this method.     While the derivation of correction factors as well as this closure test were performed inclusively in jet p T , we have additionally verified that the same inclusively-derived corrections result in a good agreement between the discriminator distributions in data and simulation in each case when applied to jets in exclusive ranges of jet p T in between 20-30 GeV, 30-50 GeV, 50-80 GeV, and greater than 80 GeV. This indicates that there is no strong kinematical dependence of these corrections.

Validation on jets not biased with muons
Two additional tests are performed to demonstrate applicability of the SFs, which have been derived using b and c jets containing a soft muon inside them, when applied to jet samples that do not necessarily contain a soft muon. To construct a sample of jets enriched in b jets without using soft muons to identify them, the same tt dileptonic selection is used, where the jet with the highest p T that was not tagged with a soft muon in it is treated as the "second" b jet candidate.
Similarly, a second sample relatively enriched in c jets is constructed from the tt semileptonic selection already described in Section 6.2.1. All possible triplets of jets in the events are considered to reconstruct the hadronically decaying W boson candidate, and hence, the top quark candidate. For each combination, the first two jets are considered as the c and s candidates to reconstruct the W boson, whereas the third jet is considered as the b jet candidate. All combinations where the jet tagged with a soft muon is selected as the c jet or s jet candidate are rejected. For all other combinations, a mass-based χ 2 is defined as with m W and m t as the invariant masses of the reconstructed W boson and top quark candidates, respectively, in GeV. The combination of jets that yields the lowest value of χ 2 for a given event is considered the best combination, and the highest-p T jet assigned to the hadronically decaying W boson candidate is considered the c jet candidate.
Each jet in both these samples may or may not contain a soft muon, and the proportion of jets with and without a soft muon should be distributed as it would be in any inclusive selection of jets with similar event selections and flavour composition. The first sample is expected to contain mainly b jets and light-flavour jets. Since light-flavour jet SFs are already known to be free from any bias from soft muons, a validation for this sample would demonstrate applicability of the derived SFs on b jets that are selected without any requirement related to the presence of a soft muon. The second sample, on the other hand, is expected to contain c jets with significant contamination from b and light-flavour jets. A validation using this sample, in combination with the conclusion from the aforementioned validation on b jets, would demonstrate the applicability of the SFs for all c jets as well.
The pre-and post-calibration distributions of the c tagging discriminator values of the two samples of jets thus selected are shown in Figs. 19 (dileptonic tt) and 20 (semileptonic tt). A much flatter data-to-simulation ratio is achieved after the corrections are applied, compatible with unity within a few percent for most of the discriminator values.
This extension to a more general selection of jets, however, does not take into account potential differences in mismodelling across different selections. Such differences need to be accounted for in physics analyses by constructing separate control regions and studying the   data-to-simulation agreement in them after application of these SFs, and hence fitting the discriminator shapes in the control regions simultaneously with those in the signal regions, during the extraction of the signal.

Fit validation with pseudodata
A test is performed to validate the accuracy of the SF values from the iterative convergence. A set of distributions of "pseudodata" is created by applying artificial, handcrafted SF maps to simulated jet distributions. To mimic the statistical independence of data from simulation, the central value of every bin of the pseudodata is set by sampling a random number from a normal distribution, whose mean is equal to the central value of the corresponding bin in the rescaled simulation and standard deviation is equal to the statistical uncertainty in the same bin. The relative uncertainty of every bin in the pseudodata, in turn, is set to the relative uncertainty in the corresponding bin of data. The iterative fit algorithm is now allowed to run on the same selections replacing data with pseudodata.
The artificial SFs are generated as an arbitrary, continuous 2D function of DeepCSV CvsL and CvsB values, individually for each jet flavour. A single arbitrary number is assigned as the SF for jets with the default DeepCSV discriminator value for each flavour. Subsequently, the SF map of a given flavour is normalised in such a way that the total jet yield of an independent selection of simulated jets of that flavour taken from a hadronically decaying tt sample remains unchanged upon application of the SF map under consideration.
Two such maps of artificial SFs are designed, the first (second) of which consists of "mild" ("strong") SFs, that is, SFs with values in the range 0.5-1.4 (0.1-3.0). For each map, a total of 50 pseudodata models are created injecting the same set of artificial SFs but using a different random number seed in each model. For each map, the difference between the injected SFs and the extracted SFs ("SF pull") is plotted in Fig. 21. The differences are measured in units of the statistical uncertainty of the extracted SFs across all bins, weighted by the relative jet abundance in each bin in the CvsL-CvsB plane. The figure shows the cumulative results from all 50 models for c, b, and light favours separately. A mean of around 0 of the SF pulls illustrates a correct convergence of the method. A standard deviation of around 1 illustrates an optimal estimation of the SF statistical uncertainties in most cases. A standard deviation smaller than 1 for b jets in case of the "mild" map illustrates a conservative estimation of SF statistical uncertainties in case of "mild" deviations from data in simulation.
Whereas these studies show that the statistical uncertainties might be somewhat conservative in regions of the phase space where SFs for b jets are close to 1, this might not be the case for other regions where SFs show larger deviations. Since the SFs that map simulated b jets to b jet candidates in collision data (Fig. 10) are measured to have high central values close to 2 at some parts of the phase space, we have decided to not apply any a posteriori reduction of the uncertainties in the case of collision data.

Conclusion
This paper presents a novel method to calibrate the full differential shape of the discriminator distributions used for charm (c) jet identification at CMS. The method uses three different sets of event selection criteria, targeting topologies enriched in W+c, top quark pairs, and Drell-Yan+jet events. These topologies are highly enriched in c, bottom (b) and light-flavour jets, respectively, resulting in purities of a given jet-flavour that range between 81 and 93%. By employing an iterative fitting approach in each of these three regions, scale factors (SFs) are derived to match the simulated discriminator distributions to those observed in data. Since the c tagging algorithm is composed of two discriminators, one to discriminate c from b jets (CvsB) and another to discriminate c from light-flavour and gluon jets (CvsL), the SFs are derived as functions of CvsL and CvsB discriminator values. An adaptive binning is used to optimise the granularity of the provided calibration with respect to the statistical uncertainty in each bin. Finally, an interpolation is performed to obtain more representative corrections over the entire two-dimensional plane.
Validation and closure tests confirm the robustness of the method. Although this paper reports calibration results with only 2017 data, similar calibrations are obtained with 2016 and 2018 data separately that are used for the analysis of data collected in the respective years. The calibration of the full differential discriminator shape allows the use of the c tagging discriminators as inputs to multivariate techniques (based on machine learning) or by fitting the discriminator shapes to data to extract observables that are sensitive to the jet flavour. The shape calibration extends the use of c tagging algorithms beyond the application of discrete working points, and facilitates more advanced uses for c jet identification in physics analyses.