Modelling Z → ττ processes in ATLAS with τ-embedded Z → μμ data

This paper describes the concept, technical realisation and validation of a largely data-driven method to model events with Z→ττ decays. In Z→μμ events selected from proton-proton collision data recorded at √s=8 TeV with the ATLAS experiment at the LHC in 2012, the Z decay muons are replaced by τ leptons from simulated Z→ττ decays at the level of reconstructed tracks and calorimeter cells. The τ lepton kinematics are derived from the kinematics of the original muons. Thus, only the well-understood decays of the Z boson and τ leptons as well as the detector response to the τ decay products are obtained from simulation. All other aspects of the event, such as the Z boson and jet kinematics as well as effects from multiple interactions, are given by the actual data. This so-called τ-embedding method is particularly relevant for Higgs boson searches and analyses in ττ final states, where Z→ττ decays constitute a large irreducible background that cannot be obtained directly from data control samples. In this paper, the relevant concepts are discussed based on the implementation used in the ATLAS Standard Model H→ττ analysis of the full dataset recorded during 2011 and 2012.


1 Introduction
The experimental sensitivity of searches for (and eventually studies of) Higgs bosons in ττ final states at the LHC is driven by analyses of intricate event signatures that are not restricted to the Higgs candidate decay products. For example, the missing transverse momentum enters the reconstruction of the di-τ invariant mass m_ττ, which is a key quantity in these analyses. The shape of the reconstructed m_ττ distribution also depends on the boost of the ττ system and thus on the presence and kinematics of additional jets in the event. In addition, details of the final-state topology are used to define event categories, for example based on vector-boson fusion topologies characterised by two high-energy jets with large rapidity separation, and recent ATLAS analyses [1] also combine them into multivariate classifiers to extract the Higgs boson signal.
In these analyses, events with Z/γ* → ττ decays constitute a large irreducible background, and thus a reliable and detailed model of these processes is a critical ingredient. In view of the complexity of the relevant event properties it is highly desirable to rely as little as possible on simulation; moreover, it has been shown in dedicated measurements [2-4] that existing Monte Carlo simulations of Z+jets events need to be corrected in order to model the data. Ideally the model would be obtained directly from the collision data. However, due to background contributions, e.g. from events with other objects misidentified as τ decays, it is difficult to select a sufficiently pure Z/γ* → ττ sample from the data, and doing so without also including Higgs boson decays to τ lepton pairs is conceptually impossible.
Z/γ* → ττ events can still be modelled in a largely data-driven way by using Z/γ* → µµ events as a starting point. Except for effects due to the difference in muon and τ lepton masses, the two processes are kinematically identical assuming lepton universality. In particular the kinematics of the Z boson and additional jets in the event are independent of the Z decay mode. By requiring two isolated, high-energy muons with opposite charge, Z → µµ decays can be selected from the data with high efficiency and purity, and due to the small muon mass and correspondingly small Higgs-muon coupling, the H → µµ contamination is expected to be negligible for all practical purposes. The detector response to the Z decay muons can be removed from the data events and replaced by corresponding information for τ leptons from simulated Z → ττ decays, where the τ kinematics are derived from the kinematics of the original muons (taking into account both the τ-µ mass difference and the τ-τ spin correlation). This substitution results in a Z → ττ event model where only the well-understood decays of the Z boson and τ leptons and the detector response to the τ lepton decay products are obtained from the simulation. All other aspects of the event (including, for example, the kinematics of the Z boson and additional jets, the underlying event as well as effects from multiple interactions) are directly taken from the data. The simulated and collision-data information are combined based on reconstructed tracks and calorimeter cells, followed by a re-reconstruction of the resulting hybrid events. In the following, this technique is referred to as embedding of simulated Z → ττ decays in Z → µµ data events (or, in short, τ embedding). It has been used in all H → ττ searches by ATLAS [5-8] to date, including the most recent analysis [1] establishing evidence for this decay. Corresponding CMS analyses [9, 10] have applied a similar technique. In addition, the method was adapted to single-τ processes for use in the analysis of W → τν_τ decays [11] and searches for charged Higgs bosons [12,13].
This paper describes the concept, technical realisation and validation of the τ embedding corresponding to the implementation used in the ATLAS H → ττ analysis [1] of the full pp collision dataset recorded during 2011 and 2012 at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, respectively. The method is valid for all τ lepton decay channels. However, here the discussion and examples focus on final states where one of the τ leptons decays leptonically and the other one hadronically, also referred to below as the lepton-hadron ττ decay mode. This corresponds to the most sensitive H → ττ channel and tests the embedding of both the leptonic and hadronic τ decays. After a description of the ATLAS detector and the final-state reconstruction algorithms in Section 2, Section 3 provides an overview of the relevant event samples and selections. Section 4 outlines the concept and implementation of the τ-embedding method. Studies to validate the procedure and associated systematic uncertainties are discussed in Section 5. A summary and conclusions are given in Section 6.
2 Experimental setup

2.1 The ATLAS detector
The ATLAS detector [14] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroid magnets, each with eight coils. The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the pseudorapidity range² |η| < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides three measurements per track. It is followed by the silicon microstrip tracker which usually provides four two-dimensional measurement points per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to |η| = 2.0. The transition radiation tracker also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr presampler covering |η| < 1.8, to correct for energy loss in material between the interaction vertex and the calorimeters. Hadronic calorimetry is provided by the steel/scintillating-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system covers the region |η| < 2.7 with three layers of monitored drift tubes, complemented by cathode strip chambers in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive plate chambers in the barrel, and thin gap chambers in the endcap regions.
A three-level trigger system is used to select interesting events [15]. The Level-1 trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to a design value of at most 75 kHz. This is followed by two software-based trigger levels which together reduce the event rate to about 400 Hz.
² ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as $\eta = -\ln\tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.

2.2 Final-state reconstruction
Muon candidates are reconstructed using an algorithm [16] that combines information from the ID and the MS. The distance between the z-position of the point of closest approach of the muon inner-detector track to the beam-line and the z-coordinate of the primary vertex is required to be less than 1 cm. This requirement reduces the contamination due to cosmic-ray muons and beam-induced backgrounds. Muon quality criteria such as inner-detector hit requirements are applied in order to achieve a precise measurement of the muon momentum and reduce the misidentification rate. Muons are required to have a momentum in the transverse plane p_T > 10 GeV and a pseudorapidity of |η| < 2.5. Isolation requirements on close-by tracks and energy depositions in the calorimeter are applied in order to distinguish prompt muons from other candidates originating e.g. from hadronic showers.
Electron candidates are reconstructed from energy clusters in the electromagnetic calorimeters matched to a track in the ID. They are required to have a transverse energy, E_T = E sin θ, greater than 15 GeV, be within the pseudorapidity range |η| < 2.47 and satisfy the medium shower shape and track selection criteria defined in Ref. [17]. Candidates found in the transition region between the endcap and barrel calorimeters (1.37 < |η| < 1.52) are not considered. As for the muons, isolation criteria are applied to suppress non-prompt candidates originating e.g. from hadronic showers.
Jets are reconstructed using the anti-k_t jet clustering algorithm [18,19] with a radius parameter R = 0.4, taking topological energy clusters [20] in the calorimeters as inputs. Jet energies are corrected for the contribution of multiple interactions using a technique based on jet area [21] and are calibrated using p_T- and η-dependent correction factors determined from simulation and data [22-24]. Jets are required to be reconstructed in the range |η| < 4.5 and to have p_T > 30 GeV. To reduce the contamination by jets from additional pp interactions in the same or neighbouring bunch crossings (pile-up), tracks originating from the primary vertex must contribute a large fraction of the p_T when summing the scalar p_T of all tracks associated with the jet. This jet vertex fraction (JVF) is required to be at least 50% for jets with p_T < 50 GeV and |η| < 2.4. Jets with no associated tracks are retained.
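The JVF requirement can be illustrated with a short Python sketch. It is not the ATLAS implementation: the function names and the representation of a jet's tracks as a list of p_T values with a primary-vertex flag are assumptions made here for illustration only.

```python
def jet_vertex_fraction(track_pts, from_primary_vertex):
    """Scalar-pT fraction of a jet's tracks that originate from the primary vertex.

    track_pts: list of track pT values in GeV (hypothetical input format).
    from_primary_vertex: list of booleans, one per track.
    Returns None if the jet has no associated tracks.
    """
    total = sum(track_pts)
    if total == 0:
        return None
    pv_sum = sum(pt for pt, is_pv in zip(track_pts, from_primary_vertex) if is_pv)
    return pv_sum / total


def passes_jvf(jet_pt, jet_eta, jvf):
    """Apply the JVF >= 50% requirement where the text says it applies."""
    # The requirement only concerns jets with pT < 50 GeV and |eta| < 2.4.
    if jet_pt >= 50.0 or abs(jet_eta) >= 2.4:
        return True
    # Jets with no associated tracks are retained.
    if jvf is None:
        return True
    return jvf >= 0.5
```

For example, a 40 GeV central jet with two 10 GeV tracks, only one of which comes from the primary vertex, has JVF = 0.5 and is kept.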
Hadronically decaying τ leptons are reconstructed starting from clusters of energy depositions in the electromagnetic and hadronic calorimeters. The τ_had reconstruction is seeded by the anti-k_t jet-finding algorithm with a radius parameter R = 0.4. Tracks with p_T > 1 GeV within a cone of size ∆R = 0.2 around the cluster barycentre are assigned to the τ_had candidate. Its momentum is calculated from the topological energy clusters associated with the jet seed after applying a dedicated τ_had energy calibration. The τ_had charge is determined from the sum of the charges of the associated tracks. The rejection of jets is provided in a separate identification step using discriminating variables based on tracks with p_T > 1 GeV and the energy deposited in calorimeter cells found in the core region (∆R < 0.2) and in the region 0.2 < ∆R < 0.4 around the τ_had candidate's direction. Such discriminating variables are combined in a boosted decision tree and three working points, labelled tight, medium and loose [25], are defined, corresponding to different τ_had identification efficiency values. In the studies presented in this paper, τ_had candidates with p_T > 20 GeV and |η| < 2.47 are used. The τ_had candidates are required to have one or three reconstructed tracks with a total charge of ±1 and to satisfy the medium criteria, which provide an identification efficiency of the order of 55-60%. Dedicated criteria [25] to separate τ_had candidates from misidentified electrons are also applied, with a selection efficiency for true hadronic τ decays of 95%. The probability to misidentify a jet with p_T > 20 GeV as a τ_had candidate is typically 1-2%.
Following their reconstruction, candidate leptons, hadronically decaying τ leptons and jets may point to the same energy deposits in the calorimeters. Two reconstructed objects are considered to overlap if their separation ∆R is smaller than 0.2. Such overlaps are removed by selecting objects in the following order of priority (from highest to lowest): muons, electrons, τ_had, and jet candidates. Objects with lower priority are discarded when overlapping with another object with higher priority. The leptons that are considered in overlap removal with τ_had candidates need only to satisfy looser criteria than those defined above, in order to reduce the number of leptons misidentified as τ_had candidates. The p_T threshold of muons considered in overlap removal is also lowered to 4 GeV.
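The priority-based overlap removal can be sketched as follows. This is a simplified illustration under stated assumptions: the object representation (dictionaries with `type`, `eta` and `phi` keys) and the function names are hypothetical, the looser lepton criteria used for τ_had overlap removal are not modelled, and a lower-priority object is only compared with higher-priority objects that were themselves kept.

```python
import math

# Priority order from highest to lowest, as stated in the text.
PRIORITY = ["muon", "electron", "tau_had", "jet"]


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def remove_overlaps(objects, max_dr=0.2):
    """Discard objects within max_dr of a kept object of higher priority."""
    kept = []
    for obj in sorted(objects, key=lambda o: PRIORITY.index(o["type"])):
        rank = PRIORITY.index(obj["type"])
        overlaps = any(
            PRIORITY.index(other["type"]) < rank
            and delta_r(obj["eta"], obj["phi"], other["eta"], other["phi"]) < max_dr
            for other in kept
        )
        if not overlaps:
            kept.append(obj)
    return kept
```

A τ_had candidate at ∆R = 0.1 from a muon would be dropped by this sketch, while a jet far away from both is kept.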
The missing transverse momentum (with magnitude E_T^miss) is reconstructed using the energy deposits in calorimeter cells calibrated according to the reconstructed physics objects (e, γ, τ_had, jets and µ) with which they are associated [26]. The transverse momenta of reconstructed muons are included in the E_T^miss calculation, with the energy deposited by these muons in the calorimeters taken into account. The energy from calorimeter cells not associated with any physics object is scaled according to a soft-term vertex fraction and also included in the E_T^miss calculation. This fraction is the ratio of the summed scalar p_T of tracks from the primary vertex not matched with objects to the summed scalar p_T of all tracks in the event also not matched to objects. This method allows a better reconstruction of the E_T^miss in high pile-up conditions [27].
3 Data samples and event selection

3.1 Event samples
The studies presented in this paper are based on data recorded with ATLAS during the 2012 LHC run at a proton-proton centre-of-mass energy √s = 8 TeV. After data-quality requirements, these correspond to an integrated luminosity of 20.3 fb⁻¹.
For the validation of the τ-embedding procedure, samples of Monte Carlo simulated (MC) events with Z → µµ and Z → ττ decays are used as input or as reference, respectively. Simulated events are produced with the Alpgen [28] event generator employing the MLM matching scheme [29] between the hard process (calculated with leading-order matrix elements for up to five partons) and the parton shower. The Cteq6L1 parameterisation of the parton distribution functions [30] is used and the Pythia8 program [31] provides the modelling of the parton shower, the hadronisation and the underlying event. A full simulation of the ATLAS detector response [32] using the Geant4 program [33] is performed. In addition, events from minimum-bias interactions are simulated using the AU2 [34] tuning of Pythia8. They are overlaid with the simulated signal and background events according to the luminosity profile of the recorded data. The contributions from these pile-up interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings. Finally, the resulting simulated events are processed through the same reconstruction programs as the data.
In the simulation of the Z → ττ decays that are embedded into the Z → µµ input events as described in Section 4.1, the τ decay products are generated using Tauola [35], and Photos [36] provides photon radiation from charged leptons.
From these datasets, the following event samples are derived:
• Replacing the muons from recorded Z → µµ data events with τ leptons from simulated Z → ττ decays as described in Section 4.1 results in τ-embedded data, which are the standard event samples used in physics analyses to model Z → ττ processes.
• µ-embedded data are obtained by using simulated Z → µµ decays instead of Z → ττ decays to replace the muons in the Z → µµ input data events. These make it possible to study systematic effects of the embedding procedure in comparatively simple final states. While the τ-embedded samples are based on the full 2012 dataset, the µ-embedded validation is restricted to a subset corresponding to an integrated luminosity of 1.0 fb⁻¹.
• Using simulated instead of data Z → µµ events as input yields µ- or τ-embedded MC samples. These can then be compared to direct simulations of these processes.
• In order to study effects originating from the reconstruction of the input muons as well as of final-state radiation, alternative embedded MC samples are produced, where the kinematics of the embedded objects are derived from the generator-level muons instead of the reconstructed momenta.
In the following, this is referred to as generator-seeded embedding, as opposed to the standard detector-seeded procedure.

3.2 Event selection
For the studies presented below, events are selected from one or several of the samples listed in Section 3.1 using one of the following sets of criteria. In all cases, standard quality criteria are applied to ensure a fully operational detector and well-reconstructed events.
• Z → µµ selection: Collision events are selected using a combined dimuon trigger, with p_T thresholds of 18 GeV for the leading muon and 8 GeV for the sub-leading muon, or a single-muon trigger (p_T(µ) > 24 GeV). Only events with at least two good-quality muons (cf. Section 2.2) are accepted. The leading (sub-leading) muon is required to fulfil p_T(µ) > 20 (15) GeV. Both muons must be isolated in the ID, which is ensured by requiring the scalar sum of other track transverse momenta in an isolation cone of size ∆R = 0.4 to be smaller than 20% of the muon transverse momentum (I(p_T, 0.4)/p_T(µ) < 0.2). Only events containing at least one such opposite-charge muon pair with an invariant mass m_µµ > 40 GeV are considered.
• Z → ττ selection: The ττ selection is adopted from the H → ττ lepton-hadron-channel analysis documented in Ref. [1]. Both in simulated and recorded data samples, single-electron or single-muon triggers with a lepton p_T threshold of 24 GeV are used to select events, in which exactly one τ candidate with p_T(τ_had) > 20 GeV fulfilling the medium identification criteria and either exactly one electron or exactly one muon with p_T(e/µ) > 26 GeV are required. In addition to a track isolation of I(p_T, 0.4)/p_T(e/µ) < 0.06, a calorimeter isolation of I(E_T, 0.2)/p_T(e/µ) < 0.06 is applied to the leptons, i.e. the scalar sum of the transverse energy deposited in calorimeter cells within ∆R < 0.2 not associated with the candidate is calculated and required to be smaller than 6% of the total transverse momentum of the muon or the total transverse energy of the electron.
• Boosted Z-enriched selection: The H → ττ lepton-hadron-channel analysis documented in Ref. [1] considers two signal event categories: a VBF category enriched in vector-boson fusion Higgs production events and a boosted category targeting mainly events with high-p_T Higgs bosons produced via gluon-gluon fusion. For the boosted category, a corresponding Z-enriched control sample is defined, which is adopted here to illustrate the τ-embedding performance within physics analyses, see Section 5.2. This sample includes events that pass the Z → ττ selection described above but fail the VBF category selection detailed in Ref. [1]. In addition, the p_T of the Z candidate reconstructed from the vector sum of momenta of the visible τ decay products and the missing transverse momentum is required to exceed 100 GeV. In order to further enhance the fraction of Z events, W decays are suppressed by considering only events with a transverse mass⁵ m_T < 40 GeV. Potential contamination by Higgs signal events is avoided by requiring the invariant mass m_ττ^MMC of the ττ pair not to exceed 110 GeV. This mass is reconstructed from the visible τ decay products and the missing transverse momentum with the so-called missing mass calculator (MMC) [37].
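The kinematic requirements of the boosted Z-enriched selection can be written down compactly. The sketch below assumes the usual transverse-mass definition built from the lepton p_T, the missing transverse momentum and the azimuthal angle ∆φ between them; the function names are hypothetical, and the preceding Z → ττ selection and the VBF veto are taken as already applied.

```python
import math


def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """m_T = sqrt(2 * pT(lepton) * ETmiss * (1 - cos(dphi))), all in GeV."""
    dphi = lep_phi - met_phi
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * lep_pt * met * (1.0 - math.cos(dphi))))


def passes_boosted_z_cuts(z_pt, m_t, m_mmc):
    """Kinematic cuts quoted in the text (GeV); other requirements not modelled."""
    return z_pt > 100.0 and m_t < 40.0 and m_mmc <= 110.0
```

For a lepton and missing transverse momentum of 40 GeV each, m_T ranges from 0 GeV (collinear) to 80 GeV (back-to-back), which illustrates why the m_T < 40 GeV requirement suppresses W decays, where the neutrino tends to recoil against the lepton.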

4 Embedding
In the following, the τ embedding method is described in more detail. Special properties of the resulting event samples and embedding-specific systematic uncertainties are also discussed.

4.1 Procedure
The τ embedding procedure can be separated into five consecutive steps as depicted in the flowchart shown in Figure 1. After selecting the Z → µµ input event, a corresponding Z → ττ decay is generated and passed to a full detector simulation. The muons in the input event are then replaced by the τ leptons from the simulated Z decay. As a final step, a re-reconstruction of the resulting hybrid event is necessary, since it would be insufficient to combine the event information at the level of fully reconstructed physics objects. For example, the additional calorimeter energy depositions from pile-up events can change the results of the E_T^miss reconstruction, and the identification of hadronic τ decays is particularly sensitive to the details of the calorimeter response. In contrast, corresponding effects on the reconstruction of charged-particle tracks from the individual tracking detector hits are expected to be negligible for the data-taking conditions and the phase space relevant to the Higgs analyses of the 8 TeV data. Therefore, the embedding procedure is performed at the level of calorimeter cells and reconstructed tracks, as described in more detail below.

1. Selection of the Z → µµ input events from the collision data:
Input events for the embedding procedure are obtained according to the Z → µµ selection described in Section 3.2. For events with more than two muon candidates, all possible oppositely-charged pairs with a common vertex are formed, and the muon pair with m_µµ closest to the Z boson mass is chosen as the Z → µµ candidate decay products.
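The choice of the Z → µµ candidate among several muons can be sketched as below. The muon representation (a dictionary with `charge`, `vertex` and a four-momentum `p4` = (E, px, py, pz) in GeV), the function names and the numerical Z mass value are assumptions of this illustration, not taken from the ATLAS software.

```python
import math
from itertools import combinations

Z_MASS = 91.1876  # GeV; reference value assumed for this sketch


def invariant_mass(p4_a, p4_b):
    """Invariant mass of two four-momenta given as (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p4_a, p4_b))
    return math.sqrt(max(0.0, e * e - px * px - py * py - pz * pz))


def choose_z_candidate(muons):
    """Return (m_mumu, mu_a, mu_b) for the best pair, or None if no pair exists."""
    best = None
    for mu_a, mu_b in combinations(muons, 2):
        if mu_a["charge"] * mu_b["charge"] >= 0:
            continue  # keep only oppositely-charged pairs
        if mu_a["vertex"] != mu_b["vertex"]:
            continue  # require a common vertex
        mass = invariant_mass(mu_a["p4"], mu_b["p4"])
        if best is None or abs(mass - Z_MASS) < abs(best[0] - Z_MASS):
            best = (mass, mu_a, mu_b)
    return best
```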
⁵ The transverse mass is defined as $m_T = \sqrt{2\, p_T^{\ell}\, E_T^{\mathrm{miss}}\, (1 - \cos\Delta\phi)}$; ∆φ is the azimuthal angle between the directions of the electron or muon and the missing transverse momentum vector.
From the selected muons in a collision data event, the four-momenta of a corresponding Z → ττ decay are derived: the production vertex of the τ leptons is set to the common production vertex of the reconstructed muon pair, and each muon is then replaced by a τ lepton. The τ four-momenta are rescaled according to $\vec{p}_\tau = \sqrt{E_\mu^2 - m_\tau^2}\; \vec{p}_\mu / |\vec{p}_\mu|$, thus keeping the energy E_µ unchanged but replacing the muon mass with the τ mass m_τ.
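The rescaling of the muon momentum to the τ mass amounts to keeping the energy and the flight direction while shrinking the momentum magnitude to sqrt(E_µ² − m_τ²). A minimal numerical sketch follows; the function name and the τ mass value used are assumptions of this example.

```python
import math

M_TAU = 1.77686  # GeV; reference value assumed for this sketch


def muon_to_tau_p4(e_mu, px, py, pz):
    """Return (E, px, py, pz) of a tau with the muon's energy and direction."""
    if e_mu <= M_TAU:
        raise ValueError("muon energy must exceed the tau mass")
    p_mu = math.sqrt(px * px + py * py + pz * pz)
    p_tau = math.sqrt(e_mu * e_mu - M_TAU * M_TAU)
    scale = p_tau / p_mu  # slightly below 1 because m_tau > m_mu
    return (e_mu, scale * px, scale * py, scale * pz)
```

For muons passing the input selection (p_T of at least 15 GeV) the scale factor differs from unity only at the level of a few per mille, so the τ kinematics follow the muon kinematics very closely.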
The resulting Z → ττ kinematics as obtained from the Z → µµ events is processed with Tauola and Photos. Here, the decay of each τ lepton pair by Tauola takes into account the polarisation and spin correlations of the τ leptons. The Z polarisation, however, depends on the parton configuration of the initial state, which is not directly available here. During the generation of the decays, Tauola therefore assumes an average polarisation of zero and assigns a random helicity of ±1 to each Z boson. The actual non-zero average Z polarisation is correctly accounted for by applying event weights obtained with the TauSpinner program [38,39], which infers the most probable configuration of the initial partons and thus the helicity of the Z boson from the decay product kinematics.
b) Kinematic filter for the decay products: If the generation of Z → ττ decays were purely based on the probability distributions of the actual decay kinematics, a large fraction of the embedded Z → ττ decay products would fail the selection criteria of typical physics analyses. In particular the leptonic τ decays would often end up below the relevant transverse momentum thresholds. Therefore, a kinematic τ decay filter is implemented at generator level in order to increase the effective number of τ-embedded Z → µµ events entering the ττ selection. Instead of generating only one ττ decay for each Z collision data event, the Tauola program is used to produce 1000 different kinematic configurations of the decay products according to the appropriate probability distributions. Only the first of the 1000 decay configurations in which the generated transverse momenta of the visible decay products (e/µ/τ_had) exceed certain threshold values is then selected for further processing. The thresholds can be chosen based on the final ττ analysis selection; for this paper, as for the H → ττ lepton-hadron-channel analysis presented in Ref. [1], they were set to p_T(τ_had) > 15 GeV, p_T(e) > 18 GeV and p_T(µ) > 15 GeV, i.e. safely below the ττ analysis selection thresholds of p_T(τ_had) > 20 GeV and p_T(e/µ) > 26 GeV. The selection of ττ decays according to these thresholds introduces kinematic biases as shown in Figure 2(a) for the visible momentum of hadronic τ decays and in Figure 2(b) for the vector sum of the neutrino transverse momenta, which corresponds to the expected missing transverse momentum in the event. Based on all 1000 ττ decays generated for the given Z kinematics, the probability to accept a random ττ decay configuration is evaluated for each event. These probabilities correspond to event-by-event filter efficiencies and are thus propagated as weights, which correct the kinematic biases as demonstrated in Figure 2.
Figure 2: Generator-level distributions of (a) the τ_had transverse momentum and (b) the summed transverse momenta of all neutrinos for τ-embedded events without filter (red open circles), after applying the filter (blue squares) and after applying the filter with filter weights (black triangles) as described in the text. The lower panels show the relative deviation of the corrected distributions from the unfiltered ones. The red shaded error band and the black error bars correspond to the statistical uncertainty from the unfiltered and filtered events, respectively.
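The logic of the kinematic filter and its per-event weight can be sketched generically. In the sketch below, `generate_decay` and `passes_thresholds` are hypothetical callables standing in for the Tauola decay generation and the visible-p_T threshold check; neither corresponds to an actual Tauola or ATLAS interface.

```python
def filtered_decay(generate_decay, passes_thresholds, n_trials=1000):
    """Generate n_trials decays for one input event.

    Returns (decay, weight), where decay is the first configuration passing
    the thresholds (or None if none passes) and weight is the fraction of
    passing configurations, i.e. the event-by-event filter efficiency.
    """
    first_passing = None
    n_pass = 0
    for _ in range(n_trials):
        decay = generate_decay()
        if passes_thresholds(decay):
            n_pass += 1
            if first_passing is None:
                first_passing = decay
    return first_passing, n_pass / n_trials
```

An event whose decay products rarely exceed the thresholds still enters the sample if at least one of its generated configurations passes, but with a correspondingly small weight. This is how the weighted sample can reproduce the unfiltered distributions above the thresholds while retaining many more input events than a single random decay per event would.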

3. Detector simulation of the Z → ττ decay:
The result corresponds to a standard event generator output for a Z → ττ decay without any underlying-event effects, but otherwise based on the standard ATLAS MC configuration [32], which is then handed over to the full ATLAS detector simulation and reconstruction. In order to avoid double counting in the later merging with the corresponding collision data event, the calorimeter noise is switched off during the simulation. In the following, the output of this simulation step is referred to as a mini event.

4. Merging of data and simulated event:
In order to replace the muons in the selected Z → µµ data events with the corresponding simulated τ leptons, all tracks associated with the original muons are removed from the data event.
The calorimeter cells associated with the muons are subtracted according to the following procedure: a Z → µµ decay with the same kinematics as the original event (and without the underlying event or pile-up interactions) is simulated. The calorimeter cell energies in the simulated event are subtracted from the data event.
All calorimeter cell energies from the simulated mini event are then added to the corresponding data cell energies, and all tracks are copied into the corresponding event. This inserts the pure Z → ττ decay into the data environment, keeping the event properties as close to data conditions as possible.
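The merging step is essentially cell-by-cell arithmetic plus a replacement of tracks. The following sketch assumes a simplified data model that is not the ATLAS event format: calorimeter cells are dictionaries mapping a cell identifier to an energy, and tracks are dictionaries with an `id` key. The `subtraction_scale` parameter is an addition of this sketch that allows the simulated muon energy to be scaled before subtraction.

```python
def merge_cells(data_cells, sim_mumu_cells, mini_event_cells, subtraction_scale=1.0):
    """Subtract the simulated muon deposits and add the simulated tau deposits."""
    merged = dict(data_cells)
    for cell_id, energy in sim_mumu_cells.items():
        merged[cell_id] = merged.get(cell_id, 0.0) - subtraction_scale * energy
    for cell_id, energy in mini_event_cells.items():
        merged[cell_id] = merged.get(cell_id, 0.0) + energy
    return merged


def merge_tracks(data_tracks, muon_track_ids, mini_event_tracks):
    """Remove the tracks of the original muons and copy in the mini-event tracks."""
    kept = [trk for trk in data_tracks if trk["id"] not in muon_track_ids]
    return kept + list(mini_event_tracks)
```

Cells untouched by either simulated event keep their data energies, so pile-up, the underlying event and additional jets remain exactly as recorded.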
5. Reconstruction of the embedded events: Starting from the modified cell energies and the merged set of tracks, the hybrid Z → ττ events are submitted to the ATLAS event reconstruction for collision data, which recreates the complete physics object final state by re-running all standard event reconstruction algorithms except for the track reconstruction.
The procedure is further illustrated by Figure 3, showing example displays of a Z → µµ input event, a correspondingly simulated Z → ττ mini event (with one τ lepton decaying into a muon and the other one hadronically), and the resulting embedded hybrid event.

4.2 Special properties of the τ-embedded event samples
While in most respects the τ-embedded samples can be treated within physics analyses as standard collision data, there are a few special properties to be considered:
• The Z → µµ input data are subject to trigger and offline selection efficiencies, which particularly affect analyses with low p_T selection thresholds for the τ decay products. To account for these efficiencies, correction factors as a function of the transverse momenta and pseudorapidities of the input muons are extracted according to Refs. [15,16] and applied to the τ-embedded samples.
• As discussed in Section 4.1, instead of recreating the charged-particle tracks from the tracking detector hits, the embedding procedure is performed with reconstructed tracks. As a side effect, the trigger response for the τ-embedded events is not available, since it would require the hit-level information. Therefore, any effect of the analysis-specific trigger selection needs to be evaluated and corrected for, e.g. through a parameterisation of the trigger efficiency measured in data. For the validation in Section 5.2, such corrections were derived corresponding to the Z → ττ selection described in Section 3.2 and applied to the τ-embedded samples.
• The selected Z → µµ input data sample is of high purity, but small contaminations from other processes, e.g. tt̄ production, might be enhanced to relevant levels by selection requirements applied during physics analyses. Double counting of these contributions must hence be avoided when combining the τ-embedded events with other samples to construct a complete background model. In recent analyses, e.g. in Ref. [1], this is achieved by rejecting events from simulated samples of other background processes if they produce two τ leptons that fulfil the kinematic Z → µµ input selection at generator level. The corresponding ττ final states are already included in the τ-embedded sample as obtained from the corresponding µµ background contamination from other processes.
• In deriving the kinematics of the embedded τ leptons from the reconstructed muons selected from the ATLAS data, the true kinematics of the Z decay are folded with the resolution of the muon reconstruction. Final-state radiation (FSR) from the input muons can also modify the kinematics of the embedded objects. Both effects are unavoidable and inseparable in the embedding of data events, but they can be studied separately using simulated samples and are found to be small (cf. Section 5.2).
• While the τ-embedded samples constitute a largely data-driven model of Z → ττ events, the τ leptons and their decay products are based on simulation, and systematic uncertainties associated with the MC description of τ decays and the corresponding detector response need to be considered within physics analyses. Further documentation of these systematic uncertainties, e.g. for the hadronic τ decays, can be found in Ref. [25].
• The size of the τ-embedded samples is naturally limited by the available number of Z → µµ data events. Compared to a corresponding selection of ττ final states from the data, this number is effectively enhanced by applying the kinematic filter described in Section 4.1.

Systematic uncertainties
Two different sources of systematic uncertainty are considered, which are motivated by the technical implementation of the embedding method and are thus estimated from the following variations of the embedding procedure:
1. The isolation requirement applied in the selection of the Z → µµ input events can affect the environment of the embedded objects in the final event. It is thus varied in two alternative selections: the nominal isolation criterion of I(p T , 0.4)/p T (µ) < 0.2 is either completely removed or tightened to I(p T , 0.4)/p T (µ) < 0.06 and I(E T , 0.2)/p T (µ) < 0.04. These variations mainly affect the properties of the embedded objects, but they additionally provide an estimate of the background contamination from µµ final states with non-prompt muons in the τ-embedded samples.
2. The subtraction of cell energy associated with the muon is based on the simulated calorimeter response, which can be subject to large uncertainties. Therefore, the simulated energy in each cell is scaled by ±20% before the subtraction from the data event. The size of this variation was motivated by the results of comparisons of τ-embedded collision-data and simulated events to standard Z → ττ MC samples.
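The cell-energy variation of item 2 can be illustrated with a small sketch. The representation of the calorimeter as a dictionary from cell identifier to energy, the function name `subtract_muon_cells` and the example numbers are assumptions made for illustration only; the actual procedure operates on the ATLAS calorimeter cell collections.

```python
def subtract_muon_cells(data_cells, sim_muon_cells, scale=1.0):
    """Subtract scaled simulated muon deposits from the data cell energies.

    data_cells:     {cell_id: energy in GeV} from the data event
    sim_muon_cells: {cell_id: energy in GeV} from the simulated muon
    scale:          1.0 for the nominal subtraction, 1.2 or 0.8 for the
                    +20% and -20% systematic variations
    """
    result = dict(data_cells)  # leave the input event untouched
    for cell_id, simulated_energy in sim_muon_cells.items():
        result[cell_id] = result.get(cell_id, 0.0) - scale * simulated_energy
    return result


# Toy event with three cells, two of which contain simulated muon energy.
data = {101: 2.5, 102: 0.8, 103: 0.1}
simulated_muon = {101: 2.0, 103: 0.3}

nominal = subtract_muon_cells(data, simulated_muon)
scaled_up = subtract_muon_cells(data, simulated_muon, scale=1.2)
scaled_down = subtract_muon_cells(data, simulated_muon, scale=0.8)
```

In this toy, cell 103 ends up with a negative energy after the subtraction because the simulated deposit exceeds the energy measured in that cell.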
For all embedded event samples listed in Section 3.1, the different variations are produced in parallel. The resulting datasets are then used to derive and validate the embedding-related systematic uncertainties. Different selection efficiencies, e.g. due to the modified isolation requirements, are absorbed by normalising the systematic variations to the default sample. For both estimates of systematic uncertainties, the remaining shape uncertainties are later symmetrised to the larger of the two variations, in particular compensating for the non-symmetric isolation criteria. Figure 4 illustrates the effect on the distributions of two example quantities after the Z → ττ selection as described in Section 3.2.
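The normalisation and symmetrisation described above can be sketched for binned distributions as follows. The sketch assumes one plausible reading of the text, namely that the symmetrisation is done bin by bin using the larger absolute deviation of the two normalised variations; the function names and the toy histograms are hypothetical.

```python
def normalise_to(variation, nominal):
    """Scale a varied histogram to the integral of the nominal one,
    so that only shape differences remain."""
    factor = sum(nominal) / sum(variation)
    return [content * factor for content in variation]


def symmetrised_shape_uncertainty(nominal, variation_a, variation_b):
    """Per-bin uncertainty taken as the larger of the two deviations
    from the nominal histogram (to be applied as +/-)."""
    a = normalise_to(variation_a, nominal)
    b = normalise_to(variation_b, nominal)
    return [
        max(abs(x - n), abs(y - n))
        for n, x, y in zip(nominal, a, b)
    ]


# Toy histograms: nominal sample and two variations with different yields.
nominal_hist = [10.0, 20.0, 10.0]
no_isolation = [9.0, 22.0, 9.0]     # same yield, different shape
tight_isolation = [6.0, 8.0, 6.0]   # half the yield, different shape

shape_uncertainty = symmetrised_shape_uncertainty(
    nominal_hist, no_isolation, tight_isolation
)
```

Here the tight-isolation variation dominates in every bin once its lower yield has been normalised away.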
Modifications of the input muon kinematics due to final-state radiation or the detector resolution, which could be considered as somewhat more fundamental sources of systematic effects, do not directly enter the above definitions of embedding-related uncertainties.Their impact is, however, expected to be correlated with the variations of the cell energy subtraction and the muon isolation and in fact turns out to be small in comparison, as demonstrated in Section 5.2.
Figure 4: Distributions of (a) the calorimeter isolation of the selected lepton and (b) the ττ invariant mass obtained with the MMC, illustrating the effects of systematic variations as described in the text: scaling the subtracted cell energy by ±20% and applying tight / no isolation requirements in the Z → µµ selection. The ratios of the distributions before and after specific systematic variations are included as well: the upper ratio plot shows the effect of no (tight) isolation in blue (green), in the lower one the effect of scaling the subtracted cell energy by +20% (-20%) is illustrated by triangles pointing upwards (downwards). In both plots the red lines correspond to the nominal embedded sample.

Validation
A careful validation of the embedding procedure is performed based on different combinations of the event samples described in Section 3.1. The results of these studies are discussed in the following. All distributions are normalised to unit area unless stated otherwise.

Z → µµ-based validation
The first set of studies is based on muon-embedded data and MC samples, where the original muons are removed and replaced with the decay products of correspondingly simulated Z → µµ decays. In this case, events with Z → µµ decays and jets constitute both the input and the output samples and thus distributions of any quantity for the same events before and after the embedding can be compared directly. Such comparisons provide a powerful validation of most aspects of the procedure by testing for biases introduced in the removal of tracks and cells associated with the input muons, the stand-alone simulation of the Z mini event or the creation and re-reconstruction of the embedded hybrid event. None of the trigger and selection efficiency corrections discussed in Section 4.2 are applied here.
In order to investigate possible distortions of the detector response close to the input muons, Figure 5 compares the distributions of the absolute (I(E T , 0.2)) and relative (I(E T , 0.2)/p T ) muon calorimeter isolation as defined in Section 3.2, before and after µ embedding. Here, the displayed errors do not include the isolation systematic uncertainty, which is obtained by varying an explicit cut on the relative calorimeter isolation, cf. Section 4.3, and is thus not well defined in these specific comparisons. The observed changes in the distributions, which indicate fluctuations in the estimation of the calorimeter energy associated with the input muons based on an independent simulation discussed in Section 4.1, are not fully covered by the remaining embedding-specific uncertainties. However, this mainly concerns negative isolation values, which are far away from standard isolation requirements as also used for the studies presented in this paper, and the region with I(E T , 0.2)/p T > 0.04, where the undisplayed isolation uncertainty becomes very large by construction. In corresponding comparisons, the kinematics of additional jets in the event are found to be unaffected by the embedding procedure.
For quantities directly related to the muon four-momenta, most changes are found to be within the uncertainties; for example, Figure 6(a) shows the transverse momentum of the leading muon. In some cases, however, larger effects are observed, in particular for the dimuon invariant mass as depicted in Figure 6(b); small differences are also found at the low end of the distributions of the transverse momentum of the dimuon system and of the missing transverse momentum. Such differences are actually expected since the kinematics of the embedded events are based on reconstructed input muons and thus are potentially modified by the detector resolution and final-state radiation (FSR), as explained in Section 4.2. This is investigated further by using generator-seeded embedded samples, where simulated Z → µµ events are used as input and the kinematics of the embedded objects are derived from the generator-level muon momenta instead of the reconstructed information, cf. Section 3.1, thus removing FSR and muon reconstruction effects. This indeed improves the agreement in the muon-related distributions shown for the leading muon p T and the dimuon mass in Figure 7. While these simulation-based studies confirm the source of the differences in Figure 6, muon reconstruction and FSR effects unavoidably enter the embedding of data events. For the eventual applications of τ embedding, however, these differences turn out to be negligible as demonstrated in the next section.
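The broadening of the dimuon mass in detector-seeded µ embedding can be illustrated with a toy model in which the muon transverse momenta are smeared once (reconstructed input muons) or twice (reconstructed input muons used as seed, followed by simulation and re-reconstruction of the embedded muons). Everything in this sketch is an assumption made for illustration: the 3% Gaussian relative resolution, the back-to-back topology at fixed transverse momentum, the neglect of the natural Z width and of FSR, and the function names.

```python
import math
import random
import statistics


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two muons in the massless approximation (GeV)."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def smear(pt, relative_resolution, rng):
    """Apply a Gaussian relative momentum resolution (toy detector response)."""
    return pt * rng.gauss(1.0, relative_resolution)


def toy_mass_widths(n_events=20000, relative_resolution=0.03, seed=1):
    """Return the mass spread after one and after two resolution smearings."""
    rng = random.Random(seed)
    true_pt = 45.6  # back-to-back muons at eta = 0 give a mass of 91.2 GeV
    smeared_once, smeared_twice = [], []
    for _ in range(n_events):
        # First smearing: reconstruction of the input muons.
        pt1 = smear(true_pt, relative_resolution, rng)
        pt2 = smear(true_pt, relative_resolution, rng)
        smeared_once.append(dimuon_mass(pt1, 0.0, 0.0, pt2, 0.0, math.pi))
        # Second smearing: re-reconstruction of the embedded muons.
        pt1_emb = smear(pt1, relative_resolution, rng)
        pt2_emb = smear(pt2, relative_resolution, rng)
        smeared_twice.append(dimuon_mass(pt1_emb, 0.0, 0.0, pt2_emb, 0.0, math.pi))
    return statistics.pstdev(smeared_once), statistics.pstdev(smeared_twice)


width_once, width_twice = toy_mass_widths()
```

In this toy the second smearing widens the mass peak, which is qualitatively the kind of effect that seeding the embedding with generator-level momenta removes.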

Z → ττ-based validation
The Z → µµ-based results presented above already provide confidence that the technical implementation of the embedding procedure is working correctly. Nevertheless, direct comparisons of ττ final states must also be performed in order to conclusively validate the modelling of Z → ττ events provided by the final τ-embedded samples. Since it is difficult to obtain a sufficiently pure Z → ττ reference sample from the collision data, the validation is mainly based on comparisons of τ-embedded Z → µµ MC events to standard Z → ττ MC samples. Still, comparisons of selected Z → ττ collision data to a combined background model including τ-embedded data are also provided in the last part of this section.

Input muon radiation and reconstruction effects
The embedding procedure includes two effects related to the input muons that are unavoidable by construction: the resolution of the reconstructed muon momenta used to derive the kinematics of the embedded mini event and FSR from the input muons. In order to judge if the resolution effects observed in Section 5.1 are significant for the eventual τ embedding, Figure 8 compares the distributions of the τ decay lepton transverse momentum and of the invariant mass of the visible ττ decay products, m vis ττ , for generator- and detector-seeded τ embedding. These comparisons demonstrate that the uncorrected resolution and final-state radiation of the input muons are negligible in the case of reconstructed ττ final states, for which the mass resolution is dominated by the neutrinos produced in the τ decay.
Figure 8: Comparison of generator-seeded (gen.-s.), in blue, and detector-seeded (det.-s.), as black points, τ-embedded Z → µµ MC events: (a) transverse momentum of the leading lepton and (b) invariant mass of the visible ττ decay products, each including ratios showing the relative differences of the distributions from detector-seeded τ embedding. The blue error band in the ratio plots corresponds to the statistical uncertainties of the generator-seeded events, and the black error bars are the statistical uncertainties associated with the detector-seeded embedded events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical uncertainties of the detector-seeded τ-embedded events.

Comparison of τ-embedded Z → µµ MC samples with standard Z → ττ MC
In contrast to a data-to-data comparison of ττ final states, which necessarily includes contaminations from other background processes, the τ embedding of simulated Z → µµ events and subsequent comparison to standard Z → ττ MC samples provides a well-defined way to further study the method at the ττ level. Here, as opposed to the studies presented in Section 5.1, the two compared distributions are obtained from statistically independent event samples. Also, the corrections discussed in Section 4.2, including those related to the selection of the Z → µµ events used as input for the embedding procedure and to the trigger selection of the τ decay products, now need to be applied. The combined effect of these corrections is shown in Figure 9 for the distributions of two quantities closely related to their source: the pseudorapidity and the transverse momentum of the τ decay lepton. The mismodelling of the pseudorapidity before corrections, cf. Figure 9(a), is due to detector acceptance differences between the input muons and embedded τ objects. While the corrections have a visible effect here, their impact is found to be very small for the lepton p T shown in Figure 9(b) and also for any other of the investigated quantities. Even after corrections, the modelling of the pseudorapidity is not perfect but, as demonstrated below, this has no impact on observables relevant for physics analyses.
Figure 9: Comparison of τ-embedded Z → µµ MC events (black points) with Z → ττ MC events (blue) for (a) the pseudorapidity and (b) the transverse momentum of the τ decay lepton, each including ratios showing the relative differences of the τ-embedded distributions. In addition, the red squares show the distributions obtained from the τ-embedded MC sample before applying the embedding-specific corrections. The blue error band in the ratio plots corresponds to the statistical uncertainties of the Z → ττ MC sample. The black error bars are the statistical uncertainties associated with the corrected τ-embedded events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical uncertainties of the corrected τ-embedded events.
Further examples of such comparisons, from here on omitting the uncorrected distributions, are collected in Figure 10 and Figure 11. Figures 10(a) and 10(b) show the distributions of two of the input quantities for the hadronic τ identification: the central energy fraction, which is the ratio of the transverse energy deposited within ∆R < 0.1 and ∆R < 0.2 around the τ candidate direction, and the leading-track momentum fraction, i.e. the transverse momentum of the highest-p T charged particle divided by the calorimetric transverse energy within ∆R < 0.2 [25]. Agreement of the distributions within statistical and embedding-related systematic uncertainties indicates that the detector response to embedded τ leptons does not differ significantly from the standard Z → ττ MC samples. This is further confirmed by the fact that the τ identification efficiency is found to agree for τ-embedded and standard MC samples for all working points defined in Ref. [25] within uncertainties. Agreement is also observed for the kinematics of the Z decay products, as demonstrated for the τ had p T and m vis ττ in Figures 10(c) and 10(d). Figures 11(a) and 11(b) compare the distributions of the missing transverse momentum, arising from the simulated τ decay neutrinos and reconstruction effects, and of m MMC ττ . Again, no significant differences are observed and the same conclusions are reached for jet-related quantities, such as the leading-jet p T and the pseudorapidity separation of the two leading jets shown in Figures 11(c) and 11(d).
Thus, the τ-embedded Z → µµ MC events and standard Z → ττ MC events are found to agree in all distributions identified to be relevant for physics analyses within the statistical and embedding-related systematic uncertainties described in Section 4.3.These comparisons include effects from the modification of the input muon kinematics due to final-state radiation and resolution and thus confirm that such effects are also covered by the current τ-embedding uncertainties.
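The uncertainty bands used in these comparisons combine statistical and embedding-related systematic components in quadrature. A minimal sketch of this combination and of a simple per-bin compatibility check is given below. The function names, the definition of the relative difference as (test - reference)/reference and the compatibility criterion are assumptions for illustration; the statistical uncertainty of the reference sample, shown as a separate band in the figures, is ignored here.

```python
import math


def quadrature_sum(*components):
    """Combine per-bin uncertainty components in quadrature."""
    return [
        math.sqrt(sum(value * value for value in bin_values))
        for bin_values in zip(*components)
    ]


def relative_difference(test, reference):
    """Per-bin relative difference of a test histogram to a reference."""
    return [(t - r) / r for t, r in zip(test, reference)]


def covered(test, reference, band):
    """Per-bin check whether the difference lies within the uncertainty band."""
    return [abs(t - r) <= b for t, r, b in zip(test, reference, band)]


# Toy per-bin uncertainties of the embedded sample (arbitrary units).
statistical = [3.0, 1.0]
cell_energy = [4.0, 2.0]
isolation = [12.0, 2.0]

dark_band = quadrature_sum(statistical, cell_energy)               # cell only
light_band = quadrature_sum(statistical, cell_energy, isolation)   # cell + isolation
```

With these toy numbers, `light_band` evaluates to `[13.0, 3.0]` and the first entry of `dark_band` to `5.0`.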

Performance within physics analyses
In a final step, the τ-embedded Z → µµ collision data events are used as part of a combined background model and compared to data in the boosted Z-enriched control region defined in Section 3.2. Due to significant contributions from other background processes, this is not a clean, stand-alone validation of the embedding method but involves other background estimation procedures, performed exactly as in Ref. [1]. Since the selection of Z → µµ data events used as input for the embedding procedure includes a cut on the invariant mass m µµ > 40 GeV, low-mass Drell-Yan processes with ττ final states are not modelled via the embedding technique. Instead, these contributions are separately estimated from simulated event samples. A few example comparisons are given in Figures 12 and 13. In those distributions the embedded samples are normalised to data in a dedicated region as described in Ref. [1]. The combined background distributions, dominated by the embedding-based Z → ττ model, are found to provide a good description of the ATLAS data within the uncertainties, which here also include other relevant uncertainties related to the estimation of the other background contributions as described in Ref. [1].
Figure 13: Comparison of data with the combined background model for example observables in the boosted Z-enriched control region: transverse momentum of (a) the Z boson and (b) the leading jet, each including ratios showing the relative differences of the data to the total background estimate. The background contributions from other processes and the systematic uncertainties are estimated as described in Ref. [1].

Summary and conclusions
This paper presented the motivation, concept and technical implementation of a τ-embedding method, which models events with Z → ττ decays and possibly additional jets in a largely data-driven way. In Z → µµ events selected from pp collision data recorded with the ATLAS experiment during the LHC Run 1, tracks and calorimeter cell energies associated with the Z decay muons are replaced by the corresponding tracks and energy depositions of the τ leptons from simulated Z → ττ decays. For each event, the τ kinematics are derived from the original Z → µµ decay, so that their correlations with other event properties such as additional jets and the reconstructed missing transverse momentum are preserved in the resulting hybrid Z → ττ events. Systematic uncertainties are estimated by varying the muon isolation requirement and the subtracted energy depositions associated with the muons. Extensive validation studies were performed using both the µµ and ττ final states, presented here only for the example where one τ lepton decays into an electron or muon and the other hadronically. The µµ-based results demonstrate that the procedure successfully replaces objects in the data events without affecting other event properties. Comparing τ-embedded Z → µµ MC events with standard Z → ττ MC, agreement was found for distributions of all quantities relevant to current physics analyses within the combined statistical and embedding-related systematic uncertainties. Other conceptual limitations of the method related to the input of reconstructed muon kinematics are found to introduce only small effects compared to the uncertainties estimated from variations of the method. For Higgs analyses in ττ final states, which exploit intricate signatures of additional jets and their correlation with the ττ decay kinematics, the τ-embedded data thus provide a reliable model of the irreducible background from events with Z → ττ decays and jets.

Figure 1: Flowchart of the embedding procedure.

2. Generation of a corresponding Z → ττ decay: a) Substitution of muons with τ leptons and subsequent τ decays:

Figure 3: Displays of (a) a Z → µµ candidate event selected from the collision data, (b) the corresponding simulated Z → ττ mini event and (c) the embedded hybrid event. Here, one of the τ leptons decays into a muon and the other one hadronically.

Figure 5: Comparison of Z → µµ data events before (blue) and after µ embedding (black points) in terms of (a) the calorimeter isolation and (b) the relative calorimeter isolation in a cone ∆R = 0.2, each including ratios showing the relative differences of the distributions after µ embedding. The grey hatched error band corresponds to the cell energy systematic uncertainties of the µ-embedded events, as described in Section 4.3.

Figure 7: Comparison of Z → µµ MC events (blue) and generator-seeded µ embedding (black points): (a) transverse momentum of the leading muon and (b) dimuon mass, each including ratios showing the relative differences of the distributions after generator-seeded µ embedding.The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical uncertainties of the µ-embedded events.

Figure 10: Comparison of τ-embedded Z → µµ MC events (black points) with Z → ττ MC events (blue): (a) central energy fraction, (b) leading-track momentum fraction for three-prong hadronic τ decays, (c) τ had transverse momentum and (d) mass of the visible ττ decay products, each including ratios showing the relative differences of the τ-embedded distributions. The blue error band in the ratio plots corresponds to the statistical uncertainties of the Z → ττ MC sample, and the black error bars are the statistical uncertainties associated with the τ-embedded events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical uncertainties of the τ-embedded events.

Figure 12: Comparison of data with the combined background model for example observables in the boosted Z-enriched control region: (a) τ had transverse momentum, (b) invariant mass of the visible ττ decay products, (c) missing transverse momentum and (d) the ττ invariant mass obtained with the MMC, each including ratios showing the relative differences of the data to the total background estimate. The background contributions from other processes and the systematic uncertainties are estimated as described in Ref. [1].