Search for Higgs decaying to WW at CMS

A search for a standard model Higgs boson decaying into a W+W− pair is presented, using CMS proton-proton collision data samples both at 7 TeV and 8 TeV center-of-mass energy, corresponding to an integrated luminosity of 4.9 ± 0.2 fb−1 and 12.1 ± 0.5 fb−1 respectively. Both fully-leptonic and semi-leptonic W decay final states are considered. Upper limits are set on the Higgs boson production relative to the standard model expectation. The standard model Higgs boson is excluded at 95% confidence level in the mass ranges [128–600] GeV and [215–490] + [525–600] GeV using the fully-leptonic and semi-leptonic channels respectively. In the fully-leptonic final state, an excess of events is observed above background which is consistent with the expectations from a standard model Higgs boson of mass 125 GeV and has a statistical significance of 3.1 standard deviations.


Introduction
In the context of the Standard Model (SM) of particle physics, the origin of the masses of fundamental particles remains an open issue. The electroweak symmetry breaking mechanism, which causes vector bosons to acquire masses, relies on the existence of the Higgs field and its quantum, the Higgs boson, which is yet to be proven experimentally. One of the main goals of the CERN Large Hadron Collider (LHC) physics program is to investigate the existence of the SM Higgs boson.
In particular, searches for the Higgs boson are being carried out in the two multipurpose experiments located on the LHC ring, ATLAS and CMS. In these proceedings, I report on two searches performed at CMS, both exploiting the Higgs decay into two W bosons: the first exploits the fully-leptonic final state, where each W boson decays into a lepton and a neutrino; the second explores the semi-leptonic final state, where one W decays leptonically and the other hadronically.
The fully-leptonic analysis, $H \to W^+W^- \to \ell\nu_\ell\,\ell\nu_\ell$ [1], is performed in the 110–600 GeV Higgs mass range, while the semi-leptonic one, $H \to W^+W^- \to \ell\nu_\ell qq$ [2], is restricted to the 170–600 GeV range because it relies on single-lepton high-$p_T$ triggers. Both analyses exploit the full 2011–2012 LHC pp collision dataset, corresponding to an integrated luminosity of 12.1 ± 0.5 fb$^{-1}$ at $\sqrt{s} = 8$ TeV and 4.9 ± 0.2 fb$^{-1}$ at $\sqrt{s} = 7$ TeV.

Event reconstruction and physics objects
The CMS detector [3] is used to reconstruct proton-proton collisions. A full event reconstruction aimed at characterizing and identifying all stable particles in the event is obtained via a particle-flow (PF) technique [4]. This approach combines the information from all CMS sub-detectors to identify and reconstruct individual particles, classifying them into mutually exclusive categories: charged hadrons, neutral hadrons, photons, electrons, and muons.
The electron reconstruction algorithm links clusters of energy deposits in the ECAL to trajectories in the inner tracker. To reconstruct the trajectories in the tracker, a dedicated model of electron energy loss is developed, and series of hits are fitted with a Gaussian sum filter. The electron candidates are filtered with an identification algorithm, relying on a multivariate technique that combines observables sensitive to the amount of bremsstrahlung along the electron trajectory, the geometrical and momentum matching between the electron trajectory and the associated clusters, and shower-shape observables.
The muon reconstruction algorithm exploits both the silicon tracker and the muon system. Segments in the muon spectrometer are matched with energy deposits in the calorimeters and tracks in the silicon detector, and muon candidates are built and filtered with quality criteria.
PF candidates are clustered into jets using the anti-$k_T$ algorithm [5] with distance parameter R = 0.5. Then, to account for the non-linear response of the calorimeters to the particle energies, and other instrumental effects, jet energy corrections are applied. The corrections are based on in-situ calibration using dijet and $\gamma$ + jet data samples. The pileup energy, computed as the median energy density in each event, is subtracted from each jet. Jets are additionally required to originate from the primary vertex, which is identified as the vertex with the highest summed $p_T^2$ of its associated tracks. Out of this collection, b-jets are tagged by looking at the jet displacement with respect to the primary vertex in the transverse direction. In addition, a multivariate selection is applied to separate jets from the primary interaction from those reconstructed due to energy deposits associated with pile-up, based on the differences in the jet shapes, in the relative multiplicity of charged and neutral components, and in the fraction of transverse momentum carried by the hardest components.
At the end of the reconstruction sequence, the missing transverse energy $E_T^{miss}$ is found as the modulus of the negative vector sum of the transverse momenta of all reconstructed particles in an event.
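The definition above can be illustrated with a minimal sketch. The function name and the (pt, phi) input layout are assumptions made for this example and are not taken from the CMS software.

```python
import math

def missing_transverse_energy(particles):
    """Return (magnitude, azimuth) of the missing transverse energy.

    particles: iterable of (pt, phi) pairs, pt in GeV and phi in radians.
    The input layout is a hypothetical choice for this sketch.
    """
    # Vector sum of the transverse momenta of all reconstructed particles.
    sum_px = sum(pt * math.cos(phi) for pt, phi in particles)
    sum_py = sum(pt * math.sin(phi) for pt, phi in particles)
    # The missing transverse momentum is the negative of that sum.
    met_x, met_y = -sum_px, -sum_py
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)
```

For two back-to-back particles of 30 GeV and 10 GeV, the imbalance is 20 GeV and points along the softer particle.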
Since charged leptons from W and Z boson decays are typically expected to be isolated from other activity in the event, lepton isolation is ensured by computing the sum of the transverse energies of all reconstructed particles, charged or neutral, within a cone of $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} < 0.4$ around the lepton direction. In the computation, pileup is taken into account by subtracting the median energy density estimated on an event-by-event basis.
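A minimal sketch of such an isolation sum is given below. Two points are assumptions of this example and not statements about the CMS implementation: the pileup term is taken as the median energy density times the geometric cone area, and the corrected sum is clamped at zero. The caller is expected to leave the lepton itself out of the particle list.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def isolation_sum(lepton, particles, rho, cone=0.4):
    """Pileup-corrected transverse-energy sum around a lepton.

    lepton: (eta, phi); particles: list of (et, eta, phi) in GeV, radians.
    rho: median energy density of the event (GeV per unit area).
    """
    lep_eta, lep_phi = lepton
    raw = sum(et for et, eta, phi in particles
              if delta_r(lep_eta, lep_phi, eta, phi) < cone)
    # Assumption: pileup contribution = rho * geometric area of the cone.
    pileup = rho * math.pi * cone ** 2
    return max(raw - pileup, 0.0)
```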
At trigger level, depending on the decay mode, events must contain either a pair of electrons or muons, with one lepton having $p_T > 17$ GeV and the other $p_T > 8$ GeV, or a single electron (muon) with $p_T > 27\,(24)$ GeV.
Efficiencies for trigger selection, reconstruction, identification, and isolation of leptons are measured from data, using the tag-and-probe technique [6], based on a pure sample of Z events. These measurements are performed in bins of $p_T$ and $|\eta_\ell|$. The overall efficiency for selecting electrons in the ECAL barrel (endcaps) varies from around 82% (73%) at $p_T^e \simeq 10$ GeV to 90% (89%) at $p_T^e \simeq 20$ GeV. It drops to about 85% in the ECAL barrel-endcap transition region, at $1.44 < |\eta_e| < 1.57$. Muon offline efficiencies are greater than about 98% in the whole $|\eta_\mu| < 2.4$ range. The muon trigger efficiency for events selected for this analysis ranges from 96% to 99%.
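The counting step of a tag-and-probe measurement in a single ($p_T$, $|\eta|$) bin can be sketched as follows. This is a simplified illustration: it assumes a background-free Z sample and a simple binomial uncertainty, neither of which is specified in the text, and the function name is hypothetical.

```python
import math

def tag_and_probe_efficiency(n_pass, n_fail):
    """Efficiency and binomial uncertainty from probe counts in one bin.

    n_pass / n_fail: probes passing / failing the selection under study,
    in events where the tag lepton satisfies tight requirements.
    Assumption: no background subtraction is needed.
    """
    n_total = n_pass + n_fail
    if n_total == 0:
        raise ValueError("no probes in this bin")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err
```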

The $H \to W^+W^- \to \ell\nu_\ell\,\ell\nu_\ell$ search
The search strategy for the $H \to W^+W^- \to \ell\nu_\ell\,\ell\nu_\ell$ analysis is based on a signature with two isolated, oppositely charged, high-$p_T$ leptons (electrons or muons) and large missing transverse momentum $E_T^{miss}$, due to the undetected neutrinos. To improve the signal sensitivity, events are classified by jet multiplicity into three mutually exclusive categories, characterized by different signal yields and signal-to-background ratios. In the following, these will be referred to as the 0-jet, 1-jet and 2-jet samples, where the selected jets are required to have $E_T > 30$ GeV and $|\eta| < 4.7$. Events with more than three jets are discarded, while events with exactly three jets are incorporated in the 2-jet category if the lowest-$p_T$ jet is not between the other two in pseudorapidity. Additionally, the signal candidates are split into three final states: $e^+e^-$, $\mu^+\mu^-$, and $e^\pm\mu^\mp$.
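The jet-multiplicity classification can be sketched as below. The text does not say what happens to three-jet events whose softest jet lies between the other two; this sketch assumes they are rejected. The function name and input layout are hypothetical.

```python
def jet_category(jets):
    """Return 0, 1 or 2 for the jet category, or None if the event is dropped.

    jets: list of (et, eta) pairs, et in GeV.
    """
    # Keep jets passing the thresholds quoted in the text, hardest first.
    selected = sorted((j for j in jets if j[0] > 30.0 and abs(j[1]) < 4.7),
                      reverse=True)
    n_jets = len(selected)
    if n_jets <= 2:
        return n_jets
    if n_jets == 3:
        eta_low, eta_high = sorted((selected[0][1], selected[1][1]))
        third_eta = selected[2][1]
        # Assumption: reject the event if the softest jet is between the others.
        if eta_low < third_eta < eta_high:
            return None
        return 2
    return None  # more than three jets
```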
Tag leptons are required to have opposite charge, $p_T$ greater than 20 GeV for the leading lepton ($p_T^{\ell,\max}$) and $p_T$ greater than 10 GeV for the trailing lepton ($p_T^{\ell,\min}$), and to originate from the primary vertex of the event (chosen as the vertex with the highest $\sum p_T^2$). Only electrons (muons) with $|\eta| < 2.5\,(2.4)$ are considered in the analysis.
In addition to high-momentum isolated leptons and low jet activity, missing transverse momentum is present in signal events but generally not in background. In this analysis, a projected $E_T^{miss}$ variable is devised. It is defined as the component of $E_T^{miss}$ transverse to the nearest lepton if the difference in azimuth ($\Delta\phi$) between this lepton and the $E_T^{miss}$ vector is less than $\pi/2$. If no lepton lies within $\pi/2$ in $\phi$ of the $E_T^{miss}$ direction, then the projected $E_T^{miss}$ is equal to $E_T^{miss}$. Since the projected $E_T^{miss}$ resolution is degraded by pile-up, the minimum of two $E_T^{miss}$ observables is used: the first includes all reconstructed particles in the event, while the second uses only the charged particles associated with the primary vertex. Events with projected $E_T^{miss}$ above 20 GeV are selected for the analysis.
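The projected $E_T^{miss}$ described above can be sketched as follows. The sketch takes the minimum of the two projected values (all particles versus charged particles from the primary vertex), which is one reading of the text; the function names and argument layout are hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def projected_met(met, met_phi, lepton_phis):
    """Component of the MET transverse to the nearest lepton in azimuth,
    or the full MET if no lepton lies within pi/2 of its direction."""
    dphi_min = min(abs(delta_phi(met_phi, phi)) for phi in lepton_phis)
    if dphi_min < math.pi / 2.0:
        return met * math.sin(dphi_min)
    return met

def min_projected_met(full_met, charged_met, lepton_phis):
    """full_met, charged_met: (magnitude, phi) pairs for the two MET flavours."""
    return min(projected_met(full_met[0], full_met[1], lepton_phis),
               projected_met(charged_met[0], charged_met[1], lepton_phis))

def passes_met_selection(full_met, charged_met, lepton_phis, threshold=20.0):
    return min_projected_met(full_met, charged_met, lepton_phis) > threshold
```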
Top-quark background is suppressed by means of a top tagging technique based on soft-muon and b-jet tagging. The first method is designed to veto events containing soft-muons produced by b-quarks resulting from top-quark decays. The second one exploits the b-jet tagging algorithm, which searches for jet tracks with large impact parameter.
W+jets background is reduced by requiring a minimum dilepton transverse momentum ($p_T^{\ell\ell}$) of 45 GeV.
To reduce the background from WZ production, any event with a third lepton passing the identification and isolation requirements is rejected.
The contamination from $W\gamma$ production, where the photon is misidentified as an electron, is reduced by about 90% in the $e^+e^-$ final state by conversion rejection requirements.
The background from low-mass resonances is rejected by requiring a dilepton mass ($m_{\ell\ell}$) greater than 12 GeV.
The large Drell-Yan background, which mainly affects same-flavour final states, is handled with a more elaborate strategy. First, the resonant component of the Drell-Yan production is rejected by requiring a dilepton mass outside a 30 GeV window centered on the Z mass pole. Then, the remaining off-peak contribution is suppressed by exploiting various $E_T^{miss}$-based approaches depending on the number of jets. In the 0/1-jet categories, Drell-Yan events are more difficult to separate from the signal; in this case a dedicated multivariate selection, combining missing transverse momentum with kinematic and topological variables, is used to reject background events and maximize the surviving signal yield. In the 2-jet category, the dominant source of fake $E_T^{miss}$ is the mis-measurement of the hadronic recoil, so the optimal performance is obtained by requiring $E_T^{miss}$ greater than 45 GeV. Additionally, for the 2-jet category, the momenta of the dilepton system and of the most energetic jet must not be back-to-back in the transverse plane. The combination of these selections reduces the Drell-Yan background by three orders of magnitude, while rejecting less than 50% of signal events.
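The two dilepton-mass requirements (low-mass resonance rejection and the Z-peak veto) can be sketched as below. The sketch treats leptons as massless and applies the Z veto to same-flavour pairs only; both choices are assumptions of this example, as are the function names.

```python
import math

Z_MASS = 91.1876  # GeV, world-average Z boson mass

def dilepton_mass(lep1, lep2):
    """Invariant mass of two massless leptons given as (pt, eta, phi)."""
    pt1, eta1, phi1 = lep1
    pt2, eta2, phi2 = lep2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def passes_mass_requirements(m_ll, same_flavour):
    """Low-mass resonance rejection plus a 30 GeV wide Z-peak veto."""
    if m_ll <= 12.0:
        return False
    # Assumption: the Z veto is applied to same-flavour pairs only.
    if same_flavour and abs(m_ll - Z_MASS) < 15.0:
        return False
    return True
```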
Furthermore, to enhance the sensitivity to a Higgs boson signal, a cut-based approach is devised for the final Higgs selection in all categories. Because the signal kinematics change with the Higgs mass hypothesis, the cuts are optimized separately for different values of $m_H$. In addition, a two-dimensional shape analysis is pursued for the $e^\pm\mu^\mp$ final state in the 0/1-jet categories, since it allows for a direct physical interpretation of the observed data with a sensitivity comparable to other more complex techniques.
The optimized cut-based approach consists of extra requirements placed on $p_T^{\ell,\max}$, on dilepton ($\ell\ell$) observables, and on the transverse mass $m_T$, defined as $m_T = \sqrt{2\, p_T^{\ell\ell}\, E_T^{miss}\, (1 - \cos\Delta\phi_{E_T^{miss}\ell\ell})}$, where $\Delta\phi_{E_T^{miss}\ell\ell}$ is the azimuthal difference between $E_T^{miss}$ and the dilepton system. For the two-dimensional analysis, which exploits the independent variables $m_T$ and $m_{\ell\ell}$, an additional loose set of requirements is applied: $m_T$ must be greater than 80 GeV and smaller than 280 (600) GeV for $m_H$ hypotheses less than or equal to 250 GeV (greater than 250 GeV), while $m_{\ell\ell}$ must be smaller than 200 (600) GeV for $m_H$ hypotheses less than or equal to 250 GeV (greater than 250 GeV). Finally, $p_T^{\ell,\max}$ is required to be larger than 50 GeV for $m_H$ hypotheses greater than 250 GeV. The two-dimensional distributions for the $m_H = 125$ GeV and $m_H = 200$ GeV Higgs signal hypotheses, the background processes and the data are shown in Figure 1 for the 0-jet sample. The bin size is chosen to avoid empty bins in the overall background contribution given the available Monte Carlo statistics, while still keeping enough cells to differentiate the signal shape from the background shape. All 80 bins enter a binned likelihood fit of the data to the signal and background hypotheses in this two-dimensional shape analysis.
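As an illustration, the transverse mass and the loose requirements of the two-dimensional analysis can be coded as follows. The sketch assumes the usual definition $m_T = \sqrt{2\, p_T^{\ell\ell} E_T^{miss} (1 - \cos\Delta\phi)}$, with $\Delta\phi$ the azimuthal angle between $E_T^{miss}$ and the dilepton system; the function names are hypothetical.

```python
import math

def transverse_mass(pt_ll, met, dphi):
    """m_T of the dilepton + MET system (assumed standard definition)."""
    return math.sqrt(2.0 * pt_ll * met * (1.0 - math.cos(dphi)))

def passes_2d_preselection(m_h, m_t, m_ll, pt_lead):
    """Loose requirements applied before the (m_T, m_ll) shape analysis.

    All quantities in GeV; m_h is the Higgs mass hypothesis.
    """
    high_mass = m_h > 250.0
    mt_max = 600.0 if high_mass else 280.0
    mll_max = 600.0 if high_mass else 200.0
    if not 80.0 < m_t < mt_max:
        return False
    if not m_ll < mll_max:
        return False
    # Leading-lepton requirement only for hypotheses above 250 GeV.
    if high_mass and not pt_lead > 50.0:
        return False
    return True
```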
The 2-jet category is mainly sensitive to the vector boson fusion (VBF) production mode, whose cross section is roughly ten times smaller than that of the gluon-gluon fusion mode. It is nevertheless characterized by a relatively low background environment, and it probes a different production mechanism to test the compatibility of a signal with the SM Higgs boson hypothesis. The $H \to W^+W^-$ events from VBF production are characterized by a pair of energetic forward-backward jets and very little hadronic activity in the rest of the event. Therefore, 2-jet events are further required to have zero hard jets, and both tag leptons, present in the pseudorapidity region between the two leading jets. Additionally, to reject the main background from top-quark decays, two requirements are applied to the two jets, $j_1$ and $j_2$: $\Delta\eta(j_1, j_2) > 3.5$ and $m_{j_1 j_2} > 500$ GeV. Finally, the cut-based approach in the 2-jet category makes use of the same $m_H$-dependent requirements, with the exception that $m_T$ is required to be larger than 30 GeV.
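The VBF topology requirements can be sketched as a simple predicate. The pseudorapidity separation is taken here as the absolute difference between the two leading jets, and the dijet mass is passed in precomputed; the function name and argument layout are hypothetical.

```python
def passes_vbf_topology(jet1_eta, jet2_eta, m_jj, lepton_etas):
    """VBF-like selection on the two leading jets and the tag leptons.

    m_jj: invariant mass of the two leading jets in GeV.
    lepton_etas: pseudorapidities of the two tag leptons.
    """
    eta_low, eta_high = sorted((jet1_eta, jet2_eta))
    if eta_high - eta_low <= 3.5:
        return False
    if m_jj <= 500.0:
        return False
    # Both leptons must lie between the two leading jets in pseudorapidity.
    return all(eta_low < eta < eta_high for eta in lepton_etas)
```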

Background determination
To evaluate the background contamination in the signal region after the Higgs selection, a combination of techniques is used. When possible, data are used to estimate background contributions directly, thus avoiding large uncertainties related to the limitations of the Monte Carlo simulation. The remaining contributions taken from simulation are small.
The W + jets and QCD multi-jet backgrounds are due to leptonic decays of heavy quarks, hadrons misidentified as leptons, and electrons from photon conversion. Hence, their estimation is derived directly from data using a control sample of events in which one lepton passes the standard analysis criteria and the other does not, but instead satisfies a relaxed set of requirements, resulting in a tight-fail sample. First, the efficiency for a jet selected by the loose selection to pass the tight selection is measured using data from an independent multi-jet event sample dominated by non-prompt leptons. Second, this efficiency is used to extrapolate the tight-fail counts into the signal region (tight-tight sample). Systematic uncertainties related to the efficiency measurement dominate the overall uncertainty of this method, which is estimated to be about 36%.
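The extrapolation from the tight-fail sample can be sketched as follows. The text does not give the formula; this sketch assumes the common choice of weighting each tight-fail event by f / (1 - f), where f is the measured loose-to-tight efficiency for the failing lepton. The function name and the `fake_rate` callable are hypothetical.

```python
def fake_lepton_background(tight_fail_events, fake_rate, rel_syst=0.36):
    """Estimate the tight-tight yield from tight-fail events.

    tight_fail_events: list of (pt, abs_eta) for the lepton failing the
        tight selection in each control-sample event.
    fake_rate: callable (pt, abs_eta) -> loose-to-tight efficiency f,
        assumed to be measured in an independent multi-jet sample.
    rel_syst: relative systematic uncertainty (about 36% in the text).
    Returns (estimate, systematic uncertainty).
    """
    estimate = 0.0
    for pt, abs_eta in tight_fail_events:
        f = fake_rate(pt, abs_eta)
        # Assumed extrapolation weight from tight-fail to tight-tight.
        estimate += f / (1.0 - f)
    return estimate, rel_syst * estimate
```

With a constant efficiency of 0.2, each tight-fail event contributes a weight of 0.25, so eight such events give an estimate of 2 events.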
Top-quark background is also estimated from data, by counting the number of top-tagged events and extrapolating into the signal region using the corresponding top-tagging efficiency, which is measured on data with one b-tagged jet. This method is limited by the statistical uncertainty of the control sample and by the systematic uncertainties related to the measurement of the tagging efficiency (about 20% in the 0-jet category and about 4% in the 1-jet category).
For the low-mass signal region, $m_H \le 200$ GeV, the non-resonant $W^+W^-$ contribution is estimated from data. Events with a dilepton mass larger than 100 GeV, where the Higgs boson signal contamination is negligible, are used to measure this contribution, and the MC simulation is used to extrapolate into the signal region. The total uncertainty is about 10%. For larger Higgs boson masses the overlap between the non-resonant $W^+W^-$ and the Higgs boson signal becomes significant, and simulation alone is used for the background estimation.
The Drell-Yan contribution to the same-flavour final states is based on an extrapolation from the observed number of events with a dilepton mass within ±7.5 GeV of the Z mass. The residual background in the control region is subtracted using $e^\pm\mu^\mp$ events. This method is limited by the statistical uncertainty of the control sample, which is about 20% to 50%. The $Z/\gamma^* \to \tau\tau$ contamination is estimated using $Z/\gamma^* \to ee$ and $Z/\gamma^* \to \mu\mu$ events selected in data, where the leptons are replaced with simulated $\tau$ decays, thus providing a better description of the experimental conditions with respect to the simulation.
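The Drell-Yan extrapolation can be sketched under stated assumptions. The text gives neither the extrapolation factor nor the normalization of the $e^\pm\mu^\mp$ subtraction, so `r_out_in` (ratio of events outside to inside the Z window) and `k_emu` (scale factor applied to the $e^\pm\mu^\mp$ count) are hypothetical inputs of this example.

```python
def drell_yan_outside_window(n_in_sf, n_in_emu, r_out_in, k_emu=0.5):
    """Extrapolate the Drell-Yan yield from the Z window to the signal region.

    n_in_sf: same-flavour events observed within +-7.5 GeV of the Z mass.
    n_in_emu: e-mu events in the same window, used to subtract backgrounds
        that populate all flavour combinations.
    r_out_in, k_emu: hypothetical inputs, see the description above.
    """
    n_in_dy = max(n_in_sf - k_emu * n_in_emu, 0.0)
    return r_out_in * n_in_dy
```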
The $W\gamma^*$ contamination, from asymmetric virtual photon decays where one lepton escapes detection, is determined using the MADGRAPH generator with dedicated cuts.
The normalization scale of the simulated events is estimated through a control sample of high-purity $W\gamma^*$ events with three reconstructed leptons. A measured factor of 1.6 ± 0.5 with respect to the leading-order cross section is found. The remaining minor backgrounds, from WZ, ZZ (when the two selected leptons come from different bosons) and $W\gamma$, are estimated from simulation.
For the two-dimensional analysis, the simulation is employed to build the templates, which are then cross-checked in data control samples. For the W + jets background the nominal shape is derived from the same control sample used to determine the normalization.

The $H \to W^+W^- \to \ell\nu_\ell qq$ search
The $W^+W^-$ semi-leptonic channel has two main advantages with respect to the fully-leptonic final state: it has a larger branching fraction and a reconstructible Higgs boson mass peak. On the other hand, it is contaminated by a large W + jets background. The precision on the estimation of this background largely determines the sensitivity of the analysis.
The reconstructed electrons (muons) are required to have $p_T > 35\,(25)$ GeV, and are restricted to $|\eta| < 2.5\,(2.1)$. The jets are required to have $p_T > 30$ GeV and $|\eta| < 2.4$. Events with an electron or a muon, and with exactly two or three jets, are analyzed separately, giving four categories in total. In addition, the electron (muon) categories are required to have $E_T^{miss} > 30\,(25)$ GeV. The two highest-$p_T$ jets are assumed to be the hadronic W decay products. According to the simulation, in the 2(3)-jet event categories this criterion correctly identifies the jet pair at a rate which varies from 68(26)% for $m_H = 200$ GeV to 88(84)% for $m_H = 600$ GeV. Events with an incorrect dijet combination result in a broad non-peaking background in the $m_{WW}$ spectrum.
The ($\ell$, $E_T^{miss}$) system is used to reconstruct the leptonic W candidate. To reduce the background from processes that do not contain $W \to \ell\nu$ decays, the requirements $m_T(\ell, E_T^{miss}) > 30$ GeV and $|\Delta\phi_{\mathrm{leading\ jet},\, E_T^{miss}}| > 0.8\,(0.4)$ are imposed for electrons (muons). These selection criteria reduce the QCD multi-jet background, for which in many cases the $E_T^{miss}$ is generated by the mis-measurement of the hadronic recoil.
In addition to the listed cuts, a likelihood discriminant is built to enhance the signal sensitivity. The discriminant input variables comprise five angles between the Higgs decay products, fully describing the Higgs production kinematics [7], the $p_T$ and rapidity of the $W^+W^-$ system, and the lepton charge. The likelihood discriminating power is optimized independently with dedicated simulation samples for several Higgs mass hypotheses, for each lepton flavor and for each jet multiplicity (2-jet, 3-jet): four different optimizations are therefore obtained per mass hypothesis. The figure of merit used for the optimization is the expected limit for the Higgs cross section. For each of the event categories, events are retained if they survive a simple selection on the likelihood discriminant.
Finally, the $m_{WW}$ resolution is improved by means of a kinematic fit, where both W candidates are constrained to the W-boson mass to within its known width, with the longitudinal component of the neutrino momentum, $|p_z|$, treated as a free parameter.
To estimate the normalizations of all background components in the signal region, an unbinned maximum likelihood fit is performed on the invariant mass distribution of the dijet system, $m_{jj}$. The fit is performed independently for each Higgs boson mass hypothesis. The signal region corresponding to the W mass window, 65 GeV < $m_{jj}$ < 95 GeV, is excluded from the fit.
The measured invariant mass of the $W^+W^-$ system, $m_{\ell\nu jj}$, is used as the final discriminating variable. The binned distributions of $m_{\ell\nu jj}$ for the total background, signal and data for each mass hypothesis and event category are constructed and used as input for the limit-setting procedure. The $m_{\ell\nu jj}$ shape for the major background, W + jets, is determined from data as an interpolation of the shapes measured in two signal-free sideband regions of $m_{jj}$ (55 GeV < $m_{jj}$ < 65 GeV, 95 GeV < $m_{jj}$ < 115 GeV). The relative fraction of the two sidebands in the signal region is found through simulation, separately for each Higgs boson mass hypothesis, by minimizing the $\chi^2$ between the interpolated shape in the signal region and the expected one. In addition, to avoid statistical fluctuations due to the low event count, the interpolated distribution is fit with an exponential function. This function is then used for the limit-setting.
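The sideband interpolation can be illustrated with a minimal sketch. It assumes a linear mixture of the two sideband shapes with a single fraction alpha, and finds alpha by a simple grid scan of the $\chi^2$; the text does not specify the functional form or the minimizer, and all names are hypothetical. The subsequent exponential fit is not shown.

```python
def interpolate_sidebands(low_shape, high_shape, alpha):
    """Bin-by-bin mixture alpha * low + (1 - alpha) * high."""
    return [alpha * lo + (1.0 - alpha) * hi
            for lo, hi in zip(low_shape, high_shape)]

def chi2(shape, expected, errors):
    """Chi-square between an interpolated and an expected binned shape."""
    return sum(((s - e) / err) ** 2
               for s, e, err in zip(shape, expected, errors))

def best_sideband_fraction(low_shape, high_shape, expected, errors, steps=1000):
    """Grid scan of alpha in [0, 1] minimizing the chi-square.

    expected / errors: simulated signal-region shape and its uncertainties.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda a: chi2(interpolate_sidebands(low_shape, high_shape, a),
                                  expected, errors))
```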
The $m_{jj}$ and $m_{\ell\nu jj}$ distributions with final background estimates are shown in Figure 2, with selections optimized for a 500 GeV Higgs boson mass hypothesis, for the muon 2-jet category. All uncertainties arising from interpolation and fit procedures are propagated to the limit calculation as systematic uncertainties.

Results
An excess of data with respect to the total background prediction is observed in the $H \to W^+W^- \to \ell\nu_\ell\,\ell\nu_\ell$ analysis for low Higgs mass selections. The significance of the excess with respect to the background-only hypothesis is 3.1 standard deviations, while 4.1 standard deviations are expected assuming the SM prediction.
No other significant deviations are observed, and therefore upper limits on the Higgs boson production relative to the standard model Higgs expectation are derived. To compute the upper limits, the modified frequentist construction CL$_s$ [8] is used. The standard model Higgs boson is excluded in the mass range 128–600 GeV at 95% confidence level (CL), while the expected exclusion range under the background-only hypothesis is 118–565 GeV.
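The idea behind the CL$_s$ construction can be illustrated with a deliberately simplified single-bin counting experiment without systematic uncertainties. This is not the procedure used in the analysis, which relies on binned distributions and nuisance parameters; it only shows how the ratio CL$_s$ = CL$_{s+b}$ / CL$_b$ is formed. All function names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls_counting(n_obs, signal, background):
    """CLs for a single-bin counting experiment (toy simplification)."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b

def excluded_at_95(n_obs, signal, background):
    """A signal hypothesis is excluded at 95% CL when CLs < 0.05."""
    return cls_counting(n_obs, signal, background) < 0.05
```

With zero observed events, CL$_s$ reduces to exp(-s), so an expected signal of three events is excluded at 95% CL regardless of the background.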
With the $H \to W^+W^- \to \ell\nu_\ell qq$ analysis, the presence of a SM Higgs boson in the mass ranges 215–490 GeV and 525–600 GeV is excluded at 95% CL, while the expected exclusion limit in