Frameworks to monitor and predict rates and resource usage in the ATLAS High Level Trigger

The ATLAS High Level Trigger Farm consists of around 40,000 CPU cores which filter events at an input rate of up to 100 kHz. A costing framework is built into the high level trigger, enabling detailed monitoring of the system and allowing data-driven predictions to be made using specialist datasets. An overview is presented of how ATLAS collects in-situ monitoring data on CPU usage during trigger execution, and how these data are processed to yield both low level monitoring of individual selection algorithms and high level data on the overall performance of the farm. For development and prediction purposes, ATLAS uses a special ‘Enhanced Bias’ event selection. This mechanism is explained, along with how it is used to profile the expected resource usage and output event rate of new physics selections before they are executed on the actual high level trigger farm.


The ATLAS Trigger Systems
This document focuses on the ATLAS trigger systems used during Run 2 of the LHC (since 2015) at a proton-proton centre-of-mass energy √s = 13 TeV.
The ATLAS trigger system is briefly outlined below; more details on the trigger in Run 1 and Run 2 can be found in [1] and [2]. The ATLAS detector is described in [3]. The data-driven rate prediction methods described in Section 2.2 allow trigger strategies to be developed for current and future LHC conditions which respect both the physics goals of the collaboration and the limitations of the trigger and Data Acquisition (DAQ) systems. Similar methods for CPU usage predictions are described in Section 3.1, which estimate the size of the computing farm required to apply the High Level Trigger (HLT) filtering described by a trigger menu at different LHC conditions. This allows the collaboration to estimate the amount of computing resources required in the future and to plan accordingly. This paper presents a summary of the work from [4]. Resource utilisation studies from Run 1 are presented in [5] and [6].
ATLAS operates a two level trigger system: a hardware-based first level (L1) system of 512 L1 trigger items reduces the LHC 40 MHz bunch-crossing rate to 100 kHz, at which point the detector is read out. These events are subsequently filtered by the HLT, a computer farm of up to approximately 40,000 CPU cores.
Event selections in the HLT are referred to as chains, as they chain together a sequence of selection steps. The collection of all trigger chains is called the trigger menu. Each chain specifies the L1 seeds which, if present, will activate the chain. Chains are comprised of steps, where a step is a sequence of algorithms. A typical step will execute multiple feature-extraction algorithms within a geometric Region of Interest (RoI) terminating on a hypothesis algorithm which applies selections on the reconstructed objects. The output of single algorithms and entire steps is cached and any subsequent requests for the same algorithm/step (within the same RoI) in the event processing returns the cached version.
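The caching behaviour described above can be sketched as follows. This is an illustrative model only, not ATLAS trigger code; all names (`EventContext`, `run`, the algorithm label) are hypothetical.

```python
# Illustrative sketch of per-event caching of algorithm results, keyed by
# (algorithm name, Region of Interest). Hypothetical names, not ATLAS code.

class EventContext:
    """Caches algorithm outputs so repeated requests within one event reuse them."""

    def __init__(self):
        self._cache = {}      # (algorithm name, RoI id) -> cached result
        self.executions = 0   # how many times an algorithm actually ran

    def run(self, name, roi_id, compute):
        key = (name, roi_id)
        if key not in self._cache:   # first request in this event: execute
            self.executions += 1
            self._cache[key] = compute()
        return self._cache[key]      # later requests return the cached result

ctx = EventContext()
first = ctx.run("topo_clustering", roi_id=7, compute=lambda: ["cluster_a"])
second = ctx.run("topo_clustering", roi_id=7, compute=lambda: ["cluster_b"])
# Same algorithm and RoI: the algorithm executed once; the second request
# is served from the cache, so the second lambda never runs.
```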
Events which are accepted by the HLT are transferred to CERN's Tier-0 computing centre at a rate of around 1 kHz on average, for physics events with full detector readout.

The Enhanced Bias Dataset
The rate of events passed by any trigger selection is calculable from a sample of data recorded with a zero-bias trigger (a trigger which applies no selection bias), based on the probability of the selection passing events in the unbiased sample. However, most selections of particular interest at a hadron collider, for example high p T leptons, come from processes whose cross sections are much smaller than the total inelastic cross section, requiring infeasibly large data samples.
Enhanced bias data samples differ in that they are overly-represented with high p T events which are likely to be selected by the trigger. The data are collected using a variety of L1 triggers of all physics-object types, combinations and p T ranges to produce a compact dataset which has the statistical power to assess the trigger rate of algorithmic selections on the data of the type typically performed by the HLT.
Enhanced bias data are taken with an invertible trigger menu. This means that a single weight is calculable per event which corrects for the prescales (a prescale of p accepts, on average, one in every p events passing a selection) applied during the enhanced bias data-taking, restoring an effective zero-bias spectrum.
In the following sections, the composition of an enhanced bias dataset is detailed along with the formulas used to calculate the weights required to estimate a trigger rate.

Composition of Enhanced Bias Data
Each enhanced bias dataset reflects the configuration of the L1 trigger and the beam parameters of the LHC (most importantly the mean number of inelastic proton-proton collisions per bunchcrossing referred to as pileup, μ ) at the time it is taken.
Five trigger chains are used to collect enhanced bias data in ATLAS. Each chain targets a group of physics objects and is multi-seeded by around 10-30 L1 items of similar rate; the exception is a chain seeded by a zero-bias trigger. Prescales are set such that events accepted by the zero-bias trigger make up 20% of the dataset.
The collection of an enhanced bias dataset occurs in parallel with standard data taking. Hence it must be taken using the same set of L1 prescales being used to take physics data. The enhanced bias chains are constructed so that for each event, each chain will be activated, or not, by the whole set of L1 items it is seeded by. This is necessary to preserve the information on the correlation of the L1 items.
For highly discriminating (e.g., 'primary', high p T ) L1 items which are unprescaled at L1 in the main operating menu, the relevant enhanced bias HLT chains are directly seeded from their set of unprescaled L1 items.
For chains whose set of L1 seeds are prescaled at L1 in the main operating menu, the activation of the chain by the whole set of L1 items is achieved by using a second random trigger at L1 (with a typical rate of 5 kHz). These enhanced bias chains maintain an internal list of L1 items that they are to select, and pass the event if one or more of the L1 items from this internal list passed raw (i.e., before the application of prescales) at L1 in the random event.
Events recorded by enhanced bias chains are only biased by the L1 system. No selection is applied at the HLT beyond the application of HLT prescales to control the output rates. Around one million enhanced bias events are recorded per sample, at around 300 Hz for a period of one hour. Of these, around 150,000 events satisfy the L1 primary isolated electron trigger (p T ≥ 22 GeV) and 90,000 the L1 primary muon trigger (p T ≥ 20 GeV). At lower thresholds, 320,000 events satisfy a single L1 jet requirement (p T ≥ 30 GeV) and 35,000 events satisfy a L1 requirement of at least one muon with p T ≥ 4 GeV and at least one (different) muon with p T ≥ 6 GeV (for B physics). Thus a single sample is obtained that contains sufficient statistical power to determine the rate of all primary, supporting and backup chains which together make up a trigger menu.

Predicting Trigger Rates
In ATLAS, the reprocessing of an enhanced bias dataset is used to validate updates to the current software release and trigger menu before these are deployed on the live system. Part of this validation is the calculation of the rates of individual chains, groups of chains and the entire menu using enhanced bias weighting.
The trigger is re-executed over all events in the enhanced bias sample using the trigger menu which is to be validated. This stage is performed on the grid. In the reprocessing, all L1 items and HLT chains are run unprescaled. This yields an ntuple containing the raw trigger decision for all L1 items and all HLT chains in the menu for every event in the enhanced bias sample. The effects of prescales are applied after the reprocessing via weighting factors to make full use of the available statistics.
As well as the rates for individual L1 items and HLT chains, more complex combinations are considered. These include the total rate of all HLT chains defined within different physics groups, the total rate at L1 and the HLT, the unique rate of a chain (or group of chains) and a chain's overlap with all other chains in the menu.
The rate of a chain, or combination of chains, is

$$ R = \frac{1}{\Delta t} \sum_{e=1}^{N} w(e), \qquad R_{\mathrm{Err}} = \frac{1}{\Delta t} \sqrt{\sum_{e=1}^{N} w(e)^2}, \tag{1} $$

where R is the rate in Hz and R_Err its statistical uncertainty, the sum runs over all e = 1, 2, . . . , N events in the enhanced bias dataset, the weight w(e) is the effective number of events passed by the chain/combination in event e, and Δt is the time period over which the enhanced bias data sample was collected, typically around one hour. The weight w(e) is divided into three sub-weights, which are discussed over the following sections:

$$ w(e) = w_{EB}(e)\, w_{C}(e)\, w_{L}(e). \tag{2} $$

Here for event e, w_EB(e) ≥ 1 is the enhanced bias weight; the chain/combination weight w_C(e) = 0 if the chain/combination fails, otherwise it is bounded 0 < w_C(e) ≤ 1 based on the prescale value(s) being applied to the chain(s); finally w_L(e) > 0 is a luminosity extrapolation weight.
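As a minimal illustration, the rate and its statistical uncertainty can be computed from per-event weights as follows. The numbers are toy inputs, not ATLAS data.

```python
import math

# Toy sketch of the rate estimate: each event carries the product of its
# enhanced bias, chain/combination and luminosity weights; these are summed
# over the sample and normalised to the collection time delta_t (seconds).
def predict_rate(w_eb, w_c, w_l, delta_t):
    weights = [eb * c * l for eb, c, l in zip(w_eb, w_c, w_l)]
    rate = sum(weights) / delta_t
    rate_err = math.sqrt(sum(w * w for w in weights)) / delta_t
    return rate, rate_err

# Three toy events over a one-hour (3600 s) sample; the second event is
# half-weighted by prescales and the third fails the chain entirely.
rate, rate_err = predict_rate(
    w_eb=[100.0, 250.0, 50.0],
    w_c=[1.0, 0.5, 0.0],
    w_l=[1.2, 1.2, 1.2],
    delta_t=3600.0,
)
# rate == 0.075 Hz
```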

Calculating Enhanced Bias Weights
Each event in the sample has a fixed enhanced bias weight. It is a property of the event and it corrects for the online prescales used to take the enhanced bias sample. For each event, e, the weight w_EB(e) is

$$ w_{EB}(e) = \left[ 1 - \prod_{j=1}^{5} \left( 1 - \frac{r_{je}}{p_j} \right) \right]^{-1}, \tag{3} $$

where the product runs over the j = 1, 2, 3, 4, 5 enhanced bias chains used to take the dataset, with raw decision r_je = 0, 1 and total prescale p_j ≥ 1 (here for simplicity it is assumed that the enhanced bias chains' prescales were constant over the data taking period, but in ATLAS these prescales are also permitted to change). For the simple case of a single enhanced bias chain j, w_EB(e) = p_j, where r_je = 1 is implied by the event being accepted into the enhanced bias sample.
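This weight can be sketched as the inverse of the combined probability that at least one enhanced bias chain accepts the event. This functional form is an assumption of the sketch, chosen to be consistent with the single-chain limit w_EB = p_j quoted above.

```python
# Sketch (assumed form): w_EB is the inverse of the combined probability
# that at least one of the enhanced bias chains accepts the event, given
# each chain's raw decision r (0 or 1) and total prescale p (>= 1).
def enhanced_bias_weight(raw_decisions, prescales):
    p_none = 1.0
    for r, p in zip(raw_decisions, prescales):
        p_none *= 1.0 - r / p   # probability this chain does not accept
    return 1.0 / (1.0 - p_none)

# Single-chain cross-check: only the second chain fired, so w_EB reduces
# to that chain's prescale (40, up to floating-point rounding).
w = enhanced_bias_weight([0, 1, 0, 0, 0], [10.0, 40.0, 25.0, 8.0, 1000.0])
```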

Luminosity Extrapolation Functions
Two principal factors in instantaneous luminosity determination at a collider such as the LHC are the number of colliding bunches, N_B, and the pileup, μ. The instantaneous luminosity, L, scales linearly with both of these quantities,

$$ L = k\, N_B\, \mu, \tag{4} $$

where k is a constant of proportionality for proton-proton collisions at √s = 13 TeV. A luminosity extrapolation is used to take an enhanced bias dataset with properties μ_EB(e), N_B^EB, and use it to predict rates for a target luminosity defined by μ_T and N_B^T. Two of μ_T, N_B^T and L_T must be specified, with the third parameter calculable via k from (4). The majority of triggers in a high energy physics experiment are highly selective and these typically scale linearly with luminosity, such that w_L(e) = L_T / L_EB(e). The functional form of w_L(e) should be motivated by a chain's sensitivity to detector conditions; for example, for chains with an exponential dependence on pileup, the form

$$ w_L(e) = \frac{N_B^T}{N_B^{EB}}\, \exp\!\left( f\, [\, \mu_T - \mu_{EB}(e) \,] \right) $$

should be used, where f is a constant extracted from fits to the data.
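Both extrapolation forms can be sketched in a few lines. The exponential variant below is an assumed parametrisation (linear in colliding bunches, exponential in pileup) used purely for illustration.

```python
import math

# Sketch of luminosity extrapolation weights from the enhanced bias
# conditions (mu_eb, nb_eb) to the target conditions (mu_t, nb_t).
def w_l_linear(mu_t, nb_t, mu_eb, nb_eb):
    # L = k * N_B * mu, so the luminosity ratio is independent of k.
    return (nb_t * mu_t) / (nb_eb * mu_eb)

def w_l_pileup_exponential(mu_t, nb_t, mu_eb, nb_eb, f):
    # Assumed form: linear in colliding bunches, exponential in pileup,
    # with f a constant extracted from fits to data.
    return (nb_t / nb_eb) * math.exp(f * (mu_t - mu_eb))

# Extrapolate from pileup 50 to 60 at a fixed 2544 colliding bunches.
w_lin = w_l_linear(60.0, 2544, 50.0, 2544)                       # 1.2
w_exp = w_l_pileup_exponential(60.0, 2544, 50.0, 2544, f=0.02)
```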

Calculation of Chain/Combination Weights
Rates are calculated for individual chains, for the union of multiple chains (group & total rates) and for the overlaps among chains (intersections). In the following, a set of i = 1, 2, . . . , N_L1 L1 items and j = 1, 2, . . . , N_HLT HLT chains in event e with raw decisions (before any prescales) r^L1_ie, r^HLT_je and prescale values p^L1_i, p^HLT_j are considered. The relationship between the L1 items and HLT chains is encoded in a binary matrix M_ij such that

$$ M_{ij} = \begin{cases} 1 & \text{if L1 item } i \text{ seeds HLT chain } j, \\ 0 & \text{otherwise.} \end{cases} $$
The most appropriate formula for w C (e) depends on the properties of this matrix.

Single HLT Chain with One Seed
For a single HLT chain, j, seeded by a single L1 item, i, the chain weight w_C(e) is

$$ w_C(e) = \frac{ r^{L1}_{ie}\, r^{HLT}_{je} }{ p^{L1}_{i}\, p^{HLT}_{j} }. $$
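In code, assuming the raw decisions gate the weight and the prescales scale it, this single-seed case reduces to a one-line function:

```python
# Sketch of the single-seed chain weight: zero if either level's raw
# decision fails, otherwise the inverse of the product of the prescales.
def chain_weight_single(r_l1, r_hlt, p_l1, p_hlt):
    return (r_l1 * r_hlt) / (p_l1 * p_hlt)

w_pass = chain_weight_single(r_l1=1, r_hlt=1, p_l1=5.0, p_hlt=2.0)   # 0.1
w_fail = chain_weight_single(r_l1=1, r_hlt=0, p_l1=5.0, p_hlt=2.0)   # 0.0
```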

Union of Multiple HLT Chains
For the case where M is doubly stochastic, such that each HLT chain has exactly one L1 seed and vice versa, the combination weight of the union over the HLT chains is

$$ w_C(e) = 1 - \prod_{j=1}^{N_{HLT}} \left( 1 - \frac{ r^{L1}_{i(j)e}\, r^{HLT}_{je} }{ p^{L1}_{i(j)}\, p^{HLT}_{j} } \right), $$

where i(j) denotes the L1 item which seeds HLT chain j. Another common topology in ATLAS is when M is a left stochastic matrix; that is to say, each HLT chain j has exactly one seed, but each L1 item i may seed multiple chains. Here the combination weight is

$$ w_C(e) = 1 - \prod_{i=1}^{N_{L1}} \left[ 1 - \frac{ r^{L1}_{ie} }{ p^{L1}_{i} } \left( 1 - \prod_{j : M_{ij}=1} \left( 1 - \frac{ r^{HLT}_{je} }{ p^{HLT}_{j} } \right) \right) \right]. $$
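The left stochastic case can be sketched by building up the survival probability of the whole union item by item. The data layout (`seeds` mapping each L1 item to the chains it seeds) is a choice of this sketch, not an ATLAS data structure.

```python
# Sketch of the union weight for a left stochastic seeding matrix.
# seeds maps each L1 item index to the HLT chains it seeds; r/p are raw
# decisions and prescales per L1 item and per HLT chain.
def union_weight(seeds, r_l1, p_l1, r_hlt, p_hlt):
    p_fail_all = 1.0
    for i, chains in seeds.items():
        p_hlt_none = 1.0
        for j in chains:   # chains downstream of L1 item i
            p_hlt_none *= 1.0 - r_hlt[j] / p_hlt[j]
        # probability that nothing seeded by item i fires after prescales
        p_fail_all *= 1.0 - (r_l1[i] / p_l1[i]) * (1.0 - p_hlt_none)
    return 1.0 - p_fail_all

# One L1 item seeding two HLT chains, all raw decisions passing:
# w = 0.5 * (1 - 0.5 * 0.5) = 0.375
w = union_weight(
    seeds={0: [0, 1]},
    r_l1={0: 1}, p_l1={0: 2.0},
    r_hlt={0: 1, 1: 1}, p_hlt={0: 2.0, 1: 2.0},
)
```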

Intersection of Multiple HLT Chains
The intersection combination weight for the doubly stochastic case is

$$ w_C(e) = \prod_{j=1}^{N_{HLT}} \frac{ r^{L1}_{i(j)e}\, r^{HLT}_{je} }{ p^{L1}_{i(j)}\, p^{HLT}_{j} }, $$

where i(j) denotes the L1 item which seeds HLT chain j. Additional topologies and further details on weighting algorithms are available in [4, 7].
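Since every chain (together with its single L1 seed) must pass, the per-chain probabilities simply multiply, as this short sketch shows:

```python
# Sketch of the intersection weight in the doubly stochastic case:
# one (r_l1, p_l1, r_hlt, p_hlt) tuple per chain, probabilities multiply.
def intersection_weight(chains):
    w = 1.0
    for r_l1, p_l1, r_hlt, p_hlt in chains:
        w *= (r_l1 * r_hlt) / (p_l1 * p_hlt)
    return w

# Two chains, both passing raw, with L1 prescales 2 and 5:
# (1/2) * (1/5) = 0.1
w = intersection_weight([(1, 2.0, 1, 1.0), (1, 5.0, 1, 1.0)])
```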

Unique Rates, Overlaps and Coherent Prescales
An individual chain's rate ignores the correlations between itself and other trigger chains. Any non-zero correlations act to reduce the HLT total rate, such that it is smaller than the sum of the rates of all individual chains in the menu. A chain's unique rate is the subset of its rate which is uncorrelated with all other chains in the menu, after prescales are applied. The expression to obtain the unique rate of the Nth chain in the set of j = 1, 2, . . . , N chains, C, is

$$ R^{\mathrm{unique}}_{C_N} = R\!\left( \bigcup_{j=1}^{N} C_j \right) - R\!\left( \bigcup_{j=1}^{N-1} C_j \right), $$

that is, the total rate of all chains minus the total rate with chain C_N excluded. By excluding more than one chain from the second term, the unique rate of groups of chains is calculable. When a chain's rate is larger than its unique rate, it must have at least partial overlap with other chains in the menu.
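Numerically, a unique rate is just a difference of two union rates. A toy sketch with hypothetical per-event union weights:

```python
# Sketch of a unique rate: per-event union weights are summed with and
# without the chain of interest; the difference, normalised to the
# collection time delta_t, is the rate only that chain contributes.
def unique_rate(w_union_all, w_union_without, delta_t):
    return (sum(w_union_all) - sum(w_union_without)) / delta_t

# Toy sample of three events over 100 s: only the second event is
# accepted exclusively by the chain under study.
# (2.5 - 2.0) / 100 = 0.005 Hz
r_uniq = unique_rate([1.0, 0.5, 1.0], [1.0, 0.0, 1.0], 100.0)
```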

Validation of Predicted Rates
Two examples of the enhanced bias mechanism in use are given in Figure 1. In Figure 1a, predicted rates are presented over a range of transverse energies. The smooth p T spectra obtained via the enhanced bias weighting procedure illustrate the statistical power of the data sample over several orders of magnitude in rate. In Figure 1b, HLT rate predictions are compared to actual online rates for all 957 physics chains in a trigger menu for which there was a non-zero rate online. The online rates are corrected for prescales at both levels and the difference in rate between the prediction and online is normalised to the combined statistical uncertainty from both samples. The Gaussian fit in the range −3 to 3 indicates that the prediction for the majority of HLT chains is normally distributed. A tail is visible to negative significance for a small number of chains where the predicted rate was too low. This is due to a small bias arising from the chosen set of L1 seeds of the enhanced bias dataset and their available statistics. The mean fractional statistical uncertainty is 10% for the predicted rates and 2% for the online rates.

HLT Monitoring
The ATLAS cost monitoring framework consists of a suite of tools which are executed on a sample of events processed by the HLT, irrespective of whether the events pass or fail the HLT selection. A monitoring fraction of 10% is chosen so as to sample a representative sub-set of all events.
Monitored data include algorithm execution time, data request size and the logical flow of the trigger execution for all L1-accepted events.
Example monitoring distributions for two of the many algorithms used by ATLAS are given in Figure 2: calorimeter topological clustering and electron tracking. These monitoring data were collected over a period of 180 s of data-taking at L = 1 × 10^34 cm−2 s−1. Histograms are filled with a weight w = 10 to correct for the 10% random sampling fraction. Topological clustering can run either within a RoI or as a full-detector scan, giving a characteristic double-peaked structure.

Performing CPU Usage Predictions
In Section 2.2, the procedure to predict the rate of individual HLT chains and trigger menus was described. An equivalent procedure allows for an estimation of the number of HLT processor cores which will be required to run a given trigger chain, or menu.
Monitoring ntuples are generated for this case by running the trigger menu to be profiled over an enhanced bias dataset on the grid. Algorithm caching within the HLT inter-correlates the HLT chains such that prescales cannot be simulated later via the application of weights. Prescales are instead applied directly during the trigger execution.
The output monitoring ntuple is processed and enhanced bias weights from equation (3) are applied. If the prescale set applied on the grid is not designed for the same luminosity as the enhanced bias dataset, then an additional weight is applied to scale the prediction. One possible luminosity scaling factor is the ratio between the predicted L1 rate for the prescale set and the enhanced bias sample L1 rate, multiplied by the ratio of the prescale set's target μ to the enhanced bias sample μ. This assumes that the mean increase in processing time is linear in μ. Other functional forms may be more applicable depending on the menu.
By normalising the total CPU usage to the enhanced bias dataset collection time (Δt, from  equation (1)) the number of processor cores required to execute a given HLT chain, or full menu, is obtained. This allows for predictions to be made about the number of processing cores required for higher-luminosity conditions.
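The normalisation step above amounts to dividing weighted CPU seconds by wall-clock seconds. A toy sketch (hypothetical numbers, not ATLAS measurements):

```python
# Sketch of the core-count estimate: weighted CPU seconds consumed over
# the sample, divided by the wall-clock collection time delta_t, give
# the average number of fully-occupied cores required.
def cores_required(cpu_seconds, weights, delta_t):
    total_cpu = sum(t * w for t, w in zip(cpu_seconds, weights))
    return total_cpu / delta_t

# Toy numbers: per-event processing time and enhanced bias weights over
# a one-hour sample.  (100 + 150 + 100) / 3600 cores on average.
n_cores = cores_required([0.5, 1.5, 0.25], [200.0, 100.0, 400.0], 3600.0)
```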
As the applied prescales reduce the statistical power of the dataset, the statistical uncertainty on the CPU utilisation of prescaled chains is larger.

Conclusion
The enhanced bias mechanism allows fast data-driven rate predictions to be performed using dedicated ATLAS datasets of manageable size. These datasets contain events biased only by the L1 decision, over-sampling high p T triggers and other interesting physics objects. The datasets are collected with a small set of enhanced bias trigger chains which allow a single weight to be derived per event, correcting for the online prescales applied to the event.
Rate and CPU usage predictions make use of the enhanced bias weights, along with luminosity extrapolation and prescale emulation weights. These techniques are employed in the prediction of the rates of individual chains, groups of chains, the total rate, unique rates and overlaps among chains. These mechanisms have allowed ATLAS to validate the impact of new triggers, both in terms of output rate and HLT CPU requirements, before the new triggers are deployed on the live system. When live, the triggers are subsequently monitored in every data taking run. During winter LHC shutdown periods, the collaboration uses enhanced bias data to test large changes to the trigger menu such that a prescale strategy is prepared for when the LHC restarts.