System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC

For the High-Luminosity LHC era, the trigger and data acquisition system of the Compact Muon Solenoid (CMS) experiment will be entirely replaced. Novel design choices have been explored, including ATCA prototyping platforms with SoC controllers and newly available interconnect technologies with serial optical links running at data rates up to 28 Gb/s. Trigger data analysis will be performed by sophisticated algorithms, including widespread use of Machine Learning, in large FPGAs such as the Xilinx UltraScale family. The system will process over 60 Tb/s of detector data at an event rate of 750 kHz. The system design and prototyping are described, and examples of trigger algorithms are reviewed.


Introduction
The High-Luminosity LHC (HL-LHC) [1] presents the opportunity for a very rich and ambitious physics program, exploiting an integrated luminosity of 3000-4000 fb⁻¹. The LHC will undergo major upgrades of its components, leading to an increase of the instantaneous luminosity to 5 × 10³⁴ cm⁻² s⁻¹, five times the accelerator's original design value. In its "ultimate" configuration, the HL-LHC will reach a peak instantaneous luminosity of 7.5 × 10³⁴ cm⁻² s⁻¹, increasing the average number of proton-proton collisions per bunch crossing (pileup) to around 200. The ultimate performance of the HL-LHC would enable the collection of 400 to 450 fb⁻¹ of integrated luminosity per year, potentially providing a total of 4000 fb⁻¹ to each of the CMS and ATLAS experiments. The CMS detector requires a trigger and data acquisition system with exceptional performance to collect the required information-rich datasets under these challenging running conditions. Along with the sub-detector upgrades [2], a complete replacement of the trigger, comprising the Level-1 (L1) trigger and the high-level trigger (HLT), and of the data acquisition (DAQ) system, with increased throughput, is planned. The Phase-2 upgrade of the trigger and DAQ system will keep a two-level strategy, while increasing the maximum L1 rate to 750 kHz to maintain the acceptance for physics. The total latency will be increased from 3.8 µs to 12.5 µs to allow, for the first time, tracker and high-granularity calorimeter information to be included. Trigger data analysis will be performed by sophisticated algorithms, including widespread use of Machine Learning, in large FPGAs [3]. In order to fully exploit the HL-LHC running period, major consolidations and upgrades of the CMS detector are planned [2].
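To make the pileup figure concrete, the mean number of interactions per crossing can be estimated from the instantaneous luminosity alone. The sketch below assumes an inelastic pp cross section of roughly 80 mb and about 2750 colliding bunch pairs per LHC orbit; neither number is stated in the text, both are typical assumptions for HL-LHC conditions.

```python
# Rough estimate of the average pileup at the HL-LHC "ultimate" luminosity.
INELASTIC_XSEC_CM2 = 80e-27    # ~80 mb inelastic pp cross section (assumption)
LHC_REV_FREQ_HZ = 11245        # LHC revolution frequency
COLLIDING_BUNCHES = 2750       # typical HL-LHC filling scheme (assumption)

def average_pileup(inst_lumi_cm2_s: float) -> float:
    """Mean pp interactions per bunch crossing: mu = L * sigma / f_crossing."""
    crossing_rate_hz = LHC_REV_FREQ_HZ * COLLIDING_BUNCHES
    return inst_lumi_cm2_s * INELASTIC_XSEC_CM2 / crossing_rate_hz

print(average_pileup(7.5e34))  # roughly 190-200, consistent with the text
```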
Given the high particle multiplicity expected, the object-reconstruction performance required to extract physics signatures relies on higher-granularity detectors along with robust readout electronics. The CMS collaboration plans to replace both the strip and pixel tracking detectors, with an Inner Tracker featuring small-size pixel sensors and an Outer Tracker equipped with strip and macro-pixel sensors, extending the coverage to |η| = 3.8. The Outer Tracker will implement stacked strip modules, reducing the hit multiplicity and allowing track candidates for the trigger (L1 tracks) to be reconstructed up to |η| = 2.4. The readout electronics of the barrel calorimeters will be replaced to achieve finer granularity and provide timing information. The endcap calorimeters will be replaced by the high-granularity calorimeter (HGCAL), with over 6 million readout channels. This sampling calorimeter will provide shower separation and identification adapted to the harsher conditions in the forward region of the detector. The redundancy of the muon detection system, achieved through the combination of drift tubes (DTs), resistive plate chambers (RPCs), and cathode strip chambers (CSCs), will be retained with consolidated electronics. Additional improved RPC (iRPC) chambers and gas electron multiplier (GEM) chambers will be installed to extend the coverage up to |η| = 2.4 and 2.8, respectively. A minimum ionizing particle timing detector placed in front of the barrel and endcap calorimeters will provide precise timing measurements of charged tracks.

The Level-1 trigger Phase-2 upgrade
The Phase-2 upgrade of the L1 trigger system is designed not only to maintain the signal-selection efficiency at the level of the Phase-1 performance, but also to significantly enhance, or enable, the selection of possible new physics manifestations leading to unconventional signatures [3]. High-precision measurements of physics processes will benefit from the extension of the available phase space, such as enhanced trigger coverage in the forward region of the detector or the ability to exploit fully hadronic final states. Moreover, a longer latency will enable higher-level object reconstruction and identification, as well as the evaluation of complex global event quantities and correlation variables to optimize physics selectivity. The implementation of sophisticated algorithms using particle-flow (PF) reconstruction techniques or Machine-Learning-based approaches can now be contemplated. In addition, the design includes a dedicated scouting system streaming data from key parts of the trigger at 40 MHz, via FPGAs, into high-performance computing (HPC) resources. The scouting system provides unprecedented flexibility for parasitic debugging and commissioning of new ideas, and is also being investigated for physics channels inaccessible to traditional triggering techniques.

Conceptual design and hardware developments
The conceptual design of the Phase-2 Level-1 trigger system is the result of several considerations: the design has to efficiently distribute and process the input trigger primitives, provision appropriate resources and interconnections, and retain enough headroom to remain flexible and robust as running conditions and physics needs evolve. The high-level functional diagram of the system is shown in Fig. 1. The system features four distinct and independent trigger processing paths: a calorimeter trigger, a muon trigger, a track trigger and a particle-flow trigger. This division reflects the need to generate complementary types of trigger objects to achieve the best physics selectivity. The key design feature is the implementation of a correlator trigger combining all detector information and running sophisticated algorithms. The final trigger decision is taken at the global trigger level. This architecture meets additional constraints, such as keeping the maximum FPGA occupancy below 50% (to ensure future flexibility in the design of algorithms) and the total latency under 9.5 µs (to retain a 20% contingency). The Phase-2 Level-1 trigger system will need to handle an increased data volume from higher-granularity detectors and a higher particle multiplicity: the total input data volume to be processed exceeds 60 Tb/s, compared with 2 Tb/s for the Phase-1 system [4]. The upgrade project includes a program of R&D to produce the required prototype electronics based on modern technology [3]. Generic high input/output processing boards based on the Advanced Telecommunications Computing Architecture (ATCA) have been designed and equipped with Xilinx Virtex UltraScale+ VU9P FPGAs [5], providing 8 times more computing resources than the Virtex-7 family used in the Phase-1 upgrade.
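The 60 Tb/s input figure directly constrains how many serial links the system must provision. A back-of-envelope sketch follows; the 25 Gb/s line rate is taken from the text, while the 20% protocol/encoding overhead is an illustrative assumption.

```python
import math

# Minimum number of serial optical links needed to receive the trigger input.
TOTAL_INPUT_TBPS = 60.0     # total input data volume from the text
LINE_RATE_GBPS = 25.0       # serial link line rate from the text
PROTOCOL_OVERHEAD = 0.20    # assumed framing/encoding overhead (illustrative)

payload_per_link_gbps = LINE_RATE_GBPS * (1.0 - PROTOCOL_OVERHEAD)
min_links = math.ceil(TOTAL_INPUT_TBPS * 1000.0 / payload_per_link_gbps)
print(min_links)  # -> 3000 links just to receive the raw input once
```

Since data are duplicated and fanned out between trigger layers, the real system needs considerably more links than this lower bound.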
The boards feature more than 100 bidirectional high-speed serial optical links running at up to 28 Gb/s (compared with the 10 Gb/s of the Phase-1 system) to transport the large data volumes. On-board processors running Linux are used for flexible configuration and monitoring.
The board designs have evolved since the Technical Design Report to provide increased input/output and computing power, allowing the deployment of more sophisticated algorithms and a gain in architectural modularity. Xilinx Virtex UltraScale+ VU13P FPGAs (A2577 pin package) provide almost 50% more logic cells, while very dense optical modules are available, such as Samtec FireFly ×12 [6] modules. Alternative options for the optical interconnections are also being explored, such as QSFP (×4 Rx/Tx) [7] or QSFP double-density (×8 Rx/Tx) modules with low bit error rates, as widely used in industry. Figure 2 shows an example of this board evolution, called X2O, presenting a modular design with 2 FPGAs and 112 optical links (25 Gb/s) through QSFP modules, currently under qualification.
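At the quoted line rate, the X2O link count fixes the aggregate optical bandwidth per board; simple arithmetic on the figures above:

```python
# Aggregate optical I/O of one X2O board (figures from the text).
LINKS_PER_BOARD = 112
LINE_RATE_GBPS = 25.0

aggregate_tbps = LINKS_PER_BOARD * LINE_RATE_GBPS / 1000.0
print(aggregate_tbps)  # -> 2.8 Tb/s of bidirectional optical I/O per board
```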

Trigger algorithms, firmware implementation and testing
The trigger algorithms are designed to make extensive use of tracking information, reaching near-offline performance. The availability of fully reconstructed tracks translates into sharper turn-on efficiency curves. The trigger-object reconstruction performance is close to that of offline physics-object reconstruction, with optimised response and resilience to high-pileup conditions. Dedicated trigger algorithms selecting specific physics topologies can be implemented, including final states with displaced objects from new-physics signatures. Algorithm implementation in firmware relies heavily on High-Level Synthesis (HLS), allowing for faster turnaround and the development of new approaches, such as those based on Machine-Learning techniques.

With the availability of tracking and high-granularity detector data, global event reconstruction algorithms such as particle-flow can be implemented. Particle-flow reconstruction [8] has been successfully used by CMS in offline data analyses and at the HLT. Additionally, the event primary vertex reconstructed from tracks is used by the PUPPI [9] algorithm to weight and filter particles according to a measure of their probability of coming from pileup. The combination of PF and PUPPI leads to a large reduction of the event complexity, while preserving the core physics information; this translates into a smaller bandwidth and reduced FPGA resource utilization. Trigger objects are formed from PF and PUPPI candidates, such as the HT trigger quantity (the scalar sum of jet transverse momenta), for which efficiency curves are shown in Fig. 3 (left). This ambitious prototype algorithm was implemented in firmware and demonstrated in hardware using Vivado HLS targeting a Xilinx VU9P FPGA. The algorithm uses less than 50% of the FPGA resources (see Fig. 3 (right)) with a latency of 0.7 µs, meeting the requirements of the project. Algorithm software emulators have been developed to test and validate the firmware implementation.
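As an illustration of the ideas above, the sketch below shows a toy PUPPI-style weight (a local shape variable α built from nearby particles' pT/ΔR², mapped to [0, 1]) and the HT quantity. The class names, cone size, and α-to-weight mapping are illustrative simplifications, not the CMS implementation; in particular, the real algorithm calibrates the weight against the α distribution measured for charged pileup particles.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    pt: float      # transverse momentum in GeV
    eta: float     # pseudorapidity
    phi: float     # azimuthal angle in radians
    from_pv: bool  # charged candidate matched to the primary vertex

def delta_r(a: Candidate, b: Candidate) -> float:
    """Angular distance in the eta-phi plane, with phi wrapped to [-pi, pi]."""
    dphi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)

def puppi_weight(cand: Candidate, event: list, cone: float = 0.4) -> float:
    """Toy PUPPI-style weight from the local shape variable alpha."""
    alpha = sum(o.pt / delta_r(cand, o) ** 2
                for o in event
                if o is not cand and 0.0 < delta_r(cand, o) < cone)
    # Illustrative squashing of alpha into [0, 1]; the scale 10.0 is arbitrary.
    return alpha / (alpha + 10.0)

def ht(jet_pts, threshold: float = 30.0) -> float:
    """Scalar sum of jet pT above threshold: the HT trigger quantity."""
    return sum(pt for pt in jet_pts if pt > threshold)
```

In a PF+PUPPI chain, charged candidates matched to the primary vertex (`from_pv`) would keep weight 1, while the toy weight above would be applied only to neutral or unmatched candidates before jets and HT are built.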
The project is currently carrying out single- and multi-board tests in various integration centres around the world. Larger-scale tests including detector interfaces are planned at CERN. A dedicated 25 Gb/s link protocol has been developed to achieve a low bit error rate and provide an error-recovery mechanism. During Run-3 of the LHC, starting in 2022, slices of the upgraded L1 trigger system will be installed to run in parasitic mode, allowing experience to be gained and the design to be validated with real collision data.
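The need for error detection and recovery on such links can be motivated with simple arithmetic: even at an excellent bit error rate (the 10⁻¹⁵ figure below is an assumed target, not a number from the text), a single 25 Gb/s link sees bit errors on a timescale of hours, so a system with thousands of links sees them essentially continuously.

```python
# Mean time between bit errors on one serial link.
LINE_RATE_BPS = 25e9   # 25 Gb/s line rate from the text
ASSUMED_BER = 1e-15    # illustrative target bit error rate (assumption)

seconds_between_errors = 1.0 / (LINE_RATE_BPS * ASSUMED_BER)
print(seconds_between_errors / 3600.0)  # ~11 hours per link
print(seconds_between_errors / 3000)   # ~13 s across ~3000 links
```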

Scrutinising further the data with the 40 MHz Scouting System
The concept of trigger scouting was introduced in CMS at the HLT. It is based on the use of physics objects reconstructed as a by-product of the triggering process to perform data reduction and analysis, storing only high-level information for selected events and thus overcoming the rate-to-storage limitations of the DAQ. The Level-1 scouting system will use Level-1 trigger objects and quantities in a similar way, selecting and analyzing them on the fly at the collision rate. This system has the additional advantage of allowing a systematic search for correlations among multiple contiguous bunch crossings, and can be used to scrutinize collision events and identify potential signatures unreachable through standard trigger selection. Figure 4 provides a functional diagram of this system. In addition to these features, a 40 MHz scouting system harvesting the trigger primitives produced by the sub-detectors and the trigger objects produced at the various levels of the trigger system is proposed.
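The idea of correlating contiguous bunch crossings can be pictured as a sliding window over a stream of per-crossing trigger objects. Everything in the sketch below (the window length, the muon-pT sum, the threshold) is an illustrative toy, not the actual scouting selection:

```python
from collections import deque

def scan_contiguous_bx(bx_stream, window=3, min_total_pt=100.0):
    """Toy scouting scan: flag every run of `window` contiguous bunch
    crossings whose summed muon pT exceeds a threshold.

    bx_stream yields (bx_id, [muon pT values]) pairs in crossing order.
    Returns the first bx_id of each flagged window.
    """
    buf = deque(maxlen=window)   # rolling buffer of (bx_id, summed pT)
    flagged = []
    for bx_id, muon_pts in bx_stream:
        buf.append((bx_id, sum(muon_pts)))
        if len(buf) == window and sum(pt for _, pt in buf) > min_total_pt:
            flagged.append(buf[0][0])
    return flagged

# A signature split across crossings 1-2 is invisible to any single-BX
# selection but shows up in the windowed sums.
stream = [(0, [10.0]), (1, [60.0]), (2, [50.0]), (3, [5.0])]
print(scan_contiguous_bx(stream))  # -> [0, 1]
```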
The scouting Decision System (sDS) and the scouting Global System (sGS) constitute a first and independent stage of the scouting, with relatively modest throughput requirements, providing vital diagnostic functionality for the global trigger (GT) as well as interesting physics capabilities. The regional barrel and endcap calorimeter triggers and the barrel, overlap and endcap muon track finders are all distinct and independent; each of them can therefore be included in the scouting system as needed. The capture of cluster and trigger-tower data from the endcap calorimeter has throughput requirements similar to those of the scouting Track System (sTS). Much larger throughput is expected from the lower stages of the scouting system. The system is equipped with DAQ and timing hub (DTH) boards designed for the readout of detector information [3].

Conclusion
The CMS experiment is proposing solid solutions to the trigger and data acquisition challenge imposed by the extreme HL-LHC running conditions. The Phase-2 Level-1 trigger upgrade project is constructing a flexible and modular architecture with enhanced capabilities that complies with the physics requirements. Sophisticated algorithms have been prototyped in FPGAs and their functionality demonstrated on the target hardware. The project has started its construction phase, and the second generation of hardware prototypes is under validation. Further testing is planned at the integration centres as well as during Run-3 of the LHC with live collision data.