Hardware and firmware developments for the upgrade of the ATLAS Level-1 Central Trigger Processor



JINST 9 C01035
The Central Trigger Processor
The ATLAS experiment [1] uses three levels of triggers to identify physics events of interest. The Level-1 trigger reduces the event rate from 40 MHz to 100 kHz using information from dedicated muon trigger detectors and from the calorimeters. It is a synchronous, pipelined system that operates at the LHC Bunch Crossing (BC) frequency of 40.08 MHz. Figure 1 shows the current ATLAS Level-1 trigger system. The final stage of the Level-1 trigger is the Central Trigger Processor (CTP), which receives electron/photon, tau hadron and jet multiplicities as well as transverse energy information from the calorimeter trigger processors, and muon multiplicities from the muon trigger. All this information is used to decide whether to accept or reject a given event and to generate the Level-1 Accept (L1A) signal, which initiates the readout. Additional trigger inputs coming from luminosity detectors, minimum bias scintillators and beam pick-ups are also sent to the CTP. The trigger decision is based on flexible logical combinations of trigger inputs, known as trigger items, which make up the so-called trigger menu. The trigger, timing and control (TTC) network is used for transmitting the timing signals received from the LHC and the L1A signal to the detector front-end. Additionally, the CTP generates trigger summary information that is sent to the software-based Level-2 trigger and the data acquisition (DAQ) system. The CTP also performs comprehensive on-line monitoring operations. A more in-depth description of the CTP can be found in [2].
As shown in figure 2, the CTP system is housed in a single 9U VME crate and consists of the following custom designed modules:
• CTP Monitoring (CTPMON): performs bunch-by-bunch monitoring of the trigger signals on the PIT backplane.
• CTP Core (CTPCORE): receives 160 trigger signals from the PIT backplane, combines them in an array of Look-Up Tables (LUTs) and a large ternary Content Addressable Memory (CAM) to form 256 trigger items that are individually pre-scaled and masked to generate the L1A signal. The CTPCORE also sends trigger summary information to the Level-2 trigger and the DAQ system.
• CTP Output (CTPOUT): four modules distribute the trigger and timing signals via 20 cables to the sub-detectors. They also receive busy signals and calibration requests.
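The CTPCORE decision chain in the list above (inputs combined into trigger items, items pre-scaled and masked, survivors combined into the L1A) can be pictured with a small behavioural model in Python. This is only a sketch: the `TriggerItem` class, the counter-based prescaler and the example conditions are assumptions made for illustration and do not describe the actual LUT/CAM firmware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TriggerItem:
    """Hypothetical trigger item: a logical condition on the trigger inputs."""
    name: str
    condition: Callable[[Dict[str, int]], bool]
    prescale: int = 1      # keep 1 out of every `prescale` occurrences (assumed scheme)
    enabled: bool = True   # mask bit: a disabled item never contributes to the L1A
    _count: int = field(default=0, repr=False)

    def fires(self, inputs: Dict[str, int]) -> bool:
        """Return True if the item passes condition, mask and prescale."""
        if not self.enabled or not self.condition(inputs):
            return False
        self._count += 1
        if self._count >= self.prescale:
            self._count = 0
            return True
        return False


def level1_accept(items: List[TriggerItem], inputs: Dict[str, int]) -> bool:
    """L1A is modelled as the OR of all items surviving prescale and mask."""
    # Evaluate every item (no short-circuit) so that all prescale counters advance.
    decisions = [item.fires(inputs) for item in items]
    return any(decisions)
```

In this model an item with `prescale=2` contributes to the L1A on every second bunch crossing in which its condition is met, while a masked item never contributes.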
During the shutdown of 2013/2014 the Level-1 trigger system is being upgraded to cope with the increased luminosity of the LHC. The introduction of a Level-1 Topological Processor (L1Topo) [3] and resource limitations of the existing system make an upgrade of the CTP necessary. This involves a new design of the CTPCORE board and the replacement of the CTPOUT board and the COM backplane. In the following, we focus on the upgrade of the CTPCORE module.

CTPCORE+
The newly designed CTPCORE+ board will be capable of handling more than three times the original number of trigger inputs and twice the number of trigger items. Furthermore, the CTPCORE+ will implement three partitions for generating independent L1A signals: a primary one for physics running and two secondary partitions for concurrent operation of different ATLAS sub-detectors, for commissioning or calibration purposes. A more in-depth analysis of the main modifications can be found in [4].
The CTPCORE+ is a 9U VME board that hosts two large Xilinx Virtex-7 FPGAs [5] that implement the primary functionality and an auxiliary Xilinx Spartan-6 FPGA for interfacing to the VME bus. The Virtex-7 FPGAs used (XC7VX485T) provide 20 Multi-Gigabit Transceivers (MGTs), more than 480,000 logic cells and more than 1000 RAM blocks of 36 kbits each. A block diagram and a picture of the board are shown in figure 3.
The Trigger Path FPGA (TRG FPGA) implements all the latency-critical functionality. It performs logical combinations of the 320 trigger inputs received from the PIT bus and prescales the trigger items to generate the L1A signals and the associated trigger type. In addition to the trigger inputs from the PIT bus, the TRG FPGA can receive trigger information either through 96 electrical lines via 3 front panel connectors or via 12 optical serial links. High density optical receivers (Avago MiniPOD) are used to receive the 12 optical links over a single ribbon fiber. The PIT bus and the electrical interface will be used as the primary sources of trigger inputs, while the optical interfaces are planned to be used only in the context of future upgrades, latency permitting.
The TRG FPGA interfaces with a DDR3 memory module that can be used for injecting test patterns and for storing snapshot images of the trigger inputs received. Sixteen MGTs are used for sending detailed trigger information from the TRG FPGA to the Readout/Monitoring FPGA (RDT FPGA). Each of the links will operate at 6.4 Gbps; with the 64b66b encoding scheme [6] this gives a total payload throughput of 99.3 Gbps.
The RDT FPGA implements all the non-latency-critical functionality. Upon reception of the primary L1A signal, trigger summary information is transmitted to the Level-2 trigger and the DAQ system through two serial optical readout links operating at 2 Gbps, implementing the S-LINK protocol [7]. A GPS timing reference is received from an external card (CTRP) and used by the RDT FPGA for adding a precise time-stamp to each event.
A large part of the internal logic is dedicated to monitoring features. In particular, about 50% of the block RAM resources are used for building histograms of selected trigger items as a function of the bunch number. Two DDR3 memories are interfaced to the RDT FPGA and are used to store snapshots of the detailed trigger information. A MiniPOD transmitter module can be used for running loopback tests with the optical trigger inputs. Two Gigabit Ethernet (GbE) interfaces are connected to the RDT FPGA and are planned to be used in the future to overcome the VME bandwidth limitations, allowing faster data transfers to external monitoring computers. Finally, an XC6SLX45 Spartan-6 chip implements the VME interface and controller. This chip allows configuration and monitoring of the TRG and RDT FPGAs through the VME bus. The first CTPCORE+ has been produced and is currently being tested.
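The per-bunch histogramming of trigger items mentioned above can be sketched as a two-dimensional counter array indexed by item and bunch number. The Python model below is illustrative only: the class name, the counter layout and the figure of 3564 bunch-crossing slots per LHC orbit (a standard LHC number that is not stated in this text) are assumptions, not a description of the firmware.

```python
from typing import Iterable, List

# Assumption: number of bunch-crossing slots per LHC orbit (not stated in the text).
BUNCHES_PER_ORBIT = 3564


class BunchHistogram:
    """Hypothetical model of per-bunch counters for a set of monitored items."""

    def __init__(self, n_items: int) -> None:
        # One counter per (item, bunch slot) pair.
        self.counts: List[List[int]] = [
            [0] * BUNCHES_PER_ORBIT for _ in range(n_items)
        ]

    def fill(self, bcid: int, fired_items: Iterable[int]) -> None:
        """Increment the counters of all items that fired in bunch crossing `bcid`."""
        slot = bcid % BUNCHES_PER_ORBIT
        for item in fired_items:
            self.counts[item][slot] += 1

    def memory_bits(self, counter_width: int) -> int:
        """Storage needed if every counter is `counter_width` bits wide."""
        return len(self.counts) * BUNCHES_PER_ORBIT * counter_width
```

The `memory_bits` helper makes the scaling explicit: the storage grows linearly with the number of monitored items and with the counter width, which is why such histograms can take up a large share of the block RAM.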

Demonstrator setup
The CTPCORE+ module uses new FPGA chips that have only recently entered full production. Given the complexity of the system and the novelty of some components, a demonstrator has been prepared for validating the hardware and for providing a platform for developing firmware and software. Two commercial evaluation boards (VC707) [5] from Xilinx have been used for this purpose. These boards have the following features:
• XC7VX485T Virtex-7 chip: same FPGA type as on the CTPCORE+ module in a different package.
A picture of the demonstrator setup is shown in figure 4. This setup has been used for validating some assumptions made during the CTPCORE+ board design. Two important aspects have been investigated:
• The power consumption of the XC7VX485T chip with different configurations.
• The feasibility and reliability of high-speed communication between the two FPGAs on the CTPCORE+ board.

Power consumption measurements
In order to properly select the DC/DC converters of the CTPCORE+ board, the current consumption on each of the voltage rails must be known. These values have been estimated using a Xilinx spreadsheet-based power estimation tool (XPE) [5] and measured on the VC707 board.
The XPE tool provides an estimation of the static and the dynamic power consumption of the chip, based on a set of configurable parameters such as:
• FPGA model and operating conditions;
• number of internal clocks and their frequencies;
• the percentage of logic and RAM blocks and their toggling rates;
• data rates and clocking scheme for the MGTs;
• external memory interface data rate and technology (DDR3, DDR2, etc.).
The tool reports the total power consumption of the chip as well as the current requirement for each supply voltage. Figure 5 shows an example of the output of the XPE tool.
The measurement of the chip power consumption was performed by accessing the internal registers of the DC/DC controllers (Texas Instruments UCD9248 [9]) installed on the VC707 board. These devices support the Power Management Bus (PMBus [10]) and allow monitoring of the actual current consumption and voltage levels. Figure 6 shows the output measured for one of the controllers. Similar monitoring functionality is foreseen on the CTPCORE+ board.
The values calculated by XPE are generally conservative, with the estimated values 5-20% higher than the measured ones. However, for one of the MGT supply rails XPE underestimates the power consumption by about 200%.
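Monitoring values read over PMBus, as described above, arrive as raw 16-bit words that must be converted to physical units. The Python sketch below decodes the two standard PMBus linear formats; it assumes that the controller reports currents in LINEAR11 and output voltages in LINEAR16 with the exponent taken from `VOUT_MODE`, and it leaves out the bus access itself. The function names are illustrative and the example words are made up, not measured data.

```python
# Standard PMBus command codes for the two quantities discussed in the text.
READ_VOUT = 0x8B
READ_IOUT = 0x8C


def _sign_extend(value: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_linear11(word: int) -> float:
    """Decode a PMBus LINEAR11 word: 5-bit signed exponent, 11-bit signed mantissa."""
    exponent = _sign_extend((word >> 11) & 0x1F, 5)
    mantissa = _sign_extend(word & 0x7FF, 11)
    return mantissa * 2.0 ** exponent


def decode_linear16(word: int, vout_mode: int) -> float:
    """Decode a PMBus LINEAR16 word using the 5-bit signed exponent in VOUT_MODE."""
    exponent = _sign_extend(vout_mode & 0x1F, 5)
    return (word & 0xFFFF) * 2.0 ** exponent
```

For instance, the word `0xF004` carries an exponent of -2 and a mantissa of 4 and therefore decodes to 1.0 (amperes, if it was returned by `READ_IOUT`).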

High-Speed Link tests
On the CTPCORE+ board, about 2300 bits of trigger summary information need to be transmitted every 25 ns from the TRG FPGA to the RDT FPGA, corresponding to a bandwidth of 92 Gbps. Sixteen MGTs operating at 6.4 Gbps will be used for this purpose.
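The bandwidth figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes that the only overhead on the links is the 64b66b line encoding (64 payload bits per 66 transmitted bits) and uses the nominal 25 ns bunch-crossing period quoted in the text; the function names are illustrative.

```python
def required_gbps(bits_per_bc: float, bc_period_ns: float = 25.0) -> float:
    """Bandwidth needed to ship `bits_per_bc` bits every bunch crossing (bits/ns = Gbps)."""
    return bits_per_bc / bc_period_ns


def payload_gbps(n_links: int, line_rate_gbps: float,
                 payload_bits: int = 64, coded_bits: int = 66) -> float:
    """Aggregate payload bandwidth of `n_links` links after line-encoding overhead."""
    return n_links * line_rate_gbps * payload_bits / coded_bits


need = required_gbps(2300)          # 92 Gbps
have = payload_gbps(16, 6.4)        # about 99.3 Gbps
margin = have - need                # about 7.3 Gbps of headroom
```

Under these assumptions the sixteen links provide roughly 7 Gbps more payload bandwidth than the 92 Gbps required.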
In order to verify the feasibility of this approach, we connected the two evaluation boards through four FMC mezzanine cards (FMS-28 from Faster Technology) [11] and four high speed
A Xilinx on-chip MGT analysis tool, IBERT [5], was used for measuring the channel Bit Error Rate (BER) and for generating the bathtub curve. Sending a Pseudo Random Bit Sequence (PRBS-31) at 10 Gbps, we measured a BER better than 10⁻¹⁵. In addition, we used the internal measurement capability of the Virtex-7 chips to produce the bathtub curve shown in figure 7.
Given the good results obtained on the demonstrator system, we are confident that the transmission of data between the TRG and the RDT FPGAs on the CTPCORE+ module will work reliably, in particular since it operates at a lower baud rate (6.4 Gbps instead of 10 Gbps) and over a much shorter distance (~10 cm instead of 1 m).
The IBERT tool is also being used on the CTPCORE+ board to validate the on-board high speed links.

Firmware design
The CTPCORE+ upgrade required the development of new firmware modules and the redesign of the existing CTPCORE firmware in order to add new functionality. The demonstrator system was used for testing the firmware modules developed and for verifying their correctness without having to wait for the CTPCORE+ board. The main firmware blocks designed are discussed below.

Control and monitoring interface
Since the demonstrator setup is not VME based, a different control interface had to be used to emulate the VME bus interface available on the CTPCORE+ module. We decided to adopt Ethernet-based control and chose IPbus [12], a UDP-based protocol that can be used for accessing the internal registers of the FPGA. From the software side, the IPbus use model is similar to that of the VME bus, providing simple register read and write operations. IPbus is used by the CMS experiment for current upgrade projects and is being considered for future ATLAS upgrades. We adapted the IPbus firmware to the Virtex-7 architecture, which was not supported at the time we developed the firmware.

DDR3 memory controller
The CTPCORE+ uses DDR3 memories as playback and snapshot memories. The memories must be accessed internally from the FPGA logic as well as externally from the Control and Monitoring Interface. The Control and Monitoring Interface accesses the memory sporadically, requiring a low bandwidth, while the internal logic accesses the memory synchronously to the BC, demanding a constant and guaranteed bandwidth. To accommodate the different nature of the requests, an access scheduler has been designed that uses the standard external memory controller IP Core provided by Xilinx. The design has been tested on the demonstrator system, achieving a line rate of 1.6 Gbps. Careful use of dual clock FIFOs was required to handle the transfer of data between the memory, BC and control clock domains.
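The arbitration problem described above (a BC-synchronous stream that needs guaranteed bandwidth, plus sporadic control accesses) can be illustrated with a simple slot-based model. The Python sketch below is one possible scheme, not the scheduler actually implemented: the notion of a fixed number of memory transactions per BC, the class name and the request tags are all assumptions.

```python
from collections import deque
from typing import Deque, List


class AccessScheduler:
    """Hypothetical arbiter: streaming slots are reserved, control uses the leftovers."""

    def __init__(self, slots_per_bc: int, stream_slots: int) -> None:
        if stream_slots > slots_per_bc:
            raise ValueError("streaming demand exceeds the memory bandwidth")
        self.slots_per_bc = slots_per_bc    # transactions the memory can serve per BC
        self.stream_slots = stream_slots    # transactions the BC-synchronous logic needs
        self.control_queue: Deque[str] = deque()

    def request_control(self, tag: str) -> None:
        """Queue a sporadic access from the control and monitoring interface."""
        self.control_queue.append(tag)

    def tick(self) -> List[str]:
        """Return the accesses granted during one bunch crossing."""
        grants = ["stream"] * self.stream_slots          # always served first
        spare = self.slots_per_bc - self.stream_slots
        while spare > 0 and self.control_queue:
            grants.append(self.control_queue.popleft())  # fill the remaining slots
            spare -= 1
        return grants
```

In this model the streaming traffic never waits, while control requests are delayed until a spare slot is available, which matches the qualitative requirement of constant bandwidth for the internal logic and low bandwidth for the control path.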

Chip to chip communication protocol
The MGTs used for transmitting data from the TRG to the RDT FPGA must be operated at the lowest speed possible to minimize their power consumption. We chose the Xilinx proprietary Aurora64b66b protocol, which introduces only a minimal transmission overhead. The Aurora64b66b IP core takes care of setting up, synchronizing and verifying the status of multiple MGTs and can be configured for running at different baud rates with up to 16 MGTs per channel.
In order for the same firmware module to be used both on the demonstrator and on the CTPCORE+ board, we designed a solution, on top of the Aurora protocol, where groups of 4 MGTs are connected to a round-robin scheduler that sends and reconstructs data in the correct order. This allows the CTPCORE+ to be configured for running with 16 MGTs at 6.4 Gbps (99.3 Gbps total bandwidth) and the demonstrator system with 12 MGTs running at 10 Gbps (116 Gbps total bandwidth), satisfying the bandwidth requirements of the system in both cases. Furthermore, with this approach, the number of bits to be transferred per BC can be changed, easing future modifications of the firmware.
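The round-robin distribution over groups of MGTs described above can be modelled as striping a sequence of data words over a configurable number of lanes and reassembling them on the receiving side. The Python sketch below is a behavioural illustration only: each "lane" stands for one group of 4 MGTs, and the function names and the word-level granularity are assumptions, not details of the firmware.

```python
from typing import List, Sequence


def stripe(words: Sequence[int], n_lanes: int) -> List[List[int]]:
    """Distribute words over `n_lanes` lanes in round-robin order (transmit side)."""
    lanes: List[List[int]] = [[] for _ in range(n_lanes)]
    for index, word in enumerate(words):
        lanes[index % n_lanes].append(word)
    return lanes


def reassemble(lanes: Sequence[Sequence[int]]) -> List[int]:
    """Rebuild the original word order from round-robin striped lanes (receive side)."""
    n_lanes = len(lanes)
    total = sum(len(lane) for lane in lanes)
    # Word i was sent on lane i % n_lanes as that lane's (i // n_lanes)-th word.
    return [lanes[i % n_lanes][i // n_lanes] for i in range(total)]
```

Because the lane count is a parameter, the same logic covers four lanes (16 MGTs, as on the CTPCORE+) and three lanes (12 MGTs, as on the demonstrator), and it does not depend on the number of words sent per BC.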

Firmware validation
Using the newly designed and tested firmware blocks, we set up a demonstrator system implementing some of the functionality of the CTPCORE+ module. One of the boards is configured as the TRG FPGA, while the second one implements the RDT FPGA. Pre-loaded trigger inputs are read from the DDR3 memory on the first board and sent to the second board using the chip-to-chip protocol described above. On reception, data are stored in the DDR3 memory of the second board. Ethernet and IPbus are used for configuring the system and for writing and reading the content of the DDR3 memories. Figure 8 shows this setup.
The demonstrator system is fully operational and it represents an excellent starting point for the design and validation of the CTPCORE+ module firmware.

Summary
The existing CTP module is being upgraded in order to significantly increase the number of trigger inputs and trigger combinations, allowing additional flexibility for the trigger menu. For this purpose, a new CTPCORE board has been designed and the first prototype is currently being tested.
In parallel, various tests and measurements have been performed on a demonstrator system based on commercial evaluation boards. The power consumption of the FPGAs has been measured and the feasibility of high-speed communication between the two FPGAs has been demonstrated. New firmware has been designed for accessing the DDR3 memory, for using the high-speed links and for controlling the FPGAs. These blocks have been integrated in a system that emulates part of the CTPCORE+ board functionality.
The porting of the existing CTPCORE firmware to the new architecture is ongoing. The software is also being modified and extended to support the new hardware and the added functionality. Commissioning of the upgraded CTP is foreseen for the second half of 2014.

Figure 2. ATLAS Central Trigger Processor architecture and implementation.

Figure 7. Reconstructed bathtub curve of a 10 Gbps link at the receiver.

Figure 8. Setup used for validating the firmware.