FELIX: a PCIe based high-throughput approach for interfacing front-end and trigger electronics in the ATLAS Upgrade framework

The ATLAS Phase-I upgrade (2019) requires a Trigger and Data Acquisition (TDAQ) system able to trigger and record data from up to three times the nominal LHC instantaneous luminosity. The Front-End LInk eXchange (FELIX) system provides an infrastructure to achieve this in a scalable, detector agnostic and easily upgradeable way. It is a PC-based gateway, interfacing custom radiation tolerant optical links from front-end electronics, via PCIe Gen3 cards, to a commodity switched Ethernet or InfiniBand network. FELIX enables reducing custom electronics in favour of software running on commercial servers. The FELIX system, the design of the PCIe prototype card and the integration test results are presented in this paper.


Introduction
The ATLAS [1] Phase-I upgrade requires a Trigger and Data Acquisition (TDAQ) system able to trigger and record data from up to three times the nominal Large Hadron Collider (LHC) instantaneous luminosity. During the LHC Long Shutdown 2 (2019-2020), new ATLAS on-detector electronics for the Liquid Argon (LAr) Calorimeters and New Small Wheel (NSW) muon detectors will be installed. A new detector-independent readout architecture, named Front-End LInk eXchange (FELIX), will give these FrontEnd systems access to the TDAQ systems in a scalable, detector-agnostic and easily upgradeable way. On one side of the FELIX system, the GigaBit Transceiver (GBT) [2] architecture and protocol, developed by CERN, provide a high-speed (4.8 Gb/s) radiation-hard optical link for data transmission from the on-detector FrontEnd electronics. By means of time multiplexing, the GBT protocol provides up to 42 independent data links that share the same fibre. As shown in figure 1, FELIX receives and identifies the different information streams carried by these links, the so-called "e-links". Via a Field Programmable Gate Array (FPGA) based PCIe card, described in detail later in this paper, the GBT link data is funneled to the host PC memory. From there on, data packets are routed via a commercial switched network. In the opposite direction, FELIX receives packets from the network and forwards them to specific on-detector electronic modules. The GBT frame consists of a 4-bit header and 116 bits of data (payload). The header is used to align the frame at the receiver side, and it can be either DATA (0101) or IDLE (0110). Four of the 116 data bits are used for slow control and monitoring, based on the High-Level Data Link Control (HDLC) [3] protocol. By means of HDLC, FELIX will support the configuration of GBTx ASICs [4] via the 2-bit IC (Internal Control) slow control field.
The 2 bits of the EC (External Control) field, or any regular 2-bit e-link, can be used to communicate with the GBT-SCA (Slow Control Adapter) ASIC [5]. The SCA is used to control and monitor devices on the on-detector Front-End boards, and supports user interface ports such as I2C, SPI, JTAG and GPIO. Finally, a FELIX system has to handle the input from the Time, Trigger and Control (TTC) system [6], by recovering the LHC clock and forwarding the machine-synchronous trigger information. This information will be distributed to the on-detector electronics over low, fixed-latency GBT links, and also to new and upgraded off-detector first-level trigger systems. For the readout of the latter, a lightweight protocol with higher throughput (9.6 Gb/s) than the GBT protocol, the so-called "FULL mode", is envisaged. All functions described above are implemented in FPGAs mounted on PCIe cards, the so-called "FLX cards". As a system, FELIX consists of a PC running a Linux-based OS (SLC6), an Ethernet or InfiniBand Network Interface Card (NIC), and up to two FLX cards, as depicted in figure 2. The HiTech Global HTG-710 is used as an FLX demonstrator card. It is equipped with an 8-lane PCIe Gen3 (64 Gb/s) interface and with two CXP transceivers providing interfaces for 24 bidirectional optical links (max. 13.1 Gb/s). In addition, a custom mezzanine was designed to receive and decode the TTC clock and data information. The FLX demonstrator card firmware has also been ported to a Xilinx VC-709 evaluation board, which has the same type of FPGA and PCIe interface as the HTG-710 but fewer optical interfaces. This second card targets detector and trigger system test setups. As neither of these options completely matches the final requirements for the FLX card, a third board, known as the FLX-711 (figure 5), has been developed and adopted as a candidate prototype of the final FLX card. Drivers and software tools have been developed for control and monitoring, as well as benchmarking, of these boards.
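The GBT frame layout described above (4-bit header, 2 IC bits, 2 EC bits, and the rest of the 116-bit payload) can be illustrated with a minimal Python sketch. The bit ordering within the frame, the field name `rest`, and the helper names `pack_frame` and `unpack_frame` are assumptions made for illustration only; they are not taken from the GBT specification or from the FELIX firmware.

```python
from dataclasses import dataclass

FRAME_BITS = 120       # 4-bit header + 116-bit payload
HEADER_DATA = 0b0101   # header value for a DATA frame
HEADER_IDLE = 0b0110   # header value for an IDLE frame
REST_BITS = 112        # payload bits left after the 2 IC and 2 EC bits


@dataclass(frozen=True)
class GbtFrame:
    header: int  # HEADER_DATA or HEADER_IDLE
    ic: int      # 2-bit Internal Control field
    ec: int      # 2-bit External Control field
    rest: int    # remaining 112 payload bits, treated as opaque here


def pack_frame(frame: GbtFrame) -> int:
    """Pack a frame into a 120-bit integer.

    Assumed (hypothetical) ordering, most significant bits first:
    header | IC | EC | rest.
    """
    if frame.header not in (HEADER_DATA, HEADER_IDLE):
        raise ValueError("header must be DATA (0101) or IDLE (0110)")
    if not (0 <= frame.ic < 4 and 0 <= frame.ec < 4):
        raise ValueError("IC and EC are 2-bit fields")
    if not 0 <= frame.rest < (1 << REST_BITS):
        raise ValueError("rest of payload must fit in 112 bits")
    top = (frame.header << 4) | (frame.ic << 2) | frame.ec
    return (top << REST_BITS) | frame.rest


def unpack_frame(word: int) -> GbtFrame:
    """Split a 120-bit integer back into its fields, checking the header."""
    if not 0 <= word < (1 << FRAME_BITS):
        raise ValueError("frame must fit in 120 bits")
    header = word >> (FRAME_BITS - 4)
    if header not in (HEADER_DATA, HEADER_IDLE):
        raise ValueError("unknown header: frame is not aligned")
    ic = (word >> (REST_BITS + 2)) & 0b11
    ec = (word >> REST_BITS) & 0b11
    rest = word & ((1 << REST_BITS) - 1)
    return GbtFrame(header, ic, ec, rest)
```

A receiver that has not yet found the frame boundary would see header values other than 0101 or 0110, which is why `unpack_frame` rejects them rather than returning a frame.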
Data routing and the connection to the COTS (Commercial Off-The-Shelf) network are implemented in a software pipeline running on the FELIX host PC. The packet processing performance satisfies the requirements of FELIX [7]. As the FELIX data handling and software aspects have been extensively discussed in previous documents [7,8], this paper focuses on the progress of the FELIX card prototype (FLX-711) design and testing. An update on the FELIX firmware development and the integration test results of FELIX with FrontEnds are also presented.

[Figure: features of the BNL-711, the FELIX baseline hardware platform (PCIe Gen3 x16 FPGA board).]

FELIX card prototype
The FLX-711 was developed at BNL and also serves as the DAQ platform for the LAr Trigger Digitizer Board (LTDB). Its 48 optical links can run at up to 14 Gb/s, a limit set by the MiniPODs. An ADN2814 [11] is used to recover the 160 MHz LHC TTC clock and data. An on-board jitter cleaner chip then cleans the TTC clock and provides a clean reference clock [12] for the transceivers. As this board is also used as part of the test setup for the LTDB in the ATLAS LAr Phase-I upgrade [13], where buffering of 320 channels of Analog-to-Digital Converter (ADC) data is required, two DDR4 small outline dual in-line memory modules (total capacity 16 GB) have been added on-board. Lastly, a micro-controller (ATMEGA256A) is used to program the FPGA from selectable bitfiles stored in a flash memory [14]. Software on the PC communicates with the micro-controller and controls the reconfiguration process via the System Management Bus.

Figure 4 shows the complexity of the Printed Circuit Board (PCB) stackup. Due to the board size limit, the FLX-711 uses two types of blind vias to achieve the complete routing: one for the high-speed signals connected to the MiniPODs, the other for the dense DDR4 traces. Because of this layer stackup, three laminations are required to manufacture the PCB. The 1078LR, 1078MR and 1078HR are different prepreg constructions of Megtron6 material [15].

JINST 11 C12023

The first version of the FLX-711 board is shown in figure 5. All of the hardware features have been successfully tested. To test the PCIe interface, two Wupper [16] Direct Memory Access (DMA) engines are implemented in the FPGA, and counter data is used to test the throughput to the server. The total measured throughput of the two 8-lane PCIe endpoints reaches 101.7 Gb/s, in agreement with the PCIe specifications. To test the optical links, the IBERT [17] from Xilinx is used to perform Bit Error Rate (BER) testing at a link speed of 12.8 Gb/s on all 48 optical fibre links, each connected in loopback. The results show a BER smaller than 10^-15; an eye diagram of a typical channel is shown in figure 6. The remote reconfiguration interface has also been verified: the i2c-tools [18] can successfully communicate with the micro-controller through the System Management Bus. The micro-controller is in turn able to set the FPGA configuration pins to initiate the FPGA programming, loading the image from the target segment in the Flash memory. The highest two address pins of the Flash are controlled by the ATMEGA256A, thus splitting the Flash into 4 segments.

Details of the FELIX firmware are described in [8]. The main modules are the Wupper PCIe Engine, the Central Router for internal data multiplexing, the TTC decoder and the optimized GBT-FPGA core. A block diagram of the firmware for the FLX-711 is shown in figure 7. For 4 channels, the firmware occupies 22% of the LUTs in a KCU115 FPGA. The Central Router takes about 16% of the total and scales linearly with the number of channels. Since there are two PCIe endpoints, two distinct Wupper PCIe Engines are implemented, and half of the channels are connected to each engine.

As introduced earlier, a customized lightweight protocol, called "FULL mode", is defined for the links between FPGA-based FrontEnds and FELIX systems, with the goal of providing a higher maximum payload. Figure 8 shows a block diagram of both the FrontEnd and FELIX ends of a FULL mode link. The link speed is 9.6 Gb/s, but as the data is 8b/10b encoded, the maximum user payload is 7.68 Gb/s. The packet size is in units of 32 bits. As an upper limit estimate, eight channels, each with a maximum payload of 7.68 Gb/s, could be transferred within the PCIe Gen3 8-lane bandwidth (maximum 64 Gb/s). In the FLX-711 case, as the PCIe interface has 16 lanes, up to 16 channels can be supported.
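The upper limit estimate for FULL mode can be reproduced with a few lines of Python. This is only a sketch of the arithmetic quoted in the text: it uses the nominal PCIe Gen3 figure of 8 Gb/s per lane (implied by the 64 Gb/s quoted for 8 lanes) and ignores PCIe protocol overhead, so it gives an upper bound rather than a measured capacity. The function names are hypothetical. Rates are kept in integer Mb/s to avoid floating-point rounding.

```python
def payload_rate_mbps(line_rate_mbps: int) -> int:
    """User payload of an 8b/10b encoded link: 8 payload bits per 10 line bits."""
    return line_rate_mbps * 8 // 10


def max_full_mode_channels(pcie_lanes: int,
                           lane_rate_mbps: int = 8000,
                           line_rate_mbps: int = 9600) -> int:
    """Number of fully loaded FULL mode channels that fit in the nominal
    PCIe bandwidth (upper bound, protocol overhead not included)."""
    pcie_bandwidth_mbps = pcie_lanes * lane_rate_mbps
    return pcie_bandwidth_mbps // payload_rate_mbps(line_rate_mbps)


print(payload_rate_mbps(9600))      # 7680 Mb/s, i.e. 7.68 Gb/s
print(max_full_mode_channels(8))    # 8 channels in 64 Gb/s
print(max_full_mode_channels(16))   # 16 channels in 128 Gb/s
```

Eight channels at full payload carry 61.44 Gb/s, which is why they fit within the 64 Gb/s nominal bandwidth of an 8-lane interface while a ninth would not.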

Integration test results
FELIX will be used to interface several FrontEnds in the ATLAS Phase-I upgrade, such as the LAr calorimeter trigger electronics and the muon system New Small Wheel (NSW). For the Phase-II upgrade for the HL-LHC (High-Luminosity LHC), the plan is to adopt FELIX for interfacing all the FrontEnds. A series of integration tests with the LTDB and the Control and Readout ITk (Inner Tracker) Board (CaRIBOu) has been carried out in the course of this year and is summarized in the following paragraphs.

Integration with LTDB
In the LAr Phase-I upgrade, the LAr Trigger Digitizer Board (LTDB) is used to digitize the input analog signals and transmit them to the back-end [13]. The LTDB prototype carries five GBTx and five GBT-SCA chips. The GBT-SCA chips are used to control the power and the I2C slaves, and to perform the on-board temperature measurement. Besides interfacing to the GBT-SCA chip over the EC link, each GBTx on the LTDB provides the recovered 40 MHz TTC clock to the NEVIS ADC ASICs [19] and the LOCx2 serializers [20], and sends the BCR (Bunch Crossing Reset) signal to the LOCx2. Both the FLX-709 and the FLX-711 have been used to demonstrate the interfacing to the LTDB. Test results show that the LTDB works according to specifications and can cope with the recovered clock and BCR signal sent over the FELIX GBT links. Communication with the GBTx and GBT-SCA on the LTDB can also be performed using the IC and EC bits in the bidirectional GBT frame.

Integration with CaRIBOu system
CaRIBOu is a modular test system for silicon sensor research and development for the ATLAS upgrade [21]. It consists of several boards: the control and readout board, the Xilinx ZC-706 evaluation kit, and several Front-End chip carrier boards. The FELIX demonstrator VC-709 has been used as the back-end to interface to CaRIBOu systems. The integration beam test was successfully carried out at CERN in August 2016.
A block diagram of the test setup is shown in figure 9a. The pixel interface board outputs the system clock and the commands to all of the FE-I4 [22] boards via RJ-45 connectors. Commands are encoded in the appropriate FE-I4 format. One Ethernet cable carries the clock and the commands to a VC-709, which emulates the function of an LTI (Local Trigger Interface) board. Firmware on this board decodes the FE-I4 command format and extracts the TTC signals, such as the trigger, BCR and Event Counter Reset (ECR) commands. These TTC signals are then sent to FELIX, which distributes them to the FrontEnd CaRIBOu system. The links from the LTI emulator to FELIX and between FELIX and CaRIBOu are all GBT links. Figure 9b shows the clock distribution of the system. An SI5324 [23] on the LTI emulator cleans the clock from the Ethernet cable and generates a synchronized reference clock for the GTH transceivers. FELIX recovers the system clock from one link of the LTI emulator, cleans it with an SI5324, and uses it both as its system clock and as a reference clock for the other transceivers. Similarly, CaRIBOu recovers the system clock and uses it as the reference clock for the data and command transmission link with FELIX. This clock scheme guarantees that the system clocks of all the different boards are synchronized. The data sent from CaRIBOu to FELIX, and the commands and status information transferred between FELIX and CaRIBOu, are thus all synchronized to the same system clock.

The test shows that FELIX can be used to calibrate the CMOS sensor AMS180V4 [24] and the readout ASIC FE-I4B. The data from the FE-I4B is encoded inside the ZC-706, in the format specified by FELIX. One 8-bit e-link is used for the data transmission. The whole system works according to specifications, and showed no glitches in runs lasting more than 12 hours. The software provided by FELIX can stream data continuously to disk.
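To put the single 8-bit e-link into perspective, its raw bandwidth can be estimated from the GBT figures quoted in the introduction. The sketch below assumes that an n-bit e-link occupies n bits of every 120-bit GBT frame; the function names are hypothetical, and the result is an upper bound on the raw bit rate that ignores any e-link encoding overhead or idle time.

```python
GBT_LINE_RATE_BPS = 4_800_000_000  # 4.8 Gb/s GBT link
GBT_FRAME_BITS = 120               # 4-bit header + 116-bit payload


def frame_rate_hz() -> int:
    """GBT frames per second: one frame per 25 ns, i.e. 40 MHz."""
    return GBT_LINE_RATE_BPS // GBT_FRAME_BITS


def elink_rate_bps(width_bits: int) -> int:
    """Raw bit rate of an e-link that takes `width_bits` bits of every frame."""
    return width_bits * frame_rate_hz()


def max_bytes_in_run(width_bits: int, hours: int) -> int:
    """Upper bound on the raw data volume one e-link can deliver in a run."""
    return elink_rate_bps(width_bits) * hours * 3600 // 8


print(frame_rate_hz())          # 40000000
print(elink_rate_bps(8))        # 320000000, i.e. 320 Mb/s
print(max_bytes_in_run(8, 12))  # 1728000000000, about 1.7 TB
```

The 40 MHz frame rate obtained here matches the 40 MHz TTC clock recovered by the GBTx, which is consistent with one GBT frame per LHC bunch crossing.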