Prototype readout electronics for the upgraded ALICE Inner Tracking System

The ALICE Collaboration is preparing a major upgrade to the experimental apparatus. A key element of the upgrade is the construction of a new silicon-based Inner Tracking System containing 12 Gpixels in an area of 10 m². Its readout system consists of 192 readout units that control the pixel sensors and the power units, and deliver the sensor data to the counting room. A prototype readout board has been designed to test the interface between the sensor modules and the readout electronics, the signal integrity and reliability of data transfer, the interface to the ALICE DAQ and trigger, and the susceptibility of the system to the expected radiation level.


Introduction
ALICE (A Large Ion Collider Experiment) is an experiment designed to study the properties of the Quark-Gluon Plasma (QGP) using proton-proton, proton-nucleus and nucleus-nucleus collisions at the CERN Large Hadron Collider (LHC). The ALICE Collaboration is preparing a major upgrade to the experimental apparatus, planned for installation during the second long LHC shutdown in the years 2019-2020. A key element of the ALICE upgrade is the construction of an ultra-light, 7-layer, high-resolution Inner Tracking System (ITS) [1]. The ITS is the innermost barrel detector of the ALICE apparatus (figure 1), closest to the interaction point. Its main functions are tracking charged particles in a magnetic field for momentum measurements, and reconstructing primary interaction vertices (generated directly in the primary beam collisions) and secondary interaction vertices (from the decay of short-lived particles).

Detector design
The upgraded ITS, shown in figure 2, comprises 7 cylindrical layers. The Inner Barrel consists of the three innermost layers, while the Middle and Outer Barrels contain the rest. The ITS layers are azimuthally segmented in units called staves, which are mechanically independent. The staves within each group of layers share a common design and are fixed to the endcap wheels, which serve as precision supporting structures. Cooling pipes and cabling enter from only one side of the detector, simplifying its maintenance.

Pixel sensors and staves
The basic sensor units of the upgraded Inner Tracking System detector are Monolithic Active Pixel Sensor ASICs [2,3] specifically designed for this experiment and implemented in 0.18 µm TowerJazz CMOS Imaging Technology [4]. Each sensor measures 30 mm × 15 mm and is thinned down to 50 µm on the inner layers and 100 µm on the outer ones. The sensors are equipped with a high-speed serial data interface and a bidirectional control bus for configuration and monitoring. There are a total of 24,120 sensors in the ITS, creating a detection surface of 10 m² which is segmented into 12.5 billion pixels. Each Inner Barrel Module (figure 3) is equipped with 9 pixel sensors which share common clock and control signals. Each of the sensors has its own high-speed serial output running at 1.2 Gbps. The pixel sensors are glued and wire-bonded to the Flex Printed Circuit (FPC), constituting a module. The module is mounted on a carbon fibre space frame, creating a stave. The space frame provides mechanical fixing and cooling. There are 48 staves within layers 0, 1 and 2, which use 432 pixel chips in total. The length of the Inner Barrel Module is approximately 30 cm.
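The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers given in the text; the derived values (aggregate output rate per Inner Barrel Module, average pixel count per sensor, total sensor area) are back-of-the-envelope estimates rather than specifications, and all variable names are illustrative.

```python
# Quantities quoted in the text.
IB_STAVES = 48               # staves in layers 0, 1 and 2
SENSORS_PER_IB_MODULE = 9    # pixel sensors per Inner Barrel Module
LINK_RATE_GBPS = 1.2         # serial output rate of each sensor
TOTAL_SENSORS = 24_120       # sensors in the whole ITS
TOTAL_PIXELS = 12.5e9        # quoted number of pixels
SENSOR_AREA_CM2 = 3.0 * 1.5  # 30 mm x 15 mm

# Derived quantities (estimates only).
ib_sensors = IB_STAVES * SENSORS_PER_IB_MODULE
module_rate_gbps = SENSORS_PER_IB_MODULE * LINK_RATE_GBPS
pixels_per_sensor = TOTAL_PIXELS / TOTAL_SENSORS
total_area_m2 = TOTAL_SENSORS * SENSOR_AREA_CM2 / 1e4

print(f"Inner Barrel sensors:          {ib_sensors}")
print(f"Output rate per IB module:     {module_rate_gbps:.1f} Gbps")
print(f"Average pixels per sensor:     {pixels_per_sensor:.0f}")
print(f"Total sensor area:             {total_area_m2:.2f} m^2")
```

The product of 48 staves and 9 sensors reproduces the 432 chips quoted for the Inner Barrel, and the summed sensor area comes out close to the quoted 10 m².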

Readout System architecture
The ITS Readout System, illustrated in figure 5, is composed of 192 readout units (RU) located about 5 m from the detector's edge. The RUs configure and control the pixel sensors, receive and assemble data, and manage the Power Units. Pixel sensors are controlled and monitored via 624 bidirectional differential lines; the clock signals are distributed via 624 lines, and the sensor data are read out via 3816 differential high-speed lines. These lines are made of segments of microstrips on the FPCs and of micro-twinax cables from the detector's edges to the Readout System. Between the detector and the Readout System, the interconnection lines are grouped into micro-twinax ribbon cables capable of carrying 12 differential pairs. The Readout System interfaces with the ALICE Online and Offline Computing System (ALICE O²) [5] and the Central Trigger Processor via GBT optical links [6]. Data are sent out to the counting room using up to 576 unidirectional optical links. Control data and trigger signals are received via two sets of 192 optical links. Each Readout Unit has one bidirectional optical control link, one downstream trigger link, and up to three upstream data links. The number of resources used per layer is summarized in table 1.

Table 1. Number of resources used per layer.

Layer  Staves  Control lines  Readout units  Control links  Data links  Trigger links
0      12      12             12             12             36          12
1      16      16             16             16             48          16
2      20      20             20             20             60          20
3      24      96             24             24             72          24
4      30      120            30             30             90          30
5      42      168            42             42             126         42
6      48      192            48             48             144         48
Total          624            192            192            576         192
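The totals quoted in the text can be reproduced from the stave counts per layer. The sketch below is only a consistency check: the per-stave multiplicities (one control line per stave in layers 0 to 2 and four in layers 3 to 6, one readout unit per stave, up to three data links per readout unit) are inferred from the rows of table 1 and from the link counts in the text, and the function and key names are hypothetical.

```python
# Staves per layer (layers 0 to 6), as listed in table 1.
STAVES_PER_LAYER = [12, 16, 20, 24, 30, 42, 48]
INNER_BARREL_LAYERS = 3  # layers 0, 1 and 2


def layer_resources(layer, staves):
    """Resources for one layer; multiplicities are inferred, not specified."""
    control_lines_per_stave = 1 if layer < INNER_BARREL_LAYERS else 4
    return {
        "control_lines": staves * control_lines_per_stave,
        "readout_units": staves,     # assumed: one readout unit per stave
        "control_links": staves,     # one bidirectional control link per RU
        "data_links": 3 * staves,    # up to three data links per RU
        "trigger_links": staves,     # one trigger link per RU
    }


totals = {}
for layer, staves in enumerate(STAVES_PER_LAYER):
    for name, value in layer_resources(layer, staves).items():
        totals[name] = totals.get(name, 0) + value

for name, value in totals.items():
    print(f"{name:14s} {value}")
```

The sums agree with the 624 control lines, 192 readout units, 576 data links and two sets of 192 control and trigger links stated above.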

The ITS Prototype Readout Unit
The Prototype Readout Unit, shown in figure 6, was designed to address many different R&D activities related to the development of the ITS Readout System. Its versatile architecture allows for testing and verification of the interface between a single sensor or a sensor module and the readout electronics, as well as signal integrity and data transfer reliability over 5 m long twinax cables. It likewise allows the interfaces to the power units and the ALICE O² Computing System to be tested, along with the triggering capabilities. It is also used during beam tests, where the FPGA's performance is evaluated and methods for mitigating radiation effects are developed and characterized. The ITS RU v0a is a 10-layer board in the 6U VME form factor.
A Xilinx Kintex-7 325T FPGA is used to configure the pixel sensors and receive their data, to communicate with a PC, and to control the power unit. It also operates the GBTx chip and an optical SFP module. Multiple FPGA configuration schemes have been foreseen to support both the initial programming and scrubbing. The FPGA can be configured by a JTAG programmer, by a GBT-SCA [7] JTAG master controller, or via a flash-based memory. Shunt resistors at the outputs of the power converters, with easily accessible terminals, facilitate power monitoring and characterization. To provide communication with a PC or other DAQ systems, the board is equipped with the Cypress FX3 USB 3.0 controller and an SFP optical transceiver module. A versatile clock tree allows the FPGA to operate using either a local 160 MHz clock, an LHC 40 MHz machine clock received by the GBTx chip, or an externally provided clock. The prototype readout unit provides a dedicated interface to control the ITS Power Unit. An FMC slot is used to interface with a GBTx-FMC board [8] that allows for communication with a DAQ or trigger system via the versatile GBT link. Most of the unused transceivers and clock inputs can be accessed from outside to provide more flexibility. The adjustable power supply section generates and monitors the voltages for the pixel sensor modules. With the many termination topologies provided on the ITS RU v0a, different connection schemes of the pixel sensor module can be explored.

Radiation testing
The Prototype Readout Unit was used as a platform for radiation susceptibility testing. Both the Xilinx Kintex-7 325T FPGA and the GBTx chip were irradiated. The irradiation experiments were conducted at the isochronous cyclotron at the Nuclear Physics Institute of the Academy of Sciences of the Czech Republic in Rez, near Prague. The machine provides a proton beam with an energy range from 6 to 37 MeV. The available proton flux ranges from 10⁴ to 10¹⁴ p cm⁻² s⁻¹, over a uniform area of about 2.5 × 2.5 cm². A dedicated dosimetry system presented in [9] was used to scan the beam profile and monitor its intensity during the irradiation.
The expected radiation levels at the position where the ITS Readout System will be installed are a total ionizing dose (TID) of 10 krad (with a safety factor of 10), a 1 MeV neq fluence of 1.6 · 10¹¹ p cm⁻², and a flux of about 10³ cm⁻² s⁻¹ of hadrons energetic enough (>20 MeV) to induce an SEU¹ in the readout electronics.
During the irradiation campaign, the FPGA's CRAM² cross-section was measured to be 3.07 · 10⁻¹⁵ cm² bit⁻¹ and compared with existing results [10] to verify that the beam dosimetry was correct. Single-event latchup is not expected in the radiation field foreseen for the RU, as shown for example by the studies in [11].
Reconfiguration methods, including the use of the Xilinx Soft Error Mitigation IP [12] and custom active partial reconfiguration, were studied. Logic firmware blocks were tested and their susceptibility to radiation induced SEU effects was evaluated. The results shown in this study and follow-up experiments will be used to determine the expected error rates of the ITS Readout System and to determine the necessary mitigation methods.
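To give a feeling for how the measured cross-section translates into an error rate, the sketch below evaluates the first-order relation rate = σ_bit × flux × N_bits. It rests on loudly stated assumptions: the number of configuration bits `ASSUMED_CRAM_BITS` is a placeholder that does not come from the text, and the proton cross-section measured in the beam is applied unchanged to the expected >20 MeV hadron flux. The result is an order-of-magnitude illustration, not a prediction for the final system.

```python
# Values quoted in the text.
SIGMA_BIT_CM2 = 3.07e-15   # measured CRAM cross-section [cm^2 / bit]
HADRON_FLUX = 1e3          # expected >20 MeV hadron flux [cm^-2 s^-1]
N_READOUT_UNITS = 192

# ASSUMPTION: placeholder number of configuration bits per FPGA.
ASSUMED_CRAM_BITS = 75e6


def upset_rate_per_s(sigma_bit_cm2, flux, n_bits):
    """First-order configuration-memory upset rate for one device."""
    return sigma_bit_cm2 * flux * n_bits


rate_per_fpga = upset_rate_per_s(SIGMA_BIT_CM2, HADRON_FLUX, ASSUMED_CRAM_BITS)
rate_system = rate_per_fpga * N_READOUT_UNITS

print(f"Upsets per FPGA per second:   {rate_per_fpga:.3e}")
print(f"Mean time between upsets:     {1.0 / rate_per_fpga / 3600:.2f} h per FPGA")
print(f"Upsets per second, 192 RUs:   {rate_system:.3e}")
```

Only a fraction of configuration upsets affects the logic actually in use, which is why the functional cross-sections of concrete firmware blocks have to be measured separately.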

The testing firmware
In order to study the effectiveness of different techniques for mitigating radiation effects, a firmware design based on the report in [13] was implemented. The firmware consists of test structures called lanes (figure 7) and their readouts. Each lane consists of a hardened pattern generator, a logic test structure (STEP), and a hardened pattern checker. The pattern generator generates 6-bit test vectors (from 0 to 63). The logic test structure is replicated 64 times, forming an array which shifts the test vectors. It consists of a hard-coded LUT transfer function (combinatorial logic) and an output register.
To test different redundancy schemes, it is possible to replicate either the combinatorial logic, the output register, or both. As many FPGA resources as possible were utilized in order to maximize the area susceptible to radiation. Depending on the selected redundancy scheme, the firmware was compiled with 256, 160, 128 or 64 lanes. The pattern checker compares the output from the array of logic test structures with the output from the pattern generator. If a discrepancy is found, an error pulse is generated and an error counter is incremented. During the test, the values stored in the error counters are periodically read out via the USB 3.0 interface and saved for further analysis. Tests were carried out both in the beam test and by fault injection in the lab environment using the Xilinx Soft Error Mitigation IP.
¹ Single Event Upset. ² Configuration Random Access Memory.
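The behaviour of a single lane can be illustrated with a small software model. This is a sketch under stated assumptions, not the authors' firmware: the LUT transfer function of every STEP is taken to be the identity, the checker is modelled as comparing the array output with a copy of the generator output delayed by 64 clock cycles, and the class and method names are hypothetical. A configuration upset is emulated by corrupting one LUT entry.

```python
from collections import deque

N_STEPS = 64    # STEP structures per lane
N_VALUES = 64   # 6-bit test vectors, 0..63


class Lane:
    def __init__(self):
        # ASSUMPTION: every STEP implements an identity LUT.
        self.luts = [list(range(N_VALUES)) for _ in range(N_STEPS)]
        self.regs = [0] * N_STEPS            # STEP output registers
        self.counter = 0                     # pattern generator state
        self.expected = deque([0] * N_STEPS) # delayed generator output
        self.errors = 0                      # error counter

    def clock(self):
        vector = self.counter
        self.counter = (self.counter + 1) % N_VALUES
        # Pattern checker: compare array output with the delayed vector.
        if self.regs[-1] != self.expected.popleft():
            self.errors += 1
        self.expected.append(vector)
        # All registers update at once: each STEP latches LUT(previous stage).
        inputs = [vector] + self.regs[:-1]
        self.regs = [lut[x] for lut, x in zip(self.luts, inputs)]

    def run(self, cycles):
        for _ in range(cycles):
            self.clock()


lane = Lane()
lane.run(128)
print("errors before injection:", lane.errors)

lane.luts[10][5] ^= 1   # emulate an upset: STEP 10 now maps 5 to 4
lane.run(640)
print("errors after injection: ", lane.errors)
```

In this model the corrupted LUT entry is exercised once per pass of the 64 test vectors, so the error counter keeps growing until the configuration is repaired, which mirrors the way the error counters are read out periodically during a test.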
The testing firmware was compiled in four different variants. In the first one, nothing was replicated. In each lane, 6-bit vectors generated by the pattern generator are shifted via the array of logic test structures. In the second variant, the output register of each STEP is triplicated and a voter selects the correct output data. In that scheme, the number of configuration bits required to configure the voter is approximately 3 times higher than the number of bits necessary to configure the triplicated output register. In the third variant, the combinatorial logic is triplicated, and the correct data are selected by voting before storing in the output register. In this case, the number of configuration bits required to configure the triplicated combinatorial logic is approximately 9 times higher than the number to configure the voter. In the last variant, both the combinatorial logic and output register were triplicated and voted.
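The voting used in the triplicated variants can be illustrated with a generic bitwise two-out-of-three majority function. The sketch below is a software stand-in for a voter, not the HDL used in the test firmware; the function name and the example vector are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three copies of a word."""
    return (a & b) | (a & c) | (b & c)


golden = 0b101101              # a 6-bit test vector
upset_bit2 = golden ^ 0b000100 # copy with bit 2 flipped
upset_bit4 = golden ^ 0b010000 # copy with bit 4 flipped

# One corrupted copy is outvoted by the two good ones.
print(majority(golden, upset_bit2, golden) == golden)

# Two copies corrupted in different bits are still corrected bit by bit.
print(majority(upset_bit2, upset_bit4, golden) == golden)

# Two copies corrupted in the same bit defeat the voter.
print(majority(upset_bit2, upset_bit2, golden) == golden)
```

The voter itself is not protected in this picture: an upset in the configuration bits that implement it corrupts the output directly, which is why the relative number of configuration bits in the voter and in the protected circuit matters.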

Results
A comparison of the cross-sections obtained from the beam test and the fault injection test is presented in table 2. The results are comparable and show that fault injection can be used as an alternative method to estimate the susceptibility of the firmware to SEUs. For each test the FPGA was irradiated with a fluence from 0.89 · 10⁹ to 1.27 · 10⁹ p cm⁻² at a beam energy of 30 MeV measured on the surface of the FPGA. For the fault injection test, it was estimated that an equivalent fluence corresponds to injecting from 205 to 293 errors. Mitigation techniques based on refreshing the FPGA's configuration memory (scrubbing) were not employed in these tests. The cross-section per lane of the firmware in which only the STEP output register is triplicated is higher than that of the firmware in which nothing is triplicated. This is expected because the physical cross-section (number of used configuration bits) of the additional voter after the triplicated register is much higher than the cross-section of the triplicated register itself. The cross-section per lane of the scheme in which only the combinatorial logic is triplicated is lower than that of the firmware in which nothing is triplicated. This was anticipated because the physical cross-section of the triplicated combinatorial logic is much higher than that of the voter that selects the correct data. However, the firmware with that redundancy scheme occupies the biggest area per lane. The lowest cross-section per lane was obtained for the firmware where both the combinatorial logic and the output register were triplicated.
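The correspondence between fluence and the number of injected errors can be checked for internal consistency. The sketch below assumes, as a plausible reading that the text does not spell out, that the equivalent number of injections is N = σ_bit × fluence × N_bits, with σ_bit the measured CRAM cross-section of 3.07 · 10⁻¹⁵ cm² bit⁻¹. Under that assumption, both ends of the quoted range should imply the same number of configuration bits; the function name is hypothetical.

```python
SIGMA_BIT_CM2 = 3.07e-15  # measured CRAM cross-section [cm^2 / bit]

# (fluence [p cm^-2], equivalent injected errors) as quoted in the text.
QUOTED_POINTS = [
    (0.89e9, 205),
    (1.27e9, 293),
]


def implied_config_bits(fluence, injected_errors, sigma_bit=SIGMA_BIT_CM2):
    """Number of bits implied by N = sigma_bit * fluence * n_bits (assumed)."""
    return injected_errors / (sigma_bit * fluence)


implied = [implied_config_bits(f, n) for f, n in QUOTED_POINTS]
for (fluence, errors), bits in zip(QUOTED_POINTS, implied):
    print(f"fluence {fluence:.2e} p/cm^2, {errors} errors "
          f"-> about {bits / 1e6:.1f} Mbit")
```

Both points imply roughly the same configuration memory size, which suggests the quoted fluence and injection ranges are mutually consistent under this simple scaling.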
If a voter's physical cross-section is higher than the cross-section of the protected circuit, then an increase rather than a decrease of radiation susceptibility can be observed. Before a mitigation technique is applied, the structure to be protected must therefore be studied to find the most appropriate method to implement.