The proposed trigger-less Tbit/s readout for the Mu3e experiment

The Mu3e experiment searches for charged lepton flavor violation in the rare decay μ → eee with a projected sensitivity of 10^-16. Background suppression at rates of 10^9 muons/s requires a precise measurement of the decay product momenta, the decay vertex and the decay time. This is achieved by combining an ultra-lightweight pixel tracker based on HV-MAPS with two timing systems. The trigger-less readout of the detector through three stages of FPGA boards over multi-Gbit/s optical links into a GPU filter farm is presented. In this scheme, data from all sub-detectors are merged and distributed in time slices to the filter farm.


The Mu3e experiment
The Mu3e experiment [1] searches for lepton flavor violation in the decay µ → eee. This decay is possible in the Standard Model through neutrino mixing, but it is suppressed to unobservable levels (BR < 10^-50); any observed signal event would therefore be a clear sign of new physics. While the best previous experiment reached a branching ratio sensitivity of 10^-12 [2], the sensitivity goal of the Mu3e experiment is 10^-16. To reach this sensitivity, well over 10^16 muon decays have to be precisely reconstructed, which requires a decay rate of 10^9 muons/s and several years of running. To discriminate signal events from combinatorial and radiative background, a highly granular tracking detector with good spatial resolution and a precise timing system is mandatory. The tracking detector is realized as an ultra-thin pixel tracker based on high voltage monolithic active pixel sensors (HV-MAPS), composed of two vertex layers around the target, two central outer layers and further pairs of outer layers upstream and downstream of the central detector. The pixel tracking detectors are complemented by a scintillating fiber detector and a scintillating tile hodoscope, which deliver precise timing information for the particle tracks, see figure 1.

Figure 1. Mu3e detector shown with a signal event: the muon beam hits a fixed target in the center and the positive muon decays into two positrons (red) and one electron (blue). The vertex is determined by the two inner pixel tracker layers, which are surrounded by a fiber hodoscope for time measurement and two further pixel tracker layers for momentum measurement. In the forward and backward regions the momentum and time of the re-curling electrons and positrons are more precisely determined with two more tracking stations and scintillating tile detectors.

While the figure shows the Mu3e detector for phase I of the experiment, another pair of stations will be added in phase II. The track and timing information is read out trigger-less to a GPU-based online filter farm via three stages of FPGA-driven data acquisition boards and an optical link network.

Timing detectors
For the suppression of combinatorial background, it is important to precisely determine the time of each track. In the Mu3e experiment, two different timing detectors are foreseen, a scintillating fiber detector and a scintillating tile detector. The 36 cm long scintillating fiber detector is placed at 6 cm radius between the innermost layers of the silicon pixel tracker and the outer central pixel layers, see figure 1. This position is chosen to measure the time even of low-momentum electron and positron tracks, to determine the charge sign of re-curling particles and to match tracking points between the central and re-curl stations. As the fiber detector sits in the middle of the tracking system, the amount of material has to be kept small, so it is envisaged to use three to five layers of 250 µm thick double-clad scintillating fibers. Silicon photomultiplier arrays will be used for the photon detection. It has been demonstrated that a fiber hodoscope of this type can achieve time resolutions of 500 ps and efficiencies close to 100% [11]. The total number of scintillating fibers will be on the order of 4000. The very high single-cell hit rate of up to 5 MHz poses a challenge for the readout and requires newly developed ASICs.
The second timing detector is a scintillating tile detector [12, 19], which is placed at 6 cm radius in the forward and backward parts of the Mu3e detector and consists of four stations of 36 cm length each. The re-curling electrons and positrons going in the forward or backward direction pass another double layer of the silicon pixel tracker and then hit the scintillating tiles. Since the particles are stopped in or after the scintillating tile detector, the thickness of the tiles can be optimized for best time resolution. The size of one tile will be 7.5 × 7.5 × 5 mm^3 close to the target and 10 × 10 × 5 mm^3 in the far stations, resulting in a total number of 7200 tiles. Single silicon photomultipliers of 9 mm^2 active area together with tiles made of Bicron BC420 have shown close to 100% efficiency and time resolutions of less than 50 ps. Similar to the fiber detector, the single-cell hit rate is up to 3 MHz, assuming a decay rate of 2×10^9 muons/s on target.
For both the fiber detector and the tile detector, two alternative front-end electronics options are investigated. One option is a faster version of the DRS4 chip [14-17], a switched capacitor array sampling at 5 gigasamples per second (GSPS) with close to 12 bit resolution. The future DRS5 chip could offer 10 GSPS and allow for dead-time-less readout at event rates of 5 MHz. The DRS5 sampling chip would acquire the entire waveform around the hit and thus be able to detect and suppress pile-up. With the DRS4 chip, a timing accuracy of 40 ps has been achieved across several thousand channels in the MEG experiment [18].
The other option is a readout based on the STiC chip [9]. The STiC chip is a mixed-mode 64-channel chip for SiPM readout with high time resolution. It combines a 6-bit DAC for the voltage tuning of each SiPM with a dual-threshold determination of time and charge. A 16-channel version of the STiC chip has shown time resolutions of about 50 ps. The STiC chip could potentially be integrated directly at the silicon photomultiplier arrays of the scintillating fiber tracker. Independent of the choice of readout ASIC, the data rate per channel of the timing detectors will exceed 100 Mbit/s.
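The dual-threshold principle can be illustrated with a small sketch: a low threshold yields a precise crossing time with reduced time walk, while a high threshold validates the hit and its time over threshold serves as a coarse charge measure. The waveform handling and function names below are our own illustration, not the actual STiC logic, which operates on the analog pulse rather than on digital samples.

```python
def threshold_crossing(samples, dt_ns, threshold):
    """Linearly interpolated time of the first rising-edge crossing
    of `threshold`, or None if it is never reached."""
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            frac = (threshold - samples[i - 1]) / (samples[i] - samples[i - 1])
            return (i - 1 + frac) * dt_ns
    return None

def dual_threshold_hit(samples, dt_ns, low_thr, high_thr):
    """Hit time from the low threshold (small time walk), validity and
    coarse charge (time over threshold) from the high threshold."""
    t_low = threshold_crossing(samples, dt_ns, low_thr)
    t_high = threshold_crossing(samples, dt_ns, high_thr)
    if t_low is None or t_high is None:
        return None  # pulse too small: no valid hit
    above = [i for i, s in enumerate(samples) if s >= high_thr]
    tot_ns = (above[-1] - above[0]) * dt_ns
    return {"time_ns": t_low, "tot_ns": tot_ns}
```

For a triangular test pulse sampled at 0.2 ns, the low-threshold crossing fixes the hit time while the high-threshold time over threshold scales with the pulse amplitude.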

Silicon pixel tracker
As the decay products have very low momenta of only 53 MeV/c and less, the track resolution is dominated by multiple Coulomb scattering. The tracking detector therefore has to be very thin, while the pixel size of 80 µm × 80 µm can be relatively large. By combining a Kapton foil support structure with high voltage monolithic active pixel sensors (HV-MAPS) [3, 5] thinned to 50 µm, the thickness of one layer can be kept below 0.1% of a radiation length. The innermost part of the silicon pixel tracker consists of two cylindrical detector layers of 12 cm length at approximately 2 cm and 3 cm radius, equipped with 180 inner pixel sensors of 1×2 cm^2 size. With the help of these inner layers, the decay point or vertex can be measured with a precision of about 200 µm, which, together with the relatively large target surface, helps to separate decay products from different muons. The next two pixel detector layers are 36 cm long and mounted at radii of 7.3 cm and 8.5 cm. Together with the 1 T solenoid field, these layers provide a momentum measurement, which is necessary to suppress background from the radiative decay µ → eeeνν. To further enhance the precision of the momentum measurement, the outer two layers are repeated upstream and downstream of the central tracker. In these stations, hits from the re-curling electrons and positrons can be used to determine their momenta with a resolution better than 0.5 MeV/c. While only two re-curl stations are foreseen in phase I of the experiment, another two stations will be added for phase II in order to reach a sensitivity of 10^-16. All five stations of outer double layers are built from 4680 outer sensors of 2×2 cm^2, leading to a total number of about 280 million pixels.
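The re-curl geometry follows from the bending of low-momentum tracks in the 1 T field: with p_T [GeV/c] ≈ 0.3 · B[T] · r[m], a 53 MeV/c track has a bending radius of roughly 18 cm, so it curls back within the detector and reaches the upstream or downstream stations. A minimal sketch of this standard relation (the helper name is ours):

```python
def bending_radius_cm(p_t_mev, b_tesla=1.0):
    """Transverse bending radius r = p_T / (0.2998 * B) of a charged
    track, with p_T converted from MeV/c to GeV/c; result in cm."""
    return (p_t_mev / 1000.0) / (0.2998 * b_tesla) * 100.0
```

The maximum signal momentum of 53 MeV/c thus corresponds to a track diameter of about 35 cm, comparable to the 36 cm length of the central tracker layers.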
Figure 2. Schematic cross section of a HV-MAPS cell. HV is applied between the p-substrate and the n-well. A charge signal is sensed after an ionizing particle passes the thin depletion layer. The integrated CMOS readout electronics are protected by the deep n-well.

HV-MAPS
High voltage monolithic active pixel sensors (HV-MAPS) [3-5, 7, 8] combine the advantages of fast hybrid pixel detectors and thin monolithic active pixel sensors. Commercially available HV-CMOS technology allows the fabrication of pixel sensors running at high voltage together with analog and digital readout electronics on one sensor chip. The high voltage of typically 60 V across a depletion layer of only 9 µm leads to a high electric field and thus to fast charge collection via drift. The analog and digital readout circuits are designed in a deep n-well which is implanted into the p-substrate and thereby protected against the high voltage, see figure 2. While the first stages of pre-amplification are integrated into each pixel cell of 80×80 µm^2 size, the comparator and the digital readout are placed at the chip periphery in order to minimize digital cross talk. In contrast to other monolithic pixel sensors, the HV-MAPS adds a frame time stamp with 50 ns resolution to each hit. The data from the pixel cells are zero-suppressed and output over high-speed differential serial links running at 800 Mbit/s.
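As an illustration of what a zero-suppressed hit could look like on the serial link, the sketch below packs a pixel address and the 50 ns frame time stamp into one word. The bit widths are assumptions chosen for illustration, not the actual chip data format:

```python
def pack_hit(col, row, frame_ts, col_bits=8, row_bits=8, ts_bits=8):
    """Pack one zero-suppressed hit (column, row, frame time stamp in
    units of 50 ns) into a single word. Bit widths are illustrative."""
    assert col < (1 << col_bits) and row < (1 << row_bits)
    assert frame_ts < (1 << ts_bits)
    return (col << (row_bits + ts_bits)) | (row << ts_bits) | frame_ts

def unpack_hit(word, col_bits=8, row_bits=8, ts_bits=8):
    """Inverse of pack_hit: recover (col, row, frame_ts)."""
    frame_ts = word & ((1 << ts_bits) - 1)
    row = (word >> ts_bits) & ((1 << row_bits) - 1)
    col = word >> (row_bits + ts_bits)
    return col, row, frame_ts
```

Only hit pixels produce such words, which is what keeps the output bandwidth manageable at high rates.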

Pixel sensor readout scheme
The HV-MAPS chips designed for Mu3e comprise digital readout logic for zero suppression. The first stage is the pixel logic, which stores a hit bit and a frame time stamp when the cell senses a particle hit. Inside each column, a token is passed from one pixel logic cell to the next. Pixels with an active hit bit send their address and frame time stamp to a column memory while they possess the token. The column address and a coarse time are then added to the pixel address and frame time stamp, and the data is buffered and assembled into large readout packages. These packages are distributed over up to four fast serial output links.
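The column-drain scheme described above can be sketched in software as follows; the data fields and ordering are a simplified functional model of the described logic, not the actual chip implementation:

```python
def drain_column(column_hits, coarse_time):
    """Token pass through one column: each pixel with an active hit
    bit writes its row address and frame time stamp to the column
    memory while it holds the token (modeled as row order)."""
    memory = []
    for row in sorted(column_hits):
        memory.append({"row": row, "ts": column_hits[row],
                       "coarse": coarse_time})
    return memory

def readout_frame(columns, coarse_time):
    """End-of-column stage: attach the column address and coarse time
    and assemble all drained hits into one readout package."""
    package = []
    for col in sorted(columns):
        for rec in drain_column(columns[col], coarse_time):
            rec["col"] = col
            package.append(rec)
    return package
```

Only pixels that fired contribute records, so the package size scales with the hit rate rather than the channel count.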

Data link scheme
The data links in the Mu3e experiment ship multiple Tbit/s of zero-suppressed data from the pixel sensors and timing detector front-end ASICs to a filter farm based on graphics processing units (GPUs). With the help of three stages of FPGA-based data acquisition boards and high-speed optical links, the readout network merges data from all detector sub-components and sends time slices of these merged events to individual computing nodes. There are three types of links used in Mu3e: the LVDS front-end links, optical links from the on-detector electronics to the counting house, and optical links inside the counting house, see figure 3.

Front-end links
The front-end links run from the front-end chips to the front-end data acquisition boards. Both HV-MAPS and STiC front-end chips have integrated digital logic, zero suppression and fast serialisers with one to a few differential outputs running at 400 to 800 Mbit/s. The LVDS signals from the front-end chips are carried by flex-print cables to the front-end data acquisition boards, which are placed close to the active detector region. The front-end boards house powerful FPGAs. In the case of the silicon pixel tracker, between 36 and 45 LVDS links are connected to one front-end FPGA. These FPGAs buffer, sort and merge the data from all inputs and send out time slices of merged data over high-speed optical links.
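The buffering, sorting and merging step can be modeled as a merge of per-link hit streams that are each already time-ordered, followed by cutting the merged stream into time slices. In hardware this corresponds to per-link FIFOs feeding a compare tree; the sketch below is a functional model only, with illustrative data types:

```python
import heapq

def merge_links(link_streams):
    """Merge per-link hit streams, each already sorted by timestamp,
    into one time-ordered stream. Hits are (timestamp, payload)."""
    return list(heapq.merge(*link_streams, key=lambda hit: hit[0]))

def cut_time_slices(hits, slice_len):
    """Group a time-ordered stream into fixed-length time slices,
    keyed by slice index."""
    slices = {}
    for hit in hits:
        slices.setdefault(hit[0] // slice_len, []).append(hit)
    return slices
```

The merge is linear in the number of hits because each input stream is already ordered, which is what makes the scheme feasible at the full detector rate.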

Detector to counting house links
The high-speed optical links between the detector and the readout boards in the counting house serve multiple purposes. The primary purpose is to ship about 4 Tbit/s over more than 1000 high-speed links running at 5 to 6.25 Gbit/s. The second purpose is to galvanically decouple the detector from the counting house electronics. The third purpose is to switch the data stream between the four sub-farms of the event filter farm. This switching is done by connecting one optical output of every front-end FPGA to one readout board in each sub-farm. Matching time slices from all parts of the detector are sent to the same sub-farm; the following time slice is then sent to the next sub-farm, and so on. Evidently, the time slices sent by all front-end FPGAs have to be well synchronized with respect to start, end and destination sub-farm.

Sub-farm links
Inside the counting house, a second set of high-speed optical links connects the readout boards to the farm PCs. These links run at 8.5 to 10 Gbit/s. Inside a sub-farm, each of the readout boards is connected to all twelve PCs. As in the previous stage, data belonging to one time slice is sent to the same PC. In the PCs, the third and last class of FPGA boards receives the data from the high-speed optical links. These FPGAs build blocks of events, reject events without tracks coinciding in time and pass the remaining data along to the GPU via PCIe 3.0. It is foreseen to use direct memory access (DMA) when transferring data from the FPGA on the PCIe card to the GPU, which allows more efficient use of the available bandwidth.
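The synchronous switching over sub-farms and over the PCs within a sub-farm amounts to a deterministic mapping from time-slice index to destination node. A minimal sketch, assuming plain round-robin at both stages (an illustrative assumption; in the real system the destination is distributed synchronously via the clock and control network):

```python
def slice_destination(slice_index, num_subfarms=4, pcs_per_subfarm=12):
    """Round-robin destination of a time slice: first over the four
    sub-farms, then over the twelve PCs inside the chosen sub-farm.
    Because every front-end FPGA and readout board applies the same
    rule, matching time slices from the whole detector meet at the
    same node."""
    subfarm = slice_index % num_subfarms
    pc = (slice_index // num_subfarms) % pcs_per_subfarm
    return subfarm, pc
```

With 4 sub-farms of 12 PCs each, the destination pattern repeats every 48 slices, one full cycle over all filter farm PCs.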

Data acquisition boards
There are three different types of data acquisition boards in the Mu3e experiment: the front-end boards inside the detector, the readout boards in the counting house and FPGA boards inside the PCs. Their main purpose is to ship data from the detector to the filter farm PCs, but they are also used to distribute the system clock and the event package destination as well as the slow control.

Front-end boards
The front-end boards sit inside the detector volume close to the HV-MAPS and timing ASICs. They house one to eight powerful FPGAs with differential I/Os for serial data at up to 1.6 Gbit/s each. Each of the FPGAs on the front-end boards receives the data from one half-module of the detector over 36 or 45 LVDS links. The data is deserialized, buffered and reformatted on the front-end FPGAs. As described in section 4, large time slices of data are sent to readout boards in the counting house over fast serial optical links. As each of the four optical output links is connected to a readout board of a different sub-farm, the data destination has to be switched synchronously for all front-end FPGAs.
Besides housing the front-end FPGAs, the front-end boards also supply low and high voltage to the HV-MAPS, send test-pulse signals and digitize the values measured with the on-chip temperature sensors.

Readout boards
The FPGA-driven readout boards in the counting house connect the front-end electronics and the PCs of the filter farm. There are 32 readout boards in total: in each of the four sub-farms, four pixel-tracker readout boards and two plus two readout boards for the scintillating fiber and tile detectors. Each readout board receives data from one detector partition, for example the upstream half of the central silicon pixel tracker. The data is sent over up to 52 fast optical links to the readout board, deserialized, reformatted, buffered and sent over a second set of optical links to the twelve PCs of the same sub-farm. In order to distribute the computational load equally over the PCs, data from all detector elements belonging to the same time slice is sent to one PC at a time. The readout board FPGA must provide up to 64 optical inputs and outputs at 10 Gbit/s altogether. The readout boards will be located in counting house crates with power and slow control connections.

PCIe cards
The FPGA-driven PCIe cards inside the filter farm PCs receive data from all eight readout boards of their sub-farm over high-speed optical links running at 8.5 to 10 Gbit/s. In practice, the optical receivers are mounted on optical mezzanines like the Santa-Luz card developed by the LHCb group of TU Dortmund [13]. Alternatively, additional quad optical transceivers integrated directly on the PCIe card can be used. The FPGAs on the PCIe cards build events and can filter for coincidences in time. The interesting events are re-formatted and sent via the PCIe bus to a graphics processing unit. The data transfer will be performed through direct memory access between FPGA and GPU to avoid extra overhead. The PCIe cards can either be commercial development kits from the FPGA vendor or custom-made cards as considered for the LHCb upgrade.

Online event filtering
The Mu3e detector is read out trigger-less to the online event filter farm at muon decay rates on target of up to 2 GHz. The filter farm has to reduce the number of events by three to four orders of magnitude. This is possible by removing all combinatorial background with the help of the timing and vertex information. The remaining events, containing mainly µ → eeeνν decays, will be stored and analyzed offline.

Timing filter
The timing filter is based on the full data delivered by the scintillating fiber and scintillating tile detectors. Sufficiently large time slices of this information are available to the FPGAs on the PCIe cards in the event filter farm PCs. After re-synchronizing and ordering the information coming from the different sub-modules of the timing detectors, coincidences of multiple tracks are searched for. Candidate events with three kinematically allowed tracks coinciding in time are sent to the GPU for vertex filtering.
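The coincidence search can be sketched as finding all hit triples lying within a time window; the window value below is illustrative and the kinematic requirements mentioned above are omitted:

```python
from itertools import combinations

def find_triple_coincidences(hit_times_ns, window_ns=1.0):
    """All time-ordered triples of timing-detector hits whose first
    and last hit lie within `window_ns` of each other. The window is
    an illustrative value, not the experiment's actual cut."""
    times = sorted(hit_times_ns)
    return [t for t in combinations(times, 3)
            if t[-1] - t[0] <= window_ns]
```

In the real system this runs on the PCIe-card FPGAs over a sorted time slice, so the search only needs to compare hits inside a sliding window rather than all combinations.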

Vertex filter
The online vertex filter reduces the combinatorial background by reconstructing the decay point of the muons on the large target. In order to precisely determine the vertex position, hits from the silicon pixel detector are combined into tracks pointing back to the decay point. The track reconstruction is done with the help of powerful commercially available graphics processing units (GPUs), which are capable of performing 10^9 triplet fits per second [1]. The triplet fit is a proposed tracking algorithm which takes multiple Coulomb scattering into account [6, 10, 20]. It has been shown in simulation that even with loose vertex requirements, the online vertex filter can reduce the event rate by a factor of 10^3 at muon decay rates of 2×10^9 Hz. Further significant data reduction can be achieved by combining the vertexing with modest kinematic requirements, for example on the three-particle invariant mass or the planarity.
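As a toy stand-in for the vertexing step, the sketch below computes the least-squares point of closest approach of straight 2-D tracks. It ignores track curvature and multiple scattering, both of which the actual triplet fit accounts for:

```python
def vertex_2d(tracks):
    """Least-squares point of closest approach to straight 2-D tracks,
    each given as (point_on_track, unit_direction). Minimizes the sum
    of squared perpendicular distances by solving the 2x2 normal
    equations A v = b with A = sum(I - d d^T), b = sum((I - d d^T) p)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in tracks:
        # projector I - d d^T onto the direction orthogonal to d
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11; a12 += m12; a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det,
            (a11 * b2 - a12 * b1) / det)
```

For two tracks this is simply their intersection; with three candidate tracks the residual distances to the fitted point provide a natural vertex-quality cut.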

Summary
The Mu3e experiment will search for the charged lepton flavor violating decay µ → eee with a sensitivity of 10^-16. This aim is achieved with the help of a 280 million channel silicon pixel tracker realized in the novel HV-MAPS technology and two timing detectors, measuring the trajectories of the electrons and positrons from up to 2×10^9 muon decays per second on a fixed target. Both the HV-MAPS chips and the timing detector ASICs output zero-suppressed data trigger-less. These multiple Tbit/s of data from the front-end ASICs are read out via a switched optical network based on three stages of FPGA boards. In each stage, data from a geometrical slice of the detector is combined and a time slice of this information is sent to one node of the next stage at a time. As a result, time slices of complete detector data can be processed in each of the 48 filter farm PCs. The online filter farm strongly reduces the amount of data by suppressing the combinatorial background by three to four orders of magnitude. This is achieved by searching for events with timing coincidences on the FPGA-based PCIe cards and by reconstructing the vertex position of the signal candidate tracks on powerful GPUs.
The combination of the technologies described above allows the full information of a detector with several hundred million channels to be read out and processed at GHz event rates, and will enable the Mu3e experiment to improve the sensitivity for the decay µ → eee by four orders of magnitude with respect to today's most precise measurement.