Upgrade of the Muon Sorter in the Cathode Strip Chamber Level 1 Trigger System at CMS

The top level of the Level 1 Trigger System in the Cathode Strip Chamber (CSC) detector at CMS consists of the Track Finder (TF) crate with 12 Sector Processors (SP) and one Muon Sorter (MS) board. The MS sorts up to 36 trigger objects from the SP boards, selects the four best ones (by a definable criterion), and transmits them to the Global Trigger crate of CMS. With the anticipated LHC luminosity increase above 10³⁴ cm⁻²s⁻¹ at an energy of 6.5–7 TeV/beam, the CSC TF needs to be upgraded. The new CSCTF will be robust to higher occupancies and will provide improved transverse momentum assignment and increased precision of the muon output variables. A transition from the current 9U VME electronic standard to the more flexible uTCA and utilization of the Xilinx Virtex-6 and Virtex-7 FPGAs, with multiple embedded gigabit links, will allow us to build a higher performance TF such that the MS functions can be performed by one of the SP modules. We present here the results of our efforts in the past year to upgrade the CSC Muon Sorter, including the short-term modifications of the existing VME board, the long-term transition to uTCA, and firmware development for both of these projects.


Introduction
The CMS Muon System consists of three sub-detectors: the barrel Drift Tubes (DT), the barrel and endcap Resistive Plate Chambers (RPC), and the endcap Cathode Strip Chambers (CSC) [1]. Currently, 473 CSC chambers are installed and have been operational since 2009, and 67 more are being fabricated. The CSC chambers are arranged in two endcaps, with four layers (or "stations") of chambers in each endcap. The CSC Track Finder (CSCTF) is housed in a single 9U VME crate. It consists of 12 Sector Processors (SP); each SP serves a 60 degree sector in one endcap. Each SP receives 15 optical streams at a 1.6 Gbps data rate from five Muon Port Cards (MPC) residing in the peripheral CSC crates. The received strip and wire positions of each track segment are converted in the SP to Phi and Eta coordinates using the input SRAM Look-Up Tables (LUT). These coordinates are used for three-dimensional track finding and momentum assignment in the Virtex-5 FPGA and output LUTs. Each SP transmits up to three reconstructed tracks to the Muon Sorter (MS) board residing in the middle of the CSCTF crate. The MS sorts up to 36 incoming tracks, selects the four best ones (by a chosen criterion), and transmits them to the Global Muon Trigger (GMT) receiver board via parallel copper links at 40 MHz.
In the coming few years the luminosity of the LHC will be more than doubled and the beam energy increased to 6.5-7 TeV/beam. The MPC and CSCTF boards will need to be upgraded to address the following four issues [2]: momentum resolution improvement; restoring sensitivity to muon jet signatures; completing the Phi coverage of the track finding system; providing a list of muons to the calorimetry system for isolation calculation and b-jet tagging. It is expected that between the LHC first (in 2013-2014) and second (2017-2018) long shutdowns (LS) the complete "old" (existing) and a slice of the new CSCTF will be running in parallel. A block diagram of the hardware and links is shown in figure 1.
The upgraded MPC mezzanine FPGA will be able to provide all the muon trigger patterns (up to 18 of them from one peripheral crate every bunch crossing) to the new SP, in addition to three 1.6 Gbps optical links to the existing SP [3]. The existing MS will be upgraded to provide an optical link to the interim calorimeter trigger; this modification is discussed in section 2 below. The CSCTF will migrate from the VME to the uTCA standard to take advantage of massive use of multi-gigabit serial links, both copper and optical. All 12 new modular processors will occupy 3 uTCA crates.
Having several input and output optical links, the new processor will be able to maintain optical interfaces to the RPC trigger system as well. The functionality of the new processor and the MS are discussed in section 3.

Interface to the interim calorimeter trigger
Preliminary studies of the muon isolation using CMS calorimeter trigger primitives have shown potential for significant reduction of the single muon trigger rate [2]. The existing MS can be modified to provide a direct optical link to the interim calorimeter trigger, so that forward muons can be isolated even before the full upgrade of the trigger system is completed. The present MS mezzanine Virtex-2 FPGA will be replaced by a newer one with embedded gigabit links. The four best selected muon patterns will be provided to the old copper links as before and will also be serialized and transmitted to the calorimeter trigger receiver board via optical links. The main MS board would not require any changes. The best candidate FPGA is the XC5VLX110T-2FF1136C from the Xilinx Virtex-5 family. Virtex-5 supports a 3.3 V logic interface for all of its inputs and outputs; this is important since all the functionality on the main MS board is built on 3.3 V parts. The newer Virtex-6/7 families would require either rebuilding the main board or installing multiple level converters on the mezzanine card. A photograph of the MS board with the production Virtex-2 mezzanine FPGA and a block diagram of the proposed Virtex-5 mezzanine FPGA are shown in figure 2.
Each muon is represented by a 31-bit word. The GTP links of the Virtex-5 device can be programmed at a convenient rate of 3.2 Gbps or 1.6 Gbps; then either two or four links would be sufficient to transmit four muons. Either SNAP12 or QSFP optical transmitters can be used. The original FPGA project has been modified, targeted to the Virtex-5 with the GTP links added, and successfully compiled. An optical interface to the calorimeter trigger receiver is being discussed.
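The link count quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes 8B/10B encoding on the GTP links (the text mentions 8B/10B only for the future MTF links) and that all four muons must be delivered within one 25 ns bunch crossing. The function names are hypothetical.

```python
BX_NS = 25        # LHC bunch-crossing period in ns (40 MHz)
MUON_BITS = 31    # bits per muon, as stated in the text
N_MUONS = 4       # muons sent to the calorimeter trigger


def payload_bits_per_bx(line_rate_mbps: int) -> int:
    """Payload bits one link carries per bunch crossing.

    Assumes 8B/10B encoding (8 payload bits per 10 line bits).
    """
    raw_bits = line_rate_mbps * BX_NS // 1000  # Mbps * ns / 1000 = bits
    return raw_bits * 8 // 10


def links_needed(total_bits: int, line_rate_mbps: int) -> int:
    """Smallest number of links that carries total_bits per bunch crossing."""
    per_link = payload_bits_per_bx(line_rate_mbps)
    return -(-total_bits // per_link)  # integer ceiling division


if __name__ == "__main__":
    total = N_MUONS * MUON_BITS
    for rate in (3200, 1600):
        print(rate, "Mbps:", links_needed(total, rate), "links")
```

Under these assumptions, four muons (124 bits) fit into two links at 3.2 Gbps or four links at 1.6 Gbps, which agrees with the numbers given in the text.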

Future options for the CSC muon sorting
The first prototype of the upgraded CSCTF processor in the uTCA standard (called the Muon Track Finder, MTF6) is based on a Core Logic Module (CLM) with a Virtex-6 FPGA. The large external Pt LUT Module is connected to the CLM as a mezzanine card. An independent Optical Module (OM) is another uTCA board that connects to a CLM via the custom backplane connector with point-to-point gigabit links (up to 120 pairs, each running at 3.2 Gbps or 4.8 Gbps). The present OM incorporates seven 12-fiber board-edge receivers and three 12-fiber transmitters residing in the middle of the module but linked to the front panel with short pigtail fibers. A simplified block diagram of these boards is shown in figure 3. The final CLM board, currently under design, will be based on a Virtex-7 FPGA.
The bit counts for the existing and future MS are shown in table 1. As one can see, the number of bits to be received and transmitted almost doubles. The modular MTF processor can perform all the functions of the MS as well. Assuming a modest rate of 3.2 Gbps for the MTF output optical links, one would need to transmit three muons, or ∼180 bits. They can easily be squeezed into three links (4 frames per word with 8B/10B encoding). Then, after an intermediate optical coupler, all inputs from the 12 MTF processor boards would occupy only three 12-fiber links. Up to 12 selected muons can be sent from the MS to the GMT via one 12-fiber optical transmitter. The data may be replicated to two other optical transmitters, if needed.
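The link budget behind these numbers can be written out explicitly. The arithmetic below assumes one set of three muons per 25 ns bunch crossing (40 MHz). Reading "4 frames per word" as four 16-bit payload frames per bunch crossing is our interpretation, not something the text states.

```latex
\begin{align*}
N_{\text{raw}}     &= 3.2\ \text{Gbps} \times 25\ \text{ns} = 80\ \text{bits per link per bunch crossing},\\
N_{\text{payload}} &= 80 \times \tfrac{8}{10} = 64\ \text{bits} = 4 \times 16\ \text{bits},\\
N_{\text{links}}   &= \lceil 180 / 64 \rceil = 3 \qquad (3 \times 64 = 192 \geq 180),\\
N_{\text{12-fiber links}} &= \frac{12\ \text{boards} \times 3\ \text{links}}{12\ \text{fibers}} = 3 .
\end{align*}
```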

Sorting algorithm and firmware development
The original sorter project [4] was targeted to a Xilinx XC2V4000 FPGA and optimized for sorting of the 4 best trigger candidates out of 36, performed in parallel. The sorting method is based on a 7-bit "quality" value; the larger the "quality", the better the trigger candidate is for sorting purposes. Since one of the requirements for the upgrade is the ability to provide more than 4 muons to the GMT receiver, the sorter design has been modified for better scalability. The complete project consists of (n) identical basic modules (figure 4), where n=4, 8 and 12 are our primary goals. One module selects the "best" pattern out of 36. As a first step, it performs all the 630 possible pairwise comparisons between the 36 patterns (36×35/2 = 630). As a second step, the 36-bit binary address of the "best" pattern is selected. At the final merging step, the "best" pattern, along with the updated list of 36 patterns in which the selected one is replaced with a zero value, is provided to the output. By assembling (n) such modules in series, it is easy to select the (n) best patterns out of 36. While there are essentially 3n sequential steps in this algorithm, including intermediate registers, each step requires only very simple logical operations (comparison, AND, OR) and can be performed with very low latency in the FPGA.
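The three steps of one basic module, and the chaining of (n) modules, can be modelled in software. The following Python sketch is a behavioural illustration only, not the VHDL firmware. It makes three assumptions of its own: the quality sits in the low 7 bits of each pattern word, ties are resolved in favour of the lower input index, and the 36-bit address is treated as one-hot. The names `quality`, `select_best` and `sort_best` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

N_INPUTS = 36
QUALITY_MASK = 0x7F  # assumed layout: 7-bit quality in the low bits


def quality(word: int) -> int:
    """Extract the 7-bit quality from a pattern word (assumed layout)."""
    return word & QUALITY_MASK


def select_best(patterns: List[int]) -> Tuple[int, List[int]]:
    """Model of one basic module: pick the best pattern and zero it out."""
    n = len(patterns)
    # Step 1: all pairwise comparisons (630 of them for 36 inputs).
    # beats[i][j] is True when input i wins against input j;
    # ties go to the lower index (an assumption of this sketch).
    beats = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        i_wins = quality(patterns[i]) >= quality(patterns[j])
        beats[i][j] = i_wins
        beats[j][i] = not i_wins
    # Step 2: one-hot address of the input that wins every comparison (AND).
    one_hot = [all(beats[i][j] for j in range(n) if j != i) for i in range(n)]
    # Step 3: merge. AND-OR multiplexer for the best pattern, and an
    # updated list in which the selected input is replaced by zero.
    best = 0
    for word, selected in zip(patterns, one_hot):
        if selected:
            best |= word
    updated = [0 if sel else word for word, sel in zip(patterns, one_hot)]
    return best, updated


def sort_best(patterns: List[int], n_out: int) -> List[int]:
    """Chain n_out basic modules to get the n_out best patterns, ranked."""
    ranked = []
    current = list(patterns)
    for _ in range(n_out):
        best, current = select_best(current)
        ranked.append(best)
    return ranked


if __name__ == "__main__":
    # Hypothetical input: index in the upper bits, quality in the low 7 bits.
    words = [(i << 7) | ((i * 37) % 128) for i in range(N_INPUTS)]
    print([quality(w) for w in sort_best(words, 4)])
```

In hardware, each of the three steps within a module is evaluated in parallel, so the cost of chaining shows up as latency rather than as a loop. The software loop here only mirrors the data flow.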
Three VHDL projects based on the model described above have been implemented with the Xilinx XC5VLX110T-2FF1136C FPGA as the target device for the following three sorters: "4 out of 36", "8 out of 36" and "12 out of 36". The selected patterns are output in "ranked" order. The latency increases linearly with the number of outputs and is equal to 2, 4 and 6 bunch crossings (50, 100 and 150 ns) respectively. The projects were compiled using both the present 7-bit and the future 13-bit comparator units. Full project compilation takes less than one hour; the 13-bit comparators require more logic resources, but all the designs should still fit comfortably into our large Virtex-6 or Virtex-7 FPGA. The main advantage of the new design is its scalability.

Conclusion
The upgrade of the CSC Track Finder in general, and of its top component, the Muon Sorter board, in particular, spans approximately 6 years from the start of LHC Long Shutdown 1 (LS1) in 2013 until the end of Long Shutdown 2 (LS2) in 2018. As a first stage, during LS1, we plan to upgrade the mezzanine FPGA in the existing MS board as well as the FPGA mezzanines on all MPC boards that provide trigger primitives for the CSCTF. In addition to its existing functionality, the modified MS will provide an optical link to the "interim" calorimeter trigger for operation in 2015-2017. The Virtex-5 FPGA for this design has been identified and the schematic design of the new mezzanine is in progress. We expect to build a first prototype in 2014.
The CSCTF design is migrating from the 9U VME standard to the more flexible uTCA architecture, with the emphasis on wider use of gigabit serial links. We expect that a fraction (slice) of the new TF will be ready by the end of LS1 and will be functioning parasitically in 2015-2017 in parallel with the old TF. We plan to build the new uTCA processor in such a modular way that it will be able to perform all the Muon Sorter functions as well, eliminating the need to build a separate complex board. It will receive three reconstructed muons from each of the 12 MTF boards and provide up to 12 pre-sorted muon patterns to the CMS Global Trigger, with all inputs and outputs carried over optical links.

JINST 8 C11016
We have modified the sorting algorithm for better scalability with the increased number of output tracks (from the original 4 tracks to 8 or 12 tracks). It can be used for both of the upgrade stages mentioned above. Three corresponding projects have been successfully implemented in the Virtex-5 FPGA with a low latency of 50, 100 and 150 ns respectively. With the design targeted to a higher-performance Virtex-7 device, the latency can possibly be reduced further.