A full-function Global Common Module prototype for ATLAS Phase-II upgrade

The High Luminosity Large Hadron Collider (HL-LHC [1]), an upgrade of the LHC, is set to become operational in 2029, aiming to achieve instantaneous luminosities 5–7.5 times larger than the nominal value of the LHC. However, unlocking the full physics potential at this much higher luminosity level necessitates a tenfold increase in the data bandwidth processed by ATLAS. This poses significant challenges to the design of the Trigger and Data Acquisition systems. To address these challenges, a baseline architecture has been chosen for the ATLAS Phase-II upgrade, relying on a single-level hardware trigger known as the Level-0 Trigger. This trigger has a maximum rate of 1 MHz and a latency of 10 μs. Central to this upgrade is the inclusion of a new subsystem — the Global Trigger [2]. This component performs complex algorithms, akin to those currently used in Phase-I high-level trigger software (such as Topoclustering), on full-granularity calorimeter data. The Global Trigger is divided into three sublayers: the Multiplexer Processor (MUX) layer, the Global Event Processor (GEP) layer, and the Global to Central Trigger Processor [3] interface (gCTPi). A full-function Global Common Module (GCM) hardware prototype has been designed to fulfill the requirements of all three sublayers of the Global Trigger, featuring different firmware loads. This GCM prototype, based on the ATCA [4] front board form factor, incorporates two of the latest AMD (Xilinx) Versal Premium devices VP1802 [5]. These devices boast double the density of the Virtex UltraScale+ FPGA VU13P used in the previous design [6] and include an integrated SoC with a completely new architecture. To handle high-speed I/Os, this GCM prototype employs twenty 12-channel 25.7 Gb/s FireFly [7] optical engines. The estimated maximum power consumption of this GCM prototype is 400 W, which falls within the cooling capabilities of the ATLAS ATCA shelf. 
To ensure power integrity, signal integrity, and thermal performance, extensive PCB and thermal simulations have been carried out to guide the layout design of the GCM prototype. This paper provides an in-depth overview of the design process for this full-function GCM prototype hardware, with a particular focus on technology choices and simulation results.

1 Global Trigger Architecture
Figure 1 illustrates the architecture of the Global Trigger subsystem, specifically designed to provide Event Filter-like capabilities to the Level-0 trigger system for the HL-LHC. The Global Trigger architecture is structured into three layers: the Multiplexer Processor (MUX) layer, the Global Event Processor (GEP) layer, and the Global-to-Central Trigger Processor interface (gCTPi). The MUX layer consists of up to 56 nodes, managing a total throughput of around 60 Tb/s. These nodes gather data from detectors (calorimeter and muon) and legacy Feature Extractor modules through more than 2300 fibres. The MUX layer employs time-multiplexing on a bunch-crossing basis, sending complete events to 49 nodes in the GEP layer in a round-robin manner. Within the GEP layer, each GEP node processes the entire event data related to a specific bunch-crossing. These nodes execute complex algorithms and transmit the outcomes to the gCTPi. The gCTPi, comprising a single node, selects and resynchronizes results from all GEP nodes before forwarding them to the Central Trigger Processor (CTP). Notably, this time-multiplexed architecture of Global Trigger is highly scalable and enables the implementation of asynchronous and iterative high-level algorithms.
Having proven successful in the ATLAS Phase-I system, the ATCA platform is chosen for the implementation of this GCM prototype. Figure 2 depicts the block diagram of its design. The top right Versal Premium VP1802 is designated for a MUX node, while the bottom left VP1802 is allocated for a GEP or gCTPi node. Surrounding both VP1802 units are twenty Firefly 25 Gbps parallel optical engines, resulting in a total of 240 high-speed links.
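As a rough illustration of the round-robin time multiplexing used between the MUX and GEP layers, the following Python sketch assigns consecutive bunch crossings to the 49 GEP nodes. The 25 ns bunch spacing is the nominal LHC value and is an assumption here, not a number taken from this paper. The function names and the running counter `bc_counter` are hypothetical.

```python
N_GEP_NODES = 49       # number of GEP nodes, from the text
BC_PERIOD_NS = 25.0    # ASSUMPTION: nominal LHC bunch spacing, not stated in this paper


def gep_node_for(bc_counter, n_nodes=N_GEP_NODES):
    """Round-robin: consecutive bunch crossings go to consecutive GEP nodes."""
    return bc_counter % n_nodes


def node_event_interval_ns(n_nodes=N_GEP_NODES, bc_period_ns=BC_PERIOD_NS):
    """Time between two successive events arriving at the same GEP node."""
    return n_nodes * bc_period_ns


if __name__ == "__main__":
    # The first crossings fill nodes 0, 1, 2, ... and wrap around after node 48.
    for bc in (0, 1, 2, 48, 49, 50):
        print(f"bunch crossing {bc:3d} -> GEP node {gep_node_for(bc)}")
    print(f"interval per node: {node_event_interval_ns():.0f} ns")
```

Under these assumptions, each GEP node would receive a new event roughly every 49 × 25 ns = 1225 ns rather than every 25 ns, which is what gives room for asynchronous and iterative algorithms.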

JINST 19 C02049
The maximum estimated power consumption for each VP1802 device, using AMD's Power Design Manager software, is 130 W with 50% resource utilization running at 320 MHz. To provide an adequate margin, the hardware power design on the board is capable of supplying 165 W to each VP1802 device, equivalent to 70% resource utilization running at 320 MHz, with the current on the VP1802 core voltage rail VCCINT reaching 170 A.
Considering the vertical airflow configuration of the ATLAS standard ATCA shelf, the placement of the two VP1802 devices on the GCM is staggered vertically to effectively manage the cooling design challenge of the board. It is anticipated that the modules will operate for more than twenty years, so it is crucial to maintain critical devices at temperatures significantly below their maximum ratings. The design and simulation of heatsinks for both the VP1802 and Firefly optical engines have been outsourced to specialized companies. Simulations suggest achievable temperatures of 70 °C for the VP1802 and 50 °C for the Firefly optical engine, following the board layout as depicted in figure 2.
With different firmware loads, this GCM board fulfils the requirements of all three sublayers of the Global Trigger. This common hardware approach significantly simplifies system design and long-term maintenance, minimizing the complexity of firmware and software development by leveraging a shared infrastructure.
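The per-device power figures can be combined with the 400 W estimated board maximum quoted in the abstract into a simple budget check. This is only an arithmetic sketch: the text does not say how the remaining power is shared among optical engines, DC-DC conversion losses, and other components, so that split is not modelled, and the function name `fpga_budget` is hypothetical.

```python
BOARD_MAX_W = 400.0       # estimated maximum board power, from the abstract
VP1802_EST_W = 130.0      # estimate at 50% utilization, 320 MHz
VP1802_SUPPLY_W = 165.0   # supply capability, equivalent to 70% utilization
N_FPGA = 2                # two VP1802 devices per GCM


def fpga_budget(per_device_w, n_devices=N_FPGA, board_max_w=BOARD_MAX_W):
    """Return (total FPGA power, power left for everything else on the board)."""
    fpga_total = n_devices * per_device_w
    return fpga_total, board_max_w - fpga_total


if __name__ == "__main__":
    for label, watts in (("estimated", VP1802_EST_W), ("supply limit", VP1802_SUPPLY_W)):
        total, rest = fpga_budget(watts)
        print(f"{label:12s}: FPGAs {total:.0f} W, remainder {rest:.0f} W")
    margin = VP1802_SUPPLY_W / VP1802_EST_W - 1.0
    print(f"supply margin over estimate: {margin:.1%}")
```

With both devices at the 165 W supply limit, the FPGAs alone would account for 330 W of the 400 W figure.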

GCM prototype design methodology
This GCM prototype stands out as a high-speed, high-density, and high-power ATCA front board. To ensure the success of this project, a systematic PCB design methodology is adopted, as illustrated in figure 3.
PCB simulation is integral to such a complex board design and is seamlessly integrated into our design flow. Pre-layout simulation has played a crucial role in determining various aspects of PCB technology, including laminate material, layer count, via technology, copper thickness, BGA breakout pattern, and more. Consequently, this GCM prototype utilizes a 26-layer PCB with via-in-pad and backdrill technology. Via-in-pad technology is particularly vital for high-speed large BGA breakout, as it reduces 3D impedance discontinuities and improves routability simultaneously. Backdrill technology is essential for the performance of high-speed vias. The total copper weight in the GCM PCB stackup is 6 oz for four power layers and 7 oz for twelve ground layers. Only ground layers are used as the reference planes for high-speed signal layers. This PCB configuration represents the state of the art in the PCB industry, considering the ATCA board thickness constraint. Post-layout simulation has been instrumental in optimizing and verifying the PCB performance, with detailed results presented in the next section. With the AC performance (e.g., ripple noise) of DC-DC components (e.g., LTM4681) already tested and verified on evaluation boards beforehand, the primary challenge in the GCM PCB power distribution lies in addressing the issue of on-board DC drop on large current power rails.

GCM prototype PCB post-layout simulation
3.1 Power integrity simulation
Figure 4 displays the simulation results of the DC drop before optimization on the VP1802 core voltage power rail VCCINT at a maximum current of 170 A. Two issues are identified. Firstly, the DC drop on VCCINT is excessively high, resulting in a total of 14 W dissipated in the copper. This impacts the GCM power budget and leads to cooling problems of the PCB itself. Secondly, numerous power vias carry over 2 A current, posing a concern for the board's longevity.
To address these problems, the VCCINT power distribution is refined through a meticulous examination of nearby signal layers. Some signals are strategically rerouted, facilitating the addition of targeted copper fills in signal layers for improved power distribution. Figure 5 demonstrates the results of this optimization after multiple iterations, highlighting a reduction of more than half in the DC drop and the effective suppression of via current spikes.
Another issue identified during the power DC simulation is the occurrence of a hotspot on a relatively low-current power distribution, as depicted in figure 6 (left). Despite the current on this power rail being only 15 A, a hotspot develops due to the heavily perforated power plane. Once spotted, addressing such an issue is relatively straightforward, as demonstrated in figure 6 (right).
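To put the VCCINT numbers in perspective, the 14 W copper loss at 170 A can be converted into an equivalent voltage drop and resistance. The sketch below treats the whole rail as a single lumped resistor, which is a simplifying assumption: the real current distribution is spatially non-uniform, and the simulated drop shown in figure 4 is not quoted in the text. The function name `lumped_rail` is hypothetical.

```python
I_VCCINT_A = 170.0     # maximum VCCINT current, from the text
P_COPPER_W = 14.0      # copper loss before optimization, from the text


def lumped_rail(power_w, current_a):
    """Equivalent drop (V) and resistance (ohm) of a rail modelled as one resistor.

    Uses P = I * V and P = I^2 * R.
    """
    v_drop = power_w / current_a
    r_eff = power_w / current_a ** 2
    return v_drop, r_eff


if __name__ == "__main__":
    v_drop, r_eff = lumped_rail(P_COPPER_W, I_VCCINT_A)
    print(f"equivalent drop before optimization: {v_drop * 1e3:.1f} mV")
    print(f"equivalent rail resistance:          {r_eff * 1e3:.3f} mOhm")

    # The text reports a reduction of more than half in the DC drop, so at the
    # same current the loss falls below half of its original value in this model.
    v_after_max = v_drop / 2.0
    print(f"upper bound after optimization:      {v_after_max * 1e3:.1f} mV, "
          f"{v_after_max * I_VCCINT_A:.1f} W")
```

In this simplified picture the original loss corresponds to roughly 82 mV across about 0.48 mΩ, and the optimized layout would dissipate less than 7 W at the same current.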

Signal integrity simulation
The majority of the 240 on-board high-speed links operate at 25 Gbps. The key challenges here involve optimizing the 3D breakout area at both ends of the links and minimizing the crosstalk between the links.

VP1802 breakout optimization
Figure 7 (left) illustrates the use of via-in-pad technology for the VP1802 breakout area. In this approach, the vias are drilled directly into the VP1802 footprint pads, effectively merging the two 3D impedance discontinuities of the pad and via into one. Additionally, back drilling is employed to remove the via stubs. To align the impedance of the differential vias with the 93-ohm target of VP1802, a dog-bone-shaped anti-pad is incorporated around the differential vias. The simulation result of the final optimization is depicted in figure 7 (middle and right), where the VP1802 breakout pattern's impedance is brought within 10% of the target impedance, and the performance in frequency domain is excellent beyond the signal spectrum of 25 Gbps.
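To give a feel for what a 10% impedance tolerance around the 93-ohm target means, the following sketch computes the reflection coefficient and return loss of a single impedance step. It is a frequency-independent, single-discontinuity estimate and not a substitute for the 3D solver results in figure 7. The function names are hypothetical.

```python
import math

Z_TARGET_OHM = 93.0   # differential impedance target of the VP1802, from the text


def reflection_coefficient(z_ohm, z_ref_ohm=Z_TARGET_OHM):
    """Reflection coefficient at a step from z_ref_ohm to z_ohm."""
    return (z_ohm - z_ref_ohm) / (z_ohm + z_ref_ohm)


def return_loss_db(z_ohm, z_ref_ohm=Z_TARGET_OHM):
    """Return loss in dB (positive number); infinite for a perfect match."""
    gamma = abs(reflection_coefficient(z_ohm, z_ref_ohm))
    return math.inf if gamma == 0 else -20.0 * math.log10(gamma)


if __name__ == "__main__":
    for tol in (-0.10, 0.10):
        z = Z_TARGET_OHM * (1.0 + tol)
        print(f"Z = {z:6.1f} ohm: gamma = {reflection_coefficient(z):+.4f}, "
              f"return loss = {return_loss_db(z):.1f} dB")
```

At the edges of the ±10% band the reflection coefficient is about 5% in magnitude, corresponding to a return loss of roughly 26 dB for this idealized step.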

Firefly optical engine breakout optimization
The Firefly optical engine employs a fine-pitch surface mount connector, as shown in figure 8 (left). To align the impedance closer to the target, the ground plane immediately under the differential pads in the connector is cut out. Differential vias with four ground vias are utilized to connect the high-speed links to inner layers. A sweeping process in the simulation is employed to determine the optimal size of the anti-pads for differential vias. The simulation result of the final optimization is depicted in figure 8 (middle and right), where the Firefly optical engine breakout pattern's impedance is brought within 10% of the target impedance. The performance in the frequency domain is very good beyond the signal spectrum of 25 Gbps.

Typical 25 Gbps channel performance
The 25 Gbps links (3 to 4 inches long) on the GCM prototype are simulated with optimization applied to both ends. The impedance of the entire channel is well controlled, as depicted in figure 9 (left). The insertion loss of this channel smoothly rolls off with frequency, providing a significant margin against the limit set in the industry standard OIF-CEI-04.0 [8].
Crosstalk control on a high-density, high-speed board is of utmost importance. AMD has specified stringent crosstalk requirements between their multi-gigabit transceivers (MGT) on the VP1802 in three cases. For the Tx-Tx very short-range (VSR) case, the crosstalk limit is −35 dB at signal Nyquist frequency. For the Tx-Rx VSR, the limit is −45 dB, and for the Rx-Rx VSR, the limit is −40 dB. This GCM design opts for the larger package VSVA5601 among the two VP1802 package variants, as it offers better crosstalk performance due to the increased number of ground pins placed between MGTs. To further reduce crosstalk, the Tx and Rx links are allocated into separate MGT QUADs. Subsequently, MGT crosstalk simulations are conducted on the GCM prototype PCB layout. While Tx-Tx and Tx-Rx crosstalk meet the specifications, Rx-Rx crosstalk fails in some specific situations, as depicted in figure 10 (left). This issue has been traced to differential tracks passing closely by differential vias of another link. By swapping signal routing layers and implementing back drill, this condition can be avoided. Figure 10 (right) shows the Rx-Rx crosstalk meeting the requirements after optimization.
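The three VSR crosstalk limits lend themselves to a simple pass/fail check against simulated values at the Nyquist frequency. The sketch below assumes NRZ signalling, so that the Nyquist frequency is half the bit rate. The crosstalk values in `simulated_db` are made-up placeholders for illustration only and are not results from this paper. The function names are hypothetical.

```python
# Crosstalk limits at the signal Nyquist frequency, from the text (dB).
VSR_XTALK_LIMIT_DB = {"Tx-Tx": -35.0, "Tx-Rx": -45.0, "Rx-Rx": -40.0}


def nyquist_ghz(bit_rate_gbps):
    """Nyquist frequency in GHz, ASSUMING NRZ signalling (one bit per symbol)."""
    return bit_rate_gbps / 2.0


def meets_limit(case, xtalk_db, limits=VSR_XTALK_LIMIT_DB):
    """True if the crosstalk is at or below (more negative than) the limit."""
    return xtalk_db <= limits[case]


if __name__ == "__main__":
    print(f"Nyquist frequency for 25 Gbps: {nyquist_ghz(25.0)} GHz")

    # HYPOTHETICAL values, chosen only to show one failing case.
    simulated_db = {"Tx-Tx": -41.0, "Tx-Rx": -48.5, "Rx-Rx": -37.2}
    for case, value in simulated_db.items():
        verdict = "pass" if meets_limit(case, value) else "FAIL"
        print(f"{case}: {value:6.1f} dB (limit {VSR_XTALK_LIMIT_DB[case]:.0f} dB) -> {verdict}")
```

A check of this kind would be evaluated per aggressor-victim pair, which is how an isolated failure such as the Rx-Rx case in figure 10 (left) would show up.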

Special test launch points
The validation of PCB simulation results is equally important in the domain of high-speed electronics. In this context, precise and accurate testing plays a crucial role in guaranteeing the performance and reliability of intricate circuit designs. Employing specialized test launch designs becomes instrumental in achieving controlled signal transitions, thereby minimizing the impact of launch-induced distortions, and facilitating accurate measurement of signal integrity parameters. The utilization of 2.4 mm precision connectors on the GCM prototype ensures high-quality connections to high-end oscilloscopes. The breakout pattern of this connector on the GCM is also subjected to simulation and optimization using a 3D solver, as illustrated in figure 11. This approach ensures that the test setup itself is optimized for accurate and meaningful high-speed signal measurements.

Summary
A new full-function GCM prototype has been meticulously designed and implemented for the ATLAS Phase-II Upgrade's new Global Trigger. This GCM prototype is a high-speed, high-power, and high-density ATCA front board. Throughout the design process, a systematic methodology has been employed that concurrently addresses signal integrity, power integrity, and thermal integrity. This project passed the ATLAS Preliminary Design Review in October 2023, and the first prototype board is being manufactured.

Figure 1. Global Trigger Architecture for ATLAS Phase-II TDAQ Upgrade.

Figure 6. Left: hotspot on a lower-current power rail on the GCM. Right: hotspot fixed.

Figure 9. Performance of a typical 25 Gbps link on the GCM prototype. Left: simulated TDR response (Tr = 20 ps) for the whole channel. Right: insertion loss of a whole channel on GCM (green) and minimum SDD21 recommended by industry standard OIF-CEI-04.0 for VSR channel (black).

Figure 11. GCM special test launch design. Left: sweeping simulation to choose the best impedance match. Right: insertion loss (green) and return loss (red) of the best launch design.

2 Full-function GCM hardware prototype
2.1 GCM prototype hardware implementation

Figure 2. Full-function GCM prototype block diagram.