Development of radiation tolerant components for the Quench Protection System at CERN

This paper describes the results of irradiation campaigns with the high-resolution Analog to Digital Converter (ADC) ADS1281. This ADC will be used as part of a revised quench detection circuit for the 600 A corrector magnets at the CERN Large Hadron Collider (LHC). To verify the radiation tolerance of the ADC, an irradiation campaign using a proton beam, applying doses up to 3.4 kGy, was conducted. The resulting data and an analysis of the observed failure modes are discussed in this paper. Several mitigation measures are described that reduce the error rate to levels acceptable for operation as part of the LHC Quench Protection System (QPS).


Introduction
The powerful particle beams inside the Large Hadron Collider (LHC) at CERN are a strong source of ionizing radiation. Electronic systems in the accelerator tunnel are required to function properly at annual doses between 0.5 and 10 Gy. Although it is located in low-radiation areas, the Quench Protection System (QPS) is a critical system with a small error budget. Missed quenches can cause severe damage to the machine, while premature beam dumps due to spurious triggers reduce the overall availability. Upgrades foreseen for the next few years will increase the luminosity of the LHC significantly, so more radiation-tolerant electronics need to be installed. The redesign of one of the existing quench detection systems (QDS) requires a radiation-tolerant, high-resolution (24 bit) Analog to Digital Converter (ADC). This work describes the analysis of the radiation tolerance of a possible candidate, the ADS1281. The results of irradiation campaigns with this ADC are presented, including the failure modes found during testing. Furthermore, the development and testing of mitigation techniques that allow the device to be used fully even when those failures occur are described.

Superconducting magnet quench detection system
The QDS is one of the vital protection systems of the LHC. To protect the various types of superconducting magnets used in the LHC, every loss of superconductivity has to be detected as early as possible [1,2]. This leaves enough time to extract the stored energy from the magnet circuit before damage occurs and to dump the intense beam safely. For the 600 A corrector magnets, quench detection is based on a comparison between the measured and the expected voltage drop over the magnet coil. The expected voltage is calculated from the measured current ($I_\mathrm{Dcct}$), the derivatives of the measured current and voltage, and the inverse of the resistance in parallel to the coil. The expected value is then compared to the measured voltage ($U_\mathrm{Diff}$). The signs in the underlying equation depend on the machine setup and convention.
If the difference between the measured and expected values ($U_\mathrm{Res}$) exceeds a threshold for the evaluation time $t_\mathrm{Eval} = 20$ ms, this is counted as a possible quench. This activates the circuit protection interlocks and, in most cases, leads to a dump of the accelerator beam. Figure 1 shows the current version of the QDS circuit board. It is equipped with two high-resolution ADCs to measure the voltage drop over and the current through the corrector magnets. It also contains a Field-Programmable Gate Array (FPGA) that implements the detection algorithm, including the error mitigation techniques described later, and handles data transmission and communication.
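The discriminator logic described above can be illustrated with a minimal Python sketch. It is not the FPGA implementation. The paper's actual equation and signs are not reproduced here. Purely for illustration, the sketch assumes a coil of inductance L shunted by a parallel resistor R_p. Under that assumption the coil current is I_Dcct − U_Diff/R_p, and the expected inductive voltage is L·dI_Dcct/dt − (L/R_p)·dU_Diff/dt. The names `expected_voltage` and `QuenchDiscriminator` are hypothetical, as are the threshold, the 1 kHz sample period and the requirement that the excursion be continuous.

```python
def expected_voltage(di_dt, du_dt, inductance, r_parallel):
    """Expected coil voltage under an ASSUMED model (inductance shunted by R_p).

    Coil current = I_Dcct - U_Diff / R_p, hence
    U_expected = L * dI_Dcct/dt - (L / R_p) * dU_Diff/dt.
    Real signs depend on machine setup and convention.
    """
    return inductance * di_dt - (inductance / r_parallel) * du_dt


class QuenchDiscriminator:
    """Flags a possible quench when |U_Res| stays above a threshold for t_Eval."""

    def __init__(self, threshold, t_eval=0.020, dt=0.001):
        self.threshold = threshold                        # hypothetical value in volts
        self.samples_needed = max(1, round(t_eval / dt))  # 20 samples at 1 kHz
        self.count = 0                                    # consecutive samples above threshold

    def update(self, u_diff, u_expected):
        u_res = u_diff - u_expected          # residual U_Res
        if abs(u_res) > self.threshold:
            self.count += 1
        else:
            self.count = 0                   # assumed: the excursion must be continuous
        return self.count >= self.samples_needed


det = QuenchDiscriminator(threshold=0.1)
flags = [det.update(u_diff=0.5, u_expected=0.0) for _ in range(25)]
print(flags.index(True))   # 19: the 20th consecutive sample above threshold
```

Requiring the residual to persist for t_Eval is what keeps a single disturbed sample from triggering the interlock. This is also why the sample-level ADC errors discussed later matter mainly when they last long or are large.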

On-board field-programmable-gate-array
The FPGA used on the QDS circuit board is of the ProASIC3 type by Microsemi™. It is a flash-based FPGA and therefore immune to radiation-induced corruption of its configuration [3]. It has also been shown to be immune to cumulative damage up to a total ionizing dose (TID) of 400 Gy [4,5]. The immunity of the configuration memory to single event upsets (SEUs) does not, however, extend to the user logic that implements the algorithm. To prevent errors due to SEUs, all registers and all combinatorial blocks are therefore automatically protected by Triple Modular Redundancy (TMR), inserted by the synthesis tool.
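TMR triplicates each register and resolves the three copies with a 2-of-3 majority voter, so an upset in a single copy is outvoted. The Python sketch below only illustrates what such a voter computes bit by bit. It does not model the netlist that the Microsemi synthesis tool actually generates. The helper names are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three redundant register copies."""
    return (a & b) | (b & c) | (a & c)


def flip_bit(word, bit):
    """Model a single event upset as the inversion of one bit."""
    return word ^ (1 << bit)


register = 0b1011_0110
copies = [register, register, register]
copies[1] = flip_bit(copies[1], 3)          # SEU in one copy only
print(majority_vote(*copies) == register)   # True: the upset is outvoted
```

The scheme fails only if two copies of the same bit are upset before the voted value is written back, which is unlikely at the low fluences expected in the tunnel.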

Analog to digital converter
A high resolution is necessary to provide the QDS detection algorithm (see section 2.1) with a sufficiently smooth derivative. The ADS1281 [6] was chosen because it provides 24 bit resolution and previous tests had already shown it to be immune to latch-ups. It consists of a ∆Σ modulator, a programmable filter composed of a sinc filter and four Finite Impulse Response (FIR) filters, a calibration block, and a serial interface for communication with the rest of the system (see figure 2).
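The digital filter follows the modulator because a ∆Σ modulator emits a coarse 1-bit stream whose average encodes the input. The sketch below illustrates this with a first-order modulator followed by a sinc1 (boxcar) decimator. The simplification is deliberate: the ADS1281 uses a higher-order modulator and a sinc plus multi-stage FIR chain, and none of its internal parameters are modeled here.

```python
def delta_sigma_first_order(x):
    """First-order delta-sigma modulator for inputs in [-1, 1].

    The integrator accumulates the difference between the input and the
    fed-back 1-bit output (+1 or -1); the quantizer takes its sign.
    """
    integrator = 0.0
    feedback = 0
    bits = []
    for sample in x:
        integrator += sample - feedback
        feedback = 1 if integrator >= 0.0 else -1
        bits.append(feedback)
    return bits


def sinc1_decimate(bits, ratio):
    """Sinc1 (boxcar) filter plus decimation: average each block of `ratio` bits."""
    return [sum(bits[i:i + ratio]) / ratio
            for i in range(0, len(bits) - ratio + 1, ratio)]


stream = delta_sigma_first_order([0.3] * 4096)
print(sinc1_decimate(stream, 256)[:4])   # each value close to 0.3
```

Longer and higher-order filters trade latency for resolution. This is one reason an upset inside the filter chain can affect many output samples, as discussed below.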

Irradiation campaigns
In the course of four irradiation campaigns, the behavior of the ADS1281 under radiation was analyzed. Two campaigns tested the ADC's vulnerability to latch-ups. The first was conducted at the H4IRRAD facility, a mixed-field test site at CERN [7], and the second used the proton beam at the Paul Scherrer Institut (PSI) [8]. In the first test at H4IRRAD [9], the ADC was stimulated with a test signal, and the output of the ∆Σ modulator was monitored for SEUs in the bit stream, bypassing the digital filter block. Furthermore, the power consumption was measured to search for single event latch-ups (SELs). No SEUs or SELs were found during the measurement, so only upper limits could be determined for the cross sections of both error modes: $\sigma_\mathrm{SEL} < 8.06 \cdot 10^{-13}\,\mathrm{cm^2/device}$ and $\sigma_\mathrm{SEU} < 1.31 \cdot 10^{-12}\,\mathrm{cm^2/device}$.
The second irradiation campaign [10] again searched for SELs. In addition, the modulator output stream was monitored for interruptions (single event functional interrupts, SEFIs). No SELs were detected, and the SEFI cross section was calculated as $\sigma_\mathrm{SEFI} = 9.38 \cdot 10^{-13}\,\mathrm{cm^2/device}$.
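A cross section is the number of observed events divided by the particle fluence the device received. When no event is seen, only an upper limit can be quoted. The text does not state which convention was used. One common choice, assumed here, is the 95 % Poisson limit: zero events in a fluence Φ bound σ below −ln(0.05)/Φ ≈ 3/Φ. The function names and the fluence in the example are hypothetical.

```python
import math


def cross_section(n_events, fluence_per_cm2):
    """Measured cross section sigma = N / Phi, in cm^2 per device."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return n_events / fluence_per_cm2


def upper_limit_zero_events(fluence_per_cm2, confidence=0.95):
    """Poisson upper limit on sigma when no event was observed:
    P(0 events) = exp(-sigma * Phi) = 1 - confidence."""
    return -math.log(1.0 - confidence) / fluence_per_cm2


phi = 1.0e12                          # hypothetical fluence in cm^-2
print(cross_section(2, phi))          # 2e-12 cm^2/device
print(upper_limit_zero_events(phi))   # about 3.0e-12 cm^2/device
```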
Owing to these very low error cross sections, the ADC was proposed as a component for a new quench detection system for the 600 A corrector circuits. Two further irradiation campaigns at PSI were conducted to analyze the radiation tolerance of the complete ADC, including the digital filters. During both campaigns, the ADCs were stimulated with simulated magnet signals or sine waves, and their complete output data stream was stored and analyzed for errors. The data stream was not taken directly after the modulator but after it had passed through all blocks of the ADC. Evaluation of the stored data revealed several new failure modes [12].

Observed errors
The first two new error modes found were single-sample and multi-sample disturbances in the output signal (figures 3 and 4).
Because the internal workings of the ADC are not publicly known, it is not possible to prove how these errors occur. The most likely explanation is that both stem from SEUs in data word bits. Such an SEU creates a δ-like pulse, which appears as a single-sample disturbance in the output signal. If the δ-like pulse is created inside the programmable filter block (figure 2), the subsequent FIR filter stages spread it into a multi-sample error.
This explanation is supported by a simulation of the programmable filter stage in LabVIEW™. Multi-sample errors with very high amplitude could not be reproduced, but the simulation agrees well with the errors of lower amplitude (see figure 5). The discrepancy between the simulation model and the measurements can be explained by several unknown parameters of the FIR filter stages, such as the internal word size.
Figure 5: Comparison between a simulated multi-sample error and a measured one. The red curve was created by replicating the ADC FIR filter stages in LabVIEW™ and stimulating this filter with δ-like pulses.
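The proposed mechanism can be illustrated with a direct-form FIR convolution. The impulse response of an FIR filter is its own coefficient sequence, so a single corrupted sample ahead of a stage becomes an error as long as the filter, and each cascaded stage lengthens it further. The 31-tap Hamming-windowed low-pass used below is hypothetical. Its cutoff of 0.12 · f_s is arbitrary and was chosen so that no tap falls on a zero of the sinc. Neither the real ADS1281 coefficients nor its word sizes are used.

```python
import math


def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def windowed_sinc(taps, cutoff):
    """Hypothetical low-pass FIR: Hamming-windowed sinc, cutoff as a fraction of fs."""
    mid = (taps - 1) / 2
    h = []
    for k in range(taps):
        t = k - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (taps - 1))
        h.append(ideal * window)
    return h


h = windowed_sinc(31, 0.12)   # hypothetical stage, not the ADS1281 coefficients
upset = [0.0] * 200
upset[50] = 1.0               # a flipped data bit modeled as a delta-like pulse
# The filter is linear, so the error added to any signal is its response to the pulse.
error = fir_filter(fir_filter(upset, h), h)   # two cascaded stages
span = [n for n, e in enumerate(error) if abs(e) > 1e-12]
print(span[0], span[-1])      # 50 110: one corrupted sample spread over 61 samples
```

Because the response oscillates in sign, a burst of this kind is not a simple plateau. This affects how well median-based filtering can suppress it later on.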
Furthermore, an error mode was observed in which the ADC stops producing data for some time (see figure 6). The duration of these stops varies but is mostly centered around the time the ADC needs to produce 64 samples (see table 1). Apart from some aberrant stops of much longer duration, the length of most stops corresponds to the duration of an internal restart of the ADC. Whether these restarts are caused by single event effects (SEEs) on the reset pin or by some internal error correction mechanism cannot be determined without knowledge of the internal functionality of the ADC.
The last error mode discovered was sudden changes in the gain and offset of the ADC output signal (see figure 7). These can be readily explained by SEUs in the configuration memory of the calibration block (see figure 2).
The data acquired during the irradiation campaign made it possible to compute error rates for all discovered error modes. Since the sensitive areas of the component are unknown, the cross sections are given per component (see table 2). The 600 A quench detectors will be located in areas with an annual radiation dose of no more than 1 Gy, corresponding to a fluence of about $1.87 \cdot 10^{9}\,\mathrm{cm^{-2}}$ [11]. With 200 quench detection boards, each equipped with two ADCs, this adds up to a total error rate for the full system of approximately one error every three days. This error rate is not acceptable in light of the plan to increase the availability of the LHC over the next years. Therefore, to use the ADC as part of the QPS, countermeasures must be developed to mitigate the discovered error modes.
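The system-level rate follows from multiplying the per-device cross section by the annual fluence and by the number of ADCs. The sketch reproduces this arithmetic with the figures quoted above: 200 boards with two ADCs each, and about 1.87 · 10^9 cm^−2 per year. The cross section in the example is not taken from table 2. It is back-calculated from the rounded rate of one error every three days and only indicates the order of magnitude. The function names are hypothetical.

```python
def errors_per_year(sigma_cm2, annual_fluence_per_cm2, n_devices):
    """Expected errors per year for the full system: sigma * Phi * N."""
    return sigma_cm2 * annual_fluence_per_cm2 * n_devices


def days_between_errors(total_errors_per_year):
    """Mean time between errors in days."""
    return 365.0 / total_errors_per_year


N_ADCS = 200 * 2      # 200 quench detection boards, two ADCs each
PHI_YEAR = 1.87e9     # annual fluence in cm^-2, as quoted in the text

# Summed per-device cross section implied by "one error every three days".
# Back-calculated for illustration only; NOT a value taken from table 2.
sigma_implied = (365.0 / 3.0) / (PHI_YEAR * N_ADCS)
print(f"implied sigma: {sigma_implied:.1e} cm^2/device")   # 1.6e-10
rate = errors_per_year(sigma_implied, PHI_YEAR, N_ADCS)
print(f"days between errors: {days_between_errors(rate):.1f}")   # 3.0
```

Because the rate is linear in σ, mitigating an error mode removes its share of the summed cross section from the total.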

Mitigation measures
To enable the ADC to work properly as part of the 600 A quench detectors, several mitigation measures are necessary to clean the output signal of radiation-induced errors. Single-sample and multi-sample errors are distortions of the original signal. By smoothing the signal with an array of digital filters, the amplitude of these errors can be reduced sufficiently to stay below the thresholds for detecting a quench.

JINST 11 C01032

To construct such a filter array, a combination of decimation, median filters and moving-average filters was chosen. Resources on the FPGA are limited, and longer filters reduce the response speed of the system, so a compromise between filter efficiency and length had to be found. Using the full ADC sampling frequency of 4 kHz allowed a decimation factor of 4 without increasing the response time of the system too much. Multi-sample errors, which typically last about 50 samples, are shortened by a factor of 4 through decimation and can then mostly be eliminated by a median filter of length 16. Moving-average filters consume minimal FPGA resources if their length is a power of two ($2^n$), because the division then reduces to a bit shift, so long moving-average filters can be used to smooth the signal.
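A filter array of this kind can be sketched in Python using integer samples to mirror fixed-point FPGA arithmetic. The chain decimates by 4 with block averaging, applies a length-16 median filter, and finishes with a running-sum moving average whose power-of-two length turns the division into a right shift. Several details are assumptions and do not come from figure 8: the stage order, averaging as the decimation method, and the moving-average length of 64.

```python
from collections import deque
from statistics import median_low


def decimate(samples, factor=4):
    """Average each block of `factor` samples (factor assumed a power of two)."""
    shift = factor.bit_length() - 1
    return [sum(samples[i:i + factor]) >> shift
            for i in range(0, len(samples) - factor + 1, factor)]


def median_filter(samples, length=16):
    """Sliding median over the most recent `length` samples."""
    window = deque(maxlen=length)
    out = []
    for s in samples:
        window.append(s)
        out.append(median_low(window))
    return out


def moving_average(samples, log2_length=6):
    """Running-sum moving average of length 2**log2_length; the division by
    the power-of-two length is a right shift, which is cheap in FPGA logic."""
    if not samples:
        return []
    length = 1 << log2_length
    window = deque([samples[0]] * length, maxlen=length)  # prefill with first sample
    total = samples[0] * length
    out = []
    for s in samples:
        total += s - window[0]   # oldest sample leaves the sum, new one enters
        window.append(s)         # the bounded deque drops window[0] automatically
        out.append(total >> log2_length)
    return out


def filter_chain(samples):
    """Hypothetical chain: decimate by 4, median of 16, moving average of 64."""
    return moving_average(median_filter(decimate(samples)))


raw = [1000] * 800
for i in range(400, 428):   # 28-sample burst -> 7 samples after decimation
    raw[i] = 5000
print(set(filter_chain(raw)))   # {1000}: the burst is removed
```

The median removes a burst only while the disturbed samples stay a minority of its window, which matches the text's statement that shortened multi-sample errors are "mostly" eliminated. The moving average then smooths whatever residue remains.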
During an earlier irradiation campaign, a previous version of the filter chain in figure 8 with shorter filters proved able to cope with all but the most extreme multi-sample errors. Simulations predict that the newest version of the filter will cope with all single-sample and multi-sample errors.
ADC stops can be classified by their length as critical or non-critical. Stops of about 100 samples (25 ms at a sampling frequency of 4 kHz) or less are non-critical, since they are shorter than the maximum response time of the whole system of about 100 ms. To prevent spikes in the output signal during the stop and when the signal returns, the derivative of the missing signal is kept constant. This continues until the ADC has resumed operation and the buffer used for calculating the derivative is filled with fresh samples. Stops longer than the response time cannot be mitigated in this way. To catch such long stops, both ADC signals are monitored for stops. Past radiation tests have shown that toggling the power-cycle pin is not always sufficient to end a long ADC stop, and a full power-down of the device is necessary to restart the ADC.
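The stop handling can be sketched as a small state machine. While samples are missing, the signal is extrapolated with the last valid derivative. After the ADC resumes, that held derivative is kept until the derivative buffer contains only fresh samples. A stop exceeding the critical length is flagged so that supervising logic can power the device down. The 100-sample limit comes from the text. The class name, the 8-sample derivative buffer and the end-point slope estimate are hypothetical choices for this sketch.

```python
from collections import deque
from typing import Optional, Tuple


class StopMitigator:
    """Bridges ADC data stops by holding the last derivative constant (sketch)."""

    def __init__(self, buffer_len: int = 8, critical_stop: int = 100):
        self.buffer = deque(maxlen=buffer_len)  # samples used for the derivative
        self.critical_stop = critical_stop      # longer stops cannot be bridged
        self.held_slope = 0.0                   # derivative per sample, frozen during stops
        self.value = 0.0                        # last output value
        self.missing = 0                        # length of the current stop in samples
        self.power_down_requested = False

    def update(self, sample: Optional[float]) -> Tuple[float, float]:
        """Feed one sample (None = no data from the ADC); return (value, slope)."""
        if sample is None:
            self.missing += 1
            self.buffer.clear()                   # stale samples must not enter the derivative
            self.value += self.held_slope         # extrapolate with the frozen derivative
            if self.missing > self.critical_stop:
                self.power_down_requested = True  # supervisor should power the ADC down
            return self.value, self.held_slope
        self.missing = 0
        self.buffer.append(sample)
        self.value = sample
        if len(self.buffer) == self.buffer.maxlen:  # buffer holds only fresh samples again
            self.held_slope = (self.buffer[-1] - self.buffer[0]) / (len(self.buffer) - 1)
        return self.value, self.held_slope


m = StopMitigator()
for n in range(20):
    m.update(2.0 * n)         # ramp with slope 2 per sample
for _ in range(10):
    print(m.update(None))     # value keeps ramping, slope stays 2.0
```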
Corruption of the configuration registers inside the ADC has been shown to cause changes in the signal gain and can most likely also cause changes in the offset. Such errors are difficult to detect from the output signal and cannot easily be mitigated there. Therefore, this error mode has to be mitigated directly at its source, the configuration registers. The serial interface to the ADC is fast enough to allow a complete readout of all configuration registers between two data samples. If corruption is found, the configuration registers are reprogrammed via the serial interface. This forces the ADC to restart, causing a loss of about 16 ms of data (64 samples at 4 kHz), which is short enough to stay within the required system response time.
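The register scrubbing described above amounts to comparing a read-back of every configuration register with a golden copy held in the FPGA, and rewriting the configuration when they differ. The sketch uses hypothetical names and values throughout. The `read_register` and `write_register` callables stand in for serial-interface transactions, and the addresses and contents are not the ADS1281 register map or command set.

```python
from typing import Callable, Dict, List


def scrub_configuration(
    golden: Dict[int, int],
    read_register: Callable[[int], int],
    write_register: Callable[[int, int], None],
) -> List[int]:
    """Compare every configuration register with the golden copy.

    If any register differs, rewrite the whole configuration (which restarts
    the ADC) and return the corrupted addresses; otherwise write nothing.
    """
    corrupted = [addr for addr, value in golden.items() if read_register(addr) != value]
    if corrupted:
        for addr, value in golden.items():
            write_register(addr, value)
    return corrupted


# Hypothetical register addresses and contents, simulated with a dict.
golden = {0x01: 0x52, 0x02: 0x08, 0x03: 0x00}
device = dict(golden)
device[0x02] ^= 0x04   # simulated SEU in one register
bad = scrub_configuration(golden, device.__getitem__, device.__setitem__)
print([hex(a) for a in bad], device == golden)   # ['0x2'] True
```

Writing only when a mismatch is found keeps the restart gap of about 16 ms a rare event rather than a periodic one.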

Conclusion
The ADS1281 has been shown to withstand doses up to 600 Gy without damage. Several failure modes caused by single event effects (SEEs) were found, but they can be mitigated almost completely. The device is therefore a viable component for use in areas of the LHC with an annual dose of a few Gy. For use in areas with higher flux, other options, such as an oversampled lower-resolution ADC or a different device topology, should be considered.