Correction of the baseline fluctuations in the GEM-based ALICE TPC

To operate the ALICE Time Projection Chamber in continuous mode during the Run 3 and Run 4 data-taking periods of the Large Hadron Collider, the multi-wire proportional chamber-based readout was replaced with gas-electron multipliers. As expected, the detector performance is affected by the so-called common-mode effect, which leads to significant baseline fluctuations. A detailed study of the pulse shape with the new readout has revealed that it is also affected by ion tails. Since reconstruction and data compression are performed fully online, these effects must be corrected at the hardware level in the FPGA-based common readout units. The characteristics of the common-mode effect and of the ion tail, as well as the algorithms developed for their online correction, are described in this paper. The common-mode dependencies are studied using machine-learning techniques. Toy Monte Carlo simulations are performed to illustrate the importance of online corrections and to investigate the performance of the developed algorithms.


Introduction
Charged particles passing through the active volume of the Time Projection Chamber (TPC) ionize the gas along their path. The electrons generated in this process drift toward the end plates on which the readout chambers are mounted. By amplifying the signal in the readout chambers, the TPC provides a three-dimensional reconstruction of the charged-particle tracks. In the data-taking periods Run 1 (2009-2013) and Run 2 (2015-2018) of the Large Hadron Collider (LHC), the readout chambers of the ALICE TPC [1,2] consisted of multiwire proportional chambers (MWPCs) [3]. The ions generated during the amplification process in the MWPCs were blocked by a "gating grid", a series of wires located between the cathode wires and the drift volume. However, the time requirements of the operation with the gating grid limited the maximum readout rate of the TPC to around 3 kHz. On the other hand, operation of the MWPC-based TPC without ion gating would lead to intolerably large space-charge distortions in the drift region. Therefore, in order to operate the TPC at the expected minimum-bias Pb-Pb collision rate of 50 kHz in Run 3 and Run 4 (2022-2030) [4], all readout chambers were replaced by gas electron multipliers (GEMs) during the Long Shutdown 2 (LS2) (December 2018 - March 2022) of the CERN LHC, since stacks of GEMs have intrinsic ion-blocking capabilities [5]. To achieve the required gain while effectively suppressing the back-flow of ions, a stack of four GEM foils, named from top to bottom (i.e., from the drift volume to the pad plane) GEM1, GEM2, GEM3, and GEM4, was employed for all chambers [6]. At the same time, due to the change from triggered to continuous operation, the front-end electronics (FEE) had to be replaced. A detailed description of the upgrade, from the design of the chambers and the new FEE to the pre-commissioning of the upgraded TPC, can be found in Ref. [7].
The typical semi-Gaussian pulse shape of a single readout channel is shown in Fig. 1. The signals from the charged particles are influenced by various effects, such as diffusion and track inclination angle. They also exhibit a characteristic undershoot, due to capacitive coupling across the induction gap between the pad plane and the GEM foils, and a long overshoot after the signal pulse caused by the slow movement of the ions in the induction gap. These two effects are referred to as the "common-mode" (CM) effect and the "ion tail" (IT), respectively. They are more prominent in high-multiplicity environments and, if not accounted for, lead to a significant deterioration of the particle identification (PID) and tracking performance of the detector. Detailed studies conducted during Run 1 and Run 2, where similar baseline fluctuations were observed, emphasize the crucial need to understand, simulate, and correct for these effects to maintain the performance of the TPC [8].

The resulting variations are exemplarily shown for one IROC on a pad-by-pad level in figure 54. From top to bottom, the distributions of $T_0$, pulse width (std. dev.), and $Q_{\mathrm{tot}}$ are displayed. A specific pattern is seen in the $T_0$ and std. dev. distributions, which follows the layout of the SAMPA chip connection to the pad plane. The variations within a chip are rather small. Chip-by-chip variations reflect the expected production tolerances. Typical values are about 5 ns for the width (std. dev.) of the $T_0$ distribution and about 15 ns for the std. dev. distribution.

The $Q_{\mathrm{tot}}$ distribution also shows a distinct pattern. Firstly, the influence of the spacer cross (see section 3.1.2) can be observed around pad position 0 and pad-row position 32. In addition, a geometric pattern is seen for each of the four quadrants, with higher $Q_{\mathrm{tot}}$ values towards the respective center. The relative width (std. dev.) of the $Q_{\mathrm{tot}}$ distribution is about 12%. Since the pulser signal is induced by capacitive coupling, the distribution can be explained by mechanical variations of the 2 mm induction gap between GEM 4 and the pad plane.

Data taking with an x-ray generator
Data taken with an x-ray generator are used to study the ROC stability under high load, to determine the average ROC gain, and to study pad-wise gain variations. An Amptek Mini-X [66] x-ray generator with Ag anode was used to irradiate the upgraded TPC as part of the commissioning campaign. During the measurements, the generator is typically operated at 50 kV and a current of up to 80 µA. Figure 55 shows the position and orientation of the x-ray generator during data taking. X-rays from the generator mainly enter the active volume of the TPC through the central ...

This paper is organized as follows. Section 2 describes the measurements obtained during the pre-commissioning phase during LS2. These data served as input for the initial analysis of the two effects. In Section 3, the dependencies of the common-mode effect are investigated using machine-learning techniques. In Section 4, a detailed analysis of the ion-tail properties is performed. In Section 5, the two online correction algorithms, as implemented at the hardware level in the FPGA-based common readout units (CRUs), are described, and a proof of principle is provided. Toy Monte Carlo (MC) simulations are performed to demonstrate the impact of the two effects on the TPC signals and to investigate the performance of the two online correction algorithms.
A quadruple GEM configuration with foils having different hole pitch was optimized for the ALICE TPC during an extensive R&D phase. The TPC volume is divided equally into two readout sides (A-side and C-side) by a central electrode. The TPC sectors are positioned on the end plates, each covering 20° in azimuth. They are radially segmented into inner and outer readout chambers (IROC and OROC, respectively). The OROC is further subdivided into three individual GEM stacks, so a TPC sector consists of a total of four GEM stacks, labeled IROC (stack 0), OROC 1 (stack 1), OROC 2 (stack 2), and OROC 3 (stack 3) (see Fig. 2). The pad signals are read out by the FEE mounted on the TPC end plates at a sampling rate of 5 MHz, which corresponds to a time bin duration of 200 ns [7,9]. Custom-made ASICs, called "SAMPA" chips [10], are responsible for the signal amplification, shaping, and analog-to-digital conversion. The digitized data are then streamed to the CRUs, where the baseline subtraction, common-mode effect correction, ion-tail filtering, and zero suppression (ZS) are performed.

The baseline configuration of the detector was defined after an extensive R&D phase. Several parameters were carefully optimized in order to minimize the ion backflow of the final system and improve the uniformity across the area of individual readout chambers (see section 3.2 for more ...).

Figure 2: Dimensions (mm) of a sector of the TPC with four GEM stacks. The spacer cross, a structure that ensures the mechanical stability of the GEM foils against electrostatic forces, is shown as two 1.5 mm wide and 2 mm deep bars in the longitudinal and transverse directions for each stack [7].

Laser data
To study the common-mode and ion-tail effects, the laser calibration system of the TPC was used [11,12]. Laser data were collected for all sectors during the pre-commissioning of the TPC in the clean room located at the LHC Point 2 [7]. The laser is a pulsed (5 ns pulse duration, 10 Hz repetition rate) Nd:YAG laser equipped with two frequency doublers, resulting in a final wavelength of 266 nm, which corresponds to a photon energy of 4.66 eV. However, the ionization potentials of the TPC gas components are much larger (Ne: ≈ 22 eV, CO2: ≈ 14 eV, N2: ≈ 16 eV). Therefore, the laser ionizes, via two-photon processes [13], organic impurities (approximately 1 ppm) in the gas with ionization potentials of 5-8 eV. Through a series of mirrors, beam splitters, prisms, and micromirror bundles, 336 laser tracks simultaneously irradiate the TPC. In each TPC half, 6 wide laser beams illuminate 24 bundles of micromirrors located at four nearly equidistant positions along the LHC beam direction. For each bundle, 7 narrow reflected beams enter the TPC gas parallel to the end plates. Figure 3 shows the reconstructed laser tracks of the C-side bundle 0 (the bundle closest to the end plate), with the color scale indicating the laser ID. In these measurements, for each laser event (corresponding to one laser pulse), raw data with a length of about 500 time bins (one time bin corresponds to 200 ns) were streamed from the FEE to the CRUs, corresponding to the full electron drift time. The data taking was triggered at 10 Hz by the laser system. For each TPC sector, a sufficient number of laser events (approximately 400-1200) was collected. To increase the signal-to-noise ratio, and thus to determine the shapes of the common-mode signal and of the ion tail more precisely, the signals were averaged over all available events. Figure 4 shows three laser clusters (see footnote 1) detected on one TPC pad row. The simultaneous common-mode signal is seen as an undershoot in the remaining pads, which are referred to as non-signal or empty pads. After the signal pulse, the long ion tail is also clearly visible in the signal pads.

Footnote 1: A cluster is defined as concentrated deposited charge detected within a search window of 3 bins in pad direction and 3 bins in time direction.
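The numbers quoted for the laser system can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and variable names are hypothetical, and the value of h·c is the standard one rather than something taken from the paper.

```python
# Cross-check of the laser-system numbers quoted in the text.
# All names are illustrative; H_C_EV_NM is the standard value of h*c.
H_C_EV_NM = 1239.841984  # h*c in eV*nm


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a given wavelength in nm."""
    return H_C_EV_NM / wavelength_nm


single_photon_ev = photon_energy_ev(266.0)  # frequency-quadrupled Nd:YAG line
two_photon_ev = 2.0 * single_photon_ev      # energy available in a two-photon process

# Two TPC halves, 24 micromirror bundles per half, 7 narrow beams per bundle.
n_laser_tracks = 2 * 24 * 7

# About 500 time bins of 200 ns each were streamed per laser event.
readout_window_us = 500 * 200 / 1000.0

print(f"single photon: {single_photon_ev:.2f} eV, two photons: {two_photon_ev:.2f} eV")
print(f"laser tracks: {n_laser_tracks}, readout window: {readout_window_us:.0f} us")
```

The two-photon energy lies above the 5-8 eV ionization potentials of the organic impurities but below those of the gas components, which is consistent with the statement that only the impurities are ionized.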
Very large signals saturating the dynamic range of the SAMPA chip were discarded from the analysis, since their actual amplitude is not known. Moreover, very large charges fed into the input of the SAMPA chip may lead to a loss of sensitivity of this channel for a short time. This can be seen as a saturation of the SAMPA response at a constant value of about 100 ADC, as shown in the right panel of Fig. 5 with red and brown solid lines. Note that this effect is an artifact of the electronics and therefore does not affect the neighboring pads.

Calibration pulser data
Measurements with a calibration pulser system were performed independently during the pre-commissioning phase in order to investigate the shaping characteristics of the FEE. These measurements involve injecting a pulse into the bottom electrode of the GEM4 foil (GEM4B), which induces a signal at the pad plane due to capacitive coupling. Ideally, the same charge should be measured in all pads of the same stack since they have the same dimensions. However, as shown in Fig. 6, significant variations are observed from pad to pad due to sagging of the GEM foil. The figure shows the normalized pulser charge, i.e., the charge normalized to the median charge in the stack, for an IROC. For pads positioned at the stack edges and under the spacer cross, the measured charge is significantly higher due to the presence of a dielectric material at these positions. Despite the spacer cross, the GEM4B bends slightly in the direction of the pad plane due to the stretching of the foil and electrostatic forces. This increases the capacitance and thus the pulser charge. As shown in Fig. 6, the relative change in the capacitance for a given pad reaches up to 50%.

The calibration pulser data are related to the common-mode effect since, in both cases, the signals are caused by capacitive coupling. The measured pulser charge is proportional to the pad capacitance, which is an essential parameter in the analysis of the common-mode effect. Since both the pulser charge and the effective field between the GEM4B and the pad plane are inversely proportional to the distance between the GEM4B and the pad plane (except for pads in the spacer cross and at the edges), the pulser charge is also a relevant parameter for the ion-tail studies.
Common-mode effect analysis

The common-mode effect
The common-mode effect is a result of the capacitive coupling between the GEM foils and the pad plane. Since each of the 144 TPC stacks is powered by a separate high-voltage supply, the capacitive coupling extends over all pads within a given stack. When a signal generated by the electrons is detected in a pad, a capacitive signal of opposite polarity (also called undershoot) is simultaneously detected in all pads of the same stack. The magnitude of the undershoot in each pad for a given time bin, $Q^{\mathrm{CM}}_{\mathrm{pad}}(t)$, is proportional to the sum of the positive signal in the stack at the same time bin. This pad-dependent (and time-independent) proportionality factor is referred to as the common-mode fraction (CF) factor, $k_{\mathrm{CF,pad}}$, and is defined as

$$k_{\mathrm{CF,pad}} = \frac{Q^{\mathrm{CM}}_{\mathrm{pad}}(t)}{\langle Q^{\mathrm{pos}}(t) \rangle_{\mathrm{stack}}}, \qquad (3.1)$$

where

$$\langle Q^{\mathrm{pos}}(t) \rangle_{\mathrm{stack}} = \frac{1}{N} \sum_{\mathrm{pad}=1}^{N} Q^{\mathrm{pos}}_{\mathrm{pad}}(t) \qquad (3.2)$$

is the average positive signal in the stack and $N$ is the number of pads in the stack. The main objective of the common-mode effect analysis is to investigate the dependencies of $k_{\mathrm{CF,pad}}$.
The laser signals usually span three time bins (see Fig. 4), so for the analysis of the effect, the common-mode charge and the average positive signal in the stack were summed over three time bins around the laser signal.Since the common-mode effect influences all pads in a given stack, the common-mode signals are also present in the laser signal pads, i.e., the true laser signal is slightly larger than the measured signal.This is taken into account in the analysis.
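A minimal sketch of how the common-mode fraction could be evaluated for a single empty pad is given below, assuming the definition of $k_{\mathrm{CF,pad}}$ as the ratio of the common-mode charge to the average positive signal in the stack. The function names, the waveform, and the charges are hypothetical toy values, and the small correction for the common-mode signal underneath the laser peaks is not included.

```python
# Toy evaluation of the common-mode fraction for one empty pad.
# All names and numbers are hypothetical illustrations.


def sum_three_bins(waveform, peak_bin):
    """Sum the charge over the three time bins centred on peak_bin."""
    return sum(waveform[peak_bin - 1:peak_bin + 2])


def common_mode_fraction(q_cm_pad, q_pos_pads, n_pads):
    """Ratio of the common-mode charge of a pad to the average positive
    signal in the stack; n_pads is the total number of pads in the stack."""
    mean_positive = sum(q_pos_pads) / n_pads
    return q_cm_pad / mean_positive


# Empty pad: small undershoot around the laser time bin (index 2), in ADC counts.
empty_pad_waveform = [0.0, -0.125, -0.25, -0.125, 0.0]
q_cm = sum_three_bins(empty_pad_waveform, peak_bin=2)

# Three laser clusters; each entry is a positive charge summed over three time bins.
signal_charges = [300.0, 500.0, 200.0]

k_cf = common_mode_fraction(q_cm, signal_charges, n_pads=1000)
print(f"Q_CM = {q_cm} ADC, k_CF = {k_cf}")
```

With these toy numbers the average positive signal is 1 ADC count per pad, so the common-mode fraction equals the summed undershoot of the empty pad.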

Dependencies of the common-mode effect
All possible dependencies of $k_{\mathrm{CF,pad}}$ were explored using the Random Forest (RF) machine learning algorithm [14] as implemented in the ROOTInteractive framework [15]. To estimate the common-mode signal, 33% of the non-signal pads were randomly selected as training data. Then, the selected sample was randomly subdivided into 200 estimators, and a decision tree with a depth of 12 was generated for each estimator. The dependencies of $k_{\mathrm{CF,pad}}$ on all available variables were tested. The importance of each variable is listed in Table 1. The normalized pulser charge, $Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$ (i.e., the pulser charge measured in a given pad normalized to the mean pulser charge in the stack), measured with the calibration pulser system, and the stack type account for approximately 97% of the dependencies. The $Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$ is responsible for the pad-by-pad capacitance variations, while the stack type (IROC, OROC1, OROC2, OROC3) accounts for the absolute stack capacitance due to the different stack dimensions. Note that the amplitude of the measured laser signal is reduced due to the underlying common-mode signal. This is referred to as missing charge, which is responsible for the second-order effects mentioned in Table 1: the average positive signal in the stack and the fraction of signal pads in the stack. The contribution of track-related properties such as the bundle (i.e., diffusion) and the beam (i.e., track inclination) was investigated, but no significant dependence was found. In Fig. 7, the $k_{\mathrm{CF,pad}}$ data, the RF prediction, and the difference between the two for each pad are plotted for the training data, where very good agreement between the data and the prediction is observed. The peak value of $k_{\mathrm{CF,pad}}$ moves towards larger (absolute) values as the stack area and hence the capacitance increases (see Tab. 2). The spread within each stack results from the pad-by-pad capacitance variations, which can be seen as a linear dependence between $k_{\mathrm{CF,pad}}$ and $Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$ in Fig. 8. The proportionality also holds for pads with much larger capacitance located at the chamber edges and crosses. By performing linear fits on the data points, the common-mode fraction of a pad can be expressed as

$$k_{\mathrm{CF,pad}} = -k_{\mathrm{stack}} \cdot Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}, \qquad (3.3)$$

where $k_{\mathrm{stack}}$ is the absolute value of the slope (0.44-0.58, depending on the stack type).
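The linear dependence of the common-mode fraction on the normalized pulser charge can be illustrated with a least-squares fit of a line through the origin. This is only a sketch under stated assumptions: the paper does not specify the fit procedure beyond "linear fits", the function name is hypothetical, and the data points are toy values generated with an assumed slope of 0.5, chosen to lie within the quoted 0.44-0.58 range.

```python
# Least-squares fit of a line through the origin, k_CF = slope * Q_norm.
# Toy data only; the fit procedure used in the paper may differ.


def fit_slope_through_origin(x_values, y_values):
    """Return the slope minimising sum((y - slope * x)**2)."""
    numerator = sum(x * y for x, y in zip(x_values, y_values))
    denominator = sum(x * x for x in x_values)
    return numerator / denominator


# Hypothetical normalized pulser charges, including a large-capacitance edge pad.
q_norm = [0.9, 1.0, 1.1, 1.5]
# Toy common-mode fractions generated with an assumed slope of -0.5.
k_cf = [-0.5 * q for q in q_norm]

slope = fit_slope_through_origin(q_norm, k_cf)
k_stack = abs(slope)  # quoted in the text as the absolute value of the slope
print(f"slope = {slope:.3f}, k_stack = {k_stack:.3f}")
```

Because the undershoot has the opposite polarity to the signal, the fitted slope is negative and the stack parameter is its absolute value.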

The following conclusion can be drawn from the studies: the $k_{\mathrm{CF,pad}}$ of a given pad depends mainly on the capacitance between the pad and the GEM stack (see Table 1), which can be described by the stack type and the normalized pulser charge, the latter reflecting the pad-by-pad capacitance variations within a given stack. Note that the missing charge has a negligible effect on the common-mode correction, as discussed above. However, it is accounted for in the full MC simulations that use a GEANT3 implementation of the TPC detector setup [16].

Ion-tail analysis

Ion tail in GEMs
The analysis of laser data for the common-mode effect revealed unanticipated signal tails. In Fig. 9 (left) the response of a single pad to a laser pulse is shown. On the right, the same response is shown zoomed in on the signal axis. After the signal peak, a long tail is observed, caused by the ions generated in the amplification process, lasting about 16 µs. In this example, the maximum of the tail is about 0.7% of the maximum of the electron signal, while the integral of the tail, $Q^{\mathrm{tail}}_{\mathrm{tot}}$, corresponds to approximately 9% of the total electron signal, $Q^{\mathrm{signal}}_{\mathrm{tot}}$. The shape and duration of the tail depend on the distance between the GEM4B and the pad plane, the distance of the signal from the center of gravity (COG) of the cluster along the pad direction, the track inclination, and the diffusion. Consequently, the above ratios are strongly influenced by these parameters.
A long ion tail as a negative undershoot was also observed during Run 1 and Run 2, where the TPC was based on the MWPC technology.The integral of the ion tail accounted for approximately 50% of the total signal, resulting in significant degradation of the detector performance, especially in the case of out-of-bunch pile-up events [8].Despite its positive nature and smaller amplitude, the ion tail still requires a correction during the Run 3 data-taking period, where, on average, five pile-up events are expected within a full drift time.
Simulations were performed to understand the origin of the ion tail observed in GEMs. For these simulations, only the last GEM foil (GEM4) was modeled, which is sufficient to a first approximation due to the shielding of the ions generated in the previous amplification stages by the GEM4 electrodes. The electric field maps were calculated using the finite element method as implemented in ANSYS [17]. The transport properties of the charge carriers were determined using Magboltz [18], and their multiplication was simulated in Garfield++ [19]. The simulations show that two categories of ions contribute to the ion tail, which can be classified by their point of origin. They are created either in the GEM4 holes or in the induction gap, i.e., the region between GEM4B and the pad plane. While the first category cannot be avoided and will be present in any GEM system, the latter is particular to the HV settings chosen for the ALICE TPC GEMs, where a high induction field (the electric field between the GEM4B electrode and the pad plane, $E_{\mathrm{ind}}$) plays an important role in minimizing the ion backflow. A detailed description of the HV settings can be found in [7].
The simulation results are summarized in Fig. 10, where the time between the creation of each ion and its collection at an electrode or its departure from the amplification area (the end-drift time) is plotted separately for the two types of ions mentioned above. It can be seen that the number of ions generated in the amplification stage (top panel) is considerably larger than the number generated in the induction gap (bottom panel). The former are referred to as the fast component of the ion tail due to their sharp distribution at small end-drift times, while the latter are referred to as the slow component due to their flat distribution. The distribution of the fast component does not depend on $E_{\mathrm{ind}}$. In the case of the slow component, however, the distribution becomes narrower with increasing $E_{\mathrm{ind}}$ and acquires a slight slope. Since the probability of ionization depends on the induction field, the number of produced ions rapidly decreases with decreasing $E_{\mathrm{ind}}$. As seen in the bottom panel of Fig. 10, for the low $E_{\mathrm{ind}}$ settings, the end-drift time distribution of the slow-component ions is flat, indicating a uniform production of electron/ion pairs in the induction gap. On the other hand, for higher $E_{\mathrm{ind}}$, and in particular for $E_{\mathrm{ind}}$ > 4 kV/cm (see Fig. 2 of Ref. [20]), the electron/ion pair creation probability is larger closer to the pad plane, due to avalanche effects. Note that the nominal induction field value, $E^{100\%}_{\mathrm{ind}}$, is set to 3.5 kV/cm.

Induction field dependence
A dedicated set of measurements was added to the pre-commissioning program to study the tail properties and disentangle the contributions of the two different ion types. In these measurements, 5000 laser events were collected for two TPC sectors. The value of the induction field was set to 50% of the nominal value and then increased in 5% steps from 75% to 100%. The high number of laser events compared to the standard laser calibration runs (with 1000 events) was chosen to ensure a good signal-to-noise ratio, since the tail magnitude is relatively small. Figure 11 shows the average normalized laser signals of all central pads, where each pad signal is normalized to its maximum value. The magnitude of the tail decreases with decreasing induction field. The shape of the tail also changes: in particular, the tail decays faster for lower values of the induction field. Note that, due to the steeply rising nature of the tail, as illustrated in Fig. 9, and the fluctuations in the laser signal positions, the bins near the peak exhibit greater fluctuations, leading to increased statistical errors.
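For orientation, the induction-field settings of this scan can be converted into absolute field values using the nominal field of 3.5 kV/cm quoted in the text. The variable names below are hypothetical.

```python
# Induction-field scan: the 50% setting plus 75% to 100% in 5% steps.
# The nominal field of 3.5 kV/cm is taken from the text; names are illustrative.
NOMINAL_E_IND_KV_CM = 3.5

scan_percent = [50] + list(range(75, 101, 5))
scan_kv_cm = {p: NOMINAL_E_IND_KV_CM * p / 100 for p in scan_percent}

for percent, field in scan_kv_cm.items():
    print(f"{percent:3d}% of nominal -> {field:.3f} kV/cm")
```

The 50% setting corresponds to 1.75 kV/cm, and all settings of the scan lie at or below the nominal 3.5 kV/cm.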

Estimation of the contribution from the two categories of ions
Based on the simulation results, an estimate of the contributions from the two categories of ions described in Section 4.1 can be made. It can be assumed that for $E^{50\%}_{\mathrm{ind}}$ = 1.75 kV/cm the contribution of the slow component is negligible with respect to higher field settings, as seen in Fig. 10. Since the fast component is practically independent of the induction field, the slow-component contribution for a given $E_{\mathrm{ind}}$ value can be estimated as the difference between the ion tail measured at this induction field and the one measured at $E^{50\%}_{\mathrm{ind}}$, i.e., one can write

$$\mathrm{IT}_{\mathrm{slow}}(E_{\mathrm{ind}}) \approx \mathrm{IT}(E_{\mathrm{ind}}) - \mathrm{IT}(E^{50\%}_{\mathrm{ind}}). \qquad (4.2)$$

The results of the above procedure are shown for all central pads of OROC3 in Fig. 12 and Fig. 13. In Fig. 12, the average normalized ion tail is shown for the $E^{50\%}_{\mathrm{ind}}$ setting, which is dominated by the fast component. An exponential shape is observed. In Fig. 13, the normalized ion tail (left) and its difference from the $E^{50\%}_{\mathrm{ind}}$ setting (right) are shown for different values of the induction field. The difference approximates the slow component. A nearly linear shape of the slow component is observed in the right column of Fig. 13, consistent with ions uniformly produced in the induction gap.
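The subtraction procedure described above can be sketched with toy waveforms. The exponential fast component and the linearly falling slow component below are hypothetical shapes and amplitudes chosen only to mimic the qualitative behaviour described in the text; they are not fitted to the data.

```python
import math

# Toy illustration of the slow-component estimate: subtract the tail measured
# at 50% of the nominal induction field from the tail measured at a higher field.
# Shapes, amplitudes, and names are hypothetical.


def slow_component(tail_at_field, tail_at_half_field):
    """Bin-by-bin difference of two normalized ion tails."""
    if len(tail_at_field) != len(tail_at_half_field):
        raise ValueError("both tails must have the same number of time bins")
    return [a - b for a, b in zip(tail_at_field, tail_at_half_field)]


time_bins = range(80)  # 80 time bins of 200 ns cover 16 us

# Fast component: assumed exponential and independent of the induction field.
fast = [0.004 * math.exp(-t / 10.0) for t in time_bins]
# Slow component at the higher field: assumed to fall linearly to zero.
slow_true = [max(0.0, 0.003 * (1.0 - t / 60.0)) for t in time_bins]

tail_half_field = fast  # slow component assumed negligible at the 50% setting
tail_higher_field = [f + s for f, s in zip(fast, slow_true)]

slow_estimate = slow_component(tail_higher_field, tail_half_field)
print(f"slow component, first bin: {slow_estimate[0]:.4f}, last bin: {slow_estimate[-1]:.4f}")
```

In this toy case the estimate recovers the injected slow component exactly, because the fast component was constructed to be field independent and the slow component was set to zero at the 50% setting. In real data a residual slow contribution at the 50% setting would bias the estimate low.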

[...] following form (Eq. 4.3). The first term corresponds to the contribution from the GEM4 holes, which is independent of the value of the induction field, while the second term corresponds to the contribution from the induction gap. By reducing the induction field value from 100% to 95%, the average ion-tail fraction is reduced by 10%. This indicates that the ion-tail fraction near the nominal setting is very sensitive to small fluctuations in the effective induction field. From the fit, one can also obtain $f_{\mathrm{IT}}(0) \approx 0.045$ and $f_{\mathrm{IT}}(50) \approx 0.05$, indicating that for $E_{\mathrm{ind}}$ < 50% there is a residual contribution of about 10% from the slow component. This negligible residual is consistent with the assumption used in Eq. 4.2. Moreover, the data shown here are consistent with the observation of Fig. 13, where for the nominal induction field the contributions of the fast and slow components are almost equal, $f_{\mathrm{IT}}(50)/f_{\mathrm{IT}}(100) \approx 0.5$.

Figure 15 shows the dependence of the ion-tail fraction and the ion-tail slope on the normalized pulser charge. The pads located in edge regions or close to the spacer cross of each chamber were considered separately in the analysis, since the dielectric material placed in these regions influences the signal shapes. In the upper panels of the figure, these pads (cross/edge) are shown in red, while the rest of the pads (bulk) are shown in blue. A linear correlation is observed between the ion-tail fraction and the normalized pulser charge for the bulk pads. This is explained by the fact that the pulser charge and the value of the effective induction field are inversely proportional to the distance between the pad and GEM4B, resulting in a wide range of the ion-tail fraction, from about 5 to 20%. The proportionality does not apply to the pads in the cross/edge region due to the additional dielectric material. In the bottom panels, the bulk region is shown for pads with a distance to the COG of the cluster (dCOG) of less than 0.8 cm and for the four different GEM stacks.
Figure 16 shows the dependence of the ion-tail fraction on the distance from the COG of the cluster for the bulk pads.For the mean, about a 20% difference is observed between the center of the cluster and the center of the neighboring pad.The large spread of the two distributions shown in Fig. 15, mainly due to the pad-to-GEM4B distance, does not allow an accurate pad-by-pad calibration of the ion-tail parameters.This is because the foil sagging not only affects the capacitive coupling, but also increases the effective induction field, which leads to a stronger amplification and thus to a change in the ion-tail shape.Moreover, only a small fraction of about 10% of the TPC pads "see" the laser signals.Since accurate knowledge of the parameters is important for the restoration of the baseline bias and its fluctuations, krypton calibration data were used to disentangle the ion-tail dependencies (see Section 5.1).

Common-mode and ion-tail corrections
Correcting the common-mode effect and ion tail online, before applying the ZS, is critical for maintaining the PID and tracking performance of the TPC and for limiting cluster losses. Moreover, the correction of the ion tail also helps with minimizing the data volume produced by the TPC. The multiplexed data streamed from the FEE are decoded in the CRUs, where subsequently the pedestal subtraction, common-mode correction, ion-tail correction, and ZS are performed. Due to the large number of pads, the data from each TPC stack are read out by either two (for OROC) or four (for IROC) CRUs.
With the current configuration, information cannot be exchanged between different CRUs. This additional CRU segmentation implies that the common-mode charge for a pad cannot be calculated using Eq. 3.1, since the calculation of the average positive signal in the stack would require combining information from different CRUs. Instead, a baseline estimation is performed using the empty (or non-signal) pads in the CRU, by using Eq. 3.1 and Eq. 3.3 [9]; here ⟨⟩ denotes averaging over empty pads in a given CRU. The algorithm consists of two parts. First, the empty pads are selected and the mean baseline is calculated for a given time bin. Second, the common-mode correction is applied to all pads, scaled by the normalized pulser charge of each pad. Since each pad has a different pulser charge, a static map must be provided to the CRUs. An advantage of this method is that the stack-dependent parameters (see Fig. 8) are not needed, since the correction is calculated at the CRU level. The efficiency of the algorithm depends on the correct selection of the empty pads on a time-bin basis. For this, apart from a simple threshold cut (the pad charge in the given time bin must not exceed a set multiple of the average noise), an additional check is performed by comparing the pad charge to that of a number of randomly selected pads in the CRU. This ensures that pads measuring the ion tail of earlier signals (superimposed with common-mode) are excluded. A pseudo-code for the common-mode correction is given in Appendix A.
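The two parts of the algorithm described above can be illustrated with a minimal Python sketch. It is not the CRU firmware and not the pseudo-code of Appendix A: the function name, the array layout, the `n_sigma` threshold multiplier and the choice to average the charge after dividing by the normalized pulser charge are assumptions made for illustration, and the additional comparison with randomly selected pads is omitted.

```python
import numpy as np

def common_mode_correct(adc, q_norm, noise, n_sigma=3.0, use_median=False):
    """Sketch of a per-CRU common-mode correction (illustrative only).

    adc    : (n_pads, n_timebins) pedestal-subtracted charges of one CRU
    q_norm : (n_pads,) normalized pulser charge per pad (static map)
    noise  : (n_pads,) noise per pad, used for the threshold cut
    """
    adc = np.asarray(adc, dtype=float)
    q_norm = np.asarray(q_norm, dtype=float)
    noise = np.asarray(noise, dtype=float)
    corrected = adc.copy()
    for t in range(adc.shape[1]):
        col = adc[:, t]
        # Part 1: select empty pads with a simple threshold cut
        empty = col <= n_sigma * noise
        if not empty.any():
            continue  # no empty pad available, leave this time bin untouched
        # Baseline estimate in units of the normalized pulser charge (assumption)
        scaled = col[empty] / q_norm[empty]
        baseline = np.median(scaled) if use_median else scaled.mean()
        # Part 2: apply the correction to all pads, scaled per pad
        corrected[:, t] = col - q_norm * baseline
    return corrected
```

In this sketch, switching `use_median` changes the estimator from the mean to the median of the empty pads, which mirrors the "mean" and "median" options compared in the toy MC studies; the second-iteration variants are not shown.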
Online correction of the ion tail is performed on a pad-by-pad basis prior to ZS in the CRUs. Since the polarity of the ion tail is positive, ZS does not result in missing charge and thus missing clusters. Consequently, the ion tail can be corrected online to first order and the remaining second-order deficiencies can be handled during track reconstruction, if necessary. An exponential correction, in which the summed exponential tails of all previous signal peaks are subtracted from the pad signal, is quite difficult to implement directly in the CRU FPGAs. In such a correction, $Q_{\mathrm{in}}(t)$ and $Q_{\mathrm{out}}(t)$ are the pad signals for the time bin $t$ before and after the correction is applied, respectively. The sum runs over all previous signal peaks $i$ for the given pad. Each peak $i$ enters with two parameters, the maximum and the slope of its tail, while $t_{i,\mathrm{max}}$ is the position of the peak maximum. The complexity of such a correction stems from two factors: the number of resources required and the time needed to perform the calculations. First, applying the above correction would mean that the entire peak history for each of the roughly 1600 pads read out by a CRU would need to be stored. However, FPGAs are not typically designed to store large amounts of data. Second, while it is possible for the FPGAs to compute an exponential (e.g., using the CORDIC functions), FPGAs would need to perform multiple exponential calculations in parallel. Based on the number of digital-signal processors available in the CRU FPGA, these calculations would introduce some latency.
To avoid the aforementioned complications, an exponential filter has been developed for online correction of the ion tail. The filter requires only simple mathematical operations. Note that this correction assumes a perfectly exponential ion tail, which is not entirely realistic (see Section 4.3). Moreover, the input parameters are tuned using the laser tracks (see Section 4.4), so different track topologies are not taken into account. As a result, residual biases, expected to be on the per-mille level, are inevitable. However, they may still have an impact on the tracking efficiency and PID performance. More details on the implementation of the exponential filter are given in Appendix B.

Toy MC simulations
To quantify the effects of common-mode and ion tail on the baseline, and to investigate the performance of the two online correction algorithms, toy MC simulations were performed. The stepwise procedure is as follows:
• For each cluster, a value for $Q_{\max}$ was generated using the distribution obtained from the LHC15o data (see Figure 17). Each cluster spreads over three pads and three time bins, following a Gaussian distribution in pad and time space.
• The ion tail and then the common-mode effect were simulated. Note that the common-mode effect was generated using Equation 7 and Equation 3, and not Equation 8. For the slope, a value of 0.5 was used, namely $k_{\mathrm{CF,pad}} = 0.5 \cdot Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$. The ion tail was simulated as a perfect exponential, and for each pad the parameters for the ion-tail slope and fraction were used.
• The noise, the pedestal, and the rounding were added.
• The ion tail was corrected using the algorithm described in Appendix B.
The order in which the ion-tail and common-mode corrections are applied is important because the ion tail also produces small common-mode undershoots in the other pads. Figure 19 shows a simulated pad signal for an event with multiplicity corresponding to approximately 30% occupancy in the TPC. Both the common-mode effect and the ion tail are included. The baseline is systematically shifted to negative values due to the common-mode effect. Next, a series of simulations were performed to optimize the correction parameters, including simulated noise. For each event, 500 random settings were simulated, allowing for the following options:
• Common-mode simulation: ON or OFF.
• Common-mode simulation ON and correction of the common-mode effect either OFF, or using a mean correction, median correction, mean 2nd iteration, or median 2nd iteration.
• Ion-tail simulation: ON or OFF.
• Ion-tail simulation ON and ion-tail filter either OFF, or correction using the pad-by-pad ion-tail parameters (pad-by-pad), or correction using the median value of the parameter distributions shown in the bottom of Fig. 18 (fixed-to-median).
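The signal-generation steps of the toy MC can be illustrated with the following Python sketch for a single cluster in one pad row. Everything specific here is an assumption made for illustration: the function name, the default values, the unit-width Gaussian weights, the flat pulser map, the recursive form of the exponential tail, and in particular the common-mode model (a negative signal proportional to the average positive charge of all pads in the time bin), which only stands in for the equations referenced in the text.

```python
import numpy as np

def toy_pad_row(n_pads=16, n_bins=60, q_max=100.0, pad0=8, t0=10,
                tail_fraction=0.1, tail_slope=0.9, cf_slope=0.5,
                noise_sigma=0.0, pedestal=0.0, seed=0):
    """Generate one toy cluster with ion tail and common-mode (sketch)."""
    rng = np.random.default_rng(seed)
    q_norm = np.ones(n_pads)  # normalized pulser charge, flat map assumed

    # Step 1: cluster spread over 3 pads x 3 time bins with Gaussian weights
    w = np.exp(-0.5 * np.array([-1.0, 0.0, 1.0]) ** 2)
    signal = np.zeros((n_pads, n_bins))
    signal[pad0 - 1:pad0 + 2, t0 - 1:t0 + 2] = q_max * np.outer(w, w)

    # Step 2a: perfectly exponential, positive ion tail per pad
    tail = np.zeros_like(signal)
    for t in range(1, n_bins):
        tail[:, t] = (tail_slope * tail[:, t - 1]
                      + tail_fraction * (1.0 - tail_slope) * signal[:, t - 1])
    with_tail = signal + tail

    # Step 2b: common-mode undershoot, k_CF,pad = cf_slope * q_norm (assumed model)
    k_cf = cf_slope * q_norm
    mean_positive = np.clip(with_tail, 0.0, None).mean(axis=0)
    cm = -np.outer(k_cf, mean_positive)

    # Step 3: noise, pedestal and rounding to integer ADC counts
    analog = with_tail + cm + pedestal
    if noise_sigma > 0.0:
        analog = analog + rng.normal(0.0, noise_sigma, analog.shape)
    return np.rint(analog), signal, tail, cm
```

Because the tail is generated from the signal before the common-mode term is computed, the tail itself contributes to the undershoot on the other pads, which is the reason the order of the two corrections matters.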
Figure 20 shows the results of the parameter scan. On the left, the average baseline shift and on the right its RMS are shown as a function of occupancy. A 99% least-trimmed-squares method was used to exclude some extreme outliers. For the top panels, only the common-mode effect was simulated. The different correction methods (no correction, mean, median, and 2nd iterations) are shown with different colors. It can be seen that the average baseline bias can reach up to −5 ADC and its RMS up to 1.5 ADC if the common-mode effect is not corrected. Compared to the RMS in the absence of the two effects (caused by the noise), this corresponds to roughly a 60% increase.
For the bottom panels, only the ion tail was simulated. The average baseline bias reaches up to 1 ADC and its RMS 1.25 ADC, which corresponds to an increase of about 35% when the tail is not corrected.
In Fig. 21, the average baseline shift is shown as a function of occupancy for the pad-by-pad (left) and the fixed-to-median (right) corrections of the ion tail. As mentioned in Appendix B, a value of k0=100% overcorrects the data, which can be confirmed by the negative values of the baseline bias (red markers). It can be seen that the optimal value for the pad-by-pad method is k0 ≈ 85-90%, while for the fixed-to-median method it is k0 ≈ 90-95%. The baseline shift is slightly larger for the fixed-to-median method.
Figure 22 shows the average baseline shift for the different methods used in the common-mode (left) and ion-tail (right) corrections, where the occupancy is 28-32%, corresponding to the highest expected multiplicities in Run 3. All parameter settings were averaged. It can be seen that the common-mode correction can restore most of the baseline shift. Using the median 2nd iteration method instead of the mean results in roughly a 60% smaller average baseline bias. Similarly, correction of the ion tails restores most of the baseline shift. Using the pad-by-pad method instead

Table 3: Impact of the ion-tail correction on the space saving for minimum-bias events with occupancy ≤30%. Noise was not included, and a threshold cut of 1.2 ADC was applied. Note the realistic order of implementation of the effects and the corresponding corrections.

track topologies and cluster shapes, diffusion, and other detector effects that have not been considered.

Conclusions
The signal response of the GEM-based ALICE TPC was studied in detail using the data collected with the laser calibration system. The dependencies of the common-mode effect were understood using machine learning techniques. It was found that the common-mode signal depends largely (96%) on the stack type and the normalized charge detected in runs with the dedicated calibration pulser system. The stack type accounts for the absolute capacitance of the stack, while the normalized pulser charge is responsible for the pad-by-pad capacitance variations. An unpredicted ion tail was observed in data recorded with the TPC laser system. The measurements have shown that ions from two categories contribute almost equally to the generated signal at the nominal induction field setting. The contribution of the ions generated in the GEM4 holes is practically independent of the value of the induction field and results in an exponentially shaped tail. Ions generated in the induction gap of the GEM stack result in an additional contribution, which depends on the electric field applied to the induction gap. The performance of the two online correction algorithms was studied in detail using a toy MC with input parameters determined in a data-driven way. The common-mode correction algorithm correlates all pads of a given CRU for each time bin, while the ion tail is corrected on a per-channel basis using an exponential filter. Both effects are efficiently corrected, despite a residual bias in the baseline, comparable to the noise. Since the ion-tail parameters used in the exponential filter are obtained from the laser tracks, different track topologies were not considered. Therefore, possible imperfections in the ion-tail correction are to be expected. These should be considered when repeating these studies using a full MC simulation. Furthermore, the ion tail has a significant impact on online data compression. During the Run 3 and Run 4 data-taking periods, the raw data readout rates are estimated to be about 3.5 TB/s. The toy MC results obtained in this study show that in the presence of the ion tail, the final data rate after baseline subtraction is about 890 GB/s instead of 650 GB/s, which is the estimated value given in the technical design report [6]. The proposed ion-tail correction algorithm fully covers this increased data rate. The two correction algorithms will be commissioned using pp and Pb-Pb collisions at record energy and luminosities as part of the TPC readout system from 2023 onward.

B Ion-tail correction algorithm
The following pseudo-code for the ion-tail correction is tested in Section 5.1 using a toy MC simulation and implemented in the CRU firmware:

    // Constant to be set before
    float k0; // Multiplicative correction factor, same for all pads (around 0.9)
    for each padID {

The amount of correction for a given time bin is k1*(1-k2)*Q_correction, where Q_correction is the buffered charge value of the previous time bin and k1 (fraction of the ion-tail integral with respect to the total charge of the pad signal) and k2 (slope of the ion tail assuming an exponential form) are the ion-tail parameters. Since these are pad-dependent parameters, they are provided as input to the CRUs in the form of two-dimensional, pad-by-pad maps. The pad-by-pad calibration of these parameters was obtained from data recorded with radioactive krypton isotopes in the gas volume [1]. Alternatively, the median value of the distributions shown in Fig. 15 can be used. The impact of this simplification is demonstrated in Section 5.1. An additional scaling parameter, k0, whose value is the same for all pads, is introduced to account for the difference between the integral of the input charge with a continuous Gaussian form and the digitized signal, which has a discrete structure. The digitized signal is equal to or greater than the input, so the parameter k0 is defined as k0 ≤ 1. Note that the above filter can be applied continuously, even in the absence of signals. In this case, the filter has no effect on the measured charge.
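As an illustration of how such a filter can work with one multiplication chain and a single buffered value per pad, the following Python sketch implements a recursive exponential filter for one pad. The correction term k0*k1*(1-k2)*Q_correction follows the description above, but the rule used to update the buffer (corrected charge plus k2 times the previous buffer) is an assumption of this sketch and is not taken from the firmware. The helper `add_ion_tail` is a hypothetical generator used only to produce a test signal with a perfectly exponential tail.

```python
def add_ion_tail(signal, k1, k2):
    """Toy model: add a positive, perfectly exponential tail to a pad signal."""
    out, buffered = [], 0.0
    for s in signal:
        out.append(s + k1 * (1.0 - k2) * buffered)
        buffered = s + k2 * buffered  # decaying memory of earlier charge
    return out

def ion_tail_filter(q_in, k1, k2, k0=0.9):
    """Recursive exponential filter for one pad (illustrative sketch).

    k1 : ion-tail fraction, k2 : ion-tail slope, k0 : global scaling (k0 <= 1)
    """
    q_out = []
    q_correction = 0.0  # buffered charge from the previous time bins
    for q in q_in:
        corrected = q - k0 * k1 * (1.0 - k2) * q_correction
        # Assumed buffer update: accumulate the corrected charge, decay the rest
        q_correction = corrected + k2 * q_correction
        q_out.append(corrected)
    return q_out
```

For example, `ion_tail_filter(add_ion_tail(signal, 0.1, 0.9), 0.1, 0.9, k0=1.0)` returns the original `signal` in this idealized model, because the tail is generated and removed with identical parameters and no sampling bias. With real, digitized data the text above explains why k0 below 100% is preferred.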
In Fig. 25, the exponential filter is applied on the averaged laser signal of a pad. The ion-tail fraction and ion-tail slope parameters, as obtained from the fit, were used for the correction. It can be seen that a 100% correction (k0=100%) slightly overcorrects the data, which is due to the bias of the sampled charge. Figure 26 (top) shows the normalized data and the 100% correction for randomly selected pads of the nominal induction field setting, while the bottom panel shows the averaged data and the corresponding corrections. In Fig. 27 the simulation and correction of both effects are demonstrated. To show the effects more clearly, noise is not included. For the ion-tail correction, the ion-tail parameters of the respective pad were used, and a value of k0=80% was chosen. For the common-mode correction, a two-iteration median correction was applied with nPadsRandom=6, nPadsMin=4, Q_thr1=2, Q_thr2=2.

Figure 53 .
Figure 53. Typical electronics response of one pad to the pulser signal.

Figure 5 .
Figure 5. Dimensions (mm) of the ALICE TPC readout chambers.

gap with the associated induction field E ind. Typical HV settings applied to the stacks are discussed in section 4.1.1. The baseline configuration of the detector was defined after an extensive R&D phase. Several parameters were carefully optimized in order to minimize the ion backflow of the final system and improve the uniformity across the area of individual readout chambers (see section 3.2 for more

As an example, in Figure 1, the reconstructed laser tracks of the C-side bundle 0 (the bundle closest to the endplate) are shown. The color corresponds to the "laserID", a unique number assigned to each of the 336 laser tracks.

Figure 1 :
Figure 1: (Color Online) Reconstruction of projected tracks of C-side bundle 0. Bundle 0 is the bundle closest to the end-plate. The stack edges are shown in black. The "laser ID" is a unique number assigned to each of the 336 laser tracks.

Figure 2 :
Figure 2: Laser signals and induced common-mode signals in the pads of one TPC row. The signal height axis is zoomed-in (signal not to scale). The ion-tail is also visible for the signal pads.

Figure 4 :
Figure 4: Laser signals and induced common-mode signals in the pads of a given pad row. The signal height axis is zoomed-in such that the signals are not to scale. The ion tail is also visible for the signal pads.

FIG. 3 .
FIG. 3. Left: total charge of saturated signals as a function of common-mode signal. Right: saturated laser signals [2].

Figure 5 :
Figure 5: Left: four laser tracks projected onto the local x-y plane. The dashed oval highlights saturated signals. Right: saturated laser signals of the cluster highlighted in the left panel.

Figure 3 :
Figure 3: Ratio of pulser charge to median pulser charge in the stack for IROC C00.

Figure 6 :
Figure 6: Ratio of pulser charge to median pulser charge in the stack for an IROC.

Figure 4 :
Figure 4: (Color Online) Common-mode fraction data (blue), random forest prediction (orange) and difference between the two (green), for IROC (top left), OROC1 (top right), OROC2 (bottom left) and OROC3 (bottom right). The data shown are the ones used for the random forest training. Note the logarithmic scale.

Figure 5 :
Figure 5: (Color Online) Common-mode fraction as a function of the normalized pulser charge for each stack, only for the data with a CF uncertainty of less than 5%.

Figure 8 :
Figure 8: (Color online) The common-mode fraction ($k_{\mathrm{CF,pad}}$) as a function of normalized pulser charge ($Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$) for each stack, only for the data with uncertainty of $k_{\mathrm{CF,pad}}$ less than 5%.

Figure 6 :
Figure 6: (a) Response of a pad to a laser signal of bundle 0. (b) Response of the same pad, zoomed in on the signal (y) axis. The undershoot observed before the signal pulse is the common-mode response due to signals in other pads of the same stack. Note that the time range is in µs.

Figure 9 :
Figure 9: Left: Response of a pad to a laser signal. Right: The same signal zoomed in on the signal (y) axis. The undershoot observed before the signal pulse is the common-mode response due to signals in other pads of the same stack.

Figure 10 :
Figure 10: (Color online) Simulated drift times of ions until they reach an electrode for two categories of ions: those generated in the GEM4 holes (top) and in the induction gap (bottom). Different induction field values are shown as different colors, with the nominal value corresponding to 3.5 kV/cm.

Figure 8 :
Figure 8: (Color Online) Average normalized signal as a function of time for the different induction field value settings. Signals from laser bundles 0-2 and only central pads are used. The listed numbers in the legend correspond to percentages of $E^{100\%}_{\mathrm{ind}}$. The y error-bars are plotted, but they are smaller than the marker size.

Figure 11 :
Figure 11: (Color online) Normalized signals as a function of time for the different induction field settings, averaged over all central pads (within 0.2 cm from cluster COG). Numbers listed in the legend correspond to percentages of $E^{100\%}_{\mathrm{ind}}$. The error bars (standard error of the mean) are smaller than the marker size.

In Figure 9, the (average) normalized ion-tail is shown for the $E^{50\%}_{\mathrm{ind}}$ setting, which is dominated by the fast component. An exponential shape is observed. In Figure 10, the normalized ion-tail (left) as well as its difference from the $E^{50\%}_{\mathrm{ind}}$ setting (right) are shown for different values of the induction field.

Figure 9 :
Figure 9: Average normalized ion-tail for $E^{50\%}_{\mathrm{ind}}$, which is assumed to describe the fast component independently of the $E_{\mathrm{ind}}$ setting. Only data from central pads of OROC3 are shown. The error bars correspond to the RMS of the entries in each timebin.

Figure 12 :
Figure 12: Average normalized ion tail for $E^{50\%}_{\mathrm{ind}}$, which is assumed to describe the fast component independent of the induction field setting. Only the data from the central pads of OROC3 are shown. The error bars correspond to the RMS of entries in each time bin.

Figure 10 :
Figure 10: Normalized ion-tail (left) and difference from the $E^{50\%}_{\mathrm{ind}}$ (right), for $E^{100\%}_{\mathrm{ind}}$ (top), $E^{85\%}_{\mathrm{ind}}$ (middle), $E^{75\%}_{\mathrm{ind}}$ (bottom). Only data from central pads of OROC3 are shown. The error bars correspond to the RMS of the entries in each timebin. The slow component of the ion-tail (right) is almost linear for all induction field settings, with its contribution decreasing with decreasing $E_{\mathrm{ind}}$.

Figure 14 :
Figure 14: Dependence of the (average) ion-tail fraction on the induction field value. The data points are fit with the function defined in Eq. 4.3.

Figure 15 :
Figure 15: (Color online) Ion-tail fraction (left) and ion-tail slope (right) as a function of $Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$. In the top panels, the color indicates the pad position in the chamber, with blue representing the bulk and red the regions influenced by the presence of the spacer cross or edges. In the bottom panels, the bulk region for pads with |dCOG| ≤ 0.8 cm is shown, where the color scale indicates the stack number.

Figure 16 :
Figure 16: (Color online) Dependence of the (average) ion-tail fraction on the distance of the pad to the COG of the cluster for the bulk pads. Only pads with |dCOG| ≤ 0.8 cm are shown. The different quantiles are shown with different colors.

Figure 16 :
Figure 16: Parameter distributions used as input for the toy MC. Only IROC pads are included. Top left: Distribution of $Q^{\mathrm{norm}}_{\mathrm{pulser,pad}}$ obtained from the independent pulser calibration run. Top right: noise distribution obtained from the pedestal/noise runs. Bottom left and bottom right: Ion-tail fraction and slope as obtained from the laser data. For simplicity, no correlations between the parameters were assumed.

Figure 17 :
Figure 17: (Color Online) Left: Distribution of $\sqrt{Q_{\max}}$ as obtained from the LHC15o data (black). The two peaks at $\sqrt{Q_{\max}} \approx 23$ and $\sqrt{Q_{\max}} \approx 31$ are due to the saturated signals. The distribution was fitted with an exponential function (red) in the range (10, 24). The input distribution for the MC (blue) is constructed using the data distribution for $\sqrt{Q_{\max}} < 20$, and the fitted function for $\sqrt{Q_{\max}} > 20$. Right: $Q_{\max}$ distribution, input for the toy MC.

Figure 17 :
Figure 17: Parameter distributions used as input for the toy MC simulation. Only IROC pads are included. Top left: distribution of normalized pulser charge from a pulser calibration run. Top right: noise distribution from a pedestal/noise run. Bottom left (right): ion-tail fraction (slope) obtained from the laser data. For simplicity, no correlations between parameters were assumed.

Figure 19 :
Figure 19: Top: pad signal for an event with about 30% occupancy, without noise. Bottom: pad signal zoomed in on the y-axis. The red dashed line shows the ZS threshold.

Figure 20 :
Figure 20: (Color online) Average baseline shift (left) and its RMS (right) as a function of occupancy. The simulation and correction of the "common-mode only" ("ion tail only") scenario is shown in the top (bottom) panel. The various marker colors correspond to the different correction methods.

Figure 24 :
Figure 24: (Color online) Fraction of pads used for the common-mode baseline estimation as a function of occupancy, for the mean (blue) and the mean 2nd iteration (red) methods. Only the common-mode effect is simulated.

Figure 25 :
Figure 25: (Color online) Demonstration of the ion-tail filter applied to laser data averaged over many signals on a given pad. The laser data are shown in black, the fit function in red, and the corrected data in blue, green, and magenta, for different values of the parameter k0. The convolution of a Gaussian and an exponential function was used to fit the data. The ion-tail parameters obtained from the fit were used for the correction.

Figure 26 :
Figure 26: (Color online) Top: normalized ion tail before (red) and after (blue) correction for randomly selected pads. For each pad, the fitted tail parameters were used for the correction with k0=100%. Bottom: averaged data with the corresponding corrections of 80%, 90% and 100%. The error bars correspond to the RMS of entries for each time bin.

Figure 27 :
Figure 27: (Color online) Illustration of ion tail and common-mode correction of a simulated signal. The input signal is shown in black, the simulated effect (top: ion tail, middle: common-mode, bottom: ion tail and common-mode) in blue, and the corrected signal in green. Noise is not included in the pad signal for better visibility.

Table 1 :
Variable importance for $k_{\mathrm{CF,pad}}$, reflecting how many times the decision tree was divided because of that specific variable.

Table 2 :
Position of the maximum of the CF distributions in Fig. 7 for each stack type.