Hadronic Energy Resolution of a Combined High Granularity Scintillator Calorimeter System

This paper presents results obtained with the combined CALICE Scintillator Electromagnetic Calorimeter, Analogue Hadronic Calorimeter and Tail Catcher&Muon Tracker, three high granularity scintillator-SiPM calorimeter prototypes. The response of the system to pions with momenta between 4 GeV/c and 32 GeV/c is analysed, including the energy response, resolution, and longitudinal shower profiles. The results of a software compensation technique based on weighting according to hit energy are compared to those of a standard linear energy reconstruction. The results are compared to predictions of the GEANT4 physics lists QGSP_BERT_HP and FTFP_BERT_HP.


Introduction
Experiments at future e + e − colliders require unprecedented jet energy resolutions of 3 % to 4 % across the full expected jet energy range up to hundreds of GeV [1,2]. One concept to achieve such resolutions are Particle Flow Algorithms (PFAs) which aim to combine reconstructed tracks from the tracking system with calorimeter deposits, requiring exceptionally granular calorimeters [3][4][5]. The CALICE collaboration develops, builds and tests different calorimeters that aim to fulfil the requirements for the optimal application of PFAs. One concept consists of scintillator tiles or strips of a few cm 2 size, individually read out by a Silicon Photomultiplier (SiPM). Several such prototypes with different absorbers, granularities and sampling structures have been constructed.
In a common beam test at Fermi National Accelerator Laboratory (FNAL) in 2009 the Scintillator Electromagnetic Calorimeter (ScECAL) [6], Analogue Hadronic Calorimeter (AHCAL) [7] and Tail Catcher & Muon Tracker (TCMT) [8] comprised a combined scintillator-SiPM calorimeter system exposed to beams of muons, electrons and pions in the momentum range 1 GeV/c to 32 GeV/c. Studying the characteristics of single pion events is of special interest as significant fractions of the full particle energy are typically deposited in each calorimeter subsystem. The single pion energy resolution of a calorimeter system is expected to be a significant contribution to its performance in both PFA schemes and classic jet energy reconstructions. As well as PFA reconstruction, the high granularity of these calorimeters, on the scale of electromagnetic shower development (≈1 radiation length X 0 longitudinal sampling, ≈1 Molière radius r M transverse segmentation), should enable the statistical identification of electromagnetic subshowers within hadronic showers. This allows the application of software compensation (SC) techniques to improve the energy resolution and response linearity of the calorimeter system. This has been demonstrated in the CDHS calorimeter [9] and was successfully used in collider experiments such as H1 [10] and ATLAS [11]. The concept has already been applied to data taken with the CALICE AHCAL and was shown to significantly improve its single pion energy resolution [12]. This study extends the software compensation scheme to a combined calorimeter system.

Testbeam setup and simulation model
The datasets used in this note were acquired during a testbeam campaign in May 2009 at the Fermilab Testbeam Facility (FTBF). The MTest beamline delivers secondary particle beams with particle momenta in the range 1 GeV/c to 32 GeV/c. The beam line is equipped with a two channel differential gas Cherenkov counter as well as various scintillator triggers. The calorimeter prototypes are installed in the order ScECAL, AHCAL, TCMT downstream of the beam instrumentation. More detailed descriptions of the beamline setup are available in Refs. [6,13].
Each ScECAL layer consists of 3.5 mm tungsten-based absorber and 72 scintillator strips of 10 mm × 45 mm × 3 mm each for a total area of 18 cm × 18 cm. Thirty layers are stacked, giving a total depth of around 20 X 0 (≈0.9 nuclear interaction lengths λ n ) and 2160 readout channels. The SiPMs used in the ScECAL are Hamamatsu MPPC-11-025M models with 1600 pixels on an active area of 1 mm × 1 mm [6]. The AHCAL is a ≈1 m 3 hadron calorimeter prototype with steel absorber plates, instrumented with 7608 5 mm thick scintillator tiles with area between 3 cm × 3 cm to 12 cm × 12 cm in 38 layers, giving a full depth of 5.3 λ n [7]. The TCMT covers 1 m × 1 m with steel absorber plates and a total of 320 scintillator strips of 5 cm × 100 cm each in 16 layers, instrumenting a depth of 5.2 λ n [8]. The AHCAL and TCMT use the same type of SiPMs produced by MEPhI/PULSAR with 1156 pixels in an active area of 1.1 mm × 1.1 mm.
Details of the simulation models of the ScECAL, AHCAL and TCMT are discussed in Refs. [6,14,15]. The right-handed coordinate system is laid out such that the Z-axis is pointing in the beam direction and the Y-axis is pointing upwards. The beam instrumentation present in the MTest beamline is not included in the simulation model apart from the main trigger scintillators. Extra material upstream of the calorimeters associated with about 0.1 X 0 from beam instrumentation is simulated by adding an Al plate in front of the ScECAL [13,14]. The beam momentum spread is set to 2.7 % for samples with beam momentum ≤4 GeV/c and 2.3 % for samples with higher beam momenta [6,16]. The beam profile is extracted from the calorimeter data and transferred into the simulation individually for each sample, as described in Ref. [13].
The sensor digitisation implemented into the simulation accurately models the statistical effects of the scintillator-SiPM readout, using a computationally efficient procedure. Where available, individual sensor parameters extracted from data or separate lab measurements are used. To include the effects of sensor noise into the simulation, noise samples extracted from random trigger events taken during each data run are overlaid onto their respective simulated samples [13]. After applying the digitisation procedure to the simulated events, the identical software chain is used for the reconstruction, selection and analysis for real and simulated data.
While sensor effects only have a minimal impact on the calorimetric energy resolution, they significantly widen the response to single MIP-like particles traversing individual cells, which is used to calibrate the whole detector to the MIP scale. The shape of single MIP spectra from an individual ScECAL cell shown in Figure 1 (left) is well reproduced by the simulation, indicating that sensor effects are properly modelled by the digitisation procedures. The most probable value (MPV) of each MIP spectrum is extracted for each individual detector cell from fitting these MIP spectra with the convolution of a Landau function and a Gaussian. The distribution of ScECAL MIP MPVs from data and simulated samples are shown in Figure 1 (right). The data sample shows a mean close to unity, demonstrating a MIP calibration accuracy at the few percent regime. The data distribution is only slightly wider than the simulated sample, indicating that the calibration quality is close to the systematic limits.  Further comparisons to data show a typically very good description of electromagnetic shower variables in the ScECAL simulation, with only a slight underestimation of the effective shower radius in simulation, and a resulting slight overestimation of electromagnetic hit energy densities [6,13]. The AHCAL and TCMT simulation models have been well validated in previous studies [12]. G 4 10.1p2 was used to simulate all samples used in this analysis, using the QGSP_BERT_HP and FTFP_BERT_HP physics list, two well validated and actively maintained models also currently used by the LHC experiments [17,18]. In order to estimate selection efficiencies and biases, samples of 100 000 electron, pion, proton and muon events were simulated for each data run used in this analysis.

Run and event selection
During the FNAL testbeam period in 2009, π − -runs were taken with beam momenta ranging from 2 GeV/c to 32 GeV/c. The 2 GeV/c momentum point is omitted in this analysis due to a large admixture of electrons and multi-particle events, a very wide beam profile as well as inefficient and imprecise determination of the layer of first hadronic interaction, leading to an inefficient and impure selection of single pion events for further analysis [13]. This analysis uses one data run per available beam momentum of 4, 12, 15, 20 and 32 GeV/c. Each run contains between 180 000 to 250 000 recorded events. All used data runs were recorded within the same beam period without major changes to the system setup between runs.
As the MTest beamline at FTBF does not offer a direct selection of the particle type apart from its charge, the delivered particle beam is a mixture of mostly electrons, pions and muons in varying fractions depending on the beam momentum. Especially at lower beam momenta there is also a large fraction of events with multiple particles hitting the calorimeters in the same event. The goal of the selection described in the following is to efficiently retain events containing a single contained pion shower in the detector system without biasing the composition of the selected pion showers (e.g. the fraction of the shower energy deposited in electromagnetic subshowers). All applied event selection criteria as well as the reconstruction algorithms and their efficiencies are discussed in detail in Ref. [13].
Raw A preselection requires signals in the primary beam triggers in order to exclude pedestal and gain calibration events recorded during beam runs from the analysis. Likewise, not all simulated initial particles pass through the beam trigger as the beam profiles for the lower beam momentum samples are slightly wider than the used scintillator counter.

Event quality
The first selection step evaluates readouts from the external beam instrumentation, differential Cerenkov counters, the multi-particle counter and other trigger scintillators [6,13]. Events that do not show summed energy deposits of at least 3.5 MIP in the first five ScECAL layers are excluded.

Pion selection
The pion selection is supposed to suppress electron and muon events in the sample. Single muons and punch-through pions are rejected based on the number of hits and their centre of gravity along the beam axis. Electrons are suppressed by reconstructing the layer of first hadronic interaction (FHI layer). The FHI layer is the layer with the first large deposit of energy in the detector, which is taken as an indication of a hadronic interaction of the pion with the detector. The FHI layer reconstruction in the combined system is based on the AHCAL Primary Track Finder algorithm [19], using optimised thresholds to seamlessly extend into the ScECAL [13]. The distribution of reconstructed FHI layers in data and simulation is discussed further in Section 6.1. The reconstructed FHI layer is required to be no earlier than the fifth layer of the ScECAL, which also removes events that have started showering upstream of the calorimeters.
Multi particle suppression Due to the high granularity of the calorimeter system, it is possible to reconstruct the primary MIP-like track a pion leaves in the detector before its first hadronic interaction. Events with multiple beam particles in the detector are suppressed by requiring exactly one such isolated primary track in the event. To suppress events with additional muons entering the AHCAL outside of the coverage of the ScECAL, all events with tracks longer than five layers parallel to the beam axis in the outer parts of the AHCAL, reconstructed with the track segment finder algorithm described in Ref. [20], are also rejected.
Shower start Finally, showers are selected for their reconstructed shower start position. Events must have the primary pion track within the central 10 cm × 10 cm of the ScECAL and the FHI layer must be reconstructed no later than the fifth AHCAL layer.
Examples of the full step-by-step event selection in data and simulation are shown for two beam momenta in Figure 2. The event selection suppresses the clearly visible multi-particle peaks in the data distributions without visibly biasing the spectrum of the simulated samples. Especially for the higher beam momentum samples, some entries clearly above the main reconstructed energy peak remain. The efficiencies and biases of the event selection as well as the potential influence of the remaining data sample impurities are discussed in the following sections.

Event selection efficiencies
Selection efficiencies and biases are studied using simulated event samples for different physics lists and particle types. Table 1 shows that 45 % to 50 % of simulated single pion events pass the dull pion event selection in both examined physics lists. Most of the excluded pion events are rejected by the FHI layer requirements. Applying the pion event selection on simulated samples of electrons shows that the combination of FHI layer and isolated primary track cuts provides an excellent electron suppression of 99.9% with a slightly worse efficiency for 4 GeV/c beams.

Remaining sample impurities
To estimate the remaining fraction of electrons in data events passing the pion selection, the distribution of the energy fraction reconstructed in the ScECAL f = E ScECAL rec E rec , using the standard energy reconstruction as discussed in Section 4, is fitted with a linear combination of templates extracted from simulated electron and pion events. Electron showers are typically fully contained in the ScECAL and thus result in values of f close to unity. Pion showers can deposit variable fractions of energy in all parts of the detector systems and can thus generate values of f in the full range of 0 to 1. Figure 3 shows the result of this template fit applied to the data sample taken at 4 GeV/c beam momentum. The fit result varies between 1.70 % and 1.83 % depending on the used fit range. An electron contamination of 1.8 % is thus assumed for the 4 GeV/c data. The electron contamination of the other samples used in this analysis is estimated to be negligible with 0.2 % at 12 GeV/c and <0.1 % for beam momenta ≥15 GeV/c. All these electron contaminations are to be interpreted as upper limits on the single electron contamination fraction, as other possible beam contaminations are not accounted for in the template fit.
The proton contamination in the data samples used for this analysis is expected to be minimal, as exclusively negative charge pion runs were recorded. Thus, only anti-protons could possibly contaminate the beam, which are produced very rarely. The anti-proton fraction of the used data samples was estimated from comparing the fitted slopes of the reconstructed FHI layer spectra (see Section 6.1) from data to simulated proton and pion events. The reconstructed FHI layer spectrum is sensitive to the proton beam content since the hadronic interaction length of protons and pions in steel differs by about 20 %. The fit results support no significant anti-proton contamination, as expected [13].
A general estimate of the remaining fraction of contaminated data events after the pion selection is obtained from the distribution of reconstructed energies in data as shown for the 32 GeV/c sample in Figure 2. Even after applying the full event selection, a number of entries clearly above the main reconstructed energy peak remain visible in the distribution. Assuming a simple toy model as shown in Figure 3, consisting of two Gaussian curves modelling the main signal and contamination contributions respectively, the fraction of contaminated events can be conservatively estimated to be ≤2 % to ≤3 % depending on the assumed mean contamination energy for all data samples with beam momenta ≥12 GeV/c. As the reconstructed energy spectrum of the 4 GeV sample is asymmetric with a tail to higher energies (see Figure 2), the fraction of contamination events is well hidden below the signal and cannot be estimated in this way at this beam energy.
It is known that the MTest beamline delivers a significant fraction of events with more than one beam particle, which are not perfectly suppressed in this analysis. The remaining contaminations are unlikely to be events containing two pions or one pion and one electron which both carry the full beam momentum each, as such events would form a peak around twice the reconstructed energy, which is not seen. The distribution of reconstructed energies rather shows a continuous contamination tail towards higher reconstructed energies. This could be caused by additional contaminating particles with energy lower than that of the beam, e.g. particles which punch through the final beam collimator. The observed behaviour cannot be explained by additional muons in the beam halo, as such muons would deposit far less energy than required to explain the observed contamination. The visible tail towards lower reconstructed energies is part of the signal, caused by pion showers that are not fully contained in the calorimeter system.

Pion energy reconstruction
The first step of the energy reconstruction consists of converting the raw measured ADC inputs of each channel in a given event to a common energy scale per subdetector, calibrated with respect to the most probable deposited energy of a MIP-like particle crossing a single detector scintillator cell. This calibration includes the correction of saturation effects in the sensors, temperature corrections, correction of electronics effects (etc.) with calibration data obtained in lab tests, dedicated calibration runs and in situ methods with down to channel-wise granularity [6][7][8].
The incident particle energy can be reconstructed from these calibrated hit amplitudes by summing all energy deposits D weighted by any number of measured event quantities and external parameters X. In the simplest case, they are directly weighted by their inverse sampling fraction. For any chosen energy reconstruction algorithm, external parameters can be optimised by minimising the sum of quadratic distances of the reconstructed event energy to the known beam energy, resembling a χ 2 function: In this formalism, parameters can be estimated for multiple samples at multiple beam momenta in a single optimisation. In the energy range considered here, the variance of the reconstructed energy spectrum is expected to scale with the beam energy according to a 1 / √ E stochastic term in the calorimetric energy resolution dependence. To normalise contributions to χ 2 between different beam momenta, each event is thus weighted with the inverse of its known beam energy. The constant factor of 55 % in the denominator is approximating the expected stochastic term of the calorimeter to enable the correct parameter uncertainty estimation by the optimisation algorithm, but does not influence the estimated parameter values. In order not to bias the parameter estimation towards a specific beam energy, the same number of events are added to the χ 2 for each beam momentum. The first 40,000 selected events of each data run (20,000 in simulated samples) are used in the parameter optimisation. No significant difference in reconstructed energy resolution is observed whether the events used for parameter optimisation are excluded in the reconstruction or not for both energy reconstruction schemes discussed here.

Standard weighting
In an idealised sampling calorimeter the reconstructed energy for an incoming particle is directly proportional to the measured energy deposits. For a calorimeter system combining different sampling ratios the reconstructed energy is the sum of all hit energies weighted by a constant factor for each calorimeter. As the sampling fraction within the ScECAL is constant and the AHCAL and first eight layers of the TCMT have identical sampling fractions, the standard energy reconstruction only has two parameters w ECAL and w HCAL : The χ 2 -optimisation described above inherently assumes Gaussian distributions. The reconstructed energy spectra obtained from the used calorimeter setup can exhibit non-Gaussian features from fluctuations in the electromagnetic fraction of pion showers, leakage effects and sample impurities remaining in data. The optimisation of weights is thus performed iteratively on the central 90% of reconstructed energies. Figure 4 lists the weights optimised for each beam momentum individually as well as all momenta at once. Variations of the weight with the beam momentum are rather small, with only the lowest beam momentum point at 4 GeV/c preferring slightly different weights. The two used physics lists produce very similar weights. The weights obtained from data for beam momenta ≥12 GeV/c have slightly (≈5 %) higher values than those obtained from simulations, hinting at a general overestimation of energy deposits in simulations, as is also observed in the longitudinal shower profile as shown in Section 6.1.
The reconstructed resolution and linearity only depend on the ratio of weights, which is very similar between data and simulations. Using the standard reconstruction weights obtained from simulation to reconstruct data events (or vice versa) would thus not notably influence the energy resolution, but only result in a shifted energy scale.

Software compensation
The measured response of an undercompensating calorimeter to a hadron shower is typically smaller than the response to an electromagnetic shower of the same initial particle energy. Within hadronic showers, energy can be lost to invisible processes such as recoil, excitation and fragmentation of absorber nuclei and, depending on the active material, neutron emission, leading to a lower response compared to purely electromagnetic showers. Additionally, inelastic interactions in hadron showers can develop purely electromagnetic subshowers from π 0 /η production and their subsequent decay to two photons, yielding the full electromagnetic response for the two photons. [21] Beam Momentum [GeV/c]  Both the loss of measurable energy to invisible mechanisms and the creation of π 0 /η are stochastic processes and thus subject to statistical fluctuations from event to event, leading to large fluctuations in measured energy deposits and degraded energy resolution. Furthermore, the number of generated π 0 /η depends on the number of inelastic interactions within a hadron shower which scales with initial particle energy, degrading the linearity of the calorimeter response. If it were possible to identify electromagnetic subshower contributions within hadron showers, reweighting them to the purely hadronic scale should lead to an improvement in both the resolution and linearity of the energy measurement.
With the materials used in this setup, the length scales of hadronic and electromagnetic showers are notably different ( X 0/layer λ π/layer in both ScECAL and AHCAL). In combination with the high transverse granularity similar to the Molière radius of electromagnetic showers, this enables the discrimination of electromagnetic and hadronic shower components by means of their deposited energy density. Each measured cell deposit is thus weighted as a function w(ρ, E est ) of the local deposition density ρ and an estimate of the full shower energy E est .
Instead of a continuous two-dimensional parametrisation of w(ρ, E est ), the full range of ρ is divided into fixed bins, while the dependence on E est is parametrised over the used energy range for each such bin. This scheme differs from the local software compensation implementation in [12], in which w is iteratively parametrised as a continuous function of both ρ and E est . The scheme presented here leads to more free parameters and thus more degrees of freedom but maintains a stable convergence of the optimisation.
This analysis uses eight bins in deposition density, individually chosen to be logarithmically distributed in the range of occuring depositions for both the ScECAL and the AHCAL, respectively. The obtained results do not critically depend on the number of bins or exact bin boundaries. For the two lowest deposition density bins, instead of summing up hit energies, only the number of hits falling into these bins are counted to suppress Landau fluctuations from low particle multiplicity hits (similar to the reconstruction scheme applied in the CALICE SDHCAL [22]), slightly improving the resolution of the algorithm. Instead of using the deposition density as the hit amplitude divided by the cell size, the hit energy, the energy deposition of each individual scintillator cell measured in photo-electrons and then converted to the MIP scale, is used directly. This is disregarding the differently sized tiles in the AHCAL, which slightly improves the performance of the full algorithm. All TCMT energy deposits are treated as falling into the same hit energy bin, effectively parametrising the relative TCMT weight as a function of only the first energy estimate.
The lowest hit energy bin has significant contributions from the primary pion track before the first hadronic interaction, which show nearly no dependence on beam momentum. To avoid biasing of the parameter optimisation towards weighting up the primary track hits to the full beam energy, hits on the primary track are excluded from the software compensation weighting. All hits on the axis of the reconstructed isolated primary track (as described in Section 3) from the first ScECAL layer up to two layers before the reconstructed FHI layer are included into the energy reconstruction without hit energy or shower energy dependent weighting. To exclude Landau fluctuations, only the number of such hits is used and multiplied by the mean energy deposit of a MIP-like particle in a single cell of the given calorimeter section.
An example of the distribution of hits into hit energy bins is given in Figure 5. In the lowest hit energy bin in the ScECAL, around one quarter of all contributions would originate from the primary track if not identified and excluded. The contribution of primary track hits to the AHCAL hit energy spectrum is small, as around 70% of selected events start showering in the ScECAL. In this analysis the weights, α i , β i , for the ith hit energy bin as well as the TCMT weight γ are parametrised as second order Chebyshev polynomials of the estimated particle energy E est . The full formula to reconstruct the energy in the combined system, with the sampling weights w used from the standard energy reconstruction, the software compensation weights α i , β i , γ, the sum (or count) of energy deposits in the ith hit energy bin E i and the energy deposits on the primary track E track , is: The software compensation reconstruction is defined by a total of 51 parameters (8 bins in the ScECAL × 3 parameters per bin, 8 × 3 parameters for the AHCAL and 3 parameters for the TCMT).
The parameter values are obtained by minimising the χ 2 function described in Equation 4.1, using all available beam momentum samples in one common optimisation procedure. During the parameter optimisation, the known beam energy is used for E est , while during reconstruction the standard reconstruction result is used as an estimate. Figure 6 shows the polynomial functions obtained for the energy dependence of the bin weights for ScECAL and AHCAL resulting from the parameter optimisation. The slopes in the first two bins of ScECAL and AHCAL in Figure 6 (bottom) correspond to a 1 /E dependence and thus a constant contribution to the reconstructed energy of each hit in these bins, regardless of the hit energy. Assuming a shower of E est = 4 GeV, a hit in the AHCAL with a measured hit energy E hit = 1 MIP would be weighted with a factor of around 1.5 (as given by the yellow lines in Figure 6, bottom right) for a contribution to the reconstructed shower energy of 1.5 × 1 MIP = 1.5 MIP. A hit of measured energy E hit = 0.5 MIP would be weighted with the doubled weight, due to the 1 /E dependence of the first two hit energy bins, for the identical contribution of 3 × 0.5 MIP = 1.5 MIP to the reconstructed shower energy. In hit energy bins ≥ 3, two hits of different hit energy within the same hit energy bin would contribute to the reconstructed shower energy proportionally to their hit energy.
Higher hit energy bins tend to be weighted below unity, indicating that a high energy hit is more likely to belong to an electromagnetic subshower. Especially in the ScECAL, bin weights do not monotonically decrease for increasing hit energies, as would be enforced in the local software compensation scheme used in [12]. However, the hit energy range which is assigned the lowest reconstruction weight increases with energy, indicating that the typical hit energy scale for electromagnetic subshowers increases with the incident pion energy.
Applying the weights shown in Figure 6 to the dataset yields an improved energy resolution as shown in Figure 7 (left) and Figure 17. Iterative applications of the software compensation reconstruction using the result of the previous iteration as E est do not further improve the energy resolution. The correlation between standard and software compensation reconstruction in Figure 7 (right) shows a clear non-linearity in the central part of the reconstructed energies, suggesting that events with a high hadronic fraction, and thus lower standard reconstructed energy, are shifted up in the software compensation reconstruction. Likewise, events with above average electromagnetic shower content, and thus too high standard reconstructed energy, are shifted down when reconstructed with the software compensation reconstruction.
The identical procedure of reconstructing energies and optimising the weight parameters is applied to simulated events. Individual bin weights as a function of estimated particle energy for data and simulation are shown in Figure 8 for selected hit energy bins. The AHCAL shows reasonable agreement between weights derived from data and simulations in all hit energy bins. In the ScECAL  discrepancies are seen especially in the two first hit energy bins and the highest hit energy bin, which also shows discrepancies between the used simulation physics lists. In most hit energy bins, both used simulation physics lists agree with each other within their parameter uncertainties. The TCMT weight also has a large discrepancy between data and simulations, although mostly for low beam momenta in which TCMT energy deposits are expected to be minimal. The averaged summed energy deposit per event for each bin is investigated in order to better understand the observed differences between weights derived from data and simulated events, as shown in Figure 9. The highest hit energy bin in the ScECAL has around twice the mean energy deposit in the simulation compared to data, with significant differences evident from the sixth bin. This misdescription of the hit energy spectra especially for very high energy hits has also been observed in electromagnetic showers [6]. For lower beam momenta, the ScECAL hit energy range around 2 MIP to 8 MIP shows some underestimations in the hit energy spectra. In the AHCAL all bins show reasonable agreement between data and simulation, with overestimations up to 50 % in the highest hit energy bin at the highest particle energy.
On their own, the observed differences in the mean hit energy per bin (and thus hit energy spectra) do not sufficiently explain the differences in the optimised software compensation weights. Although the highest ScECAL hit energy bin shows discrepancies between data and simulated events for all beam momenta, its bin weight is only different in the central part of the beam momenta. Likewise, the mean energy sum in the first ScECAL hit energy bin is well described for all energies, but the corresponding bin weights are not.
Shower simulation modelling effects could affect the optimised weights even in hit energy ranges where the observed hit energy spectra match reasonably well between data and simulation, as bin weights are necessarily anti-correlated to conserve the mean reconstructed energy. We thus assume the possibility that the observed discrepancies in the optimised software compensation weights is due to the remaining imperfect shower simulations, especially with the high-Z absorbers of the ScECAL and on the granularity scale studied here. However, the impact of these discrepancies on the resulting energy reconstruction is fairly small, as discussed further in Section 6.4.

Systematic uncertainties
We investigate systematic uncertainties from the following sources: Energy calibration stability The calibration factors of individual detector cells depend on environmental conditions, which might change in between and also during individual runs. Imperfections in correcting for such factors typically shift the energy scale for the whole detector, as most cells will be influenced in the same way. Based on the results obtained from comparing muon calibration samples as shown in Figure 1, as well as reconstructed MIP-like track segments within the studied data sets, we assign an uncorrelated systematic uncertainty of 1 % to the energy scale for each separate beam momentum sample.

Sensor saturation correction
Varying the parameters used in the correction of the sensor saturation within their respective uncertainties does not significantly influence the optimised energy reconstruction weights and adds only negligible systematic uncertainties to the resulting energy resolution and response in both data and simulated events.

Event selection
Possible biases on the energy response and resolution introduced by the event selection as well as systematic uncertainties from varying the applied cuts are investigated by comparing a minimal selection and the full pion selection applied to simulated events. The minimal selection contains the lower and upper cuts on the FHI layer to roughly preserve sampling fractions and average containment between the selections. The bias on both the response and the energy resolution is 1% for all examined beam momenta and physics lists, as given in detail in Table 2. The resulting systematic uncertainties on the energy response are found to be negligible compared to the previously discussed energy scale uncertainties. The determined resolution bias is added to the corresponding systematic uncertainty of the simulated samples. In data, the arithmetic mean between the resolution biases extracted from both physics list is used as an additional systematic uncertainty.

Remaining data impurities
The biggest systematic uncertainties on the results obtained from data stem from the remaining sample impurities as discussed in Section 3.2. The resulting small biases are determined as described below, and corrected for. To account for the resulting additional uncertainty on the results, the magnitude of the correction is added to the respective data uncertainties in quadrature.
The influence of the remaining single electron event contamination is estimated from simulated single pion samples mixed with simulated single electron events according to the electron contamination fractions obtained from the template fits shown in Figure 3 and Section 3.2, and examining their influence on the fitted response and resolution. For 4 GeV/c pion events with a 1.8 % electron contamination the mean bias on the fitted response is found to be 0.96 %, while the mean bias on the fitted resolution is determined as 1.65 % (relative to the fitted resolution). The low single electron event contamination of higher beam momentum samples does not lead to any significant systematic uncertainty.
In order to account for the impact of events with additional energy deposits as shown in Section 3.2, we perform a toy-model study based on the simple signal and contamination model shown in Figure 3. As the true shape of the contamination contribution is not known, a range of contamination shapes is explored by scanning over a range of possible mean contamination energy values. For each fixed mean contamination energy, the combined toy model is fitted to the data points, with the width and relative fraction of the contamination contribution as free parameters. Reconstructed energy distributions are generated according to the combined model fit and their energy response and resolution extracted with the same fitting procedure as used in the full analysis. The biases introduced by the added contamination model are extracted by comparing the fit results to the true values of the assumed signal component. All biases obtained with different background shapes are averaged into a single number per beam momentum by weighting each iteration with the inverse reduced χ 2 of the combined toy model fit, favouring models which result in a good agreement between the toy model and the data.
The resulting biases on the mean reconstructed energy are almost negligible, varying from 0.20 % at 12 GeV/c to 0.16 % in 32 GeV/c samples. The mean bias on the extracted energy resolution varies between 0.5 % to 1.4 %. Since the method to determine the contamination fraction is not applicable to the 4 GeV/c data sample, the respective maximum values of all data samples are used as an estimate for the 4 GeV/c sample.
ScECAL absorber material Pion event samples have been simulated using both ends of the range of possible ScECAL absorber compositions corresponding to effective radiation lengths of 7.6 g/cm 2 and 7.9 g/cm 2 [6]. The difference is found to be negligible for all of the hadron shower observables discussed in this paper.

ScECAL hit energy spectrum
In order to assess the systematic uncertainties due to the misdescription in the tails of the hit energy spectra in simulated electron showers in the ScECAL [6], a full set of additional simulations with an altered effective Molière radius in the ScECAL1 is used [13]. All software compensation weights are reoptimised on this set of simulated samples, yielding weights with small but significant differences to the default simulation2. Comparing fits to the resulting reconstructed energy spectra of the modified simulation events to the default simulation yields no significant difference in response and a small but significant difference in resolution in the range of 2.0 % to 3.5 % for the standard energy reconstruction and 0.1 % to 2.0 % with the software compensation energy reconstruction. The observed differences in resolution are used as the systematic uncertainties on the simulation results. The influence on the longitudinal pion shower profiles is found to be negligible.
In summary, the most important sources of systematic uncertainties of the results presented here are the uncertainty in the run-to-run energy calibration, small biases due to the applied event selection, the data impurities of unclear origin remaining after the event selection, the remaining single electron contamination in the 4 GeV/c data sample as well as the general overestimation of the high energy tails in the ScECAL hit energy distributions in simulated events. A detailed breakdown of all numeric values is given in Table 3.
1The effective Molière radius is increased by artificially doubling the air gap between the ScECAL layers. 2The actual difference in response and resolution between applying both sets of weights is negligible.

Results
This section discusses longitudinal shower observables, energy reconstruction and linearity, and energy resolution of the full calorimeter system for standard reconstruction and software compensation reconstruction. The data are compared with simulations.

Profiles
An example of a distribution of the reconstructed layer of the first hadronic interaction in the ScECAL and AHCAL in pion selected data is shown in Figure 10 for data and both examined simulation models. No events with reconstructed FHI layer ≤ 4 are present in the sample due to the applied pion event selection. The slight suppression of events with reconstructed FHI layer around 5 to 8 is due to the clean primary track requirement in the event selection, the efficiency of which decreases for short primary tracks [13].
The data are well described by the simulated samples within their statistical uncertainties, including the efficiency effect of the primary track selection step and the transition region between the ScECAL and the AHCAL. This demonstrates a good material description of the calorimeter setup, a similar performance of the event selection in data and simulated events, and no significant remaining single electron contamination in the dataset. The mean longitudinal profile of all pion shower events passing the event selection for 32 GeV/c beam momentum is shown in Figure 11. The low mean energy deposit in the first five layers is due to the event selection, similar to the effect seen in Figure 10. The low mean energy deposit in the last AHCAL layers points to good average shower containment even without including the TCMT layers. The general shape of the profile is reasonably well described by all physics lists, including the dip in responses in the ScECAL around the transition region between the ScECAL and AHCAL. Both studied simulation models overestimate the mean energy deposits by around 5 % regardless of the beam momentum, as already noted in Section 4.1. This difference is larger than the systematic uncertainty on the MIP calibration or saturation effects.  Figure 12 shows longitudinal shower profiles as a function of the distance from the shower starting point, taken as the reconstructed FHI layer, separately for showers starting in the ScECAL and in the AHCAL. For events with a reconstructed shower start in the ScECAL, the shower profile predictions by the FTFP_BERT_HP and QGSP_BERT_HP simulations are quite similar for the lower studied beam momenta, but vary by up to 20 % at the higher beam momenta. Compared to data, energy deposits are under and overestimated in different layers for both tested physics lists. QGSP_BERT_HP simulations seem to generally produce more strongly peaked shower profiles, while FTFP_BERT_HP simulations generate slightly wider shower profiles than measured in data. Most of the data points lie between the predictions of these two physics lists used. The overestimation of energy deposits in the reconstructed shower start layer by the FTFP_BERT_HP simulations also explains the peaks in the energy deposits around layer 5, as seen for this physics list in Figure 11. For events in which the reconstructed shower start is located in the AHCAL the examined simulation models agree well with each other. Both simulation models show around 10 % underestimated depositions in the first 0.5 λ n , with an overestimation of 10 % to 20 % in the deeper layers. The agreement between data and simulation is on a similar level, with similar features in the MC/data ratio, as a comparable study performed on data taken with the AHCAL only [23].

Energy reconstruction and linearity
For each event the particle energy is reconstructed with the techniques described in Section 4, without use of the known beam momentum. The energy reconstruction of all data samples and simulations is performed using the weighting parameters optimised from their own datasets. The ratio of mean contributions to the reconstructed energy in the ScECAL and AHCAL r = E AHCAL rec /E ScECAL rec varies from around unity at 4 GeV/c to around three at 32 GeV/c.
The spectrum of reconstructed energies for each sample is fitted with a Novosibirsk function [24], of which the mean response and resolution are calculated using Monte-Carlo integration [13,25]. The systematic uncertainties discussed in the previous section are added to the generally small statistical uncertainties on the fitted parameters.
The mean standard reconstructed energies in data agree very well with the beam energy with all deviations, defined as E rec −E beam E beam , smaller than 3 % as shown in Section 6.2. The reconstructed energy observed in simulations shows a small non-linearity, especially towards the lower beam momenta, with no deviation from the beam energy exceeding 5 %, in qualitative agreement with the expectation of an undercompensating calorimeter. Even though the 4 GeV/c data point has the largest assigned uncertainties overall, it does differ significantly from the simulated samples.
When reconstructing the particle energy using the software compensation scheme described in Section 4.2, the linearity of both data and simulated events is better than 3 %, see Section 6.2. The 4 GeV/c data point is well aligned with the simulated results, due to the additional non-linear degrees of freedom in the software compensation reconstruction.  Figure 13. Residual of mean fitted reconstructed energy over beam momentum in data and different simulation physics lists using the standard reconstruction (left) and software compensation reconstruction (right). The given residual is defined as E rec −E beam E beam × 100 %.

Energy resolution
The energy resolution for each sample is calculated from the ratio of the width and mean of the Novosibirsk function fitted to the reconstructed energy spectra. For the standard energy reconstruction, the data energy resolutions are generally well described by the simulations as shown in Figure 14. QGSP_BERT_HP produces a slightly better agreement with data compared to the FTFP_BERT_HP samples, which show a small deviation in the highest energy point. The software compensation reconstruction leads to a relative improvement of the data resolutions between 10 % and 20 % (2 % and 3 % in terms of absolute resolution) depending on the beam momentum, see Figure 14. The software compensation resolution obtained from simulated samples agrees between the physics lists, but significantly overestimates the achievable resolution improvements at all energies by 5 % to 10 %. All extracted resolutions, including the final combined uncertainties on each point are listed in Table 4. Figure 15 compares the energy resolution obtained from data samples in this analysis to the energy resolutions obtained from a previous analysis in which only pion showers with a reconstructed shower start in the AHCAL are considered [12]. The resolutions obtained for both analyses are in reasonable agreement, indicating that the combined calorimeter system with varying absorber materials and sampling fractions maintains the good single pion energy resolution of the standalone AHCAL.

Application of software compensation weights from simulation to data
The influence of the deviations observed in the software compensation weights between data and simulations was estimated by applying the software compensation weights obtained from simulated samples to the reconstruction of data events. Figure 16 shows the energy resolution and linearity  Figure 15. Single pion energy resolutions with standard and software compensation reconstruction from the combined ScECAL+AHCAL+TCMT system compared to resolutions obtained from AHCAL+TCMT in [12]. of data samples reconstructed using weights optimised from both data and simulation. For this comparison the simulation weights optimised from QGSP_BERT_HP are used, as the difference between weights of different simulation physics lists is small. Furthermore only the software compensation specific weights α i , β i , γ are used as optimised from the simulation, while the standard reconstruction weights w ECAL , w HCAL are used from data to set the correct energy reconstruction scale (see Section 4.1).
Applying the software compensation weights obtained from simulations to data events actually improves the energy resolution slightly by a relative 1 % to 3 %. However the achieved linearity is deteriorated, with additional deviations of magnitudes similar as seen in the resolution of 1 % to 4 %. The 4 GeV/c point shows the biggest deviation when applying the simulation weights to data, in line with the previous observation that the simulated 4 GeV/c response profits most from the software compensation reconstruction.
Although there are significant differences between data and simulation in the first two and the last ScECAL hit energy bin weights, applying weights optimised from simulation onto data events data events yields similar performance to the use of weights optimised from data.

Summary
This paper presents results obtained with pion beams in the momentum range 4 GeV/c to 32 GeV/c at the FNAL testbeam facility and the combined CALICE scintillator-SiPM calorimeter system consisting of ScECAL, AHCAL and TCMT. The results are compared to data simulated with the physics lists QGSP_BERT_HP and FTFB_BERT_HP in Geant 4 version 10.1p2.
The longitudinal pion shower profiles show reasonable shape agreement between data and simulation. The longitudinal shower profile as a function of distance to the reconstructed shower start in the ScECAL shows up to 20 % differences between simulation models, especially for high beam momenta.
Two separate schemes are used to reconstruct the primary particle energy. The linear standard energy reconstruction uses one constant weight per subdetector. These weights are in good agreement for data and simulated events apart from a general constant overestimate of energy deposits in the simulations. The deviation from linearity of the standard energy reconstruction is <5 % in simulation and <3 % in data. The energy resolution of the standard reconstruction is well reproduced by both simulation models. The second scheme, based on local software compensation, parametrises individual energy reconstruction weights for each bin in hit energy as a function of the standard reconstruction particle energy estimate. In the AHCAL, these weights are in good agreement between data and simulations. Significant discrepancies between data and simulation are observed in some ScECAL bins. The discrepancies may be partially explained by the observed deviations in the ScECAL hit energy spectra. The linearity deviation using software compensation reconstruction improves to <3 % for data and simulation. In data, the energy resolution improves by 10 % to 20 % with the software compensation reconstruction. The improvements in energy resolution from the software compensation reconstruction are confirmed but overestimated in simulation by 5 % to 10 % for all beam energies and simulation models.
The energy resolutions from data samples measured with the standard reconstruction and software compensation reconstruction are in good agreement with resolutions obtained from a previous analysis in which only showers starting in the AHCAL are considered. The good standalone single pion energy resolution of the AHCAL is thus maintained by adding the ScECAL in front. Table 1. Stepwise selection efficiencies of the single pion selection for data and different simulation physics lists and particle types. The event quality efficiency evQ is normalised to the number of remaining events after the raw cut. Efficiencies marked with † are normalised to the number of remaining events after the event quality cut to enable comparison of data and simulation results and are given as cumulative fractions of events passing the selections up to the respective step. The raw preselection efficiency of the 4 GeV/c simulations is low, as the simulated beam profile is wider than the ScECAL geometry.  Table 3. Individual contributions to the relative systematic uncertainties on the energy resolutions obtained in this paper from the event selection (Evt. sel.), remaining electron contaminations in data after the selection (e − cont.) and the modeling uncertainty in the simulation (Simu.), remaining impurities in data after the event selection (Impur.), as well as the total systematic uncertainty calculated from their addition in quadrature.

Momentum
Values marked with † are uncertainties applied to data which are extracted from simulations. The value marked with × is estimated from the uncertainties of different data samples, as the method to obtain the uncertainty is not applicable to that data point. Empty fields indicate systematic uncertainties that are not applicable to the specific sample, which are thus not included in the uncertainty calculation.   Table 4. Energy resolutions extracted from pion samples in data and simulations, using the standard energy reconstruction (Std.) and the software compensation reconstruction (SC) as plotted in Figure 14. All errors include the full systematic uncertainties as discussed in Section 5 and given in Table 3.