Optimising the Active Muon Shield for the SHiP Experiment at CERN

The SHiP experiment is designed to search for very weakly interacting particles beyond the Standard Model which are produced in a 400 GeV/c proton beam dump at the CERN SPS. The critical challenge for this experiment is to keep the Standard Model background level negligible. In the beam dump, around 1011 muons will be produced per second. The muon rate in the spectrometer has to be reduced by at least four orders of magnitude to avoid muoninduced backgrounds. It is demonstrated that new improved active muon shield may be used to magnetically deflect the muons out of the acceptance of the spectrometer.


Introduction
The Standard Model of elementary particles despite its enormous success, has several places for improvement [1]. In particular, the baryon asymmetry of the Universe [2] and a Dark Matter existence are yet to be explained. The discovery of heavy neutral leptons (HNLs), right handed partners of the Standard Model neutrinos, would solve both above mentioned riddles of Nature. The existence of such particles is strongly motivated by theory and were searched extensively previously. Cosmological constraints on the properties of HNLs now indicate that the majority of the interesting parameter space for such particles was beyond the reach of the previous searches at the PS191 [5], BEBC [7], CHARM [4], CCFR and NuTeV [6] experiments. For this reasons, a new fixed-target experiment at the CERN SPS accelerator was proposed [8]. This experiment will use a 400 GeV proton beam on a fixed target to produce a large number of charm mesons. The HNLs from charm meson decays have a significant polar angle with respect to the beam direction, approximately 50 mrad on average. In order to maximise the geometric acceptance for a given transverse size of the detector, the detection volume must therefore be placed as close as possible to the target. The production of the charm mesons is accompanied by copious direct production of pions, kaons and short-lived light resonances. The subsequent decays of these particles would result in a large flux of muons and neutrinos. To minimise these decays, a combination of a target and a hadron absorber of a few metres length, both made of as dense a material as possible, is required. To reduce the detector occupancy and backgrounds induced by the residual muon flux, a muon shield is required downstream of the hadron absorber. The experimental set-up must therefore balance the opposing requirements of locating the detector as close as possible to the target and of accommodating a sufficiently long muon shield upstream of the fiducial volume of the detector to reduce muon-induced backgrounds.

Bayesian Optimisation
The main goal of our research was to find a light and efficient magnetic shield. In order to achieve this, we applied a Bayesian optimisation algorithm. In this section we give an introduction to this method and motivation behind this approach. Let f (x) be a black-box function on X ⊂ R d , which does not have an analytic expression and can be evaluated at a specific point only by conducting computationally intensive simulations, which in turn can yield noisy observations of the form y i = f (x i ) + i , where i is the random variable with zero mean and constant variance. Our ultimate goal is to find a point x * where f (x * ) is a true global optimum (henceforth without loss of generality we will consider a maximisation problem). In this problem setup, the general framework for optimum searching iteratively constructs the sample , each iteration consists of three steps: Convergence to the exact solution of the problem may require infinite amount of iterations, however, under some theoretical assumptions any degree of its approximation can be obtained within the corresponding finite horizon.

Theorem 1
If f (x) is Lipschitz-continuous with a Lipschitz continuity constant C, X is a d-dimensional unit hypercube, then in the noise-free case for any ε > 0 some point x + with near-optimal true value Strong convergence guarantee is provided against the worst-case scenario, however the solution in this case will typically require an impractically large number of simulations for any reasonable accuracy of approximation. Since such scenarios are not plausible in practice and the number of simulations is very limited, one can come to the idea to relax guarantees in order to meet practical limitations, yet insure against average-cases. Bayesian approach to optimisation allows implementing this idea by providing specific choosing strategies (step i) in the framework for optimum search. In this approach, the objective function is approximated with a surrogate model, that is, a parametric function for which prior knowledge of parameters combined with likelihood of the observed sample defines a posterior distribution over the parametric family. Choice of new points to simulate is guided by the principle of minimum expected risk (its formalisation under the Bayesian framework has been provided in [9]) resulting in a secondary optimisation problem, which in turn depends not on the original function, but on its computationally cheaper surrogate-model. The most popular choice for building surrogate models in the field of engineering design is based on taking Gaussian processes as a parametric family of priors along with assuming there is white noise in the observations i = N (0, σ 2 ). By definition [11,22] Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Consequently it is completely specified by its mean function m(x) = E[f (x)] and the covariance function for Bayesian optimisation is that posterior takes also a form of Gaussian process, moreover its mean and covariance functions have analytic expressions: Several conditions on the family of priors that ensure convergence of the Bayesian optimisation to the optimum are specified in [12], they are satisfied by a Gaussian process with a constant mean function. Therefore, the convergence is guaranteed when the principle of minimum expected risk is used, as for instance an expected deviation from the global optimum: Expression 2.1 is computationally expensive since it requires estimation of f (x * ), thus in present work we replace this criterion with its common approximation called Expected Improvement [13]: . Expected Improvement has also been shown to converge under additional mild assumptions [14], but its main advantage is a closed-form expression, that doesn't require numerical integration: , φ and Φ are PDF and CDF of the standard normal distribution respectively.

SHiP shield optimisation
The muon shield is a critical component of the SHiP experiment, which deflects the high flux of muons produced in the target, that would represent a very serious background for the particle searches, away from the detector. The shield consists of eight magnets and each magnet is parameterised by seven values: length, width, etc... Because the cost of the muon shield is significant we include the weight of the muon shield as a proxy for the cost. Therefore, our aim is to find the most efficient solution at a lowest cost possible. We apply Bayesian optimisation techniques to solve this problem. The optimisation is performed for already chosen material, thus the material balance is discussed elsewhere [15]. For the evaluation of the shield performance we use 18 million simulated events passed through the muon shield and measured at a scoring plane 64m downstream of the muon shield at the z of the first tracking station of the spectrometer. All other subsystems downstream of the muon shield are removed. Proton fixed target collisions and inelastic muon interactions are simulated using PYTHIA8 [16] and PYTHIA6 [18] respectively. Heavy flavour cascade production is also taken into account [19]. The SHiP detector response is simulated in the GEANT4 [20] framework. The simulation is done within the FAIRROOT framework [21]. The transverse (x, y) position of muons is obtained at the first tracking station. For muons with |y| < 5m, their x-position is converted into "importance weight" σ µ ± = (260 ∓ x µ /cm)/560 for −300 < x µ < 260 and −260 < x µ < 300cm for positive and negative muons respectively, else σ µ + = 0 [15]. Now we can introduce the complete loss function, that depends on the physical performance of the shield, its weight, and the length implicitly via the weight.
where Σ = µ σ µ , W is a weight of configuration and W 0 is a weight of a baseline configuration [15] shown in Fig.1. The weight of the baseline is about 1900 tons and Σ for the baseline is approximately equal to 32. As can be seen, weight is penalised exponentially because we are not interested in the heavy regions, as we will not be able to construct such configurations due to budget and engineering constraints. To ensure a more even coverage of the muon phase space, the number of muons is capped in bins of momentum and transverse momentum, and augmented through resampling available muons in bins with low occupancy. This also reduces the overall number of muons to be simulated to about 500 thousand, and helps us to decrease the time of computations by factor of 8 times in average. We also introduce noise as a prior knowledge into Gaussian Process: in this setup GP tries to estimate the noise of the computation and incorporate it into final variance. The points are computed in batches of 100 points to optimise the usage of the available computing resources. The process of optimisation is illustrated in Fig. 2.   Distributions of rejection performance scores for resampled and full muon samples for the best found shield configuration, comparing with the baseline solution. The optimisation procedure is stopped after 5000 iterations. The obtained configuration is found to be lighter by 25 % than the baseline while having about the same rejection capability as illustrated in Fig. 3. Fig. 4 illustrates the new found configuration.

Conclusion
Bayesian optimisation is a powerful tool which can be applied for the optimisation of non differentiable functions. Even though Gaussian processes can guarantee global optionality under ideal conditions, these are not satisfied under resouce restrictions constraints. However we have demonstrated that this method can be successfully applied even to complicated optimisation problems in physics.