Prediction of soft proton intensities in the near-Earth space using machine learning

The spatial distribution of energetic protons contributes towards the understanding of magnetospheric dynamics. Based upon 17 years of the Cluster/RAPID observations, we have derived machine learning-based models to predict the proton intensities at energies from 28 to 1,885 keV in the 3D terrestrial magnetosphere at radial distances between 6 and 22 RE. We used the satellite location and indices for solar, solar wind and geomagnetic activity as predictors. The results demonstrate that the neural network (multi-layer perceptron regressor) outperforms baseline models based on the k-Nearest Neighbors and historical binning on average by ~80% and ~33\%, respectively. The average correlation between the observed and predicted data is about 56%, which is reasonable in light of the complex dynamics of fast-moving energetic protons in the magnetosphere. In addition to a quantitative analysis of the prediction results, we also investigate parameter importance in our model. The most decisive parameters for predicting proton intensities are related to the location: ZGSE direction and the radial distance. Among the activity indices, the solar wind dynamic pressure is the most important. The results have a direct practical application, for instance, for assessing the contamination particle background in the X-Ray telescopes for X-ray astronomy orbiting above the radiation belts. To foster reproducible research and to enable the community to build upon our work we publish our complete code, the data, as well as weights of trained models. Further description can be found in the GitHub project at https://github.com/Tanveer81/deep_horizon.

Understanding the distribution and dynamics of the energetic protons in the near-Earth space is essential not only for magnetospheric physics. Energetic protons are also suspected to damage space-based instrumentation and to affect their scientific performance. For example, X-ray telescopes such as Chandra (Weisskopf et al. 2002) and X-ray Multi-Mirror Mission (XMM-Newton) (Jansen et al. 2001) are suffering from contamination by so-called soft protons (SP) (De Luca & Molendi 2004;Leccardi & Molendi 2008;Kuntz & Snowden 2008). These are protons at energies in the range of tens of keV up to a few MeV. The SPs that populate the solar wind (SW) and the Earth's magnetosphere can damage CCD detectors leading to a loss of available exposure time due to an increased background rate.
Consequently, the performance of future X-Ray missions orbiting in the magnetosphere and the SW depends on how well the instruments are protected from the SP. For example, the Advanced Telescope for High-ENergy Astrophysics (ATHENA) mission (Nandra et al. 2013) plans to deploy an array of magnets to deflect charged particles away from the instruments (Lotti et al. 2018;Fioretti et al. 2018). Moreover, the original orbit choice might be changed from L2 to L1 due to a better understanding of this region's energetic particles' dynamics. The Solar wind Magnetosphere Ionosphere Link Explorer (SMILE) (Raab et al. 2016) mission also concerns the energetic particle levels in the magnetosheath that it will experience during its polar orbit.
There are few studies related to the energetic proton population in near-Earth space. In contrast to this work, they focus on well-confined regions around the Earth. For example, Meng et al. (1981); Kronberg et al. (2012Kronberg et al. ( , 2015 have studied the dependence of the distribution of energetic protons on SW and geomagnetic activity parameters in the plasma sheet. The plasma sheet is one of the few regions where it is expected to experience an enhanced background, also for XMM, see, for example, Rosenqvist et al. (2002). In Kronberg et al. (2015) the distribution is projected to the equatorial plane. However, the missions mentioned above plan on taking observations at high latitudes.
Energetic protons can also be observed in other regions. The region upstream of the bow shock, for instance, is known to be populated with energetic protons due to the effective acceleration at the bow shock (e.g., Lee 1982;Kronberg et al. 2009). SW phenomena such as Coronal Mass Ejections and Corotational Interaction Regions are associated with energetic charged particles observed by spacecraft in the region upstream of the bow shock. The bow shock also serves to heat the SW plasma as it flows into the magnetosheath, the boundary region between the bow shock and the magnetopause, resulting in harder energy spectrum. This region is generally considered void of energetic particles (∼100 keV). However, occasionally phenomena associated with energetic particles such as hot flow anomalies, plasma jets (e.g., Facskó et al. 2010;Savin et al. 2014) occur. Some of these energetic particles escape the magnetosphere (e.g., Kronberg et al. 2011). The diamagnetic cavities at cusps are also effective particle accelerators (e.g., Nykyri et al. 2012).
Studying each region in isolation enhances our knowledge about the physics in these regions and simplifies modeling. However, the boundaries between the regions are not always well defined, for instance, due to the quasi-parallel bow shock formation or Kelvin-Helmholtz instability. Therefore, it makes sense for space weather applications to study the proton intensities in the near-Earth region holistically. Moreover, most previous studies only consider a few input solar or geomagnetic parameters instead of utilizing the full range of them. Kronberg et al. (2020) studied the dependence of SP contamination in the XMM-Newton telescope on location and various solar and geomagnetic parameters using a machine learning approach. The study revealed the strongest dependence of the contamination on the location and the SW velocity. Simultaneously, parameters such as the AE or SYM-H indices (which measure the geomagnetic activity, namely, the magnetic field disturbance in the auroral region of the northern hemisphere and the disturbance of the geomagnetic field at the equatorial regions, respectively) have shown significantly lower importance levels for the contamination, which was rather unexpected from common knowledge of magnetospheric dynamics.
This study we derive a predictive model for the energetic proton intensities using the Cluster mission observations in the near-Earth space environment. We exclude the region of the radiation belts since the proton intensity levels in this region are much higher than those in the outer magnetosphere. Our experience has shown that the model tends to predict only the intensities in the radiation belts if it is included. To enable modeling the complex non-linear multidimensional dependencies, we employ a machine learning model instead of simple linear models.
To summarize, our study aims are to (1) test the capability of machine learning algorithms to predict energetic particle populations in the near-Earth space; (2) reveal which parameters are the most important for the prediction of energetic protons at different energies and (3) help future missions planning to deal best with the effects of SP. In this section, we describe which data we use, how it was obtained and preprocessed.

Proton observations
The Research with Adaptive Particle Imaging Detectors (RAPID) instrument (Wilken et al. 2001) on four Cluster satellites (Escoubet et al. 2001) measures distributions of energetic electron and ion intensities from ∼30 keV to ∼4 MeV. Around 50 data products are produced from the raw data and delivered to the Cluster Science Archive (CSA) 1 by the RAPID team. We chose to work with proton observations from spacecraft (SC) 4 (Tango), which has continuous observations from 2001 through the present day. We can combine data from this spacecraft with proton observations by the Cluster Ion Spectrometry (CIS) instrument (Rème et al. 2001). The omnidirectional energetic proton intensities can be found at CSA under the product proton Dif flux C4 CP RAP HSPCT (Daly & Kronberg 2010). We took the first seven energy channels as the labels in our experiment, which represented the energy ranges p1=28-64 keV, p2=75-92 keV, p3=92-160 keV, p4=160-374 keV, p5=374-962 keV, p6=962-1885 keV, p7=1885-4007 keV, respectively. We exclude the region of the radiation belts (radial distance, rdist<6 R E ) from the dataset.
For our experiments, we preprocessed the data as follows. First, we eliminated outliers, namely values below 0 (NAN values) and above 10 8 . Then, we sampled the data from their original 4-second resolution down to 1 minute because we have observed many rapid fluctuations within each minute, and the predictors related to the solar and geomagnetic activity have the highest resolution of 1 minute. To be more precise, we calculated the mean proton intensities each minute for each energy channel and used the beginning of each minute as a timestamp.
Since the proton intensity values span multiple orders of magnitude, we use the common logarithms of the intensities as input to the model. However, since the proton intensity measurements contain zero values, we cannot directly apply the logarithm. We investigated two methods: replacing the zero values with very small values, i.e., one-tenth of the smallest non-zero value, or dropping all measurements where the intensity is zero.
We obtained better results when dropping the zero values in the sense that, in this case, the model was more focused on predicting values above zero. In considering zero values, the model was skewed to the prediction of zero values as they are many. Since high proton intensities are dangerous for the performance of the X-Ray telescopes, we have decided to base our model on the proton intensities above zero values. Developing of the model that focuses on the zero/or close to zero proton intensities requires a separate study. Since the number of zero values differs across channels, we performed these operations independently for each channel. The highest energy channels p6 and p7 were dropped from the dataset because they contain too many missing and zero values.
From 15:21:00 on 9 January 2001 to 09:57:00 on 19 February 2018 UT, 6,051,937 minutes of data in total matched these criteria. We list the predictors in Table 1. Their distributions, along with the distribution of the proton intensities, are shown in Figure 8 in the Appendix.

Predictors
In this subsection we introduce the predictors that we divide into groups: related to location in space and related to the solar, solar wind, and geomagnetic activity.  Counts Figure 2. Distribution of the number of proton intensity measurements for the energy channel p1 in the GSE coordinate system in the same format as Figure 1. Resolution (bin size) is 1 RE.

Location
Each 1 minute averaged value of proton intensity is associated with a location in the Geocentric Solar Ecliptic (GSE) coordinate system represented by parameters x, y and z. The position coordinates were taken from the sc r xyz gse C4 CP AUX POSGSE 1M auxiliary dataset for SC4. We use them to obtain the radial distance from the Earth: the parameter rdist. Throughout the paper, distances are given in R E units. The distribution of the proton intensities and the number of their samples in the GSE system is shown in Figures 1 and 2. Figure 1 (left and middle panels) shows high intensities around the equatorial plane, ZGSE=0, namely the plasma sheet at the night side and the region on closed magnetic field lines at the dayside in the XZ and YZ planes. There are also mid-altitude cusps with less intensive plasma around XGSE=0, in the same planes, above and below ∼3R E and ∼-3R E in ZGSE direction, respectively. Enhanced proton intensities at XGSE 3-6R E at higher latitudes than the plasma sheet (above and below ∼5R E and ∼-5R E in ZGSE direction, respectively) in the XZ plane are also visible. The intensities are higher in the northern hemisphere than in the southern hemisphere. The dusk-dawn distributions (in the YZ plane) show asymmetry with higher intensities at the northern hemisphere's dusk side. The same higher intensity at dusk is visible in the XY plane. This asymmetry agrees with observations of energetic protons >274 keV by Kronberg et al. (2015); Luo et al. (2017). The proton intensities are higher at the dayside in the region on closed magnetic field lines.
In Figure 3, we plot the mean proton intensities versus individual predictors. In the following, for brevity, we refer to the logarithm of proton intensities simply as proton intensities. In panel (a), we observe a strong, almost linear decrease of the proton intensities with the radial distance. The intensities almost linearly decrease with the ZGSE coordinate in the southern hemisphere (see panel (b)), while the dependence is more complicated in the northern hemisphere. Panel (c) shows that the maximum of the proton distribution is around ±5 R E . The proton intensity falls rapidly at ∼-15 R E at the dawn side and ∼13 R E at the dusk side. The dependence of proton intensities on  XGSE is rather complicated, see panel (d). The strong increase of the proton intensities at XGSE<-15 R E can be explained by reduced spacecraft sampling in the northern lobe at this distances, see XZGSE plane in Figure 2. It falls strongly at ∼10 R E as expected at the magnetopause boundary, as also seen in Figure 1. These spatial dependencies resemble those observed for the SP contamination at the XMM-Newton X-Ray telescope in Kronberg et al. (2020), see also Ghizzardi et al. (2017).

Solar, solar wind and geomagnetic activity
We combine the proton intensities with simultaneous observations of solar, SW, and geomagnetic parameters taken from the OMNI database 2 ; see also King & Papitashvili (2005). The SW observations are propagated to the Earth's bow shock. The SW is characterized by the proton density, NpSW in cm −3 ; components of the speed (VSW) in the GSE coordinates, VxSW GSE, VySW GSE and VzSW GSE in km s −1 ; the temperature, Temp, in K; the dynamic pressure, Pdyn in nPa, which is calculated as NpSW*VSW 2 × 1.67 · 10 6 , and components of the IMF in the GSE coordinates, BimfxGSE, BimfyGSE and BimfzGSE, in nT.
The SW velocity V y and V z components show that the deviation from the radial direction leads to an increase of the proton intensities, excluding cases with substantial deviation (>100 km/s) in positive Y-and ZGSE directions for which a decrease is observed, see Figure 3 (e). The proton intensities increase with the SW speed in the anti-sunward direction, V x , and the temperature, see panels (f) and (g). The dependencies of the proton intensities on SW density (panel (h)) and the SW dynamic pressure (panel (i)) have non-linear trends. The change of the IMF's direction towards stronger absolute values mainly leads to increased proton intensities (see panel (j).
The influence of solar irradiation is investigated using the F10.7 index, which measures the radio flux at 10.7 cm (2.8 GHz; Tapping (2013)). This parameter correlates well with the number of sunspots and other indicators of solar and ultraviolet solar irradiance. It can be measured reliably under any terrestrial weather conditions (unlike many other solar indices). It is denoted here by F107, and it is measured in solar flux units (sfu). The solar irradiance is non-linearly related to the proton intensities; see Figure 3 (k).
A parameter of geomagnetic activity such as the auroral electrojet (AE) index, denoted as AE index, in nT, characterizes the magnetic field disturbance in the auroral region of the northern hemisphere (Nose et al. 2017). The relation of the proton intensities with the AE index is also non-linear; see Figure 3 (l). However, a general trend of increase of the proton intensities with the AE index is visible, namely with the geomagnetic activity at high latitudes. Another parameter related to the geomagnetic activity is the SYM-H index, denoted as SYM-H and measured in nT (Nose et al. 2017). This parameter characterizes the disturbance of the geomagnetic field at the equatorial regions. The geomagnetic activity related to the geomagnetic storms, characterized by the SYM-H index, shows non-linear relation with proton intensities, see Figure 3 (m).
If we compare the trends of the proton intensity changes with the solar, SW, and geomagnetic parameters with those for the SP contamination from Kronberg et al. (2020) we notice general agreement between those. More details on the comparison are in Section 5. Figure 4 shows the Pearson Correlation (PC) between parameters possibly related to proton dynamics. The correlation values vary between -1 and 1, with values close to -1/1 meaning perfect linear anticorrelation/correlation and values close to 0 meaning no linear correlation. One should be careful with the interpretation of the PC coefficient, as this indicates only linear relationships. The proton intensities from channel p1 are well correlated with the radial distance and the direction Z, in agreement with Figure 3. From the OMNI parameters, the proton intensities of channel 1 are best linearly correlated with the VxSW GSE, the same as the SP contamination in Kronberg et al. (2020).

Data Split
The full dataset, as was mentioned above, comprises in total 6,051,937 measurements from 2001-01-09 15:21:00 UT to 2018-02-19 09:57:00 UT. We split the dataset into a training (or development) set (80%) and a test set (20%). To prevent test leakage, we do not shuffle the data but split it by a time point with the original order preserved. Afterward, we additionally split the development data into a train (80%) and validation set (20%) again by time. We utilize the validation set to optimize the model hyperparameters.  We normalized the features by subtracting the median, and dividing by the inter-quartile range. 3 Table 2 summarizes the sizes and periods of the data subsets after performing preprocessing and splitting.

MACHINE LEARNING MODEL FOR PROTON INTENSITIES
The relation between the proton intensities and the different predictors listed above is complex; see Figure 3. It is, therefore, often a group of predictors or their ensemble that leads to better predictions than the best individual predictor (Geron 2019).
We interpret the prediction of proton intensities as a regression problem of form where y is the proton intensity for a single channel, p is the spatial position, and x t the additional input parameters as given in Table 1 at time t.
[ ; ] denotes the concatenation, and θ are model parameters that are to be estimated from the given dataset. We study solving this regression task by applying various machine learning models, from which we select according to validation performance. The models' hyperparameters are also tuned according to validation performance. We investigated the following linear models: linear regression (Galton 1886) with and without l 1 /l 2 regularization (lasso (Santosa & Symes 1986)/ridge regression (Hoerl & Kennard 1970)), least angle regression (Tibshirani et al. 2004) with and without l 1 regularization (lasso LARS regression), and linear support vector regression (SVR) (Cortes & Vapnik 1995).
We also used a multi-layer perceptron (MLP) regressor (Rosenblatt 1958). This regressor is a neural network with an input layer, one or more hidden layers, and one output layer. Each layer is a dense layer, are trainable parameters, and σ is a non-linear activation function. The weights are trained end-to-end with an iterative (stochastic) gradient descent method using the gradient of the loss function's output w.r.t. parameters. The gradients can be efficiently computed using back-propagation. Important hyperparameters are the choice of the network structure, i.e., the number of layers and the hidden dimensions d i .

Setup
We used LGBM implementation from LightGBM library (Corporation 2008) and sklearn (Buitinck et al. 2013) for all other models. In order to evaluate the models' performance, we use the following measures: Spearman Correlation (SC), PC, Mean Absolute Error (MAE), Mean Squared Error (MSE), and coefficient of determination (R 2 ). For model selection and hyperparameter optimization, we focus on SC.
To demonstrate the necessity of using advanced machine learning models, we compare the performance against two simple baselines: historical binning and k-Nearest Neighbors (kNN) (Altman 1992).
In the historical binning baseline, we create spatial bins of training data with the k-means algorithm (Lloyd 1982) applied to the position features only. The number of bins is chosen independently for each channel based on validation performance. For a test point, we determine the corresponding bin and use the average proton intensity of that corresponding region from the train data as the prediction.
In the kNN baseline, for a test data point, we determine the k nearest neighbor that exists in the train set based on the Minkowski distance and interpolate its proton intensity for the position. Then we take the interpolated intensity at the test position as the prediction.

Results
In order to determine the best hyperparameters, we used random search over a pre-defined search space (see Table 4 in the Appendix) due to its higher sample efficiency than grid search (Bergstra & Bengio 2012). In this search, we trained numerous models on the training data. We utilized the ASHA Scheduler (Li et al. 2020) for its parallelism and extensive early stopping capabilities via the ray tune (Liaw et al. 2018) framework. After training, we evaluate each model on the validation data, and we choose the configuration which obtains the best validation SC. To reduce resource consumption, we perform this optimization only for channel p1 and apply the same best hyperparameters for the other channels, cf. Table 4 in the Appendix. 4 We obtain the best validation performance for an MLP model. Table 3 shows the final results on the test set. The MLP model outperforms the baseline models such as kNN and historical binning by about 80% and 33%, respectively. The average SC correlation between the observed and predicted data is about 56%. This value is reasonable considering the stochasticity and complex dynamics of the fast-moving energetic protons in the terrestrial magnetosphere. Figure 5 shows the distributions of the observed proton intensities versus the predicted values for five energy channels. The left-hand panels show the training set distributions, whereas the right-hand panels represent the test set. Observed and predicted data for the training and test data sets agree relatively well. The data is mainly concentrated along the black dashed line, corresponding to a perfect correlation. The correlations are not much different between the training and test set, implying that the training data was not over-fitted. The prediction of proton intensities in all five energy channels has a similar quality.
In Figure 6, we show a qualitative example of the model's predictions. The chosen time frame shows a plasma sheet crossing, the approach of the radiation belts, and the cusp region at the nightside of the magnetosphere. The model predicts the proton intensities for the different energy channels, rarely deviating more than one order of magnitude, which is considered a good prediction for energetic particle dynamics.
To evaluate the statistical significance of our results, we use a 2-sided hypothesis test on the SC. The null hypothesis states that the predictions are uncorrelated to the observations. For all five channels, we obtain a p value of 0, or, more correctly, below the smallest positive number in float64. Hence, we can reject the null hypothesis, i.e., the predictions are correlated to the observations. Thus, we can deduce from this result that our model learned the trend of proton intensity.

Feature Importances
To investigate the importance of the individual input features on the model's prediction, we utilize permutation feature importance (Fisher et al. 2019) since it is model-agnostic and interpretable. For an investigated feature column, e.g., x, its values are shuffled to break any association with the target value, i.e., the proton intensity. Then, we re-calculate the model's prediction and evaluate the SC between the prediction and the original target values. The feature importance is measured as the decrease in performance: thus, a more substantial decrease in model performance corresponds to a more important feature. Negative feature importance is also possible when the prediction improves when breaking an input feature's association. We use bootstrapping with 100 iterations and a subsampling fraction of 75% to estimate the mean and variance of this statistic to assess the significance. Figure 7 shows the feature importance for each input variable and channel. The parameters related to the location show significantly higher importance than parameters related to solar, SW, and geomagnetic activities. From those, on average, the most vital dependence is seen for ZGSE and the radial distance. The least important location parameter is YGSE. From the other parameters, the SW dynamic pressure is the most important parameter for predicting the proton intensities, followed by, on average, the AE index.

DISCUSSION
We want to note that the most substantial linear dependence of the proton intensities among the OMNI parameters is on the VxSW GSE, cf. the PC in Figure 4. This linear dependence is also evident in Figure 3. However, the feature importance derived in our model indicates Pdyn and AE index as the most important OMNI parameters, cf. Figure 7. In Figures 3 and 4, we always consider one input variable and one output variable in isolation. The model, however, is not restricted to do so, but rather can combine features to get "latent features", i.e., combinations of individual features or their interaction.
The proton intensities' dependency to different parameters are generally very similar to the dependencies of the XMM SP contamination count rates derived in Kronberg et al. (2020); despite differences in the trajectories of the spacecraft. For example, a clear dependence on rdist is similar in enhancing the proton intensities and count rates between 17 and 19 R E . For the YGSE dependency, we observe for both proton intensities and SP count rates a decrease at YGSE 0 R E and dusk-ward asymmetry at distances YGSE ±10 R E . For the XGSE dependency, we observe high proton intensities and SP count rates between XGSE∼0 and 5 R E . Similar dependencies are observed for the almost linear relation with VxSW GSE up to ∼-900 km/s, for the Temp, Np up to 20 cm −3 , BimfxGSE and BimfyGSE     up to |10| nT, AE index up to ∼700 nT and the SYM-H index within the range between -150 to 50 nT. For Pdyn, the considered scale ranges are very different, and, therefore, the resolution is different. However, we still see an average increase of Pdyn with the proton intensities/SP count rates at least up to 6 nPa. The only considerable disagreement for the relations of the proton intensities and parameters is observed for the F10.7 index. For example, for the XMM count rates, one observes an increase with the F10.7 index up to 150 sfu. However, the proton intensities observed by Cluster show a decrease. We can draw a relation to the complexity of the processes associated with the solar cycle or/and to the bias related to the spacecraft trajectory during the solar cycle. The Cluster mission was located closer to the Earth during the years related to the 24th solar cycle (higher proton intensities versus lower F10.7 index) that is significantly weaker than the 23rd solar cycle during which the XMM measurements were done (lower proton intensities versus higher F10.7 index). The highest PC derived for the proton intensities/SP count rates are the ZGSE, rdist, and VxSW GSE in both cases. In conclusion, based on Figures 3 and 4 our energetic proton observations indicate that it is likely that they can be a cause of the XMM SP contamination. The physical processes leading to proton energization associated with high SW speed are related, for example, to the formation of the quasi-parallel shock, acceleration processes associated with reconnection at the dayside, magnetospheric storms, and substorms, see more detailed discussion in Kronberg et al. (2020). The dependencies of proton intensities and the SP count rates on the different parameters are generally similar. A notable difference is that Pdyn and AE index had the highest importance of the proton intensities in the MLP model. In contrast, the SP count rates in the Extra Random Forest model had the highest importance for the VxSW GSE and F10.7 index (from the solar, the SW and geomagnetic parameters). This difference most likely occurs because of using different non-linear models and different observations' lengths (one vs. one and a half solar cycles). Another possible reason is that the Cluster trajectories covering the plasma sheet much more thorough than those of the XMM. The plasma sheet dynamics is strongly related to substorm activity indicated by the AE index and is strongly dependent on Pdyn. In the study by Smirnov et al. (2019) based on Cluster observations, the dependence of the electron intensities at L-shells between 4 and 6 for ∼1.5 solar cycle (approximately the same period as in our study) was also the best correlated with the Pdyn and AE index. The electron intensities in the outer radiation belts are governed by the substorm associated activity in the plasma sheet and acceleration processes related to charged particle injections/dipolarizations (e.g., Gabrielse et al. 2014;Malykhin et al. 2018). The proton intensities are also enhanced during such injections.
Based on this experience, in our future study, we plan to derive a tailored machine learning model for predicting the proton intensities at different energy channels along the XMM trajectory. In this study, we want to use the same time interval and location in the magnetosphere to avoid possible biases. The aim will be to find which energies of proton intensities correlate most strongly with the SP count/rates. The results of the tailored model will be compared with the results of this model.
In this work, we have tried to include a wide range of the history of the solar and geomagnetic parameters. However, they did not improve the model. The SW-magnetosphere energy coupling functions (e.g., Gonzalez et al. 1994;Milan et al. 2012;Wang et al. 2014) are not considered in this work and beyond the scope of this paper.

CONCLUSIONS
Using 17 years of the Cluster/RAPID observations, we have derived a machine learning-based model for predicting the proton intensities at the energies from 28 to 1,885 keV in the 3D terrestrial magnetosphere between 6 and 22 R E . As predictors, we used the location, solar, SW, and geomagnetic activity indices. The results demonstrate that the neural network's (MLP) prediction capabilities exceed the baselines based on the kNN and historical binning on average by ∼80% and ∼33%, respectively. The average correlation between the observed and predicted data is about 56% despite the complex dynamics of the energetic protons in the magnetosphere. The most important parameters for predicting proton intensities in our model are the ZGSE direction and the radial distance, both related to location. The most important predictor of solar, SW, and geomagnetic activity is the SW dynamic pressure. The results are in general agreement with the study by Kronberg et al. (2020) on the characteristics of the SP contamination in the XMM-Newton telescope. The results can directly be applied in practice, e.g., to assess the contamination of the X-Ray telescopes as well as better determining the contamination risk for various future mission concept orbits. Figure 8 shows the distribution of the number of samples for the predictors and the proton intensities (on the vertical axis) with a given value range (on the horizontal axis).   Table 4 shows the search space for hyper-parameter optimization and the best values found for the MLP model.