CME Arrival Time Prediction via Fusion of Physical Parameters and Image Features

Coronal mass ejections (CMEs) are among the most intense phenomena in the Sun–Earth system, often resulting in space environment effects and consequential geomagnetic disturbances. Consequently, quickly and accurately predicting CME arrival time is crucial to minimize the harm caused to the near-Earth space environment. To forecast the arrival time of CMEs, researchers have developed diverse methods over the years. While existing approaches have yielded positive results, they do not fully use the available data, as they solely accept either CME physical parameters or CME images as inputs. To solve this issue, we propose a method that extracts features from both CME physical parameters and CME images and uses the attention mechanism to fuse the two types of data. First, we design a parameter feature extraction module that extracts features from CME physical parameters. After that, we adopt an effective convolutional neural network model as our image feature extraction module for extracting features from CME images. Finally, utilizing the attention mechanism, we present a feature fusion module designed to fuse the features extracted from both parameters and images of CMEs. Therefore, our model can fully utilize and combine physical parameters and image features, which allows it to capture significant and comprehensive information about CMEs.


Introduction
Coronal mass ejections (CMEs) are large-scale eruptions of magnetized plasma from the Sun that significantly affect space weather (Hundhausen et al. 1984; Gosling 1993; Hudson et al. 2006). CMEs carrying strong and persistent southward magnetic fields can lead to intense geomagnetic storms upon colliding with the Earth's magnetosphere (Sheeley et al. 1985; Gosling et al. 1991). Besides, rapid CMEs propagating in the solar wind produce interplanetary shocks that are the primary driver of solar energetic particle events (Gopalswamy et al. 2001; Cliver & Ling 2009). As a result, geoeffective CMEs can affect aviation, space missions, and electricity networks, as well as other industries, like navigation and gas and oil pipelines (Boteler et al. 1998; Schrijver et al. 2015). Due to the severe damage caused by geoeffective CMEs, predicting their arrival time early is crucial.
CMEs typically take 1-5 days to travel from the Sun to Earth, making it entirely possible to anticipate when they will arrive (Richardson & Cane 2010). Researchers have developed a variety of physics-based models over the years to predict the arrival time of CMEs. Zhao & Dryer (2014) and Verbeke et al. (2019) provide in-depth reviews of these models, including empirical models, drag-based models, and magnetohydrodynamic (MHD)-based models. Empirical models (Vandas et al. 1996; Gopalswamy et al. 2001; Wang et al. 2002; Fry et al. 2003; Odstrcil et al. 2004; Xie et al. 2004; Schwenn et al. 2005; Manoharan 2006) predict the arrival time of CMEs using the kinematic relationships between the speed (or acceleration) and the transit time of CMEs. Drag-based models (Vršnak 2001; Vršnak & Žic 2007; Song 2010; Subramanian et al. 2012; Corona-Romero et al. 2015; Hess & Zhang 2015) take into account the interaction between the CMEs and the background solar wind to describe the evolution and propagation of CMEs. MHD-based models (Smith & Dryer 1990; Dryer et al. 2001; Moon et al. 2002; Tóth et al. 2005; Detman et al. 2006; Feng & Zhao 2006; Feng et al. 2007; Riley et al. 2012) predict the transit time of CMEs by setting their physical parameters as boundary and initial conditions in MHD simulations. These approaches build prediction models on physical theory, using CME physical parameters as inputs. Nonetheless, they rely on empirical methods and manual feature selection, potentially leading to incomplete utilization of the available data.
Recently, advances in machine-learning (McCulloch & Pitts 1943; Wold et al. 1987; Dempster et al. 1977; Vapnik 1999; Loh 2011) and deep-learning (LeCun et al. 2015; Mnih et al. 2015; Schmidhuber 2015; Zheng et al. 2015; Donahue et al. 2017) techniques have facilitated the use of data-driven methods for predicting the arrival time of CMEs. Sudar et al. (2015) were among the pioneers in utilizing machine-learning algorithms to anticipate CME arrival time. They developed a method based on neural networks (NN) that avoids the artificial specification of functional forms. However, they did not consider the impact of the background solar wind on a CME's propagation. Liu et al. (2018) deployed a support vector machine (SVM) algorithm to create a model for predicting CME arrival time. Besides, they employed a feature selection mechanism to identify the 12 most crucial parameters from the CME physical parameters and solar wind parameters. The approach did not necessitate any prior knowledge or physical hypotheses, selecting parameters solely based on their mathematical correlation with the arrival time of CMEs. While these models employ machine-learning techniques to construct prediction models, they still rely on CME physical parameters as input. Compared to manual calculation and feature selection, the convolutional neural network (CNN) demonstrates superior adaptability to CME images, enabling automatic learning of significant feature representations from CME images. Wang et al. (2019) proposed utilizing a CNN model to automatically learn the features of CMEs, using CME images as input. Shi et al. (2021) suggested that CME images are the projection of a CME's 3D structure onto a 2D plane, and the shape of the CME in images cannot accurately represent its structure in 3D space. Therefore, they proposed to leverage both observational data and expert knowledge in daily forecasting work and developed a novel model based on a recommendation algorithm.
Although current methods have achieved good results, they fail to fully exploit the data because they take only CME physical parameters or only CME images as input. Researchers compute the CME physical parameters by analyzing CME images, so these physical parameters embody human prior knowledge and typically include the most significant attributes of CMEs. Nevertheless, these physical parameters are obtained on the basis of the detection and tracking of CMEs. Researchers compute only pre-assumed physical parameters from CME images, thereby introducing subjectivity. As a result, these physical parameters might be inadequate, potentially omitting crucial yet difficult-to-perceive image information and leading to less-than-ideal prediction results. CNNs can directly extract features from CME images, allowing for the comprehensive gathering of CME-related information, including details that may not be perceived by the human visual system. However, CNNs are also prone to interference from image noise, which causes them to focus on unimportant background areas of the image and reduces their performance. As a result, it may be challenging to efficiently use crucial features and exclude unimportant aspects when extracting features from CME images. To address this challenge, we propose a method that simultaneously extracts features from both CME physical parameters and CME images and employs the attention mechanism to fuse these distinct features, thus making the best use of all the available data. First, we devise an NN-based module, named the parameter feature extraction (PFE) module, to extract features from CME physical parameters. The PFE module comprises four NN layers with a residual connection, effectively performing nonlinear transformations for feature extraction. Then, we adopt a well-established and efficient CNN model as our image feature extraction (IFE) module, tailored for extracting features from CME images. Lastly, leveraging the attention mechanism, we introduce a feature fusion (FF) module to combine the features extracted from CME physical parameters and CME images. By fusing these two different types of features, our model gains human prior knowledge from the physical parameters and rich CME-related information from the images. Therefore, our model can capture crucial and comprehensive information, facilitating a complete understanding of CMEs.
In summary, the contributions of this paper are fivefold:
1. We make the first attempt to jointly utilize both CME physical parameters and CME images in CME arrival time prediction, aiming to gather critical and comprehensive insights into CMEs.
2. We propose a PFE module with remarkable nonlinear transformation capabilities to extract crucial information about CMEs from their physical parameters.
3. We employ the ResNet-18 (He et al. 2016) model as our IFE module to extract rich CME-related information from CME images, and we additionally evaluate the feature extraction performance of several other popular CNN models.
4. We design an FF module based on the attention mechanism to fuse the features extracted from CME physical parameters and CME images, leveraging human prior knowledge while retaining rich image information.
5. Our method outperforms existing methods in prediction accuracy, achieving an MAE of 8.51 hr and an RMSE of 11.31 hr.

The remaining sections of this paper are organized as follows. Section 2 presents the data and data preprocessing methods. Section 3 provides a detailed description of our model, including a PFE module for CME physical parameter feature extraction, an IFE module for CME image feature extraction, and the application of the attention mechanism in the FF module. Section 4 shows the experimental results and provides a detailed analysis. Finally, Section 5 presents a discussion and conclusions.

Data
To construct a data set suitable for CME arrival time prediction, we retrieve geoeffective CME events from 1996-2022, along with their physical parameters (including background solar wind parameters) and observed images, ensuring a wide range of data sources and a lengthy period. To improve data quality, we standardize the data during preprocessing.

Data Collection
To develop a comprehensive data set, we first collect a list of geoeffective CMEs that impact the Earth's magnetic field and reach Earth, followed by gathering physical parameters and observed images of these CMEs.
First, following CAT-PUMA (Liu et al. 2018), we construct the geoeffective CME list by integrating data from multiple CME databases, including the Richardson and Cane list (Richardson & Cane 2010), the full halo CMEs provided by the University of Science and Technology of China (Shen et al. 2013), the George Mason University CME/ICME list (Hess & Zhang 2017), and the CME Scoreboard by NASA. The CME list includes the onset time and arrival time of each CME, with the onset time recorded as the moment it first appears in the field of view of the Solar and Heliospheric Observatory (SOHO) LASCO C2 (Brueckner et al. 1995) and the arrival time recorded as the time when the associated interplanetary shock arrives. We align the geoeffective CMEs chronologically based on their onset time and remove duplicates. Using this method, we collect 397 geoeffective CMEs from 1996-2022.
Then, to identify CME physical parameters, we cross-match the collected CMEs with the SOHO LASCO CME Catalog (Yashiro et al. 2004). The catalog includes comprehensive parameters for all CMEs detected by SOHO LASCO (Brueckner et al. 1995) since 1996, such as angular width, acceleration, average speed, final speed, estimated mass, and main position angle. Unlike Liu et al. (2018), we include all CMEs, irrespective of their angular width, after excluding those without final speed measurements, resulting in 243 geoeffective CMEs. We determine the actual solar wind parameters by averaging the background solar wind parameters captured from the start of the CME until 6 hr later. OMNIWeb Plus is the download source for the solar wind parameters, including the magnetic fields Bz and Bx, proton temperature, plasma pressure, plasma speed, flow longitude, and the alpha-particle to proton number density ratio. In total, our data set includes 12 parameters, comprising both CME physical parameters and solar wind parameters. For brevity, we collectively refer to these 12 parameters as CME physical parameters in the following text.
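For illustration, a minimal sketch of this 6 hr averaging step, assuming the OMNI records are already loaded into a pandas DataFrame with a DatetimeIndex (the column names here are hypothetical, not the actual OMNIWeb export headers):

```python
import pandas as pd

def average_solar_wind(omni: pd.DataFrame, onset: pd.Timestamp) -> pd.Series:
    """Average background solar wind parameters from CME onset until 6 hr later.

    `omni` is assumed to be indexed by observation time, with hypothetical
    columns such as 'Bz', 'Bx', 'proton_temperature', 'plasma_pressure',
    'plasma_speed', 'flow_longitude', and 'alpha_proton_ratio'.
    """
    window = omni.loc[onset : onset + pd.Timedelta(hours=6)]
    return window.mean()
```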
Finally, apart from obtaining the list of geoeffective CMEs and their physical parameters, it is essential to gather CME images for a comprehensive analysis. Incorporating satellite imagery into our data set can significantly enhance our ability to develop a precise model. So, we expand our data set with CME images provided by SOHO LASCO C2 (Brueckner et al. 1995). We collect the first image captured by the satellite when each CME first appeared in the field of view of SOHO LASCO C2 to ensure time consistency between the CME images and CME physical parameters. This differs from the approach adopted by Wang et al. (2019), who obtained CME images ranging from the early phase to several hours later. In the end, our data set includes 243 geoeffective CME images acceptable under our criteria, covering the period between 1996 and 2022.

Data Preprocessing
After implementing the above steps, the generated data set includes the onset time, arrival time, CME physical parameters, and CME images of each event. For the CME physical parameters, we standardize them to eliminate the dimensional differences between different physical quantities.
In machine-learning and deep-learning algorithms, data standardization is a prevalent preprocessing technique. It establishes uniformity in the value ranges of various data attributes by eliminating scale differences and dimensional effects between different features, enabling each feature to be compared and computed within the same numerical range. The process involves two primary steps: first, calculate the mean and standard deviation of each feature; second, standardize the values so that their average is 0 and their standard deviation is 1. The formula used for data standardization is

$$x_i^* = \frac{x_i - \mu_x}{\sigma_x},$$

where $x_i$ represents the original data, $\mu_x$ represents the mean of the data, $\sigma_x$ represents the standard deviation of the data, and $x_i^*$ represents the standardized data. It is worth noting that the standardization statistics are commonly computed on the training data only, as we aim to help the model learn the relationships between features during training and then evaluate its performance on the test data; standardizing all the data jointly would leak test data into the model and adversely affect the results. The importance of data standardization lies in enabling different features to be compared on an equal footing. The standardized physical parameters provide more accurate inputs for the subsequent deep-learning algorithms, thereby improving the efficiency and accuracy of our model.
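A minimal sketch of this procedure, with the statistics fitted on the training split only to avoid test-set leakage:

```python
import numpy as np

def fit_standardizer(train: np.ndarray):
    """Compute per-feature mean and standard deviation on the training set only."""
    return train.mean(axis=0), train.std(axis=0)

def standardize(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Apply x* = (x - mu) / sigma using training-set statistics."""
    return (x - mu) / sigma

# Usage: train_params and test_params are arrays of shape (n_samples, 12).
# mu, sigma = fit_standardizer(train_params)
# train_std = standardize(train_params, mu, sigma)
# test_std = standardize(test_params, mu, sigma)   # same statistics, no leakage
```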

Method
The main goal of our research is to simultaneously extract features from both the physical parameters and images of CMEs, and then utilize the attention mechanism to fuse these distinct features so as to make the best use of all the available data. Therefore, we develop a novel approach consisting of three modules, as shown in Figure 1. First, we design the PFE module with four layers and a residual connection to effectively perform nonlinear transformations on the CME physical parameters and conduct feature extraction, as detailed in Section 3.1. Then, we employ a classic and efficient CNN model, ResNet-18 (He et al. 2016), as our IFE module to extract features from CME images, as discussed in Section 3.2. Finally, based on the attention mechanism, we devise an FF module to make full use of and fuse the features of both CME physical parameters and CME images. A more comprehensive explanation of the FF module is presented in Section 3.3. Therefore, our model can simultaneously capture human prior knowledge and CME-related image information.

PFE Module
The CME physical parameters contain the most important CME attributes and human prior knowledge, making them crucial for the CME arrival time prediction task. To avoid the artificial specification of functional forms and achieve automatic feature extraction, we propose an NN-based PFE module for the CME physical parameters. In existing research, Sudar et al. (2015) made the first attempt to use an NN-based model for predicting the arrival time of CMEs. Nonetheless, the model falls short, as it only uses two parameters as inputs, thereby limiting the potential of the NN. Besides, Liu et al. (2018) expanded the input to 12 physical parameters and used the SVM algorithm to establish their model. However, this method requires manually specifying the kernel function. Due to the complexity of CME physical parameters, simple kernel functions like linear or Gaussian may not fit properly; hence, it is necessary to use more complex nonlinear functions. Fortunately, according to the universal approximation theorem (Cybenko 1989; Hornik et al. 1989), we can approximate any continuous function given sufficient neurons and the correct weights. Furthermore, by expanding the number of input features, the PFE module can effectively learn the data features, thus improving the performance and accuracy of our method.
We first briefly introduce the basic principle of an NN and then provide a detailed explanation of our proposed PFE module. In an NN, neurons are the elementary units, which are stacked together to form a complex network that can perform complex computations to imitate the human brain. Neurons simulate the structure and characteristics of biological neurons, primarily by receiving input signals and then producing corresponding outputs. Given d input signals, represented as the vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]$, a neuron computes a weighted sum z of the inputs as follows:

$$z = \sum_{i=1}^{d} w_i x_i + b,$$

where $\mathbf{w} = [w_1, w_2, \ldots, w_d]$ represents the corresponding weights and b represents the neuron's bias. Then, the activation value o of the neuron is obtained by applying a nonlinear function f(·) to z:

$$o = f(z),$$

where the nonlinear function f(·) is called the activation function. According to the direction of information transmission, neurons are stacked in layers to form an NN. Each neuron accepts the output of the preceding layer, performs computations, and transmits the result to the next layer. Conceptually, an NN represents a function that composes multiple nonlinear functions to enable complex mappings from the input space to the output space.
To quickly and accurately forecast the arrival time of CMEs, we propose a simple but effective PFE module comprising four layers, as shown in Figure 2. In the first layer, we map the 12 CME physical parameters to 12 512-d features. Since each parameter has a distinct physical meaning, direct fusion could limit diversity. Therefore, we first perform feature alignment by mapping the various CME physical parameters to a high-dimensional feature space that facilitates feature fusion. In the second layer, we expand the 512-d feature vector into a 2048-d feature vector to improve the encoding of high-level features. The choice of dimension 2048 increases the expressive power of the model, enabling it to better capture complex patterns and relationships in the input data. Through the second layer, we can explore a varied range of feature combinations and acquire more intricate feature representations. Subsequently, we add the third layer to enhance the nonlinear capability of the model. In addition, in the third layer, we reduce the feature dimension back to 512 to reduce the number of parameters, prevent overfitting, and enhance the training speed and generalization ability of the model. As the final layer of our PFE module, the fourth layer directly adds the inputs to the outputs and normalizes them. This residual connection helps to alleviate the problems of vanishing and exploding gradients, making the model easier to train. In addition, normalizing each feature dimension gives the input a similar distribution across feature dimensions, which helps to accelerate convergence and improve the robustness of the model.
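A minimal PyTorch sketch of this four-layer design, under our reading of the text (the class name and the exact layer composition are our assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class PFEModule(nn.Module):
    """Parameter feature extraction: three linear layers with ReLU plus an
    add-and-normalize residual connection (a sketch of the architecture
    described in the text; exact details may differ)."""

    def __init__(self, dim: int = 512, hidden: int = 2048):
        super().__init__()
        self.embed = nn.Linear(1, dim)        # layer 1: align each parameter to 512-d
        self.expand = nn.Linear(dim, hidden)  # layer 2: 512 -> 2048
        self.reduce = nn.Linear(hidden, dim)  # layer 3: 2048 -> 512
        self.norm = nn.LayerNorm(dim)         # layer 4: add & normalize
        self.relu = nn.ReLU()

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        # params: (batch, 12) standardized physical parameters
        x = self.relu(self.embed(params.unsqueeze(-1)))  # (batch, 12, 512)
        y = self.relu(self.expand(x))                    # (batch, 12, 2048)
        y = self.relu(self.reduce(y))                    # (batch, 12, 512)
        return self.norm(x + y)                          # residual connection

# features = PFEModule()(torch.randn(8, 12))  # -> (8, 12, 512)
```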
We conduct a comparative experiment to assess the feature extraction capability of our proposed PFE module. To achieve this, we introduce a fully connected (FC) layer after the fourth layer of the module to predict the arrival time of CMEs. To optimize the parameters of the PFE module and produce the most accurate predictions, we evaluate the difference between the predicted and true values using the mean square error (MSE) loss function, a widely accepted loss function for regression problems, defined as

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2,$$

where n is the number of samples in the test set, $Y_i$ is the true value of sample i, and $\hat{Y}_i$ is the predicted value of sample i. We provide more analysis in Section 4.1.
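As a usage illustration, a hedged training-step sketch built on the hypothetical PFEModule above (the mean-pooling over the 12 parameter features and the optimizer settings are our assumptions):

```python
import torch
import torch.nn as nn

pfe = PFEModule()                     # hypothetical module from the sketch above
head = nn.Linear(512, 1)              # FC layer predicting transit time (hr)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(list(pfe.parameters()) + list(head.parameters()), lr=1e-4)

params = torch.randn(8, 12)           # a batch of standardized physical parameters
target = torch.rand(8, 1) * 120       # dummy transit times in hours

pred = head(pfe(params).mean(dim=1))  # pool over the 12 parameter features
loss = criterion(pred, target)        # MSE between predicted and true values
loss.backward()
optimizer.step()
```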

IFE Module
The CME physical parameters include the most critical CME attributes, but those parameters rely on manual calculations by experts based on CME images. Although experts can extract the most crucial parameters of a CME from observed images using their professional knowledge, this does not guarantee the comprehensive utilization of the information in the CME images. In recent years, with the continuous development of deep learning, CNNs have been employed to forecast the arrival time of CMEs. CNNs can directly extract features from CME images, allowing for the comprehensive gathering of CME-related information, including details that are difficult to express with CME physical parameters.
A CNN model includes a feature extractor module with convolution and pooling layers and a regression module with an FC layer. The convolutional layers aim at identifying patterns in the input images. A convolutional layer can be viewed as a locally connected network where each convolution kernel connects only to a small, contiguous region of the input and generates different feature activations at each location to produce feature maps. The feature map output of a convolutional layer is defined as

$$O = f(K * I + b),$$

where O represents the output feature map, f is the activation function, K denotes the convolutional kernel weights, * denotes the convolution operation, I is the input image, and b represents the bias. Pooling layers increase the receptive field and combine features from different locations. During pooling, the layer computes either the mean or the maximum value within a sliding window. Consequently, the resulting model becomes robust to minor shifts or variations. The FC layer is used to synthesize the extracted features and output the prediction results.
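As a toy illustration of these building blocks in PyTorch (the shapes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 64)                      # a single-channel 64x64 input image
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # implements O = f(K * I + b) with f below
pool = nn.MaxPool2d(2)                             # maximum within a 2x2 sliding window

fmap = torch.relu(conv(x))                         # feature maps: (1, 8, 64, 64)
pooled = pool(fmap)                                # larger receptive field: (1, 8, 32, 32)
```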
In the field of computer vision, numerous CNN models (Simonyan & Zisserman 2014; He et al. 2016; Howard et al. 2017; Huang et al. 2017) have been proposed and have demonstrated excellent results. The Visual Geometry Group (VGG; Simonyan & Zisserman 2014) model stands out for its consistent and straightforward structure, which includes several convolutional layers with 3 × 3 filters and unchanging 2 × 2 pooling layers. The main attributes of VGG include its simplicity of design, effectiveness in feature learning, and suitability for transfer learning. However, its drawbacks include the potential for overfitting, the computational complexity arising from a substantial number of parameters, and constraints in resource-limited contexts. DenseNet (Huang et al. 2017) is distinguished by its densely connected design, where each layer receives feature maps from all preceding layers, promoting feature reuse and addressing the vanishing gradient problem. The densely connected architecture yields highly compact models with fewer parameters, enhancing parameter efficiency. Moreover, the primary advantage of DenseNet lies in the enhanced propagation of features and gradients, contributing to a smoother information flow throughout the network. MobileNet (Howard et al. 2017) is a lightweight CNN architecture designed for efficient image classification and object detection on resource-constrained mobile devices. Its main benefit is the use of depth-wise separable convolutions, splitting standard convolutions into depth-wise and point-wise convolutions to reduce parameters and computational load, achieving a good balance between model size and performance. MobileNet excels in scenarios where model size and computational efficiency are critical. Since the input consists of images, the CNN performs a series of convolution operations during which each layer extracts only a portion of the image's information. As the number of layers grows, information from the original image can be lost, resulting in less effective network training, network degradation, and gradient vanishing. To tackle this issue, He et al. (2016) introduced a deep convolutional network framework that resolves the poor training performance observed when the number of layers is large. ResNet (He et al. 2016) uses shortcut connections to add the preceding feature maps to each network block, preserving most of the original information. The shortcut enables direct access to previously processed information, which leads to improved performance.
Given the superior performance of shortcut connections in deep networks, we adopt ResNet (He et al. 2016) to extract features from CME images. Specifically, in practical applications, we select ResNet-18 (He et al. 2016) as our IFE module, as depicted in Figure 3. To evaluate the feature extraction capabilities of various CNN models, we perform a comparative experiment as outlined in Section 4.2. First, we employ the MSE loss function to train the CNN model, incorporating a regression module for predicting the arrival time of CMEs. As illustrated in Table 2, our IFE module achieves the highest performance among the considered CNN models. Then, to combine the image features extracted by the IFE module with the parameter features extracted by the PFE module, we abandon the regression module of ResNet-18 and only retain its feature extractor module. The feature map obtained by the feature extractor module contains high-level semantic knowledge, which reflects the model's attention to different areas of the image; in our visualizations, the closer the color is to deep red, the higher the model's response and the greater the contribution to the image features. However, CNNs are also prone to interference from image noise, which causes them to focus on unimportant background areas of the image and reduces their performance. Therefore, it can be difficult to effectively utilize crucial features, eliminate irrelevant details, and maintain comprehensive CME-related information while extracting features from the observed images.
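A minimal sketch of such an IFE module, assuming the torchvision ResNet-18 with its average-pooling and FC (regression) layers stripped (the class name is ours):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class IFEModule(nn.Module):
    """Image feature extraction: ResNet-18 with the regression head removed,
    keeping only the feature extractor module."""

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)  # or pretrained weights
        # Drop the average-pooling and FC layers; keep the spatial feature maps.
        self.features = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (batch, 3, 224, 224) -> feature map: (batch, 512, 7, 7)
        return self.features(img)

# fmap = IFEModule()(torch.randn(1, 3, 224, 224))  # -> (1, 512, 7, 7)
```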

FF Module
To obtain crucial and comprehensive information about CMEs, we design an FF module based on the attention mechanism to fully merge the features extracted from CME physical parameters and CME images. The attention mechanism is an approach in computer vision inspired by the functioning of the human visual system, which recognizes and processes visual information selectively. When looking at complex images, the human brain automatically concentrates on the critical and relevant parts while disregarding unimportant information; this behavior is referred to as the attention mechanism. Incorporating the attention mechanism in deep learning enables a model to imitate the selective attention strategy of the human brain, resulting in enhanced performance and efficiency. For the CME physical parameters, the attention mechanism blends the features of various parameters, using a weighted summation to highlight the most significant parameters and maximize prediction accuracy. Combining attention mechanisms with CME images allows the CNN to preserve the global features of the image while emphasizing the primary areas in the image.
The attention mechanism is calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

which involves three vectors: the query (Q) vector, the key (K) vector, and the value (V) vector. The Q vector signifies the particular elements in the input data that require focus. The K vector indicates the features of the input data, while the V vector denotes the values of each element in the input data. Usually, the K vector is set equal to the V vector to reduce computational complexity. Taking the matrix product of the Q and K vectors generates a similarity matrix that expresses the similarity between elements in the Q vector and the input data. Next, the similarity matrix is normalized with a softmax function to generate the attention weight. The attention weight indicates the importance of each element in the input data; it assigns more weight to crucial elements and less weight to less essential ones, improving the identification of relevant features. Finally, a weighted sum of the V vector using the attention weight fuses the contribution of each element in the input data and yields the enhanced features. Here, $d_k$ is the dimension of the K vector, which is often set to a smaller value to optimize the network's efficiency. In practical applications, attention mechanisms fall into two distinct types: self-attention and cross-attention. Self-attention analyzes a single input, where the Q vector is equivalent to the V vector, representing attention within the input data itself. In contrast, cross-attention relates different inputs, where the Q vector and V vector are disparate, regulating the feature extraction of the V vector through the Q vector.
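A compact implementation sketch of this scaled dot-product attention:

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity matrix
    weights = F.softmax(scores, dim=-1)            # attention weight
    return weights @ v                             # weighted sum of V

# Self-attention: q = k = v (attention within one input).
# Cross-attention: q from one modality, k = v from another.
```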
By applying the attention mechanism, we can fully utilize and fuse the features of both CME physical parameters and CME images, as shown in Figure 4. We first apply self-attention to the extracted features of the CME physical parameters and the CME images, respectively. Applying self-attention after feature extraction from the CME physical parameters enables the model to focus on the crucial parameter components. Similarly, applying self-attention after feature extraction from the CME images enables the CNN model to highlight crucial image regions. We then utilize cross-attention between the parameter and image features to introduce the prior knowledge embedded in the CME physical parameters. Through the cross-attention mechanism, we acquire the attention weight by weighing the importance and influence of the parameter and image features: the parameter features serve as the Q vector, while the image features serve as the K and V vectors. As shown in Figure 4, the attention weight calculated from the parameter features and image features enables the model to focus on the CME-related areas while ignoring other irrelevant parts. The enhanced features obtained by multiplying the attention weight with the image features successfully highlight significant features and omit unimportant details, while maximizing the utilization of the complete CME-related image information. As a result, with the help of the attention mechanism, our model can fully utilize and combine the features of both CME physical parameters and CME images, and simultaneously capture crucial and comprehensive information about CMEs.
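Putting the pieces together, a hedged sketch of such an FF module (the class name, head count, and token layout are our assumptions; the paper's exact layer composition may differ):

```python
import torch
import torch.nn as nn

class FFModule(nn.Module):
    """Feature fusion: self-attention on each modality, then cross-attention
    with parameter features as Q and image features as K and V."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_param = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, p: torch.Tensor, i: torch.Tensor) -> torch.Tensor:
        # p: (batch, 12, 512) parameter features; i: (batch, 49, 512) image
        # features, e.g. a flattened 7x7 ResNet-18 feature map.
        p, _ = self.self_param(p, p, p)   # highlight crucial parameter components
        i, _ = self.self_image(i, i, i)   # highlight crucial image regions
        fused, _ = self.cross(p, i, i)    # parameters as Q, image as K and V
        return fused                      # (batch, 12, 512) enhanced features
```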
To assess the fusion capability of our proposed FF module, we conduct an ablation study, as detailed in Section 4.3. We concatenate the features from both the PFE and IFE modules and feed them into a regression module without the FF module. We utilize the MSE loss function to train the model for predicting the arrival time of CMEs. As shown in Table 3, the model equipped with the FF module demonstrates superior performance compared to the model without it. The experimental results thus show that our proposed FF module can effectively leverage the features from both CME physical parameters and CME images through the attention mechanism.

Experimental Results and Analysis
We conduct three experiments to evaluate the performance of our method. First, we evaluate the performance of our proposed PFE module on feature extraction from the CME physical parameters. Second, we assess the effectiveness of our IFE module, evaluate the feature extraction capabilities of several other widely used CNN models, and visualize the feature maps of the CME images. Finally, we combine the PFE, IFE, and FF modules to fully utilize and fuse both the parameter and image features and achieve a better result.
The experimental setup is as follows. Given that our data set is small, comprising only 243 samples, we employ a fivefold cross-validation (Stone 1974) strategy to ensure a robust performance assessment. Additionally, our evaluation metrics encompass both the mean absolute error (MAE) and the root mean square error (RMSE) to provide a comprehensive understanding of model performance. The reported results include the mean and standard deviation of the MAE and RMSE across the fivefold cross-validation.
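A brief sketch of this evaluation protocol, assuming a hypothetical `build_and_eval` callable that trains a fresh model per fold and returns test-split predictions:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X: np.ndarray, y: np.ndarray, build_and_eval) -> None:
    """Fivefold cross-validation reporting mean +/- std of MAE and RMSE."""
    maes, rmses = [], []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pred = build_and_eval(X[train_idx], y[train_idx], X[test_idx])
        err = pred - y[test_idx]
        maes.append(np.abs(err).mean())
        rmses.append(np.sqrt((err ** 2).mean()))
    print(f"MAE  = {np.mean(maes):.2f} ± {np.std(maes):.2f} hr")
    print(f"RMSE = {np.mean(rmses):.2f} ± {np.std(rmses):.2f} hr")
```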

Comparison Results of the PFE Module
We compare our proposed PFE module with existing models based on CME physical parameters to show the advantage of our method, as illustrated in Table 1. The compared methods include the NN-based model proposed by Sudar et al. (2015) and the SVM-based model proposed by Liu et al. (2018). While Sudar et al. (2015) also used an NN architecture, they only utilized two parameters as inputs, which constrains the performance of the NN as a data-driven method. Thus, Sudar et al. (2015) obtained the poorest results, with an MAE of 11.56 hr. The SVM-based model proposed by Liu et al. (2018) not only utilized CME physical parameters but also incorporated the background solar wind as input, which provided additional insights into CME propagation and evolution, leading to the best reported MAE of 5.93 hr. Notably, Liu et al. (2018) attained this best performance in 100,000 trials, which does not necessarily indicate the universality and stability of their method. To ensure a fair comparison, we rerun Liu et al. (2018)'s model on our test set and obtain an MAE of 11.58 ± 1.21 hr and an RMSE of 15.52 ± 1.58 hr. Our proposed method utilizes a four-layer NN, incorporating the 12 CME physical parameters along with background solar wind parameters as input. This configuration yields an MAE of 9.63 ± 0.93 hr and an RMSE of 12.04 ± 1.12 hr. Our method obtains better performance than Sudar et al. (2015) by utilizing more input data and a more complex NN architecture. Besides, our model outperforms the rerun of Liu et al. (2018), which demonstrates the superiority and stability of our method. These results suggest that training an NN to learn features from data yields a model that better fits the data, producing more accurate and generalizable results.

Comparison Results on IFE Module
We conduct a comparative analysis of our IFE module with existing CNN-based models to evaluate their performance, as depicted in Table 2. First, we compare our IFE module with the CME-specialized model proposed by Wang et al. (2019). Notably, Wang et al. (2019) also employed a CNN architecture, but their model consisted of only four convolutional layers, rendering it a shallow network and limiting the overall performance potential of the CNN. As a result, the model of Wang et al. (2019) shows a lower accuracy, yielding an MAE of 12.42 hr. In our research, we employ a classical and efficient CNN architecture, ResNet-18 (He et al. 2016), which encompasses 18 layers of learnable weights, exhibiting a stronger feature extraction capability and yielding a better MAE of 9.51 ± 1.03 hr and an RMSE of 12.19 ± 1.21 hr. Additionally, we perform a comparative analysis of our IFE module with the other widely used CNN models outlined in Section 3.2. In our experiments, our IFE module demonstrates superior performance in comparison to VGG (Simonyan & Zisserman 2014), DenseNet (Huang et al. 2017), and MobileNet (Howard et al. 2017). The robust performance of the ResNet-18 (He et al. 2016) we employ is attributed to its shortcut connections, which aid in building a deeper architecture and enhancing its feature extraction capability. In contrast, VGG (Simonyan & Zisserman 2014), DenseNet (Huang et al. 2017), and MobileNet (Howard et al. 2017) face challenges, possibly due to their model complexities and limitations in adapting to the task. These comparisons highlight the importance of considering model characteristics and their suitability for specific tasks when utilizing a CNN-based model to predict the arrival time of CMEs.

To gain insight into the effectiveness of the IFE module, we also conduct a visual comparison and analysis of its feature maps, as depicted in Figure 5. The first row presents the input CME images, which exhibit three different components: the central dark red disk represents the Sun, the surrounding bright area signifies the CME, and the remainder comprises background and noise. The second row displays the feature map generated by our IFE module, containing high-level semantic information that reflects the model's focus on distinct areas within the image. The model's response is indicated by the intensity of the red color, with deeper red shades denoting a greater impact on the prediction results. In the third row, we overlay the feature map on the input images, matching the red areas in the feature map to the CME structures in the original images. Notably, the regions in the feature map with colors closer to deep red primarily cover the bright CME surrounding the disk. The IFE module thus demonstrates an evident ability to distinguish the CME region from the background and noise components. Consequently, the IFE module exhibits superior adaptability to image data, autonomously acquiring complete and in-depth CME-related information from images, including features that may be challenging for human perception and difficult to express with CME physical parameters. However, the feature map also exhibits some yellow regions at the edge. This observation indicates that parts of the image that do not contain the CME may still influence the IFE module. Consequently, there arises a need to enhance the ability of the IFE module to extract features from CME images, enabling it to focus solely on the principal regions of the CME while disregarding unrelated areas.
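For illustration, a minimal sketch of how such a feature map overlay can be produced (a common channel-averaging recipe, not necessarily the exact procedure used here; `model` is assumed to return a spatial feature map, e.g. the hypothetical IFEModule sketched earlier):

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def overlay_feature_map(model, img: torch.Tensor) -> None:
    """Overlay a channel-averaged feature map on a CME image of shape (1, 3, H, W)."""
    with torch.no_grad():
        fmap = model(img).mean(dim=1, keepdim=True)   # e.g. (1, 512, 7, 7) -> (1, 1, 7, 7)
    heat = F.interpolate(fmap, size=img.shape[-2:], mode="bilinear",
                         align_corners=False)[0, 0]
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # normalize to [0, 1]
    plt.imshow(img[0].permute(1, 2, 0).cpu())          # input CME image
    plt.imshow(heat.cpu(), cmap="jet", alpha=0.4)      # deep red = high response
    plt.axis("off")
    plt.show()
```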

Comparison Results of the FF Module
The IFE module demonstrates the remarkable adaptability of CNNs to image data, autonomously capturing CME-related information, even information challenging for human perception and physical parameters. However, the IFE module is susceptible to noise and may focus on unimportant image areas, leading to a decline in overall performance. Therefore, it can be challenging to concentrate on vital image regions during feature extraction from CME images. To address this issue, we propose to combine the features extracted from CME physical parameters and CME images to capture crucial and comprehensive information. First, we simply concatenate the features from the PFE and IFE modules and feed them into a regression module to assess the efficacy of integrating the two distinct types of CME features. As shown in Table 3, the concatenated features outperform the results obtained using either the PFE or the IFE module separately, demonstrating the benefit of integrating the two types of features. However, the features from the PFE and IFE modules both encompass CME-related information, potentially introducing redundancy and noise. Hence, we propose the FF module, which leverages the attention mechanism to selectively ignore unnecessary image areas and focus exclusively on the regions most relevant to CMEs. Table 3 illustrates the results of the combined utilization of the PFE, IFE, and FF modules, which achieve a remarkable MAE of only 8.51 ± 8.66 hr and an RMSE of 11.31 ± 1.16 hr. This performance surpasses the results obtained using the PFE or IFE module alone, as well as the straightforward concatenation of the two features.
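For reference, the concatenation baseline can be sketched as follows (the pooled 512-d features and the regression head sizes are our assumptions):

```python
import torch
import torch.nn as nn

# Concatenation baseline: no FF module, the two feature types are simply joined.
pfe_feat = torch.randn(8, 512)   # pooled parameter features (assumed shape)
ife_feat = torch.randn(8, 512)   # pooled image features (assumed shape)
regressor = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))
pred = regressor(torch.cat([pfe_feat, ife_feat], dim=-1))  # (8, 1) transit time
```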
To understand how the attention weight affects the feature extraction process for CME images, we generate visual representations of the attention weight of our model, presented in Figure 6. Similar to Figure 5, the first row displays the original input image to the model, the second row presents the visual representation of the attention weight, and the third row exhibits the input image with the attention weight overlaid. The red regions in the composite image are assigned larger weights, implying a greater impact on the feature extraction process. Notably, as depicted in Figure 6, the model mainly pays attention to the bright CME area and ignores the remaining background and noise. The red regions appear on the luminous area of the CME, emphasizing that the extent of the CME plays a significant role in predicting its arrival time. Besides, Figure 6 also shows yellow regions along the edge of the CME contour, reflecting their higher importance to the model. Multiplying the attention weight with the image features successfully emphasizes critical elements and eliminates unnecessary details, while maximizing the utilization of CME-related information. Based on these observations, we can deduce that incorporating the CME physical parameters with the CME images emphasizes the primary areas in the image and effectively exploits the CME images.
Moreover, to assess the effectiveness of our method, we conduct a comparative evaluation between our method and the Average of all Methods recorded on the CME Scoreboard. Here, for the sake of simplicity, we conduct the comparative experiment using the test set from the final iteration of the cross-validation. In our test set, a total of 15 CME events are recorded on the CME Scoreboard. Figure 7 compares the results of our model with those of the Average of all Methods, where the vertical axis represents CME events while the horizontal axis indicates the absolute error (AE) of the predictions. For each CME event, the upper blue bar represents the AE of the Average of all Methods on the CME Scoreboard, and the lower orange bar represents the AE of our model. Figure 7 demonstrates that our model's prediction error is lower than that of existing models in 11 events (around 73%), and our model performs worse than existing models in only four events (around 27%). For these 15 events, our model's MAE is approximately 7.86 hr, while the existing models' MAE is around 13.98 hr. These experimental results demonstrate that our model is more advanced than existing models on these 15 intersection events because our method can fully utilize and combine the features of both parameters and images to capture crucial and comprehensive information about CMEs.
We present the distribution of the AE for the 48 CME events within the test set to further demonstrate the effectiveness of our method, as illustrated in Figure 8(a). The horizontal axis represents the AE, while the vertical axis represents the corresponding frequency. Notably, the leftmost three rectangular bars exhibit the highest frequencies, indicating a high level of accuracy in most predictions. Specifically, approximately 61% of the CME events have an AE of less than 9 hr, signifying that our proposed method performs well, achieving an AE of under 10 hr for the majority of CME events. For further analysis of the correlation between the predicted and actual arrival times, we present the results in Figure 8(b). The red dotted line represents predictions that align exactly with the actual results, and the scattered points correspond to individual CME events. The points cluster closely around the dotted line, indicating a strong correlation between the predicted and actual values across all events.

Discussion and Conclusions
In this study, we have developed a novel method that extracts features from both CME physical parameters and CME images and utilizes the attention mechanism to fuse these distinct features. First, we proposed a simple yet effective PFE module to map the diverse CME physical parameters to a high-dimensional feature space, thereby efficiently extracting the embedded prior knowledge. For the extraction of CME-related information from images, including imperceptible features, we utilized the widely used ResNet-18 model as our IFE module. Furthermore, employing the attention mechanism, we designed an FF module to combine the features extracted from CME physical parameters and CME images. Therefore, our model can fully utilize and combine physical parameters and image features, which allows it to capture significant and comprehensive information about CMEs.
The main contributions of our study can be summarized as follows. First, to our knowledge, we make the first attempt to combine CME physical parameters with CME images to obtain critical and comprehensive information about CMEs. Second, to enhance the feature extraction capability for CME physical parameters, we employ a more complex and deeper NN than previous methods as our PFE module. Third, we utilize the well-known ResNet-18 (He et al. 2016) model as our IFE module owing to its exceptional performance. Additionally, we evaluate the feature extraction performance of several other popular CNN models on CME images. Fourth, we integrate the attention mechanism into the feature extraction process and introduce the FF module to leverage human prior knowledge from the physical parameters and extract rich CME-related information from the images. The visualization results demonstrate that our model effectively focuses on the CME shape, contour edges, and extent, guided by the prior knowledge of the CME physical parameters. Fifth, regarding prediction accuracy, our method is superior to existing methods, obtaining an MAE of only 8.51 hr and an RMSE of 11.31 ± 1.16 hr. Our research highlights the effectiveness of fusing both physical parameters and image features, which enables our model to selectively ignore unnecessary image areas and concentrate exclusively on the regions most relevant to CMEs.
In this study, our goal is to enhance the capacity of our model to utilize physical parameters and image features. However, we do not explore its ability to handle temporal relationships. The CNN model views each CME image as a distinct event, failing to consider the inter-image relationships within consecutive satellite image sequences. Thus, for future work, we plan to employ deep-learning techniques to explore the temporal relationships in CME image sequences, which may improve the accuracy of CME arrival time prediction. Furthermore, our model solely predicts the arrival time of a CME at Earth without determining whether it will actually reach Earth. Assessing the geoeffectiveness of a CME is critical for studying its arrival time and other space weather tasks. Therefore, the second step in future work is to predict whether a CME will reach the Earth for all CME events listed in the SOHO LASCO CME Catalog (Yashiro et al. 2004). Finally, we intend to introduce more refined predictions in later research that consider the connection between CMEs and other strong solar activity, such as M-class or higher solar flares, to minimize the potential damage of solar activity to the Earth.

Figure 1 .
Figure 1. Our proposed approach comprises three modules: a PFE module, an IFE module, and an FF module. The PFE module extracts features from CME physical parameters, and the IFE module extracts features from CME images. The FF module fuses the features of both CME physical parameters and CME images to capture comprehensive and crucial information about CMEs.

Figure 2 .
Figure 2. The PFE module is designed for the CME physical parameters. The module consists of three linear layers with the ReLU function, as well as an add-and-normalize layer that constructs the residual connection.

Figure 3 .
Figure 3. The IFE module employs a ResNet-18 architecture, comprising convolutional layers and pooling layers. The IFE module can automatically extract comprehensive CME-related features from images, including aspects that could be difficult for humans to perceive.

Figure 4 .
Figure 4. The attention weight calculated from the parameter features and image features makes our model selectively overlook irrelevant image areas and focus solely on the regions most crucial to CMEs.

Figure 5 .
Figure 5. The visualization of the feature maps indicates that the IFE module can automatically extract complete and in-depth CME-related information from images, including features that might be challenging for human perception.

Figure 6 .
Figure 6. The visualization of the attention weight. The highlighted area visualizes the weight of different areas in the image, with areas of higher weight closer to red and areas of lower weight closer to blue.

Figure 7 .
Figure 7. The comparative study between our method and the Average of all Methods recorded on the CME Scoreboard (https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/).

Figure 8 .
Figure 8. The comprehensive analysis of our final results. (a) The distribution of the AE between the calculated and observed transit times for all CME events in the test set. (b) The correlation between the predicted and observed transit times of each CME event in the test set.

Table 1
Comparison of Our NN-based Method with the Existing Methods

Table 3
Ablation Studies on the FF Module