An Algorithm for the Determination of Coronal Mass Ejection Kinematic Parameters Based on Machine Learning

Coronal mass ejections (CMEs) constitute the major source of severe space weather events, with the potential to cause enormous damage to humans and spacecraft in space. It is becoming increasingly important to detect and track CMEs, since there are more and more space activities and facilities. We have developed a new algorithm to automatically derive a CME’s kinematic parameters based on machine learning. Our method consists of three steps: recognition, tracking, and the determination of parameters. First, we train a convolutional neural network to classify images from Solar and Heliospheric Observatory Large Angle Spectrometric Coronagraph observations into two categories, containing CME(s) or not. Next, we apply the principal component analysis algorithm and Otsu’s method to acquire binary-labeled CME regions. Then, we employ the track-match algorithm to track a CME’s motion in time-series images and finally determine the CME’s kinematic parameters, e.g., velocity, angular width, and central position angle. The results of four typical CME events with different morphological characteristics are presented and compared with a manual CME catalog and several automatic CME catalogs. Our algorithm shows some advantages in the recognition of CME structure and the accuracy of the kinematic parameters. This algorithm can be helpful for real-time CME warnings and predictions. In the future, this algorithm is capable of being applied to CME initialization in magnetohydrodynamic simulations to study the propagation characteristics of real CME events and to provide more efficient predictions of CMEs’ geoeffectiveness.


Introduction
Coronal mass ejections (CMEs) are large blobs of plasma released from the Sun's corona into interplanetary space.As the most energetic eruptive solar event and the most violent energyreleasing activity in the solar system, CMEs can trigger geomagnetic storms (Gosling et al. 1990), damage satellites and orbiting space stations (Valach et al. 2009), increase the radiation levels for astronauts on duty (Reeves et al. 2003), and disrupt electronic facilities on the ground when they approach the Earth (Abda et al. 2020).Because of the tremendous influence of CMEs, it is important to understand them.The first modern observation of a CME through a white-light coronagraph was made by the Skylab space station in 1973 (Gosling et al. 1974).Following Skylab, several space-based coronagraphs were launched to provide further observations of CMEs.Launched in 1995, the Large Angle Spectrometric Coronagraph (LASCO) C1, C2, and C3 imagers on board the Solar and Heliospheric Observatory (SOHO) spacecraft (Brueckner et al. 1995) cover fields of view (FOVs) of 1.1-3 R s , 1.5-6 R s , and 6-30 R s , respectively.In 2006, the Solar Terrestrial Relations Observatory mission was launched, consisting of two identical satellites orbiting the Sun in different directions (Kaiser 2005).The COR 1 and COR 2 coronagraphs of the Sun-Earth Connection Coronal and Heliospheric Investigation (SECCHI) suite (Howard et al. 2008) are capable of observing the Sun from two different views.
Categorizing coronagraph observations into a CME event catalog provides a basis for further research on CMEs, offering several benefits, e.g., easier access to data and comprehensive analysis of statistical trends.Event catalogs can be divided into manual and automatic catalogs.A widely used manual catalog is the SOHO LASCO CME Catalog,4 maintained at the Coordinated Data Analysis Workshop (CDAW) data center of NASA (Yashiro 2004).The key parameters-e.g., appearance time, central position angle (CPA), angular width (AW), and linear velocity-as well as a short video of every CME recorded are listed (Gopalswamy et al. 2009) in the catalog.The determination of these key parameters requires researchers to manually examine each frame of the coronagraph observations and to capture the sections of the CME structures in order to obtain accurate measurements.This process of manual detection is labor intensive and time-consuming, especially when the solar maximum arrives and the frequency of CME occurrences increases.Meanwhile, the subjectivity of a manual catalog is also inevitable.
An automatic catalog is urgently needed as the amount of data grows.Several automatic methods for CME recognition and tracking have been developed.Robbrecht & Berghmans (2004) first proposed the Computer Aided CME Tracking (CACTus) 5 package.They used the Hough transform to convert LASCO C2 coronagraph observations to images in the [time, height] coordinate system, where the CME appears as an inclined bright ridge.Olmedo et al. (2008) developed the Solar Eruptive Event Detection System (SEEDS). 6The coronagraph observation is projected to one dimension along the angular axis to measure the total brightness of signals along one particular angular degree.Then they applied thresholds and the region growing method to track CMEs.The SEEDS catalog is provided online, using LASCO C2 and SECCHI COR2 observations from 1996 and 2007, respectively.Boursier et al. (2009) proposed the algorithm named Automatic Recognition of Transient Events and Marseilles Inventory from Synoptic Maps (ARTEMIS),7 which was later updated to a new generation by Floyd et al. (2013).It is based on adaptive filtering and segmentation after the morphological characteristics on a synoptic map are acquired.Morgan et al. (2012) and Byrne et al. (2012) introduced the CORonal Image Process method (CORIMP)8 method.It is dependent on a deconvolutional technique to separate the LASCO images into dynamic components that contains the CME signal and quiescent corona.The CME events as well as their basic information have been recorded on their websites since 2000.
The obtained parameters provide key information for further research on CMEs, such as predictions of CME arrival times at Earth based on empirical methods (Kim et al. 2007;Michalek et al. 2008) and machine-learning methods (Sudar et al. 2016;Liu et al. 2018;Wang et al. 2019a), reconstruction of the threedimensional morphology of CMEs (Zhao et al. 2002;Chané et al. 2005;Thernisien et al. 2006), and the initialization of magnetohydrodynamic (MHD) simulations (Shen et al. 2014;Liu et al. 2019).
Machine learning is a collection of algorithms dedicated to learning from data and making predictions and decisions without human intervention (Mitchell 1997).It has proven to be a powerful tool in various disciplines and applications.Recent years have witnessed a large increase in the number of publications applying machine-learning techniques to space weather (Camporeale 2019).Early attempts to apply machine learning to space weather began with geomagnetic storm predictions in the 1990s.Lundstedt & Wintoft (1994) trained a neural network using solar wind data as input to predict geomagnetic storms.Furthermore, Wintoft & Lundstedt (1997) built a hybrid intelligent system for predicting the daily mean solar wind speed using a coronal magnetic field model and a neural network.Inspired by their work, Yang et al. (2018) developed a model for predicting hourly solar wind speeds 4 days in advance.Machine-learning techniques have also been applied in other studies: the prediction of CME arrival times at Earth (Sudar et al. 2016;Liu et al. 2018;Wang et al. 2019a;Fu et al. 2021;Yang et al. 2023), predictions of solar flares (Bobra & Ilonidis 2016;Huang et al. 2018;Park et al. 2018;Li et al. 2020;Zheng et al. 2021), predictions of relativistic electrons at geosynchronous orbit (Shin et al. 2016;Bortnik et al. 2018;Wei et al. 2018;Zhang et al. 2020), predictions of geomagnetic activity (Chandorkar & Camporeale 2018;Gruet et al. 2018;Chakraborty & Morley 2020), and predictions of solar wind speeds (Upendran et al. 2020;Bailey et al. 2021;Sun et al. 2022).
Researchers have also managed to incorporate machine learning in the field of CME detection and tracking.Wang et al. (2019b) trained a convolutional neural network (CNN) model to classify LASCO C2 coronagraph images of a sequence into images with and without CME regions, followed by a deep descriptor transformation to localize the same object.They applied a graph cut algorithm to fine-tune the CME regions for later CME tracking.Alshehhi & Marpu (2021) used a pretrained VGG-16 network to extract features from the input LASCO C2 coronagraph and applied principal component analysis (PCA) to reduce the dimensions of the output from the previous step.With unsupervised clustering, they acquired a binary map with pixels marked as changed and unchanged for further CME tracking.Chen et al. (2022) proposed a method called residual U-net (RU-net) to capture multiscale information of interplanetary CMEs (ICMEs).They detected 178 of 230 ICMEs and achieved an F1 score of 80.18% on the test data collected by the Wind spacecraft from 2010 to 2016.However, the previous studies still need improvement.The LASCO C2 coronagraph data used by the research of Wang et al. (2019b) and Alshehhi & Marpu (2021) cover an FOV ranging from 2 to 6 R s , where the acceleration of CMEs may not be complete (Zhang et al. 2001;Shen et al. 2022).The research can be extended to a broader view, in order to grasp more information about CME motion in interplanetary space, and improved for better sensitivity and accuracy of the parameters obtained.In addition, previous studies that incorporate machine learning with CME detections are able to grasp the transient structure of CMEs, but their tracking algorithms tend to track CMEs in distributed PAs.In this way, reflected streamers, detected but disjointed CME parts, and small deviations from previous steps may cause the overestimation or underestimation of the CME region, leading to disturbed results for the kinematic parameters.Our work is aimed at straightforwardly tracking the whole CME region from the morphological point of view, so as to derive kinematic parameters that are close to the results from visual recognition.Encouraged by the impressive success of machine learning in tasks such as image classification and object recognition, we develop a new automatic algorithm to detect CME structures and derive kinematic parameters based on machine learning.
The main body of the paper is organized as follows.In Section 2, we describe the data set used in our work.In Section 3, we demonstrate the detailed information of the algorithms used in the preprocessing, recognition, tracking, and fitting.In Section 4, we present the results of our algorithm for several representative CME events and compare them with those of the manual catalog (the CDAW CME CATALOG) and classical automatic techniques (including CACTus, SEEDS, and CORIMP).In Section 5, we give a summary of our method and a discussion of our results.An outlook on the application of our method is also given.

Data
Since the launch of the SOHO spacecraft, the LASCO coronagraph on board has provided countless observations.The training data and labels in this work are prepared based on the CDAW CME CATALOG.
We manage to acquire the CME event records (including when a CME appears, ends, and its remarks) from the CDAW CME catalog websites 9 and download daily running-difference images of the LASCO C2 and C3 coronagraphs from 2013 to 2018.The time range covers half the period of solar cycle 24, which might provide representative and balanced CME and No-CME samples from the solar maximum to the solar minimum with relatively adequate complexity and variation.Moreover, the rest period of solar cycle 24 is reserved for validation.We choose the NASA/CDAW website as the data source because it provides us with access to CME remarks, classification labels, and image data.The provided data satisfy the need to train a CNN model for the classification of images and grasp CME features in further tracking.Besides, it might be more reliable to compare the algorithm developed on the CDAW-derived data products with the CDAW CME catalog.According to the record of each CME event, an iteration is performed over each image.If the exposure time of an image is between the appearing time and ending time of an entry from the records, the image is labeled with the corresponding remark for that entry.On the contrary, if the exposure time of the image does not match any of the records, the image is labeled as "No-CME."To suppress the noise in the original downloaded images, a Gaussian blur (Gonzalez & Woods 2018) with a 5 × 5 box is applied to each image.A Gaussian blur is a low-pass filter that uses a Gaussian function to calculate the weight of each element in the kernel, and the image is the sum of the pixels multiplied by the weight.All images are carefully checked and images with distortions or abnormalities will be removed from the data set to ascertain the quality of the images.The downloaded images of the C2 and C3 coronagraph are all 512 × 512 red-green-blue (RGB) image files with pixel values ranging over [0,255].The RGB channels of the images are identical, and they can be easily converted to gray-scale images with pixel values from 0 to 1.After the above procedures, we obtain 71,532 LASCO coronagraph images from 2013 to 2018, with 57,751 of these images being from the LASCO C2 coronagraph and 13,781 of these images being from the LASCO C3 coronagraph.The number of C3 images is less because the training of the model for C3 image classification is conducted by fine-tuning the model for C2.

Method
Our method consists of three steps.(i) First, we train a CNN to classify the LASCO C2 and C3 images into images that contain CME(s) and images that do not.After the images have been correctly classified, we extract convolutional feature maps from the last convolutional layer of the neural network and apply the PCA algorithm to the feature maps to obtain the information of the same object.Then, we use Otsu's method and morphological operations to get an exact CME pixel labeling.(ii) Scanning each frame of the image sequence, we use a track-match method to track the propagation of the CME leaving the Sun in the FOV of the coronagraph.(iii) We derive the kinematic parameters of the CME (including the velocity, CPA, and AW) from the track obtained in step (ii).
The following subsections describe the recognition, tracking, and determination processes in detail.

Classification
In this subsection, our goal is to classify the CME images and detect the CME region in each image.Image classification is the process of categorizing and labeling groups of images, where machine learning, especially a CNN, plays the most important role.The unique structure of a CNN is well suited for such high-level abstract tasks related to digital images and shows excellent performance.
The CNN model we use is a modification of the first standard CNN model, LeNet-5, which was built for document recognition by Lecun et al. (1998).The whole data set is split into training and test data in a ratio of 7:3.The 512 × 512 resolution images are rescaled to 224 × 224 resolution images as inputs to our CNN model.Figure 1 illustrates the architecture of the CNN model we used, which is a combination of different types of layers.The basic layers of the CNN are convolutional layers, pooling layers, and fully connected layers.
The convolutional layer can act as a feature extractor for grid data, such as images.The convolution kernel of a convolutional layer is a matrix with a given shape and learnable parameters.Each unit of the output is connected to only a small area of the input through the kernel.The output and a small area of the input of a convolutional layer are called the feature map and receptive field.The kernel slides from left to right and top to bottom on the input image, multiplying and summing the pixels of the input during the convolution.The process of convolution can be expressed as where y i,j is the output of the convolution at position i, j, w u,v denotes the value of the convolutional kernel at position u, v, and b denotes the bias.Let l be the size of the kernel.u, v are integers, satisfying ing the subscript of the kernel.The shape of the convolutional kernel is 5 × 5. f is the nonlinear activation function to improve the fitting of the network.The most commonly used activation function, named the rectified linear unit (RELU), is applied in our CNN, which allows for efficient computation.The RELU function can be expressed as The pooling layer, also known as the downsampling layer, is designed to reduce the number of parameters and increase the robustness, while preserving the most important information.Similar to the convolutional layer, the kernel of the pooling layer slides over its input and computes the maximum value of the corresponding pixels.The shape of the max-pooling kernel is 2 × 2.
The convolutional layer and the pooling layer work together as a feature extractor module, extracting and combining abstract features from the images.The fully connected layer collects all the abstract features from the previous layers and aggregates them into the output value of the prediction.We modify the fully connected layer of the original LeNet-5 by changing the number of nodes to two, for binary classification of the images.The output of each node represents a label: "CME" or "No-CME."Mitchell (1997) provides a definition of machine learning: given a task T and performance measure P, if the performance of a computer program on experiences E improves, then the program is "learning" from the experience.The performance measure P in our work is set to the cross-entropy loss that is used to measure the "distance" between the predicated labels and the ground truth, which is defined as Here, m is the number of input data, y i denotes the true target label, and y i ˆrepresents the output probability of the last layer.The loss of the output and true labels is computed, and the gradient of each parameter is calculated iteratively backward from the last layer by the chain rule.We choose stochastic gradient descent (SGD) with momentum and weight decay to update the parameters.The procedures for updating the parameters are listed below: where θ t , λ, μ, and α represent the parameters, weight decay, momentum, and learning rate, respectively.The addition of weight decay and momentum to the original SGD optimizer has a dramatic effect on the optimization (Sutskever et al. 2013).

Detection
After the classification step, we are able to further detect the CME region in each of the images marked as CME occurrences by the CNN.The convolutional layer acts as a universal feature extractor, and the output feature map contains the high-level semantic information extracted from the input.To utilize most information beneath the convolutional layers, we adopt the PCA (Pearson 1901) to leverage the correlation between the feature vectors to localize the CME region (Wei et al. 2019).
By feeding N images into the neural network, the output of the last convolutional layer forms a matrix with shape h × w × d, where h, w, and d stand for the height, width, and depth of the matrix, respectively.This matrix can be regarded as having h × w cells, with each cell containing a ddimensional vector, which is denoted as where n denotes the nth feature map in the sequence and x is the vector at position (i, j).The mean of all vectors is calculated by Then, the covariance matrix is computed: The covariance matrix contains the correlation information between vectors.We can get the eigenvector of the covariance matrix Cov(x) ξ 1 , ξ 2 ,K,ξ n and the corresponding eigenvalue λ 1 , λ 1 ,K,λ n , in a descending order.The first eigenvector with the largest eigenvalue indicates the direction to which the matrix can be projected while preserving most information.We take the eigenvector ξ 1 as the main projection direction and apply a linear transformation to each vector: All p from the same feature map formulate a matrix P: The matrix P contains both positive values and negative values.At a position (i, j), the sign of the values indicates a positive or negative correlation between all vectors, and the magnitude of the value indicates the extent of this correlation, which reflects the CME occurrence.To localize the CME, the matrix is resized to the same size of the input image by nearest interpolation.Then the colocalization map indicating the CME region is obtained, as shown in the middle panels of Figure 3.
The colocalization map gives an estimate of the tendency that the pixels belong to CME regions.The larger the absolute value is, the more likely that the pixel is part of a CME.However, the pixel values representing the tendency cannot be directly used for CME tracking and parameter determination.To convert the colocalization map ranging over [0, 255] to a binary mask, we utilize Otsu's method (Otsu 1979).This method is designed for binary clustering by minimizing the intraclass intensity variance or maximizing the interclass variance.
Assuming that all pixels in a gray-scale image are divided into two classes, the intraclass variance is defined as where p 1 and p 2 denote the probability of a pixel belonging to each of the classes, respectively.σ 1 and σ 2 are the variance of the two classes.The probability p 1 and p 2 is given by where k is a threshold for binary classes.The mean value of the image and these two classes are computed by From the above equation, we have So the interclass variance is Otsu's method finds the optimal k that maximizes Equation (17): By Otsu's method, the colocalization map is divided into the foreground and background, and each of the pixels is labeled as a CME or No-CME pixel.The right panel of Figure 3 shows an example of a binary-labeled colocalization map after Otsu's method.The method segments the colocalization map to indicate the CME and No-CME regions.
Morphological operations constitute an image-processing technique to remove small noise from and connect disjunct regions in the binary image, which is used for postprocessing or the enhancing of the output map.They are based on the morphological shapes of images, requiring a kernel or structuring element to slide on the input binary image.The output of a specific pixel is decided when comparing its adjacent pixels to the corresponding structuring element.More details can be found in the work by Shih & Mitchell (1989).Many previous works have also used such a technique (Qu et al. 2004;Olmedo et al. 2008;Alshehhi & Marpu 2021).We apply the morphological operations after Otsuʼs method.Two morphological operators, morphological opening and morphological closing, are used in our work.The opening of A by the structuring element S, denoted by A•S, and the closing of A by the structuring element S, denoted by A • S, are defined as where A ! S and A ⊕ S represent two basic morphological operators, morphological erosion and morphological dilation.

Tracking
To perform CME tracking, researchers used to apply various transformations to capture abstract CME features.However, we find another way to perform straightforward tracking based on morphological characteristics.Object tracking plays an important role in computer vision.The goal of object tracking is to detect objects (Bewley et al. 2016;Wojke et al. 2017)-e.g., pedestrians, vehicles, sports players, and even animals-then track their movements in space from different camera views in video (Luo et al. 2021).Inspired by object tracking, we use a tracking method called track-match.
Section 3.1 provides a CME-marked region in each frame of the observation sequence.An instinctive approach to tracking CMEs is to treat CME regions as independent regions in each frame, but to associate each region across different frames.Thus, we can track a CME from its expulsion at the corona into the heliosphere.The term "track" in our method denotes CME regions across frames that are related to each other.The trackmatch method will scan each frame in an image sequence, establish tracks and compare different CME regions based on the morphological characteristics, and finally filter the tracks.
Before tracking, all images are transformed from [x, y] Cartesian coordinates to [θ, R] polar coordinates, where θ is the polar angle measured from solar north counterclockwise, and R is the radial distance from the occulter.Then the images are rescaled from 512 × 512 to 360 × 360.
The following subsections describe the core procedures of our tracking method in detail.

Track Establishment
Our method maintains a set of CME tracks and updates the tracks while scanning the image sequence frame by frame.Each track holds a sequence of CME regions, recording their position and contour.If there is no existing track that matches the CME region in a new upcoming frame, a new track is created and the CME region is added to it.We assign a state to each track, missed or confirmed.For each track t, we count the number of frames since the last successful match.This number is incremented if the track cannot match any new CME region and is set to zero after a successful match.If the counter exceeds a predefined variable, max age A max , the track is considered missed from the FOV and marked "missed."

Data Association
The method tracks CME regions across frames and assigns a unique ID to related CME regions, representing that these regions are different observations of the same CME.But the problem is how to determine that two different CME regions are "related."This problem is solved by data association, which involves associating a newly detected CME region with an existing CME track.For each upcoming frame, all CME regions are extracted, and a similarity is measured between pairs of detections and tracks.At the nth frame, we assume that the state of k tracks is currently confirmed and m CME regions are detected (following the convention in computer vision, we refer to the detected CME regions as detections).The similarity measure between the ith track and jth detection is given by where t i is the ith track, d j is the jth CME detection from the nth frame, and f is the function to measure similarity.
The similarity function is the key component in this procedure.We assume that the CMEs change gradually in the coronagraph observations, due to the relatively short cadence of 12 minutes of the LASCO coronagraph in usual cases.Thus, the morphological characteristics of the CMEs also change gradually.The similarity between CME regions can be measured from three aspects: position, area, and shape, which can be expressed as where f p , f a , and f s represent the position similarity function, area similarity function, and shape similarity function, respectively.α, β, and γ represent the coefficient terms for position similarity, area similarity, and shape similarity, respectively.
Position similarity.The detected position of the same CME from different frames should be close.We measure the position similarity of CME regions by the distance between their geometric centers.The geometric center of a CME region is given by where x y , ¯¯are the x-coordinate and y-coordinate of the geometric center and p(x i , y i ) is the pixel value at point (x i , y i ).We use the following equation to measure the distance between two points: where x y , i i ¯¯represent the x-coordinate and y-coordinate of the geomagnetic center of two CME regions.
Area similarity.The area of the CME region inside the coronagraph FOV should be changing gradually.Assume s 1 , s 2 are the area of two CME regions and the area similarity is measured by Shape similarity.The image moment is a weighted average of the pixel values of the image, and it can be chosen for pattern recognition (Chaumette 2004).For a scalar twodimensional image, the central moment is calculated by x x y y p x y , , 25 where μ ij is the raw moment of i + j order and p(x, y) represents the pixel value of the image.Moments reflect some inherent properties of the image that allow their application in image processing.Moment invariants are functions of moments that are invariant to transformations of the image, i.e., scale rotation and reflection.Hu (1962)

] ( ) h h h h h h h h h h h h h
These moment invariants are used to measure the shape similarity between the CME regions.The shape similarity is defined as where hu i and hu i ¢ are the ith hu moment invariants of two CME regions.
Taking into account the position, area, and shape characteristics, the similarity between two CME regions is measured.The next task is to assign each detection to an appropriate track, which is solved optimally using the Hungarian algorithm.The Hungarian algorithm (Kuhn 1955) is a method for solving the assignment problem or the bipartite graph matching problem.In our work, given k tracks, m detections, and their pairwise similarity, the goal is to find the best track-detection matching with the lowest sum of similarity value.
The Hungarian algorithm is then applied to solve this assignment problem based on the similarity matrix, outputting a set of the best matches.For unmatched tracks, the variable age of the track is increased by 1.For unmatched detections, the new tracks are created with the detections added to them.We set a maximum similarity limitation t sim .If the similarity of any track-detection match exceeds the threshold, the track and detection of that pair will be marked as an unmatched event if that pair is successfully matched.
After all the images have been processed and the tracks are obtained, we adopt some rules to identify an authentic CME track, following Olmedo et al. (2008) and Wang et al. (2019a).First, the last CME region in the track must reach the fringe of the FOV of the LASCO C2 coronagraph.Second, the height of at least two regions must increase.Then the CME tracks are filtered out.The process of our track-match method is illustrated in Algorithm 1 and Figure 10 in Appendix A.

Determination
After the classification and detection of CMEs, the CME regions from different observation frames are tracked and we can determine the key parameters of the CMEs.The key parameters we focused on are the CPA, AW, and linear velocity.Figure 4 gives an example of how to determine the key parameters.The CME region mask is in the polar coordinate system and the coordinates of the bounding box from each region of the track can be obtained.The height of a CME is the distance measured from the bottom of the image to the top of the bounding box.The start and end PAs are the left and right coordinates of the bounding box.The average and difference of the start PA and end PA make the CPA and AW, respectively.For each CME region mask, the above three parameters are retrieved.A CME track consists of several CME region masks, producing arrays of height, AWs, and CPAs, as functions of observation time.To derive the linear velocity, we apply a linear fit to the height-time series.To derive representative CPAs and AWs, we utilize the affinity propagation clustering method (Frey & Dueck 2007;Pedregosa et al. 2011).The affinity propagation method does not require a predefined number of clustering groups, and the existing data points are denoted as the final clustering center instead of creating a new one.A brief introduction to the algorithm can be found in Appendix B. Then we can choose representative values as estimates of AW and CPA for the track.

Results
To evaluate the performance of our method on real observations, we select several representative events with different velocity and AW from 2010 to 2012 and analyze them in ascending order of AW.The AW of the selected CMEs ranges from 78°to 360°and their velocity falls into an area between 288 and 1205 km s −1 .We compare the results of our proposed method with other classical CME automatic tracking catalogs, namely CACTus, CORIMP, and SEEDS.
In the following subsections, we present the detection results of these four methods in figures and data tables.For CACTus, CORIMP, and SEEDS, the figures and data are adopted from their websites.

Event on 2012 February 14
The CME event on 2012 February 14 launched from the west of the coronagraph FOV. Figure 5 shows the detection maps of CACTus, CORIMP, SEEDS, and our proposed method, from top to bottom, respectively, on 2012 February 14.In the top three rows, the detected CME regions are shown with different colors and symbols.In the detection map of CACTus, the detected CME region is limited by the white straight line.In the detection map of CORIMP, the red points denote the track of the strongest outermost front and the yellow points denote the overall detected structure.In the detection map of SEEDS, the blue points indicate the position of the leading edge and the red points indicate an approximate outline to the leading edge that was created using a segmentation technique.In the last row, the blue color in the maps of our proposed method indicates the irrelevant background, while the warmer color indicates that the pixel is more likely to be part of a CME.In the northwest quadrant, there are brighter lines and spots presented by our detection results.These are traces of small and weak transient eruptions that can be detected only by our method, which shows its ability to detect minor and weaker signals.
Table 1 presents the key parameters of the CME event on 2012 February 14 from automatic methods and the manual catalog.The records from CDAW (see footnote 4), CACTus (see footnote 5), CORIMP (see footnote 8), and SEEDS (see footnote 6) are adopted from their websites.CDAW and CACTus regarded the first appearance time of the event as 02:00.The start time of this event detected by most automatic methods including ours is a bit later than the manual catalog, because the CME only emerged into the FOV with a very small scale close to the image noise.SEEDS, CORIMP, and our proposed method all detected the CME at 02:12.The values of the CPA are in a close range, from 275°to 286°.But the AWs span a larger range, perhaps due to the diffusing morphology of the event.The proposed method and SEEDS give the same AW of 46°.The AW outputs from CACTus and CORIMP are 36°a nd 29°, which are lower than the other methods.Concerning the velocity, our proposed method gives an approximate value of 304 km s −1 for this event, compared with the value of 288 km s −1 from the CDAW catalog.

Event on 2012 January 15
The CME event on 2012 January 15 is a partial halo CME.We select and display several frames from the detection results of CACTus, CORIMP, SEEDS, and the proposed method in Figure 6, from top to bottom, respectively.The main body of the CME is detected by all the methods, and the proposed method finds weaker or smaller CME characteristics besides the main body, while the other methods fail to detect them.We can see that the recognition of our proposed method successfully separates the CME region from the background.Table 2 lists the derived key parameters of the CME event on 2012 January 15.All of these four automatic techniques detect a CME appearance at 05:48, which is consistent with the result of the CDAW manual recognition.As for the CPA, the results of all the automatic techniques accurately fall in the range of 295°-301°.But for the AW, all the automatic methods exhibit a considerably lower value than the manual catalog.Regarding the extent of the velocity, all the automatic methods derive velocities that are less than the value recorded by CDAW of km s −1 .The velocity from CORIMP is the lowest, at 250 km s −1 , followed by 389 km s −1 from SEEDS and 418 km s −1 from CACTus.However, the measurement of the velocity from our method for this event is the almost the same as the manual catalog, with a deviation of 1 km s −1 .The reason why CACTus outputs the highest value among these three automatic catalogs may be due to its Hough transform process.This helps to capture weak signals at the leading edge, which leads to a relatively higher speed of CACTus in many cases.The SEEDS method calculates the velocity at the half-max-lead of the pixel intensity distribution, and thus it tends to output a smaller value compared with CACTus.The CORIMP method uses multiscale filtering and wavelet transform to produce a series of regions of interest and build up the CME detection mask.The velocity is derived from the height-time measurements for the detection masks.

Event on 2011 March 8
The examined CME event is from 2011 March 8.The detection results from CDAW, CACTus, CORIMP, and our proposed method are shown in Figure 7.The jets in the southeast of the coronagraph FOV are also detected by the method, marked in light blue.These kinds of structures can be automatically excluded from the later detection and tracking procedures of our method.Table 3 shows the derived parameters from CDAW, CACTus, CORIMP, SEEDS, and the proposed method.Among these methods, the proposed method and CORIMP are the earliest to detect the CME, at 04:00, while the other methods identify the CME 12 minutes later.CDAW outputs a CPA of 91°, while our proposed method yields mostly the same value of 91°.5, with a drift of just 0°. 5.The CPAs from CACTus and SEEDS are much smaller, at 70°and 72°.For the AW, the proposed method, CACTus, CORIMP, and SEEDS output 141°, 122°, 78°, and 114°, respectively.All the automatic methods underestimate the AW again when compared with the manual catalog.The velocity from all methods ranges from 663 to 842 km s −1 , closely around the 732 km s −1 recorded in the manual catalog.

Event on 2010 August 14
A halo CME involves material surrounding the occulting disk, indicating that the coronal material may be propagating approximately toward the observer (Howard et al. 1985), which can cause the most severe space weather effects.Figure 8 demonstrates the recognition results for a halo event on 2010 August 14.However, the automatic techniques fail to recognize the full structure of this halo CME.The reason may be that there are no obvious CME features on the solar east limb of the C2 observation.
Table 4 presents the key parameters of the CME event on 2010 August 14 from CDAW, CACTus, CORIMP, SEEDS, and our proposed method.CORIMP identifies the first appearance time of the CME event as 10:00, while the other methods identify it as 10:12.The output CPAs from all these automatic methods are very close, as the largest difference is 11°.However, none of the automatic methods successfully indicate that this event is a halo CME.After examining the observations from SECCHI COR2-A at the following moments, we find that this halo CME is not propagating directly toward the Earth.This may be the reason why the CME material emerged on the northwest side of the occulting disk later on, at 11:36.The operator of the CDAW CME catalog may have considered other information and marked this event as a halo.The velocities from CACTus, CORIMP, and SEEDs fall in a range from 559 to 657 km s −1 , while our proposed method outputs a closer value of 1001 km s −1 .
During the propagation of CMEs in the corona and interplanetary space, deflection significantly influences their geoeffectiveness (Shen et al. 2012;Zhuang et al. 2017).Wang et al. (2014) revealed that CMEs are deflected not only in the corona, but also in interplanetary space.With the track-match method, we are able to follow the CME propagation in the FOV of the coronagraph from its eruption.For some CMEs that exhibit significant deflection in latitudinal directions, it is reasonable that we can observe the CPA of the CME changing as the distance from the Sun increases.Figure 9 shows the CPA variation as a function of height, during the propagation of the CME event on 2010 August 14.We can clearly see that the initial CPA of the CME is at 254°and soon increases below 10 R s .Finally, the CME propagates radially without any visible deflection at a CPA of 275°.5.This event demonstrates another important effect of our method: with the CME region labeled and correlated across frames, the deflection can be tracked as well.

Conclusions
In this paper, we have proposed a new automatic tool to determine CME kinematic parameters based on machine learning.First, we trained a CNN to classify LASCO coronagraph observations into CME or No-CME labeled images.Then, we utilized a feature derived from the CNN to obtain pixel-labeled CME masks using Otsu's method and morphological operators.The second step is to track the CME region.We proposed the track-match method to associate CME regions across frames and formulate the track of a CME from its expulsion from the Sun.The key parameters, i.e., the CPA, AW, and velocity of the CME track, are finally determined.
We select several representative CME events with different AW and velocity, then compare the results of our method with classical methods, namely CACTus, CORIMP, and SEEDS.The proposed method shows excellent performance in the   4.
recognition of CME structures and the determination of kinematic parameters.Thanks to the track-match algorithm, based on morphological characteristics, we can track complete CME regions in the coronagraph FOV.Compared with other automatic techniques, the proposed algorithm can identify a CME as early as possible and derive parameters that are closer to the manual catalog in most cases.
Studying the discrepancies between the proposed machinelearning method, the classical automatic methods, and the manual catalog, we can see that the derived AWs of all the automatic methods are relatively smaller than that of the manual catalog.One possible reason for the relatively smaller AW may be the algorithm we adopt.The track-match algorithm tracks the whole region of the CME and filters the qualified CME track for further determination of the kinematic parameters.During the evaluation of our method, we find that CMEs with unclear boundaries are difficult to distinguish from the background, leading to the mistracking of the CME, especially for those CMEs that span large widths.Moreover, Balmaceda et al. (2018) found that the AW is more susceptible to projection effects than other parameters.Burkepile et al. (2004) demonstrated that the CME width is overestimated in the catalog.A relatively smaller AW could be close to the true value.
Considering the transient natures and complicated situations encountered in observations, CME detection and tracking is a tough task.In the future, more efforts will be made to improve our recognition and tracking methods.First, single-point observations have limited the relative research on CME properties.The CME parameters derived from the single-point view suffer from projection effects to some degree, requiring further work on multiview CME detections.As the CME region is acquired, our proposed algorithm is not only limited to the determination of parameters, but it also has shown its potential in the study of CME deflection.In addition, the detected CME region can be used for three-dimensional reconstruction to obtain more comprehensive three-dimensional parameters.The work of incorporating machine-learning detection in real-time MHD simulations is also planned.We hope that the great potential of this algorithm can be helpful for CME warnings and real-time space weather forecasting.

Figure 1 .
Figure 1.Architecture of the CNN model used for image classification.
When training our CNN model, we divide the whole training data set into many batches, with each batch having 256 samples.The training samples are fed into the model one batch after another, and the model parameters are iterated once with each batch of data.An epoch is defined as a training process that completed the model iterations with all the batches of data.After training on the data set, we can evaluate the performance on the C2 and C3 test data sets.The variation of the classification accuracy on the test data set versus the number of training epochs is plotted in Figure 2. Our model achieved accuracy of 98.81% and 99.57% on the C2 and C3 test data sets, respectively.The few false classifications may have been caused by the transient signatures in some images that are hard to categorize to either the CME or No-CME type.

Figure 2 .
Figure 2. The variation of the classification accuracy as a function of the number of epochs on the C2 and C3 test data sets, respectively.

Figure 3 .
Figure 3.The result of the colocalization map and Otsuʼs method.Panels (a) and (d) show the original LASCO C2 and C3 coronagraph image.Panels (b) and (e) show the colocalization map overlying the original images of LASCO C2 and LASCO C3.A warmer color in the colocalization map reflects a higher positive correlation and vice versa.Panels (c) and (f) show the binary tagged colocalization map after Otsu's method of the LASCO C2 and C3 images.The yellow pixels indicate the CME region.The example C2 and C3 images are taken from the observation of the CME event on 2011 March 29.

Figure 4 .
Figure 4.An illustration of the parameter determination.The red part is the CME region and the black part toward the bottom of the image is the solar disk after polar transform.The AW is the difference of the start and end PA.The height of the CME is measured from the bottom of the image to the top of the CME region.

Figure 5 .
Figure 5. Detection results for the CME event on 2012 February 14.

Figure 6 .
Figure 6.Detection results for the CME event on 2012 January 15.

Figure 7 .
Figure 7. Detection results for the CME event on 2011 March 8.

Figure 8 .
Figure 8.Detection results for the CME event on 2010 August 14.

Figure 9 .
Figure 9.The CPA variation at different distances from the Sun during the propagation of the CME event on 2010 August 14.The value of the red stars in the figure is 268°.5, which might be the reason why it is denoted as the representative value by the affinity propagation method in Table4.

Table 1
Key Parameters of the Event on 2012 February 14: Comparison between Proposed Method and Other Classical Methods All the parameters of the CDAW, CACTus, CORIMP, and SEEDS methods in Tables 1-4 are extracted from their websites.The column "Start Time" lists the first appearance time of a CME identified by the method in the coronagraph FOV; "CPA" is the central position angle in degrees, measured counterclockwise from the north of the Sun; and "AW" is the angular width of the CME, in degrees.The values of CPA and AW are the representative values derived from multiple observations at different times by the affinity propagation clustering method.The velocity in the fifth column represents the CME velocity measured in kilometers per second.The similarity of all track and detection pairs at the nth frame forms a similarity matrix or cost matrix:

Table 2
Key Parameters of the Event on 2012 January 15: Comparison between Proposed Method and Other Classical Methods

Table 3
Key Parameters of the Event on 2011 March 8: Comparison between Proposed Method and Other Classical Methods

Table 4
Key Parameters of the Event on 2010 August 14: Comparison between Proposed Method and Other Classical Methods Algorithm 1.The Track-match Method.Output: the track of the CME region k * . 1 Convert all the CME colocalization map to an image in the polar coordinate Apply the Hungarian algorithm on S and output a best match between the tracks and detections M, the unmatched tracks UT , and the unmatched detections UD. 8For each t i in UT , do: 9The variable age of t i increase by 1. Filter the set T to get the appropriate CME tracks k * .