Unsupervised Machine Learning for the Identification of Preflare Spectroscopic Signatures

Magnus M. Woods; Alberto Sainz Dalda; Bart De Pontieu

doi:10.3847/1538-4357/ac2667

1. Introduction

Solar flares have wide-ranging effects, both in the solar atmosphere and further afield in the solar system. Identifying the processes that drive flares, such as tether-cutting reconnection (Moore & Labonte 1980; Moore et al. 2001) or the torus instability (Kliem & Török 2006), is therefore vital to understanding these phenomena.

Observational studies of preflare signatures have examined many regions of the solar atmosphere, from coronal X-ray emission (Chifor et al. 2007) to magnetic flux emergence in the photosphere (Williams et al. 2005; Sterling et al. 2007). Works such as those by Bamba et al. (2013), Bamba & Kusano (2018), and Woods et al. (2017, 2018) have found that small-scale transient features, often seen as brightenings in multiple wavelengths, are associated with preflare signals and are possibly linked to the processes that trigger flares. These features have been observed minutes to tens of minutes prior to the start of the flare events.

The field of machine learning is providing a growing list of opportunities for application to various problems in solar physics. Such applications have included image classification (Armstrong & Fletcher 2019), postprocessing of ground-based observations with poor seeing (Armstrong & Fletcher 2021), and, perhaps most popularly, solar flare prediction (Bobra & Couvidat 2015, etc.). The majority of these works make use of imaging data; however, other authors have applied machine-learning techniques to spectroscopic and spectropolarimetric data, for example, Carroll & Staude (2001), Teng & Deng (2016), Panos et al. (2018), Sainz Dalda et al. (2019), Asensio Ramos & Díaz Baso (2019), and Panos & Kleint (2020), among others.

In this paper we present a study utilizing k-means clustering to identify spectroscopic signatures in the preflare environment using Interface Region Imaging Spectrograph (IRIS; De Pontieu et al. 2014) data, in an effort to identify the physical mechanisms that govern the triggering of flares and coronal mass ejections. This paper therefore does not aim to predict flares from a qualitative analysis of the clustering of flare spectral profiles, as previous works have already done (e.g., Panos et al. 2018; Panos & Kleint 2020). Instead, we focus on the nature of the spectral profiles that are associated with preflare triggering events and use inversions to shed light on the physical conditions of these events. We find that spectra showing single-peaked Mg ii h and k lines together with single-peaked emission in the Mg ii UV triplet lines, seen up to 40 minutes prior to flaring, are associated with preflare activity. We also find, through inversions of profiles of this type, that they show temperature and density enhancements in the chromosphere, indicative of strong heating events. In Section 2 we detail the k-means algorithm and its implementation in this work, and we describe the data used. Section 3 details the outcome of our k-means study, while in Section 4 we discuss the interpretation of these findings in more detail. Our conclusions are presented in Section 5.

2. Data and Methods

2.1. Data

This work primarily utilizes spectroscopic observations made by IRIS, a NASA Small Explorer Mission launched in 2013 June as a single-instrument mission designed to investigate the dynamics of the chromosphere and transition region with high spatial, temporal, and spectral resolution.

The IRIS spectrograph observes two wavelength windows, one in the far-ultraviolet (split into FUV 1, 1332–1358 Å, and FUV 2, 1389–1407 Å) and one in the near-ultraviolet (NUV; 2783–2835 Å). In the FUV, the IRIS spectrograph achieves a spectral resolution of 26 mÅ and a spatial resolution of 0.33'', while in the NUV window the spectral resolution is 53 mÅ and the spatial resolution is 0.4''. To achieve larger fields of view, IRIS, like many solar spectrographs, makes use of rastering. Each exposure produces a spectral image on the detector, with spectral information in one direction and spatial information along the slit (I_λ, Y) in the orthogonal direction. These individual exposures are then combined to make a 3D data cube (Y(t), X(t), I_λ(t)), with t indicating the frame taken at time t. A raster is, therefore, a set of spectral images taken at each step as the slit moves in the direction perpendicular to its long axis (X(t)), scanning the Sun with the slit (Y) in the X direction. If the slit remains at the same X position throughout the acquisition, the set of spectral images is referred to as a sit-and-stare data set. The downside of rastering is that increasing the field of view in the X direction reduces the temporal coherence of the image, because each slit position is observed at a different time. Utilizing rastering, the spectrograph can cover a maximum field of view of 130'' × 175''. IRIS also obtains slit-jaw images (SJIs) that provide context over a 175'' × 175'' field of view around the slit in four passbands: the C ii doublet centered at 1335 Å, Si iv 1394/1403 Å, Mg ii k 2796 Å, and Mg ii wing emission around 2830 Å.
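The raster geometry described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed array shapes, not the layout of IRIS level-2 files: the exposure sizes, the 9 s step, and the function name `build_raster_cube` are hypothetical. It shows why rastering trades temporal coherence for field of view: every X column of the cube carries its own time stamp.

```python
import numpy as np

def build_raster_cube(exposures, times):
    """Stack the slit exposures of one raster into an (X, Y, wavelength) cube.

    exposures: list with one 2D array per slit position X, each of shape
               (n_y, n_wave): position along the slit Y by wavelength.
    times:     acquisition time (s) of each exposure, one per slit position.
    """
    cube = np.stack(exposures, axis=0)       # shape (n_x, n_y, n_wave)
    t = np.asarray(times, dtype=float)       # each X column has its own time stamp
    if len(t) != cube.shape[0]:
        raise ValueError("need one time stamp per slit position")
    return cube, t

# Hypothetical 8-step raster: 100 pixels along the slit, 50 wavelength points,
# one exposure every 9 s (illustrative numbers, not a specific IRIS program).
rng = np.random.default_rng(0)
exposures = [rng.random((100, 50)) for _ in range(8)]
cube, t = build_raster_cube(exposures, [9.0 * step for step in range(8)])
print(cube.shape)    # (8, 100, 50)
print(t[-1] - t[0])  # 63.0
```

With more raster steps the spread between the first and last time stamps grows, which is the loss of temporal coherence the text refers to.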

In this work we also make use of the photospheric line-of-sight magnetograms created by the Helioseismic and Magnetic Imager (HMI; Scherrer et al. 2012), an instrument on board NASA's Solar Dynamics Observatory (SDO; Pesnell et al. 2012).

2.1.1. Data Selection

IRIS can utilize a number of different observing programs, resulting in a wide variety of very different data sets. For example, some data sets may be high-cadence sit-and-stare programs where there is a single slit position, sacrificing a larger field of view for higher temporal resolution, while some may opt for larger raster images, prioritizing field of view over temporal resolution.

This wide variety of data makes it necessary to set criteria that yield a more standardized data set for our analysis. To this end, only flare observations with up to eight raster steps were selected. This choice strikes a balance between temporal resolution and a field of view large enough to cover the active regions in the preflare data sets, both of which are key when studying the preflare environment.

For quiescent active region (AR) and quiet Sun (QS) data sets, we do not put limits on raster size, thus maximizing the variety of spectra the preflare spectra will be compared against during clustering.

In order to test whether unique preflare spectra could be identified using the technique outlined in Section 2.3, our data were selected from three types of IRIS observations: observations containing flares, observations of quiescent active regions, and observations of the quiet Sun. Tables 1 and 2 give information on the data chosen for this study. We ensured that for the quiescent active region data sets there were no flares within at least 1.5 hr of the times chosen. Eight flare data sets, four of X-class and four of M-class flares, were selected. These were compared against 32 AR and 6 QS data sets. Within the AR data sets, care was taken to select observations of active regions at different stages of their evolution, as well as ARs that showed clear sunspots, in order to maximize the types of spectra for comparison.

Table 1. Flare Data Sets Chosen to be Used in This Study

Data Set	Date	GOES Class	Flare Start Time (UT)	Central Coordinates (x, y, arcseconds)	Raster Size	Length of Preflare Observation (minutes)	Obs ID
Data set 0	2014-03-29	X1.0	17:35	(490, 280)	8-step	60+	3860258481
Data set 1	2014-09-10	X1.6	17:51	(−167, 123)	Sit-and-stare	60+	3860259453
Data set 2	2014-10-22	X1.6	14:02	(−292, −302)	8-step	60+	3860261381
Data set 3	2015-03-11	X2.1	16:11	(−353, −198)	4-step	50	3860107071
Data set 4	2014-02-13	M1.8	01:34	(110, −100)	8-step	60+	3860257280
Data set 5	2014-06-12	M1.0	21:01	(−669, −306)	8-step	60+	3863605329
Data set 6	2015-03-12	M1.6	11:38	(−235, −190)	Sit-and-stare	60+	3860107053
Data set 7	2014-06-11	M3.9	20:53	(−781, −306)	8-step	60+	3863605329


Table 2. Quiescent Active Region and Quiet Sun Data Sets Used in This Study

Date	Target	Central Coordinates (arcseconds)	Obs Duration (minutes)	Raster Size	Obs ID
2014-01-30	AR	(−653, −182)	18	64-step	3880010095
2014-05-12	QS	(138, −128)	119	8-step	3860261380
2014-06-27	QS	(−311, −291)	284	16-step	3820419489
2014-07-31	QS	(−182, −170)	139	sit-and-stare	3840009553
2014-08-15	AR	(−219, 121)	78	16-step	3820255483
2014-08-15	AR	(−286, 106)	78	16-step	3820255483
2014-10-14	AR	(220, −423)	18	64-step	3893260094
2014-11-01	AR	(731, 120)	18	64-step	3893010094
2014-11-13	AR	(532, 224)	18	64-step	3893010094
2014-11-15	AR	(−799, −276)	38	8-step	3800259374
2014-12-16	AR	(151, −97)	57	8-step	3863351377
2015-01-12	AR	(−306, −191)	58	4-step	3862106066
2015-05-05	AR	(−188, −196)	228	8-step	3860609380
2016-02-13	AR	(464, 307)	72	8-step	3660259533
2016-04-24	AR	(498, 181)	18	64-step	3893010094
2016-07-27	AR	(−662, −335)	87	16-step	3633100037
2016-07-30	QS	(−146, 87)	27	4-step	3610108020
2016-08-05	QS	(464, 298)	252	8-step	3630105426
2016-08-07	AR	(803, 163)	61	400-step	3600108078
2016-09-20	AR	(159, 1)	407	64-step	3630010059
2016-10-17	AR	(198, 26)	17	64-step	3640010059
2017-01-28	AR	(−79, 356)	57	32-step	3630256051
2017-03-23	AR	(−763, 205)	18	64-step	3893010094
2017-04-27	AR	(−124, −87)	18	64-step	3893010094
2017-05-17	AR	(−211, 158)	289	16-step	3620106639
2017-07-06	AR	(−252, −286)	91	32-step	3620256051
2017-09-04	AR	(−417, 62)	18	64-step	3893010094
2018-05-22	AR	(91, 132)	133	192-step	3600106072
2018-05-22	AR	(147, 128)	28	320-step	3600106077
2018-05-22	AR	(162, 131)	66	192-step	3600106072
2018-05-23	AR	(−683, 298)	55	320-step	3620106077
2018-05-23	AR	(377, 132)	66	192-step	3600106072
2018-05-25	AR	(645, 118)	86	320-step	3620110077
2018-05-26	AR	(862, 106)	167	320-step	3620112077
2019-05-14	AR	(337, 137)	118	64-step	3630504755
2019-09-05	AR	(462, 110)	277	64-step	3640010059
2020-01-22	QS	(−741, −325)	55	16-step	3620107443
2020-07-26	AR	(−214, −413)	108	32-step	3620104844


From these data sets, the chromospheric Mg ii k and h lines were chosen to be clustered. These two optically thick lines are located at 2796.34 Å (Mg ii k) and 2803.52 Å (Mg ii h). Between them lies an ultraviolet continuum, in which two of the Mg ii triplet lines are found at approximately 2798.8 Å; the third triplet line lies at approximately 2791.6 Å. In this paper we discuss only the two triplet lines around 2798.8 Å, as the third lies outside the wavelength band considered in our analysis.

2.2. k-means Clustering

Machine learning can be broadly divided into two disciplines: supervised and unsupervised. For categorization problems, of which our study is one, supervised machine-learning techniques use data in which the outcome is known (a ground truth) to train an algorithm to categorize unseen data. For data with no known categorization, and hence no means of producing a training data set, unsupervised machine-learning techniques can be used. One such technique, which we make use of in this paper, is k-means clustering (MacQueen 1967). k-means clustering takes a data set of N observations and splits it into K discrete clusters. Each of these K clusters has a centroid μ_j, defined as the mean of the data points assigned to that cluster, which is the point that minimizes the sum of squared Euclidean distances to them. To find the best fit of these centroids to the data, the algorithm aims to minimize a quantity called the inertia, or within-cluster sum of squares:

$$\sum_{i=1}^{N}\min_{\mu_j \in C}\left(\lVert x_i - \mu_j \rVert^{2}\right), \tag{1}$$

where C is the set of the K cluster centroids and x_i is an individual data point. The first step of the algorithm is to assign the initial locations of the centroids. The number of clusters, and therefore of resulting centroids, is defined by the user. Several initialization procedures are available to avoid relying on a single random selection of the initial centroids (as in the original implementation of k-means clustering). This is important because, with only one randomly chosen set of initial centroids, the algorithm may converge to a local, rather than the global, minimum of the inertia. Each data point is then assigned to its nearest centroid, and new centroids are computed as the mean of all the data points assigned to each of them. These two steps are repeated until the difference between the old and updated centroids is negligible. At this point the k-means process is finished, and the data have been clustered according to the final centroids.
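A minimal NumPy sketch of the procedure just described (random initial centroids, alternating assignment and update steps, and several restarts keeping the run with the lowest inertia of Equation (1)) is given below. It is a generic textbook implementation for illustration, not the code used in this work; the function names and the `n_init`, `max_iter`, and `tol` values are assumptions. Library implementations such as scikit-learn's `KMeans` add smarter seeding (k-means++ by default).

```python
import numpy as np

def inertia(X, centroids, labels):
    # Equation (1): sum of squared distances from each point to its assigned (nearest) centroid
    return float(np.sum((X - centroids[labels]) ** 2))

def assign(X, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for every point
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_init=10, max_iter=300, tol=1e-10, seed=0):
    """Lloyd's k-means, restarted n_init times from random data points;
    the run with the lowest inertia is kept to reduce the risk of a local minimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(max_iter):
            labels = assign(X, centroids)                 # assignment step
            new = np.array([X[labels == j].mean(axis=0)   # update step
                            if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            converged = np.sum((new - centroids) ** 2) < tol
            centroids = new
            if converged:
                break
        labels = assign(X, centroids)
        score = inertia(X, centroids, labels)
        if best is None or score < best[2]:
            best = (centroids, labels, score)
    return best

# Two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centroids, labels, score = kmeans(X, k=2)
```

Keeping the lowest-inertia restart is the simplest guard against the local minima mentioned above; it costs `n_init` times the work of a single run.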

2.3. k-means Pipeline

The aim of this work is to apply k-means clustering to the problem of identifying spectral profiles that occur prior to flaring. In the previous section we have described how the basic k-means algorithm works, and we shall now outline how we apply k-means in this work for our purposes.

A multistage approach to identifying possible unique preflare spectra is employed. We wish to identify not only whether unique clusters of preflare spectra exist within the chosen data sets but also, if they are present, how they evolve in time. To determine whether spectra are seen only during the preflare period, we also need to include data from AR and QS observations. For each flare data set, the start time of the flare was identified from the GOES flare catalog. The data between a given preflare time and the flare start time were then gathered and preprocessed. Preprocessing is an important step in all machine-learning techniques and can serve numerous purposes, chief among them standardizing the data sets used. The preprocessing carried out is as follows:

1. Each data set was normalized by exposure time to correct for differences between the observing programs used.
2. The wavelength range was truncated to be the same for every data set. For the spectral range including the Mg ii lines, the range considered is 2794.5–2805 Å.
3. The wavelength dimension of every spectrum was interpolated onto 200 data points.

The cumulative effect of this preprocessing is that it allows the direct comparison of all of the spectra, no matter their original data set.
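The three preprocessing steps can be sketched as follows. This is a hypothetical helper (`preprocess` and its argument layout are assumptions, not the authors' code) that normalizes counts to DN s⁻¹ and resamples every spectrum onto one 200-point grid spanning 2794.5–2805 Å; interpolating directly onto that grid performs the truncation and the resampling in one step. Note that `np.interp` holds the end values constant outside the input range, so the input spectra should cover the full window.

```python
import numpy as np

def preprocess(intensity, wavelength, exposure_time,
               wl_min=2794.5, wl_max=2805.0, n_points=200):
    """Hypothetical preprocessing of the spectra from one data set.

    intensity:     (n_spectra, n_wave) raw intensities in DN
    wavelength:    (n_wave,) increasing wavelengths in Angstrom
    exposure_time: exposure time in s (scalar, or one value per spectrum)
    """
    # Step 1: normalize by exposure time -> DN/s, comparable across programs
    exp_time = np.asarray(exposure_time, dtype=float).reshape(-1, 1)
    rate = np.asarray(intensity, dtype=float) / exp_time
    # Steps 2 and 3: a common 200-point grid over 2794.5-2805 A; interpolating
    # onto it both truncates the range and fixes the number of points
    grid = np.linspace(wl_min, wl_max, n_points)
    resampled = np.array([np.interp(grid, wavelength, spec) for spec in rate])
    return grid, resampled

# Two illustrative data sets with different sampling and exposure times
wl_a = np.linspace(2790.0, 2810.0, 800)
wl_b = np.linspace(2793.0, 2806.0, 500)
grid, a = preprocess(np.full((3, 800), 8.0), wl_a, exposure_time=8.0)
_, b = preprocess(np.full((5, 500), 4.0), wl_b, exposure_time=4.0)
print(a.shape, b.shape)  # (3, 200) (5, 200)
```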

Principal component analysis (PCA; Hotelling 1933) was then run on the preprocessed data set. PCA is a common technique in many machine-learning settings that reduces the dimensionality of a data set while retaining the key information of each individual data point. The dimensionality of a data set is the number of features, or variables, that it has; PCA essentially projects a data set with n features onto one with fewer than n features. This reduction in dimensionality has two important effects: it mitigates the problems that high-dimensional data cause when computing Euclidean distances, and it reduces the computational cost of running the k-means clustering.
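As an illustration of the projection PCA performs, the sketch below computes principal components via the singular value decomposition of the mean-centred data. The number of retained components is not stated in the text, so the value of 3 used here for synthetic data is purely an assumption; `sklearn.decomposition.PCA` provides the same projection through `fit_transform` and reports `explained_variance_ratio_`.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X (n_samples, n_features) onto its first n_components principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T              # (n_samples, n_components)
    explained = S**2 / np.sum(S**2)         # fraction of variance per axis
    return scores, components, mean, explained[:n_components]

# Synthetic "spectra": 500 samples of 200 wavelength points built from 3 shapes + noise
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 200))
weights = rng.normal(size=(500, 3))
X = weights @ basis + 0.01 * rng.normal(size=(500, 200))
scores, comps, mean, ratio = pca_fit_transform(X, n_components=3)
print(scores.shape)        # (500, 3)
print(ratio.sum() > 0.99)  # True
```

Here 200 features collapse to 3 while keeping almost all of the variance, which is the kind of reduction that makes Euclidean distances in the subsequent clustering cheaper and better behaved.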

After several tests, we also decided to use a feature scaler, in this case the MinMaxScaler included in the Python package scikit-learn (Pedregosa et al. 2011). This scaler linearly rescales the data in each feature to lie between a set minimum and maximum value. In the case of our data, a feature is the intensity at one wavelength point of the spectra. Using the MinMaxScaler is advantageous because, without it, variations in small-scale features, e.g., the Mg ii UV triplet lines and the nearby wavelength region, can be swamped by variations in the much brighter Mg ii k and h lines. Scaling therefore improves the accuracy of the clusters, as a more equal weighting between the features is achieved.
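The MinMaxScaler rescales every column independently as (x − min)/(max − min), mapped into a chosen range ((0, 1) by default). The NumPy sketch below reproduces that formula so the effect on spectra can be seen without scikit-learn; `scale_features_minmax` is a hypothetical name, and the two-column toy data (one bright line-core column and one faint column near the triplet) are invented for illustration.

```python
import numpy as np

def scale_features_minmax(X, feature_range=(0.0, 1.0)):
    """Rescale each column (one wavelength point per column) into feature_range,
    using the same formula as scikit-learn's MinMaxScaler. Constant columns are
    given a range of 1 to avoid division by zero, so they map to the lower bound."""
    lo, hi = feature_range
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / col_range * (hi - lo) + lo

# Column 0: bright Mg II k core; column 1: faint point near the triplet (toy values, DN/s)
X = np.array([[1000.0, 10.0],
              [5000.0, 20.0],
              [3000.0, 15.0]])
Xs = scale_features_minmax(X)
# Before scaling, distances between rows are dominated by column 0;
# afterwards both columns span [0, 1] and contribute equally.
print(np.linalg.norm(X[1] - X[0]), np.linalg.norm(Xs[1] - Xs[0]))
```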

The next step in the analysis is to determine the number of clusters to use. For this we used the elbow method to determine the minimum number of clusters appropriate for the data. This involves running the clustering for a range of cluster numbers and plotting the inertia against the number of clusters. The resulting plot should ideally show a steep decrease followed by a break, or elbow, before reaching a plateau. The number of clusters at which this break occurs is generally taken to be the minimum number of clusters that can describe the data.
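The text does not specify how the break in the elbow plot is located, and in practice it is often read off by eye. One common automated heuristic, shown here as an assumption rather than the procedure used in this work, normalizes both axes and picks the number of clusters whose point lies farthest from the straight line joining the first and last points of the inertia curve. The inertia values in the usage line are made-up numbers.

```python
import numpy as np

def elbow_k(ks, inertias):
    """Return the k whose (normalized) point lies farthest from the chord
    joining the first and last points of the inertia curve."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(inertias, dtype=float)
    # Normalize both axes to [0, 1] so the distance does not depend on units
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    # Distance from the chord, up to a constant factor (cross product)
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0]))
    return int(x[np.argmax(dist)])

print(elbow_k(range(1, 9), [100, 40, 15, 12, 10, 9, 8, 7.5]))  # 3
```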

We then move on to clustering for each time step under consideration. For every flare data set, the first time step was chosen to be 40 minutes prior to the flare start time, with subsequent time steps at 5-minute (300 s) intervals. This value was chosen because previous work (Woods et al. 2017; Panos & Kleint 2020) has identified preflare spectral signatures up to 40 minutes prior to flare onset. For each time step, the raster closest to that time is selected for clustering. For the nonflaring AR and QS data sets, all available data are clustered, rather than the single rasters of the preflare case, to maximize the number of spectra against which the preflare spectra are compared. The spectra in each data set are then clustered into the desired number of clusters, yielding a representative profile (RP), i.e., the centroid, for each cluster found by the k-means algorithm. This is done for every data set, so the total number of RPs is n_rp = n_clusters × n_{data sets}. We then group these RPs and run k-means clustering on all of them together. This second round of clustering allows us to identify groups of RPs with similar shapes across all data sets. As in the first round, we then determine the second-round representative profiles. A cartoon of this process is shown in Figure 1. This same two-step process is carried out for each time step up to the flare start time.
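The two-round scheme can be sketched as follows, assuming each data set has already been preprocessed into an array of spectra. The compact `lloyd` routine stands in for whichever k-means implementation is used; `two_round`, the toy spectra, and the small cluster numbers (rather than the 300 used in this work) are hypothetical. The sketch also records which data-set types contribute to each second-round cluster, which is the information needed for the cluster categories discussed in Section 3.

```python
import numpy as np

def lloyd(X, k, n_iter=100, seed=0):
    """Compact k-means (Lloyd's algorithm); returns centroids and labels."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C_new = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        done = np.allclose(C_new, C)
        C = C_new
        if done:
            break
    labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    return C, labels

def two_round(datasets, k1, k2):
    """datasets: list of (kind, spectra) pairs, kind in {"preflare", "AR", "QS"},
    spectra of shape (n_spectra, n_features) with n_spectra >= k1.

    Round 1 clusters each data set separately into k1 RPs; round 2 clusters all
    n_rp = k1 * len(datasets) RPs together into k2 second-round RPs.
    """
    rps, kinds = [], []
    for kind, spectra in datasets:
        centroids, _ = lloyd(spectra, k1)
        rps.append(centroids)
        kinds.extend([kind] * k1)
    rps = np.vstack(rps)
    rp2, labels2 = lloyd(rps, k2)
    # Which data-set types contributed RPs to each second-round cluster
    members = [{kinds[i] for i in np.flatnonzero(labels2 == j)} for j in range(k2)]
    return rps, rp2, labels2, members

# Synthetic example: only the "preflare" data set contains a peaked profile
rng = np.random.default_rng(0)
x = np.arange(200)
flat, peak = np.zeros(200), np.exp(-((x - 100) / 5.0) ** 2)

def noisy(n, shape):
    return shape + 0.01 * rng.normal(size=(n, 200))

datasets = [("preflare", np.vstack([noisy(20, flat), noisy(20, peak)])),
            ("AR", noisy(40, flat)),
            ("QS", noisy(40, flat))]
rps, rp2, labels2, members = two_round(datasets, k1=2, k2=2)
print(rps.shape)  # (6, 200): n_rp = k1 * n_datasets
```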

**Figure 1.** Cartoon overview of the k-means method for one time step that we employ in this work.

3. Results

As the k-means process is unsupervised, it is of vital importance to select optimum parameters so that the data are clustered as effectively as possible. We used elbow plots, as discussed in Section 2.3, to determine the minimum number of clusters needed to describe our Mg ii data. The break points in the elbow plots of both the first and second rounds indicate that 50 clusters would in theory describe the data; Figure 2 shows two example elbow plots. However, while 50 clusters are mathematically sufficient to describe the data, using this number would not yield physically meaningful results. In the first round of clustering, this is because, with a smaller number of clusters, many clusters are taken up by spectra affected by cosmic-ray hits. As these spectra are very different both from each other and from unaffected spectra, each is often assigned to its own cluster containing only that spectrum. This leaves fewer clusters for the algorithm to capture the spectra that are not affected by cosmic-ray hits. To the algorithm this is a success, as it has split the data into distinct clusters. However, it is not sufficient for our top-level goal, which is to identify the subtle signals that represent preflare triggering activity, because many distinct spectral shapes are aggregated into a small number of clusters. To combat this, we tested numerous values for the number of clusters and found that 300 clusters split the data into enough RPs to adequately describe each of the individual data sets in the first round.
For the second round, the elbow plots again indicate that 50 clusters would in theory describe the data. However, as in the first round, using this smaller number would cost us the ability to distinguish groups of spectra whose differences are slight but perhaps meaningful in the context of the preflare environment, because they would be merged into a small number of clusters. We therefore also used 300 clusters in the second round, so that no information would be lost through the agglomeration of groups of clusters. This does increase the need to inspect the clusters by eye for ones with a similar appearance; however, in both rounds, 300 RPs appears to be a good trade-off between representing the data and allowing a human to interpret and inspect the RPs.

**Figure 2.** Panel (a) shows an example of an elbow plot for the first round of clustering for one data set at one time step. In this case, the time step is 40 minutes (2400 s) prior to flare onset. Panel (b) shows an example of an elbow plot for the second round of clustering for the same time step.

The 300 second-round representative profiles at each time step are divided into seven possible categories based on the type of region or timing of the data they contain: unique preflare, unique AR, unique QS, preflare/AR, preflare/QS, AR/QS, and nonunique clusters. Table 3 gives the definition of each category, and Table 4 shows the distribution of representative profiles among them. From Table 4 we can see that this distribution remains relatively constant through the time steps that we examine. Perhaps unsurprisingly, the dominant category is the nonunique cluster, which contains 64% of the second-round representative profiles averaged over all time steps. On average, 2% of the clusters are found to occur only in preflare data sets, and these preflare clusters make up an average of 1.6% of the pixels in a given raster at a given time step. We now discuss these preflare clusters in more detail.

Table 3. Definition of Each Cluster Type by the Types of Data Set It Contains

		Type of Data Set
Type of Cluster	Preflare	Active Region	Quiet Sun
Unique Preflare Cluster	Yes	No	No
Unique AR Cluster	No	Yes	No
Unique QS Cluster	No	No	Yes
Preflare/AR Cluster	Yes	Yes	No
Preflare/QS Cluster	Yes	No	Yes
AR/QS Cluster	No	Yes	Yes
Nonunique Cluster	Yes	Yes	Yes

Note. For example, a unique preflare cluster contains only spectra from preflare data sets, while a nonunique cluster contains spectra from all three types of data set. "Yes" indicates that spectra from that type of data set are contained in the given cluster type; "No" indicates that they are not.
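Because each category in Table 3 is determined solely by which types of data set contribute spectra to a cluster, the classification reduces to a lookup on that set. The function below is a hypothetical sketch of that mapping, with invented example inputs.

```python
from collections import Counter

# Table 3 as a lookup: set of contributing data-set types -> cluster category
CATEGORIES = {
    frozenset({"preflare"}): "unique preflare",
    frozenset({"AR"}): "unique AR",
    frozenset({"QS"}): "unique QS",
    frozenset({"preflare", "AR"}): "preflare/AR",
    frozenset({"preflare", "QS"}): "preflare/QS",
    frozenset({"AR", "QS"}): "AR/QS",
    frozenset({"preflare", "AR", "QS"}): "nonunique",
}

def cluster_category(kinds):
    """Return the Table 3 category for a cluster whose spectra come from `kinds`."""
    return CATEGORIES[frozenset(kinds)]

# Hypothetical second-round clusters, each described by its contributing data-set types
members = [{"preflare"}, {"preflare", "AR", "QS"}, {"AR", "QS"}, {"QS", "AR", "preflare"}]
counts = Counter(cluster_category(m) for m in members)
print(counts["nonunique"], counts["unique preflare"])  # 2 1
```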


Table 4. Distribution of Second-round Representative Clusters for Each Time Step from 2400 s Prior to Flaring to Flare Onset Time

Cluster Type	−2400 s	−2100 s	−1800 s	−1500 s	−1200 s	−900 s	−600 s	−300 s	0 s	Average
Unique Preflare Cluster	5	9	5	6	7	6	10	10	14	8
Preflare/AR Cluster	18	21	16	18	15	20	19	18	17	18
Preflare/QS Cluster	17	19	19	22	23	20	20	24	17	20
Unique AR Cluster	1	0	2	5	1	1	1	1	0	1
Unique QS Cluster	27	27	33	30	26	23	18	22	29	26
Nonunique Cluster	193	187	187	185	194	197	191	193	188	191
AR/QS Cluster	39	37	38	34	34	33	41	32	35	36

Note. Values are the number of second-round RPs in each category at each time before flare onset.


3.1. Preflare Clusters

The results of our Mg ii clustering, discussed in the preceding section, revealed the presence of preflare clusters, which are identified in all nine time steps studied in this work. In Figure 3 we show examples of the eight broad categories of preflare clusters identified: single-peaked Mg ii k and h lines with single-peaked emission in the Mg ii UV triplet lines; double-peaked k and h lines with emission in the Mg ii triplet lines; single-peaked Mg ii k and h lines; double-peaked Mg ii k and h lines; broad-shouldered Mg ii k and h lines; broad Mg ii k and h lines; cosmic-ray hits; and irregular profiles with broad wings. From Table 5 we see that the majority of these spectra exhibit single-peaked Mg ii k and h lines with single-peaked emission in the Mg ii triplet lines. Representative profiles of this type make up a median of 66% of the preflare clusters at each time step; when we consider the number of individual spectra that compose these representative profiles, the median percentage rises to 76%. These profiles are therefore the most abundant and consistent of the preflare clusters that we have identified, and it is important to understand how they are distributed among the preflare data sets. To this end, Table 6 shows the number of spectra of this type in each preflare data set at each time step. The distribution of all types of preflare clusters across the data sets can be found in the additional tables in Appendix A, both in terms of the number of spectra and of the proportion of the raster field of view. Table 6 shows that these spectra are not spread evenly across the flares studied.
We find that these spectra are less likely to be found in X-class flare data sets (data sets 0–3): only one X-class flare (data set 3), but all of the M-class flares (data sets 4–7), consistently show these spectra.

**Figure 3.** Examples of the eight categories of preflare representative profile. As before, the representative profile is shown in orange, with black corresponding to the individual profiles that contribute to it. The intensity scale of these spectra is presented in data number per second (DN s⁻¹), which is the number of counts received by the detector per second.

Table 5. Distribution of Preflare Cluster Type through the Nine Time Steps Studied

Profile Type	−2400 s	−2100 s	−1800 s	−1500 s	−1200 s	−900 s	−600 s	−300 s	0 s
Single-peak k and h + triplet	5 (285)	5 (298)	7 (382)	4 (230)	4 (191)	5 (455)	6 (359)	5 (391)	10 (808)
Double-peak k and h + triplet	2 (44)	1 (19)	0 (0)	1 (94)	2 (397)	1 (20)	1 (30)	1 (15)	1 (29)
Single-peak k and h	0 (0)	1 (52)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
Double-peak k and h	1 (76)	2 (24)	0 (0)	0 (0)	0 (0)	1 (2)	0 (0)	1 (2)	0 (0)
Broad-shouldered k and h	0 (0)	0 (0)	0 (0)	1 (21)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)
Broad k and h	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	2 (144)	0 (0)	1 (13)
Cosmic ray	1 (2)	0 (0)	0 (0)	0 (0)	1 (2)	0 (0)	1 (2)	2 (8)	1 (1)
Irregular with broad wings	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	0 (0)	1 (22)	1 (45)

Note. Unbracketed values show the number of representative profiles for this type, while bracketed numbers show the corresponding number of spectra that compose the representative profiles.


Table 6. Distribution (among the Preflare Data Sets) of Single-peaked Mg ii k and h Lines, with Accompanying Single-peaked Emission in the Mg ii Triplet Lines

Data Set	−2400 s	−2100 s	−1800 s	−1500 s	−1200 s	−900 s	−600 s	−300 s	0 s
Data set 0	5 (1.7%)	0	0	0	0	0	0	0	0
Data set 1	0	0	0	0	0	0	0	0	0
Data set 2	0	0	0	0	0	0	0	4 (1%)	10 (1%)
Data set 3	93 (32.6%)	86 (29%)	108 (28%)	80 (35%)	102 (53%)	149 (33%)	131 (36%)	95 (24%)	246 (30%)
Data set 4	140 (49%)	176 (59%)	227 (59%)	122 (53%)	52 (27%)	254 (56%)	161 (45%)	214 (55%)	418 (52%)
Data set 5	16 (5.6%)	6 (2%)	19 (5%)	2 (1%)	0	17 (4%)	17 (5%)	15 (4%)	62 (8%)
Data set 6	28 (9.8%)	24 (8%)	6 (2%)	20 (9%)	30 (16%)	30 (7%)	33 (9%)	25 (6%)	65 (8%)
Data set 7	3 (1%)	6 (2%)	22 (6%)	6 (2%)	7 (4%)	5 (1%)	17 (5%)	38 (10%)	7 (1%)

Note. The unbracketed numbers give the number of spectra from each data set that belong to preflare clusters showing single-peaked Mg ii k and h lines accompanied by single-peaked emission in the Mg ii triplet lines. The bracketed percentages give each data set's share of all such spectra at that time step.
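As a worked check of how the bracketed percentages in Table 6 are defined, the counts in the −2400 s column can be normalized by their sum, which equals the bracketed value given for that time step in the first row of Table 5. The table quotes some of these shares to the nearest whole percent.

```python
# Table 6, -2400 s column: spectra per preflare data set (data sets 0-7)
counts = [5, 0, 0, 93, 140, 16, 28, 3]
total = sum(counts)                              # bracketed value in Table 5 for -2400 s
shares = [100.0 * c / total for c in counts]     # each data set's share, in percent
print(total)                                     # 285
print(round(shares[3], 1), round(shares[4], 1))  # 32.6 49.1
```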


Figure 4 shows the locations of preflare clusters for data set 3, the X2.1 flare that occurred on 2015 March 11. This event is chosen as a representative example of the preflare cluster locations seen across the flare data sets. Preflare cluster locations are shown overlaid on the SJIs taken during the raster, together with the locations of the flare ribbons at the flare peak time (black contours). We can clearly see that the majority of the preflare clusters are located either in regions where the flare ribbons are later seen during the flare or at transient bright points within the centers of the ARs under study. Figure 5 shows the locations of the preflare clusters overlaid on the corresponding SDO HMI line-of-sight magnetograms. These show that most of the preflare clusters broadly align with the boundaries between positive and negative magnetic polarity (the white and black areas in the images, respectively). This behavior is seen in all of the flare events studied.

**Figure 5.** In this figure, we see the locations of the preflare clusters found prior to the 2015 March 11 X2.1 flare at each of the nine time steps clustered. For each time step, the corresponding SDO HMI image is shown, with the location of the raster slit positions shown as the dotted lines. The locations of the spectra in each individual cluster are shown in a unique color, which is detailed by the neighboring color bar for each time step. The locations of the flare ribbons, determined at the peak time of the flare, are shown overlaid as green contours. Also overlaid as contours are the locations of the penumbra (purple) and umbra (gray).

4. Discussion

The results of our k-means clustering of spectroscopic IRIS observations have revealed the presence of spectra that appear many minutes prior to flaring and that may constitute a precursor signal for flares. We now discuss in further detail the most commonly and consistently seen of these spectra: those with single-peaked emission in the Mg ii k and h lines along with single-peaked emission in the Mg ii triplet lines.

Mg ii h and k profiles commonly have a double-peaked structure, with a clear central reversal. These lines do, however, exhibit single-peaked forms under certain conditions, notably in the umbrae of sunspots, in some areas of plage, and under flare conditions. Several studies have investigated the thermodynamic conditions that could form this kind of profile. Carlsson et al. (2015) modeled single-peaked Mg ii emission in plage and found that a hot, dense chromosphere, with temperatures around 6500 K, could lead to single-peaked profiles. Rubio da Costa & Kleint (2017) investigated the formation of these single-peaked profiles under flaring conditions and were able to reproduce these profiles as well as their ratio to their satellite lines. They achieved this through increased temperatures, densities, or velocities in the upper chromosphere, the formation height of the line center. Recent 1D radiative hydrodynamic (RHD) modeling of the Mg ii k and h and UV triplet lines in flare ribbons by Zhu et al. (2019) also found that increased electron density in the upper chromosphere could produce single-peaked Mg ii k and h profiles.

As shown in our results, these single-peaked profiles are consistently observed in the preflare data sets, are associated with the locations of bright points, and often lie where positive and negative magnetic polarities meet. These results suggest the presence of heating events, and the proximity to neutral lines between positive and negative polarities suggests that magnetic reconnection may be involved in causing them.
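Proximity to a neutral line can be quantified in several ways, and the procedure used for this association is not described here. As a hypothetical illustration, the sketch below flags pixels of a line-of-sight magnetogram (such as an SDO HMI image) that lie within a few pixels of both strong positive and strong negative flux. It dilates each strong-field mask and takes their intersection, an approach common in the flare-prediction literature. The 100 G threshold, the dilation radius, and the function names are placeholder assumptions.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element (no wrap-around)."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, radius, mode="constant", constant_values=False)
    out = np.zeros_like(m)
    ny, nx = m.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # shifted copy of the mask; OR-ing all shifts grows each True pixel by `radius`
            out |= padded[dy:dy + ny, dx:dx + nx]
    return out


def near_polarity_inversion(bz, threshold=100.0, radius=3):
    """Pixels within `radius` pixels of both strong positive and strong negative flux.

    bz        : 2D line-of-sight field in gauss
    threshold : field strength defining "strong" flux (hypothetical value)
    """
    bz = np.asarray(bz, dtype=float)
    return dilate(bz > threshold, radius) & dilate(bz < -threshold, radius)


# Toy magnetogram: positive polarity on the left, negative on the right.
bz = np.zeros((10, 10))
bz[:, :5] = 500.0
bz[:, 5:] = -500.0
mask = near_polarity_inversion(bz, threshold=100.0, radius=1)
```

The locations of clustered spectra could then be tested against such a mask, for example by counting how many fall inside it.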

It is important to note that our analysis did not find these profiles (single-peaked Mg ii k and h lines with emission in the Mg ii triplet) exclusively in the preflare data sets. Of the RPs categorized as preflare/active region (pf/AR), i.e., found in both preflare and active region data sets, 16% are identified as having single-peaked Mg ii k and h lines along with emission in the Mg ii triplet. On close analysis of these RPs and their constituent spectra, some result from single-peaked umbral profiles in the AR data sets being mistakenly grouped with single-peaked triplet profiles from the preflare data sets. However, 60% of the remaining RPs do indeed contain spectra from AR data sets that exhibit single-peaked Mg ii k and h lines along with emission in the Mg ii triplet line. These pf/AR spectra come from only two of the AR data sets. Thus, while spectra with single-peaked Mg ii k and h lines along with emission in the Mg ii triplet are not unique to the preflare environment, we find that they are overwhelmingly more likely to occur in the preflare data sets. These pf/AR clusters additionally provide further evidence that profiles of this form are related to heating events. Figure 6 shows an example of these spectra along with their locations within the active region. As can be seen in Figure 6, and in the other AR data set where these spectra are observed, the spectra are clearly related to small-scale brightenings. These brightenings are transient and are most likely related to explosive events, possibly jets. Whether the exact mechanism that forms these spectra in the explosive events/jets is the same as the mechanism that produces them in the preflare case cannot be determined from this work. However, in both cases these profiles are clearly associated with some form of energy release and heating. Panos et al. (2018) investigated typical spectra during flaring using k-means clustering and found these single-peaked Mg ii k and h lines, along with emission in the Mg ii triplet, both in flares and in transient small-scale explosive events, in agreement with our findings above.

**Figure 6.** The left-hand panel of this figure shows the locations of spectra identified as preflare/active region that exhibit single-peaked Mg ii k and h lines along with emission in the Mg ii triplet. They are marked in blue and overlaid onto the corresponding IRIS SJI image. The right-hand panel shows the spectra at the marked location in black, with the representative profile of the cluster that they belong to shown in orange. The intensity scale of these spectra is presented in data number per second (DN s⁻¹), which is the number of counts received by the detector per second.

Studies of explosive events and Ellerman bombs (Peter et al. 2014; Vissers et al. 2015) have also observed the Mg ii triplet line in emission. However, in the majority of these cases, the triplet emission was double-peaked. This contrasts with the triplet emission seen in this work (and in Panos et al. 2018), which is single-peaked. The formation of the Mg ii triplet line is discussed in detail by Pereira et al. (2015), who examine the possible drivers of differences in the appearance of the triplet line and find that the triplet emission is single-peaked when there is a dominant temperature increase, often in higher layers of the chromosphere. Zhu et al. (2019) found from their simulations that the triplet emission forms because of increased electron densities in the upper chromosphere, i.e., in the same region as the Mg ii h and k lines. However, because the Pereira et al. (2015) and Zhu et al. (2019) results come from flare simulations, both interpretations must be applied to the preflare atmosphere with caution, since line formation in preflare atmospheres may differ substantially from that in a flaring atmosphere.

We also investigated the relationship between the locations of the preflare clusters and the ratio of two spectral lines, O i 1355.6 Å and C i 1354.3 Å, that form in the same region of the atmosphere as the Mg ii triplet. These results were not conclusive but are detailed in Appendix B for interested readers.

4.1. Inversions

While the observational evidence suggests that these profiles could result from heating, we investigated further to better determine the thermodynamic conditions in the atmosphere that give rise to the spectra observed at these locations.

As a first effort to investigate the atmosphere that produced these spectra, we turned to IRIS² (Sainz Dalda et al. 2019). IRIS² (Inversion based on Representative profiles Inverted by STiC) is a code that aims to provide fast recovery of model atmospheres for input spectra. The returned thermodynamic parameters of the model atmosphere are temperature, the logarithm of electron density, line-of-sight velocity, and microturbulence, all as a function of optical depth. It does this using a database of synthetic RPs and their associated representative model atmospheres, obtained from the inversion of observed IRIS Mg ii h and k RPs with the STiC inversion code (de la Cruz Rodríguez et al. 2019). We used IRIS² to investigate whether we could recover the model atmospheres for the spectra that exhibit single-peaked Mg ii k and h lines along with single-peaked emission in the Mg ii triplet. Although IRIS² is able to identify some of these profiles, the error bars on the returned thermodynamic parameters of the model atmosphere were large. We therefore performed a dedicated inversion of this type of spectra to determine the atmospheric properties more accurately.
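The database look-up at the core of this kind of approach can be illustrated with a short sketch. It assumes a hypothetical database of representative profiles sampled on the same wavelength grid as the observation, each paired with a model atmosphere, and matches the observation to the closest profile using a weighted Euclidean distance. The function names, the distance measure, and the data layout are illustrative assumptions, not the actual IRIS² interface, and uncertainty estimation is not modeled here.

```python
import numpy as np


def nearest_rp(observed, rp_profiles, weights=None):
    """Index of, and weighted Euclidean distance to, the closest representative profile.

    observed    : (n_wav,) observed spectrum
    rp_profiles : (n_rp, n_wav) hypothetical database of representative profiles
    weights     : optional (n_wav,) per-wavelength weights (e.g. to emphasise line cores)
    """
    obs = np.asarray(observed, dtype=float)
    db = np.asarray(rp_profiles, dtype=float)
    w = np.ones(db.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    d2 = (((db - obs) ** 2) * w).sum(axis=1)   # squared distance to every RP
    best = int(np.argmin(d2))
    return best, float(np.sqrt(d2[best]))


def recover_model(observed, rp_profiles, rp_models, weights=None):
    """Return the model atmosphere paired with the best-matching RP, plus the misfit."""
    best, dist = nearest_rp(observed, rp_profiles, weights)
    return rp_models[best], dist


# Toy database: 3 RPs sampled at 3 wavelengths; models have 2 parameters x 4 depths.
rp_profiles = np.array([[1.0, 2.0, 1.0],
                        [1.0, 4.0, 1.0],
                        [2.0, 2.0, 2.0]])
rp_models = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
model, misfit = recover_model([1.1, 3.8, 1.0], rp_profiles, rp_models)
print(nearest_rp([1.1, 3.8, 1.0], rp_profiles)[0])  # 1
```

A large misfit to even the best-matching RP is one simple signal that a profile is poorly represented in the database and may warrant a dedicated inversion.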

We produced new inversions for the 2015 March 11 data set, from the time step 2400 s prior to flare occurrence. From Table 5 we see that at this time step there are five clusters identified that show the single-peaked form we are interested in investigating. From Table 6 we can see that four of these are present in the 2015 March 11 data set, with a total of 93 individual spectra. In addition to these spectra, a selection of non-preflare spectra was chosen to be inverted. These spectra were of the following types: far from the flare site in the quietest region of the raster; the penumbra of the active region's sunspot; the umbra of the sunspot (both single- and double-peaked profiles); and finally, nonflaring spectra close to the flare site. These spectra were then inverted considering both the Mg ii h and k lines (including two of the lines belonging to the Mg ii UV triplet, located at wavelengths of 2797.9 and 2798.0 Å) and the C ii 1334 and 1335 Å lines. We inverted the Mg ii and C ii lines simultaneously because they are sensitive to variations in the thermodynamics in roughly the same region of the solar atmosphere (see Figure 17 of Rathore et al. 2015). By inverting all of these lines simultaneously, we aim to untangle the contributions of temperature and microturbulent motions to the width of the observed lines (da Silva Santos et al. 2018, 2020). Therefore, the values of T and v_turb in the mid-chromosphere obtained in this paper are likely more realistic than those of previous works that consider only one line to derive these quantities. We inverted these profiles with the STiC code, using the FALC model (Fontenla et al. 1993) as the initial guess model for the first cycle of the inversions; v_LOS and v_turb were introduced ad hoc, as these variables are not included in the available FALC model. For more technical details about the inversions, see A. Sainz Dalda (2021, in preparation).
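Why combining lines of different ions helps can be seen from the simplest, optically thin picture. In that picture, the Doppler velocity width of a line combines thermal and nonthermal broadening, $\Delta v_D^2 = 2k_BT/m + v_\mathrm{turb}^2$, where $m$ is the mass of the emitting ion. Carbon is roughly half as massive as magnesium, so the thermal term differs between the two ions while $v_\mathrm{turb}$ does not. Two widths measured in the same layer therefore determine both unknowns. The sketch below solves this pair of equations. It is only an illustration under strong assumptions: both lines form in a single layer with the same $T$ and $v_\mathrm{turb}$, and their widths are purely Doppler. The optically thick Mg ii and C ii lines are also shaped by radiative transfer effects, which is why a full non-LTE inversion such as STiC is needed in practice.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant (J/K)
AMU = 1.66053906660e-27     # atomic mass unit (kg)
M_MG = 24.305 * AMU         # mean atomic mass of magnesium
M_C = 12.011 * AMU          # mean atomic mass of carbon


def doppler_widths(temperature, v_turb):
    """1/e Doppler velocity widths (m/s) of a Mg and a C line at the same T (K) and v_turb (m/s)."""
    w_mg = math.sqrt(2 * K_B * temperature / M_MG + v_turb ** 2)
    w_c = math.sqrt(2 * K_B * temperature / M_C + v_turb ** 2)
    return w_mg, w_c


def temperature_and_vturb(w_mg, w_c):
    """Invert the two width equations for T and v_turb (idealised, optically thin case)."""
    # Subtracting the two equations cancels v_turb**2 and leaves only the thermal term.
    temperature = (w_c ** 2 - w_mg ** 2) / (2 * K_B * (1 / M_C - 1 / M_MG))
    v_turb_sq = w_mg ** 2 - 2 * K_B * temperature / M_MG
    return temperature, math.sqrt(max(v_turb_sq, 0.0))


# Round trip: widths for T = 8000 K and v_turb = 5 km/s, then recover both.
T, v = temperature_and_vturb(*doppler_widths(8000.0, 5000.0))
print(round(T), round(v))  # 8000 5000
```

With the Mg width alone, any pair on the curve $2k_BT/m_\mathrm{Mg} + v_\mathrm{turb}^2 = \Delta v_D^2$ would fit equally well; the second ion breaks that degeneracy.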

Figures 7 and 8 show a selection of results from the inversions. In each panel of these figures, the upper plot shows the observed C ii 1334 and 1335 Å spectra (purple) with the best-fit synthetic profile from the inversion (black). The middle plot shows the observed Mg ii spectra (purple), the model fit (black), and the profile synthesized from the FALC model (dashed line). The two plots below show the parameters of the model atmospheres inverted from the observed profiles (solid lines) and those of the FALC model (dashed line), which, adapted with v_turb and v_LOS, was used as the initial guess model in the first cycle of the inversion. The temperature (T, orange) and electron density ( $\mathrm{log}({n}_{e})$ , blue) are shown on the left, while microturbulence (v_turb, purple) and line-of-sight velocity (v_LOS, green) are shown on the right. The lines are well fit. Figure 7, panel (a), shows an example inversion of one of the preflare spectra, and panel (b) shows a single-peaked umbral spectrum. Figure 8, panel (a), shows a non-preflare spectrum, and panel (b) shows a penumbral spectrum. The model atmospheres of the preflare spectra differ clearly from the others. In the preflare model atmospheres, the electron density shows a single rise moving from larger to smaller $\mathrm{log}(\tau )$ , i.e., moving higher in the atmosphere. This is followed by a plateau, enhanced relative to the FALC model, between $-5\leqslant \mathrm{log}(\tau )\leqslant -4$ , and then a dropoff in value. Within this same region, for the double-peaked profiles (Figure 8, panels (a) and (b)), the electron density tends to drop at $\mathrm{log}(\tau )\approx -5$ .
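The comparisons made here and below (the height of a temperature or density minimum, and the enhancement relative to the FALC model in the $-5\leqslant \mathrm{log}(\tau )\leqslant -4$ range) can be expressed as two simple diagnostics of an inverted model. The sketch assumes that each model parameter and a reference model (standing in for the FALC) are sampled on a shared $\mathrm{log}(\tau )$ grid. The minimum is searched only within a restricted range, because quantities such as electron density can be lower still near the top of the model. The function name, the search range, and the toy arrays are hypothetical.

```python
import numpy as np


def summarize_model(log_tau, values, reference,
                    min_range=(-4.5, -1.5), window=(-5.0, -4.0)):
    """Return (log tau of the minimum within `min_range`, mean excess over `reference` in `window`).

    log_tau   : (n,) optical-depth grid, log10(tau)
    values    : (n,) inverted parameter, e.g. T in K or log10(n_e)
    reference : (n,) the same parameter in a reference model (e.g. FALC) on the same grid
    """
    log_tau = np.asarray(log_tau, dtype=float)
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Location of the minimum, restricted to the lower-chromospheric search range.
    search = np.flatnonzero((log_tau >= min_range[0]) & (log_tau <= min_range[1]))
    log_tau_min = float(log_tau[search[np.argmin(values[search])]])
    # Mean difference from the reference inside the plateau window.
    in_window = (log_tau >= window[0]) & (log_tau <= window[1])
    if not in_window.any():
        raise ValueError("no grid points inside the log(tau) window")
    excess = float(np.mean(values[in_window] - reference[in_window]))
    return log_tau_min, excess


# Toy example (not real FALC values): minimum at log tau = -3, +0.5 enhancement in the window.
log_tau = np.linspace(1.0, -7.0, 81)
reference = (log_tau + 3.0) ** 2
in_win = (log_tau >= -5.0) & (log_tau <= -4.0)
model = reference + 0.5 * in_win
t_min, excess = summarize_model(log_tau, model, reference)
```

For electron density the same function would be applied to $\mathrm{log}({n}_{e})$, so that the excess is expressed in dex.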

**Figure 7.** This figure shows the input spectra, inversion fit, and resultant model atmospheres for a single-peaked preflare Mg ii spectrum (panel (a)) and a single-peaked umbral Mg ii spectrum (panel (b)). In each panel, the upper and middle images show the observed C ii and Mg ii spectra (purple), respectively, with the model fits overlaid (black). The Mg ii images also have a synthetic profile produced from the FALC model (dashed blue) overlaid. The two double-axis plots below show the parameters of the model resulting from the inversion of the observed profiles (solid lines) and of the FALC model (dashed line) used as the initial guess model in the first cycle of the inversion. The temperature and electron density (orange and blue, respectively) are shown on the left, while microturbulence (v_turb) and line-of-sight velocity (v_LOS) are shown on the right (purple and green, respectively).

**Figure 8.** This figure shows the input spectra, inversion fit, and resultant model atmospheres for a double-peaked Mg ii spectrum (panel (a)) and a penumbral Mg ii spectrum (panel (b)). In each panel, the upper and middle images show the observed C ii and Mg ii spectra (purple), respectively, with the model fits overlaid (black). The Mg ii images also have a synthetic profile produced from the FALC model (dashed blue) overlaid. The two double-axis plots below show the parameters of the model resulting from the inversion of the observed profiles (solid lines) and of the FALC model (dashed line) used as the initial guess model in the first cycle of the inversion. The temperature and electron density (orange and blue, respectively) are shown on the left, while microturbulence (v_turb) and line-of-sight velocity (v_LOS) are shown on the right (purple and green, respectively).

Figure 9, panel (a), shows the observed spectra and associated model atmospheres for all 93 preflare spectra that were inverted. The distributions are broadly similar. This is particularly true for temperature: the temperature minimum lies at $\mathrm{log}(\tau )\approx -3$ , above which the temperature rises to a plateau (above the FALC atmosphere, dashed line) between $-5\leqslant \mathrm{log}(\tau )\leqslant -4$ , followed by a continued rise at greater heights in the atmosphere. However, the Mg ii spectra make clear that there are in fact three distributions present among these spectra. Panel (b) in Figure 9 and panels (a) and (b) in Figure 10 show these three populations separately. These three populations, while all exhibiting single peaks in the Mg ii h, k, and triplet lines, differ in the width of the h and k lines and in the intensity of the triplet line. The widest population is shown in panel (b) of Figure 9. Its model atmospheres show a temperature minimum at $\mathrm{log}(\tau )\approx -2.75$ , followed by a temperature increase with height. Temperatures are enhanced above the values of the FALC atmosphere and continue to rise with height until they reach a stable value of approximately 8 kK between $-5\leqslant \mathrm{log}(\tau )\leqslant -4$ . The electron density also has its minimum at around $\mathrm{log}(\tau )\approx -2.75$ , above which it rises to values above those of the FALC. For the population in panel (a) of Figure 10, the minima of temperature and electron density are closer to $\mathrm{log}(\tau )\approx -3$ . In this population the temperature increases with height from its minimum, again reaching values higher than the FALC. Unlike the population in Figure 9, panel (b), the temperatures do not reach a stable plateau in the region $-5\leqslant \mathrm{log}(\tau )\leqslant -4$ ; instead, they increase gradually with height. The electron density increases above the values of the FALC and stays enhanced with respect to the FALC with increasing height. The population in Figure 10, panel (b), shows the narrowest line widths, as well as the least intense triplet emission. Its temperature increase starts just above $\mathrm{log}(\tau )\approx -3$ . The temperatures in this population increase with height from the minimum, but unlike the previous two populations, the temperature profile closely follows that of the FALC atmosphere, i.e., it rises gradually. This continues until $\mathrm{log}(\tau )\approx -5$ , where there is a large increase in temperature above that of the FALC. The electron density also has its minimum just above $\mathrm{log}(\tau )\approx -3$ . Higher up, the electron density in this population is somewhat increased with respect to the FALC, but of the three populations (Figure 9, panel (b), and Figure 10, panels (a) and (b)), it is the closest to the FALC. In all three populations there are clearly discernible patterns in the temperature and electron density, whereas the distributions of microturbulence and line-of-sight velocity are less clear. The line-of-sight velocity appears to be partitioned around the temperature minimum, with larger downward velocities below it, in the lower atmosphere. The microturbulence shows a large variance, both within and across the three populations. Despite these differences, the overall trend for the model atmospheres derived from our inversion of the preflare profiles is increased temperatures and increased electron densities above $\mathrm{log}(\tau )\approx -3$ . This result is derived directly from inversions. It is compatible with forward modeling (e.g., Carlsson et al. 2015; Pereira et al. 2015; Rubio da Costa & Kleint 2017; Zhu et al. 2019), which suggests that single-peaked Mg ii h and k profiles and single-peaked Mg ii triplet emission can be caused by heating (and the resulting temperature and density increase) in the chromosphere.
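The three populations were separated on the basis of the Mg ii h and k line widths (and triplet intensity), but the exact procedure is not stated. A minimal, hypothetical way to make a width-based split reproducible is to measure the full width at half maximum (FWHM) of each single-peaked k line and group the widths with a one-dimensional k-means. This is the same algorithm used for the main clustering, here applied to a single scalar. The interpolation scheme, the quantile initialization, and k = 3 are assumptions for illustration.

```python
import numpy as np


def fwhm(wav, spec):
    """Full width at half maximum of a single-peaked profile (same units as `wav`).

    Assumes the profile falls below half maximum on both sides within the window;
    crossings are located by linear interpolation between neighbouring samples.
    """
    wav = np.asarray(wav, dtype=float)
    spec = np.asarray(spec, dtype=float)
    i_pk = int(np.argmax(spec))
    half = 0.5 * spec[i_pk]
    i = i_pk
    while i > 0 and spec[i] >= half:                 # walk left to the first sample below half
        i -= 1
    j = i_pk
    while j < spec.size - 1 and spec[j] >= half:     # walk right likewise
        j += 1
    left = np.interp(half, [spec[i], spec[i + 1]], [wav[i], wav[i + 1]])
    right = np.interp(half, [spec[j], spec[j - 1]], [wav[j], wav[j - 1]])
    return float(right - left)


def kmeans_1d(values, k=3, n_iter=100):
    """Plain 1D k-means; returns labels ordered so that 0 is the smallest centre."""
    x = np.asarray(values, dtype=float)
    centres = np.quantile(x, np.linspace(0.0, 1.0, k + 2)[1:-1])  # deterministic start
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([x[labels == c].mean() if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
    rank = np.empty(k, dtype=int)
    rank[np.argsort(centres)] = np.arange(k)         # relabel: 0 = narrowest population
    return rank[labels], np.sort(centres)


# Synthetic single-peaked k lines with hypothetical widths (sigma in Angstrom).
wav = np.linspace(2795.5, 2797.5, 2001)
sigmas = [0.05, 0.055, 0.10, 0.105, 0.20, 0.21]
widths = [fwhm(wav, np.exp(-0.5 * ((wav - 2796.35) / s) ** 2)) for s in sigmas]
labels, centres = kmeans_1d(widths, k=3)
print(labels)  # [0 0 1 1 2 2]
```

Adding the triplet peak intensity as a second feature would turn this into an ordinary two-dimensional k-means problem.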

**Figure 10.** Each panel in this figure is laid out as follows. The upper two plots show the observed C ii and Mg ii spectra, respectively. Below these are the four parameters of the model atmospheres retrieved from the inversions: temperature (orange), microturbulence (green), electron density (blue), and line-of-sight velocity (purple). In the line-of-sight velocity plots, negative values correspond to upflows and positive values to downflows. The gray dashed lines in these plots show the FALC model. Panels (a) and (b) show the second and third populations of the 93 preflare spectra that were inverted, separated on the basis of the width of the Mg ii h and k lines. The total distribution and the first population are shown in Figure 9.

We have established that our observations and retrieved models are consistent with a scenario in which preflare signals are associated with heating events in the chromosphere. Our results suggest that such heating may occur across a wide range of heights, from the lower to the upper chromosphere. The locations of these events are clearly associated with the sites of the flare ribbons. We also find a somewhat weaker association with neutral lines in active regions, i.e., locations where magnetic flux of opposite polarities is in close proximity. These results suggest that the heating events could be driven by magnetic reconnection. Such reconnection could be linked to preflare phenomena such as tether-cutting reconnection building up a magnetic flux rope, or to flare-trigger scenarios such as that outlined in Bamba et al. (2017), in which small-scale flux emergence at polarity inversion lines leads to reconnection that could trigger the onset of flaring. More detailed follow-up studies of the complex preflare atmosphere, including the evolution and distribution of the magnetic field, are needed to shed further light on the exact nature of these preflare signals.

5. Conclusions

In this paper, we have presented an investigation, using k-means clustering, of the Mg ii spectra observed by IRIS during the preflare period. Our analysis has identified spectra that are commonly associated with the preflare data sets. The majority of these spectra show single-peaked Mg ii k and h lines along with single-peaked emission in the Mg ii triplet line. While profiles with these characteristics are not observed exclusively in the preflare data sets, they are overwhelmingly more common there. In the two non-preflare data sets in which such profiles are found, they are associated with small-scale transient brightenings. Additionally, our inversions show that the model atmospheres of these profiles have increased temperature and electron density in the chromosphere. We therefore conclude that these spectra are very likely the result of small-scale heating events occurring in the flaring region up to 40 minutes before the flare. One possible driver of this heating is magnetic reconnection. Further detailed studies of the magnetic environment in the vicinity of these events will provide more insight into their exact nature. Our results provide constraints on, and shed light on, the physical mechanisms that drive flares and eruptions.

IRIS is a NASA Small Explorer mission developed and operated by LMSAL with mission operations executed at NASA Ames Research Center and major contributions to downlink communications funded by ESA and the Norwegian Space Centre. This work was supported by NASA contract NNG09FA40C (IRIS). A.S.D. is also supported by the NASA Heliophysics Guest Investigator (H-GI) "Open" Program (80NSSC21K0726).

We acknowledge that this work was carried out upon the unceded ancestral homeland of the Ramaytush Ohlone peoples, the original inhabitants of the San Francisco Peninsula.

Appendix A: Additional Tables

This appendix contains complementary tables detailing the distribution of all types of preflare clusters identified in this study across every data set. Tables 7 and 8 show the number of spectra of each type for each data set, while Tables 9 and 10 show the proportion of each raster that these types of spectra represent.

Table 7. Number of Spectra in Each of the Preflare Clusters for Each Data Set from 2400 to 1200 s Prior to Flaring

	−2400 s								−2100 s								−1800 s								−1500 s								−1200 s
Cluster Type	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h

Data set 0	5	0	0	0	0	0	0	0	0	6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	64	0	0	0	4	0	0	0	128	0	0	0	0	0	0

Data set 1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0

Data set 2	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	18	0	0	0	0	0	0	0	0	2	0	0	0	0	0

Data set 3	46	2	0	0	0	0	0	0	86	0	0	40	0	0	0	0	108	0	0	0	0	0	0	0	80	0	0	0	0	17	0	0	102	5	0	0	0	0	0	0

Data set 4	140	15	0	0	0	0	0	0	176	10	0	0	0	0	0	0	227	0	0	0	0	0	0	0	122	12	0	0	0	0	0	0	52	264	0	0	0	0	0	0

Data set 5	16	3	0	0	0	0	0	0	6	0	0	12	10	0	0	0	19	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Data set 6	28	6	0	0	0	0	0	0	24	3	0	0	0	0	0	0	27	0	0	0	0	0	0	0	20	0	0	0	0	0	0	0	30	0	0	0	0	0	0	0

Data set 7	3	18	0	0	76	0	0	0	6	0	0	0	14	0	0	0	22	0	0	0	0	0	0	0	6	0	0	0	0	0	0	0	7	0	0	0	0	0	0	0

Note. The entry for each data set and time step contains the number of spectra for each type of preflare cluster. These eight cluster types are (a) single-peaked Mg ii k and h lines, with emission in the Mg ii UV triplet lines; (b) double-peaked Mg ii k and h lines, with emission in the Mg ii triplet line; (c) single-peaked Mg ii k and h lines; (d) double-peaked Mg ii k and h lines; (e) broad-shouldered Mg ii k and h lines; (f) broad Mg ii k and h lines; (g) cosmic-ray hits; (h) irregular with broad wings.


Table 8. Number of Spectra in Each of the Preflare Clusters for Each Data Set from 900 to 0 s Prior to Flaring

	−900 s								−600 s								−300 s								0 s
Cluster Type	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h

Data set 0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	16	0	0	7	0	0	0	0	0	3	0	0	0	0	0	0	0	23

Data set 1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Data set 2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	0	0	0	0	0	0	0	10	0	0	0	0	0	0	0

Data set 3	149	4	0	0	0	0	0	0	131	6	0	0	0	0	0	0	95	0	0	0	0	0	0	0	246	5	0	0	0	0	0	5

Data set 4	254	16	0	0	0	0	0	0	161	0	2	0	0	0	128	0	214	7	0	0	0	0	0	14	418	0	0	0	0	0	0	14

Data set 5	17	0	0	0	0	0	0	0	17	4	0	0	0	0	0	0	15	1	0	0	0	0	0	3	62	0	0	0	0	13	0	0

Data set 6	30	0	0	0	0	0	0	0	33	7	0	0	0	0	0	0	25	0	8	0	0	0	0	0	65	5	0	0	0	0	0	0

Data set 7	5	0	0	0	0	0	0	0	17	13	0	0	0	0	0	0	38	0	0	0	2	0	0	0	7	19	1	0	0	0	0	3

Note. The entry for each data set and time step contains the number of spectra for each type of preflare cluster. These eight cluster types are (a) single-peaked Mg ii k and h lines, with emission in the Mg ii UV triplet lines; (b) double-peaked Mg ii k and h lines, with emission in the Mg ii triplet line; (c) single-peaked Mg ii k and h lines; (d) double-peaked Mg ii k and h lines; (e) broad-shouldered Mg ii k and h lines; (f) broad Mg ii k and h lines; (g) cosmic-ray hits; (h) irregular with broad wings.


Table 9. Proportion of the Raster (%) Occupied by Each of the Preflare Clusters for Each Data Set from 2400 to 1200 s Prior to Flaring

	−2400 s								−2100 s								−1800 s								−1500 s								−1200 s
Cluster Type	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h

Data set 0	0.05	0	0	0	0	0	0	0	0	0.07	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.7	0	0	0	4	0	0	0	1.5	0	0	0	0	0	0

Data set 1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.2	0	0	0	0	0

Data set 2	0	0	0.02	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.2	0	0	0	0	0	0	0	0	0.02	0	0	0	0	0

Data set 3	3	0.1	0	0	0	0	0	0	5.5	0	0	40	0	0	0	0	7	0	0	0	0	0	0	0	5	0	0	0	0	1	0	0	6.5	0.3	0	0	0	0	0	0

Data set 4	2	0.2	0	0	0	0	0	0	3	0.2	0	0	0	0	0	0	3.6	0	0	0	0	0	0	0	2	0.2	0	0	0	0	0	0	0.8	4	0	0	0	0	0	0

Data set 5	1	0.1	0	0	0	0	0	0	0.4	0	0	0.7	0.6	0	0	0	1.2	0	0	0	0	0	0	0	0.1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Data set 6	7	1.5	0	0	0	0	0	0	6	0.8	0	0	0	0	0	0	7	0	0	0	0	0	0	0	5	0	0	0	0	0	0	0	8	0	0	0	0	0	0	0

Data set 7	0.1	1	0	0	4.5	0	0	0	0.4	0	0	0	0.8	0	0	0	1.3	0	0	0	0	0	0	0	0.4	0	0	0	0	0	0	0	0.4	0	0	0	0	0	0	0

Note. The entry for each data set and time step contains the percentage of the raster occupied by each type of preflare cluster, i.e., 100 × (number of spectra in the cluster type)/(number of pixels in the raster). These eight cluster types are (a) single-peaked Mg ii k and h lines, with emission in the Mg ii UV triplet lines; (b) double-peaked Mg ii k and h lines, with emission in the Mg ii triplet line; (c) single-peaked Mg ii k and h lines; (d) double-peaked Mg ii k and h lines; (e) broad-shouldered Mg ii k and h lines; (f) broad Mg ii k and h lines; (g) cosmic-ray hits; (h) irregular with broad wings.
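The percentages in Tables 9 and 10 follow from the counts in Tables 7 and 8 through the formula given in the table notes. A minimal sketch of the conversion is shown below. The function name is hypothetical, and the raster size in the usage line is an illustrative value, not that of any observed raster.

```python
CLUSTER_TYPES = "abcdefgh"  # the eight preflare cluster types defined in the table notes


def raster_percentages(counts, n_pixels):
    """Percentage of a raster occupied by each cluster type:
    100 * (number of spectra of that type) / (number of pixels in the raster)."""
    if n_pixels <= 0:
        raise ValueError("n_pixels must be positive")
    return {t: 100.0 * counts.get(t, 0) / n_pixels for t in CLUSTER_TYPES}


# Hypothetical raster of 10,000 pixels containing 150 type (a) and 12 type (b) spectra.
percentages = raster_percentages({"a": 150, "b": 12}, 10_000)
print(percentages)
```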


Table 10. Proportion of the Raster (%) Occupied by Each of the Preflare Clusters for Each Data Set from 900 to 0 s Prior to Flaring

	−900 s								−600 s								−300 s								0 s
Cluster Type	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h	a	b	c	d	e	f	g	h

Data set 0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.18	0	0	0.07	0	0	0	0	0	0.03	0	0	0	0	0	0	0	0.26

Data set 1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

Data set 2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.04	0	0	0	0	0	0	0	0.1	0	0	0	0	0	0	0

Data set 3	9	0.3	0	0	0	0	0	0	8	0.4	0	0	0	0	0	0	6	0	0	0	0	0	0	0	16	0.3	0	0	0	0	0	0.3

Data set 4	4	0.3	0	0	0	0	0	0	2.6	0	0.03	0	0	0	2	0	3.4	0.1	0	0	0	0	0	0.2	7	0	0	0	0	0	0	0.2

Data set 5	1	0	0	0	0	0	0	0	1	0.2	0	0	0	0	0	0	0.9	0.06	0	0	0	0	0	0.2	3.8	0	0	0	0	0.8	0	0

Data set 6	8	0	0	0	0	0	0	0	8.5	2	0	0	0	0	0	0	6	0	2	0	0	0	0	0	17	1.3	0	0	0	0	0	0

Data set 7	0.3	0	0	0	0	0	0	0	1	0.8	0	0	0	0	0	0	2.3	0	0	0	0.1	0	0	0	0.4	1.1	0.06	0	0	0	0	0.2

Note. The entry for each data set and time step contains the percentage of the raster occupied by each type of preflare cluster, i.e., 100 × (number of spectra in the cluster type)/(number of pixels in the raster). These eight cluster types are (a) single-peaked Mg ii k and h lines, with emission in the Mg ii UV triplet lines; (b) double-peaked Mg ii k and h lines, with emission in the Mg ii triplet line; (c) single-peaked Mg ii k and h lines; (d) double-peaked Mg ii k and h lines; (e) broad-shouldered Mg ii k and h lines; (f) broad Mg ii k and h lines; (g) cosmic-ray hits; (h) irregular with broad wings.


Appendix B: C i/O i Ratio

From Pereira et al. (2015) and Zhu et al. (2019), we know that Mg ii triplet emission is likely due to heating in the chromosphere. IRIS observes many spectral lines that are also sensitive to this region of the solar atmosphere. Two such lines are O i 1355.6 Å and C i 1355.8 Å. Increased emission in these lines at the same locations as the increased Mg ii triplet emission could provide information on where in the chromosphere the heating occurs. Lin et al. (2017) discuss simulations of both of these lines for the quiet Sun. That work highlights the strong diagnostic potential of these lines for the chromosphere, with the C i line, in particular, signifying activity in the middle chromosphere. It also discusses how the ratio of these two lines can be used to infer electron density as well as the formation height of the lines. However, as these simulations were based on the quiet Sun, these relationships may not hold for quiescent, preflaring, or flaring ARs. The relationship between O i and C i has also been explored observationally by Cheng et al. (1980), who used Skylab observations of these two lines to examine the O i/C i ratio in the quiet Sun, in quiescent active regions, and in solar flares. They found that in the quiet Sun the O i line dominates C i, so the ratio is large, whereas during flaring this is reversed, with C i dominating, so the ratio is small. As IRIS observes these two lines, we investigated the relationship between the locations of our preflare clusters and the O i/C i ratio. However, owing to an issue with the observing sequences of the preflare data sets used in this study, the O i observing window was truncated. As a result, the O i 1355.6 Å line lies at the far edge of the observing window, while the C i 1355.8 Å line was omitted entirely. Fortunately, another C i line, formed in the same manner, is observed within the window at 1354.3 Å. Although this line is somewhat less intense than the C i 1355.8 Å line, we used it in the analysis. A comparison of the C i and O i intensity maps showed that the regions where the preflare clusters were identified correspond in particular to areas of enhanced C i emission.

We also analyzed the ratio of the C i and O i lines. Each line was first fitted with a single Gaussian profile to determine its peak intensity, and these peak intensities were then used to compute the ratio. As these lines can be very weak in nonflaring situations, some of the fits were nonphysical, e.g., the lines were indistinguishable from the background. We therefore imposed an intensity threshold on the results of the C i fitting: if the fitted intensity of the C i line fell below 4 DN s⁻¹, the pixel was excluded from the ratio. Figure 11 shows an example of the resulting maps of the O i/C i ratio for the 2015 March 11 X2.1 flare. In this figure, white represents regions where no ratio was calculated, and the color scale shows ratios between 0 and 5. In the preflare rasters, the majority of the ratio values are >1, while in the flaring raster the majority are <1. The same behavior is observed across all flare events and agrees with the findings of Cheng et al. (1980). Overlaying the locations of the preflare clusters identified by the k-means process (shown in pink in Figure 11), we find that the majority of the preflare spectra occur either within or close to the regions where the C i threshold is met. As stated earlier, this implies that many of these preflare spectra are associated with C i emission, which is confirmed when the individual spectra at these positions are examined. Whatever mechanism drives the formation of these profiles therefore appears to involve heating in the lower chromosphere. These plots also show that the O i/C i ratio takes no clear, consistent value at the locations of the preflare clusters.
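The fitting-and-thresholding procedure can be sketched as follows. The text fits single Gaussians. Here, as a simplifying assumption, the Gaussian is fitted by least squares to the logarithm of the intensity (a parabola in ln I, sometimes called Caruana's algorithm). This needs only NumPy but assumes a background-subtracted line with positive samples near the peak. The 4 DN s⁻¹ C i threshold is taken from the text; the function names and the half-maximum fitting window are hypothetical.

```python
import numpy as np

CI_THRESHOLD = 4.0  # DN/s; pixels with a weaker fitted C I peak are excluded (from the text)


def gaussian_peak(wav, spec, frac=0.5):
    """Peak intensity from a Gaussian fitted as a parabola in ln(I).

    Only samples above `frac` times the maximum are used, so the input should be
    a background-subtracted window around a single emission line.
    """
    wav = np.asarray(wav, dtype=float)
    spec = np.asarray(spec, dtype=float)
    if spec.max() <= 0:
        return 0.0
    sel = spec > frac * spec.max()
    if sel.sum() < 3:
        return 0.0
    x = wav[sel] - wav[sel].mean()                 # centre the abscissa for numerical stability
    a, b, c = np.polyfit(x, np.log(spec[sel]), 2)  # ln I = a x^2 + b x + c
    if a >= 0:                                     # not a peaked profile: no line detected
        return 0.0
    return float(np.exp(c - b ** 2 / (4 * a)))     # value at the vertex of the parabola


def oi_ci_ratio(peak_oi, peak_ci, threshold=CI_THRESHOLD):
    """O I / C I peak-intensity ratio, NaN where the C I peak is below the threshold."""
    peak_oi = np.asarray(peak_oi, dtype=float)
    peak_ci = np.asarray(peak_ci, dtype=float)
    ratio = np.full(peak_ci.shape, np.nan)
    ok = peak_ci >= threshold
    ratio[ok] = peak_oi[ok] / peak_ci[ok]
    return ratio


# Synthetic, noise-free C I 1354.3 line (not observed data) and a toy set of fitted peaks.
wav = np.linspace(1354.0, 1354.6, 121)
line = 10.0 * np.exp(-0.5 * ((wav - 1354.3) / 0.03) ** 2)
peak = gaussian_peak(wav, line)
ratios = oi_ci_ratio([8.0, 3.0, 2.0], [4.0, 6.0, 1.0])
```

A full nonlinear fit of a Gaussian plus a constant background would be more robust for weak, noisy lines; the log-parabola shortcut is used here only to keep the sketch dependency-light.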

**Figure 11.** This figure shows the O i/C i ratio for the 2015 March 11 X-class flare. The nine preflare rasters on which the k-means clustering was performed are shown, as well as the O i/C i ratio at the flare peak. For the preflare time steps, the locations of the identified preflare clusters are shown in pink. Regions in white are areas where the ratio was not calculated because the fitted intensity of the C i line did not meet the intensity threshold.
