Spatiotemporal Interpolation Methods for Solar Event Trajectories


Published 2018 May 11 © 2018. The American Astronomical Society. All rights reserved.
Citation: Soukaina Filali Boubrahimi et al. 2018 ApJS 236 23. DOI: 10.3847/1538-4365/aab763


Abstract

This paper introduces four spatiotemporal interpolation methods that enrich complex, evolving region trajectories that are reported from a variety of ground-based and space-based solar observatories every day. Our interpolation module takes an existing solar event trajectory as its input and generates an enriched trajectory with any number of additional time–geometry pairs created by the most appropriate method. To this end, we designed four different interpolation techniques: MBR-Interpolation (Minimum Bounding Rectangle Interpolation), CP-Interpolation (Complex Polygon Interpolation), FI-Interpolation (Filament Polygon Interpolation), and Areal-Interpolation, which are presented here in detail. These techniques leverage k-means clustering, centroid shape signature representation, dynamic time warping, linear interpolation, and shape buffering to generate the additional polygons of an enriched trajectory. Using ground-truth objects, interpolation effectiveness is evaluated through a variety of measures based on several important characteristics that include spatial distance, area overlap, and shape (boundary) similarity. To our knowledge, this is the first research effort of this kind that attempts to address the broad problem of spatiotemporal interpolation of solar event trajectories. We conclude with a brief outline of future research directions and opportunities for related work in this area.


1. Introduction

The new era of big data is impacting every facet of modern life. Reaching far beyond industry applications, many scientific domains are now being transformed by disruptive technologies and the newfound potential offered by unprecedented volumes of data. In the case of solar physics, the Solar Dynamics Observatory (SDO) mission, developed by NASA and launched on 2010 February 11, captures over 70,000 high-resolution images of the Sun per day, amassing more data than all previous solar data archives combined (Martens et al. 2011).

Many previous archives were acquired from either ground-based observatories, such as the National Astronomical Observatory, Nobeyama Solar Radio Observatory, Hida Observatory, and Norikura Solar Observatory, or space-based missions, such as Chandra, Kepler, Spitzer, and Tempo. In addition to these existing data sources, the deluge of solar data will only increase with the launch of the largest ground-based solar telescope in the world, the Daniel K. Inouye Solar Telescope (DKIST). Scheduled to begin operations in 2019, the DKIST will provide an unprecedented data stream over a wide range of instruments, capturing tens of images per second and upward of 25 TB of data per day (Berukoff et al. 2015). For comparison, the Solar Heliospheric Observatory (SOHO) mission, launched on 1995 December 2, operated at a 12-minute cadence, while the SDO currently operates at a 12 s cadence.

Solar physicists primarily collect and analyze solar images to better understand the variations of solar activity that influence life on Earth and directly impact many modern technological systems, including communication networks and energy infrastructures. It has been estimated that the damage from a severe solar storm could exceed $2 trillion in total economic impact, and a full recovery could take years (National Research Council 2008). These solar variations come from the different types of phenomena (events) that occur within the solar corona, such as active regions, sunspots, flares, filaments, coronal holes, X-ray bright points, and coronal jets. The SDO mission is the first mission of NASA's Living With a Star (LWS) program, which is a long-term project dedicated to studying aspects of the Sun that significantly affect human life, with the eventual goal of space weather prediction (Withbroe 2000).

Although a tremendous amount of solar data now exists, one of the main challenges that researchers face is the limited amount of accurately labeled data. Some events are trigger based and last only minutes, such as solar flares. The majority of event detections are performed every 4 to 12 hr, depending on the event type. Filaments, on the other hand, can exist for several days, and the best automatic detection conditions require Hα filtered images, which are taken only every 12 hr by two different ground-based observatories, namely, Kanzelhoehe (Austria) and the Big Bear Solar Observatory (USA). The observational gap between these two observatories is very long, especially when compared to other event types. Given that SDO imagery is available every 10 s, there is clearly a huge potential for providing finer-resolution event data in a more uniform fashion.

Another impact of missing data is on the quality of solar data mining results, specifically that of spatiotemporal frequent pattern mining studies (Aydin et al. 2015). The richness of vector-based solar data is, therefore, crucial for both solar physics and solar data mining communities. The study of the events in a case-by-case fashion and manual interpolation are time-consuming and repetitive tasks. In order to scale these tasks to solar big data, we propose four automated interpolation methods: MBR-Interpolation (Minimum Bounding Rectangle Interpolation), CP-Interpolation (Complex Polygon Interpolation), FI-Interpolation (Filament Polygon Interpolation), and Areal-Interpolation.

The rest of the paper is presented as follows. In Section 2, we provide background material on the solar data and related works, as well as the formal problem statement. In Section 3 we define our interpolation methods in detail, and then in Section 4 we lay out our experimental evaluation methodology. In Section 5, we show our experimental results, and we finish with conclusions and future work in Section 6. The Appendix contains algorithmic pseudocode and supplemental statistical plots.

2. Background

The SDO mission produces a total of 1.5 TB of raw data per day (Pesnell 2015). The observatory has three independent instruments: the Helioseismic and Magnetic Imager (HMI), the Extreme Ultraviolet Variability Experiment (EVE), and the Atmospheric Imaging Assembly (AIA). In this work, we primarily focus on AIA images, because they are the observations where most types of events are generally recognized. The AIA instrument captures full-disk solar images in 10 different electromagnetic wavelength bands across the ultraviolet and visual spectrum, designed to highlight different aspects of the solar corona (Lemen et al. 2012).

Given the unique characteristics of the solar event types, each solar phenomenon is detected under one or more specific wavelengths, using different detection techniques that are dictated by the solar detection module associated with that event type. Figure 1 illustrates an example of two tracked coronal hole events that occurred over the span of 2 days. The detection modules were created by an international consortium of independent research groups called the Feature Finding Team (FFT), which was selected by NASA to develop specialized modules to detect a set of solar phenomena (Martens et al. 2011). For example, coronal holes and active regions are detected from AIA images with the Spatial Possibilistic Clustering Algorithm (SPoCA; Verbeeck et al. 2014), while filaments are detected from ground-based Hα images using the Advanced Automated Filament Detection and Characterization Code (AAFDCC; Bernasconi et al. 2005). Examples of three different types of solar event detections are shown in Figure 2. Table 1 lists the event types with their respective data sources and detection module references. For more information, we refer the reader to Schuh et al. (2016), who comprehensively collected and analyzed the reports from these modules.

Figure 1. Trajectory segment of two coronal hole event boundaries from 2012 January 23 07:00:00 to 2012 January 25 07:00:00.

Table 1.  Summary of Event Types with Their Respective Source and Detection Module

Label | Event Type | Source | FFT Module | Reference
AR | Active region | SDO | SPoCA | Verbeeck et al. (2014)
CH | Coronal hole | SDO | SPoCA | Verbeeck et al. (2014)
EF | Emerging flux | SDO/HMI | Emerging flux region module | Martens et al. (2012)
FI | Filament | BBSO & Kanzelhoehe | AAFDCC | Bernasconi et al. (2005)
FL | Flare | SDO | Flare Detective Trigger Module | Pesnell et al. (2011)
SG | Sigmoid | SDO | Sigmoid Sniffer | Bernasconi et al. (2011)
SS | Sunspot | SDO | EGSO SFC | Zharkov et al. (2005)

Part of the output of the detection modules are event metadata, which contain location-based information such as the spatial boundary outline, or bounding box, and the centroid. To understand the intrinsic sequential characteristics among the solar event instances, a tracking algorithm was introduced in Kempton & Angryk (2015) that links the individual detections and generates event trajectories. Our interpolation module uses these event trajectories as input to the enrichment process.

2.1. Problem Definition

The task of spatiotemporal interpolation for solar event instances can be summarized as formulating automated techniques to enrich the instances by predicting their unknown locations. We employed the time–geometry pair model presented in Aydin et al. (2016) to represent solar event instances. Each solar event instance is considered as an evolving region trajectory, which is a chronologically ordered list of time–geometry pairs. Each pair represents the polygon-based spatial location (geometry) of an instance at a particular time. Our task is to estimate (interpolate) an unknown region-based location of a solar event instance within its life span using its own trajectory. In our interpolation techniques, our aim is to maximally preserve the spatiotemporal characteristics of instances such as location, area, shape, and rotation.

2.2. Related Work

The automated generation of continuous spatial or spatiotemporal data sets from irregularly distributed data is a common task in many disciplines, including solar physics. A variety of common methods can perform this interpolation task, each with its own advantages and disadvantages. The methods can be classified into two main categories: point interpolation and areal interpolation. The first type of method estimates the value at a point given a number of known point values as input. The second type estimates an aggregate attribute of one areal unit system (i.e., a polygon) based on that of another, spatially incongruent (nonidentical in form) system in which the attribute data were collected. The units of the original attribute are known as source units, and those for which the attribute needs to be estimated are termed target units.

The most popular example of point interpolation in the literature is house price interpolation (Li & Revesz 2002). Given the current house prices at some locations, an unknown house price can be determined using point interpolation. Areal interpolation is extensively used in applications involving population data, such as demographic information from national censuses, which is usually based on arbitrarily designated census units such as census blocks, block groups, and census tracts. To associate such information with other area-based data sets such as market catchment areas, postal delivery zones, or areal units representing environmental phenomena like watersheds and soil types, areal interpolation is needed to align these areal units to be spatially congruent with those of the census (Logan et al. 2014). The aforementioned interpolations are spatial interpolations applied to point-based and area-based data.

The problem that we are addressing here is moving region interpolation, which is a more difficult problem than point and areal interpolations. Moving region interpolation involves the interpolation of the vertices of a complex shape on a one-to-one basis. The challenge of this task resides in matching the vertices of the first original shape to the vertices of the second original shape to interpolate an accurate shape at a time point between the occurrence of these two original shapes. Thus, we will extend the traditional spatial interpolation methods to more sophisticated use cases of spatiotemporal data.

Some efforts have been devoted to designing spatiotemporal interpolation techniques. Craglia & Onsrud (2004) used a geostatistical version of the Kriging method for which the interpolated values are modeled by a Gaussian process. Furthermore, Tøssebro & Güting (2001) proposed a new method for interpolating between snapshots of moving regions using the combination of a rotating plane algorithm and an overlap graph, which does not require any user interaction and has a reasonable running time. Forlizzi et al. (2000) defined a data model for spatiotemporal databases that include complex evolving spatial structures such as line networks or multicomponent regions with holes. Another similar approach for moving region spatiotemporal interpolation uses parametric rectangles (Cai et al. 2000). Here, the authors introduced a new data model for representing, querying, and animating spatiotemporal objects with continuous and periodic change. While traditional moving points are defined only between now and +∞, parametric rectangles can have arbitrary time interval durations. Finally, a new database system (PReSTO) was implemented that includes the new parametric rectangles concept (Cai et al. 2000). The aforementioned works raise a number of challenges, such as which spatiotemporal interpolation method is appropriate in which context, and how the data are to be stored, visualized, and queried.

Santosh (2010) proposed a novel method to address the data representation problem of the above-discussed research works by providing a unique shape model for any 2D shape. They addressed the problem of shape similarity matching based on the assumption that the uniqueness of any shape arises from two equally salient properties. The first property is the radial distance between every coordinate of the shape boundary and the centroid of the shape. The second property is the angle of each boundary point with respect to the centroid. By combining these two properties, a unique centroid shape signature, which they call a signature matrix, can be used to represent the shape. In this case, the combined properties are represented as a time series, where the time dimension of the centroid shape signature is replaced by angles and the centroid shape signature values represent the respective radii. However, while Santosh (2010) addresses the shape modeling and classification problems, it disregards the task of mapping one shape to another, which, we believe, is a crucial step for shape interpolation.

In our previous work, we introduced a new interpolation method for solar filament events (Filali Boubrahimi et al. 2016a, 2016b). From the perspective of spatiotemporal interpolation, a filament is a unique event type that has special shape characteristics. Two main features of a filament are the spine and barbs, which can be thought of as the skeleton of the polygonal shape. Our FI-Interpolation algorithm first detects the endpoints of the spine of the two original filament events. The vertices of the respective shapes are then matched together using the dynamic time warping technique. Finally, linear interpolation is applied for every vertex of the original shapes to produce the missing shape.

In this paper, we will provide finer details of the four spatiotemporal interpolation techniques that we presented earlier (Filali Boubrahimi et al. 2016a, 2016b, 2016c): Minimum Bounding Rectangle Interpolation (MBR-Interpolation), Complex Polygon Interpolation (CP-Interpolation), areal interpolation (Areal-Interpolation), and Filament Polygon Interpolation (FI-Interpolation).

2.3. Data Model

To conceptually model the spatiotemporal trajectories of solar events in our framework, we have utilized the temporal snapshot-based spatiotemporal data model described in Aydin et al. (2015). The trajectory data model uses time–geometry pairs ($\mathrm{tgp}_i$) as its fundamental building block. A time–geometry pair is a composite object for representing a spatial geometry ($g_i$) at a particular time stamp ($t_i$):

$\mathrm{tgp}_i = (t_i,\ g_i)$    (1)

The geometries are polygon-based spatial objects, each representing the location of a particular region at a given time instance. A trajectory object is represented by an ordered list of time–geometry pairs:

$\mathrm{Traj} = \langle \mathrm{tgp}_1,\ \mathrm{tgp}_2,\ \ldots,\ \mathrm{tgp}_k \rangle$    (2)

The time–geometry objects are sorted based on their time stamp values (i.e., $t_1 < t_2 < t_3 < \cdots < t_k$). Trajectories are uniquely identified by an identifier (id).
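
For illustration, the following minimal Python sketch shows one possible realization of this data model, assuming shapely polygons for the geometries; the class and field names are our own and not part of the original framework.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

from shapely.geometry import Polygon


@dataclass(frozen=True)
class TimeGeometryPair:
    """A polygon-based spatial location (geometry) at a particular time stamp."""
    timestamp: datetime
    geometry: Polygon


@dataclass
class Trajectory:
    """An evolving region trajectory: a chronologically ordered list of pairs."""
    id: str
    pairs: List[TimeGeometryPair] = field(default_factory=list)

    def add(self, pair: TimeGeometryPair) -> None:
        self.pairs.append(pair)
        self.pairs.sort(key=lambda p: p.timestamp)  # keep t1 < t2 < ... < tk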

2.4. Data Life Cycle

Before discussing our interpolation methods in detail, we briefly overview the data life cycle. Summarized in Figure 3, this process spans from initial raw data collection to serving user applications with final data sets. The data life cycle begins with images captured by the SDO. The raw images are processed by the event detection modules, and the resultant reports from the FFT modules are then fed to the tracking module that links the reports of events that represent the same phenomena over time. The last step of the data life cycle is interpolation to generate enriched trajectory objects.

Figure 2. Examples of solar event instances: (a) eight active region detections at 2011 January 02 23:00:01; (b) three coronal hole detections at 2010 May 17 20:00:02; (c) nine filament instances at 2010 October 15 16:46:10.

Figure 3. Data life cycle of the solar event data.

Some examples of potential uses of our enriched trajectory data products are Content-Based Image Retrieval (CBIR) systems (Banda & Angryk 2010), spatiotemporal solar data mining research (Aydin & Angryk 2016a, 2016b; Pillai et al. 2016; Aydin et al. 2017), and solar trajectory video generation frameworks (Filali Boubrahimi et al. 2016c). In addition to those, motion pattern queries and video motion tracing are potential future directions for solar event data that can benefit from our spatiotemporal interpolation methods.

An example of a motion pattern query is the flock pattern query, which captures the collaborative behavior of spatiotemporal data (Khalid 2009; Vieira & Tsotras 2013). The flock query mines a number of trajectories that are all enclosed by a disk of diameter ε for at least σ time stamps. In this application, having evenly sampled data for all the event types is crucial. In the context of solar event data, a missing report does not necessarily mean that the event does not exist; it may also mean that the detection module was not operational at that time. In this case, spatiotemporal interpolation techniques can be useful for estimating the missing reports for all event types and for imposing a consistent cadence, which improves the quality of the flock pattern query results. For the video motion tracing application, the extrapolation method used in tracking can be useful in predicting the location of a future report at times when the detection modules are operational. In this paper, we work on the interpolation part of the data life cycle.

2.5. Tracking Module

The tracking module, developed by Kempton & Angryk (2015), takes individual event detections and links them together into trajectories in an iterative fashion. The tracking algorithm first links the individual event instances by projecting a detected object forward using the known differential rotation of the solar surface and searches for the potential detections that overlap with this projected area at the next time step. If there is only one possible detection to be linked to, the algorithm links them together. The results from this step are then fed as the input for subsequent processing steps.

After the initial step of linking detections together, the algorithm repeats the search for possible detections to link to. In these later steps, the algorithm considers detections that had multiple paths in their search region. To determine which path a tracked object takes, several aspects of visual and motion similarity are compared to produce a probable path for the object. The resultant paths are again fed into another iteration of the algorithm, with larger and larger gaps allowed between detections to account for missed detections in the original metadata.

3. Interpolation Methods

Each of our four interpolation techniques is used in different contexts depending on the reported spatial characteristics of solar events. MBR-Interpolation is used for the event types whose instances are reported with minimum bounding rectangles (MBRs) but not complex chain codes. CP-Interpolation (CP stands for complex polygon) is used for all the other event types, except for filaments (FI), whose instances contain a chain code. FI-Interpolation is a specialized interpolation technique designed specifically for filaments. Finally, Areal-Interpolation is a method used in cases where FI-Interpolation and CP-Interpolation fail to generate a valid polygon geometry. Figure 4 shows a flowchart for the use case scenario of our interpolation methods.

Figure 4. Flowchart for choosing the interpolation method.

Once the interpolation method is chosen, the interpolation module takes a trajectory made of two time–geometry pairs as input and generates an enriched trajectory containing interpolated time–geometry pairs. The task of interpolating the missing event instances consists of estimating the location of the event at times when the vector data are not available. In doing so, it is creating a more enriched data set with broader coverage and uniformly reported events for querying and analysis.

After deciding the number of time geometries to be generated by the interpolation module, the time–geometry pairs are fed to the appropriate interpolation algorithm among the four proposed methods following the flowchart in Figure 4. Table 2 shows the different interpolation techniques available for each event type. The following subsections provide a detailed description of each interpolation technique along with their corresponding workflow. We will follow the general order of the decision tree shown in Figure 4, starting with MBR-Interpolation. Then, we will present CP-Interpolation and build on it with the description of the FI-Interpolation technique. Lastly, Areal-Interpolation will be presented, which is used in cases when other techniques fail.

Table 2.  Interpolation Techniques for Different Event Types

Event Type | Available Spatial Information | Interpolation Algorithm
Active region | MBR and complex polygon | CP-Interpolation or Areal-Interpolation
Coronal hole | MBR and complex polygon | CP-Interpolation or Areal-Interpolation
Emerging flux | MBR only | MBR-Interpolation
Filament | MBR and complex filament polygon | FI-Interpolation or Areal-Interpolation
Flare | MBR only | MBR-Interpolation
Sigmoid | MBR only | MBR-Interpolation
Sunspot | MBR and complex polygon | CP-Interpolation or Areal-Interpolation

3.1. Minimum Bounding Rectangle (MBR) Interpolation

The MBR-Interpolation technique is used when the only spatial information available for event instances is their MBR representation. Those event types are emerging flux, flares, and sigmoids. MBR-Interpolation is the simplest technique, as it involves only matching two points, to which standard linear interpolation is then applied. The formula used for linear interpolation is shown in Equation (3), where $p_1$ and $p_2$ are the point locations at times $t_1$ and $t_2$, respectively:

$p(t) = p_1 + \dfrac{t - t_1}{t_2 - t_1}\,(p_2 - p_1)$    (3)

The upper left and lower right corners of the two given polygons are first matched, as shown in Figure 5. Then, according to the number of polygons to be generated (which is decided earlier), interpolated corner points are computed using linear interpolation, and the resulting rectangles are paired with their respective time stamps. An illustration of MBR-Interpolation is shown in Figure 5, the MBR-Interpolation workflow is shown in Figure 6, and the algorithm is presented in Algorithm 1 of the Appendix.
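
A minimal Python sketch of this procedure is given below, assuming MBRs represented as (minx, miny, maxx, maxy) tuples and numeric time stamps; interpolating the lower-left and upper-right corners is equivalent to interpolating the upper-left and lower-right corners for axis-aligned rectangles. Function names are illustrative, not those of our implementation.

import numpy as np
from shapely.geometry import box


def lerp(p1, p2, t1, t2, t):
    """Linear interpolation of a point between times t1 and t2 (Equation (3))."""
    return p1 + (t - t1) / (t2 - t1) * (p2 - p1)


def mbr_interpolate(mbr1, t1, mbr2, t2, times):
    """Interpolate axis-aligned rectangles between two MBRs given as
    (minx, miny, maxx, maxy) tuples reported at times t1 and t2."""
    mbr1, mbr2 = np.asarray(mbr1, float), np.asarray(mbr2, float)
    rects = []
    for t in times:
        minx, miny = lerp(mbr1[:2], mbr2[:2], t1, t2, t)   # lower-left corner
        maxx, maxy = lerp(mbr1[2:], mbr2[2:], t1, t2, t)   # upper-right corner
        rects.append(box(minx, miny, maxx, maxy))
    return rects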

Figure 5. Example MBR-Interpolation with 4 hr cadence and three interpolated spatial regions shown by dotted lines.

3.2. Complex Polygon (CP) Interpolation

CP-Interpolation starts with transforming the polygon-based region geometries into centroid shape signatures. Centroid shape signatures are 1D functions derived from the contour of a shape, and they provide a compact representation of the shape (see Figure 7 for an example). In this paper, we use the standard centroid shape signature, which captures the shape of a geometry by computing the sequence of (Euclidean) distances of every point coordinate from the centroid of the geometry. The shape signature is derived by starting at a point coordinate $p_0$ on the boundary of the geometry and proceeding in a clockwise direction, computing a distance for each point coordinate on the boundary. In analogy with a time series, which records discrete values of data points in chronological order, a centroid shape signature records, in order, the discrete distances of the boundary point coordinates from the centroid. Figure 7 illustrates an example of a centroid shape signature starting from the first endpoint $p_0$.
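
A compact sketch of the centroid shape signature computation, assuming shapely polygons (the function name is illustrative):

import numpy as np
from shapely.geometry import Polygon


def centroid_shape_signature(polygon: Polygon) -> np.ndarray:
    """Distances from each boundary vertex to the polygon centroid, in boundary order."""
    cx, cy = polygon.centroid.x, polygon.centroid.y
    coords = np.asarray(polygon.exterior.coords[:-1])  # drop the repeated closing vertex
    return np.hypot(coords[:, 0] - cx, coords[:, 1] - cy)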

Figure 6. Workflow for the MBR-Interpolation algorithm.

Figure 7. Centroid shape signature of a polygon.

Since a centroid shape signature can start at any point coordinate of the shape geometry, it is not starting-point invariant. We use the Dynamic Time Warping (DTW) technique (Juang 1993) to align the two input polygons represented as centroid shape signatures. The result of DTW is a many-to-many matching between the values of the two centroid shape signatures; an example of DTW alignment is shown in Figure 8. Linear interpolation is then applied between the matched pairs of points to generate the interpolated polygons. Our algorithm is composed of four main parts: (1) centroid shape signature polygon transformation, (2) sliding window search, (3) point matching with DTW, and (4) linear interpolation. All the steps are illustrated in Figure 9.

The first step of the centroid shape signature transformation in CP-Interpolation is oversampling the points in the polygons. This is crucial to guarantee adequate coverage of all important boundary details and an accurate centroid shape signature representation. The resultant densified polygons are then transformed into centroid shape signatures. Next, in order to allow a search for the proper rotational alignment, the values of the source signature are repeated for a second revolution around the boundary and concatenated to the original signature. The target centroid shape signature (from the second input tgp) is then searched for within the replicated source signature using a sliding window. The sliding window allows a search for the alignment most similar to the first input centroid shape signature. Figure 10(a) shows an example of three centroid shape signatures, with their corresponding starting points, derived from the same geometry. The measure used to discriminate between sliding windows is the DTW warping distance, which quantitatively reflects the similarity between two shapes: a higher warping distance indicates a lower similarity between the two polygons, and vice versa. It is worth noting that a slower-moving sliding window results in a finer granularity and, presumably, a higher-quality search. Finer granularity in this context means a higher number of sliding window steps; the search space is thus larger and the shape accuracy is expected to be better, though the time complexity increases.

There are a number of ways of measuring the (dis)similarity between two centroid shape signatures. One is to match every point of the first shape signature to a point of the second in a one-to-one fashion; this is referred to as the Euclidean similarity measure, and it is used in the MBR-Interpolation method. Another approach is DTW, referred to as an elastic measure because it finds the best possible alignment between two centroid shape signatures by looking ahead at later points of the signatures to find the best matches (Müller 2007). Unlike the naive Euclidean distance measure, DTW allows a point from one centroid shape signature to be matched to multiple points in the other in a many-to-one fashion. The point matching step therefore relies on DTW, which compares two shape signatures by mapping points of the first centroid shape signature to the second. A cumulative distance is calculated by summing the distance from each point in the first centroid shape signature to the point(s) to which it was mapped in the second centroid shape signature.

The result of the DTW is a warping path, a list of pairs of coordinates that specifies the matching between the two shape signatures. Each pair of coordinates (p1, p2) contains a point p1 from the first shape signature and a point p2 from the second centroid shape signature. A point may appear more than once in the warping path, since DTW creates a many-to-many matching: a point from either centroid shape signature can be matched to one or more points of the other. An example of the DTW technique is shown in Figure 8, where every dashed line shows a pair of coordinates that is part of the final warping path. Finally, we apply linear interpolation, with a time step t as input, to every pair of points in the warping path to find the point coordinates that constitute the interpolated geometry. Similar to MBR-Interpolation, a number of interpolated centroid shape signatures are generated based on a user-defined time cadence and then translated into interpolated polygons. If CP-Interpolation generates a faulty polygon (defined as a topologically invalid polygon, e.g., one with self-intersection), the Areal-Interpolation method is used instead. An illustration of the CP-Interpolation workflow is shown in Figure 9, and the detailed algorithm is shown in Algorithm 2 of the Appendix.
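
The following sketch illustrates steps (2) and (3): the sliding-window search for the rotational alignment that minimizes the DTW warping distance. It uses a textbook DTW cost computation and an illustrative step size, and is a simplified sketch rather than the exact implementation of Algorithm 2.

import numpy as np


def dtw_distance(a, b):
    """Warping distance between two 1D sequences (textbook dynamic programming)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def best_rotation_offset(sig_src, sig_tgt, step=5):
    """Slide a window over the doubled source signature (two revolutions around the
    boundary) and return the starting-point offset with the smallest warping distance."""
    doubled = np.concatenate([sig_src, sig_src])
    best_offset, best_dist = 0, np.inf
    for offset in range(0, len(sig_src), step):
        dist = dtw_distance(doubled[offset:offset + len(sig_src)], sig_tgt)
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset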

Figure 8. Example of DTW alignment.

Figure 9. Workflow for the CP-Interpolation algorithm.

Figure 10. (a) Sliding window over the centroid shape signature of the first geometry, with three different starting points shown in red; (b) fixed shape signature of the second geometry to match.

3.3. Filament (FI) Interpolation

FI-Interpolation is a special case of CP-Interpolation adjusted for the filament event type. Filaments differ from our other event types owing to their long and thin structure, which makes it difficult for CP-Interpolation to correctly interpolate their shape. We provide a detailed investigation of the improved interpolation ability of FI-Interpolation over CP-Interpolation for filaments in Section 5.2.1. As presented previously in Filali Boubrahimi et al. (2016b), the FI-Interpolation algorithm starts by detecting the endpoints, also known as the filament feet, which are one of the prominent spatial characteristics of filament events (Wang 2008). The two input filament polygons are then transformed into two centroid shape signatures, and DTW is used to align them in a similar way to CP-Interpolation. Our interpolation algorithm for filaments is composed of four main parts: (1) endpoint detection, (2) centroid shape signature polygon transformation, (3) point matching, and (4) linear interpolation. The first step of FI-Interpolation consists of finding the most distant pairs of coordinates, which represent the feet. To do so, we use the k-means clustering algorithm (Hartigan & Wong 1979). We first determine the M% of boundary coordinates that are most distant from the centroid of the input filament polygon (where M is a user-defined parameter). These points form the input set to the k-means clustering algorithm, which groups them into two clusters. Lastly, the most distant coordinate within each respective cluster is selected as an endpoint.
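
A sketch of this endpoint (feet) detection step, assuming shapely polygons and scikit-learn's k-means; the default M value and the function name are illustrative.

import numpy as np
from shapely.geometry import Polygon
from sklearn.cluster import KMeans


def filament_endpoints(polygon: Polygon, m_percent: float = 20.0):
    """Estimate the two filament 'feet' as the farthest boundary point within each
    of two k-means clusters built from the M% most distant boundary vertices."""
    coords = np.asarray(polygon.exterior.coords[:-1])
    centroid = np.array([polygon.centroid.x, polygon.centroid.y])
    dist = np.linalg.norm(coords - centroid, axis=1)

    # keep the M% of boundary vertices farthest from the centroid
    k = max(2, int(len(coords) * m_percent / 100.0))
    far_idx = np.argsort(dist)[-k:]
    far_pts, far_dist = coords[far_idx], dist[far_idx]

    # split the distant vertices into two clusters (one per foot)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(far_pts)
    endpoints = []
    for label in (0, 1):
        members = np.where(labels == label)[0]
        endpoints.append(far_pts[members[np.argmax(far_dist[members])]])
    return endpoints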

The next step is to transform the input filament polygons into centroid shape signatures starting from their respective endpoints. Given the unique boundaries of filaments, it is important to densify boundary points to ensure an accurate signature model, similar to CP-Interpolation.

Another important aspect of centroid shape signatures is the choice of the starting point, which will later be used for matching different geometries. One property of filaments is that, while moving across the solar disk, they do not tilt by more than ±40°, according to data captured from 1919 to 2014 (Tlatov et al. 2016). Consequently, the top endpoint of the first filament geometry is matched to the top endpoint of the second filament geometry.

An illustration of the workflow of FI-Interpolation is shown in Figure 11. The FI-Interpolation takes as input the two starting points of the respective two input polygons to be matched. The first step is to transform all the 2D coordinates of the boundary into 1D centroid shape signature values. Then, DTW is applied to find the best alignment between the two input centroid shape signatures. Finally, point interpolation is applied between every pair of coordinates of the warping path to derive the point coordinates of the interpolated geometries. If FI-Interpolation generates a faulty polygon, Areal-Interpolation is used instead. The algorithm for filament interpolation is shown in Algorithm 4 of the Appendix.

Figure 11. Workflow for the FI-Interpolation algorithm.

3.4. Areal-Interpolation

Areal-Interpolation follows a simpler procedure. Figure 12 shows an example of Areal-Interpolation of two coronal holes, and Algorithm 5 in the Appendix provides the detailed algorithm. To deduce the interpolated area, we apply linear interpolation to the areas of the two input geometries at the corresponding interpolation time; we call the result the desired area. The next step consists of choosing the input geometry with the closest temporal proximity, which will later be transformed into the interpolated one. For example, if the time of interpolation is closer to the first time–geometry pair $tg_1$, then the start geometry is used as the interpolated polygon shape. Finally, in order to reach the desired area, the chosen input geometry is scaled either up or down.
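
A minimal sketch of this procedure using shapely, under the simplifying assumption that the chosen geometry is scaled about its own centroid (any repositioning of the result is omitted here); names are illustrative.

from shapely.affinity import scale
from shapely.geometry import Polygon


def areal_interpolate(g1: Polygon, t1: float, g2: Polygon, t2: float, t: float) -> Polygon:
    """Scale the temporally closer input geometry so that its area matches the
    linearly interpolated ('desired') area at time t."""
    w = (t - t1) / (t2 - t1)
    desired_area = (1.0 - w) * g1.area + w * g2.area

    base = g1 if abs(t - t1) <= abs(t - t2) else g2   # closest temporal proximity
    factor = (desired_area / base.area) ** 0.5        # area scales with the square of the factor
    return scale(base, xfact=factor, yfact=factor, origin='centroid')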

Figure 12. Example of Areal-Interpolation of two coronal holes from left to right.

Areal-Interpolation is a naive method compared to CP-Interpolation and FI-Interpolation, and it is used only when these methods fail. CP-Interpolation and FI-Interpolation use centroid shape signatures of the geometries, which account for changes in the geometric properties of the shapes (rotation, translation, and scale), and they take into account the correctness of the interpolated geometry's morphism, location, and area. Areal-Interpolation, on the other hand, considers only the correctness of the areal and location estimations. It does not directly estimate the geometry shapes; it employs a simple spatial scaling of the temporally closer input geometry, which makes it naive in predicting the transition shape between the input geometries. Areal-Interpolation has been used less than 1% of the time.

4. Experimental Methodology

An ideal interpolation method should account for the positional accuracy based on the solar rotation, shape morphism, and areal evolution of an event instance. An interpolated geometry is considered completely accurate if it perfectly aligns with an existing ground-truth report at the given interpolation time. Therefore, all of the experiments in this section compare the ground-truth polygons with the interpolated polygons for evaluation. The comparison process starts by iterating through every tracked trajectory and uses the evenly spread pairs ($\mathcal{P}_n$, $\mathcal{P}_{n+2}$) of geometries to interpolate the skipped geometry $\mathcal{P}_{n+1}'$ and compare it with the ground-truth skipped geometry $\mathcal{P}_{n+1}$ that has been generated by an event recognition module, implemented by solar physicists. An example is shown in Figure 13, where the first interpolated geometry $\mathcal{P}_1'$ was generated using $\mathcal{P}_0$ and $\mathcal{P}_2$, the second interpolated geometry $\mathcal{P}_3'$ was generated using $\mathcal{P}_2$ and $\mathcal{P}_4$, and so on.
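
The skip-one comparison can be sketched as follows, where interpolate and compare are placeholders for any of the interpolation methods of Section 3 and any of the similarity measures defined below; the trajectory is assumed to be a chronologically ordered list of (time, geometry) tuples.

def skip_one_evaluation(trajectory, interpolate, compare):
    """Interpolate every other geometry from its neighbors and score it against
    the ground-truth report that was skipped."""
    scores = []
    for n in range(0, len(trajectory) - 2, 2):
        (t0, g0), (t1, g1), (t2, g2) = trajectory[n], trajectory[n + 1], trajectory[n + 2]
        interpolated = interpolate(g0, t0, g2, t2, t1)   # estimate the skipped geometry
        scores.append(compare(interpolated, g1))         # compare against the ground truth
    return scores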

Figure 13. Comparison process of the ground-truth and interpolated geometries of an active region event type.

We evaluate the interpolation methods against three criteria: areal, location, and shape accuracy. We selected a number of measures for each category of accuracy criteria, each of which has its own domain and constraints. For areal accuracy, Jaccard and cosine measures were selected. The DTW similarity measure and Fourier transform descriptors were used for shape accuracy. Lastly, a derived aggregate measure was used for location accuracy. In the following subsections, we will detail each measure and explain its usefulness and limitations.

4.1. Areal Accuracy Measures

4.1.1. Jaccard Index

The Jaccard similarity index formula is shown in Equation (4). It is a metric that compares the area of two geometries in a Euclidean space by quantifying the ratio of the shared area between two geometries with respect to their combined areas:

$\mathrm{Jaccard}(A, B) = \dfrac{\mathrm{Area}(A \cap B)}{\mathrm{Area}(A \cup B)}$    (4)

The Jaccard index considers not only the size of the shared area but also the size of the shared area relative to the size of the combined union area of two geometries, which makes it a scale-invariant measure.

4.1.2. Cosine Measure

The cosine similarity between two geometries has the same numerator as the Jaccard index, which is the area of the intersection of the two geometries. However, the denominator is the square root of the product of the two areas. The formula for the cosine measure is

$\mathrm{Cosine}(A, B) = \dfrac{\mathrm{Area}(A \cap B)}{\sqrt{\mathrm{Area}(A)\,\mathrm{Area}(B)}}$    (5)
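
Both areal measures are straightforward to compute with shapely polygons; a minimal sketch (function names are ours):

from shapely.geometry import Polygon


def jaccard(a: Polygon, b: Polygon) -> float:
    """Intersection area over union area (Equation (4))."""
    union = a.union(b).area
    return a.intersection(b).area / union if union > 0 else 0.0


def cosine(a: Polygon, b: Polygon) -> float:
    """Intersection area over the geometric mean of the two areas (Equation (5))."""
    denom = (a.area * b.area) ** 0.5
    return a.intersection(b).area / denom if denom > 0 else 0.0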

4.2. Shape Metrics

4.2.1. Dynamic Time Warping

While DTW was used in CP-Interpolation and FI-Interpolation for matching the polygons' points by aligning their centroid shape signatures, the method can also be used as a shape dissimilarity measure. In addition to the warping path, DTW generates a warping distance between two centroid shape signatures that represents the dissimilarity between these two shapes. The larger the warping distance value, the more discrepancy exists between the two shapes.

DTW arranges the two input centroid shape signatures on the sides of an n × m grid (n and m being the length of the two centroid shape signatures), with one centroid shape signature on the top and the other on the left-hand side. Both of the sequences start on the bottom left of the grid. Within each cell of the grid, a distance value is given based on the corresponding elements of the two sequences. The best alignment between the two centroid shape signatures is acquired by looking for the path from the bottom left corner to the top right corner that minimizes the total incremental distance. The total distance is called the warping distance, and it represents the minimum of the sum of the distances between the individual elements on the path. If the shapes match perfectly, which is the best-case scenario, there is a one-to-one pairwise matching between the two sequences, which results in a zero warping distance. Figure 14(a) summarizes the steps involved in the centroid shape signature and DTW shape similarity measure.
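
A textbook dynamic-programming sketch of the grid computation described above, returning both the warping distance (used here as a dissimilarity measure) and the warping path (used for point matching in CP-Interpolation and FI-Interpolation); this is a generic formulation, not our exact implementation.

import numpy as np


def dtw(a, b):
    """DTW on an n x m grid: returns the warping distance and the warping path
    (list of index pairs matching elements of a to elements of b)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    # backtrack from the top-right cell to recover the minimal-cost alignment
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]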

Figure 14. Steps involved in the shape-based measures using (a) centroid shape signature and (b) Fourier transform, both with DTW.

4.2.2. Fourier Transform

While measuring the similarity with DTW helps us identify the similarity of two geometries based on their shapes, there are a couple of weaknesses of this method when not applied properly. The first constraint of the DTW similarity approach is that it is sensitive to choosing the right starting points of both geometries.

Similarly, centroid shape signatures may vary based on the starting point. Furthermore, since each centroid shape signature is a local representation of shape attributes extracted from the spatial domain, it is sensitive to noise. To reduce the limitation of the centroid shape descriptor, one more step can be taken before applying the DTW, which is the transformation of the shape signature from the spatial domain to the frequency domain using Fourier descriptors (Zhang et al. 2001). An advantage of this is the robustness to noise that is attenuated in the frequency domain.

The Fourier transform applied to a discrete series of complex values, which in this case is the centroid shape signature, is called the Discrete Fourier Transform. Figure 14(b) summarizes the steps involved in the Fourier descriptor measure. Generally, a 1D Fourier descriptor is obtained through a Fourier transform of a centroid shape signature function derived from the boundary point coordinates $(x(t), y(t))$, with $t \in \{0, 1, 2, \ldots, N-1\}$. A typical shape signature function is the centroid distance function $r(t)$, which is used in this work. Consider $r(t)$ to be a complex series with N samples of the form $x_0, x_1, x_2, \ldots, x_{N-1}$, where each $x$ is a complex number $x = x_{\mathrm{real}} + j\,x_{\mathrm{imaginary}}$. The Fourier transform of the series is defined by

$F(u) = \dfrac{1}{N}\displaystyle\sum_{t=0}^{N-1} r(t)\, e^{-j 2\pi u t / N}, \qquad u = 0, 1, \ldots, N-1$    (6)

$f(u) = \dfrac{|F(u)|}{|F(1)|}$    (7)

However, if this equation is applied as is, it will still be starting point sensitive. To make the starting point an independent variable, the coefficients returned by the Fourier transform need to be normalized, which is achieved by dividing all of the coefficients by the first nonzero frequency component (Fourier coefficient) as expressed in Equation (7). Figure 15(a) illustrates how four centroid shape signatures, which are out of phase, are represented with the same normalized Fourier descriptor as shown in Figure 15(b). This suggests that the starting point no longer impacts the sequence of Fourier coefficients.
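
A sketch of this normalization using NumPy's FFT; here we take coefficient magnitudes and divide by the first nonzero component, which is one common convention for removing the starting-point and scale dependence (the exact normalization conventions can vary).

import numpy as np


def fourier_descriptor(signature):
    """Starting-point-invariant descriptor of a centroid shape signature:
    DFT coefficient magnitudes divided by the first nonzero magnitude."""
    coeffs = np.fft.fft(np.asarray(signature, dtype=float)) / len(signature)
    mags = np.abs(coeffs)
    first_nonzero = mags[np.flatnonzero(mags > 1e-12)[0]]  # first nonzero Fourier coefficient
    return mags / first_nonzero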

Figure 15. (a) Four out-of-phase centroid shape signatures of the same geometry with their (b) Fourier transform in the frequency domain.

4.2.3. Fréchet Distance Metric

The Fréchet distance is a metric that assesses the similarity of the curvatures of the two shapes, as well as the location and ordering of the points along the two input curves. It is different from the DTW and Fourier measures because the latter do not take into account the location context of the input curves. Fréchet is often referred to in the literature as the "dog-man" measure (Alt & Godau 1995). This refers to the popular analogy of a man walking with a dog on a leash, each of them walking on a different path with different trajectories. The dog and the man can vary their speeds and stop but cannot walk backward. The Fréchet metric is the minimum leash length required to complete the traversal of both curves. Formally, the Fréchet distance between two curves A and B, where α and β are arbitrary continuous nondecreasing functions with [0, 1] domain, is given by

$\delta_F(A, B) = \inf_{\alpha,\,\beta}\ \max_{t \in [0,1]} \big\| A(\alpha(t)) - B(\beta(t)) \big\|$    (8)

In this context, the Fréchet distance refers to the leash length between the boundaries of the two input geometries, and the distance between the centroid shape signatures represents the similarity between the interpolated and ground-truth geometries. We show an example of the different leash lengths between the trajectories of a dog and a man at different time steps in Figure 16. Here, the Fréchet distance happens to be the final leash length, which is the smallest leash length required to jointly traverse both curves. The Fréchet distance between two geometries is often better than measures that do not take into account the ordering of the points along the shape geometries, such as the Hausdorff distance, which we discuss later (Eiter & Mannila 1994).
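
A dynamic-programming sketch of the discrete Fréchet distance (Eiter & Mannila 1994) between two curves given as arrays of 2D points (e.g., sampled boundaries); the function name is illustrative.

import numpy as np


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines P and Q (arrays of 2D points),
    following the dynamic-programming formulation of Eiter & Mannila (1994)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    ca = np.zeros((n, m))
    ca[0, 0] = np.linalg.norm(P[0] - Q[0])
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], np.linalg.norm(P[i] - Q[0]))
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], np.linalg.norm(P[0] - Q[j]))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           np.linalg.norm(P[i] - Q[j]))
    return ca[n - 1, m - 1]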

Figure 16. Fréchet distance between a man and a dog.

4.3. Location Distance Metrics

Location accuracy measures are particularly useful when the ground-truth and interpolated geometries do not overlap in space. Since there is no intersection between the geometries, their Jaccard index value is zero. A total miss in spatial position can still be acceptable when the geometry shapes are similar and they are reasonably close in space. Geometries that have high overlap but different shapes should be considered as good as geometries that are similar in shape and spatially close (but not intersecting); an example of each is shown in Figure 17. For the first case, shown in Figure 17(a), the interpolated and ground-truth geometries are spatially close (which results in a high location accuracy) but do not overlap. Alternatively, in Figure 17(b), the interpolated geometry's accuracy is penalized by the shape measures; however, the location and areal accuracies are relatively high.

Figure 17. Ground-truth and interpolated geometries of an active region that are (a) shapewise similar and nonoverlapping and (b) dissimilar shapewise but overlapping.

4.3.1. Simple Distance Measures

The simple distances are three nonaggregated measures: minimum (min), centroid (ctr), and maximum (max) distances. The minimum distance refers to the minimum pairwise Euclidean distance between the two geometries. When two geometries are intersecting, the minimum distance is zero and the Jaccard value of the two geometries is nonzero. The second measure is the centroid distance, which refers to the Euclidean distance between the two geometries' centroids. The maximum pairwise distance refers to the maximum Euclidean distance between the vertices of two geometries.

4.3.2. Aggregated Min–Max Distance Measure

The formula for the min–max metric is given in Equation (9). It is a Euclidean metric that averages the minimum distance $D_{\min}$ and the maximum distance $D_{\max}$ between the interpolated and ground-truth geometries:

$D_{\mathrm{minMax}} = \dfrac{D_{\min} + D_{\max}}{2}$    (9)

4.3.3. Aggregated Min-centroid Distance Measure

Another aggregated location accuracy measure is the min-centroid measure, which is a Euclidean metric that averages the minimum and centroid distance between the interpolated and ground-truth geometries. The formula for the min-centroid metric is given by

$D_{\mathrm{minCtr}} = \dfrac{D_{\min} + D_{\mathrm{ctr}}}{2}$    (10)
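
The simple and aggregated location measures can be computed with shapely and NumPy as in the following sketch (function and key names are ours):

import numpy as np
from shapely.geometry import Polygon


def location_distances(a: Polygon, b: Polygon) -> dict:
    """Simple (min, ctr, max) and aggregated (minMax, minCtr) location measures."""
    d_min = a.distance(b)                         # minimum pairwise distance; 0 if intersecting
    d_ctr = a.centroid.distance(b.centroid)       # distance between the two centroids

    va = np.asarray(a.exterior.coords)
    vb = np.asarray(b.exterior.coords)
    pairwise = np.linalg.norm(va[:, None, :] - vb[None, :, :], axis=-1)
    d_max = pairwise.max()                        # maximum pairwise vertex distance

    return {'min': d_min, 'ctr': d_ctr, 'max': d_max,
            'minMax': (d_min + d_max) / 2.0,      # Equation (9)
            'minCtr': (d_min + d_ctr) / 2.0}      # Equation (10)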

4.3.4. Hausdorff Distance Measure

The Hausdorff distance is a metric that is used to assess how far two subsets of a metric space are from each other (Huttenlocher et al. 1993). Formally, the Hausdorff distance between two given finite point sets $A = \{a_1, \ldots, a_p\}$ and $B = \{b_1, \ldots, b_q\}$ is defined by

$H(A, B) = \max\big(h(A, B),\ h(B, A)\big)$    (11a)

$h(A, B) = \max_{a \in A}\ \min_{b \in B}\ \| a - b \|$    (11b)

The $\|\cdot\|$ symbol refers to the Euclidean norm of the points in A and B. The function $h(A, B)$ is referred to as the directed Hausdorff distance: for each point $a_i \in A$ it finds the nearest point $b_j \in B$ (the lowest pairwise Euclidean distance), and it then returns the largest of these nearest-neighbor distances. If $h(A, B) = d$, then there is a guarantee that each point of A is within distance d of B. Since the directed Hausdorff distance h is not a symmetric function, i.e., $h(A, B) \ne h(B, A)$, the metric space property is satisfied by taking the maximum of the h(A, B) and h(B, A) distances.
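
With the boundary vertices of the two geometries given as point arrays, the symmetric Hausdorff distance can be computed, for example, with SciPy's directed Hausdorff routine:

import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two finite point sets (Equation (11a)),
    taken as the maximum of the two directed distances (Equation (11b))."""
    return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])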

5. Experimental Results

In Section 4, we presented a total of 11 measures, which we grouped into three categories. First, there are measures that assess the similarity between the interpolated and ground-truth geometries in terms of area (Jaccard and cosine). Next, we discussed another group of measures that aim to assess the similarity from a location accuracy perspective (min, ctr, max, Hausdorff, minMax, and minCtr). Finally, we discussed the group of shape similarity measures (DTW, Fourier, and Fréchet).

In this section, we will use MBR-Interpolation as the baseline interpolation methodology that we compare with CP-Interpolation and FI-Interpolation for active region (AR), coronal hole (CH), sunspot (SS), and filament (FI) event types. To do so, we will compare the distributions of the chosen subset of measures when the baseline method is used versus when CP-Interpolation and FI-Interpolation are used.

5.1. Similarity Measures Study

It is crucial to choose the right similarity measures that assess the strength of the match between the expected and the ground-truth geometries. An ideal interpolation method is one that preserves positional accuracy with respect to the rotation of the solar disk, as well as the shape morphism and areal evolution of the event geometry. Table 3 shows the summary of the advantages and disadvantages of all the measures.

To better understand how the measures across the three different groups are correlated for every event type, we present their correlation matrices (generated using the Pearson correlation coefficient) in Figures 26–30 in the Appendix. It can be seen from the correlation matrices that some correlations generally persist across different event types, though the correlation strength is not exactly the same. In the next subsections we will study and justify our choice of measures across the three groups of our measures.

5.1.1. Areal Measures

For the areal accuracy measures, Jaccard and cosine are highly positively correlated, with correlation strengths in [0.94, 0.98]. This is expected, as Jaccard and cosine share the same numerator, as shown in Equations (4) and (5). While Jaccard measures the interpolated and ground-truth geometries' area of intersection with respect to their union area, cosine measures the area of intersection with respect to the geometric mean of the two geometry areas. Jaccard is relatively pessimistic compared to the cosine measure because, generally, the geometric mean of the areas is smaller than the area of the union. Nevertheless, each measure has distinct advantages. When the similarity is measured between a large geometry and a smaller one, Jaccard is more pessimistic than cosine, which suggests that cosine assesses the areal similarity between imbalanced shapes more accurately than Jaccard. On the other hand, when assessing the similarity between geometries with similar areal characteristics, it makes more sense to use the pessimistic measure.

5.1.2. Location Measures

Several interesting correlations occur among the location-based similarity measures. Two strong correlations persist over all the event types: between the ctr and minCtr distances (whose correlation is always 1), and between the max and minMax distances (with correlation strengths in [0.98, 1]). The aggregated measures (minCtr and minMax) are correlated with ctr and max, respectively, because the min measure values are negligible compared to the ctr and max values. Besides the fact that the min distribution is skewed to the right, with most of the values being zero, it does not provide any insight into the location accuracy when the two geometries intersect; whether the geometries intersect can instead be inferred from the Jaccard and cosine measures, whose values are zero when they do not. While the min measure can be particularly useful for providing a location accuracy assessment when the geometries are disjoint, this assessment can be biased toward both large and small geometries. Due to these limitations, we excluded min as a similarity measure appropriate for this work.

minCtr is also negatively correlated with the cosine measure, with correlation strengths in [−0.84, −0.58]. This negative correlation signifies that the larger the shared area between the interpolated and ground-truth polygons, the lower their aggregated min and ctr distance.

Figure 18. MBR-Interpolation on a chain code of a polygon.

The correlation strength between the cosine and minCtr measures is particularly weak for the filament (FI) event type, as shown in Figure 19, which can be explained by the unique physical characteristics of filaments. Since filaments are relatively thin polygons that contain barbs originating from their long central spine, it can be challenging to generate an interpolated shape with a centroid that closely matches the ground-truth polygon's centroid. In other words, a high overlap in area between the interpolated and ground-truth shapes may still not guarantee that the two geometries are similar based on distance measures. For example, a filament head may highly overlap with an interpolated filament tail, which makes the cosine value high; but since the head and tail of the two geometries are far apart, the minCtr distance is high as well.

Figure 19. Correlation matrix of similarity measures for filament (FI) events.

On the other hand, the sigmoid event type's spatial location is always reported using the minimum bounding rectangle of the event. The rectangular nature of the sigmoid's reports implies that a centroid is located in the middle of the shape with equal distance from each point on one side to its counterpoint on the other side. These properties ensure that having a high spatial overlap in rectangular geometries, when represented as MBRs (i.e., high cosine), leads to a low minCtr distance, as shown in Figure 20 and Figure 30 in the Appendix.

Figure 20. Correlation matrix of similarity measures for sigmoid (SG) events.

It is also worth noting that there exists a positive correlation between the ctr and max distances, which is high for small- to medium-sized event types such as sigmoids, flares, and sunspots (correlation scores of 0.54, 0.84, and 0.53, respectively) and lower for event types with larger areas, such as coronal holes and active regions (0.39 and 0.36, respectively). This indicates that the max measure is biased by event size. For example, the same max distance value may be very significant for two small events that are far apart, yet negligible for two large events that are close in proximity or even intersecting. Moreover, while max can be correlated with ctr for small events, it is inherently larger for large events. Since the variability between the max and ctr measures depends on event size, max does not give an appropriate assessment of the interpolation quality; for that reason, it is biased and is not a good candidate measure for this application.

Another aggregated measure is the minMax, which takes the mean of the max and the min measures. It is also not an appropriate measure for this application since it combines the aforementioned limitations of the min and the max measures. The centroid (ctr) measure provides the right insight about the spatial proximity of the geometries without being biased toward event sizes. However, minCtr is more informative and thus more powerful than ctr. minCtr not only measures the spatial proximity of the geometries' centroids but also takes into account whether the geometries are overlapping or not. Therefore, we chose to use minCtr as our location-based accuracy measure.

5.1.3. Shape Measures

For the last group, the shape-based similarity measures, no strong correlation between DTW and the other measures persists across all event types. Although DTW is occasionally correlated with the location-based distance measures, as for flares and emerging fluxes in Figures 30 and 29 in the Appendix, it is generally uncorrelated with any other measure. Likewise, Fourier is not well correlated with any measure across the event types. The Fréchet distance, in contrast, is positively correlated with all the distance measures except min for most event types. Therefore, among the shape-based similarity measures, we retain DTW and Fourier.
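As a hedged sketch of the kind of Fourier boundary descriptor retained here (a common construction; the paper's exact formulation may differ), the magnitudes of the Fourier coefficients of the complex boundary are invariant to the starting point and to rotation, at the cost of discarding phase information. Boundaries are assumed to be resampled to a common number of points.

import numpy as np

def fourier_descriptor(boundary_xy: np.ndarray, n_coeffs: int = 16) -> np.ndarray:
    # boundary_xy: (N, 2) array of ordered boundary points, N > n_coeffs
    z = boundary_xy[:, 0] + 1j * boundary_xy[:, 1]
    coeffs = np.fft.fft(z - z.mean())            # centering removes translation
    mags = np.abs(coeffs[1:n_coeffs + 1])        # drop the DC term, keep low frequencies
    return mags / (mags[0] + 1e-12)              # normalize for scale invariance

def fourier_distance(b1: np.ndarray, b2: np.ndarray) -> float:
    return float(np.linalg.norm(fourier_descriptor(b1) - fourier_descriptor(b2)))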

5.2. Properties of Investigated Similarity Measures

5.2.1. Correlation

The last round of experiments compares the performance of our interpolation algorithms against the baseline MBR-Interpolation algorithm. We also compare the CP-Interpolation and FI-Interpolation methodologies to demonstrate the need for the FI-Interpolation algorithm. For the event types whose spatial location is reported as a boundary polygon (chain code), we applied MBR-Interpolation to the MBR of the input chain code and compared the interpolated MBR with the ground-truth chain code. Figure 18 shows the baseline MBR-Interpolation process used for comparison with the CP-Interpolation and FI-Interpolation techniques, and Figure 21 shows the distributions of the five similarity measures for the baseline MBR-Interpolation method versus FI-Interpolation or CP-Interpolation. Our goal in this set of experiments is to show that the baseline MBR-Interpolation method is not sufficient for accurate interpolation and that our algorithms provide a better representation in terms of shape, area, and location.

Figure 21. Histograms of Jaccard, cosine, DTW, Fourier, and minCtr measures using MBR-Interpolation and CP-Interpolation.

The first column of Figure 21 shows the Jaccard value distributions for all event types that are initially reported with chain codes. For every event type except filaments, the MBR-Interpolation Jaccard values have means generally centered in the [0.4, 0.5] range, whereas our CP-Interpolation Jaccard values are negatively skewed and reach much higher values, most frequently falling in the [0.85, 0.9] range for the active region, coronal hole, and sunspot event types. For the filament event type, on the other hand, the Jaccard values most frequently fall near 0. Zero Jaccard values indicate cases where the interpolated and ground-truth geometries are neither intersecting nor touching, which, as explained earlier, is related to the unique shape of filaments and their infrequent reports. The cosine measure exhibits the same behavior as the Jaccard distribution, as the two are highly correlated.
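For reference, the two areal measures compared here can be written as simple shapely computations (a sketch, not the authors' code): the Jaccard index is the standard intersection-over-union, and we assume the cosine measure is its set-based (Ochiai) analogue, area(A∩B)/sqrt(area(A)·area(B)), which would explain why the two track each other so closely.

from shapely.geometry import Polygon

def jaccard(a: Polygon, b: Polygon) -> float:
    inter = a.intersection(b).area
    return inter / (a.area + b.area - inter)

def cosine_area(a: Polygon, b: Polygon) -> float:
    # assumed set-based cosine formulation; the paper's exact formula may differ
    return a.intersection(b).area / ((a.area * b.area) ** 0.5)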

The DTW distribution is positively skewed for all event types, including filaments, with the distribution mean falling in the first bin. This suggests that although the filament geometries are not spatially intersecting, they do look similar in terms of shape. For the Fourier measure, the MBR-Interpolation values are much higher than the CP-Interpolation and FI-Interpolation values, except for the sunspot (SS) event type. As shown in Table 4, sunspot instances are the smallest events in the data set. They are also generally circular and represented by an extremely dense boundary in terms of total point coordinates, which makes it difficult to find the right starting point and estimate the right shape. Therefore, even though DTW is helpful for assessing sunspot shape similarity, it relies on the time series sliding window to find the best-matching centroid shape signature, which is itself an approximate process. This matching step can fail occasionally, especially for round shapes, because many sequences of boundary points share nearly the same radial distance to the centroid of the shape.
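A small numerical check (illustrative only, with synthetic signatures) makes this failure mode concrete: for a nearly circular boundary the centroid shape signature is almost constant, so every cyclic shift matches about equally well and the sliding window has no clearly best starting point, whereas an elongated boundary yields a well-separated best shift.

import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
round_sig = 10.0 + 0.05 * np.sin(5 * theta)                          # sunspot-like, ~constant radius
elongated_sig = np.hypot(20.0 * np.cos(theta), 2.0 * np.sin(theta))  # filament-like radii

def shift_cost_spread(sig):
    # spread of L2 mismatches over all nontrivial cyclic shifts of the signature
    costs = [np.linalg.norm(sig - np.roll(sig, s)) for s in range(1, sig.size)]
    return max(costs) - min(costs)

print(shift_cost_spread(round_sig))       # small spread: ambiguous starting point
print(shift_cost_spread(elongated_sig))   # large spread: well-defined starting point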

Table 3.  Summary of the Evaluation Measures and Their Usefulness for Interpolation of Solar Events

Group | Measure | Advantages | Disadvantages | Interval | Useful
Areal | Jaccard | Useful for assessing shapes with similar characteristics | Does not assess well largely overlapping geometries when one is small and the other very large | [0, 1] | ✓
Areal | Cosine | More useful for assessing areal similarity between imbalanced shapes | Does not penalize shapes that have highly different areas | [0, 1] | ✓
Distance | Min | Useful in case of nonoverlapping geometries | Biased toward small-area geometries and unfair to irregularly shaped (highly concave) large geometries | [0, +∞) | ×
Distance | Max | Useful for almost perfectly overlapping shapes | Biased against highly overlapping large geometries and favors small ones | [0, +∞) | ×
Distance | Ctr | Useful in case of nonoverlapping geometries | Slightly biased toward the area of the geometries | [0, +∞) | ×
Distance | minMax | Useful for very near but nonoverlapping geometries | Biased toward events with large areas | [0, +∞) | ×
Distance | minCtr | Useful for nonoverlapping geometries | Biased against geometries with large areas | [0, +∞) | ✓
Distance | Hausdorff | Calculates the relative closeness of all the points of one geometry to the points of the other | Similar to max; not scale-invariant, so large geometries are unfairly assessed | [0, +∞) | ×
Shape | DTW | Good for shape similarity when the starting point is well selected, which is not always the case, especially for ARs with many point coordinates | Rotation-variant, needs shape alignment, and vulnerable to changes in the starting point | [0, +∞) | ✓
Shape | Fréchet | Assesses curvature similarity | Not rotation- or scale-invariant | [0, +∞) | ×
Shape | Fourier | Starting-point and rotation invariant | Possible loss of information when transforming to a finite frequency domain | [0, +∞) | ✓

Note. The desirable (optimal) value of each measure is 1 for the areal measures and 0 for the distance and shape measures.

Finally, the minCtr values improve significantly when CP-Interpolation or FI-Interpolation is used instead of MBR-Interpolation, with the exception of coronal holes. This finding coincides with the fact that coronal holes are by far the largest structures on average (Table 4), almost twice as large as the second-largest event type (sigmoids; see footnote 1). For large events, even small erroneous fluctuations in the polygon chain code can quickly shift the centroid and mislead the whole interpolation process, whereas MBR-Interpolation improves location accuracy at the price of shape accuracy. It is therefore difficult to estimate the centroids of coronal hole regions with high precision using CP-Interpolation.

To this end, we compared the performance of FI-Interpolation against the general CP-Interpolation using the same set of similarity measures. Figure 22 shows the distributions of the five best similarity measures (see Table 3 for our choices) for FI-Interpolation and CP-Interpolation applied to filaments. The goal of this experiment is to explain why FI-Interpolation, rather than CP-Interpolation, was used for the filament event type. The first observation from Figure 22 is that the Jaccard and cosine distributions are positively skewed, which is expected owing to the unique characteristics of filaments (i.e., thin and long), as indicated previously. Another observation is that FI-Interpolation outperforms CP-Interpolation across all five measures for filaments: the improvement is marginal for DTW but significant for the remaining measures. This suggests that both FI-Interpolation and CP-Interpolation estimate the shapes of the interpolated polygons more accurately than MBR-Interpolation, and that FI-Interpolation is the most accurate at estimating the location of the polygons and the areal coverage of the events. We conclude that, for filaments, the performance of FI-Interpolation subsumes that of CP-Interpolation.

Figure 22. Histograms of Jaccard, cosine, DTW, Fourier, and minCtr measures with FI-Interpolation vs. CP-Interpolation.

Table 4.  Characteristics of the Solar Event Data Sets with Their Corresponding Trajectory Life Spans and Subtrajectory (Segment) Life Spans in Hours

Event Type | Tag | Total Segments | Total Trajectories | Avg. Trajectory Life Span | Avg. Segment Life Span
Active region | AR | 37,175 | 6394 | 17.00 | 8.00
Coronal hole | CH | 2100 | 30,106 | 285.00 | 8.00
Emerging flux | EF | 1967 | 12,262 | 26.00 | 14.00
Filament | FI | 2430 | 2212 | 116.00 | 57.00
Flare | FL | 3357 | 11,645 | 0.86 | 0.36
Sigmoid | SG | 6362 | 15,352 | 18.00 | 6.00
Sunspot | SS | 8353 | 1180 | 172.00 | 13.00

5.2.2. Density

The density plots of the five chosen similarity measures are shown for each event type in Figures 23–25. All values have been normalized using z-score normalization. The density function of Jaccard is generally negatively skewed for most event types, with the exception of filaments. For filaments the Jaccard density is skewed to the right, meaning that most of the Jaccard values lie between 0 and 0.5, which is relatively low compared to the other event types. Similarly, the cosine density is positively skewed for filaments, suggesting that the cosine values tend to be low. The reasons for this tendency are twofold: (1) the minimum reporting interval of filaments from the BBSO and Kanzelhoehe observatories (12 hr) is long relative to the other event types, and (2) the ground-truth and interpolated geometries rarely intersect, mainly owing to the unique spatial properties of filament shapes (thin and long). For the shape-based similarity measures, namely DTW and Fourier, all of the density functions are positively skewed, because the original distributions of these measures are themselves skewed. Similarly, the minCtr density function is positively skewed for all event types between −σ and μ.

Figure 23. Density function of the five similarity measures for Filament Polygon Interpolation (FI-Interpolation) of the filament (FI) event type.

Figure 24. Density function of the five similarity measures for the MBR-Interpolation of the following event types: sigmoid (SG), emerging flux (EF), and flare (FL).

Figure 25. Density function of the five similarity measures for CP-Interpolation of the following event types: active region (AR), coronal hole (CH), sunspot (SS), and filament (FI).

Figure 26. Correlation matrix of similarity measures for active region (AR) events.

Figure 27. Correlation matrix of similarity measures for coronal hole (CH) events.

Figure 28. Correlation matrix of similarity measures for sunspot (SS) events.

Figure 29. Correlation matrix of similarity measures for emerging flux (EF) events.

Figure 30. Correlation matrix of similarity measures for flare (FL) events.

6. Conclusions and Future Works

In this paper, we introduced four interpolation methods that enrich the tracked evolving region trajectories of solar events, which are reported every day by the FFT modules (Martens et al. 2011). Given the wide variability of spatiotemporal characteristics across solar event types, as well as of the outputs and reporting frequencies of the event detection modules, this work is crucial for enriching and standardizing solar event metadata for more robust and reliable data-driven research efforts.

We developed four algorithms to ensure interpolation that is as accurate as possible for each event type. When the only spatial information available for an event instance is its minimum bounding rectangle (MBR), we use the MBR-Interpolation method. CP-Interpolation is the most generally applicable method and is also the more robust at interpolating locations properly; it uses centroid shape signatures and dynamic time warping to accurately interpolate complex polygons over time. FI-Interpolation takes advantage of the spatial characteristics unique to the filament event type to estimate the shape of the interpolated polygons more efficiently and accurately. Finally, Areal-Interpolation is used when CP-Interpolation and FI-Interpolation fail to generate valid polygon geometries.

This is the first research effort that addresses the problem of solar event instance interpolation while incorporating the spatial characteristics of each event type into the process. As a future direction of this work, we plan to create a large-scale solar event trajectory data set using our tracking and interpolation techniques, which can then serve applications such as spatiotemporal co-occurrence pattern and sequence mining, content-based image (and region) retrieval systems, and various other solar data science endeavors.

We would like to thank our anonymous reviewers for their valuable, very detailed, and constructive feedback.

This project has been supported in part by funding from the Division of Advanced Cyberinfrastructure within the Directorate for Computer and Information Science and Engineering, the Division of Astronomical Sciences within the Directorate for Mathematical and Physical Sciences, and the Division of Atmospheric and Geospace Sciences within the Directorate for Geosciences, under NSF award no. 1443061. It was also supported in part by funding from the Heliophysics Living With a Star Science Program, under NASA award no. NNX15AF39G.

The SDO data are available courtesy of NASA/SDO (https://sdo.gsfc.nasa.gov/) and the Atmospheric Imaging Assembly (AIA), Extreme Ultraviolet Variability Experiment (EVE), and Helioseismic and Magnetic Imager (HMI) science teams.

For papers describing the details of particular SDO instruments, as well as the SDO data pipeline, we recommend the following resource: https://sdo.gsfc.nasa.gov/mission/publications.php.

Appendix

Figures 26–30 show how the measures across the three different groups are correlated for each event type. The figures contain the corresponding correlation matrices (generated using the Pearson correlation coefficient).

Algorithm 1. MBR-Interpolation points alignment

Input: Two input polygons - ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$.
Output: Centroid shape signature representations of the input polygons ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$ - ${{\mathfrak{TS}}}_{{\mathfrak{s}}}$ and ${{\mathfrak{TS}}}_{{\mathfrak{e}}}$.
1: ${{centroid}}^{s}\ \leftarrow $ GetCentroid$({{ \mathcal P }}^{s})$
2: ${{centroid}}^{e}\ \leftarrow $ GetCentroid$({{ \mathcal P }}^{e})$
3: $i\ \leftarrow 0$
4: for all vertex ps in ${{ \mathcal P }}^{s}$ (starting from ${{ep}}^{s^{\prime} }$) do
5: ${{\mathfrak{TS}}}_{{\mathfrak{s}}}[i]\ \leftarrow $ GetDistance(${{centroid}}^{s}$, ps)
6: $i\ \leftarrow i+1$
7: end for
8: $i\ \leftarrow 0$
9: for all vertex pe in ${{ \mathcal P }}^{e}$ (starting from ${{ep}}^{e^{\prime} }$) do
10: ${{\mathfrak{TS}}}_{{\mathfrak{e}}}[i]\ \leftarrow $ GetDistance(${{centroid}}^{e}$, pe)
11: $i\ \leftarrow i+1$
12: end for
13: return $\langle {{\mathfrak{TS}}}_{{\mathfrak{s}}}$, ${{\mathfrak{TS}}}_{{\mathfrak{e}}}\rangle $
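The following short Python sketch (not the authors' implementation; shapely is used only to obtain the centroid) mirrors the signature construction of Algorithm 1: each boundary vertex is mapped to its Euclidean distance from the polygon centroid, starting from a chosen vertex.

import math
from shapely.geometry import Polygon

def centroid_shape_signature(vertices, start_index=0):
    # vertices: ordered list of (x, y) boundary points; start_index plays the role of ep'
    cx, cy = Polygon(vertices).centroid.coords[0]
    ordered = vertices[start_index:] + vertices[:start_index]
    return [math.hypot(x - cx, y - cy) for (x, y) in ordered]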

Algorithm 2. Complex Polygon Interpolation

Input: Two input polygons and the sliding factor - ${{ \mathcal P }}^{s}$, ${{ \mathcal P }}^{e}$ and SF.
Output: Centroid shape signature representations of ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$ - ${{\mathfrak{TS}}}_{{\mathfrak{s}}}$ and ${{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }$.
1: ${{\mathfrak{TS}}}_{{\mathfrak{s}}}\ \leftarrow $ ConvertToShapeSignatures$({{ \mathcal P }}^{s})$
2: ${{\mathfrak{TS}}}_{{\mathfrak{e}}}\ \leftarrow $ ConvertToShapeSignatures$({{ \mathcal P }}^{e})$
3: ${{\mathfrak{TS}}}_{{\mathfrak{s}}}^{{\mathfrak{2}}}\ \leftarrow $ ShapeSignaturesConcatenate$({{\mathfrak{TS}}}_{{\mathfrak{s}}})$
4: ${\triangleright }$ ${{\mathfrak{TS}}}_{{\mathfrak{s}}}^{{\mathfrak{2}}}$ is the concatenation of ${{\mathfrak{TS}}}_{{\mathfrak{s}}}$ with itself
5: ${end}\ \leftarrow $ 2 * GetNumberOfVertices$({{\mathfrak{TS}}}_{{\mathfrak{s}}})$
6: ${\triangleright }$ end is twice the length of the shape signature
7: ${w}_{e}\ \leftarrow $ GetNumberOfVertices$({{\mathfrak{TS}}}_{{\mathfrak{s}}})$
8: ${w}_{s}\ \leftarrow 0$
9: $i\ \leftarrow 0$
10: while ws < end do
11: ${window}\ \leftarrow $ GetSubShapeSignatures $({{\mathfrak{TS}}}_{{\mathfrak{s}}}^{{\mathfrak{2}}},{w}_{s},{w}_{e})$
12: ${ \mathcal W }{ \mathcal D }[i]\ \leftarrow $ GetWarpingDistance$({window},{{\mathfrak{TS}}}_{{\mathfrak{e}}})$
13: ${w}_{s}\ \leftarrow {w}_{s}+{SF}$
14: ${w}_{e}\ \leftarrow {w}_{e}+{SF}$
15: $i\ \leftarrow i+1$
16: end while
17: ${tsi}\ \leftarrow $ IndexOfMinimum $({ \mathcal W }{ \mathcal D })$
18: ${\triangleright }$ tsi corresponds to the index of the centroid shape signature window that best matches ${{\mathfrak{TS}}}_{{\mathfrak{e}}}$
19: ${{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }\ \leftarrow $ GetSubShapeSignatures $({{\mathfrak{TS}}}_{{\mathfrak{s}}}^{{\mathfrak{2}}},{tsi}\ast {SF},{tsi}\ast {SF}+$ GetNumberOfVertices$({{\mathfrak{TS}}}_{{\mathfrak{s}}}))$
20: return $\langle {{\mathfrak{TS}}}_{{\mathfrak{s}}},{{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }\rangle $
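A rough Python sketch of the sliding-window matching in Algorithm 2 (assumptions: a plain O(nm) dynamic-time-warping recursion stands in for GetWarpingDistance, and sliding_factor plays the role of SF). It searches the doubled signature of the start polygon for the cyclic rotation that best matches the end polygon's signature.

import numpy as np

def dtw_distance(a, b):
    # classic dynamic-time-warping distance between two 1-D sequences
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def best_rotation(ts_s, ts_e, sliding_factor=1):
    # return the rotation of ts_s whose DTW distance to ts_e is smallest (the analogue of TS_e')
    doubled = list(ts_s) + list(ts_s)
    n = len(ts_s)
    best_start, best_cost = 0, float("inf")
    for start in range(0, n, sliding_factor):
        cost = dtw_distance(doubled[start:start + n], ts_e)
        if cost < best_cost:
            best_start, best_cost = start, cost
    return doubled[best_start:best_start + n]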

Algorithm 3. Polygon Linear Interpolation at time t

Input: Centroid shape signature representation of the first input polygon ${{ \mathcal P }}_{0}$ - ts0, centroid shape signature representation of the second input polygon ${{ \mathcal P }}_{2}$ - ts2, their occurrence times t0 and t2, and the interpolation time t
Output: Interpolated polygon ${{ \mathcal P }}_{1}$
1: ${ \mathcal W }\ \leftarrow $ GetWarpingPath(ts0 ,ts2)
2: for all ${{pt}}_{0},{{pt}}_{2}\in { \mathcal W }$ do
3: ${{pt}}_{1}.x\ \leftarrow {{pt}}_{0}.x+\tfrac{(t-{t}_{0})}{({t}_{2}-{t}_{0})}\ast ({{pt}}_{2}.x-{{pt}}_{0}.x)$
4: ${{pt}}_{1}.y\ \leftarrow {{pt}}_{0}.y+\tfrac{(t-{t}_{0})}{({t}_{2}-{t}_{0})}\ast ({{pt}}_{2}.y-{{pt}}_{0}.y)$
5: end for
6: return ${{ \mathcal P }}_{1}$
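Under the same assumptions as above, Algorithm 3's interpolation step reduces to a per-vertex linear blend along the DTW warping path, as in this sketch (variable names are ours):

def interpolate_vertices(matched_pairs, t, t0, t2):
    # matched_pairs: iterable of ((x0, y0), (x2, y2)) vertex pairs from the warping path
    alpha = (t - t0) / (t2 - t0)
    return [(x0 + alpha * (x2 - x0), y0 + alpha * (y2 - y0))
            for (x0, y0), (x2, y2) in matched_pairs]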

Algorithm 4. Filament Polygon Interpolation

Input: Two input polygons - ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$, M - percentage of furthest points to be taken
Output: Centroid shape signature representation of ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$ - ${{\mathfrak{TS}}}_{{\mathfrak{s}}}$ and ${{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }$.
1: ${{centroid}}^{s}\ \leftarrow $ GetCentroid$({{ \mathcal P }}^{s})$
2: ${{centroid}}^{e}\ \leftarrow $ GetCentroid$({{ \mathcal P }}^{e})$
3: $\langle {{ep}}^{s1},{{ep}}^{s2}\rangle \ \leftarrow $ DetectEndpoints(${{ \mathcal P }}^{s}$,M)
4: $\langle {{ep}}^{e1},{{ep}}^{e2}\rangle \ \leftarrow $ DetectEndpoints(${{ \mathcal P }}^{e}$,M)
5: $\langle {{ep}}^{s^{\prime} },{{ep}}^{e^{\prime} }\rangle \ \leftarrow $ PairEndpoints$({{ \mathcal P }}^{s},{{ \mathcal P }}^{e})$
6: ${\triangleright }$ ${{ep}}^{s^{\prime} }$ and ${{ep}}^{e^{\prime} }$ are the matched endpoints of ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$
7: $i\ \leftarrow 0$
8: for all vertex ps in ${{ \mathcal P }}^{s}$ (starting from ${{ep}}^{s^{\prime} }$) do
9: ${{\mathfrak{TS}}}_{{\mathfrak{s}}}[i]\ \leftarrow $ GetDistance(${{centroid}}^{s}$, ps)
10: $i\ \leftarrow i+1$
11: end for
12: $i\ \leftarrow 0$
13: for all vertex pe in ${{ \mathcal P }}^{e}$ (starting from ${{ep}}^{e^{\prime} }$) do
14: ${{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }[i]\ \leftarrow $ GetDistance(${{centroid}}^{e}$, pe)
15: $i\ \leftarrow i+1$
16: end for
17: return $\langle {{\mathfrak{TS}}}_{{\mathfrak{s}}},{{\mathfrak{TS}}}_{{\mathfrak{e}}}^{{\prime} }\rangle $
procedure DetectEndpoints(${ \mathcal P }$, M)
1: ${ctr}\ \leftarrow { \mathcal P }.{centroid}$
2: for all ${p}_{i}\in { \mathcal P }$ do
3: ${{dist}}_{i}\ \leftarrow $ GetDistance(${p}_{i}$, ctr)
4: ${ \mathcal S }\ \leftarrow { \mathcal S }\cup {{dist}}_{i}$
5: end for
6: ${ \mathcal S }\ \leftarrow $ Sort $({ \mathcal S })$
7: ${ \mathcal S }^{\prime} \ \leftarrow $ GetFurthest $({ \mathcal S }$, M)
8: ${\triangleright }$ M percentage of furthest points to be taken
9: $\langle {{cl}}_{1},{{cl}}_{2}\rangle \ \leftarrow $ 2-Means $({ \mathcal S }^{\prime} )$
10: ${{ep}}_{1}\ \leftarrow $ GetMaxDistance(cl1, ctr)
11: ${{ep}}_{2}\ \leftarrow $ GetMaxDistance(cl2, ctr)
12: return $\langle {{ep}}_{1},{{ep}}_{2}\rangle $
13: end procedure
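A sketch of the DetectEndpoints procedure (scikit-learn's KMeans is assumed to stand in for the 2-means step; parameter names are ours): the M% of boundary points farthest from the centroid are split into two clusters, and the farthest point of each cluster is returned as a spine endpoint.

import numpy as np
from shapely.geometry import Polygon
from sklearn.cluster import KMeans

def detect_endpoints(vertices, m_fraction=0.2):
    pts = np.asarray(vertices, dtype=float)
    ctr = np.array(Polygon(vertices).centroid.coords[0])
    dists = np.linalg.norm(pts - ctr, axis=1)
    k = max(2, int(m_fraction * len(pts)))
    far_idx = np.argsort(dists)[-k:]                       # the farthest M% of the points
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(pts[far_idx])
    endpoints = []
    for c in (0, 1):
        cluster_idx = far_idx[labels == c]
        endpoints.append(tuple(pts[cluster_idx[np.argmax(dists[cluster_idx])]]))
    return endpoints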

Algorithm 5. Areal-Interpolation

Input: Two input polygons - ${{ \mathcal P }}^{s}$ and ${{ \mathcal P }}^{e}$, occurrence time of the two input polygons - ts and te , interpolation time - t, buffer distance bd and small buffer distance epsilon.
Output: Interpolated polygon - ${ \mathcal P }$
1: ${lifespan}\ \leftarrow {t}_{e}-{t}_{s}$
2: if $t\geqslant {t}_{s}+\tfrac{{lifespan}}{2}$ then
3: ${ \mathcal P }\ \leftarrow {{ \mathcal P }}^{e}$
4: else
5: ${ \mathcal P }\ \leftarrow {{ \mathcal P }}^{s}$
6: end if
7: ${desiredArea}\ \leftarrow $ LinearInterpolation$({{ \mathcal P }}^{s}.{area},{t}_{s},{{ \mathcal P }}^{e}.{area},{t}_{e},t)$
8: if ${desiredArea}=={ \mathcal P }.{area}$ then return ${ \mathcal P }$
9: else if ${desiredArea}\gt { \mathcal P }.{area}$ then
10: while ${ \mathcal P }.{area}\lt {desiredArea}$ do
11: ${ \mathcal P }.{buffer}({bd})$
12: if ${ \mathcal P }.{area}\gt 0.8\ast {desiredArea}$ then
13: ${bd}=\epsilon $
14: end if
15: end while
16: else
17: while ${ \mathcal P }.{area}\gt {desiredArea}$ do
18: ${ \mathcal P }.{buffer}(-{bd})$
19: if ${ \mathcal P }.{area}\lt 1.2\ast {desiredArea}$ then
20: ${bd}=\epsilon $
21: end if
22: end while
23: end if
procedure LinearInterpolation(area1, ts, area2, te, t)
1: ${area}1\ \leftarrow {area}1+\tfrac{t-{t}_{s}}{{t}_{e}-{t}_{s}}\ast ({area}2-{area}1)$
2: return area1
3: end procedure
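The buffering loop of Algorithm 5 can be sketched compactly with shapely's buffer operation (bd and eps below are illustrative defaults, not values from the paper): the temporally nearer input polygon is grown or shrunk until its area reaches the linearly interpolated target, with a finer buffer step near the target to limit overshoot.

from shapely.geometry import Polygon

def areal_interpolation(p_s: Polygon, p_e: Polygon, t_s: float, t_e: float,
                        t: float, bd: float = 1.0, eps: float = 0.05) -> Polygon:
    alpha = (t - t_s) / (t_e - t_s)
    target = p_s.area + alpha * (p_e.area - p_s.area)     # linearly interpolated area
    poly = p_e if t >= t_s + (t_e - t_s) / 2.0 else p_s   # start from the nearer polygon
    step = bd
    if poly.area < target:
        while poly.area < target:
            poly = poly.buffer(step)
            if poly.area > 0.8 * target:
                step = eps                                # finer steps near the target
    elif poly.area > target:
        while poly.area > target:
            poly = poly.buffer(-step)
            if poly.area < 1.2 * target:
                step = eps
    return poly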

Footnotes

1 The sigmoids' polygon boundaries are not reported, and their MBRs are not tight (i.e., not truly minimal); therefore, they are reported as large structures.
