Event classification of volcanic earthquakes based on K-Means clustering: Application on Anak Krakatau Volcano, Sunda Strait

Quickly recognizing physical changes in a volcano and the accompanying phenomena at each stage of an eruption is important for mitigating volcanic hazards. Automatic classification of volcanic earthquake types is required, especially because the volume of data recorded by seismic equipment qualifies as big data, and analyzing it manually takes a great deal of time. Therefore, we use unsupervised machine learning, namely K-means clustering, to build an automated system that classifies volcanic events based on their waveform and spectrum characteristics. We examine the clustering of volcanic earthquakes at Anak Krakatau volcano, Sunda Strait, during June to July 2014, using data from a single seismic station, KRA4. We successfully applied the K-means clustering method and found three clusters that represent volcanic earthquake types based on the characteristics of the waveform in the time and frequency domains. Cluster 1 is characterized by a rapid increase in amplitude within a few seconds, followed by a gradual decrease with time, and a dominant frequency range of 4-4.7 Hz. Cluster 2 is characterized by a gradual increase within a few seconds, followed by a gradual decrease with time, and a dominant frequency range of 6-6.5 Hz. Cluster 3 is characterized by a gradual increase within a few seconds, followed by a gradual decrease over a longer duration, and a dominant frequency range of 7-7.5 Hz. This study is useful for automatically classifying the continuously generated big data of daily volcanic activity in support of volcanic hazard mitigation.


Introduction
Anak Krakatau is a volcanic island located in the Sunda Strait, between the Indonesian islands of Java and Sumatra. It emerged from the sea following a series of powerful volcanic eruptions and a massive collapse of the pre-existing Krakatoa volcano in 1883. The 1883 eruption of Krakatoa was one of the most catastrophic volcanic events in recorded history, with significant global impacts [1,2]. The Sunda Strait, where Anak Krakatau is located, is a narrow waterway connecting the Java Sea to the Indian Ocean. Due to its location and geological setting, the region is known for its high volcanic and tectonic activity. Anak Krakatau is considered a relatively young and active volcano, formed as a consequence of ongoing volcanic activity in the region. It has continued to grow since its emergence in the early 20th century through regular eruptions of varying intensity. Most recently, part of the volcanic body of Anak Krakatau collapsed in December 2018 [3]. This collapse generated a tsunami, directed southwest from the active crater, that caused many casualties [3,4]. This phase of activity resembles the 1953 eruption style. The cycle of large eruptions of Anak Krakatau that cause the volcanic body to collapse is about 70 years. It is therefore important to conduct research focused on disaster mitigation to minimize volcanic risk in the future.
K-means clustering is a widely used unsupervised machine learning algorithm for data analysis and pattern recognition [5,6]. Although K-means clustering is not specific to volcanology, it can be applied to analyze various aspects of volcano-related datasets. Because volcanoes generate various types of earthquakes during their daily activity, automatic analysis is very important for understanding the big data of volcanic activity: identifying the type of each volcanic earthquake manually is impractical given time constraints. Many volcanologists have focused on this field in recent years [7,8,9]. We use unsupervised machine learning based on K-means clustering to classify volcanic earthquakes automatically and identify their types.
In this study, we apply automatic clustering of volcanic earthquakes to the activity of Anak Krakatau during June to July 2014, a period in which several types of volcanic earthquakes were observed. This study is useful in volcanic hazard assessment and contributes to volcano monitoring and risk mitigation strategies.

Data and Methods
We use continuous seismic data from June 1 to July 30, 2014, recorded by the Sumatera Institute of Technology (ITERA). During June to July 2014, various volcanic earthquakes were observed with varying magnitudes according to the report of the Center of Volcanology and Geological Hazard Mitigation of Indonesia [10]. We analyze the seismic waveforms recorded at the KRA4 station, which is located about 3.84 km from the active crater (see Figure 1). The seismic station is a 1-component short-period seismometer with a sampling rate of 100 Hz and an A/D resolution of 24 bits. We use the ratio of the short-time average to the long-time average (STA/LTA) to detect volcanic earthquakes at Anak Krakatau volcano during June to July 2014 (see Figure 2). The STA and LTA time windows are about 10 s and 80 s long, respectively, and the STA/LTA ratio threshold for event detection is about 3.5. We cut each waveform using a time window of about 30 s. We retain only high-quality volcanic earthquake waveforms, which are characterized by a high signal-to-noise ratio (S/N); the noise level is measured in the time window before the arrival time of the P-wave. We successfully detected 120 events with high-quality waveforms during June to July 2014. We parameterized each event with five features that serve as input to the K-means clustering: mean, max amplitude, rate of attack, rate of decay, and dominant frequency [7]. These parameters are defined in Eq. 1 to Eq. 5: Mean (Eq. 1), Max Amplitude (Eq. 2), Rate of Attack (Eq. 3), Rate of Decay (Eq. 4), and Dominant Frequency (Eq. 5). Here the mean is that of the event array in m/s; s is the event array, n is the total number of samples in the array, and i is the sample index.
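As an illustration of the parameterization described above, the following minimal Python sketch computes the five features for one event waveform. The function name `extract_features` is hypothetical, and the definitions of the rate of attack and the rate of decay (peak amplitude divided by the rise time and by the fall time, respectively) are assumptions made for this sketch; the exact definitions are those of [7] and may differ.

```python
import numpy as np

def extract_features(s, fs=100.0):
    """Return [mean, max amplitude, rate of attack, rate of decay, dominant frequency].

    s  : 1-D event array (for example a 30 s window sampled at 100 Hz)
    fs : sampling rate in Hz
    """
    s = np.asarray(s, dtype=float)
    n = s.size
    mean = s.mean()                      # mean of the event array
    env = np.abs(s)
    i_peak = int(np.argmax(env))         # sample index of the peak amplitude
    max_amp = env[i_peak]                # max amplitude

    # ASSUMED definitions, for illustration only:
    # attack = peak amplitude / time from window start to peak
    # decay  = peak amplitude / time from peak to window end
    rise_time = max(i_peak, 1) / fs
    fall_time = max(n - 1 - i_peak, 1) / fs
    rate_attack = max_amp / rise_time
    rate_decay = max_amp / fall_time

    # Dominant frequency: frequency of the largest amplitude-spectrum peak
    spectrum = np.abs(np.fft.rfft(s - mean))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    dominant_freq = freqs[int(np.argmax(spectrum))]

    return np.array([mean, max_amp, rate_attack, rate_decay, dominant_freq])
```

Applying this function to each of the detected events would yield one five-element row per event, which is the kind of feature matrix a clustering step takes as input.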

The result of K-means calculation
The results of the K-means clustering calculation in two dimensions are shown in Figure 3. Three clusters are generated. Clusters 1, 2, and 3 contain about 66, 25, and 9 events, respectively. Each cluster is concentrated in a distinct region of the plot, which indicates that unsupervised machine learning based on K-means clustering is applicable in our analysis. Figure 4 shows example waveforms that represent the characteristics of each cluster. The three clusters have different waveform characteristics, which suggests that they represent different volcanic earthquake types. The Cluster 1 waveform increases rapidly to its maximum amplitude within about 1-1.5 s of onset, then decays over about 4-5 s. The Cluster 2 waveform increases gradually to its maximum amplitude within about 1-2 s of onset, then decays over about 4-5 s. The Cluster 3 waveform increases gradually to its maximum amplitude within about 2-2.5 s of onset, then decays over a longer duration of 6-8 s.
Figure 5 shows example spectra that represent the characteristics of each cluster, corresponding to the waveform events in Figure 4. The spectra of the clusters have maximum amplitudes at different dominant frequencies. The dominant frequency ranges of Clusters 1, 2, and 3 are about 4-4.7 Hz, 6-6.5 Hz, and 7-7.5 Hz, respectively. These different dominant frequency ranges probably represent different volcanic earthquake types. We discuss the waveform and spectrum characteristics of each cluster in the Discussions section.
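The clustering and the two-dimensional view of Figure 3 can be sketched as follows. This is a minimal NumPy-only illustration, not the implementation used in this study: the helper names (`standardize`, `kmeans`, `pca_2d`), the feature standardization step, and the deterministic farthest-point initialization are all assumptions chosen to keep the example short and reproducible.

```python
import numpy as np

def standardize(X):
    """Scale each feature to zero mean and unit variance (assumed preprocessing)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std

def kmeans(X, k=3, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # next center: the point farthest from its nearest existing center
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(X[int(np.argmax(d.min(axis=1)))])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)     # assignment step
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])                               # update step
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # final assignment, consistent with the returned centers
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers

def pca_2d(X):
    """Project the data onto its first two principal components (for plotting)."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T
```

Clustering is done in the full feature space and PCA is used only to draw the result in two dimensions. Cluster labels returned by K-means are arbitrary integers, so mapping them to "Cluster 1, 2, 3" is a matter of convention.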

Discussions
The waveform shapes of each cluster, represented by events 85, 13, and 76 for Clusters 1, 2, and 3, respectively, are shown in Figure 6. The Cluster 1 waveform is characterized by a rapid increase within a few seconds of onset, followed by a gradual decrease with time. This characteristic may correspond to a Volcanic Explosion event [7,9,11,12]. The Cluster 2 waveform is characterized by a gradual increase within a few seconds, followed by a gradual decrease with time. This characteristic may correspond to a shallow Volcano-Tectonic (VT-B) event [7,9,11]. The Cluster 3 waveform is characterized by a gradual increase within a few seconds, followed by a gradual decrease over a longer duration than in Clusters 1 and 2. This characteristic may correspond to a Hybrid event [7]. Figure 7 shows the smoothed spectra of events 85, 13, and 76, which correspond to Clusters 1, 2, and 3, respectively. The spectra of these events clearly show different dominant frequencies, of about 4.6 Hz, 6.06 Hz, and 7.3 Hz, respectively. This suggests that different clusters represent different volcanic earthquake types. The earthquake types inferred for each cluster from the dominant frequency range are probably the same as those inferred from the waveform characteristics described above [9,11,13,14]. During June-July 2014, the PVMBG reported volcanic explosions at a rate of about 50-60 explosion events per month, along with various other volcanic earthquakes that may be related to shallow volcanic activity [15].
Figure 8 shows the silhouette score, which measures how similar the events within each cluster are and how well the clusters are separated [16,17]. The score is about 0.84 for all three clusters, which indicates a high similarity index. This means that the events within each cluster have similar waveform shapes and dominant frequencies.
Classification of volcanic events using machine learning, as in this study, is useful in volcano monitoring because it is automatic. Manual analysis of continuous seismic data is not efficient: the number of seismic events often becomes too large to classify accurately by hand, especially when classification is most needed, such as during periods of unrest. One purpose of automatic volcanic event classification is to determine the status of a volcano more precisely, which supports preparedness for volcanic risk reduction. Automatic volcanic event classification has been examined at Cotopaxi volcano, Ecuador, to classify seismic events automatically and minimize the hazards associated with volcanic eruptions [18,19]. Another previous study successfully applied classification algorithms based on the seismicity factor to determine a volcano's updated status [20].
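For reference, the silhouette score of a clustering can be computed as in the minimal sketch below. For each event, a is the mean distance to the other events in its own cluster, b is the smallest mean distance to the events of any other cluster, and the silhouette value is (b - a) / max(a, b). The function name `silhouette_values` and the use of Euclidean distance are assumptions of this sketch, and it expects at least two clusters.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-sample silhouette values using Euclidean distance (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    values = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue                     # singleton cluster: silhouette set to 0
        # a: mean distance to the other members of the same cluster
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        values[i] = (b - a) / denom if denom > 0 else 0.0
    return values

def silhouette_score(X, labels):
    """Average silhouette value over all samples."""
    return float(silhouette_values(X, labels).mean())
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate events that are closer to another cluster than to their own.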

Conclusions
We have investigated the seismic data of volcanic earthquakes at Anak Krakatau volcano from June 1 to July 30, 2014. We applied K-means clustering to classify the volcanic earthquakes automatically. The result shows three clusters with different characteristics in the time and frequency domains. The waveforms of Clusters 1, 2, and 3 are characterized, respectively, by a rapid increase within about 1-1.5 s of onset followed by a decrease over about 4-5 s, a gradual increase within about 1-2 s of onset followed by a decrease over about 4-5 s, and a gradual increase within about 2-2.5 s of onset followed by a decrease over a longer duration of about 6-8 s. The dominant frequency ranges of Clusters 1, 2, and 3 are about 4-4.7 Hz, 6-6.5 Hz, and 7-7.5 Hz, respectively. The similarity index (silhouette score) of each cluster is high, which shows that the events within each cluster have similar characteristics. Each cluster may reflect a different type of volcanic earthquake: Clusters 1, 2, and 3 may correspond to the Volcanic Explosion event, VT-B event, and Hybrid event, respectively. This study is useful for automatically classifying the volcanic events generated during daily volcanic activity, and it gives a brief understanding of volcanic event types. This supports volcanic hazard mitigation, for example in determining a volcano's status according to its seismicity level. Future work covering a longer analysis period is needed to obtain the event clusters more accurately and more precisely.

Figure 1. Map of Anak Krakatau volcano. The red triangle represents the active crater. The white rectangle represents the seismic station used in this study. The colormap corresponds to the elevation in meters.

Figure 2. Event detection by STA/LTA over a time window of about 200 s, recorded on June 15, 2014 at 21:40 Western Indonesian Time (WIB). The red line represents the starting time of the detected event.
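The detection illustrated in Figure 2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the detector used in this study: the function names are hypothetical, both averages are taken over trailing windows of the squared signal that end at the same sample, and an onset is declared where the ratio first rises above the threshold. The default values follow the text (STA of 10 s, LTA of 80 s, threshold of 3.5, 30 s cut window).

```python
import numpy as np

def sta_lta_ratio(x, fs, sta_s=10.0, lta_s=80.0):
    """STA/LTA ratio of the signal energy, using trailing windows."""
    x = np.asarray(x, dtype=float)
    n_sta = int(round(sta_s * fs))
    n_lta = int(round(lta_s * fs))
    ratio = np.zeros(x.size)             # stays 0 until one full LTA window is available
    if x.size < n_lta:
        return ratio
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    end = np.arange(n_lta, x.size + 1)   # exclusive end index of each window
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, np.finfo(float).tiny)
    return ratio

def detect_onsets(ratio, threshold=3.5):
    """Sample indices where the ratio crosses the threshold from below."""
    above = ratio > threshold
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

def cut_event(x, onset, fs, window_s=30.0):
    """Cut a fixed-length window starting at the detected onset."""
    return np.asarray(x)[onset:onset + int(round(window_s * fs))]
```

With trailing windows of this kind the ratio cannot exceed the LTA/STA length ratio (8 for the values above), so a threshold of 3.5 sits well within the reachable range.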

Figure 3. The clusters of events from the K-means calculation, shown in two dimensions. The blue, red, and green circles represent Clusters 1, 2, and 3, respectively. PCA denotes principal component analysis.

Figure 4. Characteristic observed waveforms of the three clusters. The blue, red, and green waveforms represent events in Clusters 1, 2, and 3, respectively. The number in the top-right corner of each graph is the event number.

Figure 5. Characteristic observed spectra of the three clusters. The blue, red, and green spectra represent events in Clusters 1, 2, and 3, respectively. The number in the top-right corner of each graph is the event number.

Figure 8. The silhouette score for 3 clusters. The red, blue, and green bars correspond to Clusters 1, 2, and 3, respectively. The dashed red line shows the similarity index in our analysis.