Complex Network Approach to the Statistical Features of the Sunspot Series

Complex network approaches have been recently developed as an alternative framework to study the statistical features of time-series data. We perform a visibility-graph analysis on both the daily and monthly sunspot series. Based on the data, we propose two ways to construct the network: one is from the original observable measurements and the other is from a negative-inverse-transformed series. The degree distribution of the derived networks for the strong maxima has clear non-Gaussian properties, while the degree distribution for minima is bimodal. The long-term variation of the cycles is reflected by hubs in the network which span relatively large time intervals. Based on standard network structural measures, we propose to characterize the long-term correlations by waiting times between two subsequent events. The persistence range of the solar cycles has been identified over 15\,--\,1000 days by a power-law regime with scaling exponent $\gamma = 2.04$ of the occurrence time of the two subsequent and successive strong minima. In contrast, a persistent trend is not present in the maximal numbers, although maxima do have significant deviations from an exponential form. Our results suggest some new insights for evaluating existing models. The power-law regime suggested by the waiting times does indicate that there are some level of predictable patterns in the minima.


Introduction
Solar-cycle prediction, i.e. forecasting the amplitude and/or the epoch of an upcoming maximum, is of great importance as solar activity has a fundamental impact on the weather conditions of the Earth, especially with increasing concern over the various climate change scenarios. However, predictions have been notoriously wayward in the past [1,2]. There are basically two classes of methods for solar cycle predictions: empirical data-analysis-driven methods and methods based on dynamo models. Most successful methods in this regard can give reasonably accurate predictions only when a cycle is well advanced (e.g. 3 years after the minimum) or with guidance from its past [3,4]. Hence, these methods show very limited power in forecasting a cycle which has not yet started. The theoretical reproduction of a sunspot series by most current models shows convincingly the 'illustrative nature' of the existing record [5]. However, they generally failed to predict the slow start of the present cycle 24 [6]. One reason cited for this is the emergence of prolonged periods of extremely low activity. The existence of these periods of low activity brings a big challenge for solar-cycle prediction and reconstruction by the two classes of methods described above, and hence prompted the development of special ways to evaluate the appearance of these minima [7]. Moreover, there is increasing interest in the minima since they are known to provide insight for predicting the next maximum [8].
Some earlier authors have both observed and made claims for the chaotic or fractal features of the observed cycles, but the true origin of such features has not yet been fully resolved. For instance, the Hurst exponent has been used as a measure of the long-term memory in time series [9,10]-an index of long-range dependence that can be often estimated by a rescaled range analysis. The majority of Hurst exponents reported so far for the sunspot numbers are well above 0.5, indicating some level of predictability in the data. Nonetheless, it is not clear whether such characterization is due to an underlying chaotic mechanism or the presence of correlated changes due to the quasi-11-year cycle [2,11]. By removing the sinusoidal trend of the time series by Fourier truncation in advance, the Hurst exponent is reduced to about 0.12, demonstrating a rather different estimation for the multifractal nature of the fluctuations [12]. This argument should be carefully interpreted since a properly designed adaptive detrending algorithm to remove the 11-year cycle in sunspots yields an estimation of the Hurst exponent to be 0.74 [13]. Anyway, it is the irregularity (including the wide variations in both amplitudes and cycle lengths) that makes the characterization of the possible multifractal features of the solar activity an interesting, challenging and, as yet, unsolved issue. In contrast to the 11-year cycle per se, we concentrate on the recently proposed hypothetical long-range memory mechanism on time scales shorter than the quasi-periodic 11-year cycle [14].
In this work, we provide a distinct perspective on the temporal sunspots evolution (in particular the strong maximal activities and quiescent minima) by means of the so-called visibility graph (VG) analysis (as explained in figure 1). Such graphs (mathematical graphs, in the sense of networks) have recently emerged as one alternative to describe various statistical properties of complex systems. In addition to applying the standard method, we generalize the technique further-making it more suitable for studying the observational records of the solar cycles.

Data and network construction
Both the international sunspot number (ISN) and the sunspot area (SSA) series [15] are used in this work, and we have obtained consistent conclusions in either case. The lengths of the data sets are summarized in table 1. We perform a VG analysis using both monthly and daily sunspot series, which yields, respectively, month-to-month and day-to-day correlation patterns of the sunspot activities. Note that we depict the annual numbers only for graphical visualization and demonstration purposes (we use the annual numbers to demonstrate our method-the actual analysis is performed in daily and monthly data). We discuss the results with the ISN (in figures 2 and 3) in the main text and illustrate the results for the SSA (in figures 4 and 5) with notes in the captions. Moreover, we compare our findings based on observational records to the results obtained from data produced by simulations from computational models [16,17].  Recently a variety of methods has been proposed for studying time series from a complex networks viewpoint, providing us with many new and distinct statistical properties [18][19][20][21][22]. In this work, we restrict ourselves to the concept of the VG, where individual observations are considered as vertices and edges are introduced whenever vertices are visible. More specifically, given a univariate time series x(t i ) i=1,...,N , we construct the 0-1 binary adjacency matrix A N ×N of the network. The algorithm for deciding non-zero entries of A i, j considers two time points t i and t j as being mutually connected vertices of the associated VG if the following criterion is fulfilled for all time points t k with t i < t k < t j [23]. Therefore, the edges of the network take into account the temporal information explicitly. By default, two consecutive observations are connected and the graph forms a completely connected component without disjoint subgraphs. Furthermore, the VG is not affected by choice of algorithmic parameters-most other methods of constructing complex networks from time series data are dependent on the choice of some parameters (e.g. the threshold ε of recurrence networks, see more details in [21]). While the inclusion of these parameters makes these alternative schemes more complicated, they do gain the power to reconstruct (with sufficient data) the underlying dynamical system. For the current discussion, and with the limited data we consider here, we prefer the simplicity of the VG.  The VG approach is particularly interesting for certain stochastic processes where the statistical properties of the resulting network can be directly related to the fractal properties of the time series [24][25][26][27][28]. Figure 1 illustrates an example of how we construct a VG for the sunspot time series. It is well known that the solar cycle has an approximately 11-year period, which shows that most of the temporal points of the decreasing phase of one solar cycle are connected to those points of the increasing phase of the next cycle (figure 1(A)). Therefore, the network is clustered into communities, each of which mainly consists of the temporal information of two subsequent solar cycles ( figure 1(B)). When the sunspot number reaches a stronger but more infrequent extreme maximum, we have inter-community connections, since they have better visibility contact with more neighbors than other time points-hence, forming hubs in the graph. The inter-community connections extend over several consecutive solar cycles encompassing the temporal cycle-tocycle information. Depending on various notions of 'importance' of a vertex with respect to the entire network, various centrality measures have been proposed to quantify the structural characteristics of a network (cf [29]). Recent work on VGs has mainly concentrated on the properties of the degree and its probability distribution p(k), where degree k measures the number of direct connections that a randomly chosen vertex i has, namely, k i = j A i, j . The degree sequence reflects the maximal visibility of the corresponding observation in comparison with its neighbors in the time series (figure 1(C)). Based on the variation of the degree sequence k i , we consider 1837, 1848, 1860 and 1870 as hubs of the network, which can be used to identify the approximately 11-year cycle reasonably well ( figure 1(B)).
Furthermore, in the case of sunspot time series, one is often required to investigate what contributions local minimum values make to the network-something that has been largely overlooked by the traditional VGs. One simple solution is to study the negatively inverted counterpart of the original time series, namely, −x(t i ), which quantifies the properties of the local minima. We use k −x and p(k −x ) to denote the case of −x(t i ). Here, we remark that this simple inversion of the time series allows us to create an entirely different complex network-this is because the VG algorithm itself is extremely simple and does not attempt to reconstruct an underlying dynamical system. As shown in figure 1(C), k −x captures the variation of the local minima rather well. We will use this technique later to understand the long-term behavior of strong minima of the solar cycles.
The degree distribution p(k) is defined to be the fraction of nodes in the network with degree k. Thus if there are N nodes in total in a network and n k of them have degree k, we have p(k) = n k /N . For many networks from various origins, p(k) has been observed to exhibit power-law behavior: p(k) ∼ k −γ . In the case of VGs, p(k) is related to the dynamical properties of the underlying processes [23]. More specifically, for a periodic signal, p(k) consists of several distinct degrees indicating the regular structure of the series; for white noise, p(k) has an exponential form; for fractal processes, the resulting VGs often have power law distributions p(k) ∼ k −γ with the exponent γ being related to the Hurst exponent H of the underlying time series [24]. It is worth pointing out that when one seeks to estimate the exponent γ it is often better to employ the cumulative probability distribution F(k) = k>k 0 p(k) so as to have a more robust statistical fit.
Estimating the exponent γ of the hypothetical power law model for the degree sequence of VG can be done rather straightforwardly, but, the statistical uncertainties resulting from the observability of sunspots are a challenge for reliable interpretation. Meanwhile, fitting a power law to empirical data and assessing the accuracy of the exponent γ is a complicated issue. In general, there are many examples which claimed to have power laws but turned out not to be statistically justifiable. We apply the recipe of [30] to analyze the hypothetical power-law distributed data, namely, (i) estimating the scaling parameter γ by the maximum likelihood and (ii) generating a p-value that quantifies the plausibility of the hypothesis by the Kolmogorov-Smirnov statistic.
Aside from the aforementioned degree k i and degree distribution p(k), in appendix A, some alternative higher order network measures are suggested which may also be applied to uncover deeper dynamical properties of the time series from the VG.

Figures 2(A) and (B)
show the degree distributions p(k) of the VGs derived from the ISN x(t i ) with heavy-tails corresponding to hubs of the graph, which clearly deviates from Gaussian properties. In contrast, p(k −x ) of the negatively inverted sunspot series −x(t i ) shows a completely different distribution, consisting of a bimodal property (figures 2(C) and (D), extra large degrees are at least two orders of magnitude larger than most of the vertices ( figure 2(D)).
Since well-defined scaling regimes are absent in either p(k) or p(k −x ) (nor do they appear in the cumulative distributions as shown in the insets, see caption of figure 2 for details of the statistical tests that we apply), we may reject the hypothetical power laws-in contrast to what has been reported in other contexts [23,24].
In the context of studying grand maxima/minima over (multi-)millennial timescale using some particular indirect proxy time series, the main idea lies in an appropriately chosen threshold, excursions above which are defined as grand maxima, respectively, below which are defined as grand minima [31,32]. In this work, we use a similar concept but here in terms of degrees of the corresponding VG. We define a strong maximum if its degree k is larger than 100 (figures 2(A) and (B)), a strong minimum if its degree k −x is over 1000 (figures 2(C) and (D)). Note that, in general, our definition of strong maxima/minima coincides neither with the local maximal/minimal sunspot profiles, since degree k takes into account the longer term intercycle variations, nor with those defined for an individual cycle. Our definition avoids the choice of maximum/minimum for one cycle, which suffers from moving-average effects (e.g. [1]). In contrast, our results below are robust with respect to the choice of the threshold degrees, especially in the case of the definition of strong minima, since large degrees are very well separated from others. The gray line in figure 3(A) shows the sunspot numbers overlaid by the maxima/minima identified by the large degrees. We find that the positions of strong maxima are largely homogeneously distributed over the time domain, while that of the strong minima are much more clustered in the time axis-although irregularly ( figure 3(A)). We emphasize that the clustering behavior of the strong minima on the time axis as shown in figures 3(A) and 5(A) does not change if threshold degrees are varied in the interval of (200, 8000) to define strong minima, i.e. k > 7000, as shown in figures 6(A) and (B). Therefore, the bimodality as observed in figures 2(C) and (D) and 4(C) and (D) is not due to the finite size effects of time series.
The hidden regularity of the time positions of maxima/minima can be further characterized by the waiting time distribution [32]: the interval between two successive events is called the waiting time. The statistical distribution of waiting-time intervals reflects the nature of a process that produces the studied events. For instance, an exponential distribution is an indicator of a random memoryless process, where the behavior of a system does not depend on its preceding states on both short or long time scales. Any significant deviation from an exponential law suggests that the underlying event occurrence process has a certain level of temporal dependence. One representative of the large class of non-exponential distributions is the power laws, which have been observed in many different contexts, ranging from the energy accumulation and release property of earthquakes to social contacting patterns of humans [33].
In the framework of VGs, the possible long temporal correlations are captured by edges that connect different communities (the increasing and decreasing phases of one solar cycle belong to two temporal consecutive clusters). As shown in figure 3(B), the distribution of the waiting time between two subsequent maxima sunspot deviates significantly from an exponential function, although the tail part could be an indicator of an exponential form. In contrast, we show in figure 3(C) that the waiting times between subsequent strong minima have a heavytail distribution where the exponent γ ≈ 2.04 is estimated in the scaling regime. This suggests that the process of the strong minima has a positive long term correlation, which might be well developed over the time between 15 and 1000 days where a power-law fit is taken. Waiting time intervals outside this range are due to either noise effects on shorter scales or the finite length of observations on longer time scales. Again, the power law regimes identified by the waiting time distributions are robust for various threshold degree values in the interval (200, 8000) (for instance, the case of k > 7000 is shown in figures 6(C) and (D)).

Discussion
In contrast to the computations described above with observational data, we now demonstrate the inadequacy of two models of solar cycles. By applying the VG methods to model simulations we demonstrate that the observed data have, according to the complex network perspective, features absent in the models. We first choose a rather simple yet stochastic model that describes the temporal complexity of the problem, the Barnes model [16], consisting of an autoregressive moving average ARMA(2, 2) model with a nonlinear transformation where α 1 = 1.906 93, α 2 = −0.987 51, β 1 = 0.785 12, β 2 = −0.406 62, γ = 0.03 and a i are identically independent distributed Gaussian random variables with zero mean and standard deviation SD = 0.4. The second model is a stochastic relaxation Van der Pol oscillator, which is obtained from a spatial truncation of the dynamo equations [17]. The equations reaḋ where ω = 0.2993, µ = 0.2044, ξ = 0.0102, ξ s is Gaussian noise with zero mean and SD = 1, and r is adjustable but often chosen to be 0.02. The variable x is associated with the mean toroidal magnetic field, and therefore the sunspot number is considered to be proportional to x 2 , which prompts us to construct VGs from x 2 (−x 2 respectively). Both models reproduce the  rapid growth in the increasing phase and slow decay in the decreasing phase of the activity cycles adequately. For the truncated model of the dynamo equations, a statistical significant correlation between instantaneous amplitude and frequency has been established, while the Barnes model shows virtually no correlation [34], which is generally termed the Waldmeier effect.
From both models, we generate 10 000 independent realizations, each of them has a 1 month temporal resolution and the same time span as we have for the observations (namely, over 300 years). We then construct VGs from both x t i and −x t i from each realization in the same way as we processed for the original observation. As shown in figure 7, p(k) can neither mimic the heavy tails of the distributions we observed in figures 2(A) and (B), nor can p(k −x ) capture the bimodality of the large degrees for strong minima as we have observed in the case of observational raw records in figures 2(C) and (D). This does not occur even if the parameter r of the second model is adjusted (figures 7(C) and (D)). One reason for the absence of any bimodality of p(k) in the nonlinear oscillator model is the fact that the model was designed to reproduce smoothed sunspot number time series. The often-used 13-month running average method in the literature is known to suppress the maximum/minimum amplitudes of the series [1]. Therefore, the visibility condition for each time point is changed if the 13-month smoothing technique is applied to the original data. We show the degree distribution p(k) of VGs reconstructed from smoothed ISN series in figure 8(B), where the bimodality is absent.
As shown in figure 7, strong maxima/minima are not well separated. Consequently we are prevented from identifying unique waiting time sequences. The corresponding analysis then depends significantly on the choice of threshold degrees.
Dynamo theory provides several hints that might explain the features observed in the long-term evolution of the solar activity. The two theoretical models tested in this work can reproduce the main qualitative features of the system reasonably well, however, within the context performed in this study, cannot yield a complete and conclusive rendition of the statistical properties of p(k). The power-law regimes obtained from waiting time sequences suggest that the interaction patterns for two subsequent minima can be much more complicated than has been previously described as the instantaneous amplitudes-frequencies correlation using rather simple models [34,35].

Conclusions
In this work, we apply a recently proposed network approach, namely the VG, to disclose the intricate dynamical behavior of both strong maximal and minimal sunspot numbers with observational records. More specifically, we show that: (iii) There is no evidence for a long term inter-cycle memory. This is in agreement with the present paradigm based on alternative methods (see e.g. reviews by Petrovay [5] and Usoskin [36]), and provides an observational constraint for solar-activity models.
Therefore we propose that our results on the difference between maxima and minima could be used for evaluating models for solar activity because they reflect important properties that are not included in other measures reported in the literature. From the methodological perspective, we are the first to propose a generalization for the construction of VGs from the negatively inverted time series. This has been seen, via our analysis of sunspot observations, to show complementary aspects of the original series. Note that the negatively inverse transformation is crucial for understanding when asymmetry is preserved in the time series. Therefore, from the methodological viewpoint, it is worth analyzing the dependence of the resulted VGs on an arbitrary monotonic nonlinear transformation. Furthermore, as presented in appendix B, a systematic investigation of the general conclusion as to whether large sampled points correspond to hubs of VGs will be a subject of future work-especially in the presence of cyclicity and asymmetry. On the other hand, one can study the fluctuation properties of the sunspot series by using any de-trending techniques as a preprocessing. Note that here we study the network properties of VGs from the original time series, instead of constructing a network from local fluctuations. In the case of studying the statistics of the fluctuations, it will be an interesting task to test the robustness of the network approach with respect to different de-trending algorithms to the sunspot series [12,13].
Note that we construct VGs directly based on the raw sunspot series without any preprocessing. Many researchers prefer to base their studies on some kind of transformed series, since most common methods of data analysis in the literature rely on the assumption that the solar activity follows Gaussian statistics [2]. It is certain that the conclusions will then show some deviations depending on the parameters chosen for the preprocessing. The pronounced peaked and asymmetrical sunspot-cycle profiles prompt one to develop techniques such that the possible bias due to the unavoidable choice of parameters should be minimized. The complex network perspective offered by VG analysis has the clear advantage of being independent of any a priori parameter selection.
The procedure for network analysis outlined here can be directly applied to other solar activity indicators, for instance, the total solar irradiance and the solar-flare index. Some earlier analysis has revealed that, on average, the duration, rise and decay times of temporal flares show distinct asymmetric properties, which may vary either in phase or statistically show no correlation with the solar cycles [37,38]. Note that our results reported here mainly concern the asymmetry between the maximal and minimal sunspots (from a VG perspective). We expect that our network approach could be used to study more asymmetric properties of the temporal sunspot series, e.g. to disclose the asymmetric distribution of sunspots between the northern and southern hemispheres [4,39]. We compared our results to two rather empirical models and showed that the distinctive correlation patterns of maximal and minimal sunspots are currently absent from these two models. Using our analysis for more refined dynamo-based models (e.g. [40]) would be straightforward. Certainly, further work on this line of research will examine any differences given by the particular quantity and strengthen the understanding of the hypothetical long-range memory process of the solar activity from a much broader overview.  underlying geophysical mechanisms remains a challenging task and is largely open for future work. Note that there is in general a strong interdependence between these different network structural quantities.

Appendix B. Correlation between k i and x i
A general rule of understanding the scale-free property of the degree distribution of complex networks is the effects of a very few hubs having a large number of connections. In the particular case of VGs, hubs are related to maxima of the time series, since they have better visibility contact with more vertices than other sampled points. However, this result cannot be generalized to all situations, for instance, it is easy to generate a time series such that its maxima are not always mapped to hubs in the VGs. One simple way to better explore this correlation is to use scatter plots between the degree sequence and the sunspot time series. As we show in figure B.1, the Spearman correlation coefficients ρ are very small (still significantly larger than zero) in the case of VGs reconstructed from the original time series. This provides an important cautionary note on the interpretation of hubs of VGs by local maximal values of the sunspot numbers. In contrast, if the network is reconstructed from −x, hubs of VGs could be better interpreted by local minimal values of the sunspot data since the correlations become larger. These results hold for both the ISN and SSA series (figure B.2). One reason for the lack of strong correlation between the degree k i and x i is because of the (quasi-)cyclicity of the particular time series, which has a similar effect as the Conway series [23]. It is this concave behavior over the time axis (although quasi-periodic from cycle to cycle) that prevents the local maxima from having highly connected vertices. It remains unclear how local maxima of a time series are mapped to hubs of VGs-we defer this topic for future work especially in the presence of cyclicity. This situation becomes even more challenging if some sort of asymmetric property is preserved in the data, as we have found for the sunspot series.