The correlation analysis of the daily Covid-19 new cases data series in Albania

We analyse the daily new positive COVID-19 cases recorded in Albania. The distribution of daily new cases is non-stationary and typically follows a power law in the low-incidence zone and a bell-shaped curve over the remaining part of the incidence interval. We take this finding as an indicator of intensive dynamics and as evidence that, up to now, herd immunity has not been reached. By analogy with the preferential attachment mechanisms responsible for power-law distributions in social graphs, we explain the low daily incidence distribution as a result of imprudent gatherings of people. Additionally, the bell-shaped distribution observed for high daily new cases is argued to be the outcome of the competition between the advance of the illness and the restriction measures. The distribution is acceptably smooth, which suggests that the management of the pandemic has been adapted appropriately. This behaviour is also observed for two neighbouring countries, Greece and Italy, but not for Turkey, Serbia, and North Macedonia. Next, we used multifractal analysis to characterise the heterogeneity of the data. We identified the local presence of self-organization behaviour in some separate time intervals. Formally and empirically, we found that the full data set contains two completed regimes, followed by a third one which started in July 2021.


Introduction
The spread of contagious illnesses has been analysed from different points of view, and interesting models have been proposed in many papers. The COVID-19 data belong to this class of systems, but characteristic behaviours and specific properties are also expected to be present. We acknowledge that the behaviour of COVID-19 has been influenced by biological and physical factors, as well as by country-specific administrative measures, which impose nonlinearity and non-stationarity on the corresponding data series. In this framework, Albania represents a typical heterogeneous medium. With regard to population size and concentration, social differences, and mobility between rural and urban areas, this system is particular and interesting [7]. The standard models discussed in [1], [2], [18], [19] are not expected to be effective for predicting and forecasting future behaviour in such cases. The complex effects of individual or community immune features, the nature of the social network, geographical heterogeneity, population densities, communication habits, etc., are some of the many factors that directly affect the validity of the theoretical assumptions in deterministic mathematical models. In this context, we have focused on empirical and descriptive evidence for the time series: the analysis of the distributions of some relevant observables, the identification of local and global trends or dynamics, the analysis of the self-affinity of the data, etc. As a general reference on the stationarity of the data series, we found that the variance of the series was not constant and that the first differences of the series were also non-stationary. However, for specific time windows we observe stationary behaviour of the first differences, which gives the green light for the implementation of deterministic models according to general statistics.
But in the general context, using deterministic models of the SEIR group is not practically productive, because we found that the parameters of the model depend on the time windows selected for the analysis.

A comment on the application of deterministic models to the specific case
The most widely used models of time dynamics fall into two categories, based respectively on autoregressive properties and on cause-response modelling. Both need a preliminary stationarity analysis of the variables to be used. In principle, by combining the effects of all factors that contribute to a given level of daily positive cases, we can formally accept that the number of new occurrences has the general functional form $Y_t = F(\cdot) + u_t$, where the functional F includes the variables or their cumulative sums (currently infected and recovered people), and the term u is the noise. The autoregressive version of $Y_t$ is interesting and practically important if we use neural network methods or deterministic modelling. For this purpose, the lag time can be fixed (14 days, to account for the incubation phase). To analyse the mathematical rigour of a general model (1) for the data series of new COVID-19 cases in Albania, we started from a very simple descriptive analysis based on the variances. We observed that the covariance matrix of our series for lags 1-14 has elements with negative values, which testifies to the strong non-stationarity of the series and also to the effect of external factors. Both features undermine the success of the modelling. In the theoretical limit, the well-known SEIR or SEIR-based models discussed in [1], [2], or similar numerical calculations based on deterministic forms discussed in [18], are reported to be applicable to the study of COVID behaviour, so we can use this paradigm, and the fit of the data to such models, to locate our system relative to the theoretical limit. In reference [19] it is reported that, based on the SEIR model equations, the time dynamics of the COVID-19 spread in the Lombardy region have been reproduced quantitatively.
In (1) the symbols used are: Λ, per-capita birth rate; μ, per-capita natural death rate; α, virus-induced average fatality rate; β, probability of disease transmission per contact (dimensionless) times the number of contacts per unit time; ϵ, rate of progression from exposed to infectious (its reciprocal is the incubation period); and γ, recovery rate of infectious individuals (its reciprocal is the infectious period). Previous works such as [7] have underlined that our system is highly disturbed.
Treating time as a variable, we checked whether this phase has been reached by our system. So far, we have performed the calculation following the algorithm based on equations (1) and also the general Richards model applied to other contagious illnesses analysed in references [18], [19]. We found that the parameters obtained by fitting and reproducing the observed data vary with time and differ between time windows. Also, the fit was achieved with a high deviance and variance for the output and response variable. We used these preliminary findings to focus our work on descriptive and empirical analysis.
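The symbols listed above match a standard SEIR system with vital dynamics. Since the equations themselves are not reproduced in the text, the following is only a minimal sketch under that assumption. The function names (seir_rhs, rk4_step, simulate_seir), the fourth-order Runge-Kutta integrator and all parameter values are illustrative choices and are not fitted to the Albanian data.

```python
def seir_rhs(state, p):
    """Right-hand side of an assumed SEIR system with vital dynamics.

    dS/dt = Lambda*N - mu*S - beta*S*I/N
    dE/dt = beta*S*I/N - (mu + eps)*E
    dI/dt = eps*E - (gamma + mu + alpha)*I
    dR/dt = gamma*I - mu*R
    """
    S, E, I, R = state
    N = S + E + I + R
    force = p["beta"] * S * I / N  # new infections per unit time
    dS = p["Lambda"] * N - p["mu"] * S - force
    dE = force - (p["mu"] + p["eps"]) * E
    dI = p["eps"] * E - (p["gamma"] + p["mu"] + p["alpha"]) * I
    dR = p["gamma"] * I - p["mu"] * R
    return (dS, dE, dI, dR)


def rk4_step(state, p, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))

    k1 = seir_rhs(state, p)
    k2 = seir_rhs(shifted(state, k1, dt / 2), p)
    k3 = seir_rhs(shifted(state, k2, dt / 2), p)
    k4 = seir_rhs(shifted(state, k3, dt), p)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_seir(state0, p, days, dt=0.1):
    """Return the (S, E, I, R) state sampled once per day (days + 1 samples)."""
    steps_per_day = round(1 / dt)
    state = tuple(state0)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, p, dt)
        trajectory.append(state)
    return trajectory


# Illustrative (hypothetical) parameters, all rates per day.
params = {"Lambda": 0.0, "mu": 0.0, "alpha": 0.0,
          "beta": 0.5, "eps": 1 / 5, "gamma": 1 / 10}
trajectory = simulate_seir((999_990.0, 10.0, 0.0, 0.0), params, days=100)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("day of peak infectious count:", peak_day)
```

Refitting such a model on successive time windows and comparing the fitted β, ϵ and γ is one way to expose the window dependence of the parameters described above.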

Descriptive methods used for analysing data series of the daily new cases for Covid-19 spread
Based on the preliminary evidence of nonlinearity in the time series of the COVID-19 spread in Albania, we focus on descriptive analysis to collect information about the phenomenon under study. Firstly, we considered the correlation and the covariance between daughter series, produced by cutting the original series into intervals shifted from one day up to the time lag h. Herein, the Pearson and Spearman coefficients are used to assess the similarities between those sub-series. They can represent relationships, inter-dependence and indicators of causality, including the infection maturation time. By comparing the results of the correlations with principal factor analysis, we identified the relationship between successive occurrences. Next, we employed empirical mode decomposition (EMD) to identify the trend of the dominant process and the rate of change of the amplitude. The EMD method was introduced by Huang [3] to study highly nonlinear processes, and it is considered a useful tool. The technique has demonstrated its robustness in the analysis of irregular or highly nonlinear signals. In reference [7] this method has also been acknowledged as very fruitful for the analysis of COVID data series for Albania. Herein we have used the improved EMD techniques called EEMD and VEMD described in references [4] and [5]. Briefly, EMD and its variants mimic the idea of Fourier decomposition, but the components are not harmonics and, moreover, they do not assume a fixed frequency. Given a signal x(t), the EMD algorithm can be summarized by the following steps: (a) identify all local extrema of the irregular signal x(t); (b) interpolate between minima (resp. maxima), ending up with the envelopes $e_{\min}(t)$ (resp. $e_{\max}(t)$), and compute the mean $m(t) = (e_{\min}(t) + e_{\max}(t))/2$; (c) extract the detail $d(t) = x(t) - m(t)$, the candidate IMF (Intrinsic Mode Function); (d) iterate on the residual m(t) until a threshold condition on d(t) is reached, so that it can be considered a zero-mean component.
The signal is therefore considered as a superposition of those modes, not necessarily orthogonal, $x(t) = \sum_k \mathrm{IMF}_k(t) + r(t)$, where the residual r(t) is the small term remaining after the last mode has been extracted. Note that by construction the last IMF is the lowest-frequency mode in the signal, so it gives the global trend of the series if the threshold is chosen appropriately. The time-frequency features obtained from the IMFs can be represented by the Hilbert-Huang spectrogram (HHS) or, in its time-marginalized form, the marginal Hilbert-Huang spectrum, see [4], [5]. Performing an EMD analysis is a straightforward procedure, with many mathematical details and physical interpretations which we do not list here. We have used EMD analysis herein only to establish the presence of a regular regime and the stage of its dynamics. If the underlying regime was not completed, we used other approaches to proceed with the prediction or forecasting of the near-future behaviour. In the VEMD variant we fixed the tolerance (threshold) at the value corresponding to one individual, to account for the fact that natural entities can only be diagnosed as positive or non-positive. The variables were used in their original form from the official database, in per-million units. Next, we analysed the distribution that emerges from the data series, that is, the densities of the daily occurrences. Here we aimed to connect the mathematical properties of the series observed above, for example the frequencies of occurrences estimated by the EMD, with the physical mechanism. For this purpose, we considered the empirical fit of time series data proposed in [5], [13] and the literature cited therein.
Accordingly, the power-law distribution is taken as an argument for the presence of the preferential attachment rule, where the probability of a new link to node i is $\Pi(k_i) = k_i / \sum_j k_j$, with $k_i$ the number of links that node i has with other nodes before a new node is offered for connection. According to [19], the emerging power-law distribution has an exponent around -3, but further estimates give a broader range. Considering the arguments about the initial step in estimating the distribution that fits the density data, especially the bin-width optimization procedure, together with the fact that the number of data points is statistically small (~500 days since COVID-19 started), the analysis in this part is considered only qualitative. Also, based on the specific behaviour, which apparently shows similarity with series undergoing self-organization dynamics, we proposed to use log-periodic (LP) functions, which are known to be capable of describing critical behaviour or discrete scale invariance (DSI) processes, see [6], [17] and references therein. Note that in a previous study [8] the presence of the DSI structure in the COVID-19 spread was analysed specifically. For the convenience of the reader, we recall that the DSI structure is a specific scale invariance in which the scaling parameter is discrete. The analytical solutions are log-periodic functions in which $t_c$ is the critical time, y is the logarithm of the observable, ω is the cyclic frequency related to the DSI parameters, and A, B are constants. The critical point is interpreted as the moment in time when the regime is most likely to change. Finally, we analysed the fractal and multifractal structure of the series for a detailed understanding of the self-affinity of the data.
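The log-periodic form is not written out in the text. A first-order form commonly used in the DSI literature is y(t) = A + B (t_c - t)^m [1 + C cos(ω ln(t_c - t) + φ)]. The sketch below assumes this form. The exponent m, the amplitude C and the phase φ are extra parameters that do not appear in the text, and the function names are hypothetical.

```python
import math


def log_periodic(t, t_c, A, B, m, C, omega, phi):
    """Assumed first-order log-periodic form, valid only for t < t_c.

    y(t) = A + B * (t_c - t)**m * (1 + C * cos(omega * ln(t_c - t) + phi))
    """
    tau = t_c - t  # time remaining to the critical point
    if tau <= 0:
        raise ValueError("log-periodic form is defined only before t_c")
    return A + B * tau ** m * (1.0 + C * math.cos(omega * math.log(tau) + phi))


def scaling_ratio(omega):
    """Preferred scaling ratio of the discrete scale invariance.

    The oscillating factor repeats whenever (t_c - t) is multiplied by
    lambda = exp(2*pi/omega), which is what makes the invariance discrete.
    """
    return math.exp(2.0 * math.pi / omega)


# Illustrative values only: oscillations accelerate as t approaches t_c = 100.
for day in (60, 80, 90, 95, 99):
    print(day, round(log_periodic(day, 100.0, 2.0, -0.5, 0.4, 0.1, 6.0, 0.3), 4))
```

In a fit, t_c is treated as a free parameter, and its estimate is read as the most likely time of the regime change.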

The descriptive analysis of the new COVID cases in Albania
The time series of new COVID-19 cases for Albania shows a trend similar to that of the series from neighbouring countries, Figure 1. From the general descriptive point of view, the trend of the series is a very important feature to be considered. In the intervals [1,280] and [310,350] the trend of the daily occurrences is increasing. After time point 350 (350 days after the first recorded case), the series has a decreasing trend. So, if this trend is connected to specific regimes, we can expect that the best approach would be to consider each one separately. By direct assessment we observed that the variance of the COVID-19 data series changes significantly for series ending at different times. The variance started to stabilise by the end of the first semester of 2021. This behaviour is observed for all six countries considered herein for comparison (Albania and its neighbours). Evidently, the series are non-stationary for all countries, but the stabilising behaviour is on the horizon. It also seems that series of different lags have different directions of growth. By cross-checking the covariance coefficients against a naïve linear modelling approach, we concluded that, up to now, deterministic models and autoregressive forms are not efficient in describing the process. Next, we performed a qualitative principal component analysis to estimate the linkage between series of different lags. We empirically set the number of lags to 14, corresponding to the general claim that the full incubation period is around two weeks. The linkage considered is of the form $Y_t = F(Y_{t-1}, \ldots, Y_{t-h})$.

Figure 3. Autocorrelation patterns for the COVID series in Albania and five neighbouring countries
We observed that the variance is usually expressed in one or two components, which is easy to accept given that series with lags 1 or 2 largely contain each other (they differ by one day). This again indicates that a model is not descriptive, at least if we consider the whole data set. When the recorded new positive cases are taken as the variable of interest (the response) in a model of the type $Y_t = F(Y_{t-1}, \ldots, Y_{t-h})$, the dimension, i.e. the number of the largest unequal eigenvalues of the covariance matrix, is 13 for the first period and 8 for the second. For the other countries considered, this number varies from 9 to 14 for the first period and from 8 to 14 for the second period. A very intriguing period is the one of decreasing values during spring 2021, because it corresponds to the mass vaccination. We observe that the data series for Albania has the maximum number of components, but the other components have non-significant weight. A similar case is seen for Turkey, where again the second component weighs around 7%, whereas for all other countries considered herein at least three or four components have significant weight. As we observed in the direct covariance analysis presented above, under the assumption that a certain model could describe the linkage between the daughter series, the most significant links are concentrated in the last four days, i.e. the current data are directly linked with the occurrences 4, 3, 2 and 1 days earlier. This view sheds some light on the underlying process, or set of processes, that produces the daily new cases as its outcome. Finally, it is worth mentioning that covariance and correlation analysis can provide only descriptive and qualitative knowledge about the processes related to the time series under study. For a deeper view of the system, we need to use other techniques and approaches.
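The lagged correlation step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the way the daughter series are cut (equal-length windows delayed by h days), the function names and the synthetic weekly pattern are all assumptions.

```python
import math


def lagged_series(x, max_lag):
    """Return max_lag + 1 equal-length series; entry h is x delayed by h days."""
    n = len(x)
    if n <= max_lag:
        raise ValueError("series is shorter than the maximum lag")
    return [x[max_lag - h: n - h] for h in range(max_lag + 1)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def lag_correlations(x, max_lag, method="pearson"):
    """Correlation between the current series and each of its lags 1..max_lag."""
    series = lagged_series(x, max_lag)
    if method == "spearman":  # Spearman is Pearson applied to the ranks
        series = [ranks(s) for s in series]
    return [pearson(series[0], series[h]) for h in range(1, max_lag + 1)]


# Synthetic example: a strict weekly pattern correlates perfectly at lag 7.
weekly = [1, 3, 2, 5, 4, 7, 6]
cases = [weekly[t % 7] for t in range(70)]
for lag, c in enumerate(lag_correlations(cases, 14), start=1):
    print(lag, round(c, 3))
```

The covariance matrix of the same daughter series is the input to the principal component analysis mentioned above.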

The EMD analysis for the Covid data series in Albania
As described above, the EMD method is very helpful for the analysis of irregular signals and of nonlinear and nonstationary processes. It is fully data-oriented, and we chose it herein to reinforce the descriptive mathematical view instead of complicated and debatable models for the series under study. Again, it is worth underlining that, aside from many other factors, there are typically complex ones in play, such as the effect of social distancing rules on the pandemic spread and the quality of the data records, e.g. how correctly they represent the real epidemic spread in the population. Data-oriented techniques therefore become very important and useful. By implementing the simple EMD algorithm we observed that the time series has its longest characteristic cycle time (the inverse of the frequency of the last IMF) greater than the time elapsed from the beginning of COVID up to now. Although the IMFs are not necessarily related to any physical process in the system, it is accepted that the last IMF represents the trend of the dominant process, so we can read it as the global regime indicator. However, the last mode has a high amplitude, which points to undesirable effects, so we proceeded with the noise-assisted technique (VEMD) introduced in [4] and discussed also in the applications in [13]. We found that the COVID time series for Albania and Greece are decomposed into 11 IMFs, against 10 for the other four countries used herein for comparison. We observed that the last 5 modes give a more physical view of the data. From them we distinguish traces of periodicity in the signal, but their amplitude varies significantly in all IMFs. The most substantial periods are observed in the last two modes. For Albania, the full observation time is partitioned into 3 distinct time intervals corresponding to composite regimes, and the last one is not completed, so the full macro regime has not finished yet.
Herein we chose the variance corresponding to one individual as the margin. Recall that EMD is a recursive nonlinear filter which decomposes a time series into a set of narrow-band scales (IMFs), see [3], [4] for further details. Therefore, according to the literature cited herein, we can learn about the dynamics at each scale at the highest level of resolution, namely the instantaneous frequency and instantaneous amplitude, via the Hilbert transform. From the spectrogram constructed by applying the Hilbert transform according to EEMD (ensemble EMD), we observe that the power spectrum is characterised by alternating peaks among local frequencies, measured in reciprocal time units. The highest frequency value is obtained around 50, which corresponds to a period of around 9 days. Note that the local frequencies can be interpreted mostly in a mathematical framework, as a numerical trend, because we did not perform a detailed and thorough analysis of each EMD mode. Finally, we show the power spectrum to illustrate the local behaviour of the time series under analysis. We observe that the frequency modulation is dense, indicating a highly non-stationary process.
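To make the sifting idea behind EMD concrete, the sketch below performs a single sifting pass. It is a simplified illustration and not the EEMD or VEMD procedure used above. It uses piecewise-linear envelopes instead of the cubic splines normally used in EMD, holds the envelopes flat beyond the outermost extrema, and does not iterate to a stopping threshold. All function names are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of interior local minima and maxima (first point of a plateau)."""
    inner = range(1, len(x) - 1)
    minima = [i for i in inner if x[i - 1] > x[i] <= x[i + 1]]
    maxima = [i for i in inner if x[i - 1] < x[i] >= x[i + 1]]
    return minima, maxima


def linear_envelope(x, knots):
    """Piecewise-linear curve through x at the knots, flat outside them."""
    envelope = []
    k = 0
    for t in range(len(x)):
        if t <= knots[0]:
            envelope.append(x[knots[0]])
        elif t >= knots[-1]:
            envelope.append(x[knots[-1]])
        else:
            while knots[k + 1] < t:  # move to the segment that contains t
                k += 1
            t0, t1 = knots[k], knots[k + 1]
            w = (t - t0) / (t1 - t0)
            envelope.append((1 - w) * x[t0] + w * x[t1])
    return envelope


def sift_once(x):
    """One sifting pass: return (detail, mean), or None if extrema are too few.

    mean is the average of the lower and upper envelopes and detail = x - mean,
    so the fast oscillation goes to detail and the slow part stays in mean.
    """
    minima, maxima = local_extrema(x)
    if len(minima) < 2 or len(maxima) < 2:
        return None
    lower = linear_envelope(x, minima)
    upper = linear_envelope(x, maxima)
    mean = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    detail = [xi - mi for xi, mi in zip(x, mean)]
    return detail, mean


# Synthetic signal: a 10-day oscillation riding on a slow linear trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.01 * t for t in range(200)]
detail, mean = sift_once(signal)
print(round(mean[100], 6))  # the envelope mean follows the trend 0.01 * t
```

A full decomposition would repeat the pass on the detail until it qualifies as an IMF, subtract it from the signal, and continue on the remainder; the last component extracted then plays the role of the trend.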

Figure 5. The trend as given by the last IMF
To extend the verification of the regime obtained above, we performed a brief critical behaviour analysis based on the evidence of the DSI structure. In this case, the critical time corresponds to the moment when the regime is most likely to change, as discussed in a theoretical framework in [19]. Following [8], we analysed this aspect by exploring the fit of the data to the q-log-periodic function, to account for a non-dominant DSI structure in the series. Initially we explored this aspect by fitting the simple ad hoc LP function (10) to the data on certain time windows where we guessed near-critical dynamics. We observed that local self-organization behaviour is present in short intervals, but it usually disappears a few weeks after its first appearance. The last trace of log-periodic dynamics is observed in the closing stage of the series, corresponding to spring 2021. In general, however, we can observe log-periodicity only locally in time windows and, following the analysis reported in [8], in the short term those functions can produce good predictions of regime changes. The long-term behaviour cannot be fitted by a single function, which is in full agreement with the preliminary empirical result that there are more than two regimes in the time evolution of new cases. Having concluded that a deterministic model is not conclusive, and given the limited extent of the log-periodic behaviour mentioned also in [8], we also performed a neural network overview to forecast the behaviour in the short term. Note that our simulations based on the SEIR and Richards variants presented in [1], [19], [18], [12], etc., produced large deviations from the actual data, and we do not comment on them specifically herein.

Figure 6. Power spectrum for the data series
The distribution of daily occurrences
From a general point of view, the system would be considered to be in a stationary state if the distribution of a characteristic parameter is stable. There is a large literature about the stability of distributions based on the Lévy-stable concept but, in general, if the variance is undefined or infinite, we assume that the distribution is not stable. So, we first built the "discrete analogue" distribution by counting the frequencies of the events $x \in [x_i, x_{i+1}]$, where x is the daily number of new cases per million. We call this the binning process. Before binning, we observed that the variance is significantly high and so, judging the distribution to be non-stationary, we used the Freedman-Diaconis (FD) optimization rule. Notice that the bin optimization rule aims at fixing the bin width $h = x_{i+1} - x_i$ according to a criterion based on the moments of the distribution, where the k-th moment is given by the general formula $M_k = \int x^k p(x)\,dx$ and $p(x)$ is the probability density of the random variable x. As usual, we considered only the first moments, i.e. the mean and the variance. We followed the FD rule, which basically does not assume a normal distribution for the errors or deviations of the variable x. Following these steps, the distribution turned out to be a multipart function in which two parts are easily distinguishable. The first one, at low occurrences, is better fitted by a power-law curve, but it can also be approximated by exponentials. Having only a few points in this zone, we did not perform detailed tests and statistics to discriminate between those two possible partial pdfs, so we used the contextual arguments based on reference [10]. The remaining part is a disturbed bell curve. It can be better approximated with gamma-based functions or, better still, by the q-Gaussian discussed in [17]. Those functions are based on the so-called q-exponentials. For Albania and Serbia, we identify the two-part distribution, where the last part, corresponding to high daily occurrences, is a q-Gaussian.
In the case of Greece, the distribution has started to develop a unique profile, with a high dominance of small reported daily occurrences. Italy has a distribution close to a Pareto law. According to the analysis provided in [9], the two-part distribution indicates the presence of more than one mechanism. The power-law origin of the distribution for the COVID-19 spread has been highlighted in [10], so we take preferential attachment mechanisms to be responsible for the power-law behaviour of the distributions. We suggest that the bell-shaped distribution observed for high daily new cases is related to the mixed competitive effects of factors of a physical and administrative nature. For the remaining part we do not draw mathematical conclusions because of the small number of points. To some extent, we can suppose that this distribution emerges as a result of a nonlinear preferential attachment rule, which leads to a distribution function of the form $P(k) \sim k^{-a} \exp(-bk)$, a power law with an exponential cutoff, with a and b positive constants. Note that in a previous work considering attractiveness for links based on the local mean field [12], the distribution of high link values in a social network has the q-Gaussian shape introduced by [17]. Note that in the q-Gaussian form the q-parameter also measures the distance from the stationary state. Again, using the results of reference [12], we argue that the system is undergoing a dynamical process which is expected to push it toward a relaxation phase.
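The binning step and the q-Gaussian shape can be sketched as follows. The sketch assumes the usual statement of the Freedman-Diaconis rule, h = 2 IQR n^(-1/3), and the standard Tsallis definition of the q-exponential, e_q(x) = [1 + (1 - q) x]^(1/(1 - q)), neither of which is written out in the text. The function names and the width parameter b are hypothetical, and the q-Gaussian is left unnormalized.

```python
import math
import statistics


def fd_bin_width(data):
    """Freedman-Diaconis bin width: 2 * IQR / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    width = 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)
    if width <= 0:
        raise ValueError("zero interquartile range, rule is not applicable")
    return width


def histogram_density(data, width):
    """Bin centres and empirical density for equal bins of the given width."""
    low = min(data)
    n_bins = max(1, math.ceil((max(data) - low) / width))
    counts = [0] * n_bins
    for value in data:
        index = min(int((value - low) / width), n_bins - 1)  # max goes to last bin
        counts[index] += 1
    centres = [low + (i + 0.5) * width for i in range(n_bins)]
    density = [c / (len(data) * width) for c in counts]
    return centres, density


def q_exponential(x, q):
    """Tsallis q-exponential; reduces to exp(x) when q == 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        if q < 1.0:
            return 0.0  # compact support cutoff for q < 1
        raise ValueError("q-exponential diverges for this x when q > 1")
    return base ** (1.0 / (1.0 - q))


def q_gaussian(x, q, b):
    """Unnormalized q-Gaussian e_q(-b * x**2); the Gaussian limit is q -> 1."""
    return q_exponential(-b * x * x, q)


# Illustrative data only (not the Albanian series).
sample = [0, 1, 2, 3, 4, 5, 6, 7]
h = fd_bin_width(sample)
print(h, histogram_density(sample, h))
print(q_gaussian(1.0, 2.0, 1.0))  # q = 2 gives the Cauchy shape 1 / (1 + x**2)
```

Values of q further from 1 give heavier tails than the Gaussian, which is the sense in which q can be read as a distance from the stationary state.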

Conclusions
The evolution of the recorded COVID-19 data in Albania exhibits significant non-stationarity and heterogeneity due to the processes related to the propagation of the illness, the immunity of the population, geographical and urban heterogeneity, etc. Another issue is related to the administrative procedures for pandemic management. By employing classical descriptive analysis and auxiliary tools we gathered important knowledge about the time series of recorded new positive cases. We observed seasonality in the behaviour with a time lag of about 9 days. The variance of the series has started to stabilize in recent months, and it looks similar to that of neighbouring countries, but around 30 days in advance. In some respects, the properties of the series are similar to those of Greece and Italy, which may be related to similar protocols used by the administrations. By implementing the EMD technique we observe that the initial process has entered its closing stage and the long-term regime of new COVID cases is expected to finish at the end of spring 2021. From the macro-scale analysis based on the distribution, the system is characterized by a two-part distribution. The low incidences follow a power law, whereas the higher ones follow a bell-shaped distribution that is well fitted by a q-Gaussian form. The system seems to be undergoing a dynamical relaxation process in which the distribution would become a one-part function.