From single layer to multilayer networks in mild cognitive impairment and Alzheimer’s disease

We investigate the alterations of functional networks of patients suffering from mild cognitive impairment and Alzheimer’s disease (AD) when compared to healthy individuals. Departing from the magnetoencephalographic recordings of these three groups, we construct and analyse the corresponding single layer functional networks at different frequency bands, both at the sensors and the regions of interest (ROI) levels. Different network parameters show statistically significant differences, with global efficiency being the one having the most pronounced differences between groups. Next, we extend the analyses to the frequency-band multilayer networks (MN) of the same dataset. Using the mutual information as a metric to evaluate the coordination between brain regions, we construct the αβ MN and analyse their algebraic connectivity at baseline λ 2−BSL (i.e., the second smallest eigenvalue of the corresponding Laplacian matrices). We report statistically significant differences at the sensor level, despite the fact that these differences are not clearly observed when networks are obtained at the ROIs level (i.e., after a source reconstruction procedure). Next, we modify the weights of the inter-links of the multilayer network to identify the value of the algebraic connectivity λ 2−T leading to a transition where layers can be considered to be fully merged. However, differences between the values of λ 2−T of the three groups are not statistically significant. Finally, we developed nested multinomial logistic regression models (MNR models), with the aim of predicting group labels with the parameters extracted from the MN (λ 2−BSL and λ 2−T ). Using these models, we are able to quantify how age influences the risk of suffering AD and how the algebraic connectivity of frequency-based multilayer functional networks could be used as a biomarker of AD in clinical contexts.


Introduction
Network Neuroscience [1] consists of the analysis of structural and functional networks coming from different brain recording modalities. The rationale behind this approach is based on the fact that the brain is a complex, nonlinear, dynamical system where neural populations are constantly coupling and decoupling over different spatial and temporal scales with the aim of transmitting and processing information. Importantly, brain functions emerge from these interactions, from basic biological regulation to highly complex cognitive abilities. Network neuroscience is a suitable tool to study these processes, where nodes can be single neurons (λ S /2) [22]. In this regime, the network acts as a whole [24]. This finding enables new strategies to build networks maximising λ 2 , as applied in [26], and can be useful in clinical contexts. With this regard, Tewarie et al analysed the λ 2 from resting state MEG recordings (healthy subjects) and compared the empirical results to those obtained from a biophysical model [27]. Interestingly, they conclude that the healthy brain works at the transition point between the aforementioned regimes, and hypothesise that such a state could promote layers integration and segregation processes. A more recent work has extended this notion to the case of MN with heterogeneous interlayer links, focussing on the impact of missing links on the value of the λ 2 , and observing both regimes in synthetic signals and in real, resting state MEG registers [28].
In this paper, we analysed the single layer and the MN of patients suffering from mild cognitive impairment (MCI) and AD, and compared their structure with the equivalent networks obtained from healthy individuals. Using MEG recordings, we quantified, for every subject, the coordination between every pair of brain regions at the sensors and regions of interest (ROIs) level, but also at each frequency band and between frequency bands. To that end, we chose mutual information (MI) [29] as the connectivity metric. The reason behind this choice is the fact that, at the cost of no clear interpretation in physiological terms, MI is agnostic to the underlying statistical structure of the data. Hence, we can apply the same measure to quantify intra-layer and inter-layer links, ensuring both types of links are in the same range, and without any assumption regarding the mode of cross-frequency interaction (PAC/AEC). First, we obtained different network parameters (average shortest-path, weighted clustering coefficient, local and global efficiency and network outreach) at each individual frequency band (δ, θ, α and β), each of them leading to an independent single layer network. Next, we constructed the frequency-based MN by connecting the α and β layers through inter-layer links assessing the level of coordination between the two frequency bands. We investigated the role of λ 2 in the αβ MN as a possible biomarker in AD and MCI. Following references [24,28], we hypothesised that, as the condition advances, we might be able to observe differences in the interaction between frequency bands, especially in α and β bands, since (i) they are the ones accumulating more power in the spectrum, (ii) they have been shown to display significant differences in the statistical complexity and entropy [30], and (iii) they have been reported to be the frequency bands with the highest impairment [31].

Recruitment, data acquisition and preprocessing
We analyse resting-state MEG recording from 111 elder adults, recruited from the Center for Prevention of Cognitive Impairment, the Seniors Center of Chamartín District and the Hospital Clínico San Carlos Neurology Department in Madrid (Spain). The Hospital Clínico San Carlos Ethics Committee approved the study. Accordingly to their cognitive status, 48 participants were classified as healthy controls (control group (CG), 46 met the criteria for amnestic MCI, and the remaining 17 received the diagnostic of dementia due to AD. The complete protocol, including diagnostic and excluding criteria, can be found in [30]. Tables 1-3 in the appendix (section 4) summarise demographic information and mini-mental state examination (MMSE) scores for each participant.
We recorded 4 min of resting state brain's magnetic activity through a 306 channel vectorview MEG system from each subject. Due to the tSSS filtering procedure, signals recorded by magnetometers and gradiometers are highly correlated and redundant [32]; thus, only data coming from the 102 magnetometers were employed for further analyses. Ocular, muscular and jump artefacts were removed automatically and visually confirmed later by an MEG expert. Additionally, ocular and electrocardiographic contributions to the MEG signal were discarded via independent component analyses. Clean data were segmented into 4 s epochs, and the resulting dataset comprised 29 epochs (samples) from each subject. Clean epochs for each subject were band-pass filtered between 1 and 45 Hz so that low frequency and power line artefact contributions were discarded. A realistic model using a single-shell was employed to generate the forward model of a 1 cm spacing grid with 2459 sources inside the brain. Magnetic activity at the brain level was reconstructed in each of the nodes by using linearly constrained minimum variance beamformer [33]. Once time series for all of the 2459 sources were calculated, a representative time-course for each cortical region of a reduced version of the Harvard-Oxford atlas [34] was chosen by selecting the centroid source of each region. We retained 62 ROIs at each sample for every subject. More details on the procedure can be found in [30].
Lastly, for all subjects in both modalities (sensors and ROIs), we conducted a frequency decomposition of all signals to get an estimate of each time series into four classical frequency bands: δ : 1-4 Hz; θ : 4-8 Hz; α : 8-13 Hz and β : 13-31 Hz. In order to estimate the signal for each frequency band, we simply computed the Fourier transform of the signal and reconstructed it via inverse Fourier transform in the range of frequencies specified above. γ band was disregarded for all subjects due to lack of power in this specific frequency range (31-50 Hz).

Network construction
For each subject and sample in both modalities, we constructed two types of functional networks. On the one hand, four single-layer networks, one for each frequency band, and on the other, one αβ multilayer network. To unify our analyses we estimated the functional connectivity in every network in terms of MI, either it be single layer (evaluated at each frequency band separately) or multilayer (evaluated in α and β bands and between them). MI works comparing the amount of shared information between any two variables. In our case, these two variables are time series (in sensors or ROIs space), either from the same layer (α or β), or from different layers (one from α and one from β). Being X and Y any two time series, the MI between them is defined as: where p(x, y) stands for the joint probability of X = x and Y = y, and p(x), p(y) their individual probabilities.
To estimate probabilities we chose the Freedman-Diaconis rule [35], which performs reasonably good without assuming normality on the data [36]. Note that the multilayer network will be composed of two layers, α and β, and that we allow interlayer links (obtained through the MI) connecting any two nodes from any two layers. In this sense, our networks follow a complete multilayer architecture, not a multiplex one [28].
To test the statistical significance of the weight of every link, we applied an edge-level nonparametric scheme based on circular shifting, a method of the family of permutation testing. To that end, we generated new surrogate data under the null hypothesis of no temporal correlation [37] for each pair of time series considered [38]. We kept the original value of MI obtained, and then we followed an iterative procedure to test if it was statistically significant or not: (i) we kept one of the time series untouched, while in the other, (ii) we shifted a collection of consecutive points to its end, and (iii) we got the MI between both time series (the original and the shifted one). Repeating this process 200 times gave us an ad hoc null distribution of no temporal correlation for that particular pair of time series, against which we test our original estimation of MI. We set every comparison with a significance level of α = 0.01 and kept the original estimation of MI if and only if its probability of appearance in the null distribution was lesser than α . Otherwise, we set it to zero. This method is preferred over other normalisation schemes due to its ability to control for multiple comparisons at any desired level of significance α , making no assumptions about the underlying distribution [37]. Also, circular shifting preserves the inner autocorrelation structure of the time series [38]. The length of each shifted segment l s is adjusted for each frequency band across all subjects and samples, as the frequency component will determine the prototypical cycle of autocorrelation In contrast to other synchronization measures, MI has no upper bound, potentially biasing any further comparison relying on the strength of the network. To avoid this issue, we normalised all the weights in every network by linearly rescaling it between 0 and 1: where max[MI ij ](min[MI ij ]) indicates the largest (smallest) weight (i.e., MI) in the network. Note that this step is carried out for every network before any further calculation. Hence, it will differ depending on the network under consideration. In αβ MN, we consider all links (inter and intralayer), while in single layer networks, the procedure is conducted over intralayer links (as layers are isolated).

Multilayer analyses
To study the dependency between layers, we first obtained the Laplacian matrix L from each αβ multilayer network, whose elements are: where s i = i =j p ij is the strength of a node i. Then, solving the eigenvector equation:  The second smallest eigenvalue λ 2 (i.e., algebraic connectivity) is associated with the structure of the network, and in the multilayer/multiplex case, it undergoes a phase transition when layers become structurally dependent [22,24,28]. Given the fact that the evolution of the disease involves massive synapse loss and brain areas disconnection, we hypothesise that, functionally, this should be reflected in the integration between frequency bands, particularly between α and β. Particularly, we expect layers to be more disconnected than what we expect in a normal state (CG). Therefore, we will test (a) if the integration between bands, as quantified by the algebraic connectivity, is altered in MCI and AD at baseline (λ 2−BSL ) and (b) how far each group is from the transition point, where λ 2 undergoes the transition typically observed in multilayer/multiplex networks when layers are structurally merged (λ 2−T ). The former test is achieved simply by comparing the values of λ 2 obtained from each network between groups, which we will call λ 2−BSL from now on. Bear in mind that this nomenclature (baseline) is simply used to differentiate the values of λ 2 when obtained empirically from every network in every subject. The latter, however, requires simulating how the network would evolve towards the transition and the calculation of λ 2 at that point, which we will call λ 2−T . We followed two different strategies to obtain λ 2−T , resembling the approach presented in [28]. In each αβ network, we (a) multiplied every interlayer link by a constant (p; heterogeneous strategy), or (b) we first added the mean strength of the interlayer quadrant of the network and then multiplied every interlayer link by a constant (p; homogeneous strategy). We explored values of p from 0.1 to 70, obtaining the Laplacian matrix and its associated eigenvalues at each step. To detect the transition point, we obtained the absolute difference of λ 2 and λ 3 as a function of p(Δ(p) = |λ 2 − λ 3 |). Local minima in Δ(p) indicate crossing points between eigenvalues; we were interested in the first minimum, where λ 2 and λ 3 cross, reflecting the structural transition in the multilayer. For illustrative purposes, figure 1 shows one example of λ 2 , λ 3 and Δ(p) as a function of p, for the heterogeneous (a) and homogeneous (b) strategies, obtained from a random control subject.
Note that eigenvalues (λ 2 and λ 3 ) increase faster in the homogeneous strategy, as we are adding the average interlayer strength to all interlayer links before multiplying by the constant. As with λ 2−BSL , we will compare the values obtained from both the heterogeneous (λ 2−T1 ) and homogeneous (λ 2−T2 ) strategies between groups. Given that groups are not balanced, we had to implement an iterative strategy that uses all the information available in the trials/samples, ruling out any effect of inter-sample variability, and any group difference in such variability (iterative two-way ANOVA).
Lastly, we fitted a multinomial logistic regression model to the data, trying to predict group labels. This procedure allowed us to quantify a conversion rate to AD, taking into account λ 2−BSL , λ 2−T , age and sex. To fit the model, we previously averaged values of λ 2−BSL and λ 2−T across samples. A schematic representation of the pipeline to conduct our multilayer analyses can be found in the appendix (section 4, figure 7).

Single layer analyses
For each layer network (coming from sensors or ROIs) we obtained its weighted clustering coefficient [39], local and global efficiency [40], shortest path and outreach [5]. In unweighted networks, the clustering coefficient, also known as transitivity, measures the ratio of neighbours of a node that are also neighbours between them, or equivalently, the number of triangles in the graph. Its average, hence, quantifies the cliquishness of the network [41]. In weighted networks, the weighted clustering coefficient measures the local cohesiveness taking into account the intensity of the interactions present on the local triplets [39]: The normalisation factor (s i (k i − 1)) ensures that c w (i) is bounded between 0 and 1, accounting for the weight of each link times the maximum number of possible triplets.
To measure the local and global efficiency, we followed to the formulation proposed by [40], considering the ability of connecting nodes in a low number of steps at each scale (local and global): where d ij is the shortest topological distance between nodes i and j, computed using the Dijkstra's algorithm [42]. In this sense, the global efficiency is the harmonic mean of the inverse of the characteristic path length. Local efficiency, on the other hand, is obtained as: where G i is the subgraph formed by the neighbours of i without i. This property is local in the sense that it is obtained at each node as the inverse of the average shortest path of the neighbours of i if that node were not present in that subgraph. In biophysical terms, the efficiency of the network reflects a trade-off between the cost of the paths connecting different nodes and its performance in carrying out a given task. Lastly, the outreach parameter, formulated originally in [5], can be obtained as: where r ij is the physical distance between nodes and V(i) is the subgraph of nearest neighbours of i. To obtain r ij , we compute the Euclidean distance of the three spatial coordinates, which are given by the position of each sensor over the scalp or by the position of each ROI's centroid. The average outreach over all nodes reflects the dominance of long-range (high outreach) or short-range (low outreach) connections, and has been found to be altered in MCI and Alzheimer patients [5,43].

Group comparisons: methodological considerations
Groups are not balanced: control and MCI groups are more populated (N = 48 and 46, respectively) than AD (N = 17). Also, we count with 29 trials/samples from each individual. Irrespective of the level of the type of network (multilayer or single layer), any group comparison entails the methodological drawback of different sample sizes. Under this circumstance, we can either average the parameter across samples, or treat trials from each subject as different observations (realisations) of the parameter from each group. In the former, every test would imply comparing three vectors of sizes 48, 46 and 17 and necessarily drawing upon nonparametric statistics, with more relaxed assumptions on the data (normality is not required, and violating the assumption of homoscedasticity is not as dramatic as in parametric tests), at the cost of less sensitivity to real group differences. In the latter, vectors to compare would be of length 48 × 29 = 1392, 46 × 29 = 1334 and 17 × 29 = 493, which allow us to make use of parametric statistics. Nonetheless, we face a possible problem of intra-subjects variability. Hence, we claim that to test for groups differences, the statistical approach should (i) take into account all sources of variability: between samples, in each individual; between subjects, in each group; and between groups, (ii) track the (possible) interactions between variability sources, (iii) be as sensitive as possible to group differences while (iv) using as much available data as possible.
Considering all together, each comparison was conducted with an iterative two-way ANOVA. The analysis of variance is parametric, thus more sensitive to real differences in the data in comparison to nonparametric approaches, and it is able to take into account the variability from samples, as well as the interaction between sample variability and group belonging. In each realisation, we subsampled 17 individuals from the control and MCI groups at random, thus ensuring the design is balanced. For each subject, we kept the value of the parameter under consideration in every sample (that is, 29 values for each subject) and performed a two-way ANOVA. We set GROUP and SAMPLE as factors for the analysis (analogous to a repeated measures ANOVA), where the main effects will be differences between groups and differences between samples. The interaction (GROUP × SAMPLE) would indicate that the variability in samples is group dependant and not homogeneous.
P-values indicate significant differences, but do not inform about the practical significance of results, something far more important in empirical studies [44]. At each step, then, we also estimated the size of each effect, as measured by the partial eta squared (η 2 p ), a common measure in mixed effects designs that relate to the amount of variance explained by the factor in consideration. We repeated the process (iterative twoway ANOVA, also calculating the effect size for each comparison) 100 times, with a different subsample in each iteration, ensuring the maximal efficiency in statistical terms, as we were using all available information from every subject. We were especially interested in observing group differences (main effect of group), and expected that the main effect due to samples, as well as the interaction between group and samples remained non-significant. In this regard, we considered that an effect was significant if and only if 90 out of 100 iterations were significant, evaluating in each case its associated p-value after correcting for multiple comparisons with FDR. In such cases, the final effect size was the average over the significant iterations.

Results
Every group comparison has been conducted with the iterative two-way ANOVA explained above. For every test, we provide p-values, always meeting the criteria established before (90 out 100 iterations reaching that p-value after FDR correction). We focussed on the results for the main effect of group, as our goal is to demonstrate differences that depend on the stage of the disease. Note that, in any of our analyses, the main effect of sample and the interaction between sample and group were significant, indicating that we can safely attribute differences to group belonging and not to variability in the samples. Additionally, we qualified each result with its corresponding size effect (η 2 p ). Recall that the effect size quantifies the amount of variance in the data explained by the effect under consideration (group membership, samples or interaction between them). To grasp the implications of a particular value, it would be preferable to relate our results to previous literature [45], but given the lack of studies in which effect sizes are reported, we will follow the general recommendations originally provided by Cohen [46] to define small (η 2 p = 0.01), medium (η 2 p = 0.06) and large (η 2 p = 0.14) effects.

Single layer analyses
We carried out a single layer analysis of each of the four main frequency bands of the human brain, obtaining an independent network for each frequency. Networks were obtained in two modalities, at the (i) sensor level and (ii) the ROIs level (using the Harvard-Oxford atlas). Next, we calculated five network parameters for each modality and group, namely the weighted clustering coefficient, the global efficiency, the local efficiency, the outreach, and the shortest path. Figure 2 shows, for every frequency layer (δ, θ, α and β), the comparison between one of these parameters, the shortest path, for the MCI, AD and control groups at two scales. At the global level, parameters are averaged over all nodes (or edges if it is an edge-level metric), yielding one single value for each sample and layer. Bars (with corresponding standard error) indicate the average in that layer for each group, with asterisks indicating the p-value of the comparison after FDR correction only if that p-value is met in 90 out of 100 iterations ((a) and (c) in the figure for sensors and ROIs respectively). At the local (node) level, results are illustrated over scalp plots (sensors, (b)) or cortical surface plots (ROIs, (d)), with the average value of η 2 colour-coded. In that case, we set to 0 the value of η 2 if that comparison is not significant, or take the average over all significant iterations if it is (p < 0.05 after FDR correction in at least 90% iterations).
In this example, we illustrate the results found in shortest path, showing that in sensors there are significant global differences between controls and AD, and between MCI and AD (figure 2(a)) in δ (p < 0.001 in both cases, η 2 p = 0.047 5); θ (p < 0.05 and p < 0.001, η 2 p = 0.042 2); α (p < 0.001 in both cases, η 2 p = 0.075 4) and β (p < 0.001, both cases η 2 p = 0.083 8). In this regard, there is an effect of group in all bands, with sizes being small in δ and θ and medium in α and β, which is coherent with previous reports indicating that these last two bands are the most affected. The effect of samples and the interaction with group were not significant in any comparison. Note how, in any case, the effects are smoothed when taken globally: when we make comparisons at the node level, local alterations show greater effect sizes ( figure 2(b)), specially in temporo-parietal areas (right hemisphere, α, η 2 p ≈ 0.1), and temporo-parietal, frontal and occipital areas, again with a right hemisphere preference in β band (η 2 p ≈ 0.14). In ROIs, global alterations are less pronounced, as we only found differences in β band between controls and AD (p < 0.001) and between MCI and AD (p < 0.001), with a medium effect size (η 2 p = 0.086 6). Local differences are coherent with sensors: we find small effect sizes in δ and θ (η 2 p ≈ 0.02) corresponding to alterations in the parietal cortex (somatosensory association cortex), supramarginal gyrus, and entorhinal cortex. In the α band, the significantly affected areas (η 2 p ≈ 0.04) are: the orbitofrontal prefrontal cortex, and specially cuneus, precuneus, superior, and medial orbitofrontal cortex, as well as the cingulate cortex. Nonetheless, the largest effects, as in sensors, are found in the β band, including all previously mentioned areas, but including broader regions (prefrontal cortex) and lingual gyrus (η 2 p ≈ 0.15). Interestingly, in α and β, the alteration is bilateral, not confined to one hemisphere. Next, we calculated the weighted clustering (equation (5); figures 3(a) and (b)), the local efficiency (equation (7); panels (c) and (d)), the outreach (equation (8); panels (e) and (f)) and the global efficiency (equation (6); panels (g) and (h)). See section 2.3 for more details on the equations to obtain every measure. Interestingly, the weighted clustering coefficient on sensors was significantly different between CG-AD (p < 0.001) and MCI-AD (p = 1e − 06) in the δ band. We also found differences in θ but only between CG-AD (p < 0.001, η 2 p = 0.029). When the same analysis is translated to the ROIs level differences are reported at higher frequency bands: the θ layer showed differences between CG and AD (p < 0.05, η 2 p = 0.023), as well as the α layer (CG-AD, p < 0.001, η 2 p = 0.036). Finally, the β layer is the one showing more differences both at CG-AD (p = 1e − 06) groups and MCI-AD (p = 1e − 05, η 2 p = 0.093). We found a similar trend in the local efficiency (only δ being significant in sensors, and β in ROIs). The outreach parameter showed differences between the analysis at the sensor and the ROIs level as well. Again, δ and θ layers are the ones showing statistical significant differences between CG-AD and MCI-AD, while at the ROIs level differences appear at layers α and β, i.e. move to higher frequencies and only between MCI-AD (α) and CG-AD (β). Only global efficiency was consistent in both modalities, with all bands showing differences between controls and AD group, and between MCI and AD group. Note that global efficiency is the only measure that works at a global scale by construction, while the rest has been calculated at the node level and averaged over the whole network.

Multilayer analyses
Next, we investigated the differences between groups and modalities when αβ MN are constructed (see section methods) and analysed. First, we checked whether network density and strength (i.e., the sum of the links' weights) differ between groups. As shown in figures 4(a) and (b), we could not find any difference between groups in density, regardless of the modality. The effect of the samples and the interaction between group and samples were not significant either. Figures 4(c) and (d) compares the strength between groups. In this case, any tested effect was statistically significant for sensors (main effects of group and sample, and interaction between them). On the contrary, we found a significant difference between MCI and AD groups (p < 0.05), with a small effect size (η 2 p = 0.046) in ROIs. This could be due to the normalisation procedure (circular shifting) or due to real differences in strength, derived from the disease. Take into account that, although the effect is small (around 4.6% of the variance in strength is attributable to the clinical condition), this difference could bias further analyses.  Next, we compared the values of λ 2 at baseline and at the transition point, the latter obtained from the homogeneous and heterogeneous strategies (see methods), using the iterative two-way ANOVA. It is worth noting that, as a consequence of the normalisation scheme applied (circular shifting), it is possible that in some cases a network is disconnected, leading to an algebraic connectivity (λ 2 ) equal to zero. None of the networks built from sensors met this condition, whereas in ROIs, this was the case of the 14.5% of networks in the CG, 15.97% in MCI and 9.13% in AD. All disconnected networks were discarded for further analyses.
We found significant differences between controls and MCI (p < 1e − 06) and between MCI and AD (p < 0.05) in λ 2 at baseline (figure 5), with a small size effect (η 2 p = 0.046 5). The effect of samples and the interaction between group and sample were not significant. Although the group effect could explain around 4.6% of the variance in λ 2 , it is indicating that as the disease progresses, α and β bands become more independent, given that at each group, the average decreases. This might be reflecting a loss in the ability of layers to interact between them, which might be paramount for the brain to work properly. Interestingly, this tendency is also observed in ROIs, although in this case, differences are not statistically significant. Take into account  that this could be a side effect of the missing values reported before (as we are excluding some data from the analyses due to network disconnection).
Next, we investigate what the value of λ 2 would be when reaching the transition point. With this aim, we increase the weight of the inter-layer links, as explained in the method section, using (i) a homogeneous strategy and (ii) a heterogeneous strategy. Both approximations to reach the transition point are consistent in each modality (sensors/ROIs), albeit results are opposite (figure 6): in sensors (panels (a) and(c)), for both approaches, the AD group had a smaller λ 2 when reached the transition point, followed by CG, with MCI having the largest. On the contrary, in ROIs (panels (b) and (d)), AD had the highest values of λ 2 when compared to the other two groups, and MCI the smallest. However, none of these differences was statistically significant.
Finally, we tried to fit different, nested multinomial logistic regression models (MNR models), with the aim of predicting group labels with the parameters extracted from the MN (λ 2−BSL and λ 2−T ). Importantly, we first averaged the parameters across samples with the consequent information and predictive power loss. Nonetheless, this step is necessary to conduct the logistic regression, as keeping all samples would incur in overfitting. In the context of linear regressions, keeping all samples would be analogous to duplicate subjects, increasing the percentage of cases correctly classified artificially.
MNR models are a special case of generalised linear models (GLM), with the particularity that, instead of predicting probabilities, they predict the logarithm of the odds of a certain category (relative risk), using the logit function as the canonical link function [47]. This makes them especially suitable to classify subjects and have been extensively used in machine learning approaches. In our case, the advantage of applying an MNR model is that it enables us to give a probability of membership to each group depending on the values of each parameter. Hence, we could predict a conversion ratio if any of those parameters change beyond a given value. The goal was to keep the most parsimonious model, i.e. keeping the smallest number of terms while explaining as much variance as possible. Thus, we proceeded iteratively, adding one extra term (parameter) at each step. In all cases, being π k the theoretical probability associated to each category in the response variable (CG/MCI/AD), and taking the CG as the reference category, K, we can define K − 1 non-redundant equations of the form: ln where K is the reference category (CG) and k any of the other two (MCI/AD). Each regression coefficient had one subindex (j = 1, 2, . . . , p) for each of the p independent variables (parameters added to the model) and another one (k = 1, 2, . . . , K − 1) to identify each equation. As we only had three categories, every model considered was composed of two equations, each one describing the relative risk of one subject being MCI(AD) with respect to CG, and containing a coefficient for the intercept and a separate slope on each predictor (parameter). To decide which model is preferred over the others, we observe how adding or removing a parameter reduces the deviance from the fit, calculated as: where y ij is the ith observation in the jth category (group),π ij its categorical, estimated probability and m i its corresponding sample size. The deviance parameter is obtained in GLM as the difference in the fit between the full and reduced models [48]. The full model uses a separate parameter for each sample, hence having as many parameters as subjects (it is also called 'complex model'). The reduced model includes only a set of parameters and, although its fit is always worse in comparison to the full model, it does not require as many parameters. In the case of logistic (either binary or multinomial) regression models, the deviance parameter is estimated comparing the maximum likelihood of the full model with the likelihood of the reduced one, and asymptotically follows a χ 2 distribution [49]. As mentioned above, at each step, we added one term (variable) and compared it with the previous model to test if adding one extra term reduces the deviance of the fit. To compare between models, we simply subtracted their deviances and checked the cumulative probability of the result under its χ 2 distribution with degrees of freedom equal to the difference of each model's degrees of freedom. A significant (1 − P (dev1−dev2) < α) indicates that the coefficient of the new term is significantly different from zero, and we can safely include it in the model, reducing its deviance. We first defined a simple model (mdl (1)) containing only demographic variables (age and sex) to predict group labels. More details regarding their deviances and values for the coefficients can be found in the appendix, section 4. We depart from this model, adding more terms for each parameter under consideration. From all the other models we have fitted, the best scores are obtained when we only add one extra term for λ 2 at baseline, in both modalities. This is expected, as λ 2−BSL was the only parameter showing significant group-dependent differences (statistically significant in sensors and with a clear tendency in ROIs).
The model for sensors (mdl(2S)) is: (2) = 0.0520, indicating that adding the new term does not reduces the deviance of the fit significantly. Nonetheless, the intercept and the coefficient of λ 2 were significant in the second equation (AD vs CG), which again is expected given that these groups should be the most different.

Conclusions
In this paper, we first investigated the differences between single layer functional networks of patients suffering from MCI and AD. Next, we extended our analysis to frequency-band MN and presented an application of the algebraic connectivity (λ 2 ) as a biomarker of AD in a clinical context.
We provided a complete analysis of single layer networks, focussing on the shortest-path parameter but also analysing differences between groups in the weighted clustering coefficient, the global and local efficiency and the outreach. It is worth noting that path length-based metrics are hard to interpret [50,51], but we included them as an initial step before analyzing the multilayer structure. We found that each parameter had its own distribution over sensor and ROI space, consistent between modalities, that strongly depended on the band and group under consideration. Take into consideration that each rhythm has its own functional role, spatial pattern of synchronization [52] and complexity/entropy distribution [30]. While all network parameters showed some statistically significant differences between controls and AD, the global efficiency was the one showing more differences, both at the sensor and ROIs level. Interestingly, we reported difference between the network parameters when both modalities were compared with a particular (and general) pattern: at the sensor level, differences between groups are more significant at networks related to low-frequency bands, while at the ROIs level, differences are more significant at higher frequencies. Note that every measure used to analyse MEG/EEG signals is affected by signal leakage and source reconstruction [53][54][55], which may introduce spurious estimates of functional connectivity, biasing the estimation of the network topology. Although there are different algorithms and atlases at disposal to project brain activity, all of them have advantages and pitfalls; thus, some degree of error in the estimation of the network parameters is unavoidable. It is precisely in this regard that we analysed the functional networks obtained at the sensor and ROIs level, with the aim of detecting differences between both projections of the brain activity. Furthermore, we applied the method of circular shifting, ensuring that every weight in the network is significant in comparison to a data-driven null model of connectivity. In this regard, the choice of MI over other FC metrics less sensitive to signal leakage comes from the fact that it enables us to estimate intra and interlayer connectivity in the same scale, and under the same assumptions. Given that we expect leakage to affect the three groups similarly, we consider that results won't be biased in any direction.
Concerning the multilayer analysis, we focussed on algebraic connectivity, a parameter related to the structural and dynamic properties of MN. We constructed the αβ MN, computed their algebraic connectivity and simulated the value of this parameter at the transition point where the MN suffer a well-known transition where layers are structurally merged. We reported significant differences between controls, MCI and AD in sensors and a strong (but not statistically significant) tendency in ROIs, consistent with the former. Our inability to prove these differences at the ROIs level might come from missing values that correspond to disconnected networks (where λ 2 = 0), as such cases were discarded from our analyses. Nonetheless, both modalities were consistent, and the underlying differences in λ 2 were informative when trying to classify subjects, as predicted by our logistic regression model presented in section 3. We conclude that in the biological progression of the disease, α and β bands tend to disconnect, losing the optimal level of communication. In this direction, clinical approaches might focus on reconnecting functionally these layers. Nonetheless, more data, preferably from follow-up sessions of the same subjects, could improve our statistical power and give new hints on how this parameter evolves with the disease progression. Our goal was to explore if the interdependency between bands is associated with the disease and if it could be used as a biomarker in AD. Thus, our model included only this term along with demographic variables. In the future, we could connect this finding with other biomarkers  67  2  30  26  CG  71  1  30  27  CG  73  2  28  28  CG  75  1  27  29  CG  65  2  28  30  CG  73  1  29  31  CG  72  1  27  32  CG  80  1  27  33  CG  73  2  30  34  CG  68  2  28  35  CG  70  2  28  36  CG  68  2  30  37  CG  70  1  30  38  CG  75  2  28  39  CG  80  2  27  40  CG  71  1  30  41  CG  70  2  30  42  CG  75  1  27  43  CG  69  2  30  44  CG  74  2  30  45  CG  65  2  30  46  CG  75  2  found in the literature, and formulate a model that includes more terms from different sources: single layer network measures that differ between groups, along with MMSE scores and physiological variables, in line with previous literature aiming at quantifying the MCI-AD conversion rate [56][57][58][59][60][61]. Regarding our simulations on the transition point of the algebraic connectivity (λ 2−T ), we could not find any remarkable difference, regardless of the strategy followed (homogeneous or heterogeneous). This could reflect an ill-posed formulation of the problem, and alternative strategies might be more successful. Note that the phase transition in λ 2 has been only observed in ideal setups, where all interlayer weights have the same value, and density is fixed for every network.
Finally, it is worth noting that we consider the current study just as a starting point. We believe that more research is needed to have a full description of the problem, with longitudinal data and more patients in each group, to unveil differences to this level of detail. Also, this circumstance points towards a crucial weakness of our study in particular and many others in general: the small sample sizes, especially in the AD group. Despite the fact that two-way ANOVA revealed systematically that the effect of samples (and the interaction with group) 73  was not significant, and hence we could safely compare all datum from every subject, we require more data to give robustness to our conclusions. In future studies, it would be interesting to explore differences in the interplay between other frequency bands, building MN comprising other combinations of layers, or possibly including all of them. In our study, we chose α and β because those ranges accumulate more power in the spectrum, show clear differences in complexity and entropy [30] and have been identified as the ones with highest impairment in AD [31].

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.

A.3. Null model for λ 2
The simplest model (mdl (1)) containing only demographic variables (age and sex) to predict group labels reads: ln π MCI π CG = −3.64 + 0.053 8 × age − 0.17 × sex (15) ln π AD π CG = −22.6304 + 0.3032 × age − 0.6336 × sex. (16) The deviance of the model is dev mdl(1) = 203.8440. The coefficients express both the effects of the predictor variables on the relative risk and the logarithmic odds of being in one category versus the reference category (CG). Intercepts were −3.64(p = 0.2760) and −22.6304(p = 0.0001), with coefficients 0.0538(p = 0.2397) and 0.3032(p = 0.0001) for age and −0.1700(p = 0.6836) and −0.6336(p = 0.3265) for sex. Note how the intercept in the first equation and the coefficients for sex in both equations were not statistically significant, while the intercept and effect of age was statistically significant only when we compare AD and CG groups (second equation), which makes sense if we take into account that these two groups should differ more. With this model, we could safely state that the relative risk of being AD versus CG increases exp(0.3032) = 1.3542 times for each year increase, given that the rest of the variables are equal. Interpretations regarding sex must be taken with caution, given that the coefficients associated with this effect are not significant. Nonetheless, we will keep this term for practical purposes.