Representation and reconstruction of covariance operators in linear inverse problems

Published 20 August 2020 © 2020 The Author(s). Published by IOP Publishing Ltd
Citation: Eardi Lila et al 2020 Inverse Problems 36 085002. DOI: 10.1088/1361-6420/ab8713

Abstract

We introduce a framework for the reconstruction and representation of functions in a setting where these objects cannot be directly observed, but only indirect and noisy measurements are available, namely an inverse problem setting. The proposed methodology can be applied either to the analysis of indirectly observed functional images or to the associated covariance operators, representing second-order information, and thus lying on a non-Euclidean space. To deal with the ill-posedness of the inverse problem, we exploit the spatial structure of the sample data by introducing a flexible regularizing term embedded in the model. Thanks to its efficiency, the proposed model is applied to MEG data, leading to a novel approach to the investigation of functional connectivity.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

An inverse problem is the process of recovering missing information from indirect and noisy observations. Not surprisingly, inverse problems play a central role in numerous fields such as, to name a few, geophysics (Zhdanov 2002), computer vision (Hartley and Zisserman 2004), medical imaging (Arridge 1999, Lustig et al 2008) and machine learning (De Vito et al 2005).

Solving a linear inverse problem means finding an unknown x, for instance a function or a surface, from a noisy observation y, which is a solution to the model

$y = \mathcal{K}x + \varepsilon, \qquad\qquad (1)$

where y and ɛ belong to either a finite or an infinite dimensional Banach space. The map $\mathcal{K}$ is called a forward operator and is generally assumed to be known, although its uncertainty has also been taken into account in the literature (Arridge et al 2006, Golub and van Loan 1980, Gutta et al 2019, Kluth and Maass 2017, Lehikoinen et al 2007, Nissinen et al 2009, Zhu et al 2011). The term ɛ represents observational error.

Problem (1) is well studied within applied mathematics (for early works in the field, see Adorf 1995, Calderón 1980, Geman 1990). Its main difficulties arise from the fact that, in practical situations, an inverse of the forward operator does not exist, or if it does, it amplifies the noise term. For this reason such a problem is called ill-posed. Consequently, the estimation of the function x in (1) is generally tackled by minimizing a functional which is the sum of a data (fidelity) term and a regularizing term encoding prior information on the function to be recovered (see, among others, Cavalier 2008, Hu and Jacob 2012, Lefkimmiatis et al 2012, Mathé and Pereverzev 2006, Tenorio 2001). For convex optimization functionals, modern efficient optimization methods can be applied (Beck and Teboulle 2009, Boyd et al 2010, Burger et al 2016, Chambolle and Pock 2011, Chambolle and Pock 2016). Alternatively, when it is important to assess the uncertainty associated with the estimates, a Bayesian approach could be adopted (Calvetti and Somersalo 2007, Kaipio and Somersalo 2005, Repetti et al 2019, Stuart 2010). Deep convolutional neural network approaches have also been applied in this setting (Jin et al 2017, McCann et al 2017).
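As a concrete illustration of the variational strategy just described, the following minimal sketch solves a small discrete instance of (1) by penalized least squares. The forward matrix, the second-difference roughness penalty and all dimensions are invented for illustration and do not correspond to any operator used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, s = 200, 50
x_true = np.sin(np.linspace(0, 4 * np.pi, p))        # unknown signal to recover
K = rng.normal(size=(s, p)) / np.sqrt(p)             # hypothetical forward operator
y = K @ x_true + 0.05 * rng.normal(size=s)           # indirect, noisy observation

L = np.diff(np.eye(p), n=2, axis=0)                  # second differences: a simple roughness penalty
lam = 1e-2

# Minimize ||y - K x||^2 + lam * ||L x||^2 via the normal equations.
x_hat = np.linalg.solve(K.T @ K + lam * L.T @ L, K.T @ y)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative reconstruction error
```

Without the penalty the normal equations would be singular (s < p); the regularizing term is what makes the reconstruction well posed, at the price of a smoothing bias.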

In imaging sciences, it is sometimes of interest to find an optimal representation and perform statistics on the second order information associated with the functional samples, i.e. the covariance operators describing the variability of the underlying functional images. This is, for instance, the case in a number of areas of neuroimaging, particularly those investigating functional connectivity. In this work, we establish a framework for reconstructing and optimally representing indirectly observed samples ${\mathcal{C}}_{1},\dots ,\;{\mathcal{C}}_{n}$, that are covariance operators, expressing the second order properties of the underlying unobserved functions. The indirect observations are covariance operators generated by the model

${\mathcal{S}}_{i}={\mathcal{K}}_{i}{\circ}{\mathcal{C}}_{i}{\circ}{\mathcal{K}}_{i}^{{\ast}}+{\mathcal{E}}_{i},\qquad i=1,\dots ,n,\qquad\qquad (2)$

where ${\mathcal{K}}_{i}^{{\ast}}$ denotes the adjoint operator and the term ${\mathcal{E}}_{i}$ models observational error. The term ${\mathcal{K}}_{i}{\circ}{\mathcal{C}}_{i}{\circ}{\mathcal{K}}_{i}^{{\ast}}$ represents the covariance operator of ${\mathcal{K}}_{i}{X}^{\left(i\right)}$, with X(i) an underlying random function whose covariance operator is ${\mathcal{C}}_{i}$.

As opposed to more classical linear inverse problem formulations, problem (2) introduces the following additional difficulties:

  • We are in a setting where each sample is a high-dimensional object that is a covariance operator; it is important to take advantage of the information from all the samples to reconstruct and represent each of them.
  • The elements $\left\{{\mathcal{C}}_{i}\right\}$ and $\left\{{\mathcal{S}}_{i}\right\}$ live on non-Euclidean spaces, as they belong to the positive semidefinite cone, and it is important to account for this manifold structure in the formulation of the associated estimators.
  • In an inverse problem setting it is fundamental to be able to introduce spatial regularization; however, it is not obvious how to feasibly construct a regularizing term for covariance operators reflecting, for instance, smoothness assumptions on the underlying functional images.

More general non-Euclidean settings could also be accommodated. Specifically, the error term could be defined on a tangent space and mapped to the original space through the exponential mapping. Another setting of interest is the case of error terms that push the observables out of the original space. In our applications this is not an issue, as the contaminated observations are themselves empirical covariance matrices, which belong to the non-Euclidean space of positive semidefinite matrices.

We tackle problem 2 by generalizing the concept of principal component analysis (PCA) to optimally represent and understand the variation associated with samples that are indirectly observed covariance operators. The proposed model is also able to deal with the simpler case of samples that are indirectly observed functional images belonging to a linear functional space.

1.1. Motivating application—functional connectivity

In recent years, statistical analysis of covariance matrices has gained a predominant role in medical imaging and in particular in functional neuroimaging. In fact, covariance matrices are the natural objects to represent the brain's functional connectivity, which can be defined as a measure of covariation, in time, of the cerebral activity among brain regions. While many techniques have been proposed to describe functional connectivity, almost all can be described in terms of a function of a covariance or related matrix.

Covariance matrices representing functional connectivity can be computed from the signals arising from functional imaging modalities. The choice of a specific functional imaging modality is generally driven by the preference for high spatial resolution signals, and thus high spatial resolution covariance matrices, versus high temporal resolution, and thus the possibility to study the temporal dynamics of the covariance matrices. Functional magnetic resonance imaging falls into the first category, while electroencephalography (EEG) and magnetoencephalography (MEG) fall into the second. However, high temporal resolution does generally come at the price of indirect measurements and, as shown in figure 1 in the case of MEG data, the signals are in practice detected on the sensors space. It is however of interest to produce results on the associated signals on the cerebral cortex, which we will refer to as the brain space. The signals on the brain space are functional images whose domain is the geometric representation of the brain and are associated with the neuronal activity on the cerebral cortex. We borrow here the notion of brain space and sensors space from Johnstone and Silverman (1990) and we use it throughout the paper for convenience; however, it is important to highlight that the formulation of the problem is much more general than the setting of this specific application.

Figure 1. On the top left, head model of a subject and superimposition of the 248 MEG sensors positioned around the head, called 'sensors space'. On the top right, brain model of the same subject represented by a triangular mesh of 8k nodes, which represents the 'brain space'. On the bottom left, an example of a synthetic signal detected by the MEG sensors. The dots represent the sensors, the color map represents the signal detected by the sensors. On the bottom right, intensity of the reconstructed signal on the triangular mesh of the cerebral cortex.

Figure 2. Covariance matrices of the signal detected by the MEG sensors from three different subjects of the human connectome project. The size of the matrices is 248 × 248. The dark blue bands represent missing data, which are due to the exclusion of some channels after a quality check of the signal.

The signals on the brain space are related to the signals on the sensors space by a forward operator, derived from the physical modeling of the electrical/magnetic propagation, from the cerebral cortex to the sensors. This is generally referred to as the forward problem. For soft-field methods like EEG, MEG and functional near-infrared spectroscopy (Eggebrecht et al 2014, Ferrari and Quaresima 2012, Mosher et al 1999, Singh et al 2014, Ye et al 2009), the forward operator is defined through the solution to a partial differential equation of diffusion type. Such a mapping induces a strong degree of smoothing and consequently the corresponding inverse problem, i.e. the reconstruction of a signal on the brain space from observations in the sensors space, is strongly ill-posed. In fact, signals with fairly different intensities on the brain space, due to the diffusion effect, result in signals with similar intensities in the sensors space. In figure 1, we show an example of a signal on the brain space and the associated signal on the sensors space.

From a practical perspective, it is crucial to understand how the different parts of the brain interact, which is sometimes known as functional connectivity. A possible way to understand these interactions is by analyzing the covariance function associated with the signals describing the cerebral activity of an individual on the brain space (Fransson et al 2011, Lee et al 2013, Li et al 2009). More recently, the interest has shifted from this static approach to a dynamic approach. In particular, for a single individual, it is of interest to understand how these covariance functions vary in time. This is a particularly active field, known as dynamic functional connectivity (Hutchison et al 2013). Another element of interest is understanding how these covariance functions vary among individuals. In figure 2, we show the covariance matrices, on the sensors space, computed from the MEG signals of three different subjects.

The remainder of this paper is organized as follows. In section 2 we give a formal description of the problem. We then introduce a model for indirectly observed smooth functional images in section 3 and present the more general model associated with problem 2 in section 4. In section 5, we perform simulations to assess the validity of the estimation framework. In section 6 we apply the proposed models to MEG data and we finally give some concluding remarks in section 7.

2. Mathematical description of the problem

We now introduce the problem using our driving application as an example. To this purpose, let $\mathcal{M}$ be a closed smooth two-dimensional manifold embedded in ${\mathbb{R}}^{3}$, which in our application represents the geometry of the cerebral cortex. An example of such a surface is shown on the top right of figure 1. We denote with ${L}^{2}\left(\mathcal{M}\right)$ the space of square integrable functions on $\mathcal{M}$. Define X to be a random function with values in a Hilbert functional space $\mathcal{F}\subset {L}^{2}\left(\mathcal{M}\right)$ with mean $\mu =\mathbb{E}\left[X\right]$, finite second moment, and assume the continuity and square integrability of its covariance function ${C}_{X}\left(v,{v}^{\prime }\right)=\mathbb{E}\left[\left(X\left(v\right)-\mu \left(v\right)\right)\left(X\left({v}^{\prime }\right)-\mu \left({v}^{\prime }\right)\right)\right]$. The associated covariance operator ${\mathcal{C}}_{X}$ is defined as $\left({\mathcal{C}}_{X}g\right)\left({v}^{\prime }\right)={\int }_{\mathcal{M}}{C}_{X}\left(v,{v}^{\prime }\right)g\left(v\right)\mathrm{d}v$, for all $g\in {L}^{2}\left(\mathcal{M}\right)$. Mercer's lemma (Riesz and Szokefalvi-Nagy 1955) guarantees the existence of a non-increasing sequence {γr} of eigenvalues of ${\mathcal{C}}_{X}$ and an orthonormal sequence of corresponding eigenfunctions {ψr}, such that

${C}_{X}\left(v,{v}^{\prime }\right)=\sum _{r=1}^{\infty }{\gamma }_{r}\,{\psi }_{r}\left(v\right){\psi }_{r}\left({v}^{\prime }\right),\qquad v,{v}^{\prime }\in \mathcal{M}.\qquad\qquad (3)$

As a direct consequence, X can be expanded as $X=\mu +{\sum }_{r=1}^{\infty }{\zeta }_{r}{\psi }_{r}$, where the random variables {ζr} are uncorrelated and are given by ${\zeta }_{r}={\int }_{\mathcal{M}}\left\{X\left(v\right)-\mu \left(v\right)\right\}{\psi }_{r}\left(v\right)\mathrm{d}v$. The collection {ψr} defines the modes of variation of the random function X, in descending order of strength, and these are called principal component (PC) functions. The associated random variables {ζr} are called PC scores. Moreover, the defined PC functions are the best finite basis approximation in the L2-sense, therefore for any fixed $R\in \mathbb{N}$, the first R PC functions of X minimize the reconstruction error, i.e.

${\left\{{\psi }_{r}\right\}}_{r=1}^{R}=\underset{{f}_{1},\dots ,{f}_{R}:\ \langle {f}_{r},{f}_{{r}^{\prime }}\rangle ={\delta }_{r{r}^{\prime }}}{\mathrm{arg\,min}}\ \mathbb{E}{\left\Vert X-\mu -\sum _{r=1}^{R}\langle X-\mu ,{f}_{r}\rangle \,{f}_{r}\right\Vert }_{{L}^{2}\left(\mathcal{M}\right)}^{2},\qquad\qquad (4)$

where ⟨⋅, ⋅⟩ denotes the ${L}^{2}\left(\mathcal{M}\right)$ inner product and δrr' is the Kronecker delta; i.e. δrr' = 1 for r = r' and 0 otherwise.
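The following sketch illustrates the truncated expansion above in a fully discrete toy setting: a one-dimensional grid plays the role of $\mathcal{M}$, the empirical covariance matrix is eigendecomposed, and the leading eigenvectors and scores give a rank-R reconstruction. Using the Euclidean inner product in place of the ${L}^{2}\left(\mathcal{M}\right)$ one, and the specific functions and variances, are simplifications made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, R = 100, 60, 2
t = np.linspace(0, 1, p)
psi = np.stack([np.sqrt(2) * np.sin(np.pi * t),
                np.sqrt(2) * np.sin(2 * np.pi * t)])          # two "PC functions" on a grid
scores = rng.normal(size=(m, R)) * np.array([2.0, 1.0])       # PC scores zeta_{l,r}
X = scores @ psi                                              # mean-zero functional samples

C = X.T @ X / m                                               # empirical covariance matrix
evals, evecs = np.linalg.eigh(C)                              # eigenvalues in ascending order
psi_hat = evecs[:, ::-1][:, :R].T                             # leading R eigenvectors
zeta_hat = X @ psi_hat.T                                      # estimated PC scores
X_R = zeta_hat @ psi_hat                                      # rank-R reconstruction of the samples
print(np.linalg.norm(X - X_R) / np.linalg.norm(X))            # essentially zero here
```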

2.1. The case of indirectly observed functions

In the case of indirect observations, the signal is detectable only through s sensors on the sensors space. Let {Kl : l = 1, ..., m} be a collection of s × p real matrices, representing the potentially sample-specific forward operators relating the signal at p pre-defined points {vj : j = 1, ..., p} on the cortical surface $\mathcal{M}$ with the signal captured by the s sensors. The matrices {Kl} are discrete versions of the forward operator $\mathcal{K}$ introduced in section 1. Moreover, define the evaluation operator ${\Psi}:\mathcal{F}\to {\mathbb{R}}^{p}$ to be a vector-valued functional that evaluates a function $f\in \mathcal{F}$ at the p pre-specified points $\left\{{v}_{j}\right\}\subset \mathcal{M}$, returning the p dimensional vector ${\left(f\left({v}_{1}\right),\dots ,\;f\left({v}_{p}\right)\right)}^{T}$. The operators Ψ and {Kl} are known. However, in the described problem the random function X can be observed only through indirect measurements $\left\{{\mathbf{y}}_{l}\in {\mathbb{R}}^{s}:l=1,\dots ,\;m\right\}$ generated from the model

${\mathbf{y}}_{l}={K}_{l}{\Psi}{x}_{l}+{\boldsymbol{\varepsilon }}_{l},\qquad l=1,\dots ,m,\qquad\qquad (5)$

where {xl} are m independent realizations of X, and thus expandable in terms of the PC functions {ψr} and the coefficients {ζl,r} given by ${\zeta }_{l,r}={\int }_{\mathcal{M}}\left\{{x}_{l}\left(v\right)-\mu \left(v\right)\right\}{\psi }_{r}\left(v\right)\mathrm{d}v$. The terms {ɛl} represent observational errors and are independent realizations of an s-dimensional normal random vector, with mean the zero vector and variance $\sigma^{2}I_{s}$, where $I_{s}$ denotes the s × s identity matrix.

We consider the problem of estimating the PC functions {ψr} in (5), and associated scores {ζl,r}, from the observations {yl}. In figure 3 we give an illustration of the introduced setting. Note that it would not be necessary to define the evaluation operator if the forward operators were defined to be functionals $\left\{{\mathcal{K}}_{l}:\mathcal{F}\to {\mathbb{R}}^{p}\right\}$, relating directly the functional objects on the brain space to the real vectors on the sensors space. It is however the case that the operators {Kl} are computed in a matrix form by third party software (see section 6 for details) for a pre-specified set of points $\left\{{v}_{j}\right\}\subset \mathcal{M}$ and it is thus convenient to take this into account in the model through the introduction of an evaluation operator Ψ.

Figure 3. Illustration of the setting introduced with model (5).

In the case of single subject studies, the surface $\mathcal{M}$ is the subject's reconstructed cortical surface, an example of which is shown on the right panel of figure 1. In this case, it is natural to assume that there is one common forward operator K for all the detected signals. In the more general case of multi-subject studies, $\mathcal{M}$ is assumed to be a template cortical surface. We are thus assuming that the individual cortical surfaces have been registered to the template $\mathcal{M}$, which means that there is a smooth and one-to-one correspondence between the points on each individual brain surface and the template surface $\mathcal{M}$, where the PC functions are defined.

However, notice that when it comes to the computation of the forward operators, we are not assuming the brain geometries of the single subjects to be all equal to a geometric template, as in fact the model in (5) allows for sample-specific forward operators {Kl}. The individual cortical surfaces could also have different numbers of mesh points; in that case, the subject-specific 'resampling' operator could be absorbed into the definition of sample-specific evaluation operators {Ψl}.

The estimation of the PC functions in (5) has been classically dealt with by reconstructing each observation xl independently and subsequently performing PCA. However, such an approach can be sub-optimal in particular in a low signal-to-noise setting, as when estimating one signal, the information from all the other sampled signals is systematically ignored. The statistical analysis of data samples that are random functions or surfaces, i.e. functional data, has also been explored in the functional data analysis (FDA) literature (Ramsay and Silverman 2005), however, most of those works focus on the setting of fully observed functions. An exception to this is the sparse FDA literature (see e.g. Yao et al 2005), where instead the functional samples are assumed to be observable only through irregular and noisy evaluations.

In the case of direct but noisy observations of a signal, the statistical estimation of the covariance function, and of the associated eigenfunctions, has been studied, for instance, in Bunea and Xiao (2015) for regularly sampled functions and in Huang et al (2008), Yao et al (2005) for sparsely sampled functions. A generalization to functions whose domain is a manifold is proposed in Lila et al (2016) and appropriate spatial coherence is introduced by penalizing directly the eigenfunctions of the covariance operator to be estimated, i.e. the PC functions. In the indirect observations setting, Tian et al (2012) propose a separable model in time and space for source localization. The estimation of PC functions of functional data in a linear space and on linear domains, from indirect and noisy samples, has been previously covered in Amini and Wainwright (2012). They propose a regularized M-estimator in a reproducing kernel Hilbert space (RKHS) framework. Due to the fact that in practice the introduction of an RKHS relies on the definition of a kernel, i.e. a covariance function on the domain, this approach cannot be easily extended to non-linear domains. In Katsevich et al (2015), driven by an application to cryo-electron microscopy, the authors propose an unregularized estimator for the covariance matrix of indirectly observed functions. However, a regularized approach is crucial in our setting, due to the strong ill-posedness of the inverse problem considered. In the discrete setting, other forms of regularization have also been adopted, e.g. sparsity on the inverse covariance matrix (Friedman et al 2008, Liu and Zhang 2019).

2.2. The case of indirectly observed covariance operators

A natural generalization of the setting introduced in the previous section is considering observations that have group specific covariance operators. In detail, suppose now we are given a set of n covariance functions {Ci : i = 1, ..., n}, representing the underlying covariance operators $\left\{{\mathcal{C}}_{i}:i=1,\dots ,\;n\right\}$ on the brain space. In our driving application, each covariance function ${C}_{i}:\mathcal{M}{\times}\mathcal{M}\to \mathbb{R}$ describes the functional connectivity of the ith individual or the functional connectivity of the same individual at the ith time-point.

We consider the problem of defining and estimating a set of covariance functions, that we call PC covariance functions, which enable the description of {Ci} through the 'linear combinations' of few components. Such a reduced order description is of interest, for example, in understanding how functional connectivity varies among individuals or over time.

We define a model for the PC covariance functions of {Ci} from the set of indirectly observed covariance matrices, computed from the signals on the sensors space, and thus given by $\left\{{S}_{i}\in {\mathbb{R}}^{s{\times}s},i=1,\dots ,\;n\right\}$ with

${S}_{i}={K}_{i}\,{\mathbb{C}}_{i}\,{K}_{i}^{T}+{E}_{i}^{T}{E}_{i},\qquad i=1,\dots ,n,\qquad\qquad (6)$

where ${\mathbb{C}}_{i}={\left({C}_{i}\left({v}_{j},{v}_{{j}^{\prime }}\right)\right)}_{j{j}^{\prime }}$, and {vj : j = 1, ..., p} are the sampling points associated with the operator Ψ. The forward operators {Ki} act on both sides of the covariance functions {Ci}, due to the linear transformation KiΨ applied to the signals on the brain space before being detected on the sensors space. The term ${E}_{i}^{T}{E}_{i}$ is an error term, where Ei is an s × s matrix such that each entry is an independent sample of a Gaussian distribution with mean zero and standard deviation σ. Model (6) could be regarded as an implementation of the idealized problem 2, where the covariance operators are represented by the associated covariance functions. An illustration of the setting introduced can be found in figure 4.
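A minimal numerical sketch of the observation model (6) follows: a single rank-one brain-space covariance is evaluated at p points, pushed to the sensors space by a made-up forward matrix, and contaminated by the ${E}_{i}^{T}{E}_{i}$ error term. The dimensions, the spatial mode and the noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
p, s, sigma = 120, 30, 0.1
K_i = rng.normal(size=(s, p)) / np.sqrt(p)        # hypothetical forward operator for subject i

f = np.sin(np.linspace(0, 2 * np.pi, p))          # a single spatial mode on the brain space
C_i = 4.0 * np.outer(f, f)                        # rank-one brain-space covariance, evaluated at p points
E_i = sigma * rng.normal(size=(s, s))             # Gaussian error matrix
S_i = K_i @ C_i @ K_i.T + E_i.T @ E_i             # observed sensor-space covariance, as in (6)
print(S_i.shape, np.allclose(S_i, S_i.T))         # (30, 30) True
```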

Figure 4. Illustration of the setting introduced with model (6).

The problem introduced in this section has not been extensively covered in the literature. In the discrete case, Dryden et al (2009) introduce a tangent PCA model for directly observed covariance matrices. An extension to directly observed covariance operators has been proposed in Pigoli et al (2014). Also related to our work is the setting considered in Petersen and Müller (2019), where the authors propose a regression framework for responses that are random objects (e.g. covariance matrices) with Euclidean predictors. The proposed regression model is applied to study associations between age and low-dimensional correlation matrices, representing functional connectivity, which have been computed from a parcellation of the brain. In section 4, we propose a novel PCA approach for indirectly observed high-dimensional covariance matrices.

3. Principal components of indirectly observed functions

The aim of this section is to define a model for the estimation of the PC functions {ψr} from the observations {yl}, defined in (5). Although the model proposed in this section is not the main contribution of this work, it allows us to introduce some of the concepts necessary to the definition of the more general model for indirectly observed covariance functions in section 4.

3.1. Model

Now let $\mathbf{z}={\left({z}_{1},\dots ,\;{z}_{m}\right)}^{T}$ be an m-dimensional real column vector and ${H}^{2}\left(\mathcal{M}\right)$ be the Sobolev space of functions in ${L}^{2}\left(\mathcal{M}\right)$ with first and second distributional derivatives in ${L}^{2}\left(\mathcal{M}\right)$. From now on $\mathcal{F}$ is instantiated with ${H}^{2}\left(\mathcal{M}\right)$. We propose to estimate $\hat{f}\in {H}^{2}\left(\mathcal{M}\right)$, the first PC function of X, and the associated PC scores vector z, by solving the minimization problem

$\left(\hat{\mathbf{z}},\hat{f}\right)=\underset{\mathbf{z}\in {\mathbb{R}}^{m},\ f\in {H}^{2}\left(\mathcal{M}\right)}{\mathrm{arg\,min}}\ \sum _{l=1}^{m}{{\Vert}{\mathbf{y}}_{l}-{z}_{l}{K}_{l}{\Psi}f{\Vert}}^{2}+\lambda \,{\mathbf{z}}^{T}\mathbf{z}\int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2},\qquad\qquad (7)$

where ||⋅|| is the Euclidean norm and Δ is the Laplace–Beltrami operator, which enables a smoothing regularizing effect on the PC function $\hat{f}$. The data fit term encourages KlΨf to capture the strongest mode of variation of {yl}. The parameter λ controls the trade-off between the data fit term of the objective function and the regularizing term. The second PC function can be estimated by classical deflation methods, i.e. by applying model (7) on the residuals $\left\{{\mathbf{y}}_{l}-{\hat{z}}_{l}{K}_{l}{\Psi}\hat{f}\right\}$, and so on for the subsequent PCs. The proposed model can be interpreted as a regularized least square estimation of the first PC function ψ1 in (5), with the terms {zl} playing the role of estimates of the variables {ζl,1}.

In the simplified case of a single forward operator K := K1 =⋯= Km, the minimization problem (7) can be reformulated in a more classical form. In fact, fixing f in (7) and minimizing over z gives

${z}_{l}=\frac{{\mathbf{y}}_{l}^{T}K{\Psi}f}{{{\Vert}K{\Psi}f{\Vert}}^{2}+\lambda \int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2}},\qquad l=1,\dots ,m,\qquad\qquad (8)$

which can then be used to show that the minimization problem (7) is equivalent to maximizing

$\frac{{\left(K{\Psi}f\right)}^{T}\left\{\frac{1}{m}{\mathbb{Y}}^{T}\mathbb{Y}\right\}\left(K{\Psi}f\right)}{{{\Vert}K{\Psi}f{\Vert}}^{2}+\lambda \int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2}},\qquad\qquad (9)$

with $\mathbb{Y}$ an m × s real matrix, where the lth row of $\mathbb{Y}$ is the observation ${\mathbf{y}}_{l}^{T}$. This reformulation gives further insights on the interpretation of $\hat{f}$ in (7). In fact, $\hat{f}$ is such that $K{\Psi}\hat{f}$ maximizes ${\left(K{\Psi}\hat{f}\right)}^{T}\left\{\frac{1}{m}{\mathbb{Y}}^{T}\mathbb{Y}\right\}\left(K{\Psi}\hat{f}\right)$ subject to a norm constraint. The term $\left\{\frac{1}{m}{\mathbb{Y}}^{T}\mathbb{Y}\right\}$ is the empirical covariance matrix in the sensors space. The term $\mathbf{z}^{T}\mathbf{z}$ in (7) places the regularization term $\lambda {\int }_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2}$ in the denominator of the equivalent formulation (9). Thus, $\hat{f}$ is regularized by the choice of norm in the denominator of (9), in a similar fashion to the classic functional principal component formulation of Silverman (1996). Ignoring the spatial regularization, the point-wise evaluation of the PC function Ψf in (9) can be interpreted as the first PC vector computed from the dataset of backprojected data ${\left[{K}_{1}^{T}{\mathbf{y}}_{1},\dots ,\;{K}_{m}^{T}{\mathbf{y}}_{m}\right]}^{T}$, similarly to what is proposed in Dobriban et al (2017) in the context of optimal prediction.
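In a fully discrete toy setting, the ratio formulation (9) can also be read as a generalized symmetric eigenproblem: the maximizer is the leading eigenvector of the pencil formed by the numerator and denominator quadratic forms. The sketch below assumes a single forward matrix K, synthetic sensor-space data and a generic roughness penalty P standing in for the regularizing term; it is meant only to make the interpretation above concrete.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
m, s, p, lam = 200, 40, 80, 1e-1
K = rng.normal(size=(s, p)) / np.sqrt(p)          # hypothetical common forward operator
Y = rng.normal(size=(m, s))                       # observations on the sensors space (rows)
Cyy = Y.T @ Y / m                                 # empirical sensor-space covariance

D2 = np.diff(np.eye(p), n=2, axis=0)
P = D2.T @ D2                                     # simple roughness penalty

A_num = K.T @ Cyy @ K                             # numerator quadratic form of (9)
B_den = K.T @ K + lam * P                         # denominator quadratic form of (9)
w, V = eigh(A_num, B_den)                         # generalized symmetric eigenproblem
c_hat = V[:, -1]                                  # maximizer of the ratio in (9)
print(w[-1])                                      # maximized criterion value
```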

3.2. Algorithm

Here we propose a minimization approach for the objective function in (7), which alternates the minimization over z and f in an iterative algorithm. In (7), a normalization constraint must be considered to make the representation unique, as in fact multiplying z by a constant and dividing f by the same constant does not change the objective function. We optimize in z under the constraint ||z|| = 1, which leads to a normalized version of the estimator (8):

${z}_{l}=\frac{{\mathbf{y}}_{l}^{T}{K}_{l}{\Psi}f}{\sqrt{{\sum }_{{l}^{\prime }=1}^{m}{\left({\mathbf{y}}_{{l}^{\prime }}^{T}{K}_{{l}^{\prime }}{\Psi}f\right)}^{2}}},\qquad l=1,\dots ,m.\qquad\qquad (10)$

For a given z, solving (7) with respect to f will turn out to be equivalent to solving an inverse problem, which we discretize adopting a mixed finite elements approach (Azzimonti et al 2014). Specifically, consider now a triangulated surface ${\mathcal{M}}_{\mathcal{T}}$, union of the finite set of triangles $\mathcal{T}$, giving an approximated representation of the manifold $\mathcal{M}$. We then consider the linear finite element space V consisting of a set of globally continuous functions over ${\mathcal{M}}_{\mathcal{T}}$ that are affine when restricted to any triangle τ in $\mathcal{T}$, i.e.

$V=\left\{f\in {C}^{0}\left({\mathcal{M}}_{\mathcal{T}}\right):f{\vert }_{\tau }\ \text{is affine for each}\ \tau \in \mathcal{T}\right\}.$

This space is spanned by the nodal basis ϕ1, ..., ϕκ associated with the nodes ξ1, ..., ξκ, corresponding to the vertices of the triangulation ${\mathcal{M}}_{\mathcal{T}}$. Such basis functions are Lagrangian, meaning that ϕi(ξj) = 1 if i = j and ϕi(ξj) = 0 otherwise. Setting $\mathbf{c}={\left(f\left({\xi }_{1}\right),\dots ,\;f\left({\xi }_{\kappa }\right)\right)}^{T}$ and $\boldsymbol{\phi }={\left({\phi }_{1},\dots ,\;{\phi }_{\kappa }\right)}^{T}$, every function fV has the form

$f\left(v\right)=\sum _{j=1}^{\kappa }f\left({\xi }_{j}\right){\phi }_{j}\left(v\right)={\mathbf{c}}^{T}\boldsymbol{\phi }\left(v\right)\qquad\qquad (11)$

for all $v\in {\mathcal{M}}_{\mathcal{T}}$. To ease the notation, we assume that the p points {vj} associated with the evaluation operator Ψ coincide with the nodes of the triangular mesh ξ1, ..., ξκ, and thus we have that the coefficients c are such that c = Ψf for any $f\in V$. Consequently, we are assuming the forward operators {Kl} to be s × κ matrices, relating the κ points on the cortical surface of the lth sample, in one-to-one correspondence with ξ1, ..., ξκ, to the s-dimensional signal detected on the sensors for the lth sample.

Let now M and A be the mass and stiffness κ × κ matrices defined as ${\left(M\right)}_{j{j}^{\prime }}=\int_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{j}{\phi }_{{j}^{\prime }}$ and ${\left(A\right)}_{j{j}^{\prime }}=\int_{{\mathcal{M}}_{\mathcal{T}}}{\nabla }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{j}\cdot {\nabla }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{{j}^{\prime }}$, where ${\nabla }_{\mathcal{M}}$ is the gradient operator on the manifold $\mathcal{M}$. Practically, ${\nabla }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{j}$ is a constant function on each triangle of ${\mathcal{M}}_{\mathcal{T}}$, and can take an arbitrary value on the edges.
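The matrices M and A can be assembled triangle by triangle with the standard local formulas for piecewise-linear elements. The sketch below does so for a toy tetrahedral surface standing in for the cortical mesh, and also forms the diagonal lumped approximation of M mentioned further below; the local formulas are standard, but the mesh and all values are purely illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

def assemble_mass_stiffness(verts, faces):
    """Assemble P1 surface FEM mass (M) and stiffness (A) matrices."""
    n = len(verts)
    I, J, Mv, Av = [], [], [], []
    for tri in faces:
        pts = verts[tri]                                                    # 3 x 3 vertex coordinates
        e = np.array([pts[2] - pts[1], pts[0] - pts[2], pts[1] - pts[0]])   # edge opposite each vertex
        area = 0.5 * np.linalg.norm(np.cross(e[1], e[2]))
        m_loc = area / 12.0 * (np.ones((3, 3)) + np.eye(3))                 # local mass matrix
        a_loc = (e @ e.T) / (4.0 * area)                                    # local stiffness (edge formula)
        for a in range(3):
            for b in range(3):
                I.append(tri[a]); J.append(tri[b])
                Mv.append(m_loc[a, b]); Av.append(a_loc[a, b])
    M = coo_matrix((Mv, (I, J)), shape=(n, n)).tocsr()                      # duplicate entries are summed
    A = coo_matrix((Av, (I, J)), shape=(n, n)).tocsr()
    return M, A

verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])             # toy closed surface (tetrahedron)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
M, A = assemble_mass_stiffness(verts, faces)
M_lumped = np.asarray(M.sum(axis=1)).ravel()                                # diagonal of the lumped mass matrix
print(np.allclose(A.toarray().sum(axis=1), 0))                              # rows of A sum to zero
```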

Let $h={\mathrm{max}}_{\tau \in \mathcal{T}}\left(\mathrm{d}\mathrm{i}\mathrm{a}\mathrm{m}\left(\tau \right)\right)$ denote the maximum diameter of the triangles forming ${\mathcal{M}}_{\mathcal{T}}$, then the solution ${\hat{f}}_{h}$ of (7), in the discrete space V, is given by the following proposition.

Proposition 1. The surface finite element solution ${\hat{f}}_{h}\in V$ of model (7), for a given unitary norm vector z, is ${\hat{f}}_{h}={\hat{\mathbf{c}}}^{T}\boldsymbol{\phi }$ where $\hat{\mathbf{c}}$ is the solution of

$\left(\sum _{l=1}^{m}{z}_{l}^{2}{K}_{l}^{T}{K}_{l}+\lambda A{M}^{-1}A\right)\hat{\mathbf{c}}=\sum _{l=1}^{m}{z}_{l}{K}_{l}^{T}{\mathbf{y}}_{l}.\qquad\qquad (12)$

Equation (12) has the form of a penalized regression, where the discretized version of the penalty term is AM−1A.

The sparsity of the linear system (12), namely the number of zeros, depends on the sparsity of its components. The matrices M and A are very sparse, however M−1 is not, in general. To overcome this problem, in the numerical analysis of partial differential equations literature, the matrix M−1 is generally replaced with the sparse matrix ${\tilde {M}}^{-1}$, where $\tilde {M}$ is the diagonal matrix such that ${\tilde {M}}_{jj}={\sum }_{{j}^{\prime }}{M}_{j{j}^{\prime }}$ (Fried and Malkus 1975, Zienkiewicz et al 2013). The penalty operator $A{\tilde {M}}^{-1}A$ approximates very well the behavior of AM−1A.

Moreover, in the case of longitudinal studies that involve only one subject, we have a single forward operator K := K1 =⋯= Km common to all the observed signals, and consequently equation (12) can be rewritten as the sparse overdetermined system

$\begin{pmatrix}K\\ \sqrt{\lambda }\,{\tilde {M}}^{-1/2}A\end{pmatrix}\mathbf{c}=\begin{pmatrix}\sum _{l=1}^{m}{z}_{l}{\mathbf{y}}_{l}\\ \mathbf{0}\end{pmatrix},\qquad\qquad (13)$

to be interpreted in a least-square sense. A sparse QR solver can be finally applied to efficiently solve the linear system (13).

In algorithm 1 we summarize the main algorithmic steps to compute the PC functions and associated PC scores for indirectly observed functions. The initializing scores z can be chosen either at random or, when there is a correspondence between the detectors of different samples (e.g. K1 =⋯= Km), with the scores obtained by performing PCA on the observations in the sensors space.

Algorithm 1. Inverse problems—PCA algorithm

1: Initialization:
  (a) Computation of M and A
  (b) Initialize z, the scores vector associated with the first PC function
2: PC function's estimation: compute c such that
     $\left(\sum _{l=1}^{m}{z}_{l}^{2}{K}_{l}^{T}{K}_{l}+\lambda A{M}^{-1}A\right)\mathbf{c}=\sum _{l=1}^{m}{z}_{l}{K}_{l}^{T}{\mathbf{y}}_{l}$
     ${f}_{h}{\leftarrow}{\mathbf{c}}^{T}\boldsymbol{\phi }$
3: Scores estimation:
     ${z}_{l}{\leftarrow}\frac{{\mathbf{y}}_{l}^{T}{K}_{l}{\Psi}{f}_{h}}{\sqrt{{\sum }_{{l}^{\prime }=1}^{m}{\left({\mathbf{y}}_{{l}^{\prime }}^{T}{K}_{{l}^{\prime }}{\Psi}{f}_{h}\right)}^{2}}},\quad l=1,\dots ,\;m$
4: Repeat steps 2 and 3 until convergence
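The following is a compact numerical sketch of algorithm 1 for the special case of a common forward operator, on synthetic data. Dense matrices are used throughout, the matrix P stands in for the penalty block $A{M}^{-1}A$, and all sizes, the true component and the noise level are invented for illustration; it is a sketch of the iteration, not a reference implementation.

```python
import numpy as np

def inverse_pca_first_pc(Y, K, P, lam, n_iter=15, seed=0):
    """Y: m x s observations, K: s x p common forward operator, P: p x p penalty."""
    m, s = Y.shape
    rng = np.random.default_rng(seed)
    z = rng.normal(size=m)
    z /= np.linalg.norm(z)                               # step 1(b): initialize the scores
    for _ in range(n_iter):
        # Step 2: PC function estimation, (sum z_l^2 K^T K + lam P) c = sum z_l K^T y_l
        lhs = (z @ z) * (K.T @ K) + lam * P
        rhs = K.T @ (Y.T @ z)
        c = np.linalg.solve(lhs, rhs)
        # Step 3: scores estimation followed by normalization to unit norm
        z = Y @ (K @ c)
        z /= np.linalg.norm(z)
    return c, z

rng = np.random.default_rng(4)
m, s, p = 80, 30, 100
K = rng.normal(size=(s, p)) / np.sqrt(p)
psi = np.sin(np.linspace(0, 2 * np.pi, p)); psi /= np.linalg.norm(psi)   # true PC direction
Y = np.outer(rng.normal(scale=3.0, size=m), K @ psi) + 0.05 * rng.normal(size=(m, s))
D2 = np.diff(np.eye(p), n=2, axis=0); P = D2.T @ D2
c_hat, z_hat = inverse_pca_first_pc(Y, K, P, lam=1e-2)
print(abs(np.dot(c_hat / np.linalg.norm(c_hat), psi)))   # alignment with the true direction (ideally near 1)
```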

3.3. Eigenfunctions of indirectly observed covariance operators

Suppose now we are in the case of a single forward operator K. Combining steps 2 and 3 of algorithm 1, and moving the normalization step from (zl) to fh, we obtain the iterations

$\left(\left[{\mathbf{c}}^{T}{K}^{T}\left(\sum _{l=1}^{m}{\mathbf{y}}_{l}{\mathbf{y}}_{l}^{T}\right)K\mathbf{c}\right]{K}^{T}K+\lambda A{M}^{-1}A\right)\tilde {\mathbf{c}}={K}^{T}\left(\sum _{l=1}^{m}{\mathbf{y}}_{l}{\mathbf{y}}_{l}^{T}\right)K\mathbf{c},\qquad \mathbf{c}{\leftarrow}\tilde {\mathbf{c}}/{\Vert}\tilde {\mathbf{c}}{\Vert}.$

The obtained algorithm depends on the data only through ${\sum }_{l=1}^{m}{\mathbf{y}}_{l}{\mathbf{y}}_{l}^{T}$ that up to a constant is the covariance matrix computed on the sensors space. The proposed algorithm can thus be applied to situations where the observations {yl} are not available, but we are given only the associated s × s covariance matrix on the sensors space, computed from {yl}. This could be of interest in situations where the temporal resolution is very high and the spatial resolution is low, therefore it is convenient to store the covariance matrix rather than the entire set of observations.

4. Reconstruction and representation of indirectly observed covariance operators

Consider now n sample covariance matrices S1, ..., Sn, each of size s × s, representing n different connectivity maps on the sensors space. Three of such covariance matrices, associated with three different individuals, are shown in figure 2. Recall moreover that we denote with $\mathcal{M}$ the brain surface template and with $\left\{{K}_{i}\in {\mathbb{R}}^{s{\times}p}\right\}$ the set of subject-specific forward operators, relating the signal at the p pre-specified points {vj} on the cortical surface $\mathcal{M}$ with the signal detected on the s sensors.

The aim of this section is to introduce a model for the reconstruction and representation of the covariance functions {Ci}, on the brain space, associated with the actually observed covariance matrices {Si}, on the sensors space. The matrices {Si} are related to the covariance functions {Ci} through formula (6), which we recall here:

${S}_{i}={K}_{i}\,{\mathbb{C}}_{i}\,{K}_{i}^{T}+{E}_{i}^{T}{E}_{i},\qquad i=1,\dots ,n,$

with ${\mathbb{C}}_{i}={\left({C}_{i}\left({v}_{j},{v}_{l}\right)\right)}_{jl}$, and {vj} the sampling points associated with the operator Ψ.

First, in section 4.1, we see how the PC model introduced in section 3 could be applied to individually reconstruct the covariance functions {Ci}. In section 4.2, we introduce a population model that achieves both reconstruction and joint representation of the underlying covariance functions {Ci}.

4.1. A subject-specific model

Let ${S}_{i}^{1/2}$ be a square-root decomposition of Si, which is a decomposition such that ${S}_{i}={\left({S}_{i}^{1/2}\right)}^{T}{S}_{i}^{1/2}$, for all i = 1, ..., n. This could be given, for instance, by ${S}_{i}^{1/2}={D}_{i}^{1/2}{V}_{i}^{T}$ where ${S}_{i}={V}_{i}{D}_{i}{V}_{i}^{T}$ is the spectral decomposition of Si and ${D}_{i}^{1/2}$ denotes the diagonal matrix whose entries are the square-root of the (non-negative) entries of Di. Each square-root decomposition ${S}_{i}^{1/2}$ can be interpreted as a data-matrix whose empirical covariance is Si. Another possible choice for the square-root decompositions is ${S}_{i}^{1/2}={V}_{i}{D}_{i}^{1/2}{V}_{i}^{T}$. The output of the proposed algorithms will not depend on the specific choice of the square-root decompositions.
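A short sketch of this square-root decomposition, computed from the spectral decomposition of a synthetic positive semidefinite matrix; the eigenvalue clipping is only a numerical safeguard against tiny negative eigenvalues and is not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(40, 10))
S = B @ B.T                                   # a rank-deficient positive semidefinite matrix
d, V = np.linalg.eigh(S)                      # spectral decomposition S = V D V^T
d = np.clip(d, 0.0, None)                     # guard against small negative round-off eigenvalues
S_half = np.diag(np.sqrt(d)) @ V.T            # one valid square root: D^{1/2} V^T
print(np.allclose(S_half.T @ S_half, S))      # True: (S^{1/2})^T S^{1/2} = S
```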

In the most general setting, each covariance matrix Si is an indirect observation of an underlying covariance function Ci, which can be expressed in terms of its spectral decomposition as

${C}_{i}\left(v,{v}^{\prime }\right)=\sum _{r=1}^{\infty }{\gamma }_{ir}\,{\psi }_{ir}\left(v\right){\psi }_{ir}\left({v}^{\prime }\right),\qquad v,{v}^{\prime }\in \mathcal{M},$

where, for each i, γi1γi2 ⩾⋯⩾ 0 is a sequence of non-increasing variances and ${\left\{{\psi }_{ir}\right\}}_{r}$ a set of orthonormal eigenfunctions. Introduce now $\left\{{\hat{f}}_{i}\in {H}^{2}\left(\mathcal{M}\right)\right\}$ and $\left\{{\hat{\mathbf{z}}}_{i}\in {\mathbb{R}}^{s}\right\}$, obtained by applying model (7) to each sample independently, i.e.

$\left({\hat{\mathbf{z}}}_{i},{\hat{f}}_{i}\right)=\underset{{\mathbf{z}}_{i}\in {\mathbb{R}}^{s},\ {f}_{i}\in {H}^{2}\left(\mathcal{M}\right)}{\mathrm{arg\,min}}\ {{\Vert}{S}_{i}^{1/2}-{\mathbf{z}}_{i}{\left({K}_{i}{\Psi}{f}_{i}\right)}^{T}{\Vert}}_{F}^{2}+\lambda \,{\mathbf{z}}_{i}^{T}{\mathbf{z}}_{i}\int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}{f}_{i}\right)}^{2},\qquad i=1,\dots ,n,\qquad\qquad (14)$

with ||⋅||F denoting the Frobenius matrix norm. Each estimate ${\hat{f}}_{i}$, from model (14), can be interpreted as a regularized estimate of the leading PC function of ${S}_{i}^{1/2}$ and thus of the eigenfunction ψi1. The subsequent eigenfunctions can be estimated by deflation methods, i.e. by removing the estimated components ${\hat{\mathbf{z}}}_{i}{\left({K}_{i}{\Psi}{\hat{f}}_{i}\right)}^{T}$ from ${S}_{i}^{1/2}$ and reapplying model (14). This leads to a set of estimates $\left\{{\hat{f}}_{ir}\right\}$ and $\left\{{\hat{\mathbf{z}}}_{ir}\right\}$.

The unregularized version of model (14) is equivalent to a singular value decomposition applied to each matrix ${S}_{i}^{1/2}$ independently, which would lead to a set of orthogonal estimates ${\left\{{\hat{\mathbf{z}}}_{ir}\right\}}_{r}\subset {\mathbb{R}}^{s}$, for each i = 1, ..., n. In the regularized model orthogonality is not enforced, however the estimated PC components can be orthogonalized post-estimation by means of a QR decomposition.

Define now the empirical variances to be ${\hat{\gamma }}_{ir}={{\Vert}{\hat{f}}_{ir}{\Vert}}_{{L}^{2}\left(\mathcal{M}\right)}^{2}$ and consider the ${L}^{2}\left(\mathcal{M}\right)$-normalized version of $\left\{{\hat{f}}_{ir}\right\}$. An approximate representation of ${S}_{i}={\left({S}_{i}^{1/2}\right)}^{T}{S}_{i}^{1/2}$ is thus given by

${S}_{i}\approx \sum _{r}{\hat{\gamma }}_{ir}\left({K}_{i}{\Psi}{\hat{f}}_{ir}\right){\left({K}_{i}{\Psi}{\hat{f}}_{ir}\right)}^{T},\qquad\qquad (15)$

and the associated approximate representation of Ci, in terms of $\left\{{\hat{\gamma }}_{ir}\right\}$ and $\left\{{\hat{f}}_{ir}\right\}$, is

${C}_{i}\approx \sum _{r}{\hat{\gamma }}_{ir}\,{\hat{f}}_{ir}\otimes {\hat{f}}_{ir},$

where ${\hat{\gamma }}_{ir}$ is an estimate of the variance γir and ${\hat{f}}_{ir}$ is an estimate of ψir. The tensor product ${\hat{f}}_{ir}\otimes {\hat{f}}_{ir}$ is such that $\left({\hat{f}}_{ir}\otimes {\hat{f}}_{ir}\right)\left(v,{v}^{\prime }\right)={\hat{f}}_{ir}\left(v\right){\hat{f}}_{ir}\left({v}^{\prime }\right)$ for all $v,{v}^{\prime }\in \mathcal{M}$. The regularizing terms in (14) introduce spatial coherence on the estimated $\left\{{\hat{f}}_{ir}\right\}$ and thus on the estimated eigenfunctions of {Ci}, fundamental in an inverse problems setting.

The reconstructed covariance functions {Ci} could be discretized on a dense grid, leading to a collection of covariance matrices ${\left({C}_{i}\left({v}_{j},{v}_{l}\right)\right)}_{jl}$. Following the approach in Dryden et al (2009), a Riemannian metric could be defined on the space of covariance matrices, followed by projection of ${\left({C}_{i}\left({v}_{j},{v}_{l}\right)\right)}_{jl}$ on the tangent space centered at the sample Fréchet mean. PCA could then be carried out on vectorizations of the tangent space representations. A related approach, for covariance functions, has been adopted in Pigoli et al (2014).
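For completeness, the sketch below illustrates one instance of the tangent-space PCA strategy just described, using the log-Euclidean metric as one possible choice of Riemannian structure on small, full-rank synthetic SPD matrices. In the setting of this paper the covariance functions would first have to be discretized on a grid, and the resulting matrices would be far larger and typically rank deficient, which is part of the reason why this route is not pursued here.

```python
import numpy as np

def sym_log(S):
    """Matrix logarithm of a symmetric positive definite matrix."""
    d, V = np.linalg.eigh(S)
    return (V * np.log(d)) @ V.T

rng = np.random.default_rng(6)
n, q = 20, 6
mats = [np.cov(rng.normal(size=(q, 50))) for _ in range(n)]     # n small SPD matrices

logs = np.array([sym_log(S).ravel() for S in mats])             # tangent-space (log) coordinates
mean_log = logs.mean(axis=0)                                    # log-Euclidean mean, in log coordinates
centered = logs - mean_log
U, sv, Vt = np.linalg.svd(centered, full_matrices=False)        # PCA on the vectorized tangent vectors
modes = Vt[:2].reshape(2, q, q)                                 # first two modes of variation
scores = centered @ Vt[:2].T                                    # tangent-space PCA scores
print(scores.shape)                                             # (20, 2)
```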

However, the aforementioned approaches could be prohibitive in our setting. In fact, performing PCA on tangent space projections produces modes of variation that are geodesics passing through the mean, and whose interpretation in a high-dimensional setting is often challenging. Therefore, in the next section, we propose an alternative model that enables joint reconstruction, and representation on a 'common basis', of indirectly observed covariance functions.

4.2. A population model

Let ${\left\{{\hat{\mathbf{z}}}_{i}\right\}}_{i=1}^{n}\subset {\mathbb{R}}^{s}$ and $\hat{f}\in {H}^{2}\left(\mathcal{M}\right)$ be given by the following model:

$\left({\left\{{\hat{\mathbf{z}}}_{i}\right\}}_{i=1}^{n},\hat{f}\right)=\underset{{\mathbf{z}}_{i}\in {\mathbb{R}}^{s},\ f\in {H}^{2}\left(\mathcal{M}\right)}{\mathrm{arg\,min}}\ \sum _{i=1}^{n}{{\Vert}{S}_{i}^{1/2}-{\mathbf{z}}_{i}{\left({K}_{i}{\Psi}f\right)}^{T}{\Vert}}_{F}^{2}+\lambda \left(\sum _{i=1}^{n}{\mathbf{z}}_{i}^{T}{\mathbf{z}}_{i}\right)\int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2}.\qquad\qquad (16)$

The newly defined model, as opposed to model (14), now has a subject-specific s-dimensional vector zi and a term f that is common to all samples. As in the previous model, the subsequent components can be estimated by deflation methods, leading to a set of estimates ${\hat{f}}_{r}$ and ${\hat{\mathbf{z}}}_{ir}$.

Define now the empirical variances to be ${\hat{\gamma }}_{ir}={{\Vert}{\hat{\mathbf{z}}}_{ir}{\Vert}}^{2}{{\Vert}{\hat{f}}_{r}{\Vert}}_{{L}^{2}\left(\mathcal{M}\right)}^{2}$ and consider the ${L}^{2}\left(\mathcal{M}\right)$-normalized version of $\left\{{\hat{f}}_{r}\right\}$. The empirical term in model (16) suggests an approximate representation of Si, that is

${S}_{i}\approx \sum _{r}{\hat{\gamma }}_{ir}\left({K}_{i}{\Psi}{\hat{f}}_{r}\right){\left({K}_{i}{\Psi}{\hat{f}}_{r}\right)}^{T},\qquad \text{i.e.}\qquad {C}_{i}\approx \sum _{r}{\hat{\gamma }}_{ir}\,{\hat{f}}_{r}\otimes {\hat{f}}_{r},\qquad\qquad (17)$

where each underlying covariance function Ci is approximated by the sum of the product between a subject-specific constant ${\hat{\gamma }}_{ir}$ and a component ${\hat{f}}_{r}\otimes {\hat{f}}_{r}$ common to all the observations. The regularizing term in (16) introduces spatial coherence on the estimated functions $\left\{{\hat{f}}_{r}\right\}$.

The covariance operators $\left\{{\mathcal{C}}_{i}\right\}$ are said to be commuting if ${\mathcal{C}}_{i}{\mathcal{C}}_{{i}^{\prime }}={\mathcal{C}}_{{i}^{\prime }}{\mathcal{C}}_{i}$ for all i, i' = 1, ..., n. This property can be equivalently characterized as

${C}_{i}\left(v,{v}^{\prime }\right)=\sum _{r=1}^{\infty }{\gamma }_{ir}\,{\psi }_{r}\left(v\right){\psi }_{r}\left({v}^{\prime }\right),\qquad i=1,\dots ,n,\qquad\qquad (18)$

with ${\left\{{\gamma }_{ir}\right\}}_{r}$ subject-specific variances and {ψr} a set of common orthonormal functions. Thus, a collection of commuting covariance operators can be simultaneously diagonalized by a common basis {ψr}. In this case, the functions $\left\{{\hat{f}}_{r}\right\}$ can be regarded as estimates of {ψr} and $\left\{{\hat{\gamma }}_{ir}\right\}$ estimates of {γir}.

On the one hand, model (16) constrains the estimated covariances to be of the form ${C}_{i}={\sum }_{r}{\hat{\gamma }}_{ir}{\hat{f}}_{r}\otimes {\hat{f}}_{r}$ and not of the more general form ${C}_{i}={\sum }_{r}{\hat{\gamma }}_{ir}{\hat{f}}_{ir}\otimes {\hat{f}}_{ir}$. On the other hand, such a model takes advantage of all the n samples to estimate the components $\left\{{\hat{f}}_{r}\otimes {\hat{f}}_{r}\right\}$. Moreover, the associated variables $\left\{{\hat{\gamma }}_{ir}\right\}$ give a convenient approximate description of the ith covariance, as they are comparable across samples, as opposed to those computed from model (14). In fact, the ith covariance function can be represented by the variance vector ${\left({\hat{\gamma }}_{i1},\dots ,\;{\hat{\gamma }}_{iR}\right)}^{T}$, for a suitable truncation level R, where each entry is associated with the rank-one component ${\hat{f}}_{r}\otimes {\hat{f}}_{r}$. For each r, a scatter plot of the variances ${\left\{{\gamma }_{ir}\right\}}_{i}$, such as the one in figure 14, helps in understanding the average contribution of the rth component and its variability across samples. Model (17) could also be interpreted as a common PCA model (Benko et al 2009, Flury 1984), as $\left\{{\hat{f}}_{r}\right\}$ are the estimated regularized eigenfunctions of the pooled covariance $C=\frac{1}{n}{\sum }_{i=1}^{n}{C}_{i}$.

Potentially, PCA could be performed on the descriptors ${\left({\hat{\gamma }}_{i1},\dots ,\;{\hat{\gamma }}_{iR}\right)}^{T}$ to find rank-R components that maximize the variance of linear combinations of $\left\{{\hat{\gamma }}_{ir}\right\}$ (i.e. the variance of the variances). However, results would be more difficult to interpret, as they would involve variations that are rank-R covariance functions around the rank-R mean covariance function.

4.3. Algorithm

The minimization in (14), for each fixed i, is a particular case of the one in (7) (see section 3.2), so we focus on the minimization problem in (16), which is also approached in an iterative fashion. We set ${\sum }_{i=1}^{n}{{\Vert}{\mathbf{z}}_{i}{\Vert}}^{2}=1$ in the estimation procedure. This leads to the estimates of {zi}, given f, that are

${\hat{\mathbf{z}}}_{i}=\frac{{\tilde {\mathbf{z}}}_{i}}{\sqrt{{\sum }_{{i}^{\prime }=1}^{n}{{\Vert}{\tilde {\mathbf{z}}}_{{i}^{\prime }}{\Vert}}^{2}}},\qquad i=1,\dots ,n,$

with

${\tilde {\mathbf{z}}}_{i}={S}_{i}^{1/2}{K}_{i}{\Psi}f,\qquad i=1,\dots ,n.$

The estimate of f given {zi}, in the discrete space V introduced in section 3.2, is given by the following proposition.

Proposition 2. The surface finite element solution ${\hat{f}}_{h}\in V$ of model (16), given the vectors {zi}, is ${\hat{f}}_{h}={\hat{\mathbf{c}}}^{T}\boldsymbol{\phi }$ where $\hat{\mathbf{c}}$ is the solution of

$\left(\sum _{i=1}^{n}{{\Vert}{\mathbf{z}}_{i}{\Vert}}^{2}{K}_{i}^{T}{K}_{i}+\lambda A{M}^{-1}A\right)\hat{\mathbf{c}}=\sum _{i=1}^{n}{K}_{i}^{T}{\left({S}_{i}^{1/2}\right)}^{T}{\mathbf{z}}_{i}.\qquad\qquad (19)$

Algorithm 2 contains a summary of the estimation procedure. From a practical point of view, the choice to define the representation basis to be a collection of rank one (i.e. separable) covariance functions, of the type ${F}_{r}={\hat{f}}_{r}\otimes {\hat{f}}_{r}$, is mainly driven by the following reasons. Firstly, rank-one covariance functions are easier to interpret due to their limited degrees of freedom. Secondly, on a rank one covariance function ${F}_{r}={\hat{f}}_{r}\otimes {\hat{f}}_{r}$ spatial coherence can be imposed by regularizing ${\hat{f}}_{r}$, as in fact done for model (14), and this is fundamental in the setting of indirectly observed covariance functions. Finally, due to their size, it might not be possible to store the full reconstructions of the covariance functions {Ci} on the brain space, instead, the representation model in (17) allows for an efficient joint representation of such covariance functions in terms of rank-one components.

Algorithm 2. Inverse problems—covariance PCA algorithm.

1: Square-root decompositions
  (a) Compute the representations ${S}_{1}^{1/2},\dots ,\;{S}_{n}^{1/2}$ from S1, ..., Sn as
     ${S}_{i}^{1/2}={D}_{i}^{1/2}{V}_{i}^{T},$
  with ${S}_{i}={V}_{i}{D}_{i}{V}_{i}^{T}$ the spectral decomposition of Si.
2: Initialization:
  (a) Computation of M and A
  (b) Initialize ${\left\{{\mathbf{z}}_{i}\right\}}_{i=1}^{n}$, the scores of the first PC
3: PC function's estimation from model (16): compute c such that
     $\left(\sum _{i=1}^{n}{{\Vert}{\mathbf{z}}_{i}{\Vert}}^{2}{K}_{i}^{T}{K}_{i}+\lambda A{M}^{-1}A\right)\mathbf{c}=\sum _{i=1}^{n}{K}_{i}^{T}{\left({S}_{i}^{1/2}\right)}^{T}{\mathbf{z}}_{i}$
     ${f}_{h}{\leftarrow}{\mathbf{c}}^{T}\boldsymbol{\phi }$
4: Scores estimation from model (16):
     ${\mathbf{z}}_{i}{\leftarrow}{S}_{i}^{1/2}{K}_{i}{\Psi}{f}_{h},\quad i=1,\dots ,\;n$
     ${\mathbf{z}}_{i}{\leftarrow}\frac{{\mathbf{z}}_{i}}{\sqrt{{\sum }_{i=1}^{n}{{\Vert}{\mathbf{z}}_{i}{\Vert}}^{2}}},\quad i=1,\dots ,\;n$
5: Repeat steps 3 and 4 until convergence
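Below is a dense-algebra sketch of algorithm 2 on synthetic data: the sensor-space covariances are generated from a single shared rank-one brain-space mode, the matrix P again stands in for $A{M}^{-1}A$, and all dimensions and noise levels are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

def covariance_pca_first_component(S_list, K_list, P, lam, n_iter=15, seed=0):
    s = S_list[0].shape[0]
    # Step 1: square-root decompositions S_i^{1/2} = D_i^{1/2} V_i^T
    S_half = []
    for S in S_list:
        d, V = np.linalg.eigh(S)
        S_half.append(np.diag(np.sqrt(np.clip(d, 0.0, None))) @ V.T)
    # Step 2(b): initialize the score vectors z_i with joint unit norm
    rng = np.random.default_rng(seed)
    Z = [rng.normal(size=s) for _ in S_list]
    norm = np.sqrt(sum(z @ z for z in Z))
    Z = [z / norm for z in Z]
    for _ in range(n_iter):
        # Step 3: PC function estimation
        lhs = sum((z @ z) * (K.T @ K) for z, K in zip(Z, K_list)) + lam * P
        rhs = sum(K.T @ (Sh.T @ z) for z, K, Sh in zip(Z, K_list, S_half))
        c = np.linalg.solve(lhs, rhs)
        # Step 4: score estimation and joint normalization
        Z = [Sh @ (K @ c) for K, Sh in zip(K_list, S_half)]
        norm = np.sqrt(sum(z @ z for z in Z))
        Z = [z / norm for z in Z]
    return c, Z

rng = np.random.default_rng(7)
n, s, p = 15, 25, 60
psi = np.sin(np.linspace(0, 2 * np.pi, p)); psi /= np.linalg.norm(psi)   # shared brain-space mode
K_list = [rng.normal(size=(s, p)) / np.sqrt(p) for _ in range(n)]        # subject-specific forward operators
S_list = [K @ (9.0 * np.outer(psi, psi)) @ K.T + 0.01 * np.eye(s) for K in K_list]
D2 = np.diff(np.eye(p), n=2, axis=0); P = D2.T @ D2
c_hat, Z_hat = covariance_pca_first_component(S_list, K_list, P, lam=1e-2)
print(abs(np.dot(c_hat / np.linalg.norm(c_hat), psi)))   # alignment with the shared mode (ideally near 1)
```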

5. Simulations

In this section, we perform simulations to assess the performance of the proposed algorithms. To reproduce as closely as possible the application setting, the cortical surfaces and the forward operators are taken from the MEG application described in section 6. The details on the extraction and computation of such objects are left to the same section. For the same reason, the signals on the brain space considered here are vector-valued functions, specifically functions from the brain space $\mathcal{M}$ to ${\mathbb{R}}^{3}$, as is the case in the MEG application. The proposed methodology can be naturally extended to deal with this case, as shown in the following simulations.

5.1. Indirectly observed functions

We consider ${\mathcal{M}}_{\mathcal{T}}$ to be a triangular mesh, with 8k nodes, representing the cortical surface geometry of a subject, as shown on the top right panel of figure 1. Each of the 8k nodes will represent a location vj associated with the sampling operator Ψ. The locations of the nodes {vj} on the brain space, the location of the 241 detectors on the sensors space and a model of the subject's head, enable the computation of a forward operator K describing the relation between the signal generated on the locations {vj}, on the brain space, and the signal detected on the 241 sensors in the sensors space. In practice, the signal on each node vj is described by a three dimensional vector, characterized by an intensity and a direction, while the signal detected on the sensors space is a scalar signal. Thus, the forward operator is a 241 × 24k matrix.

We first want to assess the performance of the proposed model in the case of indirect functional observations belonging to a linear space. To this purpose, we produce synthetic data following the generative model (5). Specifically, on ${\mathcal{M}}_{\mathcal{T}}$, we construct the four ${L}^{2}\left({\mathcal{M}}_{\mathcal{T}}\right)$ orthonormal vector-valued functions {ψr = (ψr,1, ψr,2, ψr,3) : r = 1, ..., 4}, with ${\boldsymbol{\psi }}_{r}:{\mathcal{M}}_{\mathcal{T}}\to {\mathbb{R}}^{3}$. These represent the PC functions to be estimated. In figure 5 we show the four components of {ψr} and the associated energy maps $\left\{{{\Vert}{\boldsymbol{\psi }}_{r}\left(v\right){\Vert}}^{2}:v\in {\mathcal{M}}_{\mathcal{T}}\right\}$, with ||⋅|| denoting the Euclidean norm in ${\mathbb{R}}^{3}$. We then generate m = 50 smooth vector-valued functions {xl} on ${\mathcal{M}}_{\mathcal{T}}$ by

${\mathbf{x}}_{l}=\sum _{r=1}^{4}{z}_{lr}\,{\boldsymbol{\psi }}_{r},\qquad l=1,\dots ,50,$

where {zlr} are i.i.d. realizations of the four independent random variables {zr ∼ N(0, γr) : r = 1, ..., 4}, with γ1 = 3², γ2 = 2.5², γ3 = 2² and γ4 = 1.

Figure 5. From top to bottom the components and the energy maps of the PC functions ψ1, ..., ψ4.

The functions {xl} are sampled at the 8k nodes, and the forward operator is applied to the sampled values, producing a collection of vectors {yl} each of dimension 241, the number of active sensors. Moreover, on each entry of the vectors {yl}, we add Gaussian noise with mean zero and standard deviation σ, for different choices of σ, to reproduce different signal-to-noise ratio regimes.

In the following, we compare the PC model (7) to an alternative approach that we call the naive approach. In fact, the individual functions {xl} could be estimated from {yl} by use of classical inverse problem estimators. Here, we adopt the estimates $\left\{{\hat{\mathbf{x}}}_{l}\right\}$ defined as

${\hat{\mathbf{x}}}_{l}=\underset{\mathbf{f}}{\mathrm{arg\,min}}\ {{\Vert}{\mathbf{y}}_{l}-K{\Psi}\mathbf{f}{\Vert}}^{2}+\lambda \,{{\Vert}{{\Delta}}_{\mathcal{M}}\mathbf{f}{\Vert}}^{2},\qquad l=1,\dots ,m,\qquad\qquad (20)$

where each ${\hat{\mathbf{x}}}_{l}$ is defined in such a way that it balances the fitting term and the regularization term in (20). Due to the fact that f is vector-valued, ${{\Vert}{{\Delta}}_{\mathcal{M}}\mathbf{f}{\Vert}}^{2}$ is defined as

${{\Vert}{{\Delta}}_{\mathcal{M}}\mathbf{f}{\Vert}}^{2}=\sum _{q=1}^{3}\int_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}{f}_{q}\right)}^{2},$

with $\left\{{f}_{q}:q=1,2,3\right\}$ denoting the components of f. The same penalty operator is also adopted to generalize to vector-valued functions the PC models introduced in sections 3 and 4. In this approach, the constant λ is chosen independently for each of the m functions by partitioning the 241 detectors into roughly equally sized K = 2 groups and applying K-fold cross-validation. The criterion for the optimal λ is the average reconstruction error, on the sensors space, computed on the validation groups. Once we obtain the estimates $\left\{{\hat{\mathbf{x}}}_{l}\right\}$ we can compute estimates of the PC functions {ψr} by applying classical multivariate PC analysis to the reconstructed objects $\left\{{\hat{\mathbf{x}}}_{l}\right\}$.

The estimates are compared to those of the proposed PC function model, as described in algorithm 1, with 15 iterations. Note that a tolerance could instead be fixed to test whether the algorithm has converged. However, 15 iterations give satisfactory convergence levels in our simulations and application studies. We partition the m observations into equally sized K = 2 groups and perform K-fold cross-validation for the choice of the penalty. Specifically, we choose the coefficient λ that minimizes the sensors space reconstruction error on the validation groups.
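A sketch of the cross-validation loop just described is given below: for each candidate λ, the first PC is fitted on the training fold and scored by the sensors-space reconstruction error on the held-out fold. The fitting routine is passed in as an argument (for instance, the `inverse_pca_first_pc` routine from the sketch after algorithm 1); the candidate grid, the seed and the fold construction are illustrative choices, not the ones used in the paper.

```python
import numpy as np

def cv_select_lambda(Y, K, P, lambdas, fit_first_pc, n_folds=2, seed=0):
    """Pick lambda by K-fold CV on the sensors-space reconstruction error."""
    m = Y.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(m), n_folds)
    errors = []
    for lam in lambdas:
        err = 0.0
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            c, _ = fit_first_pc(Y[train], K, P, lam)       # first PC fitted on the training fold
            g = K @ c                                      # fitted component in the sensors space
            z_val = Y[val] @ g / (g @ g)                   # least-squares scores on held-out data
            err += np.sum((Y[val] - np.outer(z_val, g)) ** 2)
        errors.append(err)
    return lambdas[int(np.argmin(errors))]

lam_grid = 10.0 ** np.arange(-4, 2)
# lam_best = cv_select_lambda(Y, K, P, lam_grid, inverse_pca_first_pc)   # with the earlier sketch's objects
```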

To evaluate the performance of the two approaches, we generate 100 datasets as previously detailed. The quality of the estimated rth PC function is then measured with $E\left({\boldsymbol{\psi }}_{r},{\hat{\boldsymbol{\psi }}}_{r}\right)={\sum }_{q=1}^{3}{{\Vert}{\nabla }_{\mathcal{M}}\left({\psi }_{r,q}-{\hat{\psi }}_{r,q}\right){\Vert}}^{2}$. The results are summarized in the boxplots in figure 6, for two different signal-to-noise ratios, where the Gaussian noise has standard deviation σ = 5 and σ = 10. In figure 7 we show an example of a signal on the brain space corrupted with the specified noise levels.

Figure 6. On the left, a summary of the results in a medium signal-to-noise ratio regime. On the right, a summary of the results in a low signal-to-noise ratio regime. Each boxplot displays the paired differences of the estimation errors $E\left({\boldsymbol{\psi }}_{r},{\hat{\boldsymbol{\psi }}}_{r}\right)$ between the estimates of the two-step naive method and those obtained by applying algorithm 1. A paired difference greater than 0 indicates that, for the dataset in question, algorithm 1 has performed better than the two-step naive approach.

Figure 7. From left to right, the energy map of a generated function xl, the associated signal yl on the sensors space with respectively no additional error, Gaussian error of standard deviation σ = 5 and Gaussian error of standard deviation σ = 10.

The boxplots highlight the fact that the proposed approach provides better estimates of the PC functions (i.e. lower estimation errors $E\left({\boldsymbol{\psi }}_{r},{\hat{\boldsymbol{\psi }}}_{r}\right)$) than the naive approach. Differences in the estimation error are larger in a low signal-to-noise regime, as is the case for the estimation of the fourth PC function where, intuitively, the low variance associated with the PC function makes it more difficult to distinguish this structured signal from the noise component. Also notable is the stability of the estimates of the proposed algorithm across the generated datasets, as opposed to the naive approach of reconstructing the functional observations independently, which instead returns multiple particularly unsatisfactory reconstructions. An example of such reconstructions is shown in figure 8.

Figure 8. On the first row, the energy maps of the true four PC components to be estimated; on the second row, the estimates given by the two-step naive method; and on the third row, the reconstructions obtained by applying algorithm 1.

5.2. Indirectly observed covariance functions

In this section, we consider ${\mathcal{M}}_{\mathcal{T}}$ to be an 8k-node triangular mesh, this time representing a template geometry of the cortical surface, which is shown in figure 10. This contains only the geometric features common to all subjects. Moreover, each subject's cortical surface is also represented by an 8k-node triangulated surface, which is used, together with the locations of the 241 detectors on the sensors space, and the head model, to compute a forward operator Ki for the ith subject. The 8k nodes of each subject's triangular mesh are in correspondence with the 8k nodes of the template mesh ${\mathcal{M}}_{\mathcal{T}}$. This allows the model to be defined on the template ${\mathcal{M}}_{\mathcal{T}}$.

Figure 9. On the top row, the energy maps of ψ1, ..., ψ4. On the bottom row the energy maps of the estimates ${\hat{\boldsymbol{\psi }}}_{1},\dots ,\;{\hat{\boldsymbol{\psi }}}_{4}$ obtained by applying algorithm 2.

As in the previous section, we construct four ${L}^{2}\left({\mathcal{M}}_{\mathcal{T}}\right)$ orthonormal functions $\left\{{\boldsymbol{\psi }}_{r}=\left({\psi }_{r,1},{\psi }_{r,2},{\psi }_{r,3}\right):r=1,\dots ,\;4\right\}$. The energy maps of $\left\{{\boldsymbol{\psi }}_{r}\right\}$ are shown in figure 9. We generate synthetic data from model (6) as follows:

${\mathbf{C}}_{i}=\sum _{r=1}^{4}{z}_{ir}^{2}\,{\boldsymbol{\psi }}_{r}\otimes {\boldsymbol{\psi }}_{r},\qquad i=1,\dots ,n,$

where zi1, ..., zi4 are i.i.d. realizations of the four independent random variables $\left\{{z}_{r}\sim N\left(0,{\gamma }_{r}\right):r=1,\dots ,\;4\right\}$, with γ1 = 3², γ2 = 2.5², γ3 = 2² and γ4 = 1. The matrix-valued form of the covariance functions arises from the fact that the observed functions on the brain space are vector-valued. Subsequently, we construct the point-wise evaluation matrices ${\mathbb{C}}_{i}\in {\mathbb{R}}^{24\mathrm{k}{\times}24\mathrm{k}}$, from which the corresponding covariance matrices on the sensors space are defined as

${S}_{i}={K}_{i}\,{\mathbb{C}}_{i}\,{K}_{i}^{T}+{E}_{i}^{T}{E}_{i},\qquad i=1,\dots ,n.$

The term ${E}_{i}^{T}{E}_{i}$ is an error term, where Ei is an s × s matrix whose entries are independent samples from a Gaussian distribution with mean zero and standard deviation 5. We then apply algorithm 2 with 15 iterations, taking $\left\{{S}_{i}\right\}$ as input. The results are shown in figure 9, in terms of energy maps of the reconstructed functions $\left\{{\hat{\boldsymbol{\psi }}}_{r}\right\}$. These are a close approximation of the underlying functions $\left\{{\boldsymbol{\psi }}_{r}\right\}$. The fidelity measure ${\sum }_{q=1}^{3}{{\Vert}{\nabla }_{\mathcal{M}}\left({\psi }_{r,q}-{\hat{\psi }}_{r,q}\right){\Vert}}^{2}$ of such estimates is 6.8 × 10−2, 6.1 × 10−1, 6.8 × 10−1 and 7.4 × 10−1, for ψ1, ..., ψ4 respectively, which is comparable in order of magnitude to the results obtained in the case of PCs of indirectly observed functions. Across the generation of multiple datasets, results are stable, with the exception of a few situations where the cross-validation approach suggests a penalization coefficient λ that under-smoothes the solution, because the under-smoothed solution and the true solution induce very similar signals on the sensors space. However, cross-validation is only one possible approach to the choice of the penalization constant, and many other options have been proposed in the inverse problems literature (Vogel 2002). Some of these, however, involve visual inspection.


Figure 10. Top side and bottom side views of the template triangular mesh ${\mathcal{M}}_{\mathcal{T}}$ composed of 8k nodes.


6. Application

In this section, we apply the developed models to the publicly available human connectome project (HCP) young adult dataset (Van Essen et al 2012). This dataset comprises multi-modal neuroimaging data, such as structural scans, resting-state and task-based functional MRI scans, and resting-state and task-based MEG scans, from a large number of healthy volunteers. In the following, we briefly review the pre-processing pipeline applied by the HCP to these data, which ultimately facilitates their use.

6.1. Pre-processing

For each individual, a high-resolution 3D structural MRI scan has been acquired. This returns a 3D image describing the structure of the gray and white matter in the brain. Gray matter is the source of a large part of our neuronal activity. White matter is made of axons connecting the different parts of the gray matter. If we exclude the sub-cortical structures, gray matter is mostly distributed along the outer surface of the cerebral hemispheres, also known as the cerebral cortex.

By segmentation of the 3D structural MRI, it is possible to separate gray matter from white matter, in order to extract the cerebral cortex structure. Subsequently a mid-thickness surface, interpolating the mid-points of the cerebral cortex, can be estimated, resulting in a 2D surface embedded in a 3D space that represents the geometry of the cerebral cortex. In practice, such a surface, sometimes referred to as the cortical surface, is a triangulated surface. Moreover, from the 3D structural MRI, a surface describing the individual's head can be extracted. The latter plays a role in the derivation of the model for the electrical/magnetic propagation of the signal from the cerebral cortex to the sensors. An example of the cortical surface of a single subject is shown in the right panel of figure 1, while the associated head surface and MEG sensor positions are shown in the left panel of the same figure.

Moreover, a surface-based registration algorithm has been applied to register each of the extracted cortical surfaces to a triangulated template cortical surface, which is shown in figure 10. After registration, the triangulated template cortical surface is sub-sampled to an 8k-node surface. Moreover, the nodes on the cortical surface of each subject are also sub-sampled to a set of 8k nodes in correspondence with the 8k nodes of the template. For each subject, a 248 × 24k matrix, representing the forward operator, has been computed with FieldTrip (Oostenveld et al 2011) from its head surface, cortical surface and sensor positions. Such a matrix relates the vector-valued signals in ${\mathbb{R}}^{3}$, on the nodes of the triangulation of the cerebral cortex, to those detected by the sensors, consisting of 248 magnetometer channels.
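As a rough illustration of the role of this lead-field matrix, the sketch below shows how a vector-valued source configuration on the 8k mesh nodes maps to the 248 magnetometer channels. The random matrix, the chosen node index and the node-major flattening of the three components are all illustrative assumptions, not the FieldTrip output format.

```python
import numpy as np

n_nodes, n_sensors = 8000, 248
rng = np.random.default_rng(1)

# Stand-in for the 248 x 24k forward (lead-field) matrix computed by FieldTrip.
K = rng.standard_normal((n_sensors, 3 * n_nodes))

# A vector-valued source field: one 3D dipole moment per node of the cortical mesh.
source = np.zeros((n_nodes, 3))
source[1234] = [1.0, 0.0, 0.0]          # a single active dipole (hypothetical node index)

# Predicted magnetic field at the sensors (assuming node-major ordering of components).
y = K @ source.reshape(-1)
```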

With the aim of studying the functional connectivity of the brain, three 6 min resting-state MEG scans have been performed for each subject, of which one session is used in our analysis. During the 6 min, data are collected from the sensors at 600k uniformly distributed time-points. Using FieldTrip, classical pre-processing is applied to the detected signals, such as the removal of low-quality channels and low-quality segments. Details of this procedure can be found in the HCP MEG reference manual. Moreover, we apply a band-pass filter, limiting the spectrum of the signal to the [12.5, 29] Hz band, also known as the beta band. For the signal of each channel we compute its amplitude envelope (see figure B.1), which describes the evolution of the signal amplitude. The measure of connectivity between channels that we adopt in this work is the covariance of the amplitude envelopes. Other connectivity metrics, such as phase-based metrics, have been proposed in the literature (see Colclough et al 2016, and references therein).
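As a rough illustration of this per-channel pre-processing, the sketch below band-pass filters a signal to the beta band and extracts its amplitude envelope via the Hilbert transform. The filter order, the Hilbert-based envelope and the sampling rate are our own assumptions and do not reproduce the HCP/FieldTrip pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def amplitude_envelope(x, fs, band=(12.5, 29.0), order=4):
    """Band-pass filter a single-channel signal and return its amplitude envelope."""
    b, a = butter(order, band, btype="bandpass", fs=fs)
    x_beta = filtfilt(b, a, x)              # signal restricted to the beta band
    return np.abs(hilbert(x_beta))          # modulus of the analytic signal

# Envelopes for all channels, then their covariance as the connectivity measure.
fs = 2000.0                                 # sampling rate in Hz (illustrative)
rng = np.random.default_rng(2)
data = rng.standard_normal((248, 10 * int(fs)))     # channels x time samples (illustrative)
env = np.vstack([amplitude_envelope(ch, fs) for ch in data])
S = np.cov(env)                             # 248 x 248 covariance of amplitude envelopes
```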

6.2. Analysis

Here we apply the population model introduced in section 4.2 to the HCP MEG data. The first part of the analysis focuses on studying the dynamic functional connectivity of a specific subject. For this purpose, we subdivide the 6 min session into n = 40 consecutive intervals. Each of these segments is used to compute a covariance matrix in the sensors space, resulting in n covariance matrices S1, ..., Sn. In this setting, we have one forward operator K = K1 =⋯= Kn. The aim is to understand the main modes of variation of the functional connectivity on the brain space of the subject. Thus, algorithm 2, with 20 iterations, is applied to S1, ..., Sn to find the PC covariance functions.
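In code, this segmentation step amounts to splitting the envelope time series into consecutive blocks and computing one empirical covariance per block; the minimal sketch below uses random data in place of the actual envelopes.

```python
import numpy as np

rng = np.random.default_rng(3)
env = rng.standard_normal((248, 24000))     # envelope time series, channels x time (illustrative)

def segment_covariances(env, n_segments=40):
    """Split the envelope time series into consecutive intervals and return one
    sensor-space covariance matrix per interval."""
    return [np.cov(seg) for seg in np.array_split(env, n_segments, axis=1)]

S_list = segment_covariances(env, n_segments=40)    # S_1, ..., S_40
```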

A regularization parameter λ common to all the PC components is chosen by inspecting the plot of the regularity of the first R = 10 PC covariance functions (${\sum }_{r=1}^{R}{\int }_{\mathcal{M}}{{\Vert}\nabla {\boldsymbol{\psi }}_{r}{\Vert}}^{2}$) versus the residual norm, for different choices of the parameter. This is a version of the L-curve plot (Hansen 2000) and is shown on the left panel of figure B.2. Here we show the results for λ = 10²; in the appendices we show the results for λ = 10. The energy maps of the estimated ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ resulting from the analysis are shown in figure 11. These are associated with the first three PC covariance functions ${\hat{\boldsymbol{\psi }}}_{1}\otimes {\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}\otimes {\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}\otimes {\hat{\boldsymbol{\psi }}}_{3}$. High intensity areas, in yellow, indicate which areas present high average interconnectivity, either by means of positive or negative correlation in time.
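A possible way to trace the L-curve used above for selecting λ is sketched below. Here `fit_pc_covariance` is a hypothetical wrapper around algorithm 2 (not provided here), assumed to return the basis coefficients of the first R PC covariance functions together with the residual norm, and the roughness $\sum_{r}\int_{\mathcal{M}}{\Vert}\nabla \boldsymbol{\psi }_{r}{\Vert}^{2}$ is computed as a quadratic form with the stiffness matrix A of appendix A.

```python
import numpy as np

def l_curve(S_list, K, A, lambdas, fit_pc_covariance, R=10):
    """Roughness and residual norm of the first R PC covariance functions over a grid
    of smoothing parameters, to be plotted against each other (L-curve).

    fit_pc_covariance(S_list, K, lam, R) is assumed to return (coefs, residual), where
    coefs is an R x kappa array of basis coefficients of psi_hat_1, ..., psi_hat_R.
    """
    roughness, residual = [], []
    for lam in lambdas:
        coefs, res = fit_pc_covariance(S_list, K, lam=lam, R=R)
        roughness.append(sum(c @ A @ c for c in coefs))   # sum_r int ||grad psi_r||^2
        residual.append(res)
    return np.array(roughness), np.array(residual)

# Example (with a user-supplied implementation of algorithm 2):
# rough, res = l_curve(S_list, K, A, np.logspace(0, 4, 9), my_algorithm_2)
# A plot of rough versus res then exhibits the corner used to select lambda.
```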


Figure 11. Top side and bottom side views of the estimated energy maps ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ obtained by applying algorithm 2 to the covariance matrices computed from the MEG resting state data of a single subject on n = 40 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps. On the top right panel we highlight with red circles the areas with high average interconnectivity, which correspond to the neighborhoods of the red crossed vertices in the plot of the energy map of ψ1.


In figure 12, we show the plot of the variances associated with each time segment, describing the variation in time of the PC covariance functions, and hence the variation in interconnectivity. The variance can be defined either on the sensors space, by normalizing the PC covariance functions $\left\{K{\hat{\boldsymbol{\psi }}}_{r}\right\}$, with K the forward operator, or on the brain space, by normalizing the PC covariance functions on the brain space $\left\{{\hat{\boldsymbol{\psi }}}_{r}\right\}$. Due to the presence of invisible dipoles, which are dipoles that produce zero magnetic field on the sensors space, the two norms can be quite different, leading to different average variances for each PC covariance function. Due to the high sensitivity of the source-space variances to the choice of the regularization parameter, we focus on the estimated variances on the sensors space.
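One plausible way to obtain such segment-specific variances, sketched below under our own reading of the normalization described above (and not a transcription of the actual computation), is to evaluate the quadratic form of each covariance matrix Si along the normalized sensor-space directions $K{\hat{\boldsymbol{\psi }}}_{r}$.

```python
import numpy as np

def pc_variances(S_list, directions):
    """Variance of each sample covariance along each normalized PC direction.

    S_list     : list of n covariance matrices (s x s), e.g. the segment covariances
    directions : s x R array whose columns are the sensor-space directions K @ psi_hat_r
    Returns an n x R array of segment- (or subject-) specific variances.
    """
    U = directions / np.linalg.norm(directions, axis=0)   # normalize each column
    return np.array([[u @ S @ u for u in U.T] for S in S_list])

# Sensor-space variances for figure-12-style plots (Psi_hat: brain-space PC estimates):
# variances = pc_variances(S_list, K @ Psi_hat)
# Brain-space variances would instead normalize the psi_hat_r themselves.
```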


Figure 12. Plots of the segment-specific variances of the first R = 10 PC covariance functions. On the left, the estimated variances on the sensors space, on the right, the estimated variances on the brain space.


We have also applied our model to the covariances obtained by subdividing the MEG session into n = 80 segments. As expected, the PC covariance functions, shown in figure B.5, are very similar. However, the variances, in figure B.6, show higher variability in time, which can be partially explained by the fact that shorter time segments lead to covariance estimates with higher variability.

The second part of the analysis focuses on applying the proposed methodology to a multi-subject setting. Specifically, n = 40 different subjects are considered. For each subject, the 6 min scan is used to compute a covariance matrix, resulting in n covariance matrices S1, ..., Sn. The template geometry in figure 10 is used as a model of the brain space. Algorithm 2 is then applied to find the PC covariance functions on the template brain, associated with S1, ..., Sn. We run the algorithm for 20 iterations, and choose the regularizing parameter to be λ = 10² by inspecting the L-curve plot in the right panel of figure B.2. The results for λ = 10 are shown in the appendices. The energy maps of the estimated functions ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ and the associated first three covariance functions ${\hat{\boldsymbol{\psi }}}_{1}\otimes {\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}\otimes {\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}\otimes {\hat{\boldsymbol{\psi }}}_{3}$, are shown in figure 13. High intensity areas, in yellow, indicate which areas present high average connectivity. In figure 14, we show the subject-specific associated variances, both in the sensors space and the brain space.


Figure 13. Top side and bottom side views of the estimated energy maps ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ obtained by applying algorithm 2 to the covariance matrices computed from the MEG resting state data of n = 40 different subjects. On the right panel, the covariance functions associated with these energy maps.


Figure 14. Plots of the subject-specific variances associated with the first R = 10 PC covariance functions. On the left, the estimated variances on the sensors space, on the right, the estimated variances on the brain space.


The presented methodology opens up the possibility of understanding population-level variation in functional connectivity, and indeed of asking whether, just as we need different forward operators for individuals (due to anatomical differences), we should also consider both population-level and subject-specific connectivity maps when analyzing connectivity networks. In fact, it is of interest to note that in both the single-subject and multi-subject settings, the areas with high interconnectivity, displayed in yellow in figures 11 and 13, seem to be at least partially overlapping with the brain's default network (Buckner et al 2008, Yeo et al 2011). The brain's default network consists of the brain regions known to have highly correlated hemodynamic activity (i.e. the highest functional connectivity levels), and to be most active when the subject is not performing any specific task. An image of the spatial configuration of the default network can be found, for instance, in figure 2 of Buckner et al (2008). From the plots of the associated variances in the sensors space (left panels of figures 12 and 14) we can see that these areas are also the ones that show high variability in connectivity across time or across subjects. This might suggest that the brain's default network is also the brain region that shows among the highest levels of spontaneous variability in connectivity.

The plots of the variances on the brain space (right panel of figure 14), when compared to those on the sensors space (left panel of figure 14), demonstrate that these types of studies are highly sensitive to the choice of the regularization, not only in terms of the spatial configuration of the results, but also in terms of the estimated variances on the brain space. With a naive 'first reconstruct and then analyze' approach, where the reconstructed data on the brain space replace those observed on the sensors space, this issue could go unnoticed, as the variability that does not fit the chosen model is implicitly discarded in the reconstruction step and does not appear in the subsequent analysis. Also, importantly, our analysis deals with statistical samples that are entire covariances, overcoming the limitations of seed-based approaches, where prior spatial information is required to choose the seed. Seed locations are usually informed by fMRI studies, and this comes with the risk of biasing the analysis when comparing electrophysiological networks (MEG) and hemodynamic networks (fMRI).

In general, care should be taken when drawing conclusions from MEG studies. Establishing static and dynamic functional connectivity from MEG data remains challenging, due to the strong ill-posedness of the inverse problem. It is known that other variables, such as the choice of the frequency band or the choice of the connectivity metric, can influence the analysis. While the choice of the neural oscillatory frequency band could be seen as an additional parameter in MEG functional connectivity studies, there is no general agreement on the choice of the connectivity metric (Gross et al 2013). It is important to highlight that in this paper we focus on methodological contributions to the specific problem of reconstructing and representing indirectly observed functional images and covariance functions.

7. Discussion

In this work we introduce a general framework for the reconstruction and representation of covariance operators in an inverse problem context. We first introduce a model for indirectly observed functional images in an unconstrained space, which outperforms the naive approach of solving the inverse problem individually for each sample. This model plays an important role in the case of samples that are indirectly observed covariance functions, and thus constrained to be positive semidefinite. We deal with the non-linearity introduced by such a constraint by working with unconstrained representations, yet incorporating spatial information in their estimation. The proposed methodology is finally applied to the study of brain connectivity from the signals arising from MEG scans.

The models proposed here can be extended in many interesting directions. From an applied perspective, it is of interest to apply them to different settings, not necessarily involving neuroimaging, where studying second-order information has so far been prohibitive. A direct example is the second-order analysis of the dynamics of meteorological observations, such as temperature. Another possible application is the study of the dynamics of ocean currents, where the irregularity of the spatial domain, and its complex boundaries, can be easily accounted for thanks to the manifold representation approach in our models.

From a modeling point of view, it is of interest to take a step further toward the integration of the inverse problems literature with the approach we adopt in this paper. For instance, penalization terms that have been shown to be successful in the inverse problems literature, e.g. total variation penalization, could be introduced in our models.

Acknowledgments

The authors would like to thank the anonymous reviewers and the members of the Editorial Board for their useful and constructive comments.

Appendix A.: Discrete solutions

Proof of proposition 1. We want to find a minimizer $\hat{f}\in {H}^{2}\left(\mathcal{M}\right)$, given z with ||z|| = 1, of the objective function in (7):

Equation (21)

An equivalent characterization of a minimizer $\hat{f}\in {H}^{2}\left(\mathcal{M}\right)$ of such an objective function is given by the equation

Equation (22)

for every $\varphi \in {H}^{2}\left(\mathcal{M}\right)$ (see Braess 2007, chapter 2). Moreover, such a minimizer is unique if $A\left(\varphi ,f\right)={\left({\Psi}\varphi \right)}^{T}\left(\sum _{l=1}^{m}{z}_{l}^{2}{K}_{l}^{T}{K}_{l}\right){\Psi}f+\lambda {\int }_{\mathcal{M}}{{\Delta}}_{\mathcal{M}}\varphi \;{{\Delta}}_{\mathcal{M}}f$ is positive definite. Given that, for a closed manifold $\mathcal{M}$, ${\int }_{\mathcal{M}}{\left({{\Delta}}_{\mathcal{M}}f\right)}^{2}=0$ iff f is a constant function (Dziuk and Elliott 2013), the positive definiteness condition is equivalent to assuming that $\text{ker}\left(\sum _{l=1}^{m}{z}_{l}^{2}{K}_{l}^{T}{K}_{l}\right)$, the kernel of $\sum _{l=1}^{m}{z}_{l}^{2}{K}_{l}^{T}{K}_{l}$, does not contain the subspace of p-dimensional constant vectors.

Moreover, we can reformulate equation (22) in a form that involves only first-order derivatives by integrating by parts against a test function. We then look for a solution in the discrete space $V\subset {H}^{1}\left(\mathcal{M}\right)$, i.e. we look for $\hat{f},g\in V$ such that

Equation (23)

for all $\varphi ,w\in V$. The operator ${\nabla }_{\mathcal{M}}$ is the gradient operator on the manifold $\mathcal{M}$; it is such that $\left({\nabla }_{\mathcal{M}}w\right)\left(v\right)$, for w a smooth real function on $\mathcal{M}$ and $v\in \mathcal{M}$, takes values in the tangent space at v. We denote by ⋅ the scalar product on the tangent space.

We recall here the definitions of the κ × κ mass and stiffness matrices, ${\left(M\right)}_{j{j}^{\prime }}={\int }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{j}{\phi }_{{j}^{\prime }}$ and ${\left(A\right)}_{j{j}^{\prime }}={\int }_{{\mathcal{M}}_{\mathcal{T}}}{\nabla }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{j}\cdot {\nabla }_{{\mathcal{M}}_{\mathcal{T}}}{\phi }_{{j}^{\prime }}$. Note that requiring (23) to hold for all $\varphi ,w\in V$ is equivalent to requiring that (23) holds for all φ, w that are basis elements of V; thus, exploiting the basis expansion formula (11), we can characterize (23) through the solution of the linear system

Equation (24)

where $\hat{\mathbf{c}}$ and $\hat{\mathbf{q}}$ are the basis coefficients of $f\in V$ and $g\in V$, respectively. Solving (24) for $\hat{\mathbf{c}}$ leads to

Equation (25)
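For completeness, the mass matrix M and stiffness matrix A defined above can be assembled with linear (hat) basis functions on a triangulated surface. The following sketch uses the standard element-wise P1 finite-element formulas and assumes the mesh is supplied as vertex coordinates and triangle indices; it is an illustration, not the implementation used in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

def assemble_mass_stiffness(vertices, triangles):
    """Assemble the P1 mass matrix M and stiffness matrix A on a triangulated surface.

    vertices  : (kappa, 3) array of node coordinates
    triangles : (n_tri, 3) array of vertex indices
    (M)_{jj'} = int phi_j phi_j',  (A)_{jj'} = int grad phi_j . grad phi_j'
    """
    kappa = vertices.shape[0]
    M = lil_matrix((kappa, kappa))
    A = lil_matrix((kappa, kappa))
    local_mass = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]) / 12.0

    for tri in triangles:
        x = vertices[tri]                                        # the three element vertices
        e = np.array([x[2] - x[1], x[0] - x[2], x[1] - x[0]])    # edges opposite each vertex
        area = 0.5 * np.linalg.norm(np.cross(e[1], e[2]))
        M[np.ix_(tri, tri)] += area * local_mass                 # exact integrals of phi_j phi_j'
        A[np.ix_(tri, tri)] += (e @ e.T) / (4.0 * area)          # integrals of grad phi_j . grad phi_j'
    return csr_matrix(M), csr_matrix(A)

# M, A = assemble_mass_stiffness(mesh_vertices, mesh_triangles)
```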

Proof of proposition 2. We want to find a minimizer $\hat{f}\in {H}^{2}\left(\mathcal{M}\right)$, given {zi} with ${\sum }_{i=1}^{n}{{\Vert}{\mathbf{z}}_{i}{\Vert}}^{2}=1$, of the objective function in (14):

Equation (26)

Comparing (26) with (21), it is evident that, by following the same steps as in the proof of proposition 1, we obtain the desired result.

Appendix B.: Application—additional material

Here we present further material complementing the analysis in section 6. In figure B.1 we show the amplitude envelope computed from a filtered version of a signal detected by an MEG sensor. The covariance of the amplitude envelopes across different sensors is the measure of connectivity used in this work.


Figure B.1. Amplitude envelope (in red) of the filtered signal (in blue) detected by an MEG sensor.


In figure B.2 we show the L-curve plots associated with the PC covariance models applied to the dynamic and multi-subject functional connectivity studies.


Figure B.2. Plots of the regularity of the first R = 10 PC covariance functions, measured as ${\sum }_{r=1}^{10}{\int }_{\mathcal{M}}{{\Vert}\nabla {\boldsymbol{\psi }}_{r}{\Vert}}^{2}$, versus the residual norm in the data, for different choices of log(λ). The left panel refers to the dynamic connectivity study, the right panel to the multi-subject connectivity study.


Figure B.3. Energy maps of the estimated ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ obtained by applying algorithm 2, with lower regularization (λ = 10), to the covariance matrices computed from the MEG resting state data of a single subject on n = 40 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps.


In figures B.3 and B.4 we show respectively the plots of the estimated PC covariance functions and associated variances from the dynamic functional connectivity study on n = 40 segments with regularization parameter λ = 10.

In figures B.5 and B.6 we show the estimated PC covariance functions and associated variances from the dynamic functional connectivity study on n = 80 time segments with regularization parameter λ = 10².


Figure B.4. Plots of the segment-specific variances of the first R = 10 PC covariance functions in time when a smaller regularization parameter is chosen (λ = 10).


Figure B.5. Energy maps of the estimated ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ obtained by applying algorithm 2, with λ = 10², to the covariance matrices computed from the MEG resting state data of a single subject on n = 80 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps.


Figure B.6. Plots of the segment-specific variances of the first R = 10 PC covariance functions in time, with λ = 10², when the MEG resting state data is split into n = 80 consecutive time intervals.


Figure B.7. Energy maps of the estimated ${\hat{\boldsymbol{\psi }}}_{1}$, ${\hat{\boldsymbol{\psi }}}_{2}$ and ${\hat{\boldsymbol{\psi }}}_{3}$ obtained by applying algorithm 2, with lower regularization (λ = 10), to the covariance matrices computed from the MEG resting state data of n = 40 different subjects. On the right panel, the covariance functions associated with these energy maps.


Figure B.8. Plots of the subject-specific variances associated with the first R = 10 PC covariance functions computed from n = 40 subjects, with regularization parameter λ = 10.


In figures B.7 and B.8 we show the estimated PC covariance functions and associated variances from the multi-subject functional connectivity study on n = 40 subjects with regularization parameter λ = 10.

Footnotes

  • More precisely, we have that ${\mathrm{lim}}_{R\to \infty }{\mathrm{sup}}_{v\in \mathcal{M}}\mathbb{E}{\left\{X\left(v\right)-\mu \left(v\right)-{\sum }_{r=1}^{R}{\zeta }_{r}{\psi }_{r}\left(v\right)\right\}}^{2}=0$, i.e. the series converges uniformly in mean-square.

  • Formally, these are weak derivatives, hence uniquely defined almost everywhere (i.e. up to a set of measure zero) and are always evaluated in an integral form (see Dziuk and Elliott 2013, for further details).
