J. Phys. A: Math. Theor. 41 No 21 (30 May 2008) 1-7
doi:10.1088/1751-8113/41/21/212002
PII: S1751-8113(08)75362-9
An interlacing theorem for reversible Markov chains
Robert Grone¹, Karl Heinz Hoffmann² and Peter Salamon¹
¹ Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182-7720, USA
² Institut für Physik, Technische Universität Chemnitz, D-09107 Chemnitz, Germany
Received 12 March 2008, in final form 18 April 2008
Published 7 May 2008
Abstract. Reversible Markov chains are an indispensable tool in the modeling of a vast class of physical, chemical, biological and statistical problems. Examples include the master equation descriptions of relaxing physical systems, stochastic optimization algorithms such as simulated annealing, chemical dynamics of protein folding and Markov chain Monte Carlo statistical estimation. Very often the large size of the state spaces requires the coarse graining or lumping of microstates into fewer mesoscopic states, and a question of utmost importance for the validity of the physical model is how the eigenvalues of the corresponding stochastic matrix change under this operation. In this paper we prove an interlacing theorem which gives explicit bounds on the eigenvalues of the lumped stochastic matrix.
PACS numbers: 02.50.Ga, 02.70.Tt, 02.10.Yn, 82.20.Wt
1. Introduction
Reversible Markov chains are chains that satisfy the detailed balance condition (see equation (3) below). Their importance and their introduction to the physics literature date back to the work of Ludwig Boltzmann (1887), who showed that this is the only class of Markov chains to be used for almost all physical processes at the molecular level. Later, through the work of Lars Onsager (1931), the condition also became known as the principle of microscopic reversibility [1, 2]. The importance of this class of Markov chains was further increased by the introduction of the Metropolis algorithm (1953), which is the basis for essentially all Monte Carlo simulations of physical systems [3, 4]. This algorithm makes all the 'kinetic factors' [5, 6] as large as possible consistent with detailed balance and a given sparsity matrix, thereby creating a designer Markov chain which has a desired stationary distribution and relaxes as quickly as possible. More recently (1984) this fact has been recognized to be much more generally useful and is the basis of Gibbs sampling and the Metropolis–Hastings algorithm exploited in the family of Markov chain Monte Carlo (MCMC) methods now in widespread use for many types of statistical simulations [7, 8].
Another manifestation of reversible Markov chains arises from random walks on a simple graph of $n$ vertices. Let $A$ denote the adjacency matrix of a graph $G$. By a random walk on $G$, we mean a sequence of steps between adjacent vertices where each adjacent vertex is equally likely for the next step. This describes a Markov chain with transition probability matrix $M = AD^{-1}$, where $D$ is diagonal with $d_{ii}$ being the degree of vertex $i$ for all $i = 1, \ldots, n$. As we state in proposition 2 below, reversible Markov chains have matrices which are diagonally similar to symmetric matrices and thus many facts about symmetric matrices apply. The random walk matrix, $AD^{-1}$, is similar to $D^{-1/2} A D^{-1/2}$, which is clearly symmetric.
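The random walk construction can be checked on a small example. The sketch below is only an illustration: the four-vertex graph is a hypothetical choice, and the stationary distribution proportional to the vertex degrees is a standard fact about random walks that the text does not state explicitly. Exact rational arithmetic is used so that the checks are equalities rather than approximations.

```python
from fractions import Fraction

# Hypothetical example graph (not from the paper):
# a triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)
deg = [sum(row) for row in A]  # vertex degrees, the diagonal of D

# Column stochastic random walk matrix M = A D^{-1}: M[i][j] = A[i][j] / deg[j]
M = [[Fraction(A[i][j], deg[j]) for j in range(n)] for i in range(n)]
column_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]

# Assumed stationary distribution: proportional to the degrees.
total = sum(deg)
pi = [Fraction(d, total) for d in deg]

# Stationarity: M pi = pi
stationary = all(
    sum(M[i][j] * pi[j] for j in range(n)) == pi[i] for i in range(n)
)

# Detailed balance: M[i][j] * pi[j] == M[j][i] * pi[i] for all i, j
detailed_balance = all(
    M[i][j] * pi[j] == M[j][i] * pi[i] for i in range(n) for j in range(n)
)
```

Both `stationary` and `detailed_balance` come out true here because $M_{ij}\pi_j$ equals $A_{ij}$ divided by the total degree, which is symmetric in $i$ and $j$.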
As a final example, we consider the case when the transition matrix is symmetric. This also corresponds to a Markov chain which is reversible as is immediate from condition (3) below. In this case the matrix is bistochastic (rows and columns sum to one) and the stationary distribution is uniform.
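Our main result is an interlacing theorem. Given real numbers $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_m$ with $m < n$, we say β's interlace λ's if
$$\lambda_i \ge \beta_i \ge \lambda_{n-m+i}, \qquad i = 1, \ldots, m. \qquad (1)$$
The choice of the nomenclature is clearest for the case $m = n - 1$, for which equation (1) assumes the form
$$\lambda_1 \ge \beta_1 \ge \lambda_2 \ge \beta_2 \ge \cdots \ge \beta_{n-1} \ge \lambda_n. \qquad (2)$$
The most well-known example of an interlacing theorem is the Cauchy interlacing theorem which states that the eigenvalues of an $(n-1) \times (n-1)$ principal submatrix of a symmetric $n \times n$ matrix $A$ interlace the eigenvalues of $A$ [9–11].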
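The interlacing condition is easy to test numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `interlaces`, the tolerance and the random test matrix are illustrative choices, not part of the paper. It checks the inequalities $\lambda_i \ge \beta_i \ge \lambda_{n-m+i}$ directly and applies them to principal submatrices of a random symmetric matrix.

```python
import numpy as np

def interlaces(beta, lam, tol=1e-10):
    """Check lam_i >= beta_i >= lam_{n-m+i} for i = 1..m (both sorted descending)."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    beta = np.sort(np.asarray(beta, dtype=float))[::-1]
    n, m = len(lam), len(beta)
    if m >= n:
        raise ValueError("need fewer beta's than lambda's")
    upper = lam[:m]       # lam_1, ..., lam_m
    lower = lam[n - m:]   # lam_{n-m+1}, ..., lam_n
    return bool(np.all(upper + tol >= beta) and np.all(beta + tol >= lower))

# Cauchy interlacing on a random symmetric matrix (illustrative example).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

lam = np.linalg.eigvalsh(A)
beta_one_removed = np.linalg.eigvalsh(A[1:, 1:])   # 4 x 4 principal submatrix
beta_two_removed = np.linalg.eigvalsh(A[:3, :3])   # 3 x 3 principal submatrix

cauchy_ok = interlaces(beta_one_removed, lam)
cauchy_general_ok = interlaces(beta_two_removed, lam)
```

The second check uses a smaller principal submatrix, for which the general form of the interlacing inequalities (with index shift $n - m$) applies.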
In this paper we consider a reversible Markov chain on n states with (irreducible) transition matrix M. We prove an interlacing theorem between the eigenvalues of M and those of the lumped Markov chain obtained by partitioning the states of M into disjoint subsets and considering transitions among these subsets [12–15]. Such lumped chains associated with a given Markov chain and a partition of its states have surfaced in a number of contexts recently under various names. Jerrum et al [16] refer to the lumped chain as the projection chain and combine Poincaré and log-Sobolev constants of the lumped chain with those of the restriction chains on each of the sets in the partition to find Poincaré and log-Sobolev constants of the unlumped chain. These constants characterize non-asymptotic rates of convergence associated with the degree of mixing. Lumped chains have also been called aggregate chains [17] in the context of nearly completely decomposable (NCD) chains. Haemers [11] considers a closely related concept called the quotient matrix of a symmetric matrix and a partition in the context of interlacing theorems for combinatorics on graphs. In the physical literature, lumped processes have a long history under the name of coarse graining: physical descriptions that forego many of the details of the real dynamics by lumping together collections of microstates into mesoscopic states. Such coarse graining is a crucial ingredient in the modern description of many physical processes [18]. In particular, it is an indispensable tool for the study of complex energy landscapes [19]. Our result shows that such coarse graining can only speed up relaxation on these landscapes.
Our interlacing theorem follows from the special property of reversible Markov chains mentioned above that their transition probability matrices are diagonally similar to a symmetric matrix. This fact enables us to map our problem onto the Cauchy interlacing theorem. Thus the resulting interlacing theorem for reversible Markov chains may be viewed as a corollary to the strengthened form of the Cauchy interlacing theorem found in [10, 11].
There is a large literature concerning bounds on the second largest modulus of an eigenvalue in a Markov chain [20]. This modulus is known as the coefficient of ergodicity of the chain and, for real eigenvalues ordered as $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$, is given by $\max(|\lambda_2|, |\lambda_n|)$. Closely related to this is the spectral gap $1 - \lambda_2$ of the stochastic matrix. The interest derives from the fact that such bounds characterize the asymptotic rate of convergence of the chain to its stationary distribution. There is also a large literature concerning bounds on non-asymptotic convergence measures characterizing various degrees of mixing in the chain [16, 21–23]. The statement and proof of the fact that the lumped chain's eigenvalues interlace the eigenvalues of the full chain are presented here for the first time.
2. Reversible Markov chains
Let $M$ be an $n \times n$ column stochastic matrix of a regular Markov chain, i.e. all entries are non-negative, all column sums are equal to one and there exists a positive integer $k$ such that $M^k$ has all positive entries. The Markov chain with transition matrix $M$ is said to be reversible [12, 13] iff there exists a non-zero vector $\pi = (\pi_1, \ldots, \pi_n)^T$ such that
$$M_{ij}\,\pi_j = M_{ji}\,\pi_i \qquad \text{for all } i, j. \qquad (3)$$
If such a vector exists, it follows by summing (3) over $j$ that $\pi$ must be the eigenvector of $M$ with eigenvalue 1. Thus without loss of generality, we may assume that $\pi$ has been properly normalized, i.e. that $\pi_i > 0$ for all $i$ and $\sum_{i=1}^{n} \pi_i = 1$. In other words, $\pi$ corresponds to the stationary distribution of the Markov chain corresponding to $M$. Note that the regularity condition assures that $\lambda_1 = 1$ is a simple eigenvalue of $M$. As discussed above, condition (3) is also called detailed balance or microscopic reversibility in the physical literature [18]. If the chain induced by $M$ is reversible, then we will also say that $M$ is reversible. The following two propositions are easily shown.
Proposition 1. Let $\Pi = \mathrm{diag}(\pi_1, \ldots, \pi_n)$. Then $M$ is reversible iff $\Pi^{-1/2} M \Pi^{1/2}$ is symmetric.
Proposition 2. If $M$ is reversible, then $M$ has all real eigenvalues, say
$$1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > -1.$$
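Propositions 1 and 2 can be illustrated numerically. The sketch below assumes NumPy and builds a reversible chain from a symmetric matrix of positive weights, normalizing each column; this construction and the helper names `chain_from_weights` and `symmetrize` are illustrative assumptions, not the paper's method. It then checks detailed balance, the symmetry of the diagonally rescaled matrix and the location of the eigenvalues.

```python
import numpy as np

def chain_from_weights(W):
    """Column-normalize a symmetric positive weight matrix (assumed construction).

    Returns a column stochastic M and its stationary distribution pi.
    """
    W = np.asarray(W, dtype=float)
    col = W.sum(axis=0)
    M = W / col            # broadcasting divides column j by col[j]
    pi = col / col.sum()
    return M, pi

def symmetrize(M, pi):
    """Diagonal similarity S[i, j] = pi_i^{-1/2} M[i, j] pi_j^{1/2}."""
    s = np.sqrt(pi)
    return M * s[None, :] / s[:, None]

rng = np.random.default_rng(1)
B = rng.random((5, 5))
W = B + B.T                # symmetric with positive entries
M, pi = chain_from_weights(W)

flow = M * pi[None, :]     # flow[i, j] = M[i, j] * pi[j]
S = symmetrize(M, pi)

# Eigenvalues of the symmetric matrix, ascending; the same as those of M.
eig_S = np.linalg.eigvalsh((S + S.T) / 2)
eig_M = np.sort(np.linalg.eigvals(M).real)
```

Since all entries of `M` are positive in this example, the chain is regular, so the largest eigenvalue is 1 and the smallest is strictly greater than -1.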
3. Lumping states
Our interest is in the lumped matrix corresponding to an arbitrary partition of the states of the chain. To keep the exposition as transparent as possible, we begin with a partition whose subsets include only one doubleton and the rest all singletons. By iteratively lumping two states at a time, we can reach any desired partition and thus our procedure is perfectly general besides being more transparent.
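Now consider lumping the two states in the doubleton (say states 1 and 2). If we start with the Markov chain with transition matrix $M$, the corresponding column stochastic matrix $\hat M$ of the lumped chain that results by lumping the first two states of $M$ into one state is obtained as follows:
(i) add row 1 of $M$ to row 2 of $M$, and then delete row 1 to form $M'$;
(ii) replace column 2 of $M'$ by $w_1 c_1 + w_2 c_2$, where $c_1$ and $c_2$ denote columns 1 and 2 of $M'$ and $w_i = \pi_i/(\pi_1 + \pi_2)$, and then delete column 1 from $M'$.
It is clear that $\hat M$ is $(n-1) \times (n-1)$ and column stochastic. In matrix terms
$$\hat M = C M D,$$
where $C$ is the $(n-1) \times n$ collecting matrix
$$C = \begin{pmatrix} 1 & 1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & I_{n-2} \end{pmatrix}$$
and $D$ is the $n \times (n-1)$ distributing matrix
$$D = \begin{pmatrix} w_1 & \mathbf{0} \\ w_2 & \mathbf{0} \\ \mathbf{0} & I_{n-2} \end{pmatrix},$$
where $w_1 = \pi_1/(\pi_1 + \pi_2)$ and $w_2 = \pi_2/(\pi_1 + \pi_2)$.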
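Proposition 3. $\hat M$ is reversible with stationary distribution $\hat\pi = (\pi_1 + \pi_2, \pi_3, \ldots, \pi_n)^T$.
Proof. It is clear that for $i, j \ge 2$ condition (3) is satisfied for $\hat M$ and $\hat\pi$. Let $i = 1$ and $j \ge 2$, and note that state $j$ of the lumped chain is state $j + 1$ of the original chain. Then from the reversibility of $M$ and the definition of $\hat M$
$$\hat M_{1j}\,\hat\pi_j = (M_{1,j+1} + M_{2,j+1})\,\pi_{j+1} = M_{j+1,1}\,\pi_1 + M_{j+1,2}\,\pi_2 = \frac{\pi_1 M_{j+1,1} + \pi_2 M_{j+1,2}}{\pi_1 + \pi_2}\,(\pi_1 + \pi_2) = \hat M_{j1}\,\hat\pi_1,$$
which is condition (3) for $\hat M$ and $\hat\pi$. ∎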
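The two descriptions of the lumped matrix (row and column operations versus collecting and distributing matrices) and the reversibility of the lumped chain can be compared on a small example. The sketch below is an illustration under stated assumptions: the weight matrix `W` is a hypothetical example, the weights $\pi_1/(\pi_1+\pi_2)$ and $\pi_2/(\pi_1+\pi_2)$ are the ones implied by the lumped stationary distribution, and all function names are made up for this example. Exact fractions are used so the comparisons are exact.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lump_by_steps(M, pi):
    """Lump states 1 and 2 (indices 0 and 1) by row and column operations."""
    # (i) add row 1 to row 2, then delete row 1
    M1 = [[a + b for a, b in zip(M[0], M[1])]] + [row[:] for row in M[2:]]
    # (ii) the pi-weighted combination of columns 1 and 2 replaces column 2,
    #      and column 1 is deleted
    w1 = pi[0] / (pi[0] + pi[1])
    w2 = pi[1] / (pi[0] + pi[1])
    return [[w1 * row[0] + w2 * row[1]] + row[2:] for row in M1]

def lump_by_matrices(M, pi):
    """Lump states 1 and 2 as the product C M D."""
    n = len(M)
    w1 = pi[0] / (pi[0] + pi[1])
    w2 = pi[1] / (pi[0] + pi[1])
    C = [[Fraction(0)] * n for _ in range(n - 1)]       # collecting matrix
    C[0][0] = C[0][1] = Fraction(1)
    for r in range(1, n - 1):
        C[r][r + 1] = Fraction(1)
    D = [[Fraction(0)] * (n - 1) for _ in range(n)]     # distributing matrix
    D[0][0], D[1][0] = w1, w2
    for r in range(2, n):
        D[r][r - 1] = Fraction(1)
    return matmul(matmul(C, M), D)

# Hypothetical symmetric weights; column normalization gives a reversible chain.
W = [[2, 1, 1, 0],
     [1, 1, 2, 1],
     [1, 2, 0, 3],
     [0, 1, 3, 2]]
n = len(W)
col = [sum(W[i][j] for i in range(n)) for j in range(n)]
M = [[Fraction(W[i][j], col[j]) for j in range(n)] for i in range(n)]
pi = [Fraction(c, sum(col)) for c in col]

M_hat_steps = lump_by_steps(M, pi)
M_hat_matrices = lump_by_matrices(M, pi)
pi_hat = [pi[0] + pi[1]] + pi[2:]

m = n - 1
hat_column_sums = [sum(M_hat_steps[i][j] for i in range(m)) for j in range(m)]
hat_detailed_balance = all(
    M_hat_steps[i][j] * pi_hat[j] == M_hat_steps[j][i] * pi_hat[i]
    for i in range(m) for j in range(m)
)
```

Both constructions give the same $3 \times 3$ matrix, its columns sum to one, and detailed balance holds with respect to `pi_hat`, in line with Proposition 3.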
4. The interlacing theorem
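Let $M$ be an $n \times n$ column stochastic matrix corresponding to a reversible Markov chain, and let $\hat M$ be formed as previously described. Then $\hat M$ corresponds to a reversible Markov chain with stationary distribution $\hat\pi$. It follows that $\hat M$ has all real eigenvalues, say
$$1 = \hat\lambda_1 > \hat\lambda_2 \ge \cdots \ge \hat\lambda_{n-1} > -1.$$
Our main result is the following.
Theorem. The eigenvalues of $\hat M$ interlace those of $M$:
$$\lambda_1 \ge \hat\lambda_1 \ge \lambda_2 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_{n-1} \ge \lambda_n.$$
One implication of our theorem is that $\hat M^k$ converges to its limit at least as fast as $M^k$, since convergence is determined by the rates at which $\rho^k$ and $\hat\rho^k$ converge to zero, where $\rho = \max(|\lambda_2|, |\lambda_n|)$ and $\hat\rho = \max(|\hat\lambda_2|, |\hat\lambda_{n-1}|) \le \rho$.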
Proof. Our proof relies on a variant of the Cauchy interlacing theorem which states that if $A$ is an $n \times n$ symmetric matrix and $Q$ is an $n \times m$ matrix with orthonormal columns, then the eigenvalues of $Q^T A Q$ interlace the eigenvalues of $A$. This variant follows by completing $Q$ to an orthogonal matrix $U = (Q \;\; Q')$ and noting that $Q^T A Q$ is a principal submatrix of $U^T A U$.
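Recall that $S = \Pi^{-1/2} M \Pi^{1/2}$ is symmetric as is $\hat S = \hat\Pi^{-1/2} \hat M \hat\Pi^{1/2}$, where we have set $\Pi = \mathrm{diag}(\pi_1, \ldots, \pi_n)$ and $\hat\Pi = \mathrm{diag}(\pi_1 + \pi_2, \pi_3, \ldots, \pi_n)$. Inserting the definition of $\hat M = CMD$, we have
$$\hat S = \hat\Pi^{-1/2}\, C M D\, \hat\Pi^{1/2} = \left(\hat\Pi^{-1/2} C\, \Pi^{1/2}\right) S \left(\Pi^{-1/2} D\, \hat\Pi^{1/2}\right).$$
Defining $P = \hat\Pi^{-1/2} C\, \Pi^{1/2}$, we calculate directly that
$$P = \begin{pmatrix} \sqrt{\pi_1/(\pi_1+\pi_2)} & \sqrt{\pi_2/(\pi_1+\pi_2)} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & I_{n-2} \end{pmatrix}.$$
Similarly, setting $Q = \Pi^{-1/2} D\, \hat\Pi^{1/2}$, we get
$$Q = \begin{pmatrix} \sqrt{\pi_1/(\pi_1+\pi_2)} & \mathbf{0} \\ \sqrt{\pi_2/(\pi_1+\pi_2)} & \mathbf{0} \\ \mathbf{0} & I_{n-2} \end{pmatrix}.$$
We note that $P = Q^T$ and $Q^T Q = I_{n-1}$. Thus the variant of the Cauchy interlacing theorem stated above applies to $\hat S = Q^T S Q$, and since $\hat S$ and $S$ are similar to $\hat M$ and $M$, respectively, the eigenvalues of $\hat M$ interlace those of $M$. Since $M$ and $\hat M$ are both stochastic matrices, the largest eigenvalues are both equal to 1. ∎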
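The steps of the proof can be traced numerically. The following sketch assumes NumPy and a randomly generated reversible chain (column-normalized symmetric weights, an illustrative construction rather than anything prescribed by the paper). It lumps the first two states, checks that the rescaled distributing matrix has orthonormal columns and reproduces the symmetrized lumped matrix, and then compares the two spectra.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
B = rng.random((n, n))
W = B + B.T                       # assumed example: symmetric positive weights
col = W.sum(axis=0)
M = W / col                       # column stochastic: M[i, j] = W[i, j] / col[j]
pi = col / col.sum()              # stationary distribution of M

# Collecting and distributing matrices for lumping states 1 and 2.
w1 = pi[0] / (pi[0] + pi[1])
w2 = pi[1] / (pi[0] + pi[1])
C = np.zeros((n - 1, n))
C[0, 0] = C[0, 1] = 1.0
C[1:, 2:] = np.eye(n - 2)
D = np.zeros((n, n - 1))
D[0, 0], D[1, 0] = w1, w2
D[2:, 1:] = np.eye(n - 2)

M_hat = C @ M @ D
pi_hat = C @ pi                   # (pi_1 + pi_2, pi_3, ..., pi_n)

# Symmetrized matrices and the matrix Q with orthonormal columns.
S = np.diag(pi ** -0.5) @ M @ np.diag(pi ** 0.5)
S_hat = np.diag(pi_hat ** -0.5) @ M_hat @ np.diag(pi_hat ** 0.5)
Q = np.diag(pi ** -0.5) @ D @ np.diag(pi_hat ** 0.5)

# Eigenvalues in descending order.
lam = np.linalg.eigvalsh((S + S.T) / 2)[::-1]
lam_hat = np.linalg.eigvalsh((S_hat + S_hat.T) / 2)[::-1]

tol = 1e-10
interlaced = bool(np.all(lam[:-1] + tol >= lam_hat)
                  and np.all(lam_hat + tol >= lam[1:]))

# Second largest eigenvalue moduli of the full and lumped chains.
rho = max(abs(lam[1]), abs(lam[-1]))
rho_hat = max(abs(lam_hat[1]), abs(lam_hat[-1]))
```

In this example `interlaced` is expected to be true and `rho_hat` should not exceed `rho`, which is the sense in which the lumped chain relaxes at least as fast as the full chain.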
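Corollary. When we lump a reversible Markov chain to a partition with $m$ subsets, the eigenvalues $\hat\lambda_1 \ge \cdots \ge \hat\lambda_m$ of the lumped chain interlace the eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ of the original chain according to
$$\lambda_i \ge \hat\lambda_i \ge \lambda_{n-m+i}, \qquad i = 1, \ldots, m.$$
The proof follows from the fact that lumping several states at once gives the same chain as iterated lumping.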
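The corollary and the equivalence of one-shot and iterated lumping can be checked on an example. The sketch below assumes NumPy; the general collecting and distributing matrices for an arbitrary partition (entries 1 and $\pi_j/\sum_{k \in I}\pi_k$ for state $j$ in subset $I$) are the natural generalization of the two-state case and are an assumption here, as are the helper name `lump` and the chosen partition.

```python
import numpy as np

def lump(M, pi, blocks):
    """Lump a chain over a partition given as a list of index lists (hypothetical helper)."""
    n, m = len(pi), len(blocks)
    C = np.zeros((m, n))              # collecting matrix
    D = np.zeros((n, m))              # distributing matrix
    for I, block in enumerate(blocks):
        idx = list(block)
        C[I, idx] = 1.0
        D[idx, I] = pi[idx] / pi[idx].sum()
    return C @ M @ D, C @ pi

rng = np.random.default_rng(3)
n = 6
B = rng.random((n, n))
W = B + B.T                           # assumed example: symmetric positive weights
col = W.sum(axis=0)
M = W / col                           # column stochastic reversible chain
pi = col / col.sum()

# Lump to the partition {0,1,2}, {3,4}, {5} in one step ...
M_once, pi_once = lump(M, pi, [[0, 1, 2], [3, 4], [5]])

# ... and by three successive pairwise lumpings.
M1, pi1 = lump(M, pi, [[0, 1], [2], [3], [4], [5]])      # merge states 0 and 1
M2, pi2 = lump(M1, pi1, [[0, 1], [2], [3], [4]])         # merge {0,1} with 2
M_iter, pi_iter = lump(M2, pi2, [[0], [1, 2], [3]])      # merge 3 and 4

# Eigenvalues in descending order (real for reversible chains).
lam = np.sort(np.linalg.eigvals(M).real)[::-1]
lam_hat = np.sort(np.linalg.eigvals(M_once).real)[::-1]

m = len(lam_hat)
tol = 1e-9
interlaced = bool(np.all(lam[:m] + tol >= lam_hat)
                  and np.all(lam_hat + tol >= lam[n - m:]))
```

Here the index shift is $n - m = 3$, so each lumped eigenvalue is bracketed by $\lambda_i$ from above and $\lambda_{i+3}$ from below.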
5. Conclusions
The main result of this work is an interlacing theorem for reversible Markov chains that result from lumping states of a bigger chain. One important implication of our theorem is that coarse graining can only increase the rate of relaxation toward equilibrium. More generally, our theorem provides a solid tool for the analysis of a large class of Markov models in physics and other areas. It gives strict bounds on the relaxation dynamics for a multitude of processes which can be used as coarse grained or lumped approximations of the underlying process. The realm of irreversible decay toward an equilibrium or stationary state rests on the use of Markov chain descriptions. Given the enormous number of particles per cubic centimeter in typical systems and the corresponding number of degrees of freedom, it is clear that coarse grained descriptions which reduce the dimensionality of the problem are necessary. Our theorem provides direct bounds on the eigenvalues of the lumped problem, and thus provides a rigorous proof of limits for the underlying dynamics. Note that our theorem bounds eigenvalues of the lumped dynamics from above and below. Depending on the structure of the original system, these bounds could be quite restrictive, thus giving a small range of possible time scales for the lumped dynamics.
For lumping many states, the relevant eigenvalues might change considerably. On the other hand, for cases where symmetries in the system lead to the dynamical equivalence of states, the eigenvalues might change very little. In some disordered physical systems Markov chain techniques have revealed certain 'dynamical degeneracies' of states [24]. The behavior of such systems under lumping is a highly interesting yet largely unexplored field.
For physical understanding of the processes studied using the Markov chain, one typically wants to choose the subsets in the partition so as to disrupt the relaxation spectrum as little as possible. How this can be done generally is an open problem; how to carry this out for concrete examples is usually based on physical reasoning. For rapid convergence of MCMC algorithms on the other hand, the aim is the opposite—we want to find a version of the algorithm with fastest convergence. For non-asymptotic convergence, some progress along these lines has been realized [16, 23].