Folding kinetics of proteins with multiple domains: inference of transition rates from extinction times

Yingxiang Zhou; Pak-Wing Fok

doi:10.1088/2399-6528/aacca5

1. Introduction

The prediction of the native conformation of a protein, given its amino acid sequence, is one of the great open problems in biophysics [1–5]. Since the 1990s [6, 7], techniques such as Atomic Force Microscopy (AFM) [8] and Förster Resonance Energy Transfer (FRET) [9, 10] have allowed experimentalists to explore the relationship between macromolecular structure and folding/unfolding rates.

A computational approach to protein folding is to implement large-scale all-atom molecular dynamics (MD) simulations [11]. The folding of a macromolecule is represented by tracking the positions of every single atom in the molecule and the result of the simulation is a trajectory in a high-dimensional state-space. A more conceptual way to understand protein folding is to introduce a reaction coordinate [12] which effectively maps the high-dimensional space onto a single scalar that measures the progress of the folding. The progress along the reaction coordinate is modulated by a free energy 'landscape' [13] that may exhibit multiple minima, corresponding to multiple metastable configurations. Inferring the shape of these landscapes from quantities such as extinction times [14], rupture forces [15] or time-displacement trajectories [16] remains a challenging theoretical problem.

One of the most common ways of probing the energy landscape is through AFM [17]. In AFM experiments, one end of the molecule is tethered to a movable platform and the other is attached to a cantilever tip: see figure 1(a). Small deflections of the cantilever are detected using a laser-photodiode setup. The AFM can operate in several ways. One protocol is 'force-ramp' mode where the platform lowers at a constant speed. As a protein domain is coercively unfolded, the cantilever deflection increases until a critical platform position is reached. Beyond this point, the cantilever quickly relaxes, corresponding to domain rupture. The resulting force-extension curve allows quantification of the 'entropic elasticity' of that particular domain. The procedure can also be performed sequentially if multiple domains are present in a large protein [18, 19]. The discrete event corresponding to rupture is interesting both physically and mathematically. Experiments show that rupture does not always occur at the same force. Furthermore, the rupture force distribution shifts towards higher values when the platform speed is larger. Both of these observations are in stark contrast to mechanical bonds that break at a single yield stress. They point towards a model of bond-breaking that is based on 'thermally activated escape,' i.e. a theory of random walks.

**Figure 1.** (a) Experimental schematic for Atomic Force Microscopy of proteins. Deflection of a soft cantilever is detected using a laser-photodiode setup. (b) Possible time trace of reaction coordinate for a two-state protein. (c) Possible trace of reaction coordinate for a three-state protein. Extinction times τ₁ and τ₂ along with maximal sites 2 and 1 respectively are used in this paper for inference. a.u. = arbitrary units.

Besides force-ramp mode, another protocol is to keep the AFM platform stationary and operate in 'force-clamp' mode. Under this mode of operation, one focuses on deflections of the cantilever which essentially provide the reaction coordinate as a function of time. Some possible time traces are shown in figures 1(b) and (c). The protein spends most of its time in metastable configurations (when the reaction coordinate is an integer) and very little time in-between these states. Figure 1(b) shows the trace for a simple protein with a single folding domain that can be in one of two states: folded or unfolded. The kinetics in this case are well-described by two exponential distributions [20]; one for the $1\to 0$ transition and the other for the $0\to 1$ transition. The half-lives or rate constants associated with each exponential distribution can easily be inferred from the time trace in figure 1(b). For proteins with multiple domains the traces can be more complex and could resemble figure 1(c). If one assumes exponential kinetics as before (i.e. the transition between states always follows an exponential distribution, but the parameters of the distribution could be state-dependent) the resulting stochastic process is a birth-death chain. These chains correspond to energy landscapes with multiple metastable states: see figure 2.

**Figure 2.** (a) Two-state and (b) Four-state Markov models for protein folding dynamics. Small proteins are usually described by a two-state model. Larger proteins may have multiple metastable states giving rise to longer birth-death chains.

An extinction time is the time taken for the reaction coordinate to start at 1 and reach 0 for the first time, corresponding to the state where all domains are folded. In figure 1(c), the measurement of τ_j starts when the reaction coordinate reaches 1 for the first time. For each excursion, one can also define the maximal site n_j that is reached before extinction occurs. After suitable processing of the signal, a single time trace generates many pairs (τ_j, n_j), j = 1, ..., M and M ≫ 1. In this paper, we show how to recover all the transition rates of the birth-death chain from {τ_j} and {n_j}. How to detect jumps in the reaction coordinate, that define when transitions between states have occurred, is a separate issue and beyond the scope of this paper.
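As an illustration of the pairs (τ_j, n_j) described above, the following minimal sketch extracts them from an already-discretized trace. It assumes that jump detection has been done elsewhere and that the trace is supplied as two parallel lists; the function name `extract_excursions` and this input layout are hypothetical choices made for the example, not part of the paper.

```python
def extract_excursions(times, states):
    """Return a list of (tau_j, n_j) pairs from a piecewise-constant trace.

    Assumed input layout (hypothetical): times[i] is the instant at which
    the reaction coordinate jumps into the integer state states[i].
    An excursion starts when the trace enters state 1 from state 0 and
    ends at the next entry into state 0.
    """
    pairs = []
    start = None     # time at which the current excursion began
    max_site = 0     # largest state visited during the current excursion
    prev = None      # state occupied before the current jump
    for t, s in zip(times, states):
        if start is None:
            # Only start timing on an observed 0 -> 1 transition, so an
            # excursion already in progress at the start is discarded.
            if s == 1 and prev == 0:
                start, max_site = t, 1
        elif s == 0:
            pairs.append((t - start, max_site))   # extinction reached
            start = None
        else:
            max_site = max(max_site, s)
        prev = s
    # An excursion still in progress at the end of the trace is discarded.
    return pairs
```

For a trace resembling figure 1(c), the first excursion visits state 2 and the second only state 1, so the function returns two pairs with maximal sites 2 and 1.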

1.1. Energy landscape inference: existing methods

In this section, we briefly describe some of the existing methods used to interpret AFM data. One of the earliest methods is Bell's method [21] which assumes a force-dependent rate ansatz (of Arrhenius form) for the state of a domain or chemical bond. Providing that the force on the protein depends on the instantaneous deflection in the cantilever, Bell's method predicts rupture force distributions and survival probabilities for a chemical bond under the action of a force ramp. The method is used to solve the 'forward' problem in the sense that rupture distributions are predicted from a potential well whose shape is known. In contrast, Dudko and co-authors [22] and Freund [23] essentially solved the inverse problem by inferring rate constants, features of the potential well and other related parameters from rupture force distributions.

Chang and co-authors [16] utilized a path integral method that takes trajectory data (time traces) as input rather than force distributions. Based on non-parametric Bayesian estimation, the method makes very few assumptions about the underlying energy landscape and is able to simultaneously infer the energy landscape and an effective spatially-dependent diffusivity for the reaction coordinate.

Most of the above methods are concerned mainly with the AFM operating in force-ramp mode. However, in works such as [14, 24], the authors develop methodology to extract energy landscapes from data generated by AFMs in force-clamp mode. They treat the reaction coordinate using a Smoluchowski framework to infer features of the energy landscape; this type of analysis dates back to Kramers's classic transition state theory [25] for chemical reactions. Finally, it is worth mentioning that if the reaction coordinate is treated as a Brownian random walker on an energy landscape, estimating the parameters of the resulting stochastic process from sample paths is a classic problem in statistics and control theory [26, 27].

The method described in this paper is different because from the outset, we assume that the underlying stochastic process is a birth-death chain (and consequently, transitions are always exponentially distributed). Estimation of parameters from sample paths of a birth-death chain is a classic problem [28, 29]. However, our goal is to estimate transition rates 'mainly' from extinction times. Unfortunately, as discussed below, using only extinction times results in a severely ill-posed problem. Having access to maximal site data turns out to render the inference problem much better-posed.

The inclusion of maximal sites is important. The calculation of transition rates in a birth-death chain purely from extinction times essentially reduces to finding the best-fit coefficients and exponents to a given extinction time distribution. Such problems are highly ill-posed: a small amount of noise added to the curve can lead to a large change in the best-fit coefficients/exponents. Nevertheless, because fitting exponential modes to given data is one of the most commonly-arising inverse problems, it has a long history and has been investigated by many researchers: see for example [30–33] and references within.

2. Governing equations for the birth death process

We now present a method to compute the transition rates of a birth death process from its extinction times. Our stochastic model takes the form of a random walker on a finite integer lattice: $\{0,1,\,\ldots ,\,N\}$ . The birth and death rates at site n are λ_n and μ_n respectively. We assume the particle starts at site n = 1 and the process terminates (goes extinct) when it reaches site 0. For each trajectory, we record the extinction time and the maximal site reached (i.e. the largest site number it attains before reaching site 0). From this data, we wish to infer the $2N-1$ rates ${\lambda }_{1},\,\ldots ,\,{\lambda }_{N-1}$ and μ₁, ..., μ_N.

Consider a birth-death process on a lattice with sites labeled $\{0,1,2,\,\ldots ,\,N\}$ , where N is finite but unknown: see figure 3. A particle starts at site 1 and executes a random walk which we write as X(t): X(t) is a random walk on the non-negative integers. At site i, the rightward (leftward) hopping rate is λ_i (μ_i). When the particle reaches site 0, we record the time of exit. When this experiment is repeated many times, we may use the resulting data to compute the extinction time distribution W(t).

**Figure 3.** Birth death chain with N + 1 sites, with ${\lambda }_{N}=0$ . Our algorithm uses the extinction times of the process given that the positions of particles never exceed site n (always remain in the dashed box).
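The random walk described above can be sketched with a standard Gillespie-style simulation. This is a minimal illustration and not the authors' code: the function name `simulate_excursion`, the list-based storage of the rates, and the use of Python's `random.Random` are assumptions made for the example.

```python
import random


def simulate_excursion(lam, mu, rng):
    """Simulate one walk from site 1 until it first reaches site 0.

    lam[i - 1] and mu[i - 1] are the birth and death rates at site i
    (i = 1..N), with lam[N - 1] == 0 so the walker stays in {0, ..., N}.
    Returns (extinction_time, maximal_site).
    """
    site, t, max_site = 1, 0.0, 1
    while site > 0:
        birth, death = lam[site - 1], mu[site - 1]
        total = birth + death
        t += rng.expovariate(total)           # exponential holding time
        if rng.random() * total < birth:      # hop right w.p. birth/total
            site += 1
            max_site = max(max_site, site)
        else:                                 # otherwise hop left
            site -= 1
    return t, max_site
```

Calling `simulate_excursion(lam, mu, random.Random(seed))` repeatedly yields synthetic pairs of extinction time and maximal site on which an inference procedure can be tested against known rates.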

The matrix that determines this process is defined as

$\begin{eqnarray}{A}^{(N)}=\left[\begin{array}{cccccc}-({\lambda }_{1}+{\mu }_{1}) & {\mu }_{2} & & & & \\ {\lambda }_{1} & -({\lambda }_{2}+{\mu }_{2}) & {\mu }_{3} & & & \\ & {\lambda }_{2} & -({\lambda }_{3}+{\mu }_{3}) & {\mu }_{4} & & \\ & & \ddots & \ddots & \ddots & \\ & & & {\lambda }_{N-2} & -({\lambda }_{N-1}+{\mu }_{N-1}) & {\mu }_{N}\\ & & & & {\lambda }_{N-1} & -({\lambda }_{N}+{\mu }_{N})\end{array}\right]\end{eqnarray} \tag{ 2.1 }$

and λ_N = 0. If ${P}_{k}(t)={\mathbb{P}}[X(t)=k]$ , then the vector ${\bf{P}}={[{P}_{1},\ldots ,{P}_{N}]}^{T}$ satisfies the forward master equation

$\begin{eqnarray}&&\dot{{\bf{P}}}={A}^{(N)}{\bf{P}},\quad {\bf{P}}(0)={[1,0,...,0]}^{T},\end{eqnarray} \tag{ 2.2 }$

and it is a simple matter to find W(t) from P₁(t) (see section 2.1).

If N is known, finding all the transition rates ${\{{\lambda }_{i},{\mu }_{i}\}}_{1\leqslant i\leqslant N}$ from W(t) amounts to an exponential fitting problem, which is very ill-conditioned [34]. To overcome this difficulty, we assume that for every trajectory X_j(t) we also record the maximal site of the particle, n_j. The maximal site for the random walker is simply the largest site number that it attains before exiting. By grouping trajectories according to their maximal site and computing the statistics of the extinction times of each group, we are able to accurately infer transition rates for birth death chains of length 8–11 from about 5 × 10⁷ extinction times with relative error of a few percent.

To be more precise, if X_j(t) is the jth trajectory, then τ_j is the jth extinction time defined as $\inf \{\tau :{X}_{j}(\tau )=0\}$ , ${n}_{j}={\max }_{0\leqslant t\leqslant {\tau }_{j}}{X}_{j}(t)$ is the maximal site of the jth trajectory and ${S}_{n}=\{{\tau }_{j}:{n}_{j}\leqslant n\}$ is the set of extinction times corresponding to trajectories whose maximal site does not exceed n. Then for a finite sample of trajectories, we have ${S}_{1}\subseteq {S}_{2}\subseteq \ldots \subseteq {S}_{N}$ . The algorithm that we propose takes as input $| {S}_{n}|$ and ${\bar{S}}_{n}$ (cardinality and mean of S_n) to infer λ_n and μ_n for each n.
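The grouping into the nested sets S_n can be sketched as follows. The sketch assumes that the left-exit probability for the sublattice {1, ..., n} is estimated by the fraction |S_n| divided by the total number of trajectories, and the conditional mean by the sample mean of S_n; the function name `conditional_statistics` and the tuple layout are hypothetical.

```python
def conditional_statistics(pairs):
    """Estimate (Pi_n, M_n) for n = 1..N from (tau_j, n_j) pairs.

    Pi_n is estimated as |S_n| / (number of trajectories) and M_n as the
    sample mean of S_n, where S_n = {tau_j : n_j <= n}.
    Returns a list whose entry n - 1 is the tuple (Pi_n, M_n).
    """
    total = len(pairs)
    n_max = max(n for _, n in pairs)
    count = [0] * (n_max + 1)    # trajectories whose maximal site is n
    tsum = [0.0] * (n_max + 1)   # summed extinction times of that group
    for tau, n in pairs:
        count[n] += 1
        tsum[n] += tau
    stats = []
    running_count, running_sum = 0, 0.0
    for n in range(1, n_max + 1):
        # S_n is nested, so accumulate the groups with maximal site <= n.
        running_count += count[n]
        running_sum += tsum[n]
        mean = running_sum / running_count if running_count else float("nan")
        stats.append((running_count / total, mean))
    return stats
```

Because the sets are nested, one pass over the data followed by a cumulative sum is enough; no trajectory needs to be revisited for each n.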

Random walkers that exit from the dashed box in figure 3 can exit at site 0 or n + 1. We informally call the times that correspond to exit at site 0 (n + 1) 'left' ('right') extinction times. Then, the distribution of extinction times for the birth-death process, conditioned on trajectories not exceeding site n, is identical to the left extinction time distribution out of the sublattice {1, ..., n}. By finding analytical expressions for the moments of this distribution and matching them to the observed moments, we may infer the transition rates on the lattice. This forms the basis of our method.

2.1. Extinction times and probability fluxes

We assume that sites 0 and n + 1 are absorbing in the sense that if the particle reaches site 0 or n + 1 ('exits'), it stays at these sites for all time. We use the superscript n to distinguish the subproblem from the entire chain. Define the probability that the random walker is at site k at time t as

$\begin{eqnarray}&&{P}_{k}^{(n)}(t)={\mathbb{P}}[X(t)=k].\end{eqnarray} \tag{ 2.3 }$

If we take the n × n leading principal submatrix of ${A}^{(N)}$ , then it follows that the conditional probabilities ${{\bf{P}}}^{(n)}={[{P}_{1}^{(n)},{P}_{2}^{(n)},\ldots ,{P}_{n}^{(n)}]}^{T}$ satisfy the forward master equations

$\begin{eqnarray}&&{\dot{{\bf{P}}}}^{(n)}={A}^{(n)}{{\bf{P}}}^{(n)},\quad {{\bf{P}}}^{(n)}(0)={{\bf{e}}}_{1}^{(n)}=\mathop{\underbrace{{[1,0,...,0]}^{T}}}\limits_{n\ \mathrm{elements}},\end{eqnarray} \tag{ 2.4 }$

for 1 ≤ n ≤ N. Equation (2.4) is the starting point for the reconstruction process.

Now we introduce the two random variables

${E}^{(n)}\in \{0,n+1\}$ , a binary random variable which represents the exit site of the random walker:
$\begin{eqnarray}&&{\mathbb{P}}[{E}^{(n)}=0]={{\rm{\Pi }}}^{(n)},\end{eqnarray} \tag{ 2.5 }$

$\begin{eqnarray}&&{\mathbb{P}}[{E}^{(n)}=n+1]={{\rm{\Pi }}}_{* }^{(n)},\end{eqnarray} \tag{ 2.6 }$
and ${{\rm{\Pi }}}^{(n)}+{{\rm{\Pi }}}_{* }^{(n)}=1$ .
T⁽ⁿ⁾, the extinction time of the random walker, defined to be the time at which the walker arrives either at site 0 or n + 1 for the first time. Conditioning on E⁽ⁿ⁾, let the density of extinction times T⁽ⁿ⁾ be ${w}_{L}^{(n)}(t)$ and ${w}_{R}^{(n)}(t)$ :
$\begin{eqnarray}&&P(t\leqslant {T}^{(n)}\leqslant t+{dt}| {E}^{(n)}=0)={w}_{L}^{(n)}(t){dt},\end{eqnarray} \tag{ 2.7 }$

$\begin{eqnarray}&&P(t\leqslant {T}^{(n)}\leqslant t+{dt}| {E}^{(n)}=n+1)={w}_{R}^{(n)}(t){dt},\end{eqnarray} \tag{ 2.8 }$
and we let ${W}_{L}^{(n)}(t)$ and ${W}_{R}^{(n)}(t)$ be the corresponding CDFs.

By equation (2.5), ${{\rm{\Pi }}}^{(n)}$ is strictly increasing with n. Now we show how extinction times are related to probability fluxes. The probability of the particle being at site 0 at time t + dt is given by

$\begin{eqnarray}&&{P}_{0}^{(n)}(t+{dt})={P}_{1}^{(n)}(t){\mu }_{1}{dt}+{P}_{0}^{(n)}(t)\times 1.\end{eqnarray} \tag{ 2.9 }$

Note that the probability of hopping left in time dt from site 1 is μ₁dt and the probability of staying at site 0 in time dt is 1 since site 0 is absorbing. Equation (2.9) implies that

$\begin{eqnarray}&&\displaystyle \frac{{{dP}}_{0}^{(n)}(t)}{{dt}}={\mu }_{1}{P}_{1}^{(n)}(t).\end{eqnarray} \tag{ 2.10 }$

Lemma 1. The flux out of site 1, ${\mu }_{1}{P}_{1}^{(n)}(t)$ , and the left extinction time density ${w}_{L}^{(n)}(t)$ are related through

$\begin{eqnarray}&&{\mu }_{1}{P}_{1}^{(n)}(t)={w}_{L}^{(n)}(t){{\rm{\Pi }}}^{(n)},\end{eqnarray} \tag{ 2.11 }$

where μ₁ is the death rate from site 1 and Π⁽ⁿ⁾ is defined in equation (2.5).

Proof. If the random walker is at site 0 at time t, it must have arrived there either at t or before. Then

$\begin{eqnarray}&&{P}_{0}^{(n)}(t)={\mathbb{P}}[{T}^{(n)}\leqslant t,{E}^{(n)}=0],\end{eqnarray} \tag{ 2.12 }$

$\begin{eqnarray}&&\Rightarrow \,{P}_{0}^{(n)}(t)={\mathbb{P}}[{T}^{(n)}\leqslant t| {E}^{(n)}=0]\,{\mathbb{P}}[{E}^{(n)}=0],\end{eqnarray} \tag{ 2.13 }$

$\begin{eqnarray}&&\Rightarrow \,{P}_{0}^{(n)}(t)={W}_{L}^{(n)}(t){{\rm{\Pi }}}^{(n)},\end{eqnarray} \tag{ 2.14 }$

$\begin{eqnarray}&&\Rightarrow \,\displaystyle \frac{d}{{dt}}{P}_{0}^{(n)}(t)={w}_{L}^{(n)}(t){{\rm{\Pi }}}^{(n)},\end{eqnarray} \tag{ 2.15 }$

$\begin{eqnarray}&&\Rightarrow \,{\mu }_{1}{P}_{1}^{(n)}(t)={w}_{L}^{(n)}(t){{\rm{\Pi }}}^{(n)},\end{eqnarray} \tag{ 2.16 }$

using equation (2.10). □

3. Algorithm for reconstructing transition rates

Our algorithm for transition rate reconstruction requires the following as input: for each n, the fraction of random walks that exit and whose maximal site does not exceed n; and the mean extinction time for these conditional random walks. For each n ≥ 1, Π⁽ⁿ⁾ and ${\mathbb{E}}[{T}^{(n)}| {E}^{(n)}=0]\equiv {M}^{(n)}$ yield {λ_n, μ_n}: see figure 4.

**Figure 4.** Flow chart of the algorithm presented in this paper. ${{\rm{\Pi }}}^{(n)}$ is the probability of left exit and ${M}^{(n)}$ is the mean of the extinction times, all conditioned on the particles remaining in the domain $\{1,\,\ldots ,\,n\}$ before exiting. At each step n, the birth and death rates at site n are recovered.


3.1. Inference of μ₁ and λ₁

In the first step, we recover the birth and death rates at site 1. Note that ${{\bf{P}}}^{(1)}(t)$ only contains a single element, and the forward master equation can be written as

$\begin{eqnarray}&&{\dot{P}}^{(1)}={A}^{(1)}{P}^{(1)},\end{eqnarray} \tag{ 3.1 }$

with

$\begin{eqnarray}&&{A}^{(1)}=-({\lambda }_{1}+{\mu }_{1})\qquad \ \mathrm{and}\ \qquad {P}^{(1)}(0)=1.\end{eqnarray} \tag{ 3.2 }$

This simple ODE has solution

$\begin{eqnarray}&&{P}^{(1)}(t)={e}^{-({\lambda }_{1}+{\mu }_{1})t}.\end{eqnarray} \tag{ 3.3 }$

Suppose we only consider left extinction times, with n = 1, generated by all trajectories that directly arrive at site 0 from site 1. These extinction times are exponentially distributed with parameter ${\lambda }_{1}+{\mu }_{1}$ :

$\begin{eqnarray}&&{W}_{L}^{(1)}(t)={\mathbb{P}}[{T}^{(1)}\leqslant t| {E}^{(1)}=0]=1-{e}^{-({\lambda }_{1}+{\mu }_{1})t}.\end{eqnarray} \tag{ 3.4 }$

It follows from the property of exponential distributions that

$\begin{eqnarray}&&{\lambda }_{1}+{\mu }_{1}=\displaystyle \frac{1}{{\mathbb{E}}[{T}^{(1)}| {E}^{(1)}=0]}=\displaystyle \frac{1}{{M}^{(1)}}.\end{eqnarray} \tag{ 3.5 }$

In the next step, we use (2.11) to get

$\begin{eqnarray}&&{\mu }_{1}{P}_{1}^{(1)}(t)={w}_{L}^{(1)}(t){{\rm{\Pi }}}^{(1)}\end{eqnarray} \tag{ 3.6 }$

$\begin{eqnarray}&&\Rightarrow \,{\mu }_{1}=\displaystyle \frac{{{\rm{\Pi }}}^{(1)}}{{\int }_{0}^{\infty }{P}_{1}^{(1)}(t^{\prime} ){dt}^{\prime} }\end{eqnarray} \tag{ 3.7 }$

$\begin{eqnarray}&&\Rightarrow \,{\mu }_{1}={{\rm{\Pi }}}^{(1)}({\lambda }_{1}+{\mu }_{1})=\displaystyle \frac{{{\rm{\Pi }}}^{(1)}}{{M}^{(1)}}\end{eqnarray} \tag{ 3.8 }$

$\begin{eqnarray}&&\Rightarrow \,{\lambda }_{1}=\displaystyle \frac{1-{{\rm{\Pi }}}^{(1)}}{{M}^{(1)}}\end{eqnarray} \tag{ 3.9 }$

We have now obtained estimates of μ₁ and λ₁.
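Equations (3.8) and (3.9) translate directly into code. The sketch below assumes that Π⁽¹⁾ and M⁽¹⁾ have already been estimated from data; the function name `infer_site_one` is a hypothetical choice.

```python
def infer_site_one(pi1, m1):
    """Return (lambda_1, mu_1) from Pi^(1) and M^(1) via (3.8)-(3.9)."""
    mu1 = pi1 / m1            # equation (3.8)
    lam1 = (1.0 - pi1) / m1   # equation (3.9)
    return lam1, mu1
```

As a consistency check, rates λ₁ = 3 and μ₁ = 1 give Π⁽¹⁾ = μ₁/(λ₁ + μ₁) = 0.25 and M⁽¹⁾ = 1/(λ₁ + μ₁) = 0.25, and the function recovers (3, 1) from these two numbers.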

3.2. Inference of μ₂ and λ₂

The forward master equations for ${{\bf{P}}}^{(2)}(t)$ are

$\begin{eqnarray}&&{\dot{{\bf{P}}}}^{(2)}={A}^{(2)}{{\bf{P}}}^{(2)}\end{eqnarray} \tag{ 3.10 }$

where

$\begin{eqnarray}{A}^{(2)}=\left[\begin{array}{cc}-({\lambda }_{1}+{\mu }_{1}) & {\mu }_{2}\\ {\lambda }_{1} & -({\lambda }_{2}+{\mu }_{2})\end{array}\right].\end{eqnarray} \tag{ 3.11 }$

By (2.11), we have that

$\begin{eqnarray}&&{\mu }_{1}{P}_{1}^{(2)}(t)={w}_{L}^{(2)}(t){{\rm{\Pi }}}^{(2)},\end{eqnarray} \tag{ 3.12 }$

$\begin{eqnarray}&&\Rightarrow \,{\mu }_{1}{\int }_{0}^{\infty }{P}_{1}^{(2)}(t^{\prime} ){dt}^{\prime} ={{\rm{\Pi }}}^{(2)},\end{eqnarray} \tag{ 3.13 }$

$\begin{eqnarray}&&\Rightarrow \,{w}_{L}^{(2)}(t)=\displaystyle \frac{{P}_{1}^{(2)}(t)}{{\int }_{0}^{\infty }{P}_{1}^{(2)}(t^{\prime} ){dt}^{\prime} }.\end{eqnarray} \tag{ 3.14 }$

Now introduce the Laplace transform ${ \mathcal L }\{P(t)\}=\tilde{P}(s)$ . Then the Laplace transformed equation for ${P}_{1}^{(2)}(t)$ satisfies

$\begin{eqnarray}&&[{s}^{2}+{\xi }_{2}^{(2)}s+{\xi }_{1}^{(2)}]{\tilde{P}}_{1}^{(2)}(s)=s+{\eta }_{2}^{(2)},\end{eqnarray} \tag{ 3.15 }$

where ${\xi }_{2}^{(2)}={\lambda }_{1}+{\mu }_{1}+{\lambda }_{2}+{\mu }_{2}$ , ${\xi }_{1}^{(2)}={\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1}$ and ${\eta }_{2}^{(2)}={\lambda }_{2}+{\mu }_{2}$ . Taking derivatives with respect to s, we have

$\begin{eqnarray}&&[2s+{\xi }_{2}^{(2)}]{\tilde{P}}_{1}^{(2)}(s)+[{s}^{2}+{\xi }_{2}^{(2)}s+{\xi }_{1}^{(2)}]\displaystyle \frac{d{\tilde{P}}_{1}^{(2)}(s)}{{ds}}=1,\end{eqnarray} \tag{ 3.16 }$

and when s = 0,

$\begin{eqnarray}&&({\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1}){\tilde{P}}_{1}^{(2)}(0)-({\lambda }_{2}+{\mu }_{2})=0,\end{eqnarray} \tag{ 3.17 }$

$\begin{eqnarray}&&({\lambda }_{1}+{\mu }_{1}+{\lambda }_{2}+{\mu }_{2}){\tilde{P}}_{1}^{(2)}(0)+({\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1}){\left.\displaystyle \frac{d{\tilde{P}}_{1}^{(2)}(s)}{{ds}}\right|}_{s=0}=1.\end{eqnarray} \tag{ 3.18 }$

Equations (3.17) and (3.18) can be rewritten as

$\begin{eqnarray}&&\left[\displaystyle \frac{{\lambda }_{1}+{\mu }_{1}}{{\mu }_{1}}{{\rm{\Pi }}}^{(2)}-1\right]{\lambda }_{2}+[{{\rm{\Pi }}}^{(2)}-1]{\mu }_{2}=0,\end{eqnarray} \tag{ 3.19 }$

$\begin{eqnarray}&&[1-{M}^{(2)}({\lambda }_{1}+{\mu }_{1})]{\lambda }_{2}+[1-{M}^{(2)}{\mu }_{1}]{\mu }_{2}=\displaystyle \frac{{\mu }_{1}}{{{\rm{\Pi }}}^{(2)}}-{\lambda }_{1}-{\mu }_{1},\end{eqnarray} \tag{ 3.20 }$

a linear system for λ₂, μ₂ where

$\begin{eqnarray}&&{{\rm{\Pi }}}^{(2)}={\mu }_{1}{\tilde{P}}_{1}^{(2)}(0)\quad \mathrm{and}\quad {M}^{(2)}=-\displaystyle \frac{{\mu }_{1}}{{{\rm{\Pi }}}^{(2)}}{\left.\displaystyle \frac{d{\tilde{P}}_{1}^{(2)}(s)}{{ds}}\right|}_{s=0}.\end{eqnarray} \tag{ 3.21 }$

Assuming λ₁ and μ₁ are known from the n = 1 case, solving equations (3.19) and (3.20) allows us to compute λ₂ and μ₂ from the conditional moments Π⁽²⁾ and M⁽²⁾.
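The 2 × 2 linear system (3.19)–(3.20) can be solved in closed form by Cramer's rule. The following is a minimal sketch under the assumption that λ₁, μ₁, Π⁽²⁾ and M⁽²⁾ are available as floating-point numbers; the function name `infer_site_two` is hypothetical, and no safeguard against a near-singular system is included.

```python
def infer_site_two(lam1, mu1, pi2, m2):
    """Solve equations (3.19)-(3.20) for (lambda_2, mu_2)."""
    # Coefficients of (3.19): a11 * lam2 + a12 * mu2 = 0
    a11 = (lam1 + mu1) / mu1 * pi2 - 1.0
    a12 = pi2 - 1.0
    # Coefficients of (3.20): a21 * lam2 + a22 * mu2 = b2
    a21 = 1.0 - m2 * (lam1 + mu1)
    a22 = 1.0 - m2 * mu1
    b2 = mu1 / pi2 - lam1 - mu1
    det = a11 * a22 - a12 * a21
    # Cramer's rule with right-hand side (0, b2)
    lam2 = -a12 * b2 / det
    mu2 = a11 * b2 / det
    return lam2, mu2
```

For example, with rates (λ₁, μ₁, λ₂, μ₂) = (1, 2, 3, 4), equations (3.17), (3.18) and (3.21) give Π⁽²⁾ = 14/17 and M⁽²⁾ = 53/119, and feeding these values back into the function returns λ₂ = 3 and μ₂ = 4.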

3.3. Inference of μ_n and λ_n for n ≥ 3

Now we consider the n-th site after computing the birth and death rates for the first n − 1 sites. The Laplace transformed ODEs of ${\tilde{{\bf{P}}}}^{(n)}(s)={[{\tilde{P}}_{1}^{(n)}(s),\ldots ,{\tilde{P}}_{n}^{(n)}(s)]}^{T}$ can be represented in the following matrix form:

$\begin{eqnarray}&&({{sI}}_{n}-{A}^{(n)}){\tilde{{\bf{P}}}}^{(n)}(s)={{\bf{e}}}_{1}^{(n)}\end{eqnarray} \tag{ 3.22 }$

where ${{\bf{e}}}_{1}^{(n)}={[1,0,\ldots ,0]}^{T}$ has n elements and I_n is the identity matrix of size n × n. Whenever s is not an eigenvalue of ${A}^{(n)}$ , we have that ${\tilde{{\bf{P}}}}^{(n)}(s)={({{sI}}_{n}-{A}^{(n)})}^{-1}{{\bf{e}}}_{1}^{(n)}$ . Let the characteristic polynomial of ${A}^{(n)}$ be

$\begin{eqnarray}&&q(s;{\lambda }_{1},{\mu }_{1})={s}^{n}+{\xi }_{n}^{(n)}{s}^{n-1}\,+\cdots +\,{\xi }_{2}^{(n)}s+{\xi }_{1}^{(n)}=\det ({{sI}}_{n}-{A}^{(n)}),\end{eqnarray} \tag{ 3.23 }$

where the $\{{\xi }_{i}^{(n)}\}$ are the coefficients of the characteristic polynomial of ${A}^{(n)}$ , and they can be written as functions of birth and death rates:

$\begin{eqnarray}&&{\xi }_{i}^{(n)}={\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n};{\mu }_{1},\,\ldots ,\,{\mu }_{n}).\end{eqnarray} \tag{ 3.24 }$

In addition, define

$\begin{eqnarray}&&{\eta }_{i}^{(n)}={\xi }_{i}^{(n)}({\lambda }_{1}=0,{\lambda }_{2},\,\ldots ,\,{\lambda }_{n};{\mu }_{1}=0,{\mu }_{2},\,\ldots ,\,{\mu }_{n}).\end{eqnarray} \tag{ 3.25 }$

Finally, we note that the coefficient of sⁿ in (3.23) is 1 and, for notational convenience, define

$\begin{eqnarray}&&{\xi }_{n+1}^{(n)}=1;\quad {\xi }_{n+k}^{(n)}=0\ \mathrm{for}\ k\geqslant 2\end{eqnarray} \tag{ 3.26 }$

$\begin{eqnarray}&&{\eta }_{n+1}^{(n)}=1;\quad {\eta }_{n+k}^{(n)}=0\ \mathrm{for}\ k\geqslant 2\end{eqnarray} \tag{ 3.27 }$

Lemma 2. The Laplace-transformed probability ${\tilde{P}}_{1}^{(n)}(s)$ , the first element in ${\tilde{{\bf{P}}}}^{(n)}(s)$ from equation (3.22), is a rational function in s satisfying

$\begin{eqnarray}&&[{s}^{n}+{\xi }_{n}^{(n)}{s}^{n-1}\,+\cdots +\,{\xi }_{3}^{(n)}{s}^{2}+{\xi }_{2}^{(n)}s+{\xi }_{1}^{(n)}]{\tilde{P}}_{1}^{(n)}(s)={s}^{n-1}+{\eta }_{n}^{(n)}{s}^{n-2}\,+\cdots +\,{\eta }_{3}^{(n)}s+{\eta }_{2}^{(n)}.\end{eqnarray} \tag{ 3.28 }$

for some constants $\{{\xi }_{i}^{(n)}\}{}_{i=1}^{n}$ and $\{{\eta }_{i}^{(n)}\}{}_{i=2}^{n}$ .

Proof. Let ${\hat{A}}^{(k-1)}$ be the (k − 1) × (k − 1) submatrix of ${A}^{(k)}$ with the first row and first column removed, so that

$\begin{eqnarray}{\hat{A}}^{(k-1)}=\left[\begin{array}{ccccc}-({\lambda }_{2}+{\mu }_{2}) & {\mu }_{3} & & & \\ {\lambda }_{2} & -({\lambda }_{3}+{\mu }_{3}) & {\mu }_{4} & & \\ & \ddots & \ddots & \ddots & \\ & & {\lambda }_{k-2} & -({\lambda }_{k-1}+{\mu }_{k-1}) & {\mu }_{k}\\ & & & {\lambda }_{k-1} & -({\lambda }_{k}+{\mu }_{k})\end{array}\right].\end{eqnarray} \tag{ 3.29 }$

By Cramer's rule and the definition of the determinant in terms of its cofactor expansion, the first element of the solution vector ${\tilde{{\bf{P}}}}^{(n)}(s)$ can be calculated as

$\begin{eqnarray}{\tilde{P}}_{1}^{(n)}(s)=\displaystyle \frac{\det \left[\begin{array}{cc}1 & -{\mu }_{2}{[{{\bf{e}}}_{1}^{(n-1)}]}^{T}\\ {\bf{0}} & {{sI}}_{n-1}-{\hat{A}}^{(n-1)}\end{array}\right]}{\det ({{sI}}_{n}-{A}^{(n)})}=\displaystyle \frac{\det ({{sI}}_{n-1}-{\hat{A}}^{(n-1)})}{\det ({{sI}}_{n}-{A}^{(n)})}.\end{eqnarray} \tag{ 3.30 }$

Now introduce the polynomial

$\begin{eqnarray}&&\det ({{sI}}_{n-1}-{\hat{A}}^{(n-1)})={s}^{n-1}+{c}_{n-1}{s}^{n-2}+{c}_{n-2}{s}^{n-3}\,+\ldots +\,{c}_{2}s+{c}_{1},\end{eqnarray} \tag{ 3.31 }$

for some coefficients ${c}_{n-1},{c}_{n-2},\,\ldots ,\,{c}_{1}$ . Then it is clear that

$\begin{eqnarray}&&q(s;0,0)={s}^{n}+{\eta }_{n}^{(n)}{s}^{n-1}\,+\cdots +\,{\eta }_{3}^{(n)}{s}^{2}+{\eta }_{2}^{(n)}s+{\eta }_{1}^{(n)},\end{eqnarray} \tag{ 3.32 }$

$\begin{eqnarray}=\,\det \left[\begin{array}{cc}s & -{\mu }_{2}{[{{\bf{e}}}_{1}^{(n-1)}]}^{T}\\ {\bf{0}} & {{sI}}_{n-1}-{\hat{A}}^{(n-1)}\end{array}\right]=s\det ({{sI}}_{n-1}-{\hat{A}}^{(n-1)}),\end{eqnarray} \tag{ 3.33 }$

from the definition in equation (3.25) and the fact that the (1, 1) and (2, 1) entries of A⁽ⁿ⁾ are zero when λ₁ = μ₁ = 0. Because $q(s=0;{\lambda }_{1}=0,{\mu }_{1}=0)=0$ , it follows that ${\eta }_{1}^{(n)}=0$ . Equations (3.32) and (3.33) imply that

$\begin{eqnarray}&&{s}^{n}+{\eta }_{n}^{(n)}{s}^{n-1}\,+\cdots +\,{\eta }_{3}^{(n)}{s}^{2}+{\eta }_{2}^{(n)}s+\mathop{\underbrace{{\eta }_{1}^{(n)}}}\limits_{=0}={s}^{n}+{c}_{n-1}{s}^{n-1}+{c}_{n-2}{s}^{n-2}\,+\ldots +\,{c}_{2}{s}^{2}+{c}_{1}s.\end{eqnarray} \tag{ 3.34 }$

Comparing coefficients, we find that ${\eta }_{n}^{(n)}={c}_{n-1}$ , ${\eta }_{n-1}^{(n)}={c}_{n-2},\,\ldots ,\,{\eta }_{2}^{(n)}={c}_{1}$ , so that equation (3.31) becomes

$\begin{eqnarray}&&\det ({{sI}}_{n-1}-{\hat{A}}^{(n-1)})={s}^{n-1}+{\eta }_{n}^{(n)}{s}^{n-2}+{\eta }_{n-1}^{(n)}{s}^{n-3}\,+\ldots +\,{\eta }_{3}^{(n)}s+{\eta }_{2}^{(n)},\end{eqnarray} \tag{ 3.35 }$

$\begin{eqnarray}&&\Rightarrow \,{\tilde{P}}_{1}^{(n)}(s)=\displaystyle \frac{{s}^{n-1}+{\eta }_{n}^{(n)}{s}^{n-2}\,+\cdots +\,{\eta }_{3}^{(n)}s+{\eta }_{2}^{(n)}}{{s}^{n}+{\xi }_{n}^{(n)}{s}^{n-1}\,+\cdots +\,{\xi }_{2}^{(n)}s+{\xi }_{1}^{(n)}}.\end{eqnarray} \tag{ 3.36 }$

□

Since the expression $\tfrac{{{\rm{\Pi }}}^{(n)}}{{\mu }_{1}}$ appears frequently in later analysis, we define the notation

$\begin{eqnarray}&&{r}^{(n)}=\displaystyle \frac{{{\rm{\Pi }}}^{(n)}}{{\mu }_{1}}\end{eqnarray} \tag{ 3.37 }$

and use it in the analysis below.

Lemma 3. [Moment and Exit Probability Relations] Let ${M}^{(n)}\equiv {\mathbb{E}}[{T}^{(n)}| {E}^{(n)}=0]$ , then

$\begin{eqnarray}&&{{\rm{\Pi }}}^{(n)}={\mu }_{1}{\tilde{P}}_{1}^{(n)}(0),\end{eqnarray} \tag{ 3.38 }$

$\begin{eqnarray}&&{M}^{(n)}=-\displaystyle \frac{1}{{r}^{(n)}}{\left.\displaystyle \frac{d{\tilde{P}}_{1}^{(n)}(s)}{{ds}}\right|}_{s=0}.\end{eqnarray} \tag{ 3.39 }$

Proof. The Laplace transform of ${P}_{1}^{(n)}(t)$ is ${\tilde{P}}_{1}^{(n)}(s)={\int }_{0}^{\infty }{e}^{-{st}}{P}_{1}^{(n)}(t){dt}$ . Integrating both sides of (2.11), we have that

$\begin{eqnarray*}&&{{\rm{\Pi }}}^{(n)}={\mu }_{1}{\int }_{0}^{\infty }{P}_{1}^{(n)}(t^{\prime} ){dt}^{\prime} \Longrightarrow {w}_{L}^{(n)}(t)=\displaystyle \frac{{P}_{1}^{(n)}(t)}{{\int }_{0}^{\infty }{P}_{1}^{(n)}(t^{\prime} ){dt}^{\prime} }.\end{eqnarray*}$

Therefore, equation (3.38) holds:

$\begin{eqnarray*}&&{\tilde{P}}_{1}^{(n)}(0)={\int }_{0}^{\infty }{P}_{1}^{(n)}(t){dt}={r}^{(n)}.\end{eqnarray*}$

If we differentiate the Laplace transform ${\tilde{P}}_{1}^{(n)}(s)$ , then (3.39) is validated by

$\begin{eqnarray*}\displaystyle \frac{d{\tilde{P}}_{1}^{(n)}(s)}{{ds}} & = & -{\displaystyle \int }_{0}^{\infty }{{te}}^{-{st}}{P}_{1}^{(n)}(t){dt}\\ \Rightarrow \,{\left.\displaystyle \frac{d{\tilde{P}}_{1}^{(n)}(s)}{{ds}}\right|}_{s=0} & = & -{\displaystyle \int }_{0}^{\infty }{{tP}}_{1}^{(n)}(t){dt}\\ & = & -{r}^{(n)}{\displaystyle \int }_{0}^{\infty }{{tw}}_{L}^{(n)}(t){dt}=-{r}^{(n)}{M}^{(n)},\end{eqnarray*}$

using equation (2.11). □
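Lemma 3 also gives a way to compute Π⁽ⁿ⁾ and M⁽ⁿ⁾ exactly when the rates are known, which is useful for testing an inference code. From (3.22), the value of P̃⁽ⁿ⁾ at s = 0 solves (−A⁽ⁿ⁾)x = e₁ and minus its s-derivative at s = 0 solves (−A⁽ⁿ⁾)y = x, so that Π⁽ⁿ⁾ = μ₁x₁ and M⁽ⁿ⁾ = y₁/x₁. The sketch below uses a plain tridiagonal (Thomas) elimination without pivoting; the function names and the choice of solver are assumptions made for this example, not the authors' implementation.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm: sub[i] is entry (i+1, i), sup[i] is entry (i, i+1)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    if n > 1:
        c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = sup[i] / denom
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def conditional_moments(lam, mu):
    """Return (Pi_n, M_n) for the sublattice {1..n}, where n = len(lam).

    lam[i - 1] and mu[i - 1] are the rates at site i; sites 0 and n + 1
    are absorbing, so lam[n - 1] is kept on the diagonal as in A^(n).
    """
    n = len(lam)
    # Entries of B = -A^(n)
    diag = [lam[i] + mu[i] for i in range(n)]
    sub = [-lam[i] for i in range(n - 1)]      # B[i+1][i] = -lambda_{i+1}
    sup = [-mu[i + 1] for i in range(n - 1)]   # B[i][i+1] = -mu_{i+2}
    e1 = [1.0] + [0.0] * (n - 1)
    x = solve_tridiagonal(sub, diag, sup, e1)  # x = P~(0)
    y = solve_tridiagonal(sub, diag, sup, x)   # y = -dP~/ds at s = 0
    return mu[0] * x[0], y[0] / x[0]
```

For n = 1 this reduces to Π⁽¹⁾ = μ₁/(λ₁ + μ₁) and M⁽¹⁾ = 1/(λ₁ + μ₁), in agreement with section 3.1.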

By setting s = 0 in (3.28) and its derivative equation, we find two constraints involving the exit probability and the first moment ${M}^{(n)}$ of the conditional extinction times using (3.38) and (3.39):

$\begin{eqnarray}&&{\xi }_{1}^{(n)}{r}^{(n)}={\eta }_{2}^{(n)},\end{eqnarray} \tag{ 3.40 }$

$\begin{eqnarray}&&({\xi }_{2}^{(n)}-{\xi }_{1}^{(n)}{M}^{(n)}){r}^{(n)}={\eta }_{3}^{(n)}.\end{eqnarray} \tag{ 3.41 }$

Lemma 4. (Recurrence Relations.) The coefficients ${\xi }_{1}^{(n)}$ , ${\xi }_{2}^{(n)}$ and ${\eta }_{2}^{(n)}$ , ${\eta }_{3}^{(n)}$ are linear in ${\lambda }_{n}$ and ${\mu }_{n}$ , satisfying

$\begin{eqnarray}&&{\xi }_{1}^{(n)}=({\lambda }_{n}+{\mu }_{n}){\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)},\end{eqnarray} \tag{ 3.42 }$

$\begin{eqnarray}&&{\xi }_{2}^{(n)}={\xi }_{1}^{(n-1)}+({\lambda }_{n}+{\mu }_{n}){\xi }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{2}^{(n-2)}.\end{eqnarray} \tag{ 3.43 }$

and

$\begin{eqnarray}&&{\eta }_{2}^{(n)}=({\lambda }_{n}+{\mu }_{n}){\eta }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{2}^{(n-2)},\end{eqnarray} \tag{ 3.44 }$

$\begin{eqnarray}&&{\eta }_{3}^{(n)}={\eta }_{2}^{(n-1)}+({\lambda }_{n}+{\mu }_{n}){\eta }_{3}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{3}^{(n-2)}.\end{eqnarray} \tag{ 3.45 }$

Proof. From the definition of A⁽ⁿ⁾ and a cofactor expansion, we have

$\begin{eqnarray*}\det ({{sI}}_{n}-{A}^{(n)}) & = & \det \left[\begin{array}{ccccc}s+{\lambda }_{1}+{\mu }_{1} & -{\mu }_{2} & & & \\ -{\lambda }_{1} & s+{\lambda }_{2}+{\mu }_{2} & -{\mu }_{3} & & \\ & \ddots & \ddots & \ddots & \\ & & -{\lambda }_{n-2} & s+{\lambda }_{n-1}+{\mu }_{n-1} & -{\mu }_{n}\\ & & & -{\lambda }_{n-1} & s+{\lambda }_{n}+{\mu }_{n}\end{array}\right],\\ & = & (s+{\lambda }_{n}+{\mu }_{n})\det ({{sI}}_{n-1}-{A}^{(n-1)})-{\mu }_{n}{\lambda }_{n-1}\det ({{sI}}_{n-2}-{A}^{(n-2)}),\\ \Rightarrow \,\displaystyle \sum _{k=1}^{n+1}{\xi }_{k}^{(n)}{s}^{k-1} & = & (s+{\lambda }_{n}+{\mu }_{n})({s}^{n-1}+{\xi }_{n-1}^{(n-1)}{s}^{n-2}\,+\ldots +\,{\xi }_{2}^{(n-1)}s+{\xi }_{1}^{(n-1)})\\ & & -{\mu }_{n}{\lambda }_{n-1}({s}^{n-2}+{\xi }_{n-2}^{(n-2)}{s}^{n-3}\,+\ldots +\,{\xi }_{2}^{(n-2)}s+{\xi }_{1}^{(n-2)}).\end{eqnarray*}$

By equating coefficients at O(1) and O(s), we can establish a recurrence relation between the coefficients of the characteristic polynomial for the n-site subproblem and the n − 1 and n − 2 site subproblems:

$\begin{eqnarray}&&{\xi }_{1}^{(n)}=({\lambda }_{n}+{\mu }_{n}){\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)},\end{eqnarray} \tag{ 3.46 }$

$\begin{eqnarray}&&{\xi }_{2}^{(n)}={\xi }_{1}^{(n-1)}+({\lambda }_{n}+{\mu }_{n}){\xi }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{2}^{(n-2)}.\end{eqnarray} \tag{ 3.47 }$

It is clear from this recurrence that ${\xi }_{1}^{(n)}$ and ${\xi }_{2}^{(n)}$ are linear in the transition rates λ_n and μ_n since ${\xi }_{1}^{(n-1)}$ only depends on ${\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1}$ , ${\mu }_{1},\,\ldots ,\,{\mu }_{n-1}$ and ${\xi }_{1}^{(n-2)}$ only depends on ${\lambda }_{1},\,\ldots ,\,{\lambda }_{n-2}$ , ${\mu }_{1},\,\ldots ,\,{\mu }_{n-2}$ . A similar argument applied to $\det ({{sI}}_{n-1}-{\hat{A}}^{(n-1)})$ shows that $\{{\eta }_{2}^{(n)}\}$ and $\{{\eta }_{3}^{(n)}\}$ are linear in λ_n and μ_n also:

$\begin{eqnarray}&&{\eta }_{2}^{(n)}=({\lambda }_{n}+{\mu }_{n}){\eta }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{2}^{(n-2)},\end{eqnarray} \tag{ 3.48 }$

$\begin{eqnarray}&&{\eta }_{3}^{(n)}={\eta }_{2}^{(n-1)}+({\lambda }_{n}+{\mu }_{n}){\eta }_{3}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{3}^{(n-2)}.\end{eqnarray} \tag{ 3.49 }$

□
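
The recurrences of lemma 4 can be checked numerically against determinants computed directly. The sketch below is only an illustration, not part of the original method: it uses exact rational arithmetic, takes ${\hat{A}}^{(n-1)}$ to be ${A}^{(n)}$ with its first row and column removed (an assumption that reproduces ${\eta }_{2}^{(2)}={\lambda }_{2}+{\mu }_{2}$ in (3.62)), and seeds the recurrence with the values (1, 0, 0, 0) for an empty chain, a convention assumed here so that k = 2 reproduces the two-site expressions. All function names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination (entries are Fractions)."""
    m = [row[:] for row in m]
    n, value = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            value = -value  # a row swap flips the sign
        value *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return value


def minus_A(lam, mu, first, last):
    """The tridiagonal matrix -A restricted to sites first..last (1-based)."""
    sites = list(range(first, last + 1))
    B = [[Fraction(0)] * len(sites) for _ in sites]
    for a, i in enumerate(sites):
        B[a][a] = lam[i] + mu[i]
        if i < last:
            B[a][a + 1] = -mu[i + 1]  # superdiagonal of sI - A
            B[a + 1][a] = -lam[i]     # subdiagonal of sI - A
    return B


def low_coeffs(B):
    """Coefficients of s^0 and s^1 in det(sI + B), via principal minors."""
    m = len(B)
    minors = [det([[B[r][c] for c in range(m) if c != i]
                   for r in range(m) if r != i]) for i in range(m)]
    return det(B), sum(minors, Fraction(0))


def feature_vectors(lam, mu, n):
    """(xi_1, xi_2, eta_2, eta_3) for sites 0..n from (3.42)-(3.45)."""
    u = [(Fraction(1), Fraction(0), Fraction(0), Fraction(0)),  # assumed seed
         (lam[1] + mu[1], Fraction(1), Fraction(1), Fraction(0))]
    for k in range(2, n + 1):
        s, p = lam[k] + mu[k], mu[k] * lam[k - 1]
        a, b = u[k - 1], u[k - 2]
        u.append((s * a[0] - p * b[0], a[0] + s * a[1] - p * b[1],
                  s * a[2] - p * b[2], a[2] + s * a[3] - p * b[3]))
    return u


def recurrence_matches_determinants(lam, mu, n):
    """Compare the recurrence with direct determinants at sites 1..n."""
    u = feature_vectors(lam, mu, n)
    return all(
        u[k] == low_coeffs(minus_A(lam, mu, 1, k)) + low_coeffs(minus_A(lam, mu, 2, k))
        for k in range(1, n + 1))


# Arbitrary illustrative rates; index 0 is unused so that lam[k] is the rate at site k.
lam = [None] + [Fraction(x) for x in ("2", "1/2", "3", "1")]
mu = [None] + [Fraction(x) for x in ("1", "4", "1/3", "5")]
print(recurrence_matches_determinants(lam, mu, 4))
```

Because the arithmetic is exact, agreement here is an identity check rather than a floating-point comparison; the rates used are arbitrary test values.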

To compute (λ_n, μ_n) for the n-site subproblem, we rewrite equations (3.40), (3.41) and (3.42)–(3.45) in matrix form:

$\begin{eqnarray}&&{V}_{1}^{(n)}\left[\begin{array}{c}{\xi }_{1}^{(n)}\\ {\xi }_{2}^{(n)}\\ {\eta }_{2}^{(n)}\\ {\eta }_{3}^{(n)}\end{array}\right]=\left[\begin{array}{c}0\\ 0\end{array}\right],\qquad {V}_{2}^{(n)}\left[\begin{array}{c}{\sigma }_{n}\\ {\mu }_{n}\end{array}\right]=\left[\begin{array}{c}{\xi }_{1}^{(n)}\\ {\xi }_{2}^{(n)}\\ {\eta }_{2}^{(n)}\\ {\eta }_{3}^{(n)}\end{array}\right]-\left[\begin{array}{c}0\\ {\xi }_{1}^{(n-1)}\\ 0\\ {\eta }_{2}^{(n-1)}\end{array}\right],\end{eqnarray} \tag{ 3.50 }$

where

$\begin{eqnarray}{V}_{1}^{(n)}=\left[\begin{array}{cccc}{r}^{(n)} & 0 & -1 & 0\\ -{M}^{(n)}{r}^{(n)} & {r}^{(n)} & 0 & -1\end{array}\right],\qquad {V}_{2}^{(n)}=\left[\begin{array}{cc}{\xi }_{1}^{(n-1)} & -{\lambda }_{n-1}{\xi }_{1}^{(n-2)}\\ {\xi }_{2}^{(n-1)} & -{\lambda }_{n-1}{\xi }_{2}^{(n-2)}\\ {\eta }_{2}^{(n-1)} & -{\lambda }_{n-1}{\eta }_{2}^{(n-2)}\\ {\eta }_{3}^{(n-1)} & -{\lambda }_{n-1}{\eta }_{3}^{(n-2)}\end{array}\right],\end{eqnarray} \tag{ 3.51 }$

and σ_n = λ_n + μ_n. Eliminating the vector ${\left[\begin{array}{cccc}{\xi }_{1}^{(n)} & {\xi }_{2}^{(n)} & {\eta }_{2}^{(n)} & {\eta }_{3}^{(n)}\end{array}\right]}^{T}$ , we find that

$\begin{eqnarray}&&{V}_{1}^{(n)}{V}_{2}^{(n)}\left[\begin{array}{c}{\sigma }_{n}\\ {\mu }_{n}\end{array}\right]=\left[\begin{array}{c}0\\ {\eta }_{2}^{(n-1)}-{r}^{(n)}{\xi }_{1}^{(n-1)}\end{array}\right].\end{eqnarray} \tag{ 3.52 }$

If the matrix ${V}_{1}^{(n)}{V}_{2}^{(n)}$ is invertible, σ_n and μ_n are uniquely determined, and so λ_n and μ_n are uniquely determined.

Theorem 1. Given exact data $\{{{\rm{\Pi }}}^{(n)},{M}^{(n)}\},n=1,2,\,\ldots ,\,N$ generated by some underlying birth-death process, the rates $({\lambda }_{n},{\mu }_{n}),n=1,\,\ldots ,\,N$ are uniquely determined. In particular, the matrix

$\begin{eqnarray}{F}^{(2)}=\left[\begin{array}{cc}({\lambda }_{1}+{\mu }_{1}){r}^{(2)}-1 & {{\rm{\Pi }}}^{(2)}-1\\ 1-{M}^{(2)}({\lambda }_{1}+{\mu }_{1}) & 1-{M}^{(2)}{\mu }_{1}\end{array}\right]\end{eqnarray} \tag{ 3.53 }$

from equations (3.19) and (3.20) is invertible and the 2 × 2 matrix ${V}_{1}^{(n)}{V}_{2}^{(n)}$ (where ${V}_{1}^{(n)}$ and ${V}_{2}^{(n)}$ are defined in equation (3.51)) is invertible for n ≥ 3.

Proof. Consider the case n = 2. Then, we see that

$\begin{eqnarray}&&\det {F}^{(2)}={\lambda }_{1}({r}^{(2)}-{M}^{(2)}),\end{eqnarray} \tag{ 3.54 }$

$\begin{eqnarray}&&=\,-\displaystyle \frac{{\lambda }_{1}^{2}{\mu }_{2}}{({\lambda }_{2}+{\mu }_{2})({\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2})}\lt 0,\end{eqnarray} \tag{ 3.55 }$

where we used the relations

$\begin{eqnarray}&&{{\rm{\Pi }}}^{(2)}=\displaystyle \frac{{\mu }_{1}({\lambda }_{2}+{\mu }_{2})}{{\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1}},\end{eqnarray} \tag{ 3.56 }$

$\begin{eqnarray}&&{M}^{(2)}=\displaystyle \frac{{\lambda }_{2}^{2}+2{\lambda }_{2}{\mu }_{2}+{\lambda }_{1}{\mu }_{2}+{\mu }_{2}^{2}}{({\lambda }_{2}+{\mu }_{2})({\lambda }_{1}{\lambda }_{2}+{\lambda }_{2}{\mu }_{1}+{\mu }_{1}{\mu }_{2})}.\end{eqnarray} \tag{ 3.57 }$
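
For completeness, the determinant of ${F}^{(2)}$ can be expanded directly. The following short calculation is a sketch that uses only ${{\rm{\Pi }}}^{(2)}={\mu }_{1}{r}^{(2)}$ (the relation ${r}^{(n)}={{\rm{\Pi }}}^{(n)}/{\mu }_{1}$ with n = 2) and the shorthand ${\sigma }_{1}={\lambda }_{1}+{\mu }_{1}$ :

$\begin{eqnarray*}\det {F}^{(2)} & = & ({\sigma }_{1}{r}^{(2)}-1)(1-{M}^{(2)}{\mu }_{1})-({\mu }_{1}{r}^{(2)}-1)(1-{M}^{(2)}{\sigma }_{1}),\\ & = & ({\sigma }_{1}-{\mu }_{1}){r}^{(2)}-({\sigma }_{1}-{\mu }_{1}){M}^{(2)},\\ & = & {\lambda }_{1}({r}^{(2)}-{M}^{(2)}).\end{eqnarray*}$

Substituting (3.56) and (3.57) then gives

$\begin{eqnarray*}{r}^{(2)}-{M}^{(2)} & = & \displaystyle \frac{{({\lambda }_{2}+{\mu }_{2})}^{2}-({\lambda }_{2}^{2}+2{\lambda }_{2}{\mu }_{2}+{\lambda }_{1}{\mu }_{2}+{\mu }_{2}^{2})}{({\lambda }_{2}+{\mu }_{2})({\lambda }_{1}{\lambda }_{2}+{\lambda }_{2}{\mu }_{1}+{\mu }_{1}{\mu }_{2})}=-\displaystyle \frac{{\lambda }_{1}{\mu }_{2}}{({\lambda }_{2}+{\mu }_{2})({\lambda }_{1}{\lambda }_{2}+{\lambda }_{2}{\mu }_{1}+{\mu }_{1}{\mu }_{2})},\end{eqnarray*}$

which, multiplied by ${\lambda }_{1}$ , is the strictly negative expression in (3.55).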

Therefore (λ₂, μ₂) are uniquely determined. Now consider the case n ≥ 3. For reference, define

$\begin{eqnarray}{\tilde{V}}_{2}^{(n)}=\left[\begin{array}{cc}{\xi }_{1}^{(n-1)} & {\xi }_{1}^{(n-2)}\\ {\xi }_{2}^{(n-1)} & {\xi }_{2}^{(n-2)}\\ {\eta }_{2}^{(n-1)} & {\eta }_{2}^{(n-2)}\\ {\eta }_{3}^{(n-1)} & {\eta }_{3}^{(n-2)}\end{array}\right].\end{eqnarray} \tag{ 3.58 }$

To show that ${V}_{1}^{(n)}{V}_{2}^{(n)}$ is invertible, it is sufficient to show that $\det ({V}_{1}^{(n)}{\tilde{V}}_{2}^{(n)})\ne 0$ , because ${V}_{2}^{(n)}={\tilde{V}}_{2}^{(n)}\,{\rm{diag}}(1,-{\lambda }_{n-1})$ and ${\lambda }_{n-1}\ne 0$ . We split the proof into two parts. First, we find a simple expression for the determinant. Second, we show using induction that this expression is always non-zero.

Expression for Determinant. The determinant depends on r⁽ⁿ⁾ and M⁽ⁿ⁾, which are determined by the (perfect) data. Using the recurrence relations (3.42)–(3.45), we note that

$\begin{eqnarray}&&{r}^{(n)}={\tilde{P}}_{1}^{(n)}(0)=\displaystyle \frac{{\eta }_{2}^{(n)}}{{\xi }_{1}^{(n)}}=\displaystyle \frac{{\sigma }_{n}{\eta }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{2}^{(n-2)}}{{\sigma }_{n}{\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)}},\end{eqnarray} \tag{ 3.59 }$

and

$\begin{eqnarray*}-{M}^{(n)}{r}^{(n)} & = & {\left.\displaystyle \frac{d{\tilde{P}}_{1}^{(n)}}{{ds}}\right|}_{s=0},\\ & = & \displaystyle \frac{({\eta }_{2}^{(n-1)}+{\sigma }_{n}{\eta }_{3}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{3}^{(n-2)})({\sigma }_{n}{\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)})}{{({\sigma }_{n}{\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)})}^{2}}\\ & & -\displaystyle \frac{({\sigma }_{n}{\eta }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{2}^{(n-2)})({\xi }_{1}^{(n-1)}+{\sigma }_{n}{\xi }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{2}^{(n-2)})}{{({\sigma }_{n}{\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)})}^{2}}.\end{eqnarray*}$

Therefore, after some algebra

$\begin{eqnarray}&&\det ({V}_{1}^{(n)}{\tilde{V}}_{2}^{(n)})=\displaystyle \frac{{\lambda }_{n-1}{\mu }_{n}{({\eta }_{2}^{(n-1)}{\xi }_{1}^{(n-2)}-{\eta }_{2}^{(n-2)}{\xi }_{1}^{(n-1)})}^{2}}{{{\xi }_{1}^{(n)}}^{2}}.\end{eqnarray} \tag{ 3.60 }$
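
The algebra leading to (3.60) can be organized as follows. This is one possible route, sketched with auxiliary symbols a, b, c, d and ${{\bf{u}}}_{k}$ that are introduced only for this calculation. Let ${{\bf{u}}}_{k}={[{\xi }_{1}^{(k)},{\xi }_{2}^{(k)},{\eta }_{2}^{(k)},{\eta }_{3}^{(k)}]}^{T}$ and define

$\begin{eqnarray*}\left[\begin{array}{c}a\\ c\end{array}\right]={V}_{1}^{(n)}{{\bf{u}}}_{n-1},\qquad \left[\begin{array}{c}b\\ d\end{array}\right]={V}_{1}^{(n)}{{\bf{u}}}_{n-2},\qquad \det ({V}_{1}^{(n)}{\tilde{V}}_{2}^{(n)})={ad}-{bc}.\end{eqnarray*}$

By (3.50), ${V}_{1}^{(n)}{{\bf{u}}}_{n}=0$ and ${{\bf{u}}}_{n}={\sigma }_{n}{{\bf{u}}}_{n-1}-{\mu }_{n}{\lambda }_{n-1}{{\bf{u}}}_{n-2}+{[0,{\xi }_{1}^{(n-1)},0,{\eta }_{2}^{(n-1)}]}^{T}$ . Applying ${V}_{1}^{(n)}$ to the last vector gives ${[0,a]}^{T}$ , so

$\begin{eqnarray*}{\sigma }_{n}a & = & {\mu }_{n}{\lambda }_{n-1}b,\\ {\sigma }_{n}c+a & = & {\mu }_{n}{\lambda }_{n-1}d,\\ \Rightarrow \,{\mu }_{n}{\lambda }_{n-1}({ad}-{bc}) & = & a({\sigma }_{n}c+a)-{\sigma }_{n}{ac}={a}^{2}.\end{eqnarray*}$

Finally, from (3.59), (3.42) and (3.44),

$\begin{eqnarray*}a={r}^{(n)}{\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)} & = & \displaystyle \frac{{\eta }_{2}^{(n)}{\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)}{\xi }_{1}^{(n)}}{{\xi }_{1}^{(n)}}=\displaystyle \frac{{\mu }_{n}{\lambda }_{n-1}({\eta }_{2}^{(n-1)}{\xi }_{1}^{(n-2)}-{\eta }_{2}^{(n-2)}{\xi }_{1}^{(n-1)})}{{\xi }_{1}^{(n)}},\end{eqnarray*}$

and dividing ${a}^{2}$ by ${\mu }_{n}{\lambda }_{n-1}$ reproduces (3.60).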

The denominator is always nonzero because ${\xi }_{1}^{(n)}={(-1)}^{n}\det ({A}^{(n)})\ne 0$ : it is well known that the eigenvalues of the infinitesimal generator matrix (and its submatrices) of a birth-death process are all negative [35], and the matrix ${A}^{(n)}$ is the transpose of a submatrix of such an infinitesimal generator. It remains to show that the numerator is non-zero for $n\geqslant 3$ .

Induction Proof. First we show that $\det ({V}_{1}^{(3)}{\tilde{V}}_{2}^{(3)})$ is non-zero. Because

$\begin{eqnarray}&&{\xi }_{1}^{(2)}={\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1},\end{eqnarray} \tag{ 3.61 }$

$\begin{eqnarray}&&{\eta }_{2}^{(2)}={\lambda }_{2}+{\mu }_{2},\end{eqnarray} \tag{ 3.62 }$

$\begin{eqnarray}&&{\xi }_{1}^{(1)}={\lambda }_{1}+{\mu }_{1},\end{eqnarray} \tag{ 3.63 }$

$\begin{eqnarray}&&{\eta }_{2}^{(1)}=1,\end{eqnarray} \tag{ 3.64 }$

$\begin{eqnarray}&&\Rightarrow \,\det ({V}_{1}^{(3)}{\tilde{V}}_{2}^{(3)})=\displaystyle \frac{{\lambda }_{2}{\mu }_{3}{\lambda }_{1}^{2}{\mu }_{2}^{2}}{{{\xi }_{1}^{(3)}}^{2}}\gt 0.\end{eqnarray} \tag{ 3.65 }$

Now assume that $\det ({V}_{1}^{(n)}{\tilde{V}}_{2}^{(n)})$ is non-zero. Then

$\begin{eqnarray}&&{\eta }_{2}^{(n-1)}{\xi }_{1}^{(n-2)}-{\eta }_{2}^{(n-2)}{\xi }_{1}^{(n-1)}\ne 0.\end{eqnarray} \tag{ 3.66 }$

It suffices to show that ${\eta }_{2}^{(n)}{\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)}{\xi }_{1}^{(n)}\ne 0$ . Using (3.42) and (3.44),

$\begin{eqnarray*}{\eta }_{2}^{(n)}{\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)}{\xi }_{1}^{(n)} & = & ({\sigma }_{n}{\eta }_{2}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\eta }_{2}^{(n-2)}){\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)}({\sigma }_{n}{\xi }_{1}^{(n-1)}-{\mu }_{n}{\lambda }_{n-1}{\xi }_{1}^{(n-2)}),\\ & = & {\mu }_{n}{\lambda }_{n-1}({\eta }_{2}^{(n-1)}{\xi }_{1}^{(n-2)}-{\eta }_{2}^{(n-2)}{\xi }_{1}^{(n-1)}),\\ & \ne & 0.\end{eqnarray*}$

Therefore $\det ({V}_{1}^{(n)}{\tilde{V}}_{2}^{(n)})$ is non-zero for n ≥ 3, and hence ${V}_{1}^{(n)}{V}_{2}^{(n)}$ is invertible for n ≥ 3. From (3.52), σ_n and μ_n are uniquely determined and so (λ_n, μ_n) are also uniquely determined. □

In summary, at step n ≥ 3, we must solve the linear system

$\begin{eqnarray}&&{V}_{1}^{(n)}{V}_{2}^{(n)}\left[\begin{array}{c}{\sigma }_{n}\\ {\mu }_{n}\end{array}\right]=\left[\begin{array}{c}0\\ {\eta }_{2}^{(n-1)}-{r}^{(n)}{\xi }_{1}^{(n-1)}\end{array}\right],\end{eqnarray} \tag{ 3.67 }$

which we may write as

$\begin{eqnarray}&&{F}^{(n)}{\nu }_{n}={G}^{(n)},\end{eqnarray} \tag{ 3.68 }$

where

$\begin{eqnarray}{F}^{(n)}={V}_{1}^{(n)}{V}_{2}^{(n)}\left[\begin{array}{cc}1 & 1\\ 0 & 1\end{array}\right],\quad {\nu }_{n}=\left[\begin{array}{c}{\lambda }_{n}\\ {\mu }_{n}\end{array}\right],\quad {G}^{(n)}=\left[\begin{array}{c}0\\ {\eta }_{2}^{(n-1)}-{r}^{(n)}{\xi }_{1}^{(n-1)}\end{array}\right].\end{eqnarray} \tag{ 3.69 }$

The matrix F⁽ⁿ⁾ and the vector G⁽ⁿ⁾ depend on the previous transition rates ${\nu }_{1},\,\ldots ,\,{\nu }_{n-1}$ and, by theorem 1, F⁽ⁿ⁾ is invertible. For reference, the entries of F⁽ⁿ⁾ are:

$\begin{eqnarray}&&{F}_{11}^{(n)}={r}^{(n)}{\xi }_{1}^{(n-1)}-{\eta }_{2}^{(n-1)},\end{eqnarray} \tag{ 3.70 }$

$\begin{eqnarray}&&{F}_{12}^{(n)}={r}^{(n)}({\xi }_{1}^{(n-1)}-{\lambda }_{n-1}{\xi }_{1}^{(n-2)})-{\eta }_{2}^{(n-1)}+{\lambda }_{n-1}{\eta }_{2}^{(n-2)},\end{eqnarray} \tag{ 3.71 }$

$\begin{eqnarray}&&{F}_{21}^{(n)}={r}^{(n)}({\xi }_{2}^{(n-1)}-{\xi }_{1}^{(n-1)}{M}^{(n)})-{\eta }_{3}^{(n-1)},\end{eqnarray} \tag{ 3.72 }$

$\begin{eqnarray}&&{F}_{22}^{(n)}={r}^{(n)}[{\xi }_{2}^{(n-1)}-{M}^{(n)}{\xi }_{1}^{(n-1)}-{\lambda }_{n-1}({\xi }_{2}^{(n-2)}-{M}^{(n)}{\xi }_{1}^{(n-2)})]+{\lambda }_{n-1}{\eta }_{3}^{(n-2)}-{\eta }_{3}^{(n-1)}.\end{eqnarray} \tag{ 3.73 }$

We define the following notation:

$\begin{eqnarray}&&{{\bf{x}}}^{* }=({\lambda }_{1}^{* },\,\ldots ,\,{\lambda }_{n-1}^{* },{\mu }_{1}^{* },\,\ldots ,\,{\mu }_{n-1}^{* };{{\rm{\Pi }}}^{(1)},\,\ldots ,\,{{\rm{\Pi }}}^{(n)},{M}^{(1)},\,\ldots ,\,{M}^{(n)}),\end{eqnarray} \tag{ 3.74 }$

$\begin{eqnarray}&&{\boldsymbol{\delta }}{\bf{x}}=(\delta {\lambda }_{1},\,\ldots ,\,\delta {\lambda }_{n-1},\delta {\mu }_{1},\,\ldots ,\,\delta {\mu }_{n-1};\delta {{\rm{\Pi }}}^{(1)},\,\ldots ,\,\delta {{\rm{\Pi }}}^{(n)},\delta {M}^{(1)},\,\ldots ,\,\delta {M}^{(n)}),\end{eqnarray} \tag{ 3.75 }$

where asterisks denote exact birth-death rates and ${{\rm{\Pi }}}^{(n)},{M}^{(n)}$ are exact data; the elements of ${\boldsymbol{\delta }}{\bf{x}}$ are perturbations of the corresponding elements of ${{\bf{x}}}^{* }$ . Because each step of the inference uses the previously inferred rates, the transition rates at site n depend on the rates at sites 1, 2, ..., n − 1 and on the data:

$\begin{eqnarray}&&{\lambda }_{n}={f}_{1}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},{\mu }_{1},\,\ldots ,\,{\mu }_{n-1};{{\rm{\Pi }}}^{(1)},\,\ldots ,\,{{\rm{\Pi }}}^{(n)},{M}^{(1)},\,\ldots ,\,{M}^{(n)}),\end{eqnarray} \tag{ 3.76 }$

$\begin{eqnarray}&&{\mu }_{n}={f}_{2}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},{\mu }_{1},\,\ldots ,\,{\mu }_{n-1};{{\rm{\Pi }}}^{(1)},\,\ldots ,\,{{\rm{\Pi }}}^{(n)},{M}^{(1)},\,\ldots ,\,{M}^{(n)}),\end{eqnarray} \tag{ 3.77 }$

where ${f}_{i}^{(n)}:\,{{\mathbb{R}}}^{4n-2}\to {\mathbb{R}}$ for $i=1,2$ .

Theorem 2. (Error Propagation with site number.) Let ${\nu }_{n}={({\lambda }_{n},{\mu }_{n})}^{T},{D}_{n}={({{\rm{\Pi }}}^{(n)},{M}^{(n)})}^{T}$ and let ${\nu }_{n}^{* }$ be the exact transition rates at site n. Suppose that all first derivatives of ${f}_{1}^{(n)}$ and ${f}_{2}^{(n)}$ in equations (3.76), (3.77) are bounded in a small neighborhood B(x*,r) of x* in equation (3.74), i.e. there exist r, R > 0 such that for any x ∈ B(x*,r),

$\begin{eqnarray}&&\parallel {\rm{\nabla }}{f}_{m}^{(n)}({\bf{x}}){\parallel }_{\infty }\leqslant R/2,\end{eqnarray} \tag{ 3.78 }$

for 1 ≤ n ≤ N and m = 1, 2. If a sufficiently small error δD_k is introduced into the data ${\{{D}_{k}\}}_{k=1}^{n}$ at each site such that ${D}_{k}={D}_{k}^{* }+\delta {D}_{k}$ and $\parallel \delta {D}_{k}{\parallel }_{\infty }\leqslant r$ for $k=1,\,\ldots ,\,n$ , then the error in the birth-death rates at site n satisfies

$\begin{eqnarray}&&\parallel \delta {\nu }_{n}{\parallel }_{\infty }\leqslant \displaystyle \sum _{j=1}^{n}R{(1+R)}^{n-j}\parallel \delta {D}_{j}{\parallel }_{\infty }.\end{eqnarray} \tag{ 3.79 }$

Proof. The analysis is fairly standard and makes use of Taylor series expansions. The exact rates satisfy

$\begin{eqnarray}&&{\lambda }_{n}^{* }={f}_{1}^{(n)}({{\bf{x}}}^{* }),\quad {\mu }_{n}^{* }={f}_{2}^{(n)}({{\bf{x}}}^{* })\end{eqnarray} \tag{ 3.80 }$

By the mean value theorem for multivariate functions, the errors at each site satisfy

$\begin{eqnarray}&&{\lambda }_{n}^{\ast }+\delta {\lambda }_{n}={f}_{1}^{(n)}({{\bf{x}}}^{\ast }+{\boldsymbol{\delta }}{\bf{x}})={f}_{1}^{(n)}({{\bf{x}}}^{\ast })+{\rm{\nabla }}{f}_{1}^{(n)}({{\bf{z}}}_{1}^{(n)})\cdot \delta {\bf{x}}\end{eqnarray} \tag{ 3.81 }$

$\begin{eqnarray}&&{\mu }_{n}^{\ast }+\delta {\mu }_{n}={f}_{2}^{(n)}({{\bf{x}}}^{\ast }+{\boldsymbol{\delta }}{\bf{x}})={f}_{2}^{(n)}({{\bf{x}}}^{\ast })+{\rm{\nabla }}{f}_{2}^{(n)}({{\bf{z}}}_{2}^{(n)})\cdot \delta {\bf{x}}\end{eqnarray} \tag{ 3.82 }$

for some ${{\bf{z}}}_{1}^{(n)}={{\bf{x}}}^{* }+{c}_{1}^{(n)}{\boldsymbol{\delta }}{\bf{x}},{{\bf{z}}}_{2}^{(n)}={{\bf{x}}}^{* }+{c}_{2}^{(n)}{\boldsymbol{\delta }}{\bf{x}}$ with ${c}_{i}^{(n)}\in (0,1),i\,=\,1,2$ , so that ${{\bf{z}}}_{1}^{(n)},{{\bf{z}}}_{2}^{(n)}\in B({{\bf{x}}}^{\ast },r)$ . By equation (3.80), we have

$\begin{eqnarray}&&\delta {\lambda }_{n}=\displaystyle \sum _{i=1}^{n-1}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {\lambda }_{i}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\delta {\lambda }_{i}+\displaystyle \sum _{i=1}^{n-1}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {\mu }_{i}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\delta {\mu }_{i}+\displaystyle \sum _{i=1}^{n}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {{\rm{\Pi }}}^{(i)}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\delta {{\rm{\Pi }}}^{(i)}+\displaystyle \sum _{i=1}^{n}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {M}^{(i)}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\delta {M}^{(i)},\end{eqnarray} \tag{ 3.83 }$

$\begin{eqnarray}&&\delta {\mu }_{n}=\displaystyle \sum _{i=1}^{n-1}{\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {\lambda }_{i}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\delta {\lambda }_{i}+\displaystyle \sum _{i=1}^{n-1}{\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {\mu }_{i}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\delta {\mu }_{i}+\displaystyle \sum _{i=1}^{n}{\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {{\rm{\Pi }}}^{(i)}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\delta {{\rm{\Pi }}}^{(i)}+\displaystyle \sum _{i=1}^{n}{\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {M}^{(i)}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\delta {M}^{(i)}.\end{eqnarray} \tag{ 3.84 }$

Define the matrices

$\begin{eqnarray}{R}_{k}^{(n)}=\left[\begin{array}{cc}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {\lambda }_{k}}\right|}_{{{\bf{z}}}_{1}^{(n)}} & {\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {\mu }_{k}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\\ {\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {\lambda }_{k}}\right|}_{{{\bf{z}}}_{2}^{(n)}} & {\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {\mu }_{k}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\end{array}\right]\end{eqnarray} \tag{ 3.85 }$

and

$\begin{eqnarray}{S}_{k}^{(n)}=\left[\begin{array}{cc}{\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {{\rm{\Pi }}}^{(k)}}\right|}_{{{\bf{z}}}_{1}^{(n)}} & {\left.\displaystyle \frac{\partial {f}_{1}^{(n)}}{\partial {M}^{(k)}}\right|}_{{{\bf{z}}}_{1}^{(n)}}\\ {\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {{\rm{\Pi }}}^{(k)}}\right|}_{{{\bf{z}}}_{2}^{(n)}} & {\left.\displaystyle \frac{\partial {f}_{2}^{(n)}}{\partial {M}^{(k)}}\right|}_{{{\bf{z}}}_{2}^{(n)}}\end{array}\right]\end{eqnarray} \tag{ 3.86 }$

Therefore,

$\begin{eqnarray}&&\delta {\nu }_{n}=\displaystyle \sum _{k=1}^{n-1}{R}_{k}^{(n)}\delta {\nu }_{k}+\displaystyle \sum _{k=1}^{n}{S}_{k}^{(n)}\delta {D}_{k}.\end{eqnarray} \tag{ 3.87 }$

In general, we can repeatedly substitute to get δν_n in terms of ${\{\delta {D}_{k}\}}_{k=1}^{n}$ only:

$\begin{eqnarray}\delta {\nu }_{n} & = & \displaystyle \sum _{j=1}^{n}\left({S}_{j}^{(n)}+\displaystyle \sum _{j\leqslant {k}_{1}\lt n}{R}_{{k}_{1}}^{(n)}{S}_{j}^{({k}_{1})}+\displaystyle \sum _{j\leqslant {k}_{1}\lt {k}_{2}\lt n}{R}_{{k}_{2}}^{(n)}{R}_{{k}_{1}}^{({k}_{2})}{S}_{j}^{({k}_{1})}\,+\ldots \right.\\ & & +\left.\displaystyle \sum _{j\leqslant {k}_{1}\lt \cdots \lt {k}_{i}\lt n}{R}_{{k}_{i}}^{(n)}\cdots {R}_{{k}_{1}}^{({k}_{2})}{S}_{j}^{({k}_{1})}\,+\ldots +\,{R}_{n-1}^{(n)}\cdots {R}_{j}^{(j+1)}{S}_{j}^{(j)}\right)\delta {D}_{j}.\end{eqnarray} \tag{ 3.88 }$

By equation (3.78), each row of ${R}_{k}^{(n)}$ and ${S}_{k}^{(n)}$ contains two entries of magnitude at most R/2, so $\parallel {R}_{k}^{(n)}{\parallel }_{\infty }\leqslant R$ and $\parallel {S}_{k}^{(n)}{\parallel }_{\infty }\leqslant R$ for all 1 ≤ k ≤ n ≤ N. Then, a binomial expansion yields

$\begin{eqnarray}&&\parallel \delta {\nu }_{n}{\parallel }_{\infty }\leqslant \displaystyle \sum _{j=1}^{n}R{(1+R)}^{n-j}\parallel \delta {D}_{j}{\parallel }_{\infty }.\end{eqnarray} \tag{ 3.89 }$

□
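
The binomial step in the proof can be spelled out as follows; this is a brief sketch that relies on the submultiplicativity of the induced ∞-norm. In (3.88), a product with i intermediate indices ${k}_{1}\lt \cdots \lt {k}_{i}$ contains i + 1 matrix factors and therefore has norm at most ${R}^{i+1}$ . The indices are chosen from the n − j values $j,\,\ldots ,\,n-1$ , so there are $\left(\genfrac{}{}{0em}{}{n-j}{i}\right)$ such products. The coefficient of $\delta {D}_{j}$ is then bounded by

$\begin{eqnarray*}\displaystyle \sum _{i=0}^{n-j}\left(\genfrac{}{}{0em}{}{n-j}{i}\right){R}^{i+1}=R{(1+R)}^{n-j}.\end{eqnarray*}$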

Hence errors in the data introduced at each site can be amplified exponentially at subsequent sites. In other words, at each site n, the bound on the error in the birth-death rates (λ_n, μ_n) accumulates the exponentially amplified data errors from the earlier sites together with the data error at site n itself.

Corollary 1. Define a constant D such that $D={{\max }}_{1\le j\le N}\parallel \delta {D}_{j}{\parallel }_{{\rm{\infty }}}$ . Then equation (3.79) becomes

$\begin{eqnarray}&&\parallel \delta {\nu }_{n}{\parallel }_{\infty }\leqslant D[{(1+R)}^{n}-1].\end{eqnarray} \tag{ 3.90 }$

In this special case, where all errors in the data are bounded by D, we also expect the error in the birth-death rates to grow exponentially with n.
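
The step from (3.79) to (3.90) is a geometric sum. As a short worked check, with the summation index m = n − j introduced only here:

$\begin{eqnarray*}\parallel \delta {\nu }_{n}{\parallel }_{\infty }\leqslant D\displaystyle \sum _{j=1}^{n}R{(1+R)}^{n-j}={DR}\displaystyle \sum _{m=0}^{n-1}{(1+R)}^{m}={DR}\,\displaystyle \frac{{(1+R)}^{n}-1}{(1+R)-1}=D[{(1+R)}^{n}-1].\end{eqnarray*}$

For instance, if R = 1 the bound reads $D({2}^{n}-1)$ , so it roughly doubles with each additional site.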

3.4. Algorithm details

The implementation of the algorithm starts with the inference at site 1 and site 2. In order to keep track of the recurrence relation in lemma 4, we only need to focus on the 'feature vector' defined as ${{\bf{u}}}_{n}={[{\xi }_{1}^{(n)},{\xi }_{2}^{(n)},{\eta }_{2}^{(n)},{\eta }_{3}^{(n)}]}^{T}$ at each site n. For site 1, we have

$\begin{eqnarray}&&{{\bf{u}}}_{1}=\left[\begin{array}{c}{\xi }_{1}^{(1)}\\ {\xi }_{2}^{(1)}\\ {\eta }_{2}^{(1)}\\ {\eta }_{3}^{(1)}\end{array}\right]=\left[\begin{array}{c}{\lambda }_{1}+{\mu }_{1}\\ 1\\ 1\\ 0\end{array}\right]\end{eqnarray} \tag{ 3.91 }$

For site 2, we have

$\begin{eqnarray}&&{{\bf{u}}}_{2}=\left[\begin{array}{c}{\xi }_{1}^{(2)}\\ {\xi }_{2}^{(2)}\\ {\eta }_{2}^{(2)}\\ {\eta }_{3}^{(2)}\end{array}\right]=\left[\begin{array}{c}{\lambda }_{1}{\lambda }_{2}+{\mu }_{1}{\mu }_{2}+{\lambda }_{2}{\mu }_{1}\\ {\lambda }_{1}+{\mu }_{1}+{\lambda }_{2}+{\mu }_{2}\\ {\lambda }_{2}+{\mu }_{2}\\ 1\end{array}\right]\end{eqnarray} \tag{ 3.92 }$

Then we can define the matrix

$\begin{eqnarray}Z=\left[\begin{array}{cccc}{{\bf{u}}}_{1} & {{\bf{u}}}_{2} & \cdots & {{\bf{u}}}_{n}\end{array}\right]=\left[\begin{array}{cccc}{\xi }_{1}^{(1)} & {\xi }_{1}^{(2)} & \cdots & {\xi }_{1}^{(n)}\\ {\xi }_{2}^{(1)} & {\xi }_{2}^{(2)} & \cdots & {\xi }_{2}^{(n)}\\ {\eta }_{2}^{(1)} & {\eta }_{2}^{(2)} & \cdots & {\eta }_{2}^{(n)}\\ {\eta }_{3}^{(1)} & {\eta }_{3}^{(2)} & \cdots & {\eta }_{3}^{(n)}\end{array}\right]\end{eqnarray} \tag{ 3.93 }$

where the j-th column is the feature vector at site j.

At each step j ≥ 3, we need to solve the linear system defined by (3.68) and (3.69) to obtain (λ_j, μ_j), and then update the j-th column of Z using lemma 4. Details are listed in algorithm 1.

Algorithm 1. Inference of birth and death rates up to site N in a birth death chain

Input: An array of extinction times $\{{\tau }_{j}\}$ along with maximal sites {n_j} from repeated simulation of a birth death process (j = 1, ..., L).

Initialize: For $k=1$ to $N$ , find all τ_j such that n_j = k. Compute the number of extinctions as a fraction of L (Π^(k)) and the mean of the extinction times (M^(k)). Initialize Z as a 4 × N zero matrix.

1: At site 1, $\left[\begin{array}{c}{\lambda }_{1}\\ {\mu }_{1}\end{array}\right]=\left[\begin{array}{c}(1-{{\rm{\Pi }}}^{(1)})/{M}^{(1)}\\ {{\rm{\Pi }}}^{(1)}/{M}^{(1)}\end{array}\right]$ , as in (3.8), (3.9). Feature vector ${Z}_{:,1}={{\bf{u}}}_{1}$ is defined in (3.91).

2: if N == 1 then return {μ₁, λ₁}

3: end if

4: At site 2, $\left[\begin{array}{c}{\lambda }_{2}\\ {\mu }_{2}\end{array}\right]={\left[\begin{array}{cc}{r}^{(2)}({\lambda }_{1}+{\mu }_{1})-1 & {{\rm{\Pi }}}^{(2)}-1\\ 1-{M}^{(2)}({\lambda }_{1}+{\mu }_{1}) & 1-{M}^{(2)}{\mu }_{1}\end{array}\right]}^{-1}\left[\begin{array}{c}0\\ 1/{r}^{(2)}-{\lambda }_{1}-{\mu }_{1}\end{array}\right]$ , see (3.19), (3.20). Feature vector ${Z}_{:,2}={{\bf{u}}}_{2}$ is defined in (3.92).

5: for j = 3 : N do

6: Compute ${V}_{1}^{(j)}$ and ${V}_{2}^{(j)}$ as in (3.51).

7: Form the matrices ${F}^{(j)}$ and ${G}^{(j)}$ as in (3.68) and (3.69).

8: Solve ${[{\lambda }_{j},{\mu }_{j}]}^{T}={({F}^{(j)})}^{-1}{G}^{(j)}$ .

9: Update ${Z}_{:,j}={V}_{2}^{(j)}\left[\begin{array}{c}{\lambda }_{j}+{\mu }_{j}\\ {\mu }_{j}\end{array}\right]+\left[\begin{array}{c}0\\ {Z}_{1,j-1}\\ 0\\ {Z}_{3,j-1}\end{array}\right].$

10: if j == N then

11: ${\lambda }_{j}=0$

12: end if

13: end for

Output: {μ₁, ..., μ_N} and $\{{\lambda }_{1},\,\ldots ,\,{\lambda }_{N-1}\}$
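
The steps of algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The statistics Π^(k) and M^(k) are passed in directly rather than computed from raw extinction times. The 2 × 2 systems are solved by Cramer's rule. A seed column (1, 0, 0, 0) for an empty chain is assumed so that the recurrences of lemma 4 also generate the site-2 feature vector (3.92). The names `infer_rates`, `exact_data`, `solve2`, `v1_apply` and `next_feature` are hypothetical. Exact rational arithmetic is used so that the round trip from known rates to data and back can be checked without rounding.

```python
from fractions import Fraction

ONE, ZERO = Fraction(1), Fraction(0)


def solve2(F, G):
    """Solve the 2x2 system F x = G by Cramer's rule."""
    (a, b), (c, d) = F
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular 2x2 system")
    return (G[0] * d - b * G[1]) / det, (a * G[1] - c * G[0]) / det


def v1_apply(r, m, v):
    """Both rows of V_1 in (3.51) applied to a 4-vector v."""
    return r * v[0] - v[2], r * (v[1] - m * v[0]) - v[3]


def next_feature(prev, prev2, lam_prev, lam_j, mu_j):
    """Recurrences (3.42)-(3.45) for the feature vector at site j."""
    s, p = lam_j + mu_j, mu_j * lam_prev
    return (s * prev[0] - p * prev2[0], prev[0] + s * prev[1] - p * prev2[1],
            s * prev[2] - p * prev2[2], prev[2] + s * prev[3] - p * prev2[3])


def infer_rates(Pi, M):
    """Pi[k], M[k] for k = 1..N (index 0 unused); returns (lam, mu)."""
    N = len(Pi) - 1
    lam, mu = [None] * (N + 1), [None] * (N + 1)
    lam[1], mu[1] = (1 - Pi[1]) / M[1], Pi[1] / M[1]           # step 1
    Z = [(ONE, ZERO, ZERO, ZERO), (lam[1] + mu[1], ONE, ONE, ZERO)]
    if N == 1:                                                  # steps 2-3
        return lam, mu
    s1, r = lam[1] + mu[1], Pi[2] / mu[1]                       # step 4
    F = ((r * s1 - 1, Pi[2] - 1), (1 - M[2] * s1, 1 - M[2] * mu[1]))
    lam[2], mu[2] = solve2(F, (ZERO, 1 / r - s1))
    Z.append(next_feature(Z[1], Z[0], lam[1], lam[2], mu[2]))
    for j in range(3, N + 1):                                   # steps 5-13
        r = Pi[j] / mu[1]
        p, q = v1_apply(r, M[j], Z[j - 1]), v1_apply(r, M[j], Z[j - 2])
        col2 = (-lam[j - 1] * q[0], -lam[j - 1] * q[1])         # 2nd column of V1 V2
        F = ((p[0], p[0] + col2[0]), (p[1], p[1] + col2[1]))    # (3.69)
        G = (ZERO, Z[j - 1][2] - r * Z[j - 1][0])
        lam[j], mu[j] = solve2(F, G)
        Z.append(next_feature(Z[j - 1], Z[j - 2], lam[j - 1], lam[j], mu[j]))
        if j == N:
            lam[j] = ZERO                                       # steps 10-12
    return lam, mu


def exact_data(lam, mu):
    """Exact Pi^(n), M^(n) from known rates, using (3.40) and (3.41)."""
    N = len(lam) - 1
    Z = [(ONE, ZERO, ZERO, ZERO), (lam[1] + mu[1], ONE, ONE, ZERO)]
    for n in range(2, N + 1):
        Z.append(next_feature(Z[n - 1], Z[n - 2], lam[n - 1], lam[n], mu[n]))
    Pi = [None] + [mu[1] * Z[n][2] / Z[n][0] for n in range(1, N + 1)]
    M = [None] + [Z[n][1] / Z[n][0] - Z[n][3] / Z[n][2] for n in range(1, N + 1)]
    return Pi, M


# Round trip on an arbitrary 4-site example with zero birth rate at the last site.
lam_true = [None, Fraction(2), Fraction(1, 2), Fraction(3), Fraction(0)]
mu_true = [None, Fraction(1), Fraction(4), Fraction(1, 3), Fraction(5)]
Pi, M = exact_data(lam_true, mu_true)
lam_hat, mu_hat = infer_rates(Pi, M)
print(lam_hat[1:] == lam_true[1:], mu_hat[1:] == mu_true[1:])
```

With noisy data the inputs would be floating-point estimates, and the exponential error growth described in theorem 2 should be expected at later sites. Note that this sketch treats Π^(k) and M^(k) as statistics of the k-site subproblem, which is the reading consistent with (3.56) and (3.57).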

4. Numerical results

Below we reconstruct the transition rates for proteins with N folding/unfolding domains. When all domains are folded, the reaction coordinate is 0 and if all domains are unfolded, the reaction coordinate is N. Every domain that unfolds (folds) increases (decreases) the reaction coordinate by one.

Example 1. Protein with N = 4 domains. First we present a simple result for a birth-death chain with only 5 states and pre-determined rates, all of about the same order of magnitude: see figure 5.

The extinction time data are generated by simulating the birth-death process with these given rates, and we evaluate the reconstruction by comparison with them. With about 5 × 10⁶ extinction times, we can infer the rates to very good accuracy, with relative errors of about 0.11% and 0.37% in $\lambda$ and $\mu$ , respectively. First, the sum of birth and death rates λ₁ + μ₁ follows from the mean extinction time conditioned on immediate exit through equation (3.5). The fraction of times corresponding to immediate exit then yields μ₁ and λ₁ separately through (3.8) and (3.9). Next, λ₂ and μ₂ are computed by (3.19) and (3.20). Then for n = 3, 4, we compute the n-th columns of matrix Z as in (3.93) and obtain {λ_n, μ_n} simultaneously. Notice that the last birth rate λ_N is always assumed to be zero and no inference is necessary at that site.
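
The data-generation step can be sketched with a standard Gillespie-type simulation. The code below is an assumption-laden illustration rather than the simulation used for the figures. It assumes each trajectory starts at site 1, consistent with the 'immediate exit' interpretation above. It also computes Π^(k) and M^(k) over trajectories whose maximal site is at most k, which is the reading consistent with (3.56) and (3.57) (for example, Π^(N) = 1 when λ_N = 0). The function names, the seed and the small chain are hypothetical choices.

```python
import random


def simulate_extinctions(lam, mu, n_runs, seed=0):
    """Return a list of (extinction_time, maximal_site) pairs.

    lam[k], mu[k] are the rates at site k = 1..N (index 0 unused);
    lam[N] must be 0 so that the walker never leaves the chain upwards.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_runs):
        site, t, top = 1, 0.0, 1          # assumed initial condition: site 1
        while site > 0:
            total = lam[site] + mu[site]
            t += rng.expovariate(total)   # exponential waiting time
            if rng.random() < lam[site] / total:
                site += 1                 # birth (a domain unfolds)
                top = max(top, site)
            else:
                site -= 1                 # death (a domain folds)
        samples.append((t, top))
    return samples


def empirical_data(samples, N):
    """Pi[k]: fraction of runs with maximal site <= k; M[k]: their mean time.

    Raises ZeroDivisionError if no run satisfies the condition for some k.
    """
    Pi, M = [None], [None]
    for k in range(1, N + 1):
        times = [t for t, top in samples if top <= k]
        Pi.append(len(times) / len(samples))
        M.append(sum(times) / len(times))
    return Pi, M


# Small 3-site chain with unit rates and zero birth rate at the last site.
lam = [None, 1.0, 1.0, 0.0]
mu = [None, 1.0, 1.0, 1.0]
samples = simulate_extinctions(lam, mu, 20000, seed=1)
Pi, M = empirical_data(samples, 3)
lam1_hat, mu1_hat = (1 - Pi[1]) / M[1], Pi[1] / M[1]   # site-1 formulas
print(Pi[1:], M[1:], lam1_hat, mu1_hat)
```

For this chain the exact values are Π^(1) = 1/2, Π^(2) = 2/3 and M^(1) = 1/2, so the printed estimates should lie close to these, up to sampling noise.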

**Figure 5.** Bar plots of the inference results in a 5-site birth death chain. The top subplot (a) contains rates for μ_k and bottom subplot (b) for λ_k. The bars in dark blue represent numerically approximated rates, and yellow bars stand for exact rates. On top of each bar is the value associated with it.

Example 2. Protein with N = 10 domains. We now test our reconstruction algorithm on a longer, 11-site chain. Here, 5 × 10⁷ extinction times are simulated from an 11-site birth-death chain. Following the same steps, we obtain relative errors of 3.29% in $\lambda$ and 3.71% in $\mu$ : see figure 6.

**Figure 6.** Bar plots of the inference results in an 11-site birth death chain. The top subplot (a) contains rates for μ_k and bottom subplot (b) for λ_k. The bars in dark blue represent numerically approximated rates, and yellow bars stand for exact rates.

Example 3. Protein with N = 10 domains and a single high activation energy barrier. This birth-death chain has a 'bottleneck' between states 3 and 4 so that the transition rates between these two configurations are much smaller than the others: see figure 7. The accuracy of the inferred transition rates is more affected at sites after the bottleneck than at sites before it, because all inferences at sites after the bottleneck depend on extinction times generated by particles that have passed through the bottleneck. With about 5 × 10⁷ extinction times, there are 1.17% and 1.24% relative errors in $\lambda$ and $\mu$ , respectively. The effect of the bottleneck is evident from the data since we see an abrupt increase in ${M}^{(n)}$ as n goes from 3 to 4.

**Figure 7.** Bar plots of the inference results in a 9-site birth death chain with a bottleneck at site number 4. The top subplot (a) contains rates for *μ_k* and bottom subplot (b) for *λ_k*. The bars in dark blue represent numerically approximated rates, and yellow bars stand for exact rates.

Example 4. Protein with N = 8 domains and a single low activation energy barrier. This protein has a very low barrier between states 5 and 6; the transition rates between these sites are very large. From an AFM trace, it may be difficult to distinguish or separate these two configurations. However, given 5 × 10⁷ simulated extinction times, the birth and death rates can be inferred with reasonable accuracy: the overall relative error is 2.21% in λ_k and 2.24% in μ_k. See figure 8.

**Figure 8.** Bar plots of the inference results in a 9-site birth death chain with extreme rates. The top subplot (a) contains rates for *μ_k* and bottom subplot (b) for *λ_k*. The bars in dark blue represent numerically approximated rates, and yellow bars stand for exact rates.

Example 5. 11-site birth death chain error propagation. In this example, we show how the error propagates with site number. The birth and death rates are all equal to 1, and 1 × 10⁸ extinction times are generated by Monte Carlo simulation. We bootstrap by drawing random samples of size 5 × 10⁶ from these extinction times and computing the errors at each site. This resampling procedure is repeated 50 times, and figures 9 and 10 display the mean error and 95% confidence intervals over the 50 samples. The errors for both λ_k and μ_k increase exponentially with site number. Although this example uses noisy data, the exponential increase is not surprising in light of theorem 2.

**Figure 9.** Error plot for *λ_k* at each site, taken as the average over 50 random samples, each of size 5 × 10⁶, drawn from 1 × 10⁸ extinction times.

**Figure 10.** Error plot for *μ_k* at each site, taken as the average over 50 random samples, each of size 5 × 10⁶, drawn from 1 × 10⁸ extinction times.

Example 6. Minimum number of extinction times required. Finally, we consider the minimum number of extinction times required for a birth death chain of length N + 1 such that the relative error for both λ_k and μ_k is below 15%. For each N, 50 bootstrap samples of a fixed size are selected and the average relative errors are computed from these samples. This procedure is then repeated for different sample sizes, from which we can estimate the minimum number of extinction times required for the chain of length N + 1 to have error below 15%. Note that this estimate applies to chains where all birth-death rates are of about the same order of magnitude. Details are given in table 1 and figure 11. The plot indicates that the minimum number of extinction times required increases roughly exponentially with the length of the chain.

**Figure 11.** Number of extinction times required for rates to have relative error below 15%. The best-fit exponential curve is shown in red.

Table 1. Number of extinction times required for rates to have relative error below 15%, on chains of different lengths. The birth-death rates all have the same order of magnitude.

N	Estimated minimum number of extinction times
1	5.0 × 10¹
2	1.5 × 10²
3	5.0 × 10²
4	1.0 × 10⁴
5	1.0 × 10⁵
6	3.0 × 10⁵
7	2.0 × 10⁶
8	4.0 × 10⁶
9	7.0 × 10⁶
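
The exponential trend in table 1 can be quantified with a simple log-linear least-squares fit. The sketch below is an illustration only: the fitting procedure behind the curve in figure 11 is not specified, so ordinary least squares on log₁₀ of the tabulated values is an assumption, and the variable and function names are hypothetical.

```python
import math

# Values copied from table 1.
N_vals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
min_ets = [5.0e1, 1.5e2, 5.0e2, 1.0e4, 1.0e5, 3.0e5, 2.0e6, 4.0e6, 7.0e6]


def loglinear_fit(x, y):
    """Least-squares fit of log10(y) = a + b * x; returns (a, b)."""
    ly = [math.log10(v) for v in y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(ly) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ly))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


a, b = loglinear_fit(N_vals, min_ets)
growth_per_site = 10 ** b   # multiplicative increase in data per extra site
print(a, b, growth_per_site)
```

For the tabulated values the fitted slope b is roughly 0.7 in log₁₀ units, so each additional site multiplies the required number of extinction times by a factor of about five.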

5. Conclusions

In summary, we have presented a method for extracting the kinetic rates of large proteins, with multiple folding domains, from extinction times (when all domains have folded) and 'maximal sites' (the maximum number of unfolded domains before extinction). Both of these quantities can, in principle, be computed from AFM time traces.

The inference relies on the recurrence relations specified in lemma 4. It starts with the base cases n = 1, 2, and the inference at each subsequent site depends on the previous sites through the solution of a linear system. If the data ${{\rm{\Pi }}}^{(n)}$ and ${M}^{(n)}$ are exact, meaning that they correspond to the statistics of an underlying birth-death process, then the birth-death rates are uniquely determined. We also showed that a small perturbation of the data at site 1 can be amplified exponentially along the chain, given that the first derivatives of ${f}_{1}^{(n)}$ and ${f}_{2}^{(n)}$ in theorem 2 are bounded near the exact solution. With sufficient data (about 50 million extinction times for a chain of length 8–12), the method can compute these rates to very good accuracy and is capable of detecting bottlenecks or extreme values in the chain. In general, the number of extinction times one needs to obtain reasonable results grows exponentially with the total length of the chain.

There still remain many theoretical challenges to interpreting single-molecule AFM data. For example, inference from extinction times in the 'transmission' problem where absorption/extinction is at site N rather than site 0 is severely ill-posed. Can some aspect of time-trace data be used to better-condition this problem? Also, bifurcations or loops in the birth-death chain representing multiple pathways to a final unfolded state have not yet been explored. Finally, it remains to be seen if our method can be adapted to force-ramp data, or how to proceed if transition times between metastable configurations are not exponentially distributed.

Appendix

There is an alternative way to infer the birth-death rates that does not use the recurrence relations. This method is simple, but more computationally intensive if the number of sites is large.

In light of lemma 4, we can assume the following forms:

$\begin{eqnarray}&&{\xi }_{i}^{(n)}={C}_{i1}^{(n)}{\lambda }_{n}+{C}_{i2}^{(n)}{\mu }_{n}+{C}_{i3}^{(n)},\quad i=1,2,\end{eqnarray} \tag{ A.1 }$

$\begin{eqnarray}&&{\eta }_{j}^{(n)}={\hat{C}}_{j1}^{(n)}{\lambda }_{n}+{\hat{C}}_{j2}^{(n)}{\mu }_{n}+{\hat{C}}_{j3}^{(n)},\quad j=2,3,\end{eqnarray} \tag{ A.2 }$

where ${C}^{(n)}$ and ${\hat{C}}^{(n)}$ are coefficients that depend on ${\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1}$ , ${\mu }_{1},\,\ldots ,\,{\mu }_{n-1};$ we describe how to compute ${C}^{(n)}$ and ${\hat{C}}^{(n)}$ in section A.1. If we plug (A.1) and (A.2) into (3.40) and (3.41), and denote ${r}^{(n)}={{\rm{\Pi }}}^{(n)}/{\mu }_{1}$ , then we get a linear system in two variables $({\lambda }_{n},{\mu }_{n})$ :

$\begin{eqnarray}&&{F}^{(n)}\left[\begin{array}{c}{\lambda }_{n}\\ {\mu }_{n}\end{array}\right]={G}^{(n)}\end{eqnarray} \tag{ A.3 }$

where the 2 × 2 matrix ${F}^{(n)}$ and the 2-vector ${G}^{(n)}$ are defined as

$\begin{eqnarray}{F}^{(n)}=\left[\begin{array}{cc}{r}^{(n)}{C}_{11}^{(n)}-{\hat{C}}_{21}^{(n)} & {r}^{(n)}{C}_{12}^{(n)}-{\hat{C}}_{22}^{(n)}\\ {r}^{(n)}({C}_{21}^{(n)}-{C}_{11}^{(n)}{M}^{(n)})-{\hat{C}}_{31}^{(n)} & {r}^{(n)}({C}_{22}^{(n)}-{C}_{12}^{(n)}{M}^{(n)})-{\hat{C}}_{32}^{(n)}\end{array}\right],\end{eqnarray} \tag{ A.4 }$

$\begin{eqnarray}&&{G}^{(n)}=\left[\begin{array}{c}{\hat{C}}_{23}^{(n)}-{r}^{(n)}{C}_{13}^{(n)}\\ {\hat{C}}_{33}^{(n)}-{r}^{(n)}({C}_{23}^{(n)}-{C}_{13}^{(n)}{M}^{(n)})\end{array}\right].\end{eqnarray} \tag{ A.5 }$

This matrix is consistent with equations (3.70)–(3.73). The rates at the current site are therefore given by ${[{\lambda }_{n},{\mu }_{n}]}^{T}={({F}^{(n)})}^{-1}{G}^{(n)}$.
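As an illustration of (A.3)–(A.5), the following minimal sketch assembles ${F}^{(n)}$ and ${G}^{(n)}$ from given coefficient rows and solves the 2 × 2 system by Cramer's rule. The function name `solve_site_rates` and the row layout of the inputs are assumptions made for this example only. Read together with (A.1) and (A.2), the two rows of (A.4)–(A.5) state that ${r}^{(n)}{\xi }_{1}^{(n)}={\eta }_{2}^{(n)}$ and ${r}^{(n)}({\xi }_{2}^{(n)}-{M}^{(n)}{\xi }_{1}^{(n)})={\eta }_{3}^{(n)}$, and the code exploits that structure.

```python
def solve_site_rates(C, C_hat, r, M):
    """Solve F [lam, mu]^T = G from (A.3)-(A.5) by Cramer's rule.

    C     : two rows [C_i1, C_i2, C_i3] for i = 1, 2   (assumed layout)
    C_hat : two rows [C^_j1, C^_j2, C^_j3] for j = 2, 3 (assumed layout)
    r     : Pi^(n) / mu_1
    M     : M^(n)
    """
    # Each list holds the coefficients of (lam, mu, 1) in one scalar equation.
    # The first two entries form a row of F; the negated third entry is G.
    top = [r * C[0][k] - C_hat[0][k] for k in range(3)]
    bottom = [r * (C[1][k] - M * C[0][k]) - C_hat[1][k] for k in range(3)]

    f11, f12, g1 = top[0], top[1], -top[2]
    f21, f22, g2 = bottom[0], bottom[1], -bottom[2]

    det = f11 * f22 - f12 * f21
    if det == 0:
        raise ValueError("F is singular; the rates are not determined")

    lam = (g1 * f22 - f12 * g2) / det
    mu = (f11 * g2 - g1 * f21) / det
    return lam, mu
```

With noisy data the determinant can be close to zero, in which case the computed rates should be treated with caution.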

A.1. Leverrier-Faddeev algorithm

The method for computing the coefficient matrices ${C}^{(n)}$ and ${\hat{C}}^{(n)}$ is the Leverrier-Faddeev (L-F) algorithm [36]. The L-F algorithm computes all coefficients of the characteristic polynomial for a given n × n square matrix with time complexity ${ \mathcal O }({n}^{4})$ . In this problem, it suffices to get only the last three coefficients ${\xi }_{3}^{(n)}$ , ${\xi }_{2}^{(n)}$ and ${\xi }_{1}^{(n)}$ . Given the expression in (A.1), we have
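For readers unfamiliar with the L-F algorithm, here is a minimal sketch of the standard recursion for a generic square matrix, written in plain Python with exact rational arithmetic. It returns all coefficients of $\det (sI-A)$ rather than only the last three, and it is not tied to the specific matrix ${A}^{(n)}$ of this paper. The function names are chosen for this example.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_coeffs(A):
    """Leverrier-Faddeev recursion.

    Returns [1, c_{n-1}, ..., c_0] such that
    det(s I - A) = s^n + c_{n-1} s^{n-1} + ... + c_0.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    coeffs = [Fraction(1)]
    Mk = identity  # M_1 = I
    for k in range(1, n + 1):
        AM = matmul(A, Mk)
        # c_{n-k} = -trace(A M_k) / k
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        # M_{k+1} = A M_k + c_{n-k} I
        Mk = [[AM[i][j] + c * identity[i][j] for j in range(n)]
              for i in range(n)]
    return coeffs
```

Each of the n iterations performs one matrix product, which is the source of the ${ \mathcal O }({n}^{4})$ cost quoted above when a naive ${ \mathcal O }({n}^{3})$ product is used.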

$\begin{eqnarray}&&\left\{\begin{array}{l}{C}_{i1}^{(n)}={\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},1;{\mu }_{1},\,\ldots ,\,{\mu }_{n})-{\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},0;{\mu }_{1},\,\ldots ,\,{\mu }_{n})\\ {C}_{i2}^{(n)}={\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n};{\mu }_{1},\,\ldots ,\,{\mu }_{n-1},1)-{\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n};{\mu }_{1},\,\ldots ,\,{\mu }_{n-1},0)\\ {C}_{i3}^{(n)}={\xi }_{i}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n}=0;{\mu }_{1},\,\ldots ,\,{\mu }_{n}=0)\end{array}\right.\end{eqnarray} \tag{ A.6 }$

for i = 1, 2. With (A.2), we also have

$\begin{eqnarray}&&\left\{\begin{array}{l}{\hat{C}}_{j1}^{(n)}={\eta }_{j}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},1;{\mu }_{1},\,\ldots ,\,{\mu }_{n})-{\eta }_{j}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n-1},0;{\mu }_{1},\,\ldots ,\,{\mu }_{n})\\ {\hat{C}}_{j2}^{(n)}={\eta }_{j}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n};{\mu }_{1},\,\ldots ,\,{\mu }_{n-1},1)-{\eta }_{j}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n};{\mu }_{1},\,\ldots ,\,{\mu }_{n-1},0)\\ {\hat{C}}_{j3}^{(n)}={\eta }_{j}^{(n)}({\lambda }_{1},\,\ldots ,\,{\lambda }_{n}=0;{\mu }_{1},\,\ldots ,\,{\mu }_{n}=0)\end{array}\right.\end{eqnarray} \tag{ A.7 }$

for j = 2, 3. A concise and straightforward algorithm containing all steps is presented in algorithm 2.
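The evaluations in (A.6) and (A.7) amount to probing a function that is affine in $({\lambda }_{n},{\mu }_{n})$ at a few points. The sketch below shows the idea for a hypothetical callable `f(lam_n, mu_n)` standing in for any one of the ${\xi }_{i}^{(n)}$ or ${\eta }_{j}^{(n)}$ with the earlier rates held fixed. Because the dependence is assumed affine, as in (A.1) and (A.2), the value at which the other rate is held does not matter, and the sketch simply fixes it at 0.

```python
def affine_coefficients(f):
    """Recover (c1, c2, c3) with f(lam_n, mu_n) = c1*lam_n + c2*mu_n + c3.

    `f` is a hypothetical stand-in for one characteristic-polynomial
    coefficient viewed as a function of the last pair of rates only.
    """
    c3 = f(0.0, 0.0)        # constant term: both rates set to zero
    c1 = f(1.0, 0.0) - c3   # unit change in lam_n
    c2 = f(0.0, 1.0) - c3   # unit change in mu_n
    return c1, c2, c3
```

For example, `affine_coefficients(lambda lam_n, mu_n: 2 * lam_n - 3 * mu_n + 5)` returns `(2.0, -3.0, 5.0)`.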

Algorithm 2. Inference of birth and death rates up to site N with the L-F algorithm

Input: An array of extinction times T, together with the maximal site of each run, from repeated simulations of a birth-death process.

Initialize: Compute the conditional probabilities of left exit $\{{{\rm{\Pi }}}^{(1)},\,\ldots ,\,{{\rm{\Pi }}}^{(N)}\}$ and the means of the conditional extinction times $\{{M}^{(1)},\,\ldots ,\,{M}^{(N)}\}$.

1: At site 1, ${\mu }_{1}={{\rm{\Pi }}}^{(1)}/{M}^{(1)}$ and ${\lambda }_{1}=(1-{{\rm{\Pi }}}^{(1)})/{M}^{(1)}$.

2: for j = 2 : N do

3: Compute the coefficients of the characteristic polynomial of ${A}^{(j)}$ to obtain ${C}^{(j)}$ and ${\hat{C}}^{(j)}$, see section A.1;

4: Form the matrix ${F}^{(j)}$ and the vector ${G}^{(j)}$ as in (A.4) and (A.5);

5: Solve ${[{\lambda }_{j},{\mu }_{j}]}^{T}={({F}^{(j)})}^{-1}{G}^{(j)}$.

6: if j == N then

7: ${\lambda }_{j}=0$

8: end if

9: end for

Output: $\{{\mu }_{1},\,\ldots ,\,{\mu }_{N}\}$ and $\{{\lambda }_{1},\,\ldots ,\,{\lambda }_{N-1}\}$
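Step 1 of algorithm 2 can be sketched as follows. The sketch assumes that ${{\rm{\Pi }}}^{(1)}$ is estimated as the fraction of runs whose maximal site is 1 and that ${M}^{(1)}$ is estimated as the mean extinction time over those runs. This is one reading of the Initialize step, not a statement of the authors' implementation. The function name and argument layout are likewise chosen for this example.

```python
def infer_site_one(times, max_sites):
    """Estimate (mu_1, lambda_1) from extinction times and maximal sites.

    times     : extinction time of each run
    max_sites : maximal site reached in each run (same length as times)
    Assumption: Pi^(1) is the fraction of runs with maximal site 1 and
    M^(1) is the mean extinction time over those runs.
    """
    if len(times) != len(max_sites) or not times:
        raise ValueError("need two non-empty sequences of equal length")

    site_one_times = [t for t, m in zip(times, max_sites) if m == 1]
    if not site_one_times:
        raise ValueError("no run has maximal site 1")

    pi_1 = len(site_one_times) / len(times)
    m_1 = sum(site_one_times) / len(site_one_times)

    mu_1 = pi_1 / m_1              # step 1 of algorithm 2
    lambda_1 = (1.0 - pi_1) / m_1
    return mu_1, lambda_1
```

As a small check, four runs with extinction times `[0.5, 1.5, 3.0, 1.0]` and maximal sites `[1, 1, 2, 1]` give ${{\rm{\Pi }}}^{(1)}=0.75$ and ${M}^{(1)}=1$, hence ${\mu }_{1}=0.75$ and ${\lambda }_{1}=0.25$.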
