Quantifying metastatic inefficiency: rare genotypes versus rare dynamics

We introduce and solve a ‘null model’ of stochastic metastatic colonization. The model is described by a single parameter θ: the ratio of the rate of cell division to the rate of cell death for a disseminated tumour cell in a given secondary tissue environment. We are primarily interested in the case in which colonizing cells are poorly adapted for proliferation in the local tissue environment, so that cell death is more likely than cell division, i.e. θ < 1 ?> . We quantify the rare event statistics for the successful establishment of a metastatic colony of size N. For N ≫ 1 ?> , we find that the probability of establishment is exponentially rare, as expected, and yet the mean time for such rare events is of the form ∼ log ( N ) / ( 1 − θ ) ?> while the standard deviation of colonization times is ∼ 1 / ( 1 − θ ) ?> . Thus, counter to naive expectation, for θ < 1 ?> , the average time for establishment of successful metastatic colonies decreases with decreasing cell fitness, and colonies seeded from lower fitness cells show less stochastic variation in their growth. These results indicate that metastatic growth from poorly adapted cells is rare, exponentially explosive and essentially deterministic. These statements are brought into sharper focus by the finding that the temporal statistics of the early stages of metastatic colonization from low-fitness cells ( θ < 1 ?> ) are statistically indistinguishable from those initiated from high-fitness cells ( θ > 1 ?> ), i.e. the statistics show a duality mapping ( 1 − θ ) → ( θ − 1 ) ?> . We conclude our analysis with a study of heterogeneity in the fitness of colonising cells, and describe a phase diagram delineating parameter regions in which metastatic colonization is dominated either by low or high fitness cells, showing that both are plausible given our current knowledge of physiological conditions in human cancer.


Introduction
Cancer metastasis is the process of dissemination of cancer cells originating from a solid tumour, and their subsequent colonization of distant tissues [1,2]. Metastatic disease is responsible for the great majority of cancer deaths. It is generally considered to be a succession of unlikely events rendering in toto a highly inefficient process [3][4][5]. In the conventional view of metastasis, only certain cancer cells from the primary tumour can successfully metastasise as they are thought to require a number of attributes, such as the ability to invade local tissue, enter, survive in, and leave the bloodstream, and colonize new tissue environments. As such, they have been likened to decathletes, in that they possess multiple pre-adapted abilities, all of which are required to form a new colony in a distant tissue [6]. This view explains metastatic inefficiency, as it highlights the low probability of Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. acquiring all the specific genetic mutations necessary to achieve the intermediate goals en route to successful metastasis [7,8]. However, as pointed out by Bernards and Weinberg [9], this picture contains a conceptual difficulty concerning the microevolution of such cells. The acquisition of these abilities would appear to endow the cells with no selective advantage in the primary tumour, and, as such, one might expect the sub-population of fully-fledged metastatic cells to be vanishingly small. The resolution put forward by Bernards and Weinberg posits pleiotropy, namely, that genetic mutations enabling metastasis might also provide selective advantage in the primary tumour, but this view is then hard to reconcile with metastatic inefficiency. An additional and important piece of information is the observation of large numbers of circulating tumour cells (CTCs) and disseminated tumour cells (DTCs) in cancer patients with no discernible metastatic disease [10][11][12], which provides a challenging backdrop in which to understand metastatic inefficiency. Clearly, there is scope for a more precise and quantitatively based understanding of metastasis, the first steps of which we hope to provide in this paper.
Given the large number of CTCs in the blood stream and DTCs in the bone marrow and other secondary sites of cancer patients, one can posit a large number of individual attempts at metastatic colonization occurring over time, most of which are unsuccessful; so that the inefficiency of metastasis can be approached as a problem in the statistics of rare events. For the purposes of this paper, we delineate such rare events into two types. First, rare genotypes, as considered in the mainstream paradigm; it being assumed that there exists only a minute subset of colonising cancer cells capable of successful metastasis. Second, rare dynamics, which is the concept we wish to explore in depth in this paper; in which we posit that highly unlikely colonization dynamics can arise from the statistical fluctuations of birth and death of otherwise poorly adapted seeding cells. A helpful metaphor to understand the difference between these two types of rare events is the use of special forces versus infantry to achieve a military mission. 'Rare genotypes' corresponds to a highly trained yet small group of soldiers who each have the specialized skills to accomplish the mission. 'Rare dynamics' corresponds to a large number of poorly trained infantry, none of whom have the specialized skills required, but each of whom has a very small but non-zero chance of accomplishing the mission. We will study these rare events in the context of the final stage of metastasis, that of tissue colonization, which is arguably the most poorly understood step of metastasis, and yet observed to be the highest barrier for DTCs to overcome in forming a new tumour [2,4,5]; in mouse models it is observed that the vast majority of metastatic cells readily die once they reach a distant tissue site and attempt to colonize it [13]. We propose a simple birth-death model of stochastic colonization which will allow us to quantify the temporal statistics of rare dynamics. The results arising from this model are counterintuitive at first glance, and provide a fascinating duality between colonization arising from rare genotypes and rare dynamics. We then proceed to quantify the relative likelihood of colonization from these two different sources, based on human cancer data to hand, and show plausibility that metastatic colonization can equally well arise purely by chance, not requiring mutations of key genes.

Methods
We are interested in the probability and dynamics of metastatic colonization, namely, the process by which a DTC, having successfully become located in a secondary tissue environment, begins proliferation to attempt to create a new tumour (a micro-metastasis). We will approach this problem as a stochastic rather than a deterministic process, namely, we ascribe to the founding cell a rate (probability per unit time) of cell death μ, and, similarly a rate of cell division r. The key parameter in the model is the ratio of these two rates, which we denote by θ μ = r / , and which we term 'fitness'. It is very important to understand that the fitness is a property of a given cell in a given environment (i.e. not a property that can be ascribed to a cell per se). Later in our analysis, we will consider heterogeneity of fitness in the DTC population in order to discuss how our model relates to Pagetʼs classical seed and soil hypothesis [6], which is used to explain to correlation of primary and secondary sites across different cancer types. Given that we are interested in the early stages of secondary tumour growth, we can assume that the birth and death rates are independent of the number of cells in the colony (i.e. the growth and death rates are density-independent). Thus, the stochastic process defining our model is a linear birth/death process [14]. We assume that the realizations of this process, i.e. the seeding events occurring in secondary sites by DTCs, are statistically independent. We also assume that progeny cells have the same fitness as the founding cell. This assumption will clearly break down, with interesting consequences, for a long-lived colony of cells, but will be a good approximation for the nascent micro-tumours of interest here. We note here that there is prior work on applying stochastic processes to tumour growth, but these have all been concerned with growth arising from high fitness cells in a microevolutionary framework [15][16][17].
The fitness parameter θ encapsulates all the factors affecting the proliferative potential of single cells in a particular microenvironment (vulnerability to apoptosis, interaction with stroma and immune system, etc). For cases when θ > 1, a founding DTC is more likely to divide than die, as are its progeny, and so it is likely that a tumour will arise, following an approximately exponential growth trajectory. Conversely, if θ < 1 cells tend to die more often than dividing, and the overwhelmingly most likely outcome of each seeding is that the nascent secondary tumour will rapidly succumb to extinction. However, rare events can occur in which a nascent colony can achieve a critical size, which, as we describe shortly, we define to be quasi-stable. A primary concern in this paper is to calculate the absolute probability and associated temporal statistics of these rare dynamical events.
If cells remain poorly adapted to their environment, with θ < 1, every attempt to form a sustainable tumour will ultimately fail given long enough time, since zero cells is the only absorbing state for this stochastic process, as described. Biologically, there are at least two ways in which a fragile tumour arising from a low-fitness founding cell can evade extinction. First, by reaching a critical size N that affords a less hostile internal microenvironment to the cells. In a tumour of this moderate size, cells internal to the tumour, which are not in direct contact with the hostile microenvironment, will have a higher effective fitness, θ′ > 1, rendering the tumour quasistable and allowing essentially deterministic growth thereafter. Second, given that cancer cells are genetically unstable, each division event carries a chance to produce daughter cells of a slightly different genotype, which after several generations provides the possibility, through microevolution, of adapted cells arising in the tissue niche. In either case, there is a critical threshold that new colonies have to surpass in order to become stable. The first (microenvironment) case requires a critical colony size N to be reached, while the second (microevolution) case requires a critical number of division events to allow significant diversity to arise and thus the possibility for adaptation. We shall address here the first case, assuming cells to retain the fitness of the founding cells, which already provides rich and interesting results. The statistical properties of the second case are more complicated and are not within the purview of this paper. In brief, the basic assumptions of our birth/death model are: • All colonization attempts start with single DTCs and are independent of each another. • All cells within a nascent tumour have the same fitness.
Dynamical adaptation and microevolution are not considered. • Cell divisions and deaths occur stochastically according to Poisson statistics. • Only small nascent metastases, not limited by nutrients, are considered. As such, the rates of cell division and death are assumed independent of the cluster size. • For metastases arising from a low-fitness cell, there is a critical size N at which nascent tumours render a new internal microenvironment, shielding inside cells from the deleterious effects of the external tissue, and stabilising the stochastic growth.
These assumptions are sufficient for us to formulate this model as a stochastic process which can be analysed using a linear master equation, as described below in some detail. At first glance one would imagine that a simple model such as this would have been studied in detail decades ago. To our knowledge this is not the case, although we note that Basan et al [18] studied a non-linear version of this model in a biomechanical study of metastatic growth, wherein forces within the host tissue act to oppose tumour growth until a critical mass is achieved. Similar models of stochastic colonization have been studied in island ecology, with the biological assumption that new individuals are pre-adapted to the new habitat (θ > 1), and thus most likely proliferate [19]. The interest of such studies resides in calculating the statistics of small population size extinctions wiping out strong, but fledgling colonies. This reasoning has also been applied to stochastic re-colonizations of pre-adapted metastatic cancer stem cells during treatment cycles [17].
We use well-known analytical methods of stochastic process theory [14] to construct and solve our model. Solutions for temporal statistics of interest can be calculated exactly, albeit after lengthy calculations, using generating functions and Laplace transform techniques. Furthermore, our results are fully supported by numerical calculations and simulations. These simulations provide us with explicit rare event realizations that are helpful in guiding our understanding of some of the rather counterintuitive results that we report.
First, we consider the colonization process arising from a population of DTCs with the same fitness θ. These results will be used later when considering heterogeneity. We define the function P n t ( , ) to be the probability that the nascent tumour has n cells at time t, given that there existed a single cell at time t = 0. This probability increases when a division event occurs in the − n ( 1)-state or a cell death event occurs in the + n ( 1)-state. Likewise, P n t ( , ) will decrease when division or death events occur in the n-state. The rate of change of this probability function is given by the master equation: defined for ⩾ n 0 with the understanding that and the single-seeding initial condition δ = P n ( , 0) n,1 . We have scaled time by the cell death rate μ for convenience.
In the terminology of stochastic process theory, n = 0 is an absorbing state. Since our primary interest is in low-fitness cells (θ < 1), and we assume the existence of a critical size N, we will study the statistics of first passage by imposing a second absorbing state at n = N [20]. This is accomplished by limiting equation (1) to ⩽ ⩽ − n N 0 ( 2), and disallowing death events from the state n = N: It is helpful to introduce δ Q t t ( ) N as the probability of reaching the critical size N in the time interval δ + t t t ( , ), from which we have For fitness values θ > 1 the existence of N is of little biological relevance, but one can still ask the corresponding statistical question: what are the first-passage temporal statistics for colonies arising from pre-adapted cells reaching size N? As such, we will use the same model to discuss nascent tumour growth for colonies arising from both low and high fitness founding DTCs, since the details of the calculations are insensitive to the size of θ. Solving the + N ( 1) differential equations (1)-(4) yields P N (t) and Q N (t), from which all statistics of successful colonization can be derived. In particular we are interested in the following three quantities: (i) The probability of successful establishment of a colony of size N: The average time to reach size N in the subset of successful realizations: The variance in the time to reach size N in the subset of successful realizations: The Laplace transform method is used to solve equations (1)-(4) in order to obtain Q N (t), and then calculate these statistics. Because our primary interest is in non-preadapted cells (θ < 1) we shall present some results in a form that it is most appropriate for this case, but the derivation and general results are valid for arbitrary θ values.

Probability of establishment
We provide details of the calculation of various statistical quantities, starting with the probability of establishment of a micrometastasis of size N. By definingP s ( ) n to be the Laplace transform of P n (t), the master equations and boundary conditions take the form of the following + N 1 algebraic equations:ˆ=ŝ Equations (11)-(14) form a closed set of − N 1 equations and can be solved explicitly.
We now define the set of functions H s { ( )} n via the recursion relation  (17) and eliminatingˆ− P N 2 using (14) and then (5) we obtain a simple expression forQ s ( ) It is subsequently convenient here to work with the set of and using (16) we get the recursion relation: Now, using the definition of the Laplace transform, the integrals of Q N (t) with weights 1, t, and t 2 are respectively given byQ (0) Therefore the probability of successful establishment (equation (6)) is Substituting in (22) we have our first main result, viz. the probability of establishment is given by:

Mean and variance of colonization times
The temporal statistics can be obtained in a similar manner. From (21) we have that Differentiating (20) with respect to s and setting s = 0 one finds n n n 1 2 n Solving by iteration for ′ J (0) n and using equation (27) gives an exact solution for the average time of colonization: Following the same procedure, in solving by iteration for ″ J (0) n , one finds after a lengthy calculation an exact solution for the variance in the time of colonization:

Asymptotic results for large N
We are interested in micrometastases of moderate size, so it is useful to extract the asymptotic form of the temporal statistics for ≫ N 1. Equation (26) gives the probability of establishment of a tumour of size N for any fitness θ. For θ < 1, but not too close to unity (i.e. θ − ≫ N 1 1 / ), this equation takes the simple exponential form: Quite intuitively, the probability of establishment decreases exponentially for increasing N. For example, a micro-tumour of 20 cells with fitness of θ = 0.5 has a probability of forming ∼ ∼ − P (0.5) 10 20 20 6 . Hence, one in a million progenitor cells will achieve this size for this fitness level.
The expressions for the average and variance in colonization times are too complicated to easily infer the dependence on the fitness θ and critical size N. By recasting the summations as integrals and performing an asymptotic analysis for ≫ N 1 it is possible to extract the leading behaviour, which turns out to be quite simple. In these calculations, it is convenient to use the integral representation in order to explicitly perform the summations, which have the form of a geometric series. For the average colonization time we find, for θ < 1 and not too close to unity, . is Eulerʼs constant [21]. Because of the logarithmic dependence of the average colonization time on N, we can rewrite this expression as a simple exponential growth law: eff . Thus, we find that the average growth dynamics is exponential with a rate anti-correlated to the fitness. In other words, the less fit the cells are, the more aggressive the colonization dynamics of (rare) successful tumours will appear to be. On first reading, this result is counter-intuitive. One might imagine that the rare dynamics leading to colonies of low-fitness cells would be slow growing, relying on repeated chance avoidance of extinction. Despite the huge combinatorial weight of such meandering trajectories, the statistically relevant successful trajectories are those which grow rapidly, relying on repeated birth events and a relative absence of death events. This extreme bias becomes ever more important as θ decreases, hence the more rapid successful trajectories for cells of lower fitness.
Similar counter-intuitive results of this type are known in disparate fields, such as the 'bold play' strategy in gambling (successive 'all or nothing' gambling on high odds games is an optimal strategy for rare but large wins) [22], and transition rate theory in chemistry to quantify spontaneous dissociation of chemical bonds (stronger bonds break rarely but explosively). The Freidlin-Wentzell theory of large deviations provides a general mathematical framework to further understand such rare event dynamics, including the timereversal duality discussed below [23]. We are obliged to ask whether this average time is a meaningful statistical measure. Perhaps the fluctuations in successful realizations are so great that the average time is a statistical oddity, with little bearing on the true nature of the dynamics. The variance of the distribution is informative in this regard.
Performing an asymptotic analysis of equation (30), and after a lengthy calculation, we find that the variance in colonization times is given by where the Riemann Zeta function ζ π = (2) /6 2 [21]. Here we see that for large N the variance is independent of the critical tumour size and decreases with θ. This means that successful colonization trajectories are effectively more deterministic (i.e. less noisy) for lower fitness cells than for higher fitness cells. This is consistent with the preceding discussion, in that the statistically dominant growth trajectories of low-fitness cells are strongly biased to repetitive birth events, and will thus have a low degree of statistical variation.
In the limit of extremely low fitness, θ → 0, it is relatively straightforward to calculate the entire distribution of colonization times for ≫ N 1, through a general solution of the generating equation (20). One finds that the colonization times are distributed according to the Gumbel distribution (well-known in extreme value statistics), centred at * = t N log (which, according to equation (33), is also the average colonization time in this limit). Explicitly, one finds for the distribution function of colonization times: This result suggests that the colonization time is drawn as an extreme value in samples of random variables with exponential distribution [24].

Numerical verification and stochastic realizations
Despite the existence of exact solutions and asymptotic analysis, given the counter-intuitive nature of the results it is reassuring to check them against a numerical integration of the original master equation (1) with the boundary conditions equations (2)-(4). Integration was performed with a 2nd order Runge-Kutta scheme, and equations (6)-(9) were used to evaluate the statistics of interest. In figure 1 we compare the numerical solution to the exact results for the probability of establishment (26), the average time of colonization (29), and the standard deviation of the time of colonization (30). In each case we have perfect agreement as expected. We also compare a numerical evaluation of the exact results with the asymptotic analysis (for large N) in figure 2 for the average time of establishment (33) and the standard deviation of the time of colonization (35), and agreement is found, improving as N increases. Numerical generation of stochastic realizations of the birth-death process are useful to get better intuition for the actual realizations of successful colonization events. This was achieved by implementing the Gillespie method [25] to generate many realizations of the birth/death stochastic process corresponding to the master equation (1). Results for the mean and standard deviation of establishment times are supported by these numerical results as shown in figure 3. Furthermore, we show some representative realizations of colonization for cells of two different fitnesses (figure 4) demonstrating explicitly that successful events arising from lower-fitness progenitors are indeed typically more explosive and deterministic in their dynamics than those for higherfitness cells.

Dualities
Given that our calculations for temporal statistics are valid for any value of θ, it is interesting to compare our results for 'rare dynamics' (i.e. rare successful trajectories arising from cells with θ < 1), with results for 'rare genotypes' (i.e. successful trajectories arising from rare cells with θ > 1). So, we consider θ > 1 in the exact expressions for the growth statistics given in equations (26), (29) and (30). Taking the limit ≫ N 1 we find that N N Thus for colonization from high-fitness cells, the fraction of successful colonies is of order unity as one would expect. Analysis of equations (29) and (30) shows the exact mapping: For ≫ N 1, using the above relations with the asymptotic results for the temporal statistics found in the low-fitness regime, we have for the high-fitness colonization-time statistics: Interestingly, comparing the leading terms in these expressions with those for the low-fitness colonies, equations (33) and (35), we can see that the expressions are identical if eff for θ > 1. The two solutions are 'dual' to one another for large N: the growth of a rare successful low-fitness colony, of fitness θ < 1, is statistically indistinguishable from the growth of a high-fitness colony of fitness θ − > (2 ) 1. This duality is illustrated in figure 5 where the exact expressions for mean and standard deviation of colonization times are plotted as a function of θ for θ ∈ (0, 2).
Relating this mathematical result back to the biological context of metastatic colonization leads to the remarkable conclusion: the dynamic progression of micro-metastases at early stages for rare dynamics and rare genotypes are indistinguishable. If one were to observe, in vivo, the early exponential growth of a micrometastasis, our results show that one would not be able to infer whether that event was seeded by a pre-adapted or non-pre-adapted cell. A distinction does occur for colony growth beyond the critical size N. Such colonies seeded by low-fitness cells are highly unlikely to sustain the rapid subsequent growth, and would enter a quasistable state in which adaptation and microenvironment reconfiguration will presumably enable further, and possibly slow, growth. By contrast, in colonies seeded by fit cells the critical size N is of little relevance, and exponential growth would presumably continue until access to nutrients becomes a limiting factor.  ( , ) gives a normalized two-dimensional histogram, indicating the probability distribution of stochastic trajectories. The vertical axis represents the likelihood that a trajectory will hit the state n at time t. The surface crest denotes the most likely trajectory while the spread indicates the size of statistical fluctuations in the ensemble of trajectories. As fitness increases, typical trajectories take longer to reach the target, and histograms become broader indicating less deterministic dynamics.
A duality of time-inversion is also implied in the exponential growth dynamics of non-pre-adapted cells given in (34). An initial tumour of size N comprising cells with fitness θ < 1 would most likely decay exponentially, with an effective decay rate of θ − 1 . The dynamics would be the timereversed dynamics of our rare event solution, a result reminiscent of reversible solutions in the Freidlin-Wentzell theory of rare event dynamics. Yet another duality exists, in fact, if one considers the dynamics of an initial tumour of size N comprising high-fitness cells (θ > 1), and asks for the temporal statistics of the rare event that such a tumour vanishes. The probability of such an event will be exponentially small, and will typically occur following an exponential decay, for the (time-reversed) reasons discussed at length above. This result implies that spontaneous remission of small tumours comprising fit cells, although rare, will occur rapidly and deterministically, an effect similar to the rare event study here, that would be interesting to pursue in future.

Heterogeneous populations
It is crucial to recognise that the fitness of a cell attempting colonization depends on both the cell phenotype and the nature of the secondary tissue. Clearly we expect significant heterogeneity in the millions of colonization initiation events happening during the progression of cancer, due both to the heterogeneity of cells leaving the primary tumour, and the heterogeneity of secondary tissue environments in which colonization is attempted. During later stages of metastasis genetic instability, adaptation and microevolution will induce cell heterogeneity within single metastatic tumours. Here though we are only concerned with the very early steps of metastatic colonization. Heterogeneity can then be described by introducing a distribution of fitnesses in the population of progenitor cells (i.e. those single DTCs attempting to proliferate to form a new colony). We denote this distribution θ P ( ) het . Since this distribution involves properties of both DTCs and secondary tissue sites, it describes the attempted events of colonization throughout the body over the time course of metastatic disease.
For larger values of fitness, θ P ( ) het will be a rapidly decreasing function of θ because progressively fitter (or better adapted) cells should be progressively more rare in order to account for metastatic inefficiency. In this regard, it is helpful to introduce ϵ as the fraction of pre-adapted cells in the population: The overall probability distribution for successful colonization events in the body is then given by the product of θ P ( ) N and θ P ( ) het . We will denote this distribution by P succ : where we have used equation (26). It is convenient to rewrite this expression as   too unfit. Therefore θ 0 represents the optimal balance between the rarity of initial phenotypes and the rarity of their corresponding dynamics to maximally contribute to nascent metastases. As such, if θ < 1 0 , the colonization process is dominated by rare dynamics ( figure 6(a)), whereas if θ > 1 0 colonization is dominated by rare genotypes (figure 6(c)). We can use the condition θ ≡ 1 0 to define a phase boundary of the parameter space. This can be written as the condition: Combining equations (45) and (46), we find that the phase boundary condition provides a relationship between the threshold metastasis size N and the distribution of heterogeneity P het : To proceed further, it is necessary to make an explicit choice for the form of P het . Unfortunately this distribution is not available experimentally, but one can make a pragmatic choice nonetheless. First, the details of the distribution for low fitness values are not very important since, from equation (47), one will be differentiating the distribution at the threshold fitness value of unity. Second, due to the experimental fact that metastasis is a very inefficient process  [4], a reasonable assumption is that the tail of the distribution decreases exponentially. Such an exponential form could arise, for example, if accumulating discrete random events were required to endow cells with higher fitness values; an idea consistent with popular conceptions of metastatic potential of cells arising from successive mutations of key genes [26].
We therefore take P het to have a generic exponential form: The normalization condition (43) fixes where Γ z ( ) is the Gamma function [21]. The second integral condition (42) relates b and s to ϵ as follows .
Since high-fitness cells are rare, we have ϵ ≪ 1. This enables us to give an approximate solution to the transcendental equation (49): Note this solution is exact if P het is a pure exponential function (s = 1).
It is now straightforward to insert the explicit form of θ P ( ) het (equation 48) into the condition (47) to derive a relationship between N and ϵ. Taking = s ( 1) for simplicity we get: This relation demarcates the two domains in the ϵ N ( , ) parameter space as shown in figure 6(b). Specifically, rare dynamics dominates when the fraction of high-fitness cells is small and the critical size N is not too large. The impact on values of N defining the boundary is only weakly (logarithmically) dependent on ϵ.

Geometrical considerations
Statistical arguments to follow will allow us to estimate relevant sizes for N. First though, in this subsection, we provide here a simple argument for the lower limit on the size of a nascent metastatic colony which will allow a protective microenvironment for non-surface cells within the cluster. This argument is mainly illustrative, relying purely on geometrical considerations. The true efficacy of a microenvironment depends on many factors such as the biochemical and signalling milieux of the tumour, as well as the morphology of its constituent cells. In addition, for the cluster to be stable, the rate of division of core cells must be high enough to at least replace surface cells which are lost to cell death. This balance will depend on the relative fitnesses of core and surface cells, which is a biological aspect we do not account for in the geometrical arguments below.
Assuming that cells in the metastasis are spherical and of identical sizes, the minimal number of cells required to shield one single core cell is 12, since this is the 'kissing number' for spheres in three dimensions [33]. If indeed a surface of 12 cells provides a sufficiently beneficial microenvironment to the single core cell, one can assign this core cell a high fitness. The subsequent dynamics are, however, not particularly robust, since the progeny of a single high-fitness cell are at significant risk of stochastic extinction. A more robust core would require more than one cell. Using simple geometrical considerations, and assuming a large cluster, one finds that the number of surface cells N s required to cover N i internal core cells is given by the relation where e is the packing efficiency, a value that typically varies between ∼ e 0.52 for randomly packed spheres and ∼ e 0.74 for Gaussʼs optimal sphere packing. According to this formula a core comprising 2-10 cells (allowing a more deterministic growth dynamics) would require approximately 25-50 surface cells ( figure 7). A more accurate estimate can be gained from numerically packing spheres according to face-centred-cubic close packing, which produces somewhat lower estimates that equation (52). In this case 20-40 surface cells are required to enclose 2-10 core cells. Thus, on geometric grounds, the minimum stability threshold for a protometastasis, with greater than one core cell, is around ∼ N 20 -50 cells.
It is possible that the stability threshold could be smaller if surface cells are afforded some degree of protection also. Since cells are likely to adapt their morphology to that of their neighbours, via adhesive interactions, one can think of surface cells in the tumour forming a contiguous surface layer having only one facet in direct contact with the secondary tissue. This partial screening may be sufficient to raise the fitness of surface cells above unity, in which case all cells in the protometastasis (both core and surface) would be capable of deterministic growth. In this case, the minimum stability threshold might be as small as approximately 12 cells.

Connections to human cancer
To proceed further, in discussing the relevance of our theoretical results, it is necessary to make closer contact to human cancer, and infer what we can from the dearth of empirical data on the dynamics of metastatic colonization in humans. Indeed, little is known about the properties of DTCs in human cancer. Even measures such as the typical rate of shedding of cancer cells from the primary tumour are poorly known. As such, physiological parameters describing the number and diversity of DTCs will be subject to high uncertainties. Fortunately, for our purposes, it will transpire that only the logarithm of these parameters will be relevant to calibrating our model, so that approximate orders of magnitude in the physiological parameters will suffice.
For concreteness, consider individual DTC colonization attempts occurring over a period of one year in a hypothetical cancer patient who is at risk of metastatic disease, but not in late stage disease from previous successful metastatic colonization. The total number of such attempts, which we denote by M, will be very large, ranging from tens of thousands to hundreds of millions. The typical concentrations of CTCs found in patients with metastasis disease can be estimated at about one to ten cells per millilitre of blood [11,27], which corresponds to about 10 4 -10 5 total CTCs in the human body at any given time. Although a fraction of CTCs will die in the bloodstream, we expect that a significant fraction are rapidly arrested and extravasated from the vascular system as shown by Luzzi et al [4] (note, this is an extrapolation from a mouse model), and most of them are not in the blood stream for more than 24 hours [11,28]. Thus, over a one year period we can expect approximately 10 6 -10 7 different DTCs to be attempting colonization. This estimate appears low when compared to other experimental sources. For example, Butler et al [29] showed that approximately one million cells per day are shed per one gram of tumour in a rat mammary carcinoma model. If this figure were translated to humans, a one gram tumour would yield over two orders of magnitude greater number of DTCs than quoted above. A lower bound for this number comes from studies of DTCs in bone marrow of prostate tumour patients. In such cases it is estimated that 10 4 -10 5 DTCs may be in evidence, most likely in a dormant state [30]. Given this great uncertainty, for our purposes we take M in the range 10 4 -10 8 .
Assuming that rare dynamics is the dominant mechanism for colonization, we can crudely estimate a range of values for N as follows: let m be the number of colonization attempts in the relevant range θ Δθ ± 0 0 where, ultimately, successful events are likely to emerge. It is not possible to estimate systematically what fraction of the M attempts will contribute to m, so given the enormous range of uncertainty in M, we will assume that the range of m is equally large and uncertain. We denote by K the number of m attempts resulting in a metastatic colony. The great inefficiency of metastasis implies that successful colonization events are exceedingly rare [4] and as such K will be orders of magnitude smaller than m. From the definitions given above we have the direct Assuming that θ 0 is not too close to unity, equation (31) gives , which together with (53) allows us to express the stability threshold N in terms of the other quantities, viz.
where we have used in the second step the fact that ≫ m K. On varying m within the range 10 4 -10 8 and θ 0 within the range 0.25-0.85 one finds that N takes a modest range of values, between 10 and 100. The fact that the range of N is relatively well-defined given the enormous uncertainty in the number of attempts and the fitness of DTCs is due to the logarithmic dependence of N on these parameters.
There is a remarkable concurrence of this size range with both geometrical reasoning and biological observations. As discussed in the previous subsection geometrical estimates on the size of a tumour which would enable protection of the core from a hostile microenvironment yields a range for N of about 20-50 cells. Observations of induced metastatic colonization in zebrafish reported that clusters as small as 15-30 cells were able to induce angiogenesis for subsequent growth [31]. There have also been reports of stable metastatic cell clusters in mice of size 30-60 cells [32]. We believe the congruence of biological observations with geometrical considerations and our rare event analysis in identifying 10-100 cells as a threshold size for stability argues for closer experimental examination of such minute clusters, which we term 'proto-metastases'.

Absolute probabilities and timescales for colonization
We have used equation (54) to determine the possible range of N given crude estimation of physiological ranges of θ 0 and m, finding this range to be 10-100. We can take a different strategy and use (54) to provide a range of values of θ 0 assuming a value for N motivated by biological data, e.g. N = 30 as discussed above. Thus, taking N = 30 in (54), and the physiological range ∼ m 10 4 -10 8 , we find that θ 0 takes values in the rather confined range 0.55-0.75. The absolute probability of colonization for a given event will be trivially given by K/m, and thus will lie in the large range of 10 −8 -10 −4 , which simply reflects the large uncertainty in the number of DTCs. Using the range of θ 0 and N = 30 we can estimate the average time for a given successful colonization event using the leading term of equation (33). This equation gives the average time in units of the mean time (the inverse of the Poisson rate) of cell death, which we crudely estimate for non-dormant DTCs to be of order one day. One then finds 〈 〉 t 30 in the range 7.5-13.5 days, i.e. between one and two weeks. This time-scale of initial growth of a successful protometastasis is rapid compared to the time interval separating successful events, and relatedly, the typical time-scale of years over which human metastasis clinically progresses.

Conclusions
We have presented results arising from a linear birth-death process designed to interrogate the notion that metastatic inefficiency in human cancer may be due to rare dynamical events rather than rare pre-adapted cells. This is a fundamental issue in interpreting and combatting metastatic disease: does colonization of secondary tissue sites arise from deterministic growth of rare pre-adapted cells or from rare stochastic dynamics of common non-pre-adapted cells-the special forces versus infantry in our military metaphor. We have shown, in the context of rare events, that rare dynamics is equally plausible to rare genotypes, and that the dynamics of emerging colonies from each mechanism are, in their early stages, statistically indistinguishable. Despite the paucity of relevant human data, we have been able to provide estimates on key aspects of rare dynamics coloniation-such as the critical colony size N and the mean time of growth-due to the logarithmic dependence of these quantities on the poorly known physiological parameters. Our estimate for N, being in the range 10-100 cells, is, remarkably, in the same range as independent estimates from geometrical considerations and observations of proto-clusters in mice and zebrafish models. Given the rare event nature of these concepts, direct experimental validation is challenging. The following two experimental avenues might provide such validation. Rare colonization events leading to proto-metastases could occur in proximal stromal tissue surrounding the primary tumour, as well as in distant secondary sites (which has been the focus of the main article). Exhaustive study of high resolution histology sections of such tissue removed along with a primary tumour in a cancer patient would be a means of identifying these stable nascent colonies. If our model is correct one would expect to see a number of tiny metastatic clusters of size 10-100 cells. If only pre-adapted cells are causative agents of metastasis, there will be no significance to this size range, and the size distribution of micro-metastases will show no signature in this range. Better would be high-resolution examination of tissue from secondary sites in a cancer patient with a small recently diagnosed primary tumour. For example, a biopsy from the lung or liver in a breast cancer patient who does not yet have a diagnosis of metastatic disease. However, there are serious medical and ethical concerns about obtaining such tissue samples. An in vitro alternative is an experiment inspired by the classic Fidler experiments [6,8]; to culture a large number of individual cancer cells from an in vivo heterogeneous tumour (preferably from a human patient), but in relatively unfavourable growth conditions. Presumably only a small number of replicates will yield a significant new colony. To determine whether these colonies are due to rare pre-adapted cells or rare dynamics, one would attempt to culture cells from these colonies singly under the same conditions. If a large fraction succeed, then one is observing 'special forces', if a large fraction fail, then one is observing 'infantry'. We must emphasise here that it is imperative that cells are removed from the first round of successful colonies very early on before adaptation has occurred, which can effectively transform 'lucky infantry' into special forces for the given metastatic milieu.
In summary, successful establishment of metastases from low-fitness cells is (i) exponentially rare, (ii) explosive, and (iii) effectively deterministic. As a result of statistical duality, if one were to observe the early dynamics of only successful realizations of colonization arising from low fitness cells, they would resemble perfectly the deterministic, exponential dynamics of pre-adapted cells (θ > 1), leading one to the erroneous conclusion that successful tumours arise from cells with a rare genotype that enabled them to have positive fitness in the secondary tumour site. This duality provides a clear warning in interpretation of metastatic growth statistics-rare dynamics can be as likely as, or more likely than, rare genotypes in causing colonization, and yet can appear phenotypically similar in early stages. Our results are in accord with the seed and soil hypothesis, in that fitness is a key parameter within our model, and fitness, as we have repeatedly stressed, is a co-function of a given DTC (from a given subclone of the primary tumour) in a given secondary tissue environment. The empirical correlation between primary and secondary sites can be interpreted in the context of the rare dynamics model in terms of particular environments providing DTCs with lesser degrees of exponential difficulty of colonization than others. While colonization from rare genotypes provides a rationale for searching for novel drug targets aimed at the particular genetic mutations or phenotypic adaptations that allow such rare cells to be pre-adapted to secondary sites, colonization from rare dynamics, occurring by chance, will require an entirely different paradigm of therapy, presumably more generic than targeted in nature. On the positive side, our results show that the rare dynamics is exponentially sensitive to both the critical cluster size N and the optimal fitness θ 0 . Therefore treatments that can increase N and/or lower θ 0 , even marginally, will have a significant impact in lowering the overall probability of successful colonization, directly leading to increased periods of an absence of metastatic disease. With particular regard to targeting the critical cluster size, it will be worthwhile to connect the purely demographic analysis in this paper to more detailed cell biological and biomechanical studies of the microenvironment, such as the model of Basan et al [18], in which the initial growth of metastases is hampered by homeostatic pressure from the surrounding tissue, essentially rendering all progenitor cells 'unfit', and leading to dynamics analogous to nucleation.
We end with three speculative remarks. First, although this paper has been concerned with metastatic colonization, this entire discussion could be recapitulated in the context of primary tumour initiation. Again, the standard model is rare genotypes (key successive mutations allowing a particular cell to clonally expand), whilst rare dynamics would describe a process of aberrant proliferation against the odds, which very rarely leads to a critical tumour size in which the nascent colony is less fragile due to defining its own new microenvironment. Second, whether one speaks of rare dynamics in terms of metastatic colonization or primary tumour initiation, the rare event statistics show that successful growth will be very rapid. This explosive growth may enable such fragile colonies to effectively evade immune response. We estimated a time scale of days for growth to a colony size of tens of cells, and it would be interesting to explore this time scale in the context of immune response. Third, and last, our analysis of colonization is not limited, in terms of the modelling, to cancer. Rare dynamics, as a mode for non-pre-adapted individuals to reach critical size in a hostile environment, could find application in infectious disease dynamics within the body, as well as in more traditional areas of modelling such as ecological colonization.