Many-body effects in tracer particle diffusion with applications for single-protein dynamics on DNA

30% of the DNA in E. coli bacteria is covered by proteins. Such high degree of crowding affect the dynamics of generic biological processes (e.g. gene regulation, DNA repair, protein diffusion etc.) in ways that are not yet fully understood. In this paper, we theoretically address the diffusion constant of a tracer particle in a one dimensional system surrounded by impenetrable crowder particles. While the tracer particle always stays on the lattice, crowder particles may unbind to a surrounding bulk and rebind at another or the same location. In this scenario we determine how the long time diffusion constant ${\cal D}$ (after many unbinding events) depends on (i) the unbinding rate of crowder particles $k_{\rm off}$, and (ii) crowder particle line density $\rho$, from simulations (Gillespie algorithm) and analytical calculations. For small $k_{\rm off}$, we find ${\cal D}\sim k_{\rm off}/\rho^2$ when crowder particles are immobile on the line, and ${\cal D}\sim \sqrt{D k_{\rm off}}/\rho$ when they are diffusing; $D$ is the free particle diffusion constant. For large $k_{\rm off}$, we find agreement with mean-field results which do not depend on $k_{\rm off}$. From literature values of $k_{\rm off}$ and $D$, we show that the small $k_{\rm off}$-limit is relevant for in vivo protein diffusion on a crowded DNA. Our results applies to single-molecule tracking experiments.


I. INTRODUCTION
Few doubt that molecular crowding has severe consequences for dynamical processes [1]. Interesting examples are living cells where macromolecular concentrations are large. Take the E. coli bacterium as an example. There, the concentration of proteins and RNA is about 300-400 mg/ml [2] which is 30-40 times larger than common test tube conditions [3]. There is overwhelming evidence that this level of crowding influences important biological processes such as gene regulation [4], enzymatic activity [5], protein folding [6,7], and diffusion of macromolecules [8,9]. In order to get a complete picture of the in vivo dynamics we must increase our understanding of the role of crowding, and recent experimental developments provide the means to do it.
In recent years, researchers have beat the diffraction limit and turned optical microscopy into 'nanoscopy'. Today's microscopy methods (e.g. STED, STORM and FIONA) [10] does not only allow us to image nanometersized biological structures, but recent improvements [11,12] also permit tracking fluorescently labelled proteins at the biologically relevant millisecond-scale. This is anticipated to shed new light on biological processes, as well as increase our understanding of particle transport in engineered nano-fluidic systems [13,14]. In order to properly interpret those type of experiments in vivo, we * Electronic address: ludvig.lizana@physics.umu.se need new theoretical and computational models in terms of physical properties of the intracellular space, the cytoplasm.
The cytoplasm is cramped with macromolecules and we are interested how this influences diffusion-controlled processes, a key component in many cellular functions (e.g. gene regulation). While our results are new, aspects of this problem has been studied theoretically before. For example, [15][16][17][18] investigate diffusion in the three dimensional cytoplasm and in gels, whereas [19][20][21][22] focus on the sub-diffusive motion seen in single-molecule experiments. Crowding is also important for DNA search processes where a searcher combines one and three dimensional diffusion to quickly find its target, so called facilitated diffusion. Facilitated diffusion under crowding is addressed in [4,23] which resemble this paper but we ask different questions: we calculate the diffusion constant of a tracer particle in terms of key properties of surrounding crowder particles rather than focusing on mean target finding times.
Much inspiration to this work comes from DNA binding proteins. Of particular interest is repair proteins (MutS and homologs) whose residence time on the DNA can be very long (∼ 10 min [24]). We are also inspired by transcription factors, the family of gene regulatory proteins. The yeast regulatory proteins LexA and Gal4 can stay bound to their regulatory sites for several minutes in vitro (LexA ∼ 5 min and Gal4 > 30 min) [25], but surprisingly, this number can be reduced up to 1000 times in vivo. Both classes of proteins have the ability to diffuse along the DNA, unbind to the three dimensional intra- 1: Schematic illustration of our model. All particles are diffusing with rate kD on a one dimensional lattice with lattice spacing a. The crowder particles (green) may also unbind and rebind to a random, or the same, lattice site with rates k off and kon, respectively. The tagged particle (orange) cannot leave the line (kon = k off = 0). cellular space, diffuse in space, and rebind to the DNA. We are interested in how the dynamics of those proteins change under crowding.
In order to better understand the role of crowding, we introduce a theoretical model where particles diffuse on a one dimensional lattice where two particles cannot occupy the same site ( Fig. 1 ). They diffuse with rate k D (same for all particles) and may unbind and rebind to the lattice with rates k off and k on , respectively. These rates are tuned so that the average particle line density is constant at 10-20% which is not too far from in vivo conditions (30% of the DNA in E. coli is covered by proteins). The unbinding rate for the tracer particle is set to zero similar to the long-lived protein-DNA complexes described above. Now we ask: What is the long time diffusion constant of a tracer particle in such a crowded quasi one dimensional system?
We answer this question numerically using stochastic simulations (Gillespie algorithm), corroborated with analytical results. The main results are Figs. 4-6 where we show how the diffusion constant changes as a function of our main parameter k off . Those results are applicable to single molecule tracking experiments [26].
The unbinding rate k off interpolates between two well studied limits. (i) When k off is large (compared to k D ), the tracer's mobility is only weakly lowered and diffuses close to as if it was free. (ii) When k off → 0, the particles diffuse with unchanged order in a single file. Single-file diffusion is well studied [27][28][29][30][31][32][33] where the most famous result is that the mean squared displacement of a tracer tracer particle is proportional to √ t rather t (t is time) which signatures non-markovian dynamics.
This paper is organised as follows. In Sec. II, we outline briefly the details of our model. Before showing the results in Sec. IV, we provide analytical estimates of the diffusion constant in Sec. III, based on a theoretical calculation found in Appendix A. In Sec. III we also briefly review the dynamics of the model at short, intermediate and long times. We close by a few concluding remarks in Sec. V.

II. THE MODEL
Our model has been used and explained elsewhere [34], but for completeness we summarise it briefly below. Consider a one dimensional lattice on which crowder particles (assumed identical) and the tracer particle diffuse (Fig.  1). The crowder particles can diffuse, unbind and rebind to the lattice. Rebinding occurs in two ways. Either to a random unoccupied lattice site (chosen with uniform probability), or to the exact same location. Both rebinding modes has been used to model transcription factor dynamics on DNA [35,36], and we will therefore consider both. The lattice constant is denoted a, and the diffusion rate k D is assumed equal in both directions and for all particles. Double occupancy is forbidden and a particle cannot overtake a flanking neighbor (single-file condition). Binding and unbinding dynamics of crowders are characterised by the rates k on and k off , which are chosen such that the particle line density is in equilibrium with the bulk, thereby keeping the average filling fraction is constant. In our simulations we keep it at 10-20%. We implemented the model using the Gillespie algorithm. See Appendix C for details.

III. ANALYTICAL ESTIMATES FOR THE LONG TIME DIFFUSION CONSTANT
Here we provide analytical estimates to corroborate and better understand the numerical results in the next section. We are mainly interested in the long time diffusion D constant for the tracer particle, defined as where x 2 (t) is the ensemble averaged mean squared displacement (MSD), and t is time. Notably, D is in general not equal to the bare, or free particle, diffusion constant It is a non-trivial function of k off and ρ. To simplify matters, we start by assuming that the crowder particles sit equidistantly on the line with density ρ, unable to diffuse (k D = 0), and rebind to the site from which they unbound. In this situation, the tracer moves back and fourth between its flanking neighbours and can only move past them if one of them unbinds. The average time until this happens is proportional to 1/k off . In point of view of the tracer this process is a random walk on an effective, or coarse grained, lattice with spacing and jump rate proportional to 1/ρ and k off , respectively. From this we expect that D ∼ k off /ρ 2 , and a more elaborate calculation shows that When crowder particles rebind to a random location rather than to the same site, the distance between two neighbouring particles fluctuate even though the average density is fixed. This leads to a larger effective lattice spacing, and a larger D compared to Eq. (3): When crowder particles also diffuse (k D = 0) the distance between nearest-neighbours becomes difficult to define. We estimate the coarse grained lattice constant as the length the tracer particle explores during a time intervall proportional to 1/k off . This leads to which has different k off -scaling than before. Equations (3)-(5) constitute our main analytical results.

B. Long time, large k off behaviour
When k D k off , crowder particles frequently unbind and rebind to the lattice and the no-passing condition is effectively violated. But, crowder particles still hinder the tracer thereby decreasing the diffusion rate. Imagine that the jump rate for a single particle to a neighbouring site on an otherwise empty lattice is k D , or D = a 2 k D . Then, when crowder particles are around, some of the jumps are canceled because the target lattice site may be occupied. In that situation the jump rate is reduced by the probability that the target lattice is unoccupied. For very large k off this probability is simply 1 − aρ, therefore This mean field result has been obtained before [20,37,38], and as k off /k D is close to or smaller than unity, corrections to this formula becomes prominent (see Fig. 5).

C. Interpolation formula for D
Based on the expressions above, we propose a simple formula for D valid for all k off : Here D large k off is Eq. (6) whereas D small k off is one of Eqs.
(3)-(5) depending on the case under study. Equation (7) is appealingly simple and captures properly the small and large k off limits, but it should not be viewed as more than a candidate expression for D. We have not made a systematic attempt to find the best form of D, and leave it for future research.

D. How is the long time asymptotics [Eq. (1)] approached?
Here we clarify the meaning of short, intermediate and long times within of our model. To keep the discussion simple, we consider k off , k on , k D , and ρ as constant. See also Fig. 3 which shows x 2 (t) as a function of time, where all relevant regimes are present.
At most we have three regimes of different behaviour. These are separated by the average residence time of the crowder particles τ off , and the average collision time τ coll , which is the time it takes for a particle to diffuse across the average nearest neighbour distance 1/ρ: Let us assume that there is a clear separation between these timescales and that τ coll τ off and that k D is the fastest rate in the system, 1/k D τ coll . In the first regime, t τ coll , the tracer diffuse as if it was free, since it has not yet collided with its nearest neighbours. This means that the tracer's MSD is x 2 (t) = 2Dt. In the second regime, τ coll t τ off , many particle collisions have taken place but particles diffuse with maintained order since they are unable to pass each other. This is the single-file diffusion regime which is characterised by Harris' law x 2 (t) ∝ Dt/ρ 2 [27]. Here, memory effects dominate and is the very reason to the sub diffusive behaviour. In the third regime, t τ off , particles start unbinding from the lattice which effectively violates the no-passing condition. In this regime we expect diffusive behaviour again x 2 (t) ∼ t, but with a diffusion constant different from D, denoted by D [see Eq. (1)]. This is the one we wish to calculate, in particular in terms of our key parameter k off . Note that the second regime can be erased completely if we lower τ off such that τ off ≈ τ coll (or smaller). Similarly, the third regime is absent if unbinding is not allowed, i.e. k off = 0 (or τ off = ∞). In most of our simulations, diffusion is the fastest process in the system which also is the likely scenario a biological cell (see Sec. V). To sum up,

IV. RESULTS
In this section we present results from stochastic simulations of the model outlined in Sec. II, together with of our theoretical findings from Sec. III. The simulation details can be found in Appendix C. First, we show the tracer particle's MSD as a function of time, from which we extract the long time diffusion constant D. Second, we investigate D separately for large and small k off . Finally, we compare our proposed interpolation formula Eq. (7) to the full range of k off values.  Figure 2 shows the MSD when crowder particles do not diffuse but only unbind and rebind. They rebind either always to the same site (upper panel), or to a randomly chosen site (lower panel). The short time behaviour in both plots is independent of k off and is well represented by 2Dt (upper dashed dark blue line). The long time behaviour is, however, strongly dependent on k off , which is evident from the broad scattering of curves. The MSD is still linear in time but the diffusion constant (proportional to the extrapolated intersection with the vertical axis) depends strongly on k off . The linear regime sets in when t ≈ τ off as is seen from the shorter vertical dashed lines. If we increase the particle concentration, the shape of the curves remains the same but the scattering of curves increases, since D ∝ 1/ρ 2 (small k off ) and D ∝ 1 − aρ (large k off ).
In Fig. 3, crowder particles diffuse and rebind to a randomly chose site. As we lower k off the separation between τ off and the collision time τ coll increases, which means that the single-file regime ( x 2 (t) ∼ √ t) becomes wider. This is simply because crowder particles have not yet started to unbind from the lattice and therefore diffuse collectively in a single-file. We also see that the MSD curves for long times is less scattered than before, indicating that D is less sensitive to k off . This agrees with our theoretical prediction where D ∝ √ k off compared to D ∝ k off when crowder particles stand still.
/L 2 2D free t/L 2 k off = 10 −5 k off = 10 −4 k off = 10 −3 k off = 10 −2 k off = 10 −1 k off = 1 k off = 10 FIG. 2: Mean squared displacement x 2 (t) of the tracer particle as a function from time for different unbinding rates k off when crowder particles do not move. The crowder particles rebind in two ways: to the same site (upper panel), or to a randomly chosen site (lower panel). For shorthand we put k off = k off /kD andτ off = τ off kD. Simulation details: lattice constant: a = 1, tracer particle diffusion rate: kD = 1 (kD = 0 for crowder particle), filling fraction: aρ = 0.1, number of lattice sites: 501 (L = 501a), number of simulation runs: 9600.
case is plotted for two concentrations, aρ = 0.1 and aρ = 0.2. In order to better compare the three panels with the figures below we scaled the vertical axis with the large k off limit D(1 − aρ). In Fig. 5 we show explicitly how this limit is approached.
The two upper panels in Fig. 4, where the crowder particles are immobile are very similar to each other. If both cases would be depicted in the same graph, the data points would practically sit on top of each other. For clarity, we therefore separated the data into two figures. We see that the small k off behaviour agrees very well with the theoretical results, Eqs. (3) and (4).
4D free tπ k off = 10 −4 k off = 10 −3 k off = 10 −2 k off = 10 −1 k off = 1 k off = 10 The lower panel depicts when crowder particles diffuse on the lattice. Their movements lead to an overall increase of D for the tracer particle since they no longer act as static road blocks. This also changes the scaling with k off from linear in the two upper panels, to √ k off . The density dependence is also weaker (1/ρ compared to 1/ρ 2 ).

C. Large k off behaviour
When k off is much larger than the diffusion rate k D , we expect the mean field result Eq. (6) to hold. We also expect that corrections to this result becomes increasingly prominent as k off is lowered. Both are confirmed by simulations in Fig. 5, where we see that D/(1 − aρ) ≈ 1 for k off /k D 1, and D/(1 − aρ) < 1 for k off /k D < 1. These results validate the mean field argument leading up to Eq. (6) for our quasi one dimensional system. The figure only shows the case where the crowder particles rebind to the same location, since the behaviour at large k off is close to identical for all rebinding modes.

D. Interpolation formula
In Sec III we proposed Eq. (7) that ties together the small and large k off regimes. The comparison to the full range of k off is shown in Fig. 6 as solid lines (symbols are simulation results). Just as in Fig. 4, each panel shows: (top) immobile crowder particles and rebinding to the same lattice site, (middle), immobile crowder particles and rebinding to a random lattice site, and (bottom) diffusing crowder particles and rebinding to a random site. Overall, Eq. (7) is a good approximation for the whole range of k off . The deviations are largest in the transition region, roughly 10 −3 < k off /k D < 10 −1 , where the maximum relative error for all curves is 79% (top panel, aρ = 0.2). The relative error in the small and large k off tails is less than 7%.

V. SUMMARY AND CONCLUDING REMARKS
We studied the long-time diffusion constant D of a tracer particle in a one dimensional crowded manyparticle system. We found that D depends strongly on the unbinding rate k off of the surrounding crowder particles and density ρ. For small k off we made a simple theoretical model where we deduced that D ∼ k off /ρ 2 (to first order in 1/ρ 2 ) when crowder particles are immobile and only unbind/rebind to the lattice. The prefactor depends on how they rebind, either to the same or to a random site. When they also diffuse we obtain D ∼ Dk off /ρ 2 (to first order in 1/ρ), a different k off -scaling than before; D is the free particle diffusion constant. This means that D is less sensitive to k off and ρ when crowder particles are diffusing compared to standing still. For large k off , we found that all cases agreed with the mean field result D D(1−ρa), independent of k off . Our new expressions showed overall good agreement with simulations.
It is interesting to see which k off /k D regime we expect to find in the living cell. As mentioned in the introduction, residence times of DNA binding proteins vary from fractions of a second to up to an hour (unspecific binding is even shorter, >5 ms [39]). To get an order of magnitude estimate of k off /k D , let us assume that k off ∼ 0. [26], which gives k D = D 1D /(bp) 2 ∼ 10 5 − 10 7 s −1 . This means that k off /k D ∼ 10 −6 − 10 −8 , which clearly indicates that k off k D . The model we studied is inspired by protein diffusion on DNA. Our results are simple formulas for the diffusion constant of a tracer particle taking crowding and binding/unbinding dynamics into account. Although a protein is more complex than a hard-core particle, we hope that the simplicity of our results will find its usefulness in a range of settings, in particular, single-molecule tracking experiments.  In this appendix we outline the derivation for the long time diffusion constant D in the small k off limit that led to Eqs. (3)- (5). The main idea is to calculate the typical length scale l 0 that the tracer travels before being hindered by a crowder particle. In terms of l 0 , the long time diffusion constant is where τ is the typical waiting time until a successful jumping event. Since the diffusion rate k D is fast, the rate limiting step for the tracer particle to move is when a flanking crowder particle unbind from the lattice. We can therefore envision the tracer particle diffusion as a single particle diffusion process on a coarse grained lattice, with lattice constant l 0 and jump rate 1/τ . First we address τ , the average time until a successful jumping event. Imagine that the tracer particle is flanked by two crowder particles, and the time for any of them to unbind is 1/2k off . Now, say the that the tracer's right neighbour unbinds but the tracer anyway tries to jump left. This jump is forbidden, and so is in fact half of all tries the tracer makes. This implies that τ is 1/k off rather than 1/2k off . Moreover, we also consider rebinding of crowder particles, so even if the tracer move in the direction of the unbound neighbour it may anyway be blocked by another crowder particle. We must therefore correct the jumprate with the probability that the site is vacant, that is 1 − aρ (in equilibrium). In summary, we estimate τ as Second we turn our attention to the coarse-grained lattice distance l 0 . In short, we choose l 0 the standard deviation of the distribution of nearest-neighbour distances. Here is how we formally arrive at this result. Since it can happen that nearest and next-to-nearest neighbours are unbound simultaneously, the length that the tracer particle can move, z, can vary. We choose the probability distribution of z to be the probability that there is a separation z between two nearest neighbour particles. The distribution of z, ϕ(z), is known (see Fig. 7), but differs depending on the type of rebinding. If z only can change in discrete steps of ∆, that is ∆, 2∆, 3∆..., we can define a jump length distribution g(l) for the tracer particle in the coarse grained lattice as where δ(x) is the Dirac delta function. Now we choose l 0 as the standard deviation of g(l), and from Eq. (A3) one can show that where n 2 ϕ = ∞ n=1 n 2 ϕ(n∆). In the subsections below, we calculate l 0 explicitly for the special cases: (1) immobile crowder particles placed equidistantly, (2) immobile crowder particles placed at a random distance apart from each other, and (3) diffusing crowder particles.

Case 1: Immobile crowder particles placed equidistantly
In the simulations, the crowder particles sit equidistantly, and unbind and rebind with rates k off and k on , respectively. In order to make sure that the average density ρ is constant over time, we choose k on = k off , and work with 2m particles in the whole system (lattice + surrounding bulk). This means that m particles will on average be on the lattice and the density is ρ = m/(aN ), where N is the number of lattice sites, and a the lattice constant. The smallest separation between two crowder particles in this setup becomes ∆ = 1/(2ρ), and increase discrete steps of ∆: Each one of these lengths has a different probability, and the distribution of nearest neighbour distances is which agrees well with simulations (Fig. 7). Using that n 2 ϕ = 6 in Eq. (A4) gives 2. Case 2: Immobile crowder particles with rebinding to random locations In this case the crowder particles leave the lattice and return to a random vacant lattice site. This means that the smallest separation is the lattice distance of the original lattice, ∆ = a, and distances are in steps of a: a, 2a, 3a, . . . (A8) The inter-particle distance distribution in this case is which is corroborated by simulations in Fig. 7. In the continuum limit (small a), the distribution becomes exponential ϕ(na) ∼ e −|n|aρ . Using that n 2 ϕ = (2 − aρ)/(2a 2 ρ 2 ), we obtain

Case 3: Diffusing crowder particles with rebinding to random locations
Here all particles diffuse which drastically changes the situation. The main difference is that the tracer does not get stuck between two flanking road blocks since they also move. However, we know from simulations that the MSD for the tracer is in the long time limit proportional to Dt (Fig. 3), which is a direct manifestation that the nopassing condition is violated (otherwise we would have had MSD ∼ √ t). Altogether, this implies that there is length scale for the coarse grained lattice and a time scale associated with a jumping event.
For this case we cannot use ϕ(z) to estimate l 0 since ϕ(z) is the same as when crowder particles are immobile (see Fig. 7, and ), and gives the wrong result for D. The reason is that inter-particle distances fluctuate at the same rate as the tracer is diffusing, and those fluctuations increase D. In fact, even if k off = 0, the tracer particles still can move across the system, although slowly. We estimate l 0 as the distance the tracer particle explores in a time τ , that is Interestingly, l 0 depends on k off and not only ρ as in the previous cases. This changes the scaling of k off in D from linear in Cases 1 and 2 to √ k off for this case. This can also be understood from the following simple argument. The curve for x 2 (t) is continuous for all times, and at some time the dynamics changes behaviour from singlefile (∼ √ t) to regular diffusion (∼ t). This occurs around t ≈ τ , which implies The way we determine D from our MSD simulations, is illustrated in Fig. 8. First, τ off is the approximate time at which the MSD becomes linear (shown as vertical dashed-dotted lines). Second, we make a linear regression of the MSD curve starting from that point, and obtain the slope which equals 2D. The resulting fits are shown as dashed lines.

Appendix C: Numerical implementation
The model (Fig. 1) is implemented using the Gillespie algorithm [40]. The majority of the details of the implementation has been explained elsewhere [34], but below we point out some key differences.
We keep track of the unbound crowder particles in the bulk in order to have the option to rebind them at the location they detached from. In practice we use two lattices, one of which represents the bulk. The filling fraction is maintained at the level we want by setting k on = k off , and then let the systems equilibrate such that half the number of crowder particles sit in the bulk and the other half on the lattice we are interested in. This representation helpful to investigate all sorts of binding modes especially rebinding to the same location. In Ref. [34] the bulk served as an infinite particle reservoir and the concentration on the lattice was tuned via detailed balance (rebinding always occurred to a randomly chosen site). Here on the other hand, the bulk has a finite size and cannot be seen as a strict particle reservoir. However, since we use about 500 particles, fluctuations around the filling fraction aρ are so small that we rarely (if ever) deplete the bulk. This means that we have approximately a grand canonical ensemble.