Heterogeneous anomalous transport in cellular and molecular biology

It is well established that a wide variety of phenomena in cellular and molecular biology involve anomalous transport e.g. the statistics for the motility of cells and molecules are fractional and do not conform to the archetypes of simple diffusion or ballistic transport. Recent research demonstrates that anomalous transport is in many cases heterogeneous in both time and space. Thus single anomalous exponents and single generalised diffusion coefficients are unable to satisfactorily describe many crucial phenomena in cellular and molecular biology. We consider advances in the field of heterogeneous anomalous transport (HAT) highlighting: experimental techniques (single molecule methods, microscopy, image analysis, fluorescence correlation spectroscopy, inelastic neutron scattering, and nuclear magnetic resonance), theoretical tools for data analysis (robust statistical methods such as first passage probabilities, survival analysis, different varieties of mean square displacements, etc), analytic theory and generative theoretical models based on simulations. Special emphasis is made on high throughput analysis techniques based on machine learning and neural networks. Furthermore, we consider anomalous transport in the context of microrheology and the heterogeneous viscoelasticity of complex fluids. HAT in the wavefronts of reaction–diffusion systems is also considered since it plays an important role in morphogenesis and signalling. In addition, we present specific examples from cellular biology including embryonic cells, leucocytes, cancer cells, bacterial cells, bacterial biofilms, and eukaryotic microorganisms. Case studies from molecular biology include DNA, membranes, endosomal transport, endoplasmic reticula, mucins, globular proteins, and amyloids.


Introduction
At the micron and submicron scale, particles suspended in simple fluids (e.g. liquids and gases) experience substantial random fluctuations in their displacements that are driven by collisions with the fluid molecules which surround them that are in turn driven by thermal forces. The displacement fluctuations are often independent of each other and the central limit theorem implies that the statistical distribution of their displacements tends to a Gaussian distribution provided a sufficient number of displacements are considered. This is classical diffusion at the molecular scale and its experimental verification with small latex particles in solution led to a Nobel prize for J. B. Perrin in 1926. The experiments were based on the groundbreaking theories of A. Einstein (1905) and others (Langevin, Smoluchowski etc.). Mathematicians find that the central limit theorem applies to a wide range of basic probability distributions (and thus physical phenomena), but it can breakdown under some circumstances, particularly if the underlying distributions have fat tails [1]. Such phenomena are now known to have important implications in a broad range of fields including finance, epidemiology, soft matter, quantum physics, condensed matter physics and anomalous transport in cellular and molecular biology [2][3][4][5][6][7][8][9]. The later subject will be the focus of the current review.
Heterogeneous anomalous transport (HAT) sounds like an exotic phenomenon, but it is surprisingly common. It is the rule for the majority of molecules inside cells, the motion of individual cells and their collective motility. Anomalous transport of motile particles is defined broadly as the violation of any of the conditions of the central limit theorem. It is most prominently characterised by either non-Gaussian probability density functions for the particles' displacements and/or the non-linear power-law growth of the mean squared displacement of the particles on the time t, MSD(t) ∼ D α (r, t)t α(r,t) . (1) Constant anomalous exponents α (0 < α < 2) and the generalized diffusion coefficients D α are the basic parameters used to describe homogeneous anomalous transport. For α = 1 the motion is diffusive and for α = 2 the motion is ballistic. However, the dependence of α(r, t) and D α (r, t) on space (r) and time (t) in HAT gives rise to distinct phenomena. Equation (1) is insufficient to fully characterise HAT since different phenomena can produce the same behaviour of the MSD. Therefore, we will discuss other quantities which can be used to more fully characterise HAT in experiments. Furthermore, machine learning techniques are rapidly developing that allow HAT to be probed avoiding (1) which is prone to numerical errors, especially for short trajectories. Scaling laws, such as (1), have a long history in physics, including developments in soft matter physics e.g. how the size of a polymer chain depends on the number of monomers [10,11]. Fundamental reasons that physical laws often follow power laws are unknown and are probably unknowable [12]. Fractional powers are often a signature of fractal behaviour (with intermediate dimensionalities that are between integer values). Trajectories of diffusing particles are fractals, so it is perhaps no surprise that methods to quantify their behaviour (e.g. MSDs) also have self-similar scaling. HAT is thus closely connected with the study of multifractals, since heterogeneity causes the fractal exponents to vary. Multifractal analysis showed lots of initial promise in a broad range of research fields [13], but has not been extensively used in the interpretation of experiments due to challenges in making robust meaningful measurements. However, machine learning techniques combined with large high-resolution data sets are changing this situation [14].
Our main goal is to give an overview of the physical mechanisms which lead to homogeneous anomalous diffusion, provide a manifesto for their uses to explain transport phenomena in molecular and cellular biology (Section 2), and discuss the anomalous diffusion framework with an emphasis on its extensions to heterogeneous anomalous transport (Section 3). In Section 4 we explore the ever-growing number of applications of these methods to describe diffusion in heterogeneous media, along with experimental methods, microrheology, and measurable observables (Section 5). In Sections 6 and 7 we will focus on examples from cellular and molecular biology to constrain the review to a manageable size, although there is a much wider range of applications. Before we embark on this journey we discuss the situations where HAT should be expected in place of homogeneous anomalous diffusion and its implications in various fields.

Where and when to expect HAT
HAT occurs when the properties of anomalous transport change in space or time. The motility of particles can no longer be described by single constant anomalous exponents α and generalized diffusion coefficients D α in (1). Instead, every trajectory accrues a spaceand time-dependent anomalous exponent α(r, t) and generalized diffusion coefficient D α (r, t). Thus, HAT is characterized by distributions of anomalous exponents and generalized diffusion coefficients. These distributions can vary with the time and length scale considered.
The physical mechanisms which lead to homogeneous anomalous diffusion and the mathematical apparatus of fractional integration, which proved to be very useful for its description, are reviewed in many papers and books [1,[3][4][5][6][15][16][17][18]. A variety of physical effects can give rise to heterogeneous anomalous diffusion in molecular and cellular biology. Effects are primarily attributed to either space (molecular crowding, particle shape, particle size, caging and quenched disorder e.g., in glasses, gels and jammed systems) or time (localized effects of a physiological process, activity of molecular machines e.g., enzymes or motors, transient binding reactions, escape out of confinement, aging, and fluctuations due to particle flexibility). In practice, the effects of space and time can rarely be separated and both need to be considered simultaneously i.e. the separation is slightly artificial and is directed by computational simplicity.
Advances in single particle tracking (SPT) methods have provided the field of HAT with some robust experimental foundations in cellular and molecular biology. Measurements have created single-particle trajectories over long time scales (long tracks) that have anomalous statistics. Often the tracks have frequent switches between different anomalous transport regimes which indicate heterogeneity in time e.g. due to the switching behaviour of motor proteins [19][20][21]. The observed heterogeneity is not explained away by ensemble averaging over multiple particles, time averaging, or experimental noise. Other experimental techniques, such as fluorescence correlation spectroscopy (FCS) [22,23], dynamic light scattering [24] (or the more modern microscope-based equivalent, differential dynamic microscopy [25]), quasi-elastic neutron scattering [26] and nuclear magnetic resonance (NMR) [27] have given further useful, but more indirect, evidence for HAT.
In general, it is challenging and currently impossible to find a universal description for the dynamics of molecules in living cells due to the high concentrations of extremely complex molecules that cells contain that are constantly interacting with one another. Each biological molecule thus experiences a huge number of interactions with other molecules and the majority of these interactions are badly characterised due to experimental limitations. Rather than attempting to create universal soft matter models, which are likely to fail, studies of the statistics of anomalous transport provide agnostic model-free characterisation of the stochastic processes involved (an important first step for the description of the biological molecule). Some early applications of soft matter models in cellular/molecular biology have neglected accurate characterisation of the statistics of motion and are poor descriptions as a result.

Polydispersity and diffusing diffusivity
In dilute solutions every flexible molecule, and thus an ensemble of flexible molecules, will experience heterogeneous transport since changes in each molecule's size will occur in response to thermal forces, which will modulate the instantaneous diffusion coefficients of their centres of mass; this is the phenomenon of diffusing diffusivity. Thus, a single isolated flexible molecular chain will experience heterogeneous diffusion. This is in addition to heterogeneity displayed on the ensemble level due to variations in the size and chemistry of a population of molecules (called polydispersity and chemical heterogeneity respectively), e.g. due to the imperfect synthesis of the molecules. Heterogeneous diffusion due to variations of size is commonly characterised in physical chemistry laboratories, e.g. via light scattering experiments where polydispersity indices are defined [24]. Heterogeneous diffusion due to flexibility has been much less explored than that due to chemical heterogeneity [28,29], but will contribute to a wide range of phenomena in soft matter physics causing a broadening of dynamic processes, e.g. during reptation of flexible polymeric chains in concentrated solutions.

Viscoelasticity
From another perspective, the majority of complex fluids demonstrate heterogeneous viscoelasticity on micron and nanometer length scales and this is directly associated with HAT. Such behaviour is explored in the field of microrheology [30,31]. The manner in which materials respond to stress and strain is heterogeneous in space due to their complex structuration on small length scales. This in turn impacts the efficiency of transport of cells and molecules as they move through the heterogeneous microenvironments created by the complex fluids from which they are constructed. Molecular motors acting at the molecular scale, gelation, glassy phenomena, jamming or driven flows at larger length scales can lead to a time dependence of the mechanical responses (e.g. thixotropy or aging) and thus heterogeneous viscoelasticity can occur in both time and space. For equilibrium systems, the generalized fluctuationdissipation theorem (GFDT) provides a rigorous link between HAT and heterogeneous viscoelasticity [32]. Specifically, HAT that follows (1), a power law dependence of the mean square displacement on time, corresponds to a heterogeneous power-law fluid. The creep compliance (J(t)) of a microsphere of radius a can be calculated from the GFDT, where k B T is the thermal energy and J 0 is a constant [33,34]. Analogous expressions are expected to hold for non-equilibrium systems connecting HAT with viscoelasticity. A similar equation to (1) can be written for angular diffusion [35], which holds in the small time limit, where ⟨∆φ 2 ⟩ is the mean angular displacement of a microsphere, D γ is the generalized angular diffusion coefficient and γ is an anomalous exponent. Heterogeneity is also observed for anomalous angular transport and it can be quantified in angular microrheology experiments [36]. Extensions have been made to consider diffusing diffusivity of angular motion in colloidal suspensions [37].

Turbulence
The turbulence of fluids provides a range of challenging statistical problems [38]. Many cells are subjected to turbulent flows e.g. microorganisms in marine environments or blood cells in the aorta. Biological molecules can also experience classical turbulence e.g. biopolymer chains can give rise to the fireman's hose effect in turbulent flows. Other varieties of intermittent non-linear flow phenomena have also been defined, such as elastic turbulence (e.g. shear flows of semi-dilute DNA solutions [39]) and active turbulence (e.g. in bacterial suspensions [40]). Curiously, both these extended classes of turbulence occur in low Reynolds number flows (opposite to classical turbulence) and they have distinct statistical properties for particle motions e.g. the spectra of their velocity fluctuations is distinctive.
In the context of HAT, turbulence is novel in that it corresponds to an accelerated type of motion and the mean square displacement can have a super-ballistic scaling e.g. in classical turbulence the anomalous exponent, α, is 3 in equation (1) [41] i.e. ⟨R 2 (t)⟩ ∼ t 3 , which is called Richardson's law. This stochastic accelerated scaling behaviour has been explained using a Lévy walk model [42]. Beyond turbulent flows, experimental observation of accelerated particle motion in cellular and molecular transport is very rare, presumably due to the systems' low Reynold's numbers and thus overdamped dynamics.
Multifractal scaling is a modern paradigm to explain the universal features of turbulence [43] i.e. the flows are heterogeneous in time and space. How to quantify HAT in turbulence predominantly remains an unsolved problem, although neural networks are providing some new tools [44].

Wavefronts in reaction-diffusion systems
Reaction-diffusion (RD) equations are often used to describe pattern formation in cellular and molecular biology [45,46]. A prototypical example is ∂c/∂t = D∂ 2 c/∂x 2 + f (c), which is Fick's second law for diffusion of a molecule with concentration, c, and diffusion coefficient, D, plus a reaction term, f (c).
Mathematically, the additive contribution of the reaction term in RD equations is also applicable to describe anomalous reaction-diffusion processes with no memory [47]. However, for systems with subdiffusion of the continuous time random walk variety, the addition of a reaction term is physically inconsistent [48], and no universal description of anomalous reaction-diffusion processes in terms of partial differential equations is currently available. More hope is currently offered by numerical solutions of anomalous RD models e.g. using agent-based models. Furthermore, in cellular and molecular biology often small numbers of molecules determine pattern formation via RD equations and extensive work has been performed to understand the impact of fluctuations in the number of molecules involved on Brownian RD processes [49].
Heterogeneity due to the curvature of substrates, varying diffusion coefficients, varying local geometry (porous materials and gels) and varying density will all affect the propagation of the RD wavefronts. A specific example is electrical signalling in bacterial biofilms [50] e.g. the speed of propagation of wavefronts of potassium ions in a mushroom-shaped biofilm will be affected by the local curvature of the mushrooms. We anticipate that agent-based models combined with HAT will be particularly useful for studying wavefronts in reaction-anomalous diffusion.

Anomalous diffusion framework and its extension to HAT
The abundance of Brownian (normal diffusive) processes in nature is a consequence of the central limit theorem (CLT). Under conditions of finite mean µ and variance σ 2 , with independent and identically distributed random displacements X 1 , X 2 , ..., X n , the CLT implies the convergence of their partial sums S n = X 1 + X 2 + ... + X n to a standard normal (Gaussian) distribution, N (0, 1), with zero mean and unit variance, Such normal distributions are thus ubiquitous in physical phenomena, as their name implies e.g. normal distributions describe the probabilities of random errors in the majority of physical measurements. Anomalous diffusion corresponds to physical phenomena in which the CLT no longer holds in its standard form [1]. The CLT has been generalized to a larger class of stable distributions. If there is a sequence of constants a n and b n , the generalized CLT (GCLT) states that the sum of a large number of independent identically distributed random variables, including those with infinite variance, converge to Lévy stable distributions [51] S n − b n a n → S α (β, γ, δ), n → ∞, where α is a characteristic exponent, β is a skewness parameter, γ is a scale parameter and δ is a location parameter. The only stable distributions that have closed forms for their densities are the normal distribution α = 2, the Cauchy distribution α = 1, and the Lévy distribution α = 1/2. The characteristic exponent 0 < α ≤ 2 controls the thickness of the distribution tails. For α < 2, the PDF of the displacements (x) is asymptotically proportional to |x| −α−1 as x → ±∞. Therefore, stable distributions, except the normal distribution, have long tails. A large variety of different mathematical models for anomalous transport have recently been developed and their number is ever increasing. Therefore a full account is beyond the scope of this review. Here we mention several approaches in molecular and cellular biology and their promising generalizations to HAT.
• Scaled Brownian motion (SBM). Perhaps one of the simplest interpretations of the non-linear growth of the mean squared displacement (1) can be given in terms of decelerating Brownian diffusion with a diffusivity D that changes in time as a power law, MSD(t) ≃ D(t) t with D(t) ∼ t α−1 . This defines a non-stationary process with the probability density function that coincides with that of the fractional Brownian motion (FBM) equation (9) below [8,[52][53][54]. In contrast to FBM, SBM is non-ergodic (see the discussion of ergodicity in section 5.13) [55]. Although it might be difficult to demonstrate such a dependence of diffusivity on time in experiments, clearly this interpretation belongs to the domain of HAT. A further generalization of SBM towards HAT can be realized by making the diffusion constant a function of both time and space, D(r, t). A deterministic dependence of D(r, t) = |r| β t α−1 was studied in [56].
• Fractional Brownian motion (FBM). FBM is a self-similar Gaussian process with stationary increments which is characterized by the self-similarity Hurst exponent H. FBM can be modelled via the Langevin equation driven by the fractional Gaussian noise (fGn), ξ f Gn (t) [8]: FGn is a stationary Gaussian process defined by the correlation function: where D H is a diffusion coefficient and τ is the time interval. This correlation function is negative for 0 < H < 0.5, zero for H = 0.5, and positive for 0.5 < H < 1 making the ξ f Gn antipersistent for H < 0.5 and persistent H > 0.5. Since ξ f Gn is a Gaussian process, the probability density function (PDF) for FBM is given by the Gaussian distribution: The MSD can be obtained as ⟨x 2 (t)⟩ = x 2 P (x, t)dx and reads: The anomalous diffusion exponent α is related to the Hurst exponent by α = 2H.
Recently, a variety of FBM-like stochastic models driven by fGn with random diffusivities were considered in [57][58][59]. Also, FBM can be generalized to HAT via multifractal fractional Brownian motion (mFBM) with a time-dependent Hurst exponent H(t) (H(t) is a deterministic function of time) [60][61][62]. It has been used in modelling turbulence [63] and time series data for ECGs [64][65][66] or generalized gray Brownian motion defined as FBM with random generalized diffusion coefficients drawn from the Mittag-Leffler distribution [67,68]. Generalizations of FBM characterized by time random Hurst exponents H(t) [69,70] and random H(t) and D H (t) were successfully applied for the description of the heterogeneous motion of endosomes [71][72][73] and the motion of cells in Drosophila embryos [74].
The fractional Langevin equation (FLE) shares many common features with FBM. Similar to generalizations to FBM discussed above, FLE with time- [53,75] and spacedependent diffusion coefficients [76] are good candidates to model HAT.
• Continuous time random walk (CTRW). The CTRW is a process that consists of instantaneous jumps, δr, drawn from the probability distribution, ϕ(δr), interspersed with periods of waiting drawn from the waiting time probability distribution, ψ(τ ) [77,78]. Depending on the existence of the mean and variance of these distributions, a wide variety of behaviours from sub-to superdiffusion can be modelled. For example, the power-law distributed waiting times ψ(τ ) ≃ τ α 0 /τ 1+α with 0 < α < 1 which leads to subdiffusion ⟨r 2 (t)⟩ = 6D α t α was used to interpret diffusion of tracer microbeads in actin networks [79], the motion of colloidal particles [80], and diffusion in the plasma membrane of living human cells [81].
• Lévy walks (LW). The simplest realization of a LW is a continuous time random walk model in velocity space [8,18]. It describes a particle that travels in a certain direction with constant velocity v for a random period of time τ drawn from the running time probability distribution ψ(τ ). After that period, the particle instantaneously and randomly changes its direction and the process is repeated. The position of the particle is given by r(t) = t 0 v(t ′ )dt ′ . If the running time distribution is a power law ψ(τ ) ≃ τ α 0 /τ 1+α with 1 < α < 2, the LW has subdiffusive behaviour ⟨r 2 (t)⟩ = 6D α t 3−α . More complex LWs can involve different velocities (either constant for each running segment drawn from the distribution of velocities or randomly fluctuating) [91] or periods of rests between runs [78].
Heterogeneous Lévy walks were proposed by introducing the space dependence of the directional switching rate. To model the interactions of walkers and the emergent superdiffusive regime, the switching rate was controlled by the local density of walkers [92]. Another model contained some statistical features of LW but describes HAT using an underdamped Langevin equation where each particle was characterized by its own relaxation time and velocity diffusivity [93].
• Distributed and variable order time and space fractional diffusion equations. Chechkin and colleagues studied the diffusion-like equation with a time-fractional derivative of the distributed order whose diffusion exponent varies with time [97][98][99]. This describes a random process in which the exponent of the MSD decreases (retarding sub-diffusion) or increases (accelerating sub-diffusion) in time. It was also used to describe ultraslow and anomalous aggregation [100]. Distributed order space fractional diffusion equation can describe accelerating superdiffusion [98].
• Soft-matter models. An alternative perspective is to consider models for anomalous transport that have their origins in soft-matter physics [11]. This is a rich area of research and has the advantage that anomalous dynamics are directly related to molecular architectures. Thus Zimm, Rouse and semi-flexible modes are expected for the sub-diffusive internal dynamics of polymer chains [101]. Membrane dynamics have their own characteristic sub-diffusive spectra. The collective dynamics of simple colloidal materials are also rich in phenomena e.g. in dense colloidal suspensions cage hopping is observed with a spatially dependent diffusion coefficient [102]. However, in molecular and cellular biology the variety of interacting molecules is often so vast, complex, and inaccurately characterised that the motion of particles cannot be directly related to their molecular structures. Thus statistical approaches that are agnostic with respect to the exact molecular details are preferable e.g. HAT. Furthermore, HAT emphasizes the statistics of the processes involved, which are often neglected in standard soft-matter models, e.g., standard Zimm, Rouse and semi-flexible models of polymers assume Gaussian fluctuations which are often poor approximations when HAT occurs. Thus, HAT models can be more accurate in describing the underlying statistical physics of the biological processes considered.
Anomalous transport leads to the motion of molecules in cells and these often react with one another. Reaction-diffusion equations form the mathematical framework to describe these phenomena and they have been important to understand pattern formation in biology, e.g. Alan Turing's seminal work on morphogenesis [103,104]. Similar to equation (1) the mean square position of a wavefront, ⟨R 2 (t)⟩, can be considered as a function of time (t) and regimes of anomalous transport can be defined, where R c is the critical radius for nucleation of the wavefront, D β is a generalized diffusion coefficient and β is the anomalous scaling exponent. Similar to α, when β < 1 the wavefront moves sub-diffusively, β = 1 is diffusive, β > 1 is super-diffusive, β = 2 is ballistic and β > 2 is super-ballistic. Historically, the non-ballistic motion of wavefronts was considered analytically by perturbative calculations from perfect ballistic motion by invoking the idea of pushed and pulled wavefronts [105]. However, these models struggle to handle the motion of strongly anomalous wavefronts. Thus, anomalous reaction-diffusion equations have been developed to understand particles with anomalous motility [45] e.g. waves of activity that spread across cells. How heterogeneity affects the motion of these waves [106] has not been quantified in detail in experiments. Agent based models for bacterial biofilms are able to predict super-diffusive wavefronts based on a fire-diffusive-fire formalism across a heterogeneous biofilm (with varying bacterial concentration) [50]. A subtlety is seen in how to define the position of the wavefronts in such systems. With bacterial signalling the best choice is the threshold concentration for excitation of the cells, not the peak concentration of the diffusing particles (which is strongly affected by the dimensionality and shape of the biofilms). An extensive range of theoretical models were developed for transport in porous heterogeneous materials driven by technological applications e.g. oil exploration in which the oil needs to be extracted from porous rock. Some of these models could be applied to biology, although they tend to stress heterogeneity in space and ignore that in time [107,108].

Modelling heterogeneous diffusion in heterogeneous media
Diffusion in heterogeneous media is intimately related to heterogeneous anomalous transport. Diffusion in heterogeneous media itself shows many anomalous features. A classical example is the diffusion on percolation clusters [109]. Relatively new developments include anomalous diffusion caused by ensemble heterogeneity and randomly changing diffusivity.

Diffusion in quenched random environments
The study of diffusion in disordered materials has a long history [1] with many applications e.g. the motion of oil in porous materials for extraction by the petroleum industry or water in glassy carbohydrates which is important in food spoiling and production. Heterogeneous diffusion in porous materials has been considered in detail by the pulsed NMR community [110]. Models were developed based on the analogy of an ant crawling diffusively in a maze. Simple models relate the rate of diffusion at long times to the porosity of the maze [110]. Diffusion on percolation clusters, named by de Gennes 'an ant in a labyrinth' [111], is the simplest system to study how diffusion is affected by the presence of microscopic random irregularities. Anomalous diffusion can result in the long-time behaviour of the ant if the maze has a fractal structure with fractal dimension d w , MSD(t) ∼ Dt 2/dw . For continuum percolation (the Swiss cheese model) in three dimensions, d w ≃ 3.2 corresponds to subdiffusion [112] and a non-Gaussian propagator [3]. Due to strong disorder, the generalized diffusion coefficients for single realizations of clusters, D = MSD(t)/t 2/dw , fluctuate randomly in time but their distributions at fixed times show universal fluctuations [113]. Furthermore, a universal Lévy distribution of single-particle diffusivity was found in a quenched trap model (a random walk on a quenched random energy landscape) with diverging mean trapping time [114].

Diffusing diffusivity models
For annealed disorder, a particle diffusing in a heterogeneous environment can be characterized by the time-dependent local diffusion coefficients randomly changing in time, D(t) = MSD(t)/t = 2dk B T /γ(t), following a stochastic differential equation [54,[115][116][117]. Chubynsky and Slater considered the position of a Brownian particle, x, described by Langevin equation with random time-dependent diffusivity that was governed by Ornstein-Uhlenbeck process via an auxiliary variable Y : where ζ and ν are independent Gaussian processes, τ is the correlation time of the Ornstein-Uhlenbeck process, and σ is the amplitude of the fluctuations of Y . The squared dependence, D(t) = Y 2 , guarantees the positivity of the diffusion coefficient. The MSD, in this case, grows linearly with time but the displacement distribution of particles has a non-Gaussian form at short times scale (hence the name "Brownian yet non-Gaussian" diffusion) and transitions to a Gaussian distribution at longer times [118]. Evidence for this type of motion was found for submicron tracers moving along linear phospholipid bilayer tubes and in entangled actin networks [119], for tracer dynamics in hard-sphere colloidal suspensions, nanoparticles adsorbed at fluid interfaces and membranes, inside colloidal suspensions and the motion of nematodes (see [116] for a recent review and references therein). Diffusing diffusivity phenomenon is also expected in heterogeneous diffusion of single molecules in dilute solutions due to the flexibility and reflects internal dynamic modes. Examples include flexible polymers (Rouse or Zimm chains), semi-flexible polymers, and membranes [101]. Other phenomena, such as reptation, the snake-like motion of entangled polymers in concentrated solutions, depend on one-dimensional diffusion along a tube and it is also affected by heterogeneous diffusion. Heterogeneous diffusion in reptation will be driven by both chain flexibility and the interaction with neighbouring flexible chains in the tubes.

Superstatistical models: anomalous diffusion caused by ensemble heterogeneity
Often non-Gaussian probability distributions of displacements can emerge as a consequence of polydispersity in the particles' masses and radii [120] or particles moving in patches of different local properties attaining different diffusion constants, D [118]. In contrast to diffusing diffusivity models, D in superstatistical models are random variables, but are assigned constant values for each trajectory. The probability distributions of displacements P (x, t) of such systems are given by (according to the super-statistical approach) the superposition of the probability distribution G(x, t|D) to find a particle at x at time t given its diffusion coefficient D with the probability distribution of diffusion coefficients p(D) [54,116,121,122], For a large class of HAT systems (containing anomalous yet Brownian diffusion [115,116,118]), G(x, t|D) is given by a Gaussian distribution and thus non-Gaussian tails for P (x, t) are due to the distribution of diffusion coefficients, P (D). Homogeneous diffusion with a single value for the diffusion coefficient,  74,115,118,123]. The Gamma probability distribution of diffusion coefficients (γ and β are constants), p(D) ∼ D β−1 exp(−γD), produces Laplacian tails in the P (x, t) [124]. On the contrary, power-law distributed diffusion coefficients, p(D) ∼ D −1−γ , lead to a power-law density of displacements, [54,72,125].
As discussed in [118], a constant probability distribution of polydisperse diffusion coefficients is not the only interpretation of (12). Instead, a particle can intermittently change gears to attain different diffusion coefficients from the distribution p(D).
The superstatistical approach to random diffusivity can also be formulated via stochastic processes. The superstatistical model for FBM of a variable Y (t) is defined as Y (t) = XB H (t) where X is a random (but constant) diffusivity drawn from a distribution of diffusivities [59,67,68,70,126,127] and B H is FBM with zero mean and unit variance. A generalised Langevin equation, which is closely related to FBM, describing non-Gaussian viscoelastic anomalous diffusion, was studied with a superstatistical approach via a random parameterisation of the stochastic force [128]. If the scale parameter is given by X = √ S α with S α drawn from the Mittag-Leffler distribution, the process is called generalized gray Brownian motion (ggBM) [67,126]. At short times ggBM is equivalent to the diffusing diffusivity model but behaves differently in the long time limit e.g., the probability distribution functions of ggBM remain non-Gaussian, whereas they tend to a Gaussian form in the diffusing diffusivity model.

Measuring HAT: experimental methods and measurables
State of the art experiments to probe heterogeneous transport in molecular or cellular biology are mostly based on optical microscopy [129]. These experiments allow the discrimination of individual particles (e.g. cells, single molecules, organelles, nanoparticles etc) and allow intrinsic heterogeneity in both time and space to be probed simultaneously without significant ensemble averaging. There have been many modern developments in optical microscopy that are ideal for studies of HAT in cellular biology. Specific examples include selective plane illumination microscopy with embryos [74], confocal microscopy with bacterial biofilms [130], and super-resolution fluorescence microscopy (specifically, stochastic optical reconstruction microscopy, STORM) with bacterial capsular rafts on the membranes of live E. coli [131].
In our experience, the most robust platform for unambiguously and routinely measuring HAT over a wide range of time and length scales has been to use absorption contrast of particles in an inverted optical microscope with a bright LED and ultra-fast CMOS camera (10 5 fps) [31]. The apparatus is identical to that extensively used by our group for particle tracking microrheology. Vibration damping is crucial in the design of the apparatus to isolate measurement noise from the intrinsic stochastic processes in soft biological materials. Thus a 'princess and the pea' architecture is used for our apparatus, with the microscope placed on a slab of acoustic isolation foam, which is placed on an active noise cancellation table, which in turn is placed on a large (high inertia) floated optical table. This provided us with routine access to particle tracks with sub 10 −4 s time and 10 nm spatial resolution e.g. for endosomes containing gold nanoparticles inside human cancer cells [132].
Fluorescence correlation spectroscopy (FCS) is normally microscope based and it has the advantage that faster motility can be explored (down to microseconds) than standard microscope-based methods that use fluorescence, which is useful for rapidly photobleached fluorophores, but length scale information tends to be lost in FCS (this can be ameliorated using multiple experiments with different measurement volumes, but this is often impractical with cells [22] and only intensity correlation functions can be conveniently extracted. A wide range of other experimental techniques support the existence of HAT. Inelastic and quasi-elastic scattering techniques demonstrate that anomalous transport exists ubiquitously in soft materials (synthetic polymers, membranes, globular proteins, etc.) at time scales down to picoseconds and length scales down to Angstroms. Such methods have been invaluable in establishing the field of anomalous transport (e.g. quasi-elastic neutron scattering, inelastic neutron scattering [133], X-ray photon correlation spectroscopy [134] and dynamic light scattering [24], but studies are predominantly constrained to ensemble-averaged measurements with a relatively poor spatial resolution (precluding single cell measurements). NMR modalities also provide strong evidence for HAT in ensemble-averaged data sets [110], but again their spatial resolution precludes single-cell measurements. Recently developed optically detected NMR techniques with nitrogen defects in diamonds provide improved resolution [135], but the measurements are then again optical microscope based (similar to standard instruments for HAT) and have not yet been specifically used for fluorescence-based studies of HAT.
Super-resolution fluorescence techniques based on stimulated emission-depletion (STED) microscopy have recently demonstrated with 1.7 nm resolution per millisecond for tracks of kinesins in live cells [136,137]. Specifically, the MINIFLUX version of STED was implemented in which minimum intensities of overlapping fluorescence emissions from fluorophores excited with a series of laser pulses were measured. Such worldleading performance will be invaluable for future studies of HAT in molecular molecular biology e.g. to model the transition from subdiffusion to superdiffusion for transport with motor proteins [132].

Heterogeneous microrheology
Rheology is defined as the study of the flows of materials. Practically it can be thought of as a form of generalized fluid mechanics in which all materials that flow are considered, not just those that are purely fluids. Thus, viscoelasticity (in which the responses of materials to flow are intermediate between solids and liquids) is of central importance in rheological studies.
Microrheology is an extension of classical rheological techniques to consider the flow of materials as a function of length scale; typically distances on the order of microns or smaller are considered [30,31]. There are a range of microrheology techniques, but the majority measure the response of nm-micron-sized probes to known stresses (either thermal stresses in passive microrheology or externally applied stresses in active microrheology). With structured complex fluids at the nm-micron scale, the responses of the probes is invariably heterogeneous and they experience heterogeneous anomalous diffusion. The expression for the viscoelastic compliance (2) shows that heterogeneous anomalous diffusion of the probe spheres necessarily implies heterogeneous linear viscoelasticity in equilibrium systems. In practice this is true in both active and nonactive systems, although (2) is not necessarily an accurate approximation [138].
There is a large range of evidence for heterogeneous microviscoelasticity in polymeric, colloidal and surfactant suspensions [30,31]. It is a universal phenomenon appearing in all gels, structured complex fluids, and semi-dilute polymeric solutions. Specific examples of heterogeneous compliance (and thus heterogeneous MSDs) in polymer gels include cytoskeletal filaments [139], mucins [140] and aggrecans [141]. Eukaryotic and prokaryotic cells both demonstrate heterogeneous intracellular microrheology [142].
In addition to heterogeneous viscoelasticity, the simpler field of heterogeneous viscosity is also actively being studied. For example, in suspensions of marine phytoplankton cells gradients of viscosity are observed in the vicinity of the cells e.g. 40 times the viscosity of seawater at 30 µm away from the cells [143]. Non-equilibrium concentration gradients of secreted molecules are expected to occur around the majority of cells (not just those embedded in biofilms or the extracellular matrix) and they will perturb the microrheology of their surroundings, which in turn will affect their motility.

Measurables to probe heterogeneous anomalous transport
Different statistical measures tend to emphasize different aspects of a stochastic data set. The choice of measure is dictated by the ability to robustly calculate them in experiments, their information content, and the ease with which they can be modelled, e.g. via analytic theory or stochastic simulations. In dealing with experimental HAT data, it is important to use as many complementary statistical tools as possible to fully describe the behaviour over as broad a range of time and length scales as possible.

Ensemble and time mean squared displacements
The ensemble-averaged mean squared displacement (eMSD) of 3D trajectories with where P (r, t) is the probability density function to find the tracked particle at time t and position r. The angled brackets denote averaging over an ensemble of trajectories, is the number of trajectories in the ensemble available at time t. Plotting the eMSD versus the time is a first check for anomalous diffusion defined via (1). The MSD can be fitted to a power law function to extract the anomalous exponent α and the generalized diffusion coefficient D α . Notice that α and D α are constants that characterize averaged transport properties of the ensemble of trajectories. To make the generalized diffusion coefficient D α dimensionless, the time and length scales are commonly set to τ = 1 sec and the length scale l = 1 µm. This facilitates the comparison of generalized diffusion coefficients between experiments, otherwise, they have varying fractional units which makes life awkward e.g. plotting them on a single axis of a graph is challenging.
A standard procedure for the estimation of the ensemble of the anomalous exponents and the generalized diffusion coefficients from the ensemble-averaged eMSD is to fit a straight line on a log-log scale using the least squares method to extract the slope and the intercept. The optimal number of data points for the best fit as well as accurate handling of the static and dynamic errors were derived for Brownian diffusion [144,145] and for fractional Brownian motion [146,147]. Some guidelines were also given to optimally fit mean square displacements of single particles in the general case of anomalous diffusion [148]. However, it is assumed the ensemble anomalous exponents represent all the trajectories at all times and, therefore, cannot accurately describe HAT. However, the eMSD is useful to determine whether ergodicity occurs as we discuss in Section 5.13.

Time-dependent anomalous exponents and generalized diffusion coefficients
The time dependent ensemble anomalous exponent α(t) can be calculated by the division of the logarithm of eMSD(t) by the logarithm of time, This method is not very accurate and leads to large errors due to poor statistics at longer times, especially when trajectories in the ensemble have different lengths i.e. different tracks in the ensemble have different durations. Fewer data points are available for averaging at larger times, which makes the eMSD noisy. For Brownian diffusion in 3D, the instantaneous diffusion coefficient is calculated as and the equilibrium diffusion coefficient is D eq = lim t→∞ D inst (t). In the case of anomalous transport, the MSD increases as a power law (13) and D inst (t) become coupled to the anomalous exponent α. Therefore, a fractional derivative in (15) or other definitions of diffusion coefficient must be used. A two-parameter fit of (13) is typically used to calculate the generalized diffusion coefficient D α . The time-averaged mean squared displacement (tMSD) of an individual trajectory is one of the most commonly used tools to probe diffusive properties of single-particle trajectories [149], where l is the characteristic length scale, T is the duration of the track, and i is the index of the track. For example, l is often set as l = 1 µm in experiments with cellular motion. The time averaging is used to improve the signal-to-noise ratios and typically the resulting short time data on the tMSD is much less noisy than the long time data due to the relative number of MSD samples that are averaged over.

Velocity auto-correlation function
The time-averaged velocity auto-correlation function (VACF) of an individual trajectory is defined as where ⃗ v = (⃗ r(t + δ) − ⃗ r(t))/δ , δ is the time increment, r(t) is the particle displacement at time (t) and T is the duration of the track. To improve statistics, the VACF can be further averaged over the ensemble of all the trajectories. For Brownian motion, the VACFs exponentially decay to zero, as observed in many cases of single-cell motility e.g. isolated human dermal keratinocyte cells [150]. For anomalous diffusion processes, the decay of the VACF is non-exponential and very sensitive to the nature of the anomalous diffusion processes [151,152]. Positive valued VACFs can be attributed to inertial effects or intermittent persistent movement [72]. Negative valued VACFs can be a sign of the viscoelasticity of the medium which induces antipersistent behavior in a particle's trajectory [152].

Local MSD, local anomalous exponent, and generalized diffusion coefficient
The local tMSD (L-tMSD), defined as the tMSD i in a small sliding time window (t − W/2, t + W/2) [71,153,154] (window duration W ), can be used to describe heterogeneous diffusion processes which switch between different states along a track. By fitting small numbers (e.g. 10 data points) of L-tMSD data points to power law functions, the time-dependent local anomalous exponents α L (t) and the time-dependent local generalized diffusion coefficients D α L (t) can be extracted, The time scale τ = 1 sec is introduced in order to make the local generalized diffusion coefficients dimensionless and facilitate comparison. The behaviour of the MSD determines the character of the movement of the ensemble of particles with normal diffusion for α L = 1, sub-diffusion for α L < 1, and super-diffusion for α L > 1. Similarly, the time dependence of the L-tMSD determines the character of the local movement of a particle from its trajectory.

Distributions of anomalous exponents and diffusion coefficients
Since HAT is defined by the time and space dependence of anomalous exponents and diffusion coefficients, one can characterise them by calculating their probability distributions. The probability distribution of local anomalous exponents α L and generalized diffusion coefficients D L were used to describe HAT in the dynamics of endosomes within eukaryotic cells [73] and the movement of single cells in Drosophila embryos [74]. However, one should keep in mind that limited statistics in a sliding window to calculate L-tMSD introduces errors in α L and D L (see section 5.6 above) and distorts their true distributions (it tends to broaden them). For example, for FBM with constant anomalous exponent α L = 2H and constant diffusion coefficient, one would get a normal distribution of α L and log-normal distribution of D L which will approach the expected Dirac delta distributions upon increasing the window size. Similar distortion was also found for individual anomalous exponents and diffusion coefficients due to the limiting statistics in L-tMSDs [155]. Another artifact of the calculation of MSDs in overlapping windows is the occurrence of spurious positive correlations between α L and D L , D L ∼ exp(α L ). Interestingly, α L and D L of hemocyte cells in Drosophila embryos were found to be anticorrelated [74]. One way to overcome these distortions is to avoid calculations of L-tMSDs altogether, for example by estimating anomalous diffusion and diffusion coefficient via machine learning techniques (see sections 5.14 and 5.15). Another possibility to obtain the probability distribution of diffusion coefficients for Gaussian processes (e.g. Brownian diffusion or FBM) is to use the Richardson-Lucy non-parametric method which is routinely used to deconvolve images that have been blurred by a Gaussian point spread function [156]. The method computes the probability distribution of diffusion coefficients, f (D), iteratively by inverting the probability distribution of displacements P (x, t) [118]: where P n = D 0 0 f n (D)G(x|D)dD is an approximation of the displacement distribution, G(x|D) is the Gaussian density, and the initial guess could be chosen as exponential In Sections 6 and 7 we will discuss various probability distributions of local anomalous exponents and instantaneous diffusion coefficients measured in experiments which also depends on how the anomalous exponents and diffusion coefficients were defined.

First passage probability
The first passage probability (FFP) can be the key statistical property in a physical problem e.g. the times for chemical reactions, the exit times of a particle from a maze, or the chain retraction times for reptation of branched polymers [157]. In general, the FPP is the probability distribution of the times for a process to reach a threshold value for the first time. For a set of tracks {R n (T )}, the first passage probability is calculated by finding the smallest positive time t that satisfies |R n (T + t) − R n (T )| = L at each starting time point T for each track (indexed by n) that travels a length L [158]. In the analysis of HAT, the FPP has the advantage that probability distributions can be considered as a function of transit length (L) e.g. to ask the question of whether long endosomal transits travel faster than short transits [132]. The mean of the FPP (the MFPP) provides an alternative to the MSD to characterise anomalous transport [132]. Different scaling regimes of a FPP distribution on time necessitate that there will be different scaling regimes for any reaction rates (which can be anomalous), since all reaction rates are to a degree controlled by transport processes (chemical reactivity can also be important).

Survival analysis
Another fundamental statistical process to describe time-varying data is described using survival analysis. This was originally developed in medicine and describes the fraction of patients alive in a trial after a certain time in terms of the survival probability Ψ(t). Useful algorithms have been developed to correct for censuring of the data, e.g. when people leave a trial before they die, and the empirical survival probability can be estimated by using the non-parametric Kaplan-Meier estimator [159]. Furthermore, a hazard function can be defined that facilitates the analysis of complex survival functions γ(t) = −Ψ ′ (t)/Ψ(t) where Ψ ′ (t) = dΨ(t)/dt, e.g. to characterise non-Poisson distributions in which the hazard rate varies with time. The survival function Ψ(t) can be written in terms of the probability distribution of the first passage time F (t) as Ψ(t) = 1 − F (t). Survival analysis was used to understand the switching between persistent and antipersistent motions of endosomes in human cells and hemocyte cells in drosophila embryos [71,74,160].

Directional persistence
Directional persistence is a key parameter of particle tracks, which is invisible to a mean square displacement analysis since MSDs are only sensitive to amplitudes of motion, not directions [161]. One method is to consider the average cosine of the angles between successive displacements (r i , r j ) of the particle as a function of time interval (τ ). This can be conveniently calculated using vector calculus, ⟨cos θ(τ )⟩ is bounded between 1 and -1 which corresponds to persistent and antipersistent motion respectively. The values of the cosine can be averaged over time and considered as a function of time interval (τ ), similar to a time-averaged MSD. Directional persistence was used to analyse tracks from endosomes in human cells and anti-persistence was induced by the motor protein tethers of the endosomal cargoes [161].

Intensity correlation function analysis
An experimental technique introduced to measure the mobility of molecules is called fluorescence correlation spectroscopy (FCS) [162]. The technique measures the time correlation functions of the intensity fluctuations of fluorescence emitted by fluorophores in a sample that are created by changes of the local concentrations. Often FCS can explore the dynamics of molecules at 2-3 orders of magnitude faster time scales than an equivalent particle tracking experiment using fluorophores due to the reduced requirements on its detector (point detectors are used for FCS as opposed to CMOS pixel arrays for microscopy imaging). Single molecules can also be examined with FCS by calculating the auto-correlation curve of the fluorescence intensity F (t) where δF (t) = F (t) − ⟨F (t)⟩ and ⟨...⟩ is the time averaging. Fitting the autocorrelation function, C(τ ), to the function [9,23,163] where A and B are constants, one can determine the generalized diffusion coefficient D α and the anomalous exponent α. The main weakness of the technique is that the correlation functions mean that data is analysed less directly, and both time and spatial averaging occur. There are therefore more sources of ambiguity in elucidating the fundamental causes of HAT in a FCS experiment than in an equivalent microscopy tracking experiment.

Distribution of displacements and increments
The probability density function (PDF), P (r, t), to find a particle at displacement r(t) at time t assuming it begins from the origin r(0) = 0, is an important statistical tool to describe the transport properties of particles. The ensemble mean-squared displacement is the second moment of the displacement PDF, which provides a connection to equation (13). Particles moving via Brownian diffusion have Gaussian probability distributions of displacements. A departure from the Gaussian form is a hallmark of anomalous transport. In low-resolution data the departure from a Gaussian can be quantified using the fourth moment of the distribution, the kurtosis, which is more robust to smaller numbers of samples than attempting to fit the full probability distribution. The third moment of the PDF is called the skew and often indicates driven transport e.g. the activity of motor proteins or convective transport that breaks the symmetry [164]. However, skew is not indicative of anomalous transport, since driven motion with a significantly skewed PDF can occur in both Brownian and anomalous systems. Non-Gaussian distributions of displacements were experimentally observed for a subdiffusive motion of cytoplasmic RNA-protein particles in live Escherichia coli and Saccharomyces cerevisiae cells [123] and for the acetylcholine receptors diffusing on live muscle-cell membranes [165]. Furthermore, the distribution of increments of 150 nm diameter nanoparticles passively moving in Dictyostelium discoideum cells was found to have a parabolic Gaussian-like region for small increments and an exponential tail for larger increments [166]. In living cells, non-Gaussian transport features are thought to be due to 1) sample-based variability, 2) rarely occurring events with large amplitude motions, 3) aging and 4) spatiotemporal heterogeneities of the medium.
Another important tool that can be used to test HAT is the distribution of increments ∆r, P (∆r, τ ), a particle makes during the time interval τ . Brownian diffusion and fractional Brownian motion are characterized by Gaussian P (∆r, τ ). Non-Gaussian distributions of increments can be a signature of HAT.

Ergodicity breaking and aging
For ergodic motion, such as Brownian diffusion, the Birkhoff ergodic theorem allows the diffusive properties to be estimated using a single long trajectory via the equivalence of ensemble averages and long-time averages [8,167], In experiments, frequently only short trajectories are available and the long time limit of tMSD i (t) is replaced by additional averaging over the ensemble of trajectories to improve statistics, Checking for the equivalence between the time-averaged and ensemble-averaged MSDs can be used to discriminate between different models of anomalous diffusion i.e. testing for ergodicity [5,8]. Several ergodicity breaking (EB) parameters were introduced to probe the ergodicity of anomalous diffusion [8,168]. An EB parameter based on the fourth moment of the particle displacement, with ξ(t) = tMSD i (t)/tMSD(t), works well when all particle trajectories have the same duration. For heterogeneous trajectories which have different durations, the ergodicity breaking parameter is defined using the ratio between the ensemble and the timeaveraged MSDs, works better. For ergodic processes, the EB parameter converges to zero. Many anomalous transport processes in living cells demonstrate ergodicity breaking [169,170], i.e. the non-equivalence of time and ensemble averages, ⟨tMSD i (t)⟩ ̸ = eMSD(t). The time-averaged MSDs for single trajectories, tM SD i (t), remain random functions and can be described by probability distributions that depend on the anomalous dynamics. For example, with mono-fractional CTRW-type motion, this distribution takes the form of a one-sided Lévy stable probability distribution [168] and approaches the expected delta-function as the anomalous exponent goes to 1. Ergodicity breaking could stem from the non-stationary nature of the process when the probability distribution of the times associated with immobilization events has a divergent mean [5,8]. Several experiments have shown that nonergodic behavior is a consequence of interactions occurring in heterogeneous cellular environments [21,170]. Ergodicity breaking could also emerge as a consequence of the heterogeneity of the system, in the absence of immobilization [70]. In some systems ergodic and non-ergodic processes coexist, e.g. in plasma membranes [81] and the intracellular transport of insulin granules [94].
Aging is probed by delaying the beginning of the measurement over the aging time t a from the initiation of the process at t = 0 or equivalently aging the system in (−t a , 0) and starting the measurement at t = 0. Statistical aging is defined as the dependence of quantities on both the measurement time t and the aging time t a [171]. Aging is related to ergodicity breaking [172]. The ensemble MSD for aging is [173,174] Similarly, the aging tMSD of a single trajectory is defined as Notice that for experimental trajectories, the duration of a trajectory, T ′ , must be adjusted using T ′ = T − t a , where T is the duration of a non-aged trajectory. The adjustment can be omitted when dealing with simulated trajectories which can be made as long as required, but it is necessary in typical experiments.

Machine learning of single trajectories
Hidden Markov Models (HMMs), a variety of machine learning, have been used to analyze heterogeneity in biological experiments based on the analysis of classical diffusion coefficients [175]. Unfortunately, most of the current models for anomalous transport required to motivate (1) have a memory (in agreement with experimental measures, such as the survival time) and are thus non-Markovian. This presents a serious flaw of HMMs [71] and they are expected to perform badly in the majority of cases in intracellular and cellular biology as a result. An alternative method is to use deep learning neural networks (see the section below) trained on non-Markovian models that can better handle the subtle effects involved. Thus, calculating a spectrum of FBM exponents for endosomal motility with a neural network was found to outperform an equivalent HMM [71] which failed to accurately describe changes in the persistence of the motility. A Bayesian approach for anomalous dynamics in many cases relies on Markovian approximations and inference with HMM models. It was used for least-squares fitting of empirical MSD curves of particle motion in live cells [176], to discriminate between FBM and GLE models [177], for model selection and parameter inference for Lévy walk [178], FBM [179], diffusing diffusivity [180,181], and time-dependent random walk [117]. A variational Bayesian framework for reaction-diffusion models [182] based on a hidden Markov model identified transitions between several diffusing states from short trajectories.
Random forests are another machine learning method that was used to classify the underlying diffusion mechanism of anomalous diffusion trajectories and estimate the anomalous exponent [183].

Neural networks for time series analysis
A wide range of machine learning techniques are showing promise for the analysis of time series data e.g. data from electrocardiograms (ECGs), stock market returns, and particle tracks [184]. In tracking experiments, neural networks (NN) can provide an order of magnitude improvement in sensitivity and robust analysis of highly non-linear phenomena that are hard to handle analytically, allowing heterogeneity to be robustly quantified in space and time [14,71,185]. NNs and other more old-fashioned machine learning techniques provide a major impetus for research in heterogeneous anomalous transport. Important tasks performed by NNs with HAT include the estimation of the anomalous exponent, the classification of the anomalous diffusion model, and the automatic segmentation of trajectories. Here NNs are being used to emulate generalized non-linear functions, which is permitted by the universal approximation theorem [186].
Different NN architectures have been used to analyze HAT. Here we overview several NN architectures with an emphasis on those which were used for inference of the anomalous exponents and diffusion coefficients. We will discuss their morphology, and training in order to better understand their pros and cons. The benchmarking of different NNs is an open problem that has been attempted during the AnDi challenge where different software to analyse anomalous transport were compared based on common datasets [14]. Due to the rich variety of possible parameters to quantify the algorithms' performance, it is concluded that the software chosen needs to be matched to the applications.
In addition to the NN architecture chosen, how the training data is presented to a NN can make a big difference e.g. whether a network is trained on particle positions or particle displacements, normalized or unnormalized data is used or 3D data is reduced to 1D (e.g. using the absolute magnitude of the displacement). This process of feature engineering is an art form during the use of neural networks [187]. Intuitive choices for feature engineering tend to focus on the reduction of ambiguity in data sets (reducing the variability in data sets due to parameters that are irrelevant to the models used) to maximize their information content.
A simple feedforward neural network (FFNN) performed well when calculating instantaneous time-dependent anomalous exponents [71]. To achieve this, a window of size M was fed with the displacements scaled by their range and shifted along the particle trajectory. The input layer was equal to the length of the window and a single output neuron was used. Multiple NN architectures were tested (anti-triangular, rectangular, and triangular deep learning NN) with different numbers of hidden layers. The NN had no regularization and was trained on FBM trajectories without noise. The results showed no improvement for NNs with more than 3 layers and were insensitive to the NN architecture. The accuracy of NN assessed by the mean absolute error, σ H = N n=1 |H sim n − H est n | /N , was σ H ≃ 0.05 for trajectory lengths of 50 data points which is much lower than other existing methods. The main advantage of the NN is its high accuracy and simplicity. The drawback of this NN is that it only works for data sets that are consistent with FBM motion due to its training, therefore a preparatory statistical analysis is necessary to ensure this. Secondly, the NN in the present architecture only predicts the time-dependent anomalous exponent. Further work is needed to modify it to predict both the anomalous exponent and the generalized diffusion coefficient. This NN was not a part of the AnDi challenge [14], however, it has been made available for benchmarking [71]. Previously FFNN with backpropagation was applied to differentiate between Brownian, confined, and directed modes of motion and detect their transitions in time [188]. Combined with feature engineering based on classical statistics, FFNN was used to identify the anomalous diffusion model, infer the anomalous exponent as well as segment trajectories [189,190].
Recurrent neural networks (RNN) have been used for inference of the anomalous diffusion exponent and for the classification of anomalous diffusion models [191][192][193][194] and changes between diffusion models [195]. Although questions exist with regard to their accuracy in handling long-time correlations. The architecture of the NN called RANDI was chosen using two layers of RNNs with long-short-term memory (LSTM), with dimensions of 250 and 50. The input trajectory was fed into the first LSTM layer. The NN was trained on increments of trajectories (∆x i = x i+1 − x i ) generated similar to tests of the AnDi datasets which consisted of a mixture of trajectories from several anomalous diffusion models [14]. The increments were normalized to have zero mean E[∆x] = 0 and unit variance σ 2 (∆x) = 1 for each trajectory. The NN was trained on trajectories of a fixed length when tested on trajectories of different lengths. To make a prediction on a trajectory of a certain length, the predictions made on the closest lengths were combined. The mean absolute error ranged from 0.34 for trajectories containing 32 points to 0.1 for trajectories containing 1024 points. The authors made RANDI freely available as a Python package [192]. The disadvantage of RNN models is that it can take a long time to train them since they are typically updated for each training example. Bayesian Neural Networks (BNN) or Deep Learning Bayesian NN (BDLNN) use an ensemble of neural networks to model the distribution of outputs [196]. In [197] a recurrent LSTM neural network architecture was chosen to obtain anomalous exponent inference, and anomalous diffusion model selection together with the corresponding error estimates. Trajectories from AnDi data sets [14] used for training were converted to increments ∆x t = x t+1 − x t and normalised to a unit standard deviation. The NN was implemented using the PyTorch library and the code was made freely available online [197].
A convolutional neural network (CNN) architecture was used for classification to discriminate between FBM, CTRW and Brownian diffusion models, the inference of anomalous exponents from FBM trajectories, and for the inference of the diffusion coefficient of Brownian motion [185]. The network architecture consisted of four sets of convolution blocks with different filter sizes which operated in parallel and were implemented in Python using TensorFlow. The network was trained on 300, 000 simulated trajectories with added normally distributed localization error and received the velocity autocorrelation function as the input, equ. (17) and was tested with the anomalous exponents of experimentally obtained trajectories of fluorescent beads diffusing in entangled F-actin network gels with various mesh sizes estimated via time-averaged MSDs. Both experimental trajectories and neural networks were made available [185].
A CNN with sparse long-distance connections (WaveNet) is expected to provide state-of-the-art performance for the analysis of anomalous transport [198] due to the accurate handling of long-time correlations. The NN consisted of the input, WaveNet encoder, recurrent neural network (3 stacked long short-term memory LSTM units), and output multilayer perceptron (MLP). The NN was implemented using the PyTorch library and trained on trajectories from the AnDi datasets [14]. Before training or inference, trajectories were normalized to ensure zero average and unit standard deviation of the positions. The addition of the convolutional unit improved the NN performance compared to recurrent NN, however, it increased the complexity during design and training. The code is freely available online [198].
Graph neural networks (GNN) use supervised learning to infer the physical properties of anomalous random walks by representing trajectories in the form of graphs and associating a vector of features with each observed position [199]. Trajectories were normalized either by the standard deviation of step sizes or position or by using the mean step size. The GNN architecture consisted of an encoder and task-specific multi-layer perceptrons estimating the anomalous exponent and the random walk model. GNNs are expected to perform poorly if long-range correlations are present, therefore special care must be taken for classifying various random walks and inferring their anomalous exponents [199].
Transformer neural networks are a recent development in time series analysis [200] (they famously occur in ChatGPT) and they may be efficient to describe anomalous transport since their attention mechanism could be matched to the memory of the non-Markovian transport process [201]. Specifically, an attention mechanism is included in the transformer architectures and this can provide longer terms memories in comparison with other architectures e.g. the long short-term memory (LSTM) constraint on RNNs [202]. In [201] a bi-layered CNN architecture was combined with a transformer (the Convolutional Transformer) to extract features from trajectories by CNNs and then fed them to transformer encoding blocks to infer anomalous exponent or to perform anomalous diffusion model classification. The model was trained in Python with Pytorch framework using the AnDi datasets [14].
Generative adversarial neural networks (GANs) have not yet been extensively used with anomalous transport (they are a key algorithm in the creation of deep fakes), but they could be used to improve statistical measurements with relatively limited training data ( [203]) through the creation of virtual tracks. In general, GANs are often used to leverage relatively small amounts of training data during data analysis.
Variational autoencoders (VA) are closely related to GANS and can provide nonlinear separation of features (a non-linear generalization of classical linear principal component analysis for spectral decomposition [204] i.e. dimensionality reduction of data sets which is often used in classification problems. VAs could be used to analyze tracking data in which non-linear dependencies are expected e.g. distributions of run times on subsequent rests. An unsupervised convolutional autoencoder architecture was used to classify anomalous diffusion models and detect mixed diffusion models [205].
In the next section, we discuss further perspectives on the use of neural networks for time series analysis.

Perspectives on the use of NNs
An advantage of many NN models in the analysis of tracks is their sensitivity, so a small length of the trajectory (window size) can allow robust classification of the anomalous exponent. Thus HAT can often be explored as a function of time and space, without any additional experimental requirements on track length or data acquisition rate, if NNs are used to analyse the data. Furthermore, it is possible to calculate anomalous exponents over a range of time intervals without retraining a NN by regularly down-sampling data sets. This allows the calculation of coherence times over which there are distinct changes between the anomalous exponent probability distribution functions extracted using the NNs e.g. from peaks at α = 2 to α = 1 for a run and tumble model of a bacterium as longer time intervals are considered. Thus, the multi-fractal properties of the data sets over a range of time intervals can be probed.
Skeptics have questioned whether time series data is being overfitted with the new NN models. Assuming the data is well described by a particular anomalous transport model (including HAT) then clearly the answer is no. The fact that FBM has an obvious geometrical interpretation implies that it will correctly model changes in the persistence of motility. Thus as a form of non-linear spectral analysis with strong geometrical foundations, FBM using NNs will provide useful information in most cases. Classical models typically neglect directionality and with NNs it is thought to be a key contributor to their increased sensitivity e.g. with NNs trained on FBM it makes them sensitive to transitions between persistent and anti-persistent motion. Alternatives to HAT calculated with NNs normally involve underfitting with a bad classical model e.g. using heterogeneous classical diffusion for motility in high-concentration systems, such as cells.
It is expected that neural networks can be used to analyse time series data in microrheology. For example, tracks of probe spheres in a complex fluid with timeor space-varying viscoelasticity could be measured. The NN could then be used to characterise the viscoelasticity of the fluids with time or space e.g. the process of gelation with antibodies in response to a pH switch could be followed or the large spatial heterogeneity in the response of formulated complex fluids (e.g. lamellar gels in hair conditioner) and, in general, it would be reasonably straightforward to probe a wide range of power-law fluids that can be modelled with fractional Brownian motion. Thus NNs should provide an order of magnitude improvement in sensitivity in probing heterogeneous viscoelasticity in both space and time.
Scaling laws are very common in physics e.g. to calculate the conformations of polymer chains [11]. It is expected that analogous neural network techniques to those used for tracking experiments [71] (e.g. considering polymer chains in space rather than tracks in time) can be applied to a broad range of scaling problems in soft-matter physics to provide a robust description of heterogeneity e.g. to better describe boundary effects on flexible polymer chains, non-Markovian effects on polymers (semi-flexible polymers will have longer memories and thus higher fractional exponents than flexible polymers to describe the spatial persistence), heteropolymer conformations, the conformations of polymer chains inside gels in the presence of quenched disorder, block copolymers and flexible polyelectrolytes. Scaling models for polymers are often ad hoc multi-fractals e.g. de Gennes introduced the idea of blobs to understand flexible polymers. The blob length can be found by equating the thermal energy (kT ) to the free energy of the fraction of the chain involved and the chain conformations are multi-fractals, as a result, i.e. the scaling exponent for the chain size as a function of the degree of polymerisation has a sudden change in behaviour at the blob size. The use of a NN to analyze the positions of monomers along a fluorescently labelled polymer chain or the chain conformations from molecular dynamics simulations of polymers could allow rigorous quantification of the blob idea e.g. monodisperse blobs may be a convenient mathematical fiction and flexible polymeric chains are more accurately described by a continuous spectrum of scaling exponents as a function of length scale (or equivalently the number of monomers) [71].
In addition to the analysis of tracks, NNs can be useful to create tracks from microscopy experiments. Convolutional neural networks can be used to segment particles in images from a movie, which are subsequently linked together to form tracks [206]. CNNs tend to require less user optimisation than Gaussian trackers and they can function well with less symmetric particles e.g. bacterial cells. However, care is needed to physically constrain NNs for their optimal performance and under some conditions classical tracking techniques are preferable e.g. Gaussian trackers used with movies of diffraction-limited particles that are well described by Gaussian functions can outperform unconstrained CNNs.

Computer simulations
Simulations play a crucial role in testing the underlying physical mechanisms of anomalous transport. They are also indispensable for the creation of training sets for neural networks. Several anomalous diffusion models e.g. FBM [207], CTRW [2,77], Lévy flights/walks [18], scaled Brownian motion [53] and diffusion processes with space-dependent diffusivity [76] have become popular. Some standard algorithms for simulating FBM are the exact methods due to Hosking, Cholesky-Levinson, Davies-Harte, and Wood-Chan. Although the Hosking and Choleski-Levinson methods are exact, they are slow, since their computational complexity is of order O(N 2 ) and O (N 2 logN ), respectively. The Wood-Chan method is exact for simulating fractional Gaussian noise (increments of FBM), and is faster O (N log(N )). Several approximate methods also exist to simulate FBM e.g. the stochastic representation method [207] and wavelet-based simulations [208,209]. Some models have been extended to heterogeneous FBM [71,72,210,211], multifractional FBM with deterministic and random exponents [212,213], heterogeneous CTRW [87], heterogeneous Lévy flights/walks, and subordinated combinations to simulate HAT.

Case studies in molecular biology
There are a large number of examples of HAT in soft matter physics. Here we will focus on examples found to date in molecular biology experiments, but many more are expected to occur due to the rich variety of molecules available e.g. carbohydrates, polyphenols, proteoglycans, etc. will all experience HAT. We thus expect to find HAT in the majority of biological systems in vivo.

DNA
DNA is unusual compared with most synthetic polymers in that it can be completely monodisperse i.e. all DNA chains have the same length when sourced from a specific chromosome of a single organism. However, the 3D spatial organization of the chromatin (DNA with associated bound proteins) in the nucleus of eukaryotic cells is hierarchical and highly heterogeneous [214]. Chromosomes occupy specific regions of a nucleus, called chromosome territories and each consists of megabase A compartments which contain a large amount of the bases for expressed genes, and B compartments with less transcriptionally active bases [215]. Compartments are further spatially structured into multiple self-interacting topologically associating domains (TADs).
Unsurprisingly, due to its polymeric nature, the dynamics of DNA is mostly anomalous on a short time scale (e.g., [216,217] and many others). A simple Rouse polymer model predicts sub-diffusive dynamics of the chains with the anomalous exponent α = 0.5 [218], the inclusion of the hydrodynamic interaction between neighbouring sections of the chain gives α = 2/3 (Zimm dynamics) and inclusion of both semi-flexibility and hydrodynamic interactions gives α = 0.75. The heterogeneous structure of chromatin, interactions with various proteins (e.g. cohesin and RNA polymerase II), different geometrical constraints, and ATP-driven active processes lead to the heterogeneous dynamics of the DNA (see [219] for a recent review). Polymeric modeling shows that the formation of TADs is a key driver of heterogeneity in the motion of chromatin [220]. Various values of anomalous exponent α were reported depending on the position in the genome, cell type, and cell condition [221]. A wide range of spot-to-spot variability in diffusion dynamics was found in a mouse ES cell line carrying TetO arrays inserted at random genomic locations [222] quantified using D α s and αs. Heterochromatin-rich regions showed less movement, while the activity of transcriptional machinery increased the amplitude of the dynamics [223].
Telomere (regions at the ends of chromosomes) dynamics in living mammalian cells is highly heterogeneous [224][225][226]. Analysis of single telomeres' trajectories revealed exponential distributions of displacement increments [224]. There was a crossover of tMSDs from sub-diffusion α < 1 for time lags τ < 10 seconds to normal diffusion α ≃ 1 for large τ [224,225]. Although there could be several reasons for such a crossover, it was argued that the heterogeneity of telomeres' motions stems from non-equilibrium cytoskeletal forces that drive their anomalous diffusion [224].
Spatially resolved mapping of genome dynamics was organized into domains with a mosaic pattern (figure 1) [223,[227][228][229]. Such correlated motion might be caused by active mechanisms [230], such as transcription [229]. Long-range directional movement of interphase chromatin was also found which was super-diffusive [231][232][233]. Similarly,  Reprinted from [125] coherent movements over 1 µm distances by human RNA polymerase II molecules were observed over the entire nucleus in human cancer cells [234].
Many DNA-binding proteins (Rad4-Rad23 in yeast or telomeric sequence binding proteins TRF1, TRF2, and SA1) have anomalous dynamics, which were reviewed recently in the context of facilitated diffusion [235]. A broad probability distribution of anomalous exponents and a power-law probability distribution for the generalized diffusion coefficients of individual histone-like nucleoid-structuring (H-NS) proteins were found for their anomalous dynamics (figure 2) in Escherichia coli [125]. Furthermore, non-Gaussian power-law-type (Pearson VII) probability distributions of protein displacements were observed. The H-NS proteins interact with both other proteins and DNA simultaneously. The broad probability distribution of anomalous exponents was due to the active motion of a sub-population of the H-NS proteins, while the power-law probability distribution of generalized diffusion coefficients, P (D α ) ≃ D −(β+1) α , where β is a constant, was suggested to stem from protein polymerization. Transient non-specific interactions of DNA-binding proteins with DNA were shown to lead to highly heterogeneous dynamics with a broad probability distribution of diffusion coefficients [236] and a broad (power-law) survival probability distribution for nonspecific (TetR-DNA) interactions [237]. These results indicate the current paradigm of facilitated classical diffusion for the dynamics of DNA-binding proteins needs to be extended.

RNA
RNA-protein particles (MS2-mRNA complexes) in the cytoplasm of the bacterium E. coli and the eukaryotic microorganism S. cerevisiae (budding yeast) exhibit heterogeneous subdiffusive behavior of viscoelastic origin [123]. Probability distributions of the displacements of the complexes, ∆x = x(t+δ)−x(t), where δ is a time increment, were found to be non-Gaussian at all observed timescales and were well described by a Laplace distribution, where σ describes the breadth of the distribution. Subdiffusive dynamics of MS2-mRNA complexes which had Laplace distributions of displacements were also reported [238].
In [123], the probability distributions of generalized diffusion coefficients extracted from the time-averaged MSD of individual RNA-particle trajectories were well approximated by an exponential distribution, where ⟨D⟩ describes the breadth of the distribution. This non-Gaussian behavior is a consequence of significant heterogeneity between trajectories and dynamic heterogeneity along single trajectories.

Membranes
Membranes fulfill many crucial functions in cellular biology as external barriers that maintain the integrity of cells and as barriers that define specialized organelles. The cell membrane was long known to be highly heterogeneous [20] with patches with strongly varying diffusivity [239] e.g. the lipid raft hypothesis. Various mechanisms underlying anomalous diffusion in plasma membranes have been reviewed [240,241]. For example, anomalous subdiffusion can be caused by transient immobilization and spatial heterogeneity [19] or specific interactions between molecules which temporarily modify their diffusive behavior [242][243][244][245]. The spatially heterogeneous dynamics of two membrane proteins transfected into COS-7 cells was measured using photoactivated localization microscopy (PALM) combined with live-cell single-particle tracking [246]. Some trajectories showed subdiffusion typical for motion in a membrane. Broad probability distributions of shorttime diffusion coefficients revealed dynamic heterogeneities in the cell membrane. Hyperspectral microscopy allowed single particle tracking of quantum dot labels at 27 frames per second and could resolve the time-dependent instantaneous diffusion coefficient of membrane proteins interacting with membrane-associated structures [247]. The motion of dendritic cell-specific intercellular adhesion molecules on cell membranes revealed nonergodic subdiffusion, which was interpreted as due to changes of diffusivity (figure 3) [21]. Single receptor trajectories displayed Brownian motion with relatively constant diffusivity over intervals of varying length but changed significantly between intervals. The probability distribution of diffusion coefficients, D, could be described by a Gamma distribution, with σ > 0 and b is a constant. Anomalous dynamics of Kv2.1 potassium channels that form stable clusters in the plasma membrane of transfected human embryonic kidney cells and native neurons have both ergodic and nonergodic components with very broadly distributed anomalous exponents [81,248]. A probable reason for the HAT behaviour is the binding of the ion channel clusters to clathrin-coated pits with a heavytailed distribution of immobilization times. Molecular dynamics simulations suggested that interactions between membrane-binding proteins and lipids can lead to fluctuating diffusivity [249]. Numerical simulations predict multi-fractal, non-Gaussian, and spatiotemporally heterogeneous anomalous lateral diffusion for the motions of lipids and membrane proteins [250]. Protein crowding induces spatiotemporal heterogeneity in the lateral diffusion of lipids that stochastically transition between high and low diffusivity states. The probability distribution of generalized diffusivities D α has a double-Gaussian form.
Self-assembled lipid vesicles created in solution are popular as a model for cell membranes. Heterogeneous fluctuations of the environment lead to non-Gaussian diffusion of fluorescently tagged unilamellar lipid vesicles (liposomes) diffusing in nematic solutions of aligned F-actin filaments with a spectrum of diffusivities. The MSDs of liposomes grew linearly with time, but the PDFs of the displacements showed a crossover from a Laplace (double-exponential) form at small times to a Gaussian form at later times [118].
Dynamical heterogeneities were recently reported for the diffusion of transmembrane AChRs receptors in Xenopus embryo muscle cells [165] which moved subdiffusively with a broad spectrum of anomalous exponents α on time scales of up to 1 second and performed Brownian diffusion at larger times. The transition from sub-diffusion to Brownian diffusion and non-Gaussian distributions of displacements with exponential tails suggest spatiotemporal fluctuations of diffusivity, which is also supported by the observation that the mobile trajectories have some immobile segments of varying durations [165]. These transient confinements of AChRs could be caused by the transient binding of the AChRs to the cortical actin network.
Lipid microdomains (lipid rafts) attached to capsular carbohydrates were found to experience HAT on the surfaces of E. coli in super-resolution fluorescence microscopy experiments [131]. The motion of the labeled rafts was subdiffusive with the average anomalous exponent α = 0.33 ± 0.23 at 1 h and α = 0.36 ± 0.20 at 1.5 h. The rafts eventually merge together at long time scales to sterically stabilise the bacteria with giant carbohydrate brushes (>200 nm in height).
Finally, the superdiffusive motion of molecules that target C2 domains in membranes alternated between phases of two-dimensional motility on the membrane and three-dimensional diffusive excursions [251]. Such intermittent dynamics was characterized by a probability distribution of displacements with both Gaussian and Cauchy components and a broad probability distribution of diffusion coefficients. The field of non-Brownian interfacial anomalous diffusion was recently reviewed [252].

Endosomal transport
Endosomes are sophisticated membrane-bound packages that are transmitted intracellularly (extracellular transport is also possible via exosomes). Endosomes constitute a complex targeted postal system inside eukaryotic cells (e.g. membranebound proteins, such as Rab, can label endosomal packages that direct processes of sorting) and are used for the active transport (in contrast to passive diffusion) of molecules. Early studies of active transport considered engulfed microspheres moving along microtubules in living eukaryotic cells [253]. The mean square displacement of microspheres was superdiffusive at short times, with a clear crossover to subdiffusive or ordinary diffusion scaling at longer times. However, for large spheres this is a process of phagocytosis and the biology is different to endocytosis that creates standard endosomes (there is a size filter that differentiates between the two possibilities, with smaller particles experiencing endocytosis). In general, the biological phenomena associated with endocytosis of nanoparticles are very rich [254].
In a series of publications over the last 10 years, our group has considered a number of the issues involved in heterogeneous endosomal transport. Our early microscopy experiments suffered from issues with signal-to-noise ratios and temporal sensitivity. Robust statistics (e.g. MSDs) could only be extracted from data sets that were averaged over both the time and the ensemble. The process of endosomal transport was found to be anomalous and super-diffusive [160,255,256]. This was followed by the use of an ultrafast camera with gold-containing endosomes (plasmon-based contrast using bright field microscopy). These experiments allowed the heterogeneous anomalous transport of single endosomes to be probed at very fast (sub-millisecond) time scales. The anomalous transport was still found in single particle tracks and at fast time scales. A crossover was observed from sub-diffusive motion at fast time scales (<0.01 s) to super-diffusive motion at intermediate time scales i.e. clear evidence for HAT as a function of time interval was observed. This was followed by slower fluorescence microscopy experiments that labelled specific types of endosomes (Rab5 early endosomes). These experiments allowed us to collect a large ensemble (more than 10 4 ) of trajectories and track them over much longer time scales. More recently, based on observations pointing to FBM-like dynamics, we developed a neural network (NN) to probe the heterogeneities in the anomalous transport of endosomes [71]. The NN successfully learned patterns in endosomal dynamics in terms of persistence (a tendency to keep or reverse the direction of motion quantified by the Hurst exponent H) and elucidated switching from sub-diffusive to super-diffusive motion in a single trajectory (figure 4) without using any arbitrary thresholds [160,257]. Instead of normal and log-normal probability distributions of anomalous exponents and generalized diffusion coefficients respectively, expected for homogeneous systems, an exponential probability distribution of anomalous exponents and a broad power-law probability distribution of generalized diffusion coefficients were found [72] (figure 4e, f), similar to heterogeneous dynamics of histone-like nucleoid-structuring proteins in live Escherichia coli [125]. For endosomes, we also found power-law probability distributions of displacements and increments [72]. Power-law probability distributions were previously reported for trajectories of individual nucleoid-structuring H-NS proteins [125]. Endosomal fusion and fission events were recently studied using Smoluchowski-like coagulation equations to describe the power-law probability distributions of endocytosed low-density lipoprotein [258], endosome maturation [259] and clustering of nanoparticles [260]. A power-law probability distribution of cluster sizes naturally leads to a power-law probability distribution of diffusion coefficients [261].
The dynamics of endosomes share many similarities with other organelles, e.g., lysosomes. Lysosomal trajectories studied using wavelet-based analysis were found to be super-diffusive and composed of alternating bursts and pauses in their active transport [262].

Endoplasmic reticulum
The endoplasmic reticulum (ER) is the largest organelle in eukaryotic cells and can span the entire width of cells. Membranes are the principle component of the ER and they adopt a wide variety of different structures due to the interplay of lipids with other surface active molecules e.g. membrane-bound proteins. Many of the structures are nonequilibrium and continuous motor activity is needed to maintain them [263]. A previous model from our group for the ER was that it acts as a stirred reaction vessel to increase the chemical kinetics, with motor proteins doing the stirring. The ER is a molecular factory that performs a wide range of functions including the synthesis of proteins and lipids. It is thought that ribosomes (protein factories) associate with the membrane of the ER to reduce the dimensionality of the reactions responsible for making new proteins and thereby increase their rate of synthesis. The ER is divided into rough and smooth varieties. The rough ER is constructed from a network of membranous tubules and a practical constraint for tracking experiments is that they resemble linear objects rather than point objects. Thus, they need to be analyzed in images using snakes algorithms and the dynamics resembles that of semi-flexible polymers [164,264].
The statistics of the driven transverse fluctuations of ER tubules were quantified by our group [164,264]. HAT was observed for these fluctuations with the majority of anomalous exponents around 0.5 (the predicted value for stressed semi-flexible filaments), but a substantial minority had super-diffusive exponents driven by motor proteins (figure 5). These results are similar to the dynamics of three-way ER junctions in the cell periphery of HeLa cells [265]. ER junctions show heterogeneous anomalous diffusion with a broad probability distribution of anomalous exponents around a mean α = 0.49 which was interpreted as ER driven by microtubule-dependent active noise.
Heterogeneous dynamics was also found for individual quantum dots loaded into the cytoplasm of HeLa cells [266,267]. It was suggested that the observed heterogeneous subdiffusion stemmed from nonspecific binding of quantum dots to the cytoskeleton and the ER network causing broad probability distributions of anomalous exponents, diffusion coefficients, and exponential decays for the PDFs of the normalized increments.
The ER interacts with a huge number of molecules and organelles in the cell, so there is a huge range of possible sources of dynamic heterogeneity in both time and space. Correctly quantifying the source of the dynamic heterogeneity is an important future goal in understanding the activity of the ER and is expected to relate to disease states e.g. spastic mutations [268]. Machine learning techniques were used to find patterns in the ER's HAT e.g. how the anomalous transport varied with the distance to the cell's nucleus [164].

Globular proteins
Anomalous diffusion of proteins due to molecular crowding has been studied for decades [9]. However, anomalous diffusion in the cytoplasm of living cells can be caused by different mechanisms in addition to macromolecule crowding effects. Dilute biomacromolecules, such as globular proteins, are thought to demonstrate heterogeneous diffusion due to internal flexibility [28]. This will inevitably lead to HAT in more concentrated suspensions e.g. a flexible protein moving in a heterogeneous polymeric mesh [269]. The process of time-fluctuating instantaneous diffusivity has been observed in molecular dynamics simulations and predicted for simple analytic models of polymers e.g. Rouse, Zimm, etc., and for reptation in concentrated solutions. To examine the fluctuations of the diffusivity, the magnitude, and orientation, correlation functions of the diffusivity were introduced to distinguish between different polymer models [29].
Caging phenomena lead to heterogeneous diffusion in concentrated suspensions of colloids [270]. Globular proteins are examples of colloids, so caging phenomena and jamming are also expected in these systems. The viscoelasticity of concentrated antibodies is important for injections in anti-cancer immunotherapies. Classical liquid state theories typically predict a heterogeneous diffusion coefficient that depends on length scale via the momentum transfer q, D(q). Historically, inelastic scattering techniques were the primary technique used to measure D(q), since they operate in Fourier space (hence the use of q). A multi-fractal model for HAT is expected to be superior to D(q), since it is more flexible e.g. it can describe the directional phenomena involved in both the antipersistence of caged particle motions and more persistent hopping phenomena between cages. Systematic comparison has not to our knowledge been completed for the multi-fractal analysis of caging in colloids. An important practical issue is to monitor the size of antibodies during their production in bioprocessing plants e.g. monoclonal antibodies extracted from Chinese hamster cells. Antibodies have many crucial applications in current biotechnology such as pregnancy kits, COVID tests, and immunotherapy in cancer treatments. Currently, the top ten best-selling pharmaceuticals are cancer treatments based on antibodies. Monomeric forms of antibodies are clinically preferred and dimerisation or more extensive aggregation can lead to inactivity and less favorable outcomes [271]. Thus, differentiating diffusing diffusion from aggregation is an important issue in this context. Furthermore, therapeutic antibody formulations are required at high concentrations to increase dosages (e.g. to decrease the probability of resistance to treatment with cancers) and reduce the frequency at which they are administered. Thus studies of the HAT of high-concentration antibodies and how it relates to viscoelasticity are important e.g. how the antibody formulations can be injected through syringes.
Sub-diffusive motion has been established for the internal dynamics of peptides based on standard molecular dynamics simulations (GROMACS) [272,273]. Power law and logarithmic relaxation of hydrated proteins were also found in a molecular dynamics simulation with elastin-like polypeptides [274]. Evidence for fractional Brownian motion was found in quasi-elastic neutron scattering from myoglobin and C-phycocyanine [26]. A two-state model for the folding of globular proteins with anomalous dynamics was presented and scattering functions were calculated [275]. A generalized elastic model yielded a fractional Langevin equation [276] which could be used to describe a broad range of phenomena related to anomalous dynamics in soft-condensed matter including proteins.
NMR evidence for fractional dynamics was found for methyl side-chain dynamics in proteins [277]. Fractional dynamics was also seen in NMR spectroscopy from lysozyme and was confirmed with molecular dynamics simulations [27].

Amyloids
Amyloid diseases include Alzheimer's, Parkinson's, and bovine spongiform encephalopathy (BSE). Progression of the diseases is driven by the aggregation of misfolded pro-teins into giant plaques and such aggregates have thus experienced intensive research over recent years. The self-assembly of amyloids is fairly easy to recreate in vitro in a laboratory. Misfolded proteins commonly adopt beta sheets, which are thought to be their most energetically favourable structures. Less obviously, many standard globular proteins (e.g., lysozyme [278]) experience self-assembly of the beta sheets into giant aggregates that resemble the amyloids observed in the disease. Synthetic peptides also can create giant aggregates and show promise as designer functional materials that can be created at relatively low costs. The transverse displacements of self-assembled peptide fibres measured in dynamic light scattering experiments follow the predictions for semiflexible polymers i.e. they have anomalous exponents for their transverse displacement fluctuations with α=3/4 [279]. Extensive research has been done on the self-assembly of peptide surfactants [280]. We were able to measure entanglement tubes at the single molecule level within peptide gels using single-molecule fluorescence microscopy [281]. Pre-stresses in the gel fibres due to quenched anomalous disorder provided a novel source of heterogeneous anomalous diffusion [282].

Mucins
Heterogeneous anomalous diffusion in mucin is a crucial step during many forms of drug delivery (e.g. to deliver drugs through the intestines, lungs, eye, mouth etc [283]) and for the transport of infectious organisms (e.g. viruses and bacteria). In general, mesoscopic interactions are thought to play an important role in modulating the diffusive transport of nanoparticles in hydrogels [284]. Two broad heterogeneous dynamic modes are seen in experiments and the dynamics of synthetic probe spheres have been interpreted in terms of polyelectrolyte scaling theories [140]. In [177] heterogeneity displayed by trajectories of 1 µm diameter tracer particles in human lung mucus were described by a hierarchical FBM equivalent to a GLE model with a distribution of parameters which induce ergodicity breaking. More modern analysis of particles in mucin gels finds evidence for ergodicity breaking (as expected for most polymeric gels), non-Gaussian pdfs and non-Fickian diffusion [180,285].

Embryonic cells
Stem cell-aided repair is one of the principal goals of tissue engineering in medicine and stem cells are found in embryos. Cells in live embryos also provide important insights into morphogenesis, which impacts all of the physiology of the resultant organism. Reaction-diffusion equations can describe pattern formation in embryos e.g. how the leopard got its spots. HAT will therefore affect such pattern formation and a formalism for reaction-anomalous diffusion is needed.
Temporal heterogeneity within trajectories of haematopoietic stem cells was found in the form of alternating periods of a confined random walk followed by processive motion [286]. Statistical analysis of trajectories of persistently migrating cells in 2D and 3D culture environments revealed that the cells display non-Gaussian superdiffusion with a spectrum of anomalous exponents and diffusivities [287]. In contrast to normal diffusion, cells during super-diffusive motion are covering new areas more quickly. The heterogeneity of single-cell trajectories could originate from both the phenotypic and functional heterogeneity of stem cells, with cells moving between two or more metastable states [288]. In vitro, mouse fibroblast cells on a two-dimensional substrate were also found to move super-diffusively due to heterogeneous noise during the runs, leading to large fluctuations in speed and a run-and-tumble behaviour, where cells move in a straight line for a while before rapidly changing direction [289]. However, the displacement probability distribution was found to have the classical Gaussian form.

Cancer cells
Anomalous heterogeneous diffusion of cancerous cells emerges during the migration of tumor cells, metastasis formation, and the analysis of crawling gaits. One origin of heterogeneity is the migration and proliferation dichotomy in tumour-cell invasion [290] (also known as the Go-or-Grow mechanism) which could be described using a continuous time random walk model [291]. On 2D substrates and in 3D collagen matrices, cell heterogeneity results in the super-diffusive migration of individual cancer cells, with an exponential probability distribution of cell displacements and a non-Gaussian velocity probability distribution [292]. Non-Gaussian double exponential probability distributions of displacements and super-diffusive ensemble averaged MSDs were found for the motility of breast cancer carcinoma cells [117]. The interpretation of the anomalous diffusion of cancerous cells is based on models that range from Lévy walks [293] to the fractional Klein-Kramers equation [294]. Higher resolution experimental data is needed to rigorously differentiate between the different possibilities.
Bayesian inference of time-dependent persistence and local activity (noise amplitude) parameters revealed heterogeneity in measured cell trajectories with large variations of cell behaviour, both with time and between individual cancer cells. Anomalous diffusion is heterogeneous even within trajectories of individual tracer particles in the cytoplasm of cancerous cells. In Hela cells, analyses of the movements of quantum dots revealed heterogeneous sub-diffusion with antipersistent increments [266]. It was suggested that quantum dots move with single constant anomalous exponents but stochastically switch between different mobility states, most likely due to transient associations with the cytoskeleton-shaken endoplasmic reticulum network. Previously, the emergence of heterogeneous subdiffusion in HeLa cells and the values of the anomalous exponent were shown to depend both on particle size and non-specific interactions with the cytoplasmic interior [267].

Leukocytes/hemocytes
Diffusion of T cells (a type of leukocyte) is essential for the immune response of mammals. In vivo, T cell motility is orchestrated by a complex combination of chemokines, interactions with antigen-presenting cells, and multiple guidance cues leading to the persistent directional movement of cells for several minutes following short periods of rest, broadly similar to the run-and-tumble motion of bacteria. There is a lot of evidence that T cell movement is super-diffusive and highly heterogeneous [295]. In naïve CD4 + T cells in the inguinal lymph nodes of anesthetized mice, T cells were found to move via anomalous diffusion, cycling between states of low and high motility roughly every 2 min [296]. In the brains of chronically infected mice, T cells were also moving super-diffusively on time scales of up to several minutes with non-Gaussian probability distributions of displacements [297]. The probability distributions of velocities suggested different types of movement of naïve T cells, with slow moving cells showing a heavy-tailed probability distribution while faster moving cells were more Brownian [298]. Super-diffusion of T cells was also observed in larval zebrafish [299] with the variability in motility amongst the cells attributed to heterogeneity in speed and turning behavior amongst the cells. The heterogeneity of motility emerged from a positive feedback loop between actin flows and the maintenance of cell polarity in motile cells which results in a higher persistence of motility for faster cells [299,300]. Run-and-stop motion in the three-dimensional migration of T cells with periodic shape oscillations was observed [301].
Hemocytes are phagocytes of invertebrates that plays a crucial role in their immune system. Recently, our group discovered heterogeneous anomalous diffusion of hemocytes during their migration in live Drosophila melanogaster embryos (figure 6). Hemocytes were moving super-diffusively in the control and in two mutant embryos (LanB1 and SCAR) with positive velocity auto-correlation functions which decay to zero as powerlaws, t α−2 , with α in agreement with values calculated from the MSDs. The probability distributions of incremental displacements dx = x(t + τ ) − x(t), dy = y(t + τ ) − y(t) and dz = z(t + τ ) − z(t) follow Laplace distributions instead of Gaussians. The non-Gaussian form of the probability distributions of the displacements suggests that the motility of hemocytes is heterogeneous. Using a deep learning neural network [71], a broad spectrum of Hurst exponents H (related to anomalous exponents as H = α/2) and generalized diffusion coefficients were found [74]. The probability distributions of Hurst exponents were characteristic of the type of mutant considered. The hemocytes are switching between different levels of persistent motility, rather than distinct subpopulations of cells having altered dynamics. Interestingly, local (time resolved) anomalous exponents (and also local generalized diffusion coefficients) of individual hemocytes were found to exhibit oscillatory behaviour which could be a manifestation of a positive feedback loop between actin flows and maintenance of cell polarity [299,300] and/or similar periodic shape oscillations to those observed during the run-and-stop motions of T cells [301]. Upon their contact, we found that the hemocyte motion was synchronised, although their clear sense of directional motion was lost [74]. Contact of the cells inhibits their motion (a process of contact inhibitory locomotion) and the anomalous transport of cells continues to be synchronised in terms of the anomalous exponents, generalized diffusion coefficients, displacements, and velocities. An anomalous tango is thus observed between the hemocyte cells when they touch.

Bacterial cells
The simplest model for bacterial motion is Poisson distributed stretches of ballistic motion followed by tumbles that randomise the direction [302]. It is equivalent to the model for the conformations of freely jointed polymers in which space is replaced with time. However, it has been known that bacterial motion is to some degree anomalous, since it was first quantified with tracking experiments e.g. the runs clearly have some curvature in the images of H. Berg (e.g. on the front cover of the book "Random Walks in Biology" [303]) and thus the runs are subballistic with a reduced persistence. Therefore the motion of the bacteria during the runs is not fully persistent as expected for ballistic transport.
Super-diffusive bacterial motion was probed by micron-scale beads in a freely suspended soap film [304] that contained a dense swarming bacterial suspension [305]. Efforts have been made to understand the anomalous transport of bacteria using fractional Brownian motion [306]. A multi-fractal model has also been introduced to describe the swimming of bacteria [307]. A more sophisticated form of HAT with bacteria is related to their memory due to sensory proteins [308]. Large behavioral variability of wild-type E. coli was revealed in their three-dimensional trajectories (figure 7) with a broad probability distribution of persistence times within a monoclonal population of bacteria. HAT for the internal motion of proteins inside bacteria has been examined using a super-statistical approach. A q-Gaussian was found for the displacement probability distribution [309]. Lévy walks have been successful in describing anomalous transport in concentrated suspensions of bacteria [305]. More recently there is evidence for the applicability of Lévy walks in quiescent dilute solutions of E. coli (figure 8) [310] and the dynamics of gold nanorods in bacterial swarms [311]. These models better describe the distribution of run times (moving beyond simple Poisson models with exponential distributions), but still fail to describe the reduced persistence of runs observed in experiments. Correlated truncated Lévy processes were used to describe mixing in dilute suspensions of algae and bacteria [312].

Bacterial biofilms
Most bacteria spend most of their time in surface-attached aggregates. There are numerous advantages for them to follow this communal style of living and the main one is protection from antibiotics.
Bacterial biofilms are rheologically heterogeneous on the micron scale and can become more homogeneous as a function of height as they mature. The motion of P. aeruginosa (a rod-shaped swimming bacterium) is a combination of active and thermal motion, whereas with non-motile S. aureus the motion is only thermal. A wide range of anomalous transport in biofilms was found with both varieties of bacteria [313,314]. The biofilms were locally heterogeneous and the MSDs of neighboring bacteria differed by up to a factor of 10. Heterogeneity on individual trajectories was also observed during the biofilm formation. Trajectories of individual B. subtilis cells can be composed of superand subdiffusive regimes as cells continuously enter or leave aggregates [315], figure 9. Elastic mechanical waves have been observed to propagate across bacterial biofilms due to their motile activity [316].
Bacterial biofilms have heterogeneous structures on the micron and nanoscale. Furthermore, in general, both polymeric and colloidal gels exhibit heterogeneous viscoelasticity (even simple one-component gels with a solvent are heterogeneous), so both properties may contribute to the heterogeneous viscoelasticity of biofilms (both predominantly colloidal, i.e. very little polymer, and polymeric, i.e. very few cells, biofilms can be made). The average spatial heterogeneity of bacterial cells in biofilms can be quantified separately using the Ripley K function [314], but pair correlation functions can also be used. Recent studies indicate that the motion of some antibiotics in biofilms is subdiffusive [317] and biofilm/sputum composites have also been considered. Transport in the composites was studied using a system consisting of two different media [89,318]. Thus, the effective action of biofilms against antibiotics could be due to the introduction of slow subdiffusive transport, reducing the rate at which high concentrations of antibiotics arrive at the bacteria. Other factors are also known to be important, such as phenotypic changes of the bacteria in the biofilms, which adopt very low metabolic states e.g. persister cells.

Eukaryotic microorganisms
Eukaryotic microorganisms are known to generate non-equilibrium fluctuations in their surroundings, which have been studied using passive tracer particles immersed in suspensions of eukaryotic swimmers. The long-time dynamics of the tracers was described with a normal (Brownian) yet non-Gaussian diffusion model [319]. Theoretically it was predicted that hydrodynamic interactions between the tracer and the swimmers lead to the occurrence of a Lévy flight regime in the tracer dynamics and non-Gaussian distributions of displacements [320]. Lévy flight models are nonphysical since they imply instantaneous transport of particles during jumps, so these results should be revisited with a Lévy walk model. Interestingly, the shape asymmetry of tracers is predicted to induce ratchet effects that alter fluctuations in cell motion and lead to asymptotic super-diffusion [321].
The two-dimensional motion of the social amoeba Dictyostelium discoideum (a species of soil-dwelling amoeba) was found to be super-diffusive with broadly distributed diffusivities [322] (probably due to cell-to-cell variability) and was characterized using an exponential probability distribution of displacements and generalised gamma probability distribution of cell speeds [323]. The diffusion of the parasitic soil nematode Phasmarhabditis hermaphrodita was found to be normal but with a significant variation of the diffusion coefficients among the the population that followed a gamma probability distribution [124]. Analysis of the motion of four flagellated Protozoa species revealed super-diffusion with non-Gaussian probability distributions of displacements together with a generalised gamma probability distribution for their speeds and a broad probability distribution of Hurst exponents characterizing the self-similarity of the cells' velocity time series [324].

Conclusions
The questions involved in heterogeneous anomalous transport are of central importance to quantitative studies of molecular and cellular biology. The main challenge is to decipher the origin of the underlying distributions of anomalous exponents and generalized diffusion coefficients. In specific cases quantitative treatments are possible e.g. for diffusing particles undergoing fission and fusion, a power-law probability distribution of cluster size was predicted by Smoluchowski's coagulation theory, which naturally leads to polydispersity with diffusion coefficients that scale with the cluster size as a power law [261]. Alternative theoretical methodologies to consider the motility of molecules and cells include focusing on the small system thermodynamics of trajectories [325] or soft matter models [326]. Here we prioritised the experimental measurement of the statistics of motion and the development of agnostic models which can directly describe the anomalous transport observed. The extreme complexity of high-concentration biological systems in living cells often precludes more fundamental stochastic thermodynamic or soft matter modelling. The hope is that heterogeneous anomalous transport will allow the gap to be bridged to more fundamental molecular models. Furthermore, many classical soft matter models were based on Gaussian fluctuations and still need to be extended to describe the heterogeneous anomalous statistics observed experimentally. Machine learning techniques (e.g. deep learning neural networks) combined with high-resolution microscope-based tracking allow some of the major challenges in the characterisation of HAT to be overcome. Continued development of anomalous agent-based models will allow a fundamental physical framework to understand collective anomalous motility to be developed e.g. models for the collective motion of cells in tissue based on HAT.
So, what if there are more complicated models based on HAT that are required to fit the stochastic motion of particles in biology? There are a number of advantages to this approach: • HAT has diagnostic potential in medicine. Models based on anomalous transport provide more detailed information to differentiate mutants or diseased states of cells [74]. This could be used for diagnostic purposes e.g. cancer cells often have altered motile phenotypes in leukemia that can drive metastasis and could be diagnosed from a blood sample [327]. Although genetic analysis of the cancer cells would also be a priority for diagnosis, connecting genotype to phenotype is a crucial step, and making the connection necessitates motility assays and robust methods to quantify them. Furthermore, gait analysis (how cells swim or crawl) could be used to classify varieties of microorganisms, which again has diagnostic potential [328]. Also, diseased states of eukaryotic cells (e.g. in humans or livestock) are often associated with impaired intracellular motility e.g. in motor neuron disease where the motility of neurotransmitters in endosomes can be compromised [329]. Information could be used from HAT analysis for gene therapy targeted to treat the phenotypes of impaired intracellular motility and to understand the success of treatments.
• HAT provides fundamental insights into biology. The machinery inside cells is finely optimised by evolution to provide efficient solutions to biological challenges. How do cells and molecules make use of anomalous stochastic phenomena [6,7,330]?
• Anomalous transport presents a fundamental barrier in a diverse range of problems in quantitative cellular and molecular biology e.g. understanding the action of enzymes inside cells where the kinetics can be dramatically affected [331], how cells move in developing embryos [74], the transcription of information from DNA [169,332], how reaction-diffusion processes lead to robust patterning during morphogenesis [104] and the finely orchestrated movement of endosomes inside cells [71,72]. Without a thorough quantitative understanding of anomalous transport, we lack tools to handle a large number of crucial biological problems.
• Numerous new physical phenomena are still left to be discovered that are related to heterogeneous anomalous transport e.g. the motility of molecular motors in active matter [104], collective cellular motility in embryos [74], the motion of wavefronts of electrical signalling in bacterial biofilms [50] and how bacteria swim [333]. The field provides an interesting perspective to the subject of soft matter physics, focusing on the statistics of motility.
• The applied mathematics of HAT is novel and interesting in its own right e.g. fractional partial differential equations [334], the generalized central limit theorem, Ito calcululus [335] and multi-fractals [64].