Unsupervised learning of anomalous diffusion data

The characterization of diffusion processes is a keystone in our understanding of a variety of physical phenomena. Many of these deviate from Brownian motion, giving rise to anomalous diffusion. Various theoretical models exist nowadays to describe such processes, but their application to experimental setups is often challenging, due to the stochastic nature of the phenomena and the difficulty of obtaining reliable data. The latter often consist of short and noisy trajectories, which are hard to characterize with usual statistical approaches. In recent years, we have witnessed an impressive effort to bridge theory and experiments by means of supervised machine learning techniques, with astonishing results. In this work, we explore the use of unsupervised methods on anomalous diffusion data. We show that the main diffusion characteristics can be learnt without the need for any labelling of the data. We use such a method to discriminate between anomalous diffusion models and extract their physical parameters. Moreover, we explore the feasibility of finding novel types of diffusion, in this case represented by compositions of existing diffusion models. Finally, we showcase the use of the method on experimental data and demonstrate its advantages for cases where supervised learning is not applicable.

Stochastic diffusion processes are ubiquitous in nature, with applications in diverse fields such as physics, chemistry, biology or social sciences. The best-known model for these is Brownian motion [1], but recently, deviations from such paradigmatic phenomena have been discovered in a huge range of systems [2]. We define as anomalous diffusion any of these deviations, which are usually characterized by a power-law scaling of the mean squared displacement (MSD), $\mathrm{MSD}(t) \sim t^{\alpha}$, where α is defined as the anomalous exponent. For α = 1 one typically recovers normal diffusion, and usually a return to the Brownian motion universality class (for exceptions, see Ref. [3]). Any α ≠ 1 showcases anomalous diffusion, while other features such as non-ergodicity or correlated displacements may also be signals of its appearance [4,5]. In recent years, there has been an increasing interest in the topic, motivated by the development of single particle tracking (SPT) techniques, which open the possibility of studying the motion of particles beyond the diffraction limit [6]. Such achievements have shown that diffusion in many different biological scenarios is indeed anomalous [7]. Moreover, researchers have successfully applied this theoretical framework to systems at all scales, ranging from ultracold atom experiments [5] to animals [8] or economic signals [9].
While both theory and experimental techniques are currently very well developed, connecting the two is a non-trivial task. Due to the stochastic nature of the problem, the analysis of anomalous diffusion data and its comparison to theoretical predictions often relies on the use of ensemble averages [10,11] or extensive statistical analysis [12,13]. Moreover, trajectories arising from SPT are usually short and cannot be compared to the long-time predictions that theories usually make. On top of that, experimental data are often very noisy and rather difficult to gather, which further increases the difficulty of the problem.
Lately, considerable interest has been devoted to developing and improving single trajectory methods, i.e. tools which extract the maximum amount of information from a single anomalous diffusion trajectory. Here, machine learning (ML) approaches have been remarkably successful and are able to outperform state-of-the-art methods in a variety of scenarios [14]. A wide range of ML architectures have been tested: from standard neural networks [15], convolutional [16] and recurrent layers [17,18], and graph neural networks [19], to Bayesian inference [20] and extreme learning machines [21]. All these methods are trained in a supervised scheme, which means that the machine learns to characterize the data by training with human-labelled data. While for some applications, such as parameter estimation, this approach is completely valid and advantageous, for others it creates an inherent problem that may deeply affect our understanding of the data analyzed.
An example of such an inherent problem is the discrimination of the diffusion model of the particles, which has usually been stated as an ML classification problem [22-24]. When classifying between diffusion modes (i.e. normal, anomalous, confined or directed diffusion), the problem is simplified, as these modes probably cover all possible kinds of diffusion. However, a different problem is to classify between diffusion models, such as fractional Brownian motion (FBM) or continuous time random walk (CTRW) (see Section I C for further details on these). In this case, it is very possible that the actual model of the data analyzed is not contained in the training set, or may even be a combination of the models it contains. Usual supervised approaches lack the tools to discriminate these cases and associate them with the most closely resembling model [24].
In this work, we explore the use of unsupervised techniques for single trajectory characterization and show that they can indeed be used to analyze anomalous diffusion data coming both from simulations and from various experimental setups. In particular, we focus on the use of neural network autoencoders, a typical ML architecture for unsupervised and semi-supervised approaches. Our aim is threefold: first, to investigate whether unsupervised learning yields the same results as the supervised techniques previously studied; second, to study the use of unsupervised approaches to understand and compare different anomalous diffusion models beyond supervised methods; third, to explore the possibility of finding new physical processes related to anomalous diffusion by means of autoencoders.
The work is organized as follows: in Section I we present the unsupervised approach used, namely convolutional autoencoders (Section I A) and anomaly detection (Section I B). Moreover, in Section I C we present the anomalous diffusion data used, as well as further details on the anomalous diffusion models considered. Then, the results of the paper are presented in Section II. First, we explore the feasibility of anomaly detection in simulated data (Section II A), both for model discrimination (Section II A 1) and parameter estimation (Section II A 2), but also to characterize novel types of diffusion, showcased in this case by the combination of existing models (Section II A 3). We then apply the knowledge gained to analyze experimental data arising from SPT trajectories (Section II B).

A. Autoencoders
Autoencoders (AE) are artificial neural networks that learn efficient encodings of high-dimensional data. They consist of two parts: an encoder that compresses the input data into a low-dimensional embedding, referred to as the latent space, and a decoder that recovers the original input from the compressed representation. Ideally, the output of the decoder is equal to the input of the encoder. However, due to the compression of information happening in the latent space, such an ideal scenario is often not found. Because of such compression, in order to improve the similarity between reconstructions and inputs, the AE has to learn features of the dataset during the training. Such features will then be used to plug the information gaps caused by the encoder's compression.
In our particular problem, the training data consists of a set of simulated anomalous diffusion trajectories $X = \{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$, where each training sample $x^{(i)}$ consists of a one-dimensional single trajectory with T time steps, $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \dots, x^{(i)}_T)$, with $x^{(i)}_t \in \mathbb{R}$. A typical loss function for the reconstruction of such data is the mean squared error (MSE), which for a single trajectory reads
$$\mathrm{MSE}(x^{(i)}) = \frac{1}{T} \sum_{t=1}^{T} \left[ x^{(i)}_t - f(x^{(i)})_t \right]^2, \qquad (2)$$
where $f(x^{(i)})$ is the AE output for the input trajectory $x^{(i)}$. The AE used consists of a stack of convolutional and fully-connected layers. Moreover, we consider features such as batch normalization, as well as a global average pooling and upsampling before and after the latent space. A schematic representation of such an AE is presented in Fig. 1(b). In this scheme, connections exist between consecutive layers, as typically occurs in most neural network architectures, but extra connections are created between non-consecutive ones. Such connections are often referred to as skip connections and are widely used in residual neural networks [25]. Heuristically, we find that the use of such connections boosts the power of the AE, vastly increasing its accuracy. Moreover, due to the stochastic nature of the trajectories, any decrease in the dimensionality of the trajectory (i.e. to the size of the latent space) will cause the loss of information necessary for its reconstruction. In that sense, the reconstruction error will always have a lower bound related to the size of the latent space [26]. In our particular implementation, the latent space consists of a fully connected layer whose size depends on the length of the trajectories T. A detailed description of the architecture is presented in Appendix A as well as in the repository of Ref. [27].
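To make the link between latent size and reconstruction error concrete, the following minimal sketch replaces the convolutional AE by a purely linear autoencoder obtained from a singular value decomposition (equivalent to PCA). This is an illustrative stand-in and not the architecture of Appendix A; the function names, the Brownian stand-in data and the chosen latent sizes are assumptions made for the example.

```python
import numpy as np

def fit_linear_autoencoder(X, latent_dim):
    """Fit a linear encoder/decoder pair via SVD (equivalent to PCA)."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:latent_dim]            # shape (latent_dim, T)
    return mean, components

def reconstruct(X, mean, components):
    latent = (X - mean) @ components.T      # encoder: T -> latent_dim
    return latent @ components + mean       # decoder: latent_dim -> T

def mse_per_trajectory(X, X_rec):
    """Reconstruction error of each trajectory, averaged over its T time steps."""
    return np.mean((X - X_rec) ** 2, axis=1)

rng = np.random.default_rng(0)
T = 20
# Stand-in data (assumption): Brownian trajectories built from Gaussian steps.
train = np.cumsum(rng.standard_normal((2000, T)), axis=1)
test = np.cumsum(rng.standard_normal((500, T)), axis=1)  # never used for fitting

errors = {}
for latent_dim in (1, 2, 5, 10, 20):
    mean, comps = fit_linear_autoencoder(train, latent_dim)
    test_rec = reconstruct(test, mean, comps)
    errors[latent_dim] = mse_per_trajectory(test, test_rec).mean()
    print(latent_dim, errors[latent_dim])
```

In this linear setting the error on held-out trajectories cannot increase when the latent space grows, and it vanishes (up to floating point precision) once the latent size equals T, which mirrors the lower bound discussed above.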
Importantly, in order to avoid the effect of overfitting of the AEs, and to ensure that their predictions are related to the learnt features and not the memorization of the data, all the results presented in this manuscript correspond to predictions over sets of trajectories not used in the training.

B. Anomaly detection
Anomaly detection refers to the problem of detecting instances or samples from a certain dataset that differ in some way from the rest. In other words, the goal is to detect outliers, a.k.a. anomalies, which, while having similarities with the rest of the dataset, also have special features that crucially make them different. In recent years, machine learning methods have shown great capabilities in dealing with such problems [28], with interesting applications in physics [29].
In particular, AEs offer a powerful and versatile architecture for anomaly detection while relying on unsupervised or semi-supervised learning [30]. As discussed in the previous section, the goal of an AE is to reconstruct an input instance with minimal error. In this sense, anomalies can be detected by setting a threshold on this error, in such a way that any value above it indicates that the reconstructed input is indeed an anomaly of the dataset. In this particular work, the prediction error is given by the MSE, Eq. (2). Heuristically, we find that such a metric, in conjunction with the data preprocessing presented in Section I C, yields good results and suffices for the task at hand. Nevertheless, other interesting approaches, based on reconstruction probabilities [31] rather than absolute values, are left to be explored in future works.
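The thresholding step can be sketched as follows. The text does not specify how the threshold is chosen; here we assume, purely for illustration, that it is set as a high quantile of the reconstruction errors of held-out trajectories resembling the training data. All function names and numerical values are hypothetical.

```python
import numpy as np

def reconstruction_mse(x, x_rec):
    """MSE between a trajectory (or batch of trajectories) and its reconstruction."""
    return np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2, axis=-1)

def fit_threshold(reference_errors, quantile=0.95):
    """Assumed rule: threshold = quantile of errors on held-out 'normal' data."""
    return float(np.quantile(reference_errors, quantile))

def is_anomaly(errors, threshold):
    """Flag every input whose reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

# Hypothetical reconstruction errors of held-out trajectories of the training model.
reference_errors = np.array([0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12])
threshold = fit_threshold(reference_errors, quantile=0.95)

# Hypothetical errors of new trajectories; the last one is reconstructed poorly.
new_errors = np.array([0.09, 0.11, 0.45])
flags = is_anomaly(new_errors, threshold)
print(threshold, flags)
```

The quantile controls the fraction of in-distribution trajectories that are wrongly flagged, so it should be tuned to the application rather than taken from this sketch.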

C. Anomalous diffusion data
As commented, the datasets considered consist of trajectories arising from different anomalous diffusion models. In particular, we consider four different models, with which we aim to cover a large variety of real-world phenomena: annealed transient time motion (ATTM) [32], continuous-time random walk (CTRW) [33], fractional Brownian motion (FBM) [34], Lévy walk (LW) [35], and scaled Brownian motion (SBM) [36]. Moreover, these models allow us to explore the whole range of anomalous diffusion exponents α: ATTM and CTRW are subdiffusive models (α ≤ 1), while FBM and SBM cover the whole range (i.e. 0 < α < 2). A complete review of the properties of each model can be found in Refs. [2,14].
The trajectories analyzed here were generated via the ANDI dataset python package [14]. Each trajectory is normalized such that the standard deviation of its displacements is equal to 1. This ensures that all trajectories have a similar diffusion coefficient D ≈ 1, while the anomalous exponent is unaltered. Moreover, to avoid very slow and immobile trajectories, we restrict the dataset to α ≥ 0.1. Finally, unless stated otherwise, all the results presented in this work correspond to AEs trained with trajectories of only 20 time steps. While similar results are found for longer trajectories, our goal is to showcase the applicability of the method to experimental scenarios, where such short trajectories are often the only ones accessible.
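The normalization described above can be sketched as follows. Shifting the trajectory to start at the origin is an additional assumption made here for convenience, and the function name is hypothetical. Since the whole trajectory is rescaled by a single constant, the MSD is only multiplied by a constant factor and its power-law exponent is preserved.

```python
import numpy as np

def normalize_trajectory(x):
    """Rescale a 1D trajectory so that its displacements have unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = np.diff(x).std()
    if std == 0:
        raise ValueError("trajectory has constant displacements and cannot be normalized")
    # Assumption: the trajectory is also shifted to start at the origin.
    return (x - x[0]) / std

rng = np.random.default_rng(1)
# Stand-in trajectory (assumption): 20 time steps with an arbitrary step scale.
raw = np.cumsum(3.7 * rng.standard_normal(20))
normalized = normalize_trajectory(raw)
print(np.diff(normalized).std())
```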
In Fig. 1(c-d) we briefly showcase the expected results when performing anomaly detection on anomalous diffusion trajectories. Depending on the features of the input data, the AE will produce a better or worse reconstruction. For instance, in Fig. 1(c), an AE trained with FBM yields better results, seen both visually and in the values of the MSE, when reconstructing trajectories of this same model (highlighted in gray) rather than trajectories of ATTM (left) or SBM (right). Similarly, in Fig. 1(d), we showcase that the MSE is lower when dealing with trajectories of the same exponent the AE was trained with. In the next sections we analyze this behaviour in depth to understand the power and limitations of the proposed anomaly detection method.

II. RESULTS
We present here the results of applying anomaly detection to anomalous diffusion trajectories. In particular, we focus on two important aspects of the trajectories: their diffusion model and their anomalous exponents. Our aim is to show that an autoencoder can learn to reconstruct trajectories arising from different anomalous diffusion conditions. More importantly, after training, the AE must be capable of reconstructing new, unseen trajectories. As commented previously, the reconstruction error will then be used as a measure of how closely the new trajectories are related to the ones used for training. In order to benchmark the method in a controlled scenario, we first present the results on simulated trajectories, and then show its applicability to data from various experimental conditions.
A. Simulated trajectories

1. Detecting changes of diffusion model
Our first result relates to the ability of the method to differentiate between distinct diffusion models. We start with the simplest scenario, in which an AE is trained with a single diffusion model. Then, we compare its error (as given by the MSE, Eq. (2)) when reconstructing trajectories of other models. In Fig. 2(a) we show the results when training AEs with subdiffusive trajectories of the models indicated in the legend. Each line corresponds to the error averaged over ten different AEs and the shaded area represents the respective standard deviation. Then, we reconstruct trajectories of different models with each of the AEs, as shown on the x-axis. Importantly, in all cases, the error of the AE is minimal for the model it was trained with (highlighted with a black circle). Moreover, we can already infer, based on the reconstruction error, similarities between some of the models. For instance, the AEs trained with ATTM and SBM trajectories share a very similar behaviour. This is to be expected, as they are the two most similar models in the chosen pool. Indeed, both are so-called random diffusivity models, i.e. they show Brownian-like diffusion but with non-trivial changes of the diffusion coefficient. We note that, even in the case of supervised and model-specific algorithms, their discrimination is very challenging [14]. In the case of the FBM trajectories, we see that the error made by the AEs is an order of magnitude larger than for the rest of the models (note the logarithmic scale). This showcases the strong differences between FBM and the rest of the models, the former being the only ergodic model, as well as the only one with correlated displacements.
To further understand the presented results, we study how the MSE behaves for the different models at different anomalous exponents α, as shown in Fig. 2(b-e). Each panel corresponds to an AE trained with the indicated model, while each line corresponds to trajectories of the different models (as indicated by the legend in Fig. 2(a)). These AEs were trained over the whole range of α shown. We see here that when α → 1, trajectories of all models collapse to the same MSE, no matter the AE used. This is an important result: even if their microscopic behaviour may be different, all models converge to the Brownian motion universality class at α = 1 (up to logarithmic corrections [37]). Hence, the AEs are able to learn reliable features from the anomalous diffusion trajectories and not just trivial patterns.
Such results also allow us to further compare the different models considered. Again, we see that the non-ergodic models tend to share a very similar MSE. For instance, the AEs trained with ATTM (Fig. 2(b)) and SBM (Fig. 2(e)) are able to correctly reconstruct all models but FBM. However, for CTRW trajectories the best results are found when the AE is trained with this same model (Fig. 2(d)), as the characteristic trapping times of CTRW may be difficult to reproduce if not learnt in the training. As previously seen, the FBM trained AE makes errors an order of magnitude larger when reconstructing other models, which are nevertheless comparable to its error for FBM at α = 1.

2. Detecting parameter changes within a diffusion model
While the AEs trained with ATTM, CTRW and SBM show an almost constant MSE at all α for their own model, the one trained with FBM tends to minimize the error for a subset of exponents. Taking advantage of the fact that both FBM and SBM are able to generate subdiffusive as well as superdiffusive trajectories, we trained AEs in α ∈ [0, 2]. In Fig. 3(a-b) we show such results for FBM and SBM, respectively. The line color indicates the model of the input trajectories. Interestingly, the FBM trained AE shows the same MSE behaviour stated above, but now the minimal MSE occurs at a different exponent. Surprisingly, in the subdiffusive range, the MSE is lower for SBM than for FBM (and also in the highest part of the superdiffusive range). This does not occur in the SBM trained AE, where the MSE is almost constant for all α and is always minimal for the model it was trained with. Again, in this case, the minimal error for FBM trajectories is found at α = 1.
The FBM model offers an interesting playground to study the suitability of anomaly detection in anomalous diffusion. Within the same diffusion model, FBM shows two completely different behaviours, depending on the value of α. For 0 < α < 1, FBM is anti-persistent, hence showing negative correlations, while it is persistent and positively correlated for 1 < α < 2. This makes trajectories for different α very different from each other, which explains why it is difficult for an AE to correctly reconstruct trajectories at all α. Such behaviour is highlighted in Fig. 3(b-e), where AEs were trained with FBM and SBM in the subdiffusive (b-c) and superdiffusive (d-e) ranges. While the information learnt by the SBM trained AE in any of these two ranges is sufficient to correctly reconstruct trajectories for all α, the one trained with FBM only yields low errors for the range it was trained on.
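The change of sign of the FBM correlations can be checked directly from the standard autocovariance of fractional Gaussian noise, i.e. of the FBM displacements, written here in terms of α = 2H with H the Hurst exponent. The function name is an assumption made for this sketch, and unit-variance displacements are assumed.

```python
import numpy as np

def fgn_autocovariance(lag, alpha):
    """Autocovariance of unit-variance FBM displacements at an integer lag.

    Standard fractional Gaussian noise result, with Hurst exponent H = alpha / 2:
    gamma(k) = 0.5 * (|k + 1|^alpha - 2 |k|^alpha + |k - 1|^alpha).
    """
    k = np.abs(np.asarray(lag, dtype=float))
    return 0.5 * ((k + 1) ** alpha - 2 * k ** alpha + np.abs(k - 1) ** alpha)

for alpha in (0.5, 1.0, 1.5):
    # Correlation between consecutive displacements:
    # negative for alpha < 1, zero for alpha = 1, positive for alpha > 1.
    print(alpha, float(fgn_autocovariance(1, alpha)))
```

At lag 1 the expression reduces to 2^(α-1) - 1, which makes the anti-persistent (α < 1) and persistent (α > 1) regimes explicit.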
In order to further understand this behaviour, we analyze the reconstructed trajectories output by AEs trained with two models, FBM and CTRW. We note that the results found for CTRW are analogous to the ones found with SBM and ATTM. We input trajectories with different α to the AE. Then, we calculate the anomalous exponent of the output trajectories by fitting either the ensemble averaged MSD (for CTRW) or the time averaged MSD (for FBM). A linear fit in logarithmic scale yields the anomalous exponent α_f ≈ α.
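The exponent extraction just described can be sketched with NumPy as follows. The number of lags used in the fit, the function names and the test trajectories are assumptions made for illustration; they are not taken from the original analysis.

```python
import numpy as np

def time_averaged_msd(x, max_lag):
    """Time averaged MSD of a single 1D trajectory for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    return lags, msd

def ensemble_averaged_msd(X, max_lag):
    """Ensemble averaged MSD over trajectories (rows of X), measured from t = 0."""
    X = np.asarray(X, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((X[:, lag] - X[:, 0]) ** 2) for lag in lags])
    return lags, msd

def fit_exponent(lags, msd):
    """Slope of a linear fit of log(MSD) against log(lag)."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return float(slope)

# Sanity check 1: ballistic motion x(t) = t has MSD = lag^2.
ballistic = np.arange(200, dtype=float)
alpha_ballistic = fit_exponent(*time_averaged_msd(ballistic, max_lag=10))

# Sanity check 2: an ensemble of Brownian trajectories should give a slope close to 1.
rng = np.random.default_rng(0)
brownian = np.cumsum(rng.standard_normal((5000, 200)), axis=1)
alpha_brownian = fit_exponent(*ensemble_averaged_msd(brownian, max_lag=10))

print(alpha_ballistic, alpha_brownian)
```

For non-ergodic models such as CTRW the time and ensemble averages do not coincide, which is why the ensemble average is used for that model in the text.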
We present results for AEs trained at three different α, as shown in Fig. 4. For this test, we consider only trajectories of the model the AE was trained with (i.e. FBM and CTRW in this case). Moreover, to allow for a more accurate estimation of α_f via the ensemble and time averaged MSD, the AEs were trained with trajectories of T = 200 time steps. In the left column, we show the fitted α_f w.r.t. the exponent of the input trajectories. In the right column we show the reconstruction error for these same AEs, again as a function of the exponent of the input trajectories. In all cases, for the CTRW trained AE (orange), no matter the exponent it was trained with, the output trajectories maintain the exponent of the input (orange line in Fig. 4(a-c)). Surprisingly, even if the MSE increases for values of α different from the one used in the training (see e.g. Fig. 4(d)), the autoencoder is able to reproduce a key feature of the CTRW trajectories, namely the anomalous exponent. In contrast, the outputs of the AE trained with FBM are transformed such that their α_f equals the one used for training (blue line in Fig. 4(a-c)). This causes the MSE to grow significantly when dealing with trajectories of different α, as we already saw in Fig. 3 and as is highlighted again in the right column of Fig. 4.
We note here that, while both CTRW and FBM trajectories can be characterized by the anomalous exponent, the source of anomalous diffusion differs completely between the two. In the case of CTRW, the exponent is related to the distribution of waiting or trapping times of the particle. However, in the case of FBM, the exponent is connected to the correlations of the displacements. The previous results showcase that the latter phenomenon is a crucial feature for the FBM trained AE, as once a certain correlation is learnt (i.e. one α is learnt), all the reconstructed trajectories share that same correlation. However, in the case of CTRW, the AE is able to reconstruct trajectories without losing their characteristic exponent, even for α different from the one used for training. The reason for this difference is still unknown and further exploration of this phenomenon is currently ongoing with more sophisticated AEs.

3. Detecting mixed diffusion models
To further understand the range of applicability of unsupervised learning to anomalous diffusion data, we test the method on trajectories arising from the composition of different diffusion models. Indeed, it is expected that the diffusion of particles in complex environments, such as biological systems, is not described by a single model but by the interplay between two or more. For instance, in Refs. [38,39] the diffusion of particles through the cell membrane was described by a particle performing an FBM in the presence of a fractal structure and traps, respectively. The latter phenomenon can be associated with a CTRW, hence leading to the intertwined diffusion of CTRW and FBM. At small temporal scales, the dominant model would be FBM. However, at larger scales, traces of CTRW such as a broad distribution of trapping times will arise. Recently, other kinds of diffusion model compositions have been studied, in which particles switch between different diffusion states [40,41]. Nevertheless, we believe that such a problem may be better solved via segmentation algorithms [14,17], and we focus here on the former subordinated phenomena.
In order to simplify the present example, we consider trajectories whose displacements Δ^i_{γ,β} are the weighted sum, with weights set by the parameters γ and β, of the displacements of three different diffusion models (Eq. (3)). Here, Δ^i_{model j} refers to the i-th displacement of a trajectory of model j. We note that the displacements used to generate a composite trajectory come from a single trajectory of each model, hence maintaining their properties (e.g. correlations and ageing effects). Then, the parameters γ and β allow us to change the dominant model, creating the phase space schematically represented in Fig. 5(a).
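The construction of a composite trajectory can be sketched as below. The sketch does not reproduce the (γ, β) parametrization of Eq. (3); instead it assumes three generic non-negative weights that sum to one, and the stand-in displacements are simple Gaussian steps rather than trajectories of the actual models. All names are hypothetical.

```python
import numpy as np

def composite_trajectory(disp_1, disp_2, disp_3, weights):
    """Build a trajectory whose displacements are a weighted sum of three models.

    Each disp_j holds the displacements of ONE trajectory of model j, so that
    model-specific properties (e.g. correlations) are carried over.
    Assumption: `weights` are generic convex weights, not the (gamma, beta) of Eq. (3).
    """
    w = np.asarray(weights, dtype=float)
    if w.shape != (3,) or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be three non-negative numbers summing to 1")
    disps = np.stack([disp_1, disp_2, disp_3]).astype(float)  # shape (3, T - 1)
    mixed = w @ disps                                         # weighted sum per time step
    return np.concatenate([[0.0], np.cumsum(mixed)])          # trajectory starting at 0

rng = np.random.default_rng(2)
# Stand-in displacements (assumption): 19 steps per model, i.e. T = 20 positions.
d1, d2, d3 = rng.standard_normal((3, 19))

pure_model_1 = composite_trajectory(d1, d2, d3, (1.0, 0.0, 0.0))
balanced = composite_trajectory(d1, d2, d3, (1 / 3, 1 / 3, 1 / 3))
print(pure_model_1.shape, balanced.shape)
```

Setting one weight to one recovers a pure-model trajectory, which corresponds to a corner of the phase space.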
We start our analysis by performing anomaly detection on trajectories arising from Eq. (3) with AEs trained with pure models, as shown in Fig. 5(b-d). As expected, the loss is minimized in the corner associated with the diffusion model used for training. Interestingly, the AE trained with CTRW favours trajectories of pure models, while the MSE is maximal in the regions where the models are equally weighted. In the case of the ATTM trained AE, the MSE increases when approaching the SBM corner and in the case of pure CTRW. Such a result reproduces what is expected from a theoretical point of view: the lower right sector of the phase diagram can be understood as a noisy CTRW model, which is indeed very difficult to differentiate from an ATTM trajectory at the trajectory length considered here (T = 20). Last, the SBM trained AE seems to correctly reconstruct a much bigger region of the phase space, while favouring ATTM over CTRW. We note that the results found in the corners of the phase space are completely analogous to the ones in Fig. 2. As expected, most of the conclusions drawn here are also similar to the ones discussed in Section II A 1. However, this approach also allows for a much more refined analysis of the trajectories, enabling a deeper level of understanding and the possibility of finding novel types of diffusion, as will be discussed in Section II B.
The previous results allow us to validate the use of the composite models and their analysis with AEs trained with single models. A different approach is to consider autoencoders trained with trajectories of different γ and β. Then, performing anomaly detection with such AEs will allow us to characterize the model of the input trajectories by finding the γ and β with minimal MSE. Such an approach is of no particular interest when applied to the simulated trajectories considered, and will hence be directly applied to experimental datasets with unknown models.

B. Experimental trajectories
The previous results showcased the validity of the presented method for detecting variations and outliers in simulated data. However, as commented previously, the main utility of these techniques lies in their application to experimental data, as they provide a fast and accurate method to analyze anomalous diffusion trajectories. To showcase that unsupervised methods can be used to that end, we present here results on the use of anomaly detection on trajectories from various single particle tracking experiments. While the focus here is on biophysical problems, as stated in the Introduction, these techniques are applicable at any scale and to any system where stochastic trajectories arise.
Here we consider trajectories from three different experiments and present their results in the rows of Fig. 6. The trajectories of the first two experiments were cropped to 200 time steps, while those of the third have only 20 time steps. Hence, different AEs are used for each length. First, we analyze the trajectories of telomeres in the nucleus of mammalian cells [12,42]. In these two references, the authors showed that the trajectories are well described by an FBM model, with a crossover in the anomalous exponent from α ∼ 0.5 at short times to α ∼ 1 at longer times. The second dataset corresponds to the trajectories of receptors diffusing in the plasma membrane of mammalian cells [43]. These trajectories have been shown to be consistent with ATTM and α ∼ 0.8.
Before presenting the third dataset, we focus on the results found for these two datasets. We start by inputting the trajectories to AEs trained with different diffusion models. In the case of the telomeres (Fig. 6(a)), the MSE is largely reduced when reconstructing trajectories with an FBM trained AE, in agreement with the expected results. For the diffusion of receptors (Fig. 6(b)), the MSE is minimal for ATTM, but also very close to the SBM one. While these results are not conclusive, they already hint at important information and agree well with what we showed for simulated trajectories above (i.e. the shared behaviour between SBM and ATTM). Moreover, this is also consistent with previous supervised ML approaches (see Fig. 5(c) of Ref. [14]).
When performing anomaly detection with different anomalous exponents, the results on each dataset differ, as the assigned diffusion models are also different. For both datasets, we show in Fig. 6(c-d) the MSE w.r.t. the anomalous exponent α of the training dataset, for AEs trained with FBM (solid line, circles) and ATTM (dashed line, triangles). For the diffusion of telomeres, there is a sharp decrease of the MSE for the FBM trained AE (Fig. 6(c), circles), indicating an expected exponent of α = 0.6, consistent with previous predictions. For the receptors' trajectories, this same AE has a minimum at α = 1. This is an expected result: our previous analysis shows that these trajectories do not diffuse as FBM, hence the MSE is expected to be minimized at α = 1. This result is analogous to what was found in Section II A 1 and Fig. 2(b-e). When inputting the receptors' trajectories to an ATTM trained AE, the minimum of the MSE occurs in α ∈ [0.7, 1]. While these results are in accordance with previous works, we note that AEs trained with ATTM at different α are able to correctly reconstruct trajectories from the rest of the range, as shown in Fig. 3(d-f). We note again that, while the exploration of diffusion models by means of AEs gives rise to great opportunities, we believe that the extraction of diffusion parameters, such as the anomalous exponent, will always be better performed with supervised techniques.
Finally, we focus on the motion of the Progesterone receptor (PR) as it diffuses in the cell nucleus in the presence of the R5020 hormone [44]. This system shows liquid-liquid phase separation (LLPS) above a certain hormone concentration. Interestingly, two populations coexist: one where particles diffuse freely through the cell membrane, following ATTM with α ∼ 0.8, and a second one containing the particles interacting with the chromatin and forming phase separated aggregates, following FBM with α ∼ 0.4. As the hormone concentration increases, more and more particles interact with the chromatin and hence the FBM population grows.
Importantly, such a result is also found by means of anomaly detection. In Fig. 6(e) we show the MSE for four AEs, each trained with the diffusion model indicated in the legend. While the MSE is similar for all AEs at low hormone concentrations, we see a sharp increase above 10 nM. This is consistent with the results shown in Ref. [44] and corresponds to the point at which most of the particles in the system phase-separate. This makes the FBM population increase drastically. Hence, the error of the AE trained with this model decreases, while the rest see a big increase. At low concentrations, the ATTM and FBM populations are balanced, hence the errors for ATTM and FBM are of the same order. Interestingly, the AEs trained with CTRW and SBM show a behaviour similar to the ATTM one, following what we already saw in simulated trajectories (see Fig. 2). An important remark is in order here: the trajectories of this last experiment are extremely short (only 20 time steps). This makes any proper statistical analysis extremely challenging, if not impossible. While the presented results may not be conclusive in differentiating the non-ergodic models, they already hint at very important information that was inaccessible to non machine learning approaches. Moreover, since this is an unsupervised method, the AE's predictions are based solely on the features learnt and not on the need to minimize a classification based metric, as may happen in supervised learning.
To further investigate the diffusion of the PR, we use a method similar to the one presented in Section II A 3. In this case, we train AEs with trajectories at different γ and β. For this particular example, we consider CTRW, ATTM and FBM as models 1, 2 and 3, respectively (see Eq. (3)). This yields the phase space schematically represented by the points and labels of the first panel of Fig. 7(a). First, we calculate the MSE over the set of trajectories with AEs trained at different γ and β. At low hormone concentrations, the minimal MSE is found with AEs trained with mixed models, all with the presence of FBM. Such a result is consistent with what was found in Fig. 6, where the ATTM population mixes with FBM at low concentrations. Then, at higher hormone concentrations, the lowest MSE is found with AEs trained with trajectories with higher FBM weights, again similar to what was shown in Fig. 6(e).
The use of such composite-model maps allows for a fast and visual inspection of anomalous diffusion data. However, the plots presented in Fig. 7(a) show the average of the MSE over the trajectories of the dataset, whereas one of the goals of this work is to analyze the diffusion at the single-trajectory level. To do so with the previous method, we look at the prediction for each trajectory with the AEs trained with composite models and find the AE that yields the lowest MSE. In Fig. 7(b) we show the percentage of trajectories whose minimal MSE is found at a given γ and β. As expected, this approach is analogous to the one of Fig. 7(a) but allows for a single-trajectory analysis. As before, we see that below the critical hormone concentration (1 nM) there is a large heterogeneity in the data. As explained in Ref. [44], in these cases two diffusion modes coexist, one related to the anomalous diffusion of PR through the cell membrane and the other related to phase-separated PR interacting with the chromatin. Only above the 1 nM hormone concentration is the diffusion dominated by the chromatin interaction; hence, most of the trajectories have minimal MSE with the AE trained with pure FBM trajectories (i.e. γ = 1 and β = 0).
Figure 7. (a) MSE of the experimental dataset of Ref. [44] for AEs trained with composite models (Eq. (3)) made of CTRW, ATTM and FBM at various γ and β. (b) Percentage of trajectories whose minimal MSE is found for the AE trained at the given γ and β. Each point corresponds to the average over 7 different autoencoders.
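The single-trajectory assignment behind Fig. 7(b), in which each trajectory is attributed to the AE that reconstructs it with the lowest MSE, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name and the layout of the error matrix (one row per AE, i.e. per (γ, β) point, and one column per trajectory) are hypothetical, and the numbers are toy values rather than experimental results.

```python
import numpy as np

def best_autoencoder_percentages(mse):
    """Assign each trajectory to the autoencoder with the lowest MSE.

    `mse` has shape (n_autoencoders, n_trajectories); row i holds the
    reconstruction errors of the i-th AE (e.g. the i-th (gamma, beta) point).
    Returns the winning AE index per trajectory and, for each AE, the
    percentage of trajectories for which it gives the minimal MSE.
    """
    mse = np.asarray(mse, dtype=float)
    best = np.argmin(mse, axis=0)                       # winner per trajectory
    counts = np.bincount(best, minlength=mse.shape[0])  # wins per autoencoder
    return best, 100.0 * counts / mse.shape[1]

# Toy error matrix: 3 autoencoders, 4 trajectories (made-up values).
toy_mse = np.array([
    [0.1, 0.5, 0.3, 0.9],
    [0.2, 0.1, 0.4, 0.2],
    [0.7, 0.6, 0.2, 0.3],
])
best, percentages = best_autoencoder_percentages(toy_mse)
```

Repeating this for the trajectories measured at each hormone concentration gives one map of percentages per concentration; averaging over independently trained AEs, as done for Fig. 7, would require an additional loop over training repetitions.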

III. CONCLUSIONS
We have shown that autoencoders can learn to reconstruct trajectories arising from various anomalous diffusion models. By using the reconstruction error as a metric, we presented a method based on anomaly detection that is able to characterize the diffusion at the single-trajectory level. In particular, we focused on: 1) the discrimination of the theoretical diffusion model that best describes the trajectory's features; 2) the scaling exponent of the mean squared displacement, defined as the anomalous exponent, commonly used to evidence the emergence of anomalous diffusion. More importantly, the training is completely unsupervised. In previous machine learning approaches, single-trajectory characterization was based on either a classification problem (for diffusion models) or a regression problem (for the anomalous exponent). This makes the discovery of novel types of diffusion very challenging or even impossible, as the machine is restricted either to a pool of models or to a range of parameters set by the training data.
Anomaly detection avoids such restrictions, as the training is based solely on the correct reconstruction of the given trajectories. Hence, any relevant information learnt by the AE is directly related to the features of the trajectories, rather than to a preassigned label. The method thus allows for the discovery of novel types of diffusion, as the machine is not biased towards assigning a predefined label to the trajectories. Instead, these new types will be heralded by an increased reconstruction error. Furthermore, the method is successful at characterizing experimental trajectories, even when these are extremely short or noisy.
This work paves the way for the study of anomalous diffusion with unsupervised machine learning. While an extensive theoretical framework for anomalous diffusion exists nowadays, its connection with the phenomena arising in real physical systems is still challenging. Indeed, it is not clear to what extent the diffusion in complex scenarios, e.g. in crowded environments or in the presence of non-trivial interactions, can be fully described by an idealized theoretical model. In that sense, combinations of existing theories, such as the ones considered in this work, or data-driven models may indeed give a better description of the actual physical phenomena. As shown, anomaly detection allows us to test such approaches against experimental data in a reliable and interpretable form.

ACKNOWLEDGMENTS
We thank Carlo Manzo for stimulating discussions, as well as for sharing experimental data. We thank Juan A. Torreno-Pina and Matthias Weiss for sharing experimental data. We also thank Pamina Winkler for useful comments on the manuscript. We acknowledge support from ERC AdG NOQIA, State Research Agency AEI ("Severo Ochoa" Center of Excellence CEX2019-