Classification of stochastic processes based on deep learning

Stochastic processes model the time evolution of fluctuating phenomena widely observed in physics, chemistry, biology, and even social science. Typical examples include the dynamics of molecular interactions, cellular signalling, animal feeding, disease transmission, financial market fluctuations, and climate change. We create three datasets based on the code accompanying a published article: the first covers 12 stochastic processes, the second Markov and non-Markov processes, and the third Gaussian and non-Gaussian processes. We classify the stochastic processes with a series of convolutional neural networks (CNNs), namely VGG16, VGG19, AlexNet, and MobileNetV2, achieving accuracy rates of 99%, 98%, 95%, and 94% on the first dataset, respectively. On the second dataset, the test accuracy of VGG16 is 100% and that of the remaining models is 99%; on the third dataset, the test accuracy of all models is 100%, except for VGG19, which reaches 99%. According to the findings, CNNs attain slightly higher accuracy than classic feature-based approaches in most circumstances, but at the cost of much longer training times.


Introduction
The phenomena of anomalous diffusion are widely observed in various biological activities, such as cellular migration, signalling, and trans-membrane transport [1,2]. Direct monitoring of molecular movement using single-particle tracking techniques has added to a growing body of data suggesting that several biological systems display anomalous diffusion rather than standard Brownian motion [3][4][5]. However, accurately identifying the type of physical process underlying anomalous diffusion remains a major challenge for various reasons. For example, fundamentally different transport modes may result in the same diffusion power law, which is commonly determined from mean-square displacements (MSDs). Accurate diffusion-model analysis therefore necessitates the calculation of additional observables. Owing to the biological significance of anomalous diffusive processes, identifying and characterizing them continues to be an active research topic in biophysics [5].
Currently, anomalous diffusions are pervasively sorted on the basis of MSDs. In particular, a significant class of stochastic processes (SPs) x(t) has MSDs behaving as ⟨x²(t)⟩ ∼ t^α. For Brownian motion, α = 1. When 0 < α < 1 and α > 1, the phenomena are respectively called subdiffusion and superdiffusion. A popular model to characterize anomalous diffusion is the continuous time random walk (CTRW), which entails two random variables: jump length and waiting time. Generally, these variables are assumed to be independent and identically distributed (i.i.d.). If the jump length has a finite second moment, as for a normal distribution, but the mean waiting time is infinite (e.g. for the power-law distribution 1/τ^{1+α} with 0 < α < 1), the CTRW models subdiffusion. Superdiffusion is described by a CTRW with divergent second moment of the jump length and finite mean waiting time, which is called a Lévy flight. To overcome its non-physical behaviours, i.e. the divergent MSD and instantaneous jumps of the Lévy flight, the Lévy walk is adopted to model superdiffusion [6,7].
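As an illustration, the subdiffusive CTRW described above can be simulated in a few lines of Python. This is a minimal sketch, not the generator used for the paper's datasets; the function name and parameter choices are our own:

```python
import numpy as np

def ctrw_trajectory(n_steps, alpha=0.5, rng=None):
    """Sketch of a subdiffusive CTRW: Gaussian jump lengths (finite
    second moment) combined with Pareto-type waiting times whose density
    decays as 1/tau**(1 + alpha), so the mean waiting time diverges
    for 0 < alpha < 1."""
    rng = np.random.default_rng(rng)
    waits = rng.pareto(alpha, size=n_steps) + 1.0   # power-law waiting times
    jumps = rng.normal(0.0, 1.0, size=n_steps)      # Gaussian jump lengths
    times = np.cumsum(waits)                        # epochs of the jumps
    positions = np.cumsum(jumps)                    # walker position after each jump
    return times, positions

times, positions = ctrw_trajectory(1000, alpha=0.5, rng=42)
```

Swapping in heavy-tailed jump lengths with exponential waiting times would give the Lévy-flight regime instead.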
Other models describing anomalous diffusion include fractional Brownian motion (fBm), the Lévy process, the alternating renewal process, and processes with multiple internal states. The fBm B_H(t) is a continuous Gaussian process with zero mean and covariance function ⟨B_H(t)B_H(s)⟩ = (t^{2H} + s^{2H} − |t − s|^{2H})/2, where H ∈ (0, 1] is a real number, called the Hurst index, describing normal diffusion, subdiffusion, and superdiffusion when H = 1/2, H < 1/2, and H > 1/2, respectively [8]. A continuous-time process X(t) with values in R^d is named a Lévy process if (i) X(0) = 0 a.s.; (ii) X has independent increments, i.e. for any 0 ⩽ t₁ < t₂ ⩽ t₃ < t₄, the increments X(t₄) − X(t₃) and X(t₂) − X(t₁) are independent; (iii) X has stationary increments, meaning that for all 0 ⩽ s ⩽ t the random variables X(t) − X(s) and X(t − s) have the same distribution; (iv) X is stochastically continuous, i.e. for every t ⩾ 0 and ϵ > 0, lim_{s→t} P(|X(s) − X(t)| > ϵ) = 0. Brownian motion is the most common example of a Lévy process, in which X(t) − X(s) is normally distributed with zero mean and variance t − s. Other examples of Lévy processes include the Poisson process, Cauchy process, compound Poisson process, Gamma process, and variance Gamma process [9,10]. The alternating process is a typical model in renewal theory. It effectively models intermittent strategies, which alternate, e.g.
between Brownian motion and Lévy walk [7,11,12]. It is also observed in the transport of neuronal messenger ribonucleoproteins delivered to their target synapses, where a type of Lévy walk process is interrupted by the emergence of rests; the rest period can be very long, characterized by a power-law distribution without a finite mean [13]. The SP with multiple internal states is also very popular. A concrete example is a compound renewal process with multiple internal states, where the holding times of the different internal states are drawn from different distributions while the jump lengths share the same distribution [14]. Its applications include trapping in amorphous semiconductors, electronic burst noise, movement in systems with fractal boundaries, the digital generation of 1/f noise, and ionic currents in cell membranes [15,16].
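The fBm covariance function lends itself to a direct, if brute-force, simulation via a Cholesky factorization of the covariance matrix. This is a minimal sketch under our own naming; dedicated methods such as circulant embedding are far faster for long trajectories:

```python
import numpy as np

def fbm_trajectory(n, H=0.3, T=1.0, rng=None):
    """Exact fBm sample on a regular grid via Cholesky factorization of
    the covariance C[i, j] = (t_i^{2H} + t_j^{2H} - |t_i - t_j|^{2H}) / 2."""
    rng = np.random.default_rng(rng)
    t = np.linspace(T / n, T, n)                    # grid excluding t = 0
    ti, tj = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (ti**(2 * H) + tj**(2 * H) - np.abs(ti - tj)**(2 * H))
    L = np.linalg.cholesky(cov)                     # cov = L @ L.T
    return t, L @ rng.normal(size=n)                # zero-mean Gaussian with cov

t, x = fbm_trajectory(256, H=0.3, rng=0)            # subdiffusive sample path
```

Setting H = 1/2 reduces the covariance to min(t, s) and recovers ordinary Brownian motion.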
The applicability of classical methods for accurately extracting the underlying parameters in this regime has been somewhat limited, necessitating more reliable approaches, e.g. machine learning (ML) algorithms and, in particular, deep learning (DL) [17]. Several attempts to classify SPs with ML methods have been carried out. For instance, Monnier et al [18] used a Bayesian approach to MSD-based classification of motion modes. Dosset et al [19] used a simple back-propagation neural network to discriminate different types of diffusion. Wagner et al [20] built a random forest classifier for normal, anomalous, confined, and directed diffusion. Other studies concentrate on anomalous diffusion. For example, Thapa et al [21,22] used Bayesian analysis methods to differentiate among Brownian motion, fBm, and Brownian motion with diffusing diffusivity, demonstrating their relevance to biological data of mucus hydrogels. In a second noteworthy work, Muñoz-Gil et al [23] recently used a random forest algorithm to classify a given trajectory as one of several anomalous diffusion models and to estimate the anomalous diffusion exponent as part of a classification problem, with a resolution of 0.1 and an accuracy of 70%-90%, depending on the trajectory length and noise. Recently, Al-hada et al used GoogleNet, ResNet-18, and ResNet-50 to classify 12 stochastic processes, in which the ResNet-50 model achieved 99% accuracy [24]. Most of the above works are involved in the Anomalous Diffusion challenge (AnDi), an open competition divided into three distinct tasks: model categorization, inference of anomalous exponents, and trajectory segmentation [25]. In addition, the stochastic differential equations used in this paper are detailed in the appendix of [24].
With the rise of artificial intelligence and the rapid growth of computing power, DL, with convolutional neural networks (CNNs) at its core, has become a research hotspot across many fields. Furthermore, with the availability of larger datasets, traditional ML models based on predetermined features perform less well than DL methodologies [26]. Random walk features can be deduced using suitably tuned DL algorithms. CNNs, which are commonly used to classify images, have been applied to infer parameters related to subdiffusive behaviour and to reveal oscillations in various modes of diffusion. The restrictions of DL algorithms applied to SPs are similar to those in other DL applications: trained models are difficult to interpret and overfitting can be hard to detect. Furthermore, the network's depth and number of parameters raise the computational cost of training.
The main challenge in the discipline is to establish a way of determining the type of diffusion through a simple approach accessible to non-experts. In this paper, we use four neural networks, namely VGG16, VGG19, AlexNet, and MobileNetV2, to categorize stochastic processes based on the diffusion type of single-particle trajectories, i.e. to identify diffusion types from images of individual particle trajectories. In the datasets, the 12 stochastic processes are respectively named 'Lévy process BM1905' for Brownian motion, 'Lévy process stable subordinator' for the stable subordinator, 'Lévy process homogeneous Poisson process' for the homogeneous Poisson process, 'Lévy process symmetric stable Lévy process' for the symmetric stable Lévy process, 'Alternating process LWandBM2019' for the alternating process between Lévy walk and Brownian motion, 'Gaussian process fBm1968' for fractional Brownian motion, 'Multi-state process CTRW2017' for the fractional compound Poisson process with multiple internal states, 'Multi-state process CTRW2018' for the Lévy walk with multiple internal states, 'Random walk CTRW196501' for the CTRW with exponential waiting time and Gaussian jump length, 'Random walk CTRW196502' for the CTRW with power-law waiting time and Gaussian jump length, 'Random walk CTRW196503' for the CTRW with exponential waiting time and power-law jump length, and 'Random walk CTRW196504' for the CTRW with power-law waiting time and power-law jump length.
For the considered stochastic processes, one can refer to appendices A and B of the recently published paper [24]. In addition, an SP has the Markov property if the conditional probability distribution of its future states depends only on its current state, meaning that, given the present, the future does not rely on the past. Specifically, in this paper the models that belong to the Markov processes are all the Lévy processes and two random walks, CTRW196501 and CTRW196503; the rest are non-Markovian. Moreover, Gaussian processes (GPs) are natural generalizations of multivariate Gaussian random variables to infinite (countable or continuous) index sets, and are routinely used to solve hard machine learning problems. In this paper, the models that belong to the Gaussian processes are 'Gaussian process fBm1968' and 'Lévy process BM1905'; the rest are non-Gaussian.
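The Markov/non-Markov split above hinges on the memorylessness of the waiting-time distribution: an exponential waiting time (as in CTRW196501 and CTRW196503) forgets how long the walker has already waited, whereas a power-law one does not. A small numerical check of this property (our own illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
waits = rng.exponential(scale=1.0, size=1_000_000)  # exponential waiting times

# Memorylessness: P(W > s + t | W > s) equals P(W > t) for the exponential law.
s, t = 1.0, 0.5
survived = waits[waits > s]
cond = np.mean(survived > s + t)   # conditional survival probability
uncond = np.mean(waits > t)        # unconditional survival probability
assert abs(cond - uncond) < 0.01   # agree up to Monte Carlo error
```

Repeating the check with a Pareto-distributed `waits` makes the two probabilities diverge, which is the statistical fingerprint of the non-Markovian CTRWs.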

Models and their structures
We use VGG16, VGG19, AlexNet, and MobileNetV2 to perform the SP classification.

VGG
This section covers essential information on the VGG model [27] and detailed information about the model used in the experiment, including the layer configuration. The 16 layers of the VGG16 architecture consist of 13 convolutional layers and three fully connected layers, accompanied by five max-pool layers (see figure 2). The architecture starts with 64 channels, a number that is doubled after each max-pooling layer until reaching 512 channels. The receptive field of each convolutional layer is 3 × 3, with a stride of 1. Every convolutional window has row and column padding to ensure that the input and output feature maps have the same size. Max-pooling is performed with a window of size 2 × 2 and a stride of 2, so that the pooling windows do not overlap. Each of the first two fully-connected (FC) layers has 4096 channels; the third and final FC layer, which serves as the output layer, has 1000 channels. We use ReLU as the activation function of VGG16. A visual representation of the VGG16 architecture and the data flow within it are depicted in figures 1 and 2, respectively.
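The layer bookkeeping described above can be verified with a plain-Python sketch of the standard VGG16 configuration (the names below are ours, for illustration only):

```python
# VGG16 configuration: integers are conv output channels, 'M' marks a 2x2 max-pool.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']
FC_CHANNELS = [4096, 4096, 1000]

conv_layers = [c for c in VGG16_CFG if c != 'M']
pool_layers = [c for c in VGG16_CFG if c == 'M']

# 13 conv + 3 FC = 16 weighted layers, interleaved with 5 max-pools.
assert len(conv_layers) == 13
assert len(pool_layers) == 5
assert len(conv_layers) + len(FC_CHANNELS) == 16

# Channels start at 64 and double after pools until capping at 512.
assert conv_layers[0] == 64 and max(conv_layers) == 512

# Each 2x2, stride-2 pool halves the spatial size: 224 -> 7 after five pools.
size = 224
for c in VGG16_CFG:
    if c == 'M':
        size //= 2
assert size == 7
```

The 7 × 7 × 512 feature map produced after the last pool is what feeds the first 4096-channel FC layer.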

MobileNetV2 network
Google proposed the MobileNetV2 network architecture in 2018 [28]. It primarily introduces a linear bottleneck structure together with an inverted residual structure. The exact implementation of the block converting features from N to M channels with stride s and expansion factor t is demonstrated in table 1. This bottleneck uses a linear activation rather than a nonlinear one after the pointwise convolutional layer, and adds a 1 × 1 convolutional layer in front of the depthwise convolutional layer. The stride of the depthwise convolutional layer is then tuned to accomplish downsampling. MobileNetV2's entire network structure is demonstrated in table 2, in which c denotes output channels and n denotes repetitions. The intermediate layers of this network, which has 19 layers overall, are employed to extract features, while the final layer is used for classification [29]. By utilizing inverted residual and linear bottleneck structures, MobileNetV2 optimizes the network, producing a deeper network with a more compact and quicker model, as shown in figure 3.
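The channel flow through the inverted residual block can be sketched in plain Python; this is our own illustration of the N → tN → M pattern, and the function name is hypothetical:

```python
def bottleneck_channels(n_in, m_out, t):
    """Channel counts through a MobileNetV2 inverted residual block:
    1x1 expansion to t*N channels (ReLU6), 3x3 depthwise convolution at
    t*N channels (ReLU6), then 1x1 linear projection down to M channels."""
    expanded = t * n_in
    return [n_in,       # block input
            expanded,   # after 1x1 pointwise expansion
            expanded,   # after 3x3 depthwise conv (channel count unchanged)
            m_out]      # after 1x1 linear projection (no nonlinearity)

# Example: 24 -> 144 -> 144 -> 32 with the default expansion factor t = 6.
assert bottleneck_channels(24, 32, t=6) == [24, 144, 144, 32]
```

The "inverted" in the name refers to the fact that the skip connection joins the narrow ends of the block, the opposite of a classical ResNet bottleneck.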

AlexNet
In 2012, Alex Krizhevsky and his colleagues presented a CNN model larger and deeper than LeNet, with which they won the ILSVRC, the most challenging ImageNet visual object recognition challenge [30]. At that time, AlexNet surpassed all standard computer vision and ML methods in recognition accuracy. This development is a milestone in ML and computer vision for visual classification and recognition tasks, and it generated a surge of DL research. The AlexNet architecture is presented in figure 4. The first convolutional layer performs convolution and max-pooling with 64 distinct reception filters of size 11 × 11. The max-pooling operations are carried out using 3 × 3 filters with a stride of 2. In the second layer, similar operations are conducted with 5 × 5 filters. The third, fourth, and fifth convolutional layers have 3 × 3 filters with 384, 256, and 256 feature maps, respectively. The last two layers are FC with dropout, followed by a softmax output layer. In total, AlexNet is composed of five convolutional layers and three fully connected layers [31,32].
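The spatial sizes implied by the filter and stride choices above follow from the standard output-size formula. The sketch below uses torchvision-style AlexNet hyperparameters; the stride and padding values are our assumptions, as the text does not state them:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# First AlexNet stage on a 224x224 input: 11x11 conv (stride 4, padding 2),
# followed by a 3x3 max-pool with stride 2.
size = conv_out(224, kernel=11, stride=4, padding=2)   # 224 -> 55
size = conv_out(size, kernel=3, stride=2)              # 55 -> 27
assert size == 27
```

The same formula governs every layer in the VGG models, where kernel 3, stride 1, and padding 1 leave the spatial size unchanged.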

Setup of the experiment
To evaluate the performance of the classification models and demonstrate the efficacy of the proposed SP classification approach, we implement the CNNs in Python with the ML library PyTorch. The characteristics of the experimental environment are listed in table 3. Three datasets are used for the experiments [33]. The first dataset is composed of 12 stochastic processes, each of which contains 3500 images. The second dataset includes Markov and non-Markov processes (in the CTRW framework, a power-law waiting-time distribution makes the corresponding stochastic process non-Markovian, while temporally non-local coupling also induces the non-Markovian property, e.g. in fractional Brownian motion), each of which contains 21 000 images. The third dataset consists of Gaussian and non-Gaussian processes, with 7000 and 35 000 images, respectively.
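The dataset sizes above are internally consistent with the per-class counts given earlier, which can be checked in a few lines (our own bookkeeping, not code from the paper):

```python
# First dataset: 12 processes x 3500 images each.
per_class, n_classes = 3500, 12
assert per_class * n_classes == 42_000

# Second dataset: the 6 Markov processes (4 Lévy processes plus
# CTRW196501 and CTRW196503) and the 6 non-Markov processes each
# contribute 6 x 3500 = 21 000 images per binary class.
assert 6 * per_class == 21_000

# Third dataset: 2 Gaussian processes vs 10 non-Gaussian ones.
assert 2 * per_class == 7_000
assert 10 * per_class == 35_000
```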

Results and discussions
The current study presents a multi-class classification of 12 SPs, as well as binary classifications for Markov versus non-Markov processes and Gaussian versus non-Gaussian processes. The four pre-trained CNNs are trained on the datasets with the chosen hyperparameters. All models show good results.

Results for the classification of 12 processes
We evaluate the classification efficiency of the four pre-trained models AlexNet, VGG16, MobileNetV2, and VGG19 on the datasets listed in the above section. We use 80% of the labeled dataset for training and 20% for testing. The Adam algorithm is used to determine the weights and biases that minimize the loss function; it picks a small batch of training inputs at random throughout the learning process. The batch size is set to 10 and the learning rate is chosen as 0.00001. The choice of parameters is summarized in table 4. Four commonly used classifier evaluation indices (accuracy, precision, recall, and F1-measure) are adopted to thoroughly evaluate our models' performance. For class i, TP_i (true positive) indicates the correct classification of positive class i, TN_i (true negative) the correct classification of the negative class, FN_i (false negative) the incorrect classification of positive class i, and FP_i (false positive) the incorrect classification of negative class i. The accuracy is defined as Accuracy_i = (TP_i + TN_i)/(TP_i + TN_i + FP_i + FN_i). The precision, which denotes how often a positive prediction is correct, is expressed as Precision_i = TP_i/(TP_i + FP_i). The recall is given as Recall_i = TP_i/(TP_i + FN_i). The F-measure is defined as F1_i = 2 · Precision_i · Recall_i/(Precision_i + Recall_i). Figures 5 and 6 show the confusion matrices for all models, which may be used to evaluate individual misclassification levels. Rows show the instances of the known classes, while columns indicate the predicted classes. The upper-left to lower-right diagonal of the square matrix contains all correct classifications. The confusion matrices illustrate that the models can usually recognize the majority of classes with ease. Some classes, such as 'Random walk CTRW196501', 'Random walk CTRW196502', 'Random walk CTRW196503', and 'Random walk CTRW196504', are harder to classify than others because they have similar characteristics. The 'Support' denotes the actual count of each class in the test data. Table 5 summarizes the classification results for all models. The training accuracy of VGG19, AlexNet, and MobileNetV2 is 98%, 95%, and 94%, respectively, while the training loss is 7%, 10%, and 13%, respectively. VGG19, AlexNet, and MobileNetV2 achieve overall test accuracies of 97%, 95%, and 94%, respectively, with losses of 6%, 11%, and 15%. The VGG16 model outperforms all the others and provides the highest accuracy, achieving 98% for training and 99% for testing; it also produces smaller losses, reaching a 4% training loss and a 3% test loss. Tables 6 and 7 summarize the models' other metrics (recall, precision, support, F1-score, and processing time). VGG16 evidently outperforms VGG19, AlexNet, and MobileNetV2.
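The per-class metrics defined above can be computed directly from a confusion matrix. A minimal self-contained sketch (the toy matrix below is illustrative, not the paper's results):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1, and support for each class from a confusion
    matrix whose rows are true classes and whose columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                  # correct classifications
    fp = cm.sum(axis=0) - tp          # predicted as class i but wrong
    fn = cm.sum(axis=1) - tp          # class i predicted as something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)          # true count of each class
    return precision, recall, f1, support

# Toy 2-class confusion matrix: 90/100 and 80/100 correct.
cm = [[90, 10],
      [20, 80]]
precision, recall, f1, support = per_class_metrics(cm)
```

Running the stated VGG16 test matrices through such a routine reproduces the per-class precision, recall, F1, and support values reported in tables 6 and 7.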

Classification for Markov and non-Markov processes
Figure 9 shows the confusion matrices of all models. Columns represent the predicted classes, whereas rows denote the known class instances. All correct classifications are placed along the upper-left to lower-right diagonal of the square matrix. The confusion matrices reveal that, on the whole, the models have little trouble in detecting the Markov and non-Markov classes. For instance, 4229 of 4313 Markov processes are correctly predicted by MobileNetV2, while 4294 of 4294 Markov processes are predicted correctly by VGG16. Their performance is summarized in tables 8 and 9. Overall, the models attain good accuracy, high precision, high recall, and high F1-score.

Classification for Gaussian and non-Gaussian processes
Figure 12 shows the confusion matrices of all the models. They demonstrate that the models have no difficulty in classifying Gaussian and non-Gaussian processes. Figures 13 and 14 illustrate the training and test accuracies and loss values. Again, it can be observed that training the VGG16 and VGG19 models takes a little more time.
The performance per class is displayed in tables 11 and 12. The models provide good precision, recall, and F1-score rates across all classes, at around 100%.

Conclusions
SPs are powerful models for describing the stochastic phenomena observed in the natural world. This study concerns SP classification utilizing CNN-based DL algorithms. We generate three datasets: the first for 12 stochastic processes drawn from five process families (random walk, Gaussian process, multi-state process, alternating process, and Lévy process), the second for Markov and non-Markov processes, and the third for Gaussian and non-Gaussian processes. We adopt four pre-trained models, namely VGG16, VGG19, AlexNet, and MobileNetV2. Although all models perform very well on the datasets, VGG16 outperforms the others, at the cost of much longer training times.

Figure 2 .
Figure 2. The pathway of data inside the VGG16 architecture.

Figure 4 .
Figure 4. The structure of AlexNet, composed of convolution, max-pooling, local response normalization, and fully connected layers.

Figure 5 .
Figure 5. Normalized confusion matrices for classifiers built on training data with (a) VGG16 and (b) VGG19.

Figure 6 .
Figure 6. Normalized confusion matrices for classifiers built on training data with (c) AlexNet and (d) MobileNetV2.

Figure 7 .
Figure 7. Evolution of the 'training and test accuracies' and 'training and test loss' for the models (a) VGG16 and (b) VGG19.

Figure 8 .
Figure 8. Evolution of the 'training and test accuracies' and 'training and test loss' for the models (c) AlexNet and (d) MobileNetV2.

Figure 11 .
Figure 11. Evolution of the 'training and test accuracies' and 'training and test loss' for the models (c) AlexNet and (d) MobileNetV2.

Figure 13 .
Figure 13. Evolution of the 'training and test accuracies' and 'training and test loss' for the models (a) VGG16 and (b) VGG19.

Figure 14 .
Figure 14. Evolution of the 'training and test accuracies' and 'training and test loss' for the models (c) AlexNet and (d) MobileNetV2.

Table 5 .
A summary of the accuracy and loss in the four models used to classify the 12 classes of stochastic processes.

Table 6 .
A summary of the performance of the Recall, F1-Score, and Precision values collected from the VGG16 and VGG19 models used to classify the 12 classes of stochastic processes.

Table 7 .
A summary of the performance of the Recall, F1-Score, and Precision values collected from the AlexNet and MobileNetV2 models used to classify the 12 classes of stochastic processes.

Table 8 .
Performances of the Recall, F1-Score, and Precision values collected from the VGG16 and VGG19 models for Markovian and non-Markovian process classification.

Table 9 .
Performances of the Recall, F1-Score, and Precision values collected from the AlexNet and MobileNetV2 models for Markovian and non-Markovian process classification.

Table 10 .
Accuracies and losses of the four models for Markov and non-Markov process classification.

Table 11 .
Performance of the Recall, F1-Score, and Precision values collected from the VGG16 and VGG19 models for Gaussian and non-Gaussian process classification.

Table 12 .
Performance of the Recall, F1-Score, and Precision values collected from the AlexNet and MobileNetV2 models for Gaussian and non-Gaussian process classification.

Table 13 .
Accuracies and losses in the four models for Gaussian and non-Gaussian process classification.