Functional data learning using convolutional neural networks

In this paper, we show how convolutional neural networks (CNN) can be used in regression and classification learning problems of noisy and non-noisy functional data. The main idea is to transform the functional data into a 28 by 28 image. We use a specific but typical architecture of a convolutional neural network to perform all the regression exercises of parameter estimation and functional form classification. First, we use some functional case studies of functional data with and without random noise to showcase the strength of the new method. In particular, we use it to estimate exponential growth and decay rates, the bandwidths of sine and cosine functions, and the magnitudes and widths of curve peaks. We also use it to classify the monotonicity and curvatures of functional data, algebraic versus exponential growth, and the number of peaks of functional data. Second, we apply the same convolutional neural networks to Lyapunov exponent estimation in noisy and non-noisy chaotic data, in estimating rates of disease transmission from epidemic curves, and in detecting the similarity of drug dissolution profiles. Finally, we apply the method to real-life data to detect Parkinson's disease patients in a classification problem. The method, although simple, shows high accuracy and is promising for future use in engineering and medical applications.


Introduction
Functional data (FD) are functions observed for each unit over certain intervals; see (Xiaoying et al. 2021, Ramsay & Silverman 2005b). FD appears in various scientific fields, such as engineering, geology, biology, medicine, pharmacology, and chemistry. It involves analyzing data in the form of continuous vector functions or curves, which can be treated as realizations of stochastic processes; see (Xiaoying et al. 2021, Yarger et al. 2022). Functional data analysis (FDA) provides methods for extracting intrinsic information from infinite-dimensional and irregularly observed data; see (Xiaoying et al. 2021, Ramsay & Silverman 2005b). FDA combines statistics, spatial analysis, and multivariate modeling tools to analyze and predict functional data. It provides advantages over traditional pointwise estimation methods by using irregularly sampled data in space, time, and depth to fit space-time functional models (Górecki et al. 2018).
Earlier stages of FDA were developed by (Ramsay 1991) to study the relationship between the ability of an examinee and his or her probability of correctly selecting an option in a standardized item response test. First, (Ramsay 1991) used kernels to correctly fit the observations to one dimension and later extended the idea in (Ramsay 1995) to fit the data to larger dimensions. Then, (Ramsay 1996) introduced Principal Differential Analysis to find an approximate solution that solves the differential equation Lu = 0, in which we can pick the order of the differential operator L and a basis for the weights, using splines or other adequate functions. This technique was applied to approximate Chinese writing in (Ramsay 2000) using a second-order operator. Later, complexity was added to this type of multilevel model (Ramsay 2002). For more information and applications of FD, we recommend reading the full work on Functional Analysis case studies in (Ramsay & Silverman 2002) and the mathematical foundations in (Ramsay & Silverman 2005a).
Functional data learning (FDL) has also received a lot of attention recently in the domain of machine learning and deep learning. Different approaches were used to establish predictive models of functional data, like in (Zhao 2012), where gradient descent was derived for classical neural networks using Fréchet derivatives and the Riesz representation theorem to establish a deep learning method for FDL. In (Abraham et al. 2014), the support vector machine method was applied to feature vectors obtained by voxel transformation. In (Pfisterer et al. 2019), some FDA R libraries were compared to machine learning techniques such as random forest and were shown to be outperformed by the latter. In (Zhang et al. 2021), a new methodology was used to learn the bases of different functional subspaces to model functional data before applying learning methods. In (Basna et al. 2021), orthogonal B-spline bases for FDA were shown to be more efficient than other Fourier-based methods.
Other deep learning and machine learning approaches include the work in (Yao et al. 2021), in which inputs are fed directly into a layer composed of nodes of nested neural networks. The outputs of these neural networks are the basis functions themselves. Inspired by the square-root velocity framework, the features in (Rafajłowicz & Rafajłowicz 2021) were obtained from functional data using derivatives; this was found to work well with classification methods such as logistic regression and support vector machines. Other methods of FDL, such as manifold learning, have recently been introduced; see, e.g., (Hernández-Roig et al. 2020, Mughal et al. 2020, Hernández-Roig et al. 2021).
Convolutional neural networks (CNNs) have been used extensively in image recognition. The architecture of neural networks has been evolving since the introduction of very deep CNNs (Simonyan & Zisserman 2014), ResNets (He et al. 2016), and MobileNets (Howard et al. 2017, Sandler et al. 2018). The efficiency of a CNN is affected by its architecture and hardware (Polson & Sokolov 2020), as well as by the training and cross-validation sets and scaling (Tan & Le 2019). Several other advances have been made by adding other algorithms to CNNs, such as the "You Only Look Once" approach proposed by (Redmon et al. 2016, Redmon & Farhadi 2017). To our knowledge, convolutional neural networks have not been used before for functional data learning.
In this paper, we explore how to use CNNs in functional data learning. In Section 2, we introduce our method and the CNN architecture. In most of the problems, we use the same CNN architecture, except for a few cases that will be described in place. In Section 3, we start with some regression and classification problems of curves that represent different functions with and without random noise. In particular, we examine the ability of the CNN to estimate parameters of exponential and trigonometric functions. In addition, we employ the same CNN to estimate the magnitude and width of curve peaks. We also use the CNN for the classification problems of increasing versus decreasing functions, concave versus convex functions, algebraic versus exponential growth, and discerning curves with 0, 1, and 2 peaks. In Section 4, we examine the ability of the CNN to estimate parameters and characteristics of dynamical systems. In particular, we use it to estimate Lyapunov exponents of some chaotic curves with and without random noise. We apply the same CNN to estimate transmission rates from epidemic curves. Then, we apply a Siamese CNN to test the similarity between two drug dissolution curves (profiles). In Section 5, we apply the same methods to classify actual functional data of Parkinson's disease cases and controls.

Methods
Let {(x_i, f(x_i)) : i = 1, 2, ..., n} be the data points of a graph sampled at equidistant points x_i, i = 1, 2, ..., n. The data pre-processing step in functional data learning via CNNs creates an input image from the functional data points. We assume that f is Min-Max normalized, so we can take f : R → [0, 1]. A signed distance matrix D is then defined with elements d_ij = f(x_i) − f(x_j). The matrix D holds the grayscale intensity values that are used to produce a 28 by 28 grayscale image; see Figure 1.
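The pre-processing step above can be sketched in Python (the paper's own code is MATLAB). Resampling the curve at 28 equidistant points and rescaling the signed distances to [0, 1] for display are our assumptions; the text only specifies the signed distance matrix:

```python
import numpy as np

def curve_to_image(y, size=28):
    """Convert a sampled curve into a size-by-size grayscale image.

    Min-Max normalize f, resample it at `size` equidistant points, and
    form the signed distance matrix d_ij = f(x_i) - f(x_j), rescaled
    from [-1, 1] to [0, 1] as grayscale intensities.
    """
    y = np.asarray(y, dtype=float)
    # Min-Max normalization so f maps into [0, 1]
    y = (y - y.min()) / (y.max() - y.min())
    # Resample the curve at `size` equidistant points
    idx = np.linspace(0, len(y) - 1, size).round().astype(int)
    f = y[idx]
    # Signed distance matrix: D[i, j] = f(x_i) - f(x_j)
    D = f[:, None] - f[None, :]
    # Rescale to grayscale intensities in [0, 1]
    return (D + 1.0) / 2.0

# Example: the exponential curve y = exp(-0.27 x) on [0, 5]
x = np.linspace(0, 5, 100)
img = curve_to_image(np.exp(-0.27 * x))
```

Note that the resulting image is anti-symmetric around mid-gray: the diagonal is exactly 0.5, and entries (i, j) and (j, i) always sum to 1.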
Next, we describe a typical architecture as presented in Mathworks' documentation (MATLAB 2022), which is based on the LeNet networks of (LeCun et al. 1998). The first multilayer of the CNN consists of a convolution layer, batch normalization, a ReLU activation function, and an average pooling layer; its output is 13 × 13 × 8. The second multilayer is the same as the first, with an output of 5 × 5 × 16. The third multilayer does not contain an average pooling layer and has an output of 3 × 3 × 32. The final multilayer also has no average pooling layer; instead, it has a dropout layer of 20% with a fully connected layer, followed by a regression/classification layer. All codes and simulations were performed in MATLAB using its deep learning toolbox (MATLAB 2022).
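The quoted output sizes (13 × 13 × 8, 5 × 5 × 16, 3 × 3 × 32) are consistent with 3 × 3 unpadded convolutions followed by 2 × 2 average pooling with stride 2; this inference is ours, since the text does not state kernel sizes. A quick arithmetic check:

```python
def conv_out(n, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution layer."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, kernel=2, stride=2):
    """Spatial output size of a pooling layer."""
    return (n - kernel) // stride + 1

n = 28                     # input image is 28 x 28
b1 = pool_out(conv_out(n))   # block 1: conv 3x3 + avg-pool
b2 = pool_out(conv_out(b1))  # block 2: conv 3x3 + avg-pool
b3 = conv_out(b2)            # block 3: conv 3x3, no pooling
```

With these (assumed) layer hyperparameters, b1, b2, and b3 reproduce the spatial sizes 13, 5, and 3 reported in the text.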
We use the same CNN architecture for all of the regression and classification problems. We tested the procedure on different functional types, and we begin by discussing our findings for seven case studies of functional data.

Results
In this section, we discuss various regression and classification problems of functional data with and without random noise. We use randomly generated values of the functional parameters from specified ranges to produce training, validation, and testing datasets of sizes 1000, 100, and 100, respectively. We use n = 100 data points for each curve. In the cases of width estimation, height estimation, and number-of-peaks classification, we use 10000 curves for training and n = 1000 functional data points for each curve. This ensures a wide variety of heights and widths and enough peaks for each class.

Regression Problems
3.1.1. Exponential Function

Let the exponential curve be y = exp(ωx), with the parameter ω representing the rate at which y increases or decreases. The first task is to use our method to estimate ω. See Figure 2 for the curve of y = exp(−0.27x), with and without noise, and the corresponding 28 by 28 images. For the training, validation, and testing data, the parameter ω is sampled from a uniform distribution over the interval [−1, 1] 1000, 100, and 100 times, respectively. Figure 3 shows the predicted versus true values of the exponential rate, which closely follow the diagonal line with zero intercept and slope one. Table 1 shows a strong diagonal linear relationship between the true and predicted values of the rate.

3.1.2. Sine and Cosine Functions

We repeat the same exercise for the curves y = sin(ωx) and y = cos(ωx); see Figures 4 and 5 for example curves with and without noise. Figure 6 shows the predicted versus true values of the sine and cosine parameter, which closely follow the diagonal line with zero intercept and slope one. Tables 2 and 3 show a strong diagonal linear relationship between the true and predicted bandwidth values.

3.1.3. Width and Height of Curve Peaks

The height of a peak is the value of the curve at a local maximum of the curve. The width of a peak is the horizontal distance of the contour located at half the prominence of that peak. It is important to note that, for height estimation, the curves need to be normalized by the highest peak among all the generated curves so that the grayscale values are representative of all heights. Otherwise, the highest grayscale value of 1 would be assigned to the peak of each curve, and the model would not train properly. On the other hand, for width estimation and peak detection (see the peak classification section), we use local normalization, since these do not necessarily depend on the height.
To generate the curves, we use a mixture of Gaussian kernels, denoted G(x), in which the kernels have heights H_k, shape parameters W_k, and position parameters P_k. The heights H_k are randomly and uniformly selected from [0, 2200]. The shape parameters W_k are randomly generated using W_k = ⌊50U_1 + 1⌋, where U_1 is a uniformly distributed random variable on [0, 1]. Similarly, the position parameters are randomly generated using P_k = ⌊50U_2 + 1⌋, where U_2 is a uniformly distributed random variable on [0, 1]. See Figure 7 for an example of a mixture of Gaussian kernels with and without random noise. Figure 8 shows that the normalized width predictions closely follow the actual values. Table 4 gives a strong indication of accurate prediction of the widths of the peaks of the curves. Table 5, supported by Figure 9, shows however that the CNN could not recover the actual height values on average unless the data had more noise.
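A Python sketch of this curve-generation scheme might look as follows. The Gaussian kernel form H exp(−((x − P)/W)²) and the number of peaks per curve are our assumptions, since the text only specifies the parameter distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mixture_curve(n_points=1000, n_peaks=2, x_max=50.0):
    """Generate one curve as a sum of Gaussian kernels.

    Heights, shape parameters, and positions follow the sampling scheme
    in the text; the kernel form H * exp(-((x - P) / W)^2) and the peak
    count are assumptions for illustration.
    """
    x = np.linspace(0.0, x_max, n_points)
    y = np.zeros_like(x)
    for _ in range(n_peaks):
        H = rng.uniform(0.0, 2200.0)             # peak height
        W = np.floor(50.0 * rng.uniform() + 1)   # shape (width) parameter
        P = np.floor(50.0 * rng.uniform() + 1)   # peak position
        y += H * np.exp(-((x - P) / W) ** 2)
    return x, y

# Global normalization for height estimation: divide every curve by the
# highest peak observed over the whole generated set.
curves = [gaussian_mixture_curve()[1] for _ in range(100)]
global_max = max(c.max() for c in curves)
normalized = [c / global_max for c in curves]
```

With global normalization, only the tallest curve in the set reaches grayscale value 1, so peak heights remain comparable across curves, which is the property the text relies on for height estimation.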

Classification Problems
In this subsection, we examine the capabilities of CNN in functional data classification.
For this type of problem, the same CNN architecture is used, except that after the dropout layer, the fully connected layer has a number of hidden nodes equal to the number of classes, followed by a softmax layer and a classification layer.

Increasing versus Decreasing Curves
We use the CNN to classify the monotonicity of curves. Curves y = e^(w1(x−w2)) are used to generate increasing or decreasing exponential curves for the training, validation, and testing datasets; see Figure 10. We use the random variables w1 = sign(U1 − .5) and w2 = 2U2 + 2.5, where U1 and U2 are uniformly distributed random variables on [0, 1]. We found that the classification accuracy on the functional test data is 100%.

Convex versus Concave Curves
We also examined the ability of the CNN to classify the curvature of curves as convex or concave. Curves y = w1(x − w2)² are used to generate convex and concave curves for the training, validation, and testing datasets; see Figure 11. We use the random variables w1 = sign(U1 − .5) and w2 = 2U2 + 2.5, where U1 and U2 are uniformly distributed random variables on [0, 1]. We found that the classification accuracy on the functional test data is 100%.

Exponential versus Algebraic Growth Curves
We examined the capabilities of the CNN in classifying curve growth as exponential (of the form y = e^(cx) for c > 0) or algebraic (of the form y = x^c for c > 0); see Figure 12. Training, validation, and testing curves are generated using a random c = 3U + 1, where U is a uniformly distributed random variable over [0, 1]. We found that the classification accuracy on the functional test data is 100%.

3.2.4. Finding the Number of Peaks of Curves

The same CNN architecture that is used to estimate the magnitude of the maximum height and the width of the peaks is used for classification by the number of peaks, as mentioned in Subsection 3.1.3. The curves are generated with the same scheme as in Subsection 3.1.3. In this case, we use local normalization; that is, the highest grayscale value of each particular curve is 1, and the lowest is 0. This helps to identify the peaks; since we are not interested in the height, the grayscale values do not need to be proportional to the height. A zero number of peaks is possible if the randomly generated position falls outside the interval [0, 50]. We found that the classification accuracy on the functional test data is 98.0% for noise-free data, 97.2% for noisy data with magnitude σ = .1, and 96.2% for noisy data with magnitude σ = 1.

Applications to Dynamical Systems
In this section, we apply a CNN to dynamical systems.First, we show how CNN can estimate the Lyapunov exponent from one curve of the three variables solving the Lorenz system.Second, we show that CNN can be used to estimate transmission rates from epidemic curves.Third, we use CNN to test the similarity and dissimilarity of drug dissolution profiles.

Estimating Lyapunov Exponent
The study of human motion has been a focus of interest in the medical field to determine which exercises and ranges of motion are stable. Different perspectives have been taken to study such stability. While (Stergiou & Decker 2011) pointed out the link between movement variability and stability, (Wilson et al. 2008) emphasized that variability should not be confused with instability, as it can be observed in both healthy and unhealthy subjects. (Dingwell & Cusumano 2000) helped develop the standard procedure for analyzing stability using Lyapunov exponents estimated by the Rosenstein method (Rosenstein et al. 1993). Lyapunov exponents measure the average exponential divergence between nearest trajectories, which are called nearest neighbors.
The more unstable the system, the higher the value of the Lyapunov exponent. An alternative parameter is the mean of the absolute values of the Floquet multipliers. (Hurmuzlu & Basdogan 1994) were the first to extensively use Floquet multipliers for this purpose. Floquet multipliers are the eigenvalues of the Jacobian matrix that measure the separation between orbits of the system (Dingwell & Kang 2006) and can be calculated using Poincaré maps. If the mean of the magnitudes of the eigenvalues is less than 1, the orbits are considered stable.
To examine our method in estimating the Lyapunov exponent, we use the prototype of chaotic systems, the Lorenz system with parameters α, β, and ρ, defined by the system of ordinary differential equations x′ = α(y − x), y′ = x(ρ − z) − y, z′ = xy − βz. Initial values are x(0) = 1, y(0) = 1, and z(0) = 1. Figure 13 shows an example image of the attractor with α = 2.8029, β = 1.1114, and ρ = 11.9620. We randomly simulate the values α = 10U1, β = (8/3)U2, and ρ = 20U3, where the Ui are independent uniformly distributed random variables on [0, 1] for i = 1, 2, 3. We run the Lorenz system simulation over the time interval [0, 1] using the Runge-Kutta hybrid order 4 and 5 numerical method in MATLAB. We use the component x only for the training, validation, and testing of the CNN; see Figure 14. Figure 15 shows the predicted versus true values of the Lyapunov exponents, which closely follow the diagonal line with zero intercept and slope one. Table 6 shows a strong diagonal linear relationship between the true and predicted values of the Lyapunov exponents. An endurance test was also performed using 10000 test curves to estimate the Lyapunov exponent; the CNN was approximately 600 times faster (2.7628 seconds) than the MATLAB Rosenstein method (1692.6 seconds).
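The simulation step can be sketched in Python. The paper uses MATLAB's adaptive Runge-Kutta 4/5 solver; the fixed-step RK4 scheme below is a simple stand-in for illustration:

```python
import numpy as np

def lorenz_x(alpha, beta, rho, t_max=1.0, n_steps=1000):
    """Integrate the Lorenz system with fixed-step RK4 and return the
    x component only, which is what the CNN is trained on."""
    def rhs(u):
        x, y, z = u
        return np.array([alpha * (y - x),
                         x * (rho - z) - y,
                         x * y - beta * z])

    u = np.array([1.0, 1.0, 1.0])  # initial values x(0) = y(0) = z(0) = 1
    h = t_max / n_steps
    xs = [u[0]]
    for _ in range(n_steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        xs.append(u[0])
    return np.array(xs)

# Parameters drawn randomly as in the text: 10*U1, (8/3)*U2, 20*U3
rng = np.random.default_rng(1)
x_curve = lorenz_x(10 * rng.uniform(), (8 / 3) * rng.uniform(), 20 * rng.uniform())
```

The resulting x_curve would then be converted to a 28 by 28 image with the pre-processing step of Section 2 before being fed to the CNN.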

Estimating the Transmission Rates from Epidemic Curves
Estimation of transmission rates and exponential growth rates (see Subsection 3.1.1) is important for emerging epidemics (Boonpatcharanon et al. 2022), such as at the beginning of COVID-19 (Tuite & Fisman 2020). Some epidemics grow algebraically and not exponentially (Chowell et al. 2015, Kolebaje et al. 2022), so it is also helpful to discern them through classification; see Subsection 3.2.3. One of the main goals in epidemiology is to estimate the basic reproduction number R0, which is almost always proportional to the transmission rate β. If R0 < 1, then the disease diminishes; otherwise, there is a chance that it will become endemic.
We use a susceptible-infected-recovered (SIR) model to produce epidemic curves with different transmission rates β and estimate that parameter. The disease dynamics of the SIR compartmental model follow the system of differential equations S′ = μ − βSI − μS, I′ = βSI − γI − μI, R′ = γI − μR, where S, I, and R are the proportions of susceptible, infected, and recovered individuals in the population at time t, such that S + I + R = 1. Initial values are S(0) = .99, I(0) = .01, and R(0) = 0. The parameter μ is the per capita birth/death rate, β is the transmission rate, and γ is the recovery rate. The basic reproduction number in the above SIR model is given by R0 = β/(μ + γ).
We assume values μ = 1/(365 × 50) days⁻¹, γ = 1/28 days⁻¹, and β randomly selected from a uniform distribution over (.01, 1) to reflect a basic reproduction number in the range (.28, 28). The SIR model simulations are run for 50 days using the Runge-Kutta hybrid order 4 and 5 numerical method in MATLAB. See Figure 17 for a simulated epidemic curve I. Figure 18 and Table 8 show strong prediction of transmission rates.
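The epidemic-curve generation can be sketched as follows; as before, fixed-step RK4 stands in for MATLAB's adaptive RK4/5 solver:

```python
import numpy as np

def sir_curve(beta, mu=1 / (365 * 50), gamma=1 / 28, days=50, steps_per_day=100):
    """Simulate the SIR model with vital dynamics using fixed-step RK4
    and return the infected curve I(t), the CNN's input."""
    def rhs(u):
        S, I, R = u
        return np.array([mu - beta * S * I - mu * S,
                         beta * S * I - gamma * I - mu * I,
                         gamma * I - mu * R])

    u = np.array([0.99, 0.01, 0.0])  # S(0), I(0), R(0)
    h = 1.0 / steps_per_day          # step size in days
    infected = [u[1]]
    for _ in range(days * steps_per_day):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        infected.append(u[1])
    return np.array(infected)

beta, mu, gamma = 0.5, 1 / (365 * 50), 1 / 28
I_curve = sir_curve(beta)
R0 = beta / (mu + gamma)  # basic reproduction number
```

For β = 0.5 with the stated μ and γ, R0 is about 14, inside the (.28, 28) range targeted by the sampling scheme in the text.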

Detecting Similarity of Drug Dissolution Profiles
The problem of drug release or dissolution profiles is important for the pharmaceutical industry. Regulatory guidelines seek to advise coherent characteristics of drug dissolution prior to approval; see, e.g., (Vranić et al. 2002, Pourmohamad & Ng 2023). Many statistical approaches have been developed to test the similarity between dissolution curves or profiles, including cluster analysis, decision trees, and linear models; see, e.g., (Costa & Sousa Lobo 2001, Maggio et al. 2008, Enǎchescu 2010, Paixão et al. 2017, Abend et al. 2023, Pourmohamad & Ng 2023). These also include nonparametric measures such as the two adopted by the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA), the difference factor f1 and the similarity factor f2.

To detect the similarity of any two drug dissolution curves, we use a Siamese CNN in which the final layers of two parallel CNNs are inputs to a cross-entropy measure of the two input images. Some minor changes are made to the overall CNN architecture: we avoid using batch normalization and change the average pooling layer to a max pooling layer. At the end, there is a dense layer with 28² hidden nodes. Also, the weights are initialized by sampling from a normal distribution with mean zero and standard deviation 0.01. Following (Koch et al. 2015), we use the cross-entropy in the output layer to identify the similarity between the images of the dissolution curves. The test results can be seen in the confusion matrix in Figure 20. It is important to note that the number of hidden nodes in the last layer helps convergence. We notice that it is possible to use one hidden node; however, convergence is then not consistently guaranteed. The number of nodes of the hidden layer should therefore be considered an important hyperparameter. Furthermore, below 28² nodes, there was no considerable change in convergence time.

Real-Life Application: Detecting Parkinson's Disease
Parkinson's disease is a progressive neurodegenerative disorder that results in motor and non-motor symptoms such as tremors, rigidity, and impaired movement control.Detecting Parkinson's disease involves a thorough physical examination to assess motor skills, reflexes, muscle strength, and coordination, and searching for characteristic signs of Parkinson's disease.
Our technique can be successfully applied to detect Parkinson's disease using motor tests. We use a dataset introduced by (Isenkul et al. 2014, Isenkul et al. 2017) in which 62 Parkinson's patients and 15 healthy subjects draw a spiral curve on a tablet. The original test was divided into three parts: a static test, a dynamic test, and a circular motion test. In the static test, subjects draw a certain fixed shape. Our method gave 100% validation and testing accuracy for all of the combinations of the features: the X, Y, and Z coordinates, the pressure P, and the gripping angle A. Using the simplest model with X and Y gives the confusion matrix in Figure 23.

Discussion
In this paper, we tested our new method of functional data learning using convolutional neural networks (CNN) on various examples and applications. CNN performance was very close to perfect, as evidenced by the test curves of the functional cases in both regression and classification problems. Variation starts to appear in practical cases, such as the chaotic Lorenz attractor and the estimation of transmission rates from epidemic curves of the SIR system. We also found that training the CNN with noisy data improves its performance. These results show that the new method is robust to noise and can handle different cases of functional data. While some of the p-values show slopes and intercepts that are statistically significantly different from one and zero, respectively, their estimates and the correlation coefficients are nonetheless close to one and zero, indicating an excellent effect size. This conflict might be the result of using large data sizes when testing the trained CNN.
On the practical side, a pre-trained CNN can be used in several of the applications. A pre-trained CNN can estimate Lyapunov exponents and assess the stability of dynamical systems. For example, it could be used in the medical field to determine the stability of human motion or walking gait. This methodology provides a more practical approach in which, with moderate noise, the CNN performs well and is approximately 600 times faster than the Rosenstein method. As such, measured data can be used as input without filtering beforehand, since the CNN is robust to noise. A CNN pre-trained on epidemic curves can be used to estimate the transmission rate or directly estimate the basic reproduction number R0, and to discern exponential from algebraic growth.
When it comes to classification problems, the new method gave an accuracy of 100% in all but the number-of-peaks problem, where accuracy remained above 96%. We tested the new method on the classification of curves according to their monotonicity and curvature, as well as their type of growth. Furthermore, we simulated drug dissolution profiles to train a Siamese CNN, which accurately determined their similarity and dissimilarity. Finally, using real-life data, a CNN trained with functional motion data of a few Parkinson's disease cases and even fewer controls discerned cases with an accuracy of 100%.

Conclusion
In this paper, we presented a simple method to convert any curve into an image. Using convolutional neural networks (CNN), we trained on such images, together with validation sets of images, in various regression and classification problems. The same technique could be used for regression and classification problems in gene analysis and other medical sciences. Other areas to explore are multiple-output problems, in which several parameters are estimated, or combined classification and regression problems that find the type of curve and estimate its parameters. Extending the method to allow other types of kernels to produce the images might also be a viable extension of the main idea in this paper. However, the presented technique might require large functional datasets, which we could not find at the time of writing this paper; in that case, we had to perform data augmentation. New methods for generating functional synthetic data efficiently may be needed to handle small functional data learning problems.

Figure 1 .
Figure 1. Diagram of the procedure proposed for the regression and classification problems.

Figure 2 .
Figure 2. (a) The curve of y = exp(ωx) when ω = −.27. (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of y = exp(ωx) + σz when ω = −.27 and σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of y = exp(ωx) + σz when ω = −.27 and σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 3 .
Figure 3. Results of the exponential function regression using the test dataset without noise (a), with noise of magnitude σ = .1 (b), and with noise of magnitude σ = 1 (c).

Table 1 .Figure 4 .
Figure 4. (a) The curve of y = sin(ωx) when ω = 1.06. (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of y = sin(ωx) + σz when ω = 1.06 and σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of y = sin(ωx) + σz when ω = 1.06 and σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 5 .
Figure 5. (a) The curve of y = cos(ωx) when ω = .96. (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of y = cos(ωx) + σz when ω = .96 and σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of y = cos(ωx) + σz when ω = .96 and σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 6 .
Figure 6. Results of the sine function regression using the test dataset without noise (a), with noise of magnitude σ = .1 (c), and with noise of magnitude σ = 1 (e). Results of the cosine function regression using the test dataset without noise (b), with noise of magnitude σ = .1 (d), and with noise of magnitude σ = 1 (f).

Figure 7 .
Figure 7. (a) An example curve of a mixture of Gaussians G(x). (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of G(x) + σz when σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of G(x) + σz when σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 8 .Figure 9 .
Figure 8. Results of the maximum width regression using the test dataset without noise (a), with noise of magnitude σ = .1 (b), and with noise of magnitude σ = 1 (c).

Figure 10 .
Figure 10. (a) The curve of y = e^(−(x−2)). (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of e^(−(x−2)) + σz when σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of e^(−(x−2)) + σz when σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 11 .
Figure 11. (a) The curve of y = (x − 2)². (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of (x − 2)² + σz when σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of (x − 2)² + σz when σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 12 .
Figure 12. (a) The curve of algebraic growth represented by y = x³. (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of y = x³ + σz when σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of y = x³ + σz when σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).

Figure 15 .
Figure 15.Results of Lyapunov exponent estimation using the test dataset without noise (a), with noise of magnitude σ = .1 (b), and with noise of magnitude σ = 1 (c).

Figure 16 .
Figure 16.Results of Lyapunov exponent estimation when the CNN is trained only using the noise-free data for the test dataset without noise (a), with noise of magnitude σ = .1 (b), and with noise of magnitude σ = 1 (c).

Figure 17 .
Figure 17. (a) A simulated curve of I. (b) The 28 by 28 image that corresponds to the curve in (a). (c) The curve of I + σz when σ = .1, where z is a standard normal random variable. (d) The 28 by 28 image that corresponds to the curve in (c). (e) The curve of I + σz when σ = 1, where z is a standard normal random variable. (f) The 28 by 28 image that corresponds to the curve in (e).
There are several mathematical models of drug dissolution; an important one is the logistic curve f, a function of c(t − 6) for some release rate c > 0; see (Pourmohamad & Ng 2023). We follow (Pourmohamad & Ng 2023) by evaluating the logistic curve at the sampling time points t = 5, 10, 15, 20, 30, 60, 90, and 120 minutes. To generate a number of similar and dissimilar profiles, we use release rates c1 = .01 + .001z and c2 = .03 + .001z, where the values z are generated randomly and independently from the standard normal distribution. The simulated curves are shown in Figure 19 for (a) dissimilar and (b) similar pairs. In addition, the figure shows the histograms of f1 and f2 for the dissimilar (c) and similar (d) curves in the training set, and for the dissimilar (e) and similar (f) curves in the test set.

Figure 19 .
Figure 19.(a) Example of dissimilar curves.(b) Examples of similar curves.(c) Training data histogram of f 1 and f 2 for the dissimilar curves.(d) Training data histogram of f 1 and f 2 for the similar curves.(e) Test data histogram of f 1 and f 2 for the dissimilar curves.(f) Test data histogram of f 1 and f 2 for the similar curves.

Figure 20 .
Figure 20. The confusion matrix shows all true positives and true negatives, with no false positives or negatives, when comparing 50 similar and 50 dissimilar pairs of curves.
Data augmentation was performed using a random translation of X within [−25, 25], random reflections and a translation of Y within [−50, 50], and a random translation of Z within [−.5, .5]. We also performed a random translation of the pressure within [0, 50] and a random translation of the angle within [0, 25]. Data augmentation was performed four times more often for the control class of the set, since there are only 15 control images compared to the 61 case images. By doing that augmentation, we prepared training data of 300 control images and 305 disease images. We split the combined set into 80% for training and validation, and the remaining 20% is used for testing. See examples of Parkinson's patients and controls in Figure 22.
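The augmentation step can be sketched as follows; treating each "translation" as adding one random constant offset to the whole channel is our reading of the procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment(x, y, z, pressure, angle):
    """One random augmentation of a spiral-drawing record, following the
    translation ranges given in the text. Applying a single random offset
    per channel (and a sign flip for the reflection) is an assumption."""
    x = x + rng.uniform(-25, 25)       # translate X within [-25, 25]
    if rng.uniform() < 0.5:            # random reflection
        x = -x
    y = y + rng.uniform(-50, 50)       # translate Y within [-50, 50]
    z = z + rng.uniform(-0.5, 0.5)     # translate Z within [-.5, .5]
    pressure = pressure + rng.uniform(0, 50)  # translate P within [0, 50]
    angle = angle + rng.uniform(0, 25)        # translate A within [0, 25]
    return x, y, z, pressure, angle
```

Each call produces one new synthetic record; repeating the call (more often for the minority control class) yields the balanced training set described above.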

Figure 22 .
Figure 22.Curves with their respective image transformation.(a) Example of a control subject's x spiral component time series.(b) The 28 by 28 image corresponds to the curve in (a).(c) Example of a Parkinson's patient's x spiral component time series.(d) The 28 by 28 image corresponds to the curve in (c).

Figure 23 .
Figure 23. The confusion matrix shows all true positives and true negatives, with no false positives or negatives, when using the XY combination only.

Table 2 .
Correlation Coefficient, Intercept, and Slope for Sine Data with P-values

Table 3 .
Correlation Coefficient, Intercept, and Slope for Cosine Data with P-values

Table 4 .
Correlation Coefficient, Intercept, and Slope for Width Estimation Data with P-values

Table 5 .
Correlation Coefficient, Intercept, and Slope for Height Estimation Data with P-values

Table 6 .
Correlation Coefficient, Intercept, and Slope for Noise-Free Testing Data with P-values. We also tested the capability of a CNN model trained with noise-free data to estimate the Lyapunov exponent in noisy data. Figure 16 and Table 7 show relatively good results.

Table 7 .
Correlation Coefficient, Intercept, and Slope for Lyapunov Testing Data with P-values when CNN is Trained with Noise-Free Data.

Table 8 .
Correlation Coefficient, Intercept, and Slope for Transmission Rate Testing Data with P-values. The two dissolution profiles are sampled as {(t_i, R_i) : i = 1, 2, ..., n} and {(t_i, S_i) : i = 1, 2, ..., n}. If f1 is between 0 and 15 and f2 is between 50 and 100, then the two curves are considered similar; see, for example, (Costa & Sousa Lobo 2001, Pourmohamad & Ng 2023) for a complete set of models and measures, as well as FDA and EMA guidelines.