Proprioceptive wake classification by a body with a passive tail

The remarkable ability of some marine animals to identify flow structures and parameters using complex non-visual sensors, such as the lateral lines of fish and the whiskers of seals, has been an area of investigation for researchers looking to apply this ability to artificial robotic swimmers, which could lead to improvements in autonomous navigation and efficiency. Several species of fish in particular have been known to school effectively, even when blind. Beyond specialized sensors like the lateral lines, it is now known that some fish use purely proprioceptive sensing, using the kinematics of their fins or tails to sense their surroundings. In this paper we show that the kinematics of a body with a passive tail encode information about the ambient flow, which can be deciphered through machine learning. We demonstrate this with experimental data of the angular velocity of a hydrofoil with a passive tail that lies in the wake generated by an upstream oscillating body. Using convolutional neural networks, we show that with the kinematic data from the downstream body with a tail, the wakes can be better classified than in the case of a body without a tail. This superior sensing ability exists for a body with a tail, even if only the kinematics of the main body are used as input for the machine learning. This shows that beyond generating ‘additional inputs’, passive tails modulate the response of the main body in a manner that is useful for hydrodynamic sensing. These findings have clear application for improving the sensing abilities of bioinspired swimming robots.


Introduction
The locomotion of fish and other aquatic animals has many desirable characteristics such as energy efficiency, agility, and stealth [1][2][3], which have inspired the design of swimming robots [4][5][6][7]. The ability of animals to identify and exploit vortex wakes in water or air is well documented, from the famous V-shaped formation of migratory geese, which allows trailing individuals to derive thrust from the wakes of those in front [8], to the ability of trout in fast-flowing streams to use the wakes of obstacles to allow station keeping with low to zero energy investment, even when the trout in question is dead [9]. Closely related to and aiding locomotion is the ability of fish to sense and process spatiotemporal information in the water around them. Objects moving in water or stationary objects in streams perturb the flow in their immediate area, and for a wide range of Reynolds numbers will create a vortex wake. A swimming animal or an underwater robot encountering the wake created by another body experiences disturbance forces and moments. These disturbances can be associated with the disturbance velocity field and the bodies creating them. Essentially, information about fluid flow and the objects that create these flows is encoded in the spatiotemporal evolution of the vortical structures, whether the bodies creating them are cylinders, hydrofoils, underwater robots or fish [3][10][11][12][13][14]. Many species of fish sense these flow features using their lateral lines, a grouping of mechanosensors utilizing small sensing hairs, as part of their multimodal sensing [9][15][16][17][18][19][20][21]. Researchers have long been captivated by the sensing capabilities of the lateral line and have sought to mimic these.
Considerable research and engineering has been devoted to creating artificial lateral lines through a variety of electromechanical sensors such as miniature pressure sensors [22][23][24][25][26], ionic polymer-metal composite sensors [27,28], multi-layered silicon beams [29], and micro-fabricated hot-wire anemometry sensors [30], and these sensors have been useful to perform state estimation [31] and improve the swimming efficiency of robots [24]. The whiskers of aquatic mammals have been shown to serve a similar role in flow detection by translating the fluid flow into detectable whisker vibrations [32].
While these sophisticated sensors mimicking the lateral line have undeniably improved the sensing of the local flow field, alternative sensing abilities that augment the lateral lines in fish are being better understood. The discovery [9] that a dead animal can 'sense' the wake well enough to exploit it, with no sensory input beyond the fluid passively deforming its limp body, offers an indication that passive degrees of freedom may instill this wake sensing ability into an artificial robotic swimmer. Proprioception has been demonstrated to be used by fish as part of their multimodal sensing. For instance, the rays and membranes of fins have been shown to act as mechanosensors in catfish [33], bluegill sunfish [34], and wrasses [35]. Fin mechanosensation has been found to encode the velocity of fin bending as well as respond to cyclic stimuli of biologically relevant frequencies with the mechanosensory system being capable of providing stroke by stroke feedback [36]. Beyond flow sensing, it has been shown that fish can use proprioception to improve their efficiency of swimming with improved energy harvesting from the flow [37]. Such recent research in the proprioceptive ability of fish fins suggests that in the context of bioinspired robots, useful information about the flow, and in particular the vortex field around a robot, can be inferred from the kinematics of the robot or a part of it such as its tail, which can passively improve the robot's sensing capabilities in conjunction with existing lateral-line based approaches.
Extracting useful information about the fluid vortex field using even direct measurements of the fluid velocity field is, in general, a non-trivial problem [38][39][40][41]; extracting such information using only the kinematic information of a body immersed in the fluid is even more challenging due to the inherent complexities associated with coupled fluid-body dynamics. However, the body immersed in the fluid acts as a reservoir computer, with the input being the hydrodynamic forcing due to a vortex wake and its output being the resultant kinematics. Recent research into the computing power of dynamic systems in the context of reservoir computing [42,43] finds that applying complex time-series information as forcing to a complex nonlinear system (referred to as a 'reservoir') can result in information about the original system being encoded in simplified form in the states of the reservoir. In previous work [44] we introduced a rigid hydrofoil, pinned at its leading edge in the vortex wake of a pitching upstream body, as a physical reservoir, and showed that the one degree-of-freedom kinematics of its rotation contain enough information to accurately classify the wake Strouhal number without the need for sophisticated sensors. Here we extend that work by showing that a similarly pinned body with an additional freely rotating tail can serve as a more effective physical reservoir and encode more information about the flow into its two-dimensional velocity kinematics, allowing more accurate classification of wake parameters. Performing classification of the resulting time-series data is a well-researched problem in artificial intelligence with multiple viable solutions. In related prior work [44], a shallow dense artificial neural network (DNN) was used to perform a classification of time-series data obtained from the kinematics of the rigid hydrofoil in a vortex wake.
Specialist architectures designed to exploit the specific characteristics of time-series data, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been found to match or exceed the performance of the state-of-the-art non-deep learning algorithms on time series classification [45]. RNNs, however, are more complex than the feed-forward CNNs, with many more variables in the architecture, leading to classification results that are more difficult to explain. CNNs and DNNs with a simpler architecture suffer less from this problem and can enable future physics-informed learning. Therefore in this work we use a multiple-input CNN architecture for feature extraction on the kinematic data before using a DNN to classify flow parameters from those features.
We show that the body in the wake encodes more information in its kinematics and thus is a more effective reservoir if the body has a passive tail. We tested the classifying ability based on six cases of different time series data from bodies in a vortex wake. The first is the angular velocity of the large pinned body (head) with a passive tail; the second is the angular velocity of only the passive tail attached to the head; the third is the angular velocity of the head and tail attached together in a fixed assembly; the fourth is one where the angular velocities of both the head and tail are fed as inputs to the CNN; the fifth is where the head is tested without a coupled tail; and the sixth is where the tail is tested without a coupled head. Not surprisingly, the classification of the wakes was the most accurate in case four when using two inputs, but very surprisingly in both the first and second cases classification shows a significant improvement over the third case. The mere presence of a passive tail on a body can improve its hydrodynamic sensing ability even if data on the tail dynamics is not directly used in the classification. This result implies that the passive tail modulates the motion of the rest of the body to better encode information about the vortex wake.

Experiments
We designed an experimental setup with two hydrofoils placed in a water tunnel as shown in figure 1(a). The experiments were conducted in an Engineering Laboratory Flow Visualization Tunnel. The water tunnel has a working length of 152 cm and a testing cross-section of 232 cm$^2$ and is capable of producing laminar flow at speeds $u_\infty$ up to 1 m s$^{-1}$. The leading (upstream) hydrofoil has a chord length of 7.24 cm. In the first set of experiments, the downstream body consists of a smaller NACA 0045 airfoil of length 5.39 cm pinned to an ellipse with a major axis of length 7.00 cm and a minor axis of length 5.54 cm, which we refer to as the 'pinned assembly'. In the second set, the same body is considered but with the pin locked, resulting in a streamlined body that we refer to as the 'fixed assembly'. In the third and fourth sets of experiments, the downstream body from the previous experiments is separated into its two constituent bodies, and each is tested individually. In each experiment, the trailing body is tethered from its leading edge to a bar of extruded aluminum which does not contact the water, so the movement of the hydrofoil is solely the result of its body interacting with the vortex wake. The tether is a lightweight fishing line that is 1 cm in length to limit the heaving motion of the trailing hydrofoil, effectively acting as a very low-friction pin. The upstream hydrofoil is actuated to perform periodic pitch oscillations about its centroid by a Spektrum H6210 servo motor controlled by a Raspberry Pi Pico. This generates a reverse Kármán vortex street as shown in supplementary video 1. Supplementary videos 2-5 show sample dynamics of the pinned assembly, the fixed assembly, the head and the tail respectively. The distance $D$ between the leading edges of the upstream and downstream foils can be prescribed by moving the downstream foil's supporting structure. With such an arrangement each experiment is repeated for $D \in \{16.5, 30.0\}$ cm.
The leading hydrofoil is actuated to execute prescribed yaw oscillations of angular amplitude $A$ and time period $T$ in the prescribed flow of the water tunnel. The trailing hydrofoil is free to execute yaw oscillations in response to the hydrodynamic forcing of the vortex wake created by the leading hydrofoil. The combination of $A$, $T$, and $u_\infty$ constitutes the control parameters of this experiment.
The oscillations of the leading hydrofoil are programmed to generate nominally periodic motion corresponding to a square wave of angular position. A perfect servo would then generate thin peaks of very high angular velocity, which results in an unrealistic wake. To mitigate this effect, the magnitude of the angular velocity is limited by the controller to 200 deg s$^{-1}$. As a result, the angular velocity takes the form of broad peaks which can be described as the superposition of multiple frequencies (or harmonics). Furthermore, torque constraints and jitter of the forcing servo cause deviations from periodic angular velocity, producing small changes in both the amplitude and frequency of oscillations. The angular motion of the upstream and downstream hydrofoils is measured using overhead cameras. Multiple circles are drawn on the top of each body (indicated in figures 1(a) and (b) by pale dots), whose positions are recorded by a video camera at 30 Hz. The positions of the centers of the pale circles are then identified in each frame of the video using a circular Hough transform [46], and these are then used to calculate the body angles $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$. The angular velocities $\omega_1 = \dot{\theta}_1$, $\omega_2 = \dot{\theta}_2$, $\omega_3 = \dot{\theta}_3$, and $\omega_4 = \dot{\theta}_4$ are then computed using first-order forward finite differences. This numerical derivative introduces noise to the velocity data, but this step would not be necessary in an autonomous robot, which would likely have angular velocities calculated by an accelerometer. To preserve as much of the information as possible, no filtering is applied to the data. Figure 2 shows a sample time series of these processed kinematics.
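The forward-difference step described above can be sketched in a few lines. This is only an illustrative sketch, not the processing code used for the experiments: the function name `angular_velocity` and the synthetic sinusoidal angle trace are assumptions introduced here, while the 30 Hz frame rate comes from the text.

```python
import numpy as np

FRAME_RATE_HZ = 30.0  # camera frame rate stated in the text


def angular_velocity(theta, frame_rate_hz=FRAME_RATE_HZ):
    """First-order forward difference: omega[k] = (theta[k+1] - theta[k]) / dt."""
    theta = np.asarray(theta, dtype=float)
    dt = 1.0 / frame_rate_hz
    return np.diff(theta) / dt  # one sample shorter than theta


# Synthetic stand-in for a measured angle trace (hypothetical values).
t = np.arange(150) / FRAME_RATE_HZ           # 5 s of frames
theta = 10.0 * np.sin(2 * np.pi * 0.5 * t)   # degrees
omega = angular_velocity(theta)              # degrees per second
```

Because the difference is taken over a single frame interval, any frame-to-frame noise in the measured angle is amplified by the factor 1/dt = 30, which is the noise the text refers to.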
Each experiment was performed for a period of 10 min, giving angular velocity time series of the same duration at intervals of 1/30 s. A discrete Fourier transform is performed on these time series of the angular velocity of the downstream hydrofoil. The calculated peaks in the frequency domain are converted to time periods and shown in figures 3(a) and (b). The peaks in this figure correspond to the prescribed nominal forcing periods and their harmonics, though small noise is present about each peak, showing the small variations in frequency. The post-processed angular velocity data is broken down into smaller time series, each of duration 5 s. The measured amplitudes of velocities of the trailing body are calculated for each of these 5 s windows. A histogram of the overlaid distribution of amplitudes from two experiments measured in this manner is shown in figures 3(c) and (d). The amplitudes are broadly distributed within each experiment, and there is overlap between the amplitudes of the two experiments, which indicates that this classification problem is non-trivial.

Figure 3 (caption, partial). This indicates that the angular velocity has encoded information that will enable the period of the upstream body to be classified. A histogram for the root-mean-square value of the angular velocity of each sample is also shown for the corresponding experiments for (c) the coupled head and (d) the coupled tail, where the rms value is computed for each of a total of 120 successive 5 s snapshots for each of the two overlaid time series. These results show significant variation between different snapshots of the same dataset, which indicates that features may not cluster clearly and so may not be amenable to traditional classification approaches.
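The two diagnostics behind figure 3, the dominant period from a discrete Fourier transform and the rms value of successive 5 s windows, can be sketched as follows. The helper names `dominant_period` and `window_rms` and the synthetic 2 s-period signal are assumptions made for illustration; only the 30 Hz sampling rate, the 10 min duration, and the 5 s window come from the text.

```python
import numpy as np

FS = 30.0        # sampling rate in Hz (from the text)
WINDOW_S = 5.0   # window length in seconds (from the text)


def dominant_period(omega, fs=FS):
    """Period (s) of the largest nonzero-frequency peak of the DFT magnitude."""
    omega = np.asarray(omega, dtype=float)
    spectrum = np.abs(np.fft.rfft(omega - omega.mean()))
    freqs = np.fft.rfftfreq(omega.size, d=1.0 / fs)
    k = 1 + np.argmax(spectrum[1:])   # skip the zero-frequency bin
    return 1.0 / freqs[k]


def window_rms(omega, fs=FS, window_s=WINDOW_S):
    """RMS of each successive, non-overlapping window."""
    omega = np.asarray(omega, dtype=float)
    n = int(round(fs * window_s))
    n_windows = omega.size // n
    windows = omega[: n_windows * n].reshape(n_windows, n)
    return np.sqrt(np.mean(windows**2, axis=1))


# Synthetic 10 min signal with a 2 s period (hypothetical, for illustration).
t = np.arange(int(600 * FS)) / FS
omega = 50.0 * np.sin(2 * np.pi * t / 2.0)
period = dominant_period(omega)
rms = window_rms(omega)
```

For a clean sinusoid every window has the same rms value; the broad, overlapping histograms reported for the experimental data are what make a simple amplitude threshold insufficient.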
The angular position $\theta_1$ of the leading hydrofoil with respect to the free stream can therefore be described by the equation

$$\theta_1(t) = A(t)\,F(t),$$

where $A(t)$ is the oscillation amplitude and $F(t)$ is a time-periodic function of unit amplitude with $T$ as the time period of oscillations. While the actual amplitudes and time periods vary, these are clustered around the nominal prescribed values given by the ordered label sets for $T$, $A$, and $u_\infty$, each of which contains three values. While all three of the parameters above are needed to uniquely specify a wake, a single non-dimensional number reflecting the wake structure is useful to assign as a label to a wake generated by the leading hydrofoil. The Strouhal number, St, is suitable for this purpose, and is defined as

$$St = \frac{2 f L \sin\theta_M}{u_\infty},$$

where $f = 1/T$, $\theta_M$ is the mean one-sided oscillation amplitude for a dataset, and $L$ is the distance from the center of rotation to the trailing edge of the leading hydrofoil, as shown in figure 1(a). The non-dimensional Strouhal number, frequently used in fluid mechanics and specifically in the context of fish swimming to describe the periodic motion of flapping bodies and the associated vortex wakes [47], is a suitable label that encodes information about the motion of the body generating the vortex wake for body Reynolds numbers ranging from $10^3$ to $10^8$ [3]. The Strouhal number, in this case, contains information about the frequency of the vortex shedding, the distance between the consecutively shed vortices, and the length of the source body that creates the vorticity.
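As a worked illustration of how one Strouhal label arises for each combination of period, amplitude, and flow speed, the sketch below assumes the common pitching-foil form St = 2 f L sin(θ_M) / u_∞, in which 2 L sin(θ_M) is the peak-to-peak excursion of the trailing edge. That form, the function name `strouhal`, and every numerical value below are assumptions for illustration; the label values used in the experiments are not reproduced here.

```python
import math
from itertools import product


def strouhal(period_s, theta_m_deg, u_inf, length_m):
    """St = 2 f L sin(theta_M) / u_inf with f = 1/T (assumed form)."""
    f = 1.0 / period_s
    return 2.0 * f * length_m * math.sin(math.radians(theta_m_deg)) / u_inf


# Hypothetical label values, three per parameter as in the text.
PERIODS_S = (1.0, 1.5, 2.0)
AMPLITUDES_DEG = (10.0, 20.0, 30.0)
SPEEDS_M_S = (0.1, 0.2, 0.3)
L_M = 0.05  # hypothetical distance from the pivot to the trailing edge

# One Strouhal label per (T, A, u_inf) combination: 3 x 3 x 3 = 27 labels.
st_labels = {
    (T, A, u): strouhal(T, A, u, L_M)
    for T, A, u in product(PERIODS_S, AMPLITUDES_DEG, SPEEDS_M_S)
}
```

Combinations with quite different underlying parameters can land on nearby Strouhal values, which is why a classifier over 27 labels faces closely spaced classes.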

Wake classification and neural network architecture
Though multiple minutes of data are available for every wake, our objective is to develop a classifier that can determine changes in the wake in near real time, requiring high performance on only a portion of the available data. Previous work [44] with a different network architecture found that, for a rigid hydrofoil, windows of input exceeding 5 s in length do not show significant increases in classification accuracy, so we adopt a 5 s time series, or 150 points at 30 Hz, as the input. We will denote the set of all 5 s windows of time series extracted from the experimental measurements as $X$. Corresponding to each data window are three known parameters: $T$, $A$, and $u_\infty$. The objective of classification is to assign probabilistic labels $\bar{T}$, $\bar{A}$, $\bar{u}_\infty$ and $\overline{St}$ and compare these with the known values of the labels $T$, $A$, $u_\infty$ and $St$ respectively. However, letting the classifier make this assignment directly does not allow the classifier to express uncertainty, and is difficult to train because the gradient of accuracy with respect to classifier parameters is either 0 or infinite throughout the parameter space, because each classification is either correct or incorrect with no partial credit. To generate smoother error gradients, we instead consider estimation of discrete probability distributions. For instance, given a 5 s time series input $x_i$ we estimate a discrete probability distribution $\bar{T}(x_i): X \to p_t \in \mathbb{R}^3$ such that $p_t(j) \geq 0$ and $\sum_{j=1}^{3} p_t(j) = 1$. Therefore $\bar{T}(x_i)$ is a probabilistic vector of size 3 (since there are 3 labels in the set $T$) with the $j$th entry in this vector being the probability that the time period associated with the time series input $x_i$ is $T_j$. The probabilistic vector labels $\bar{A}(x_i)$ (of length 3), $\bar{u}_\infty(x_i)$ (of length 3) and $\overline{St}(x_i)$ (of length 27) are defined similarly.
With the classifier output defined as a probability distribution, we can redefine the objective of our classifier: instead of trying to maximize accuracy directly, we attempt to find a classifier that minimizes the cross-entropy for every segment of data.
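A minimal numerical sketch of this objective is given below: a softmax turns raw network outputs into a probability vector, and the cross-entropy for one segment is the negative log-probability assigned to the true label. The logit values and the small `eps` guard are assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Map raw outputs to non-negative probabilities that sum to 1."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability assigned to the true label."""
    return -np.log(probs[true_index] + eps)


logits = np.array([2.0, 0.5, -1.0])   # hypothetical raw outputs for three labels
p_t = softmax(logits)
loss_if_label_0 = cross_entropy(p_t, 0)   # true label is the most probable one
loss_if_label_2 = cross_entropy(p_t, 2)   # true label is the least probable one
```

Unlike accuracy, this loss changes smoothly as probability mass moves toward or away from the true label, which is what gives gradient-based training a usable signal.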
Estimating $\bar{T}(x)$, $\bar{A}(x)$, $\bar{u}_\infty(x)$ and $\overline{St}(x)$ given 150 points of time series data is a general time-series classification problem, and many algorithms exist that can perform such a classification. Recent advances in automatic differentiation have led to the dominance of neural networks for this task, which have recently reached parity with state-of-the-art non-neural network classifiers [45]. In particular, CNNs are well suited to time series classification for their tolerance of shifted input [48].
Though CNNs were originally based on the visual cortex of cats and designed primarily in the context of visual processing [48], their properties have recently led to their increased adoption in time-series classification [49]. CNNs apply a constant kernel to sequential windows of data, with each sequential input window mapping to a sequential element of the output. This leads to a tolerance of shifted inputs: a change in the phase of the input layer will only cause a change in the phase of the output layer, a property known as shift equivariance. This property is attractive for this problem because the overall phase of the time series carries no information about which experiment it corresponds to.
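Shift equivariance can be checked numerically with a single stride-1 kernel. The sketch below uses random data as a stand-in for a 5 s window, and the shift of 4 samples is an arbitrary choice; the comparison excludes the first few outputs, where the circular shift wraps samples around the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(150)      # stand-in for one 5 s window at 30 Hz
kernel = rng.standard_normal(7)   # kernel size 7, as in the text
shift = 4                         # hypothetical phase shift in samples


def conv_layer(signal, k):
    """Stride-1 'valid' cross-correlation, the operation a 1D CNN layer applies."""
    return np.correlate(signal, k, mode="valid")


y = conv_layer(x, kernel)
y_shifted = conv_layer(np.roll(x, shift), kernel)

# Away from the wrapped boundary, shifting the input only shifts the output.
equivariant = np.allclose(y_shifted[shift:], y[:-shift])
```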
Pooling functions are often used to sequentially reduce the data dimension and introduce nonlinearity during each iteration. Here we use max pooling with width two, reducing the dimensionality of the transformed data by half per layer. Convolving and pooling are repeated iteratively until information is condensed into a 'feature vector' of much lower dimension than the input, and the feature vector is typically an input to a DNN which performs the estimation. The specific network used here is illustrated in figure 4. We use four convolution iterations with 5 kernels in each iteration, with a kernel size of 7 and step of 1. The dense ReLU-activated neural network has four layers, with 200, 100, 100, and 50 neurons, listed in order from input to output. Directly densely connected to the 50 unit layer are three output layers each with three units corresponding to $\bar{T}(x)$, $\bar{A}(x)$, and $\bar{u}_\infty(x)$ respectively, each using a softmax activation function to ensure that the probability vector sums to 1. The Strouhal number St is predicted by a separate network of the same architecture, except with a different output layer corresponding to the 27 labels, also with a softmax function. For the case where time series from both the head and tail are considered simultaneously, a CNN is created for each of the two inputs, with each allowed to have different weights. The two resulting feature vectors are concatenated and input to a single DNN of the same dimensions as above. Evaluating this network for a single time series takes a total of 14 ms on an Intel generation 8 i5 processor, which is fast enough to be run in real time on an onboard microprocessor.
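The layer sizes above can be made concrete with a small forward pass written in plain NumPy. This is a sketch under stated assumptions, not the network used in the paper: the weights are random and untrained, biases are omitted, the convolutions use 'valid' padding with no activation other than the max pooling, and all function names are hypothetical. The padding choice in particular is not given in the text and determines the feature-vector length.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d(x, w):
    """x: (c_in, n), w: (c_out, c_in, k). Stride-1 'valid' cross-correlation."""
    c_out, c_in, k = w.shape
    out = np.zeros((c_out, x.shape[1] - k + 1))
    for o in range(c_out):
        for i in range(c_in):
            out[o] += np.correlate(x[i], w[o, i], mode="valid")
    return out


def max_pool(x, width=2):
    n = (x.shape[1] // width) * width          # drop a trailing odd sample
    return x[:, :n].reshape(x.shape[0], -1, width).max(axis=2)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init(shape):
    return 0.1 * rng.standard_normal(shape)    # random, untrained weights


# Four convolution iterations, 5 kernels each, kernel size 7, step 1.
conv_w = [init((5, 1, 7))] + [init((5, 5, 7)) for _ in range(3)]


def features(window):
    x = np.asarray(window, dtype=float)[None, :]   # shape (1, 150)
    for w in conv_w:
        x = max_pool(conv1d(x, w))
    return x.ravel()


n_feat = features(np.zeros(150)).size              # 15 under these assumptions
sizes = [n_feat, 200, 100, 100, 50]
dense_w = [init((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
head_w = {name: init((50, 3)) for name in ("T", "A", "u_inf")}


def classify(window):
    h = features(window)
    for w in dense_w:
        h = np.maximum(h @ w, 0.0)                 # ReLU dense layers
    return {name: softmax(h @ w) for name, w in head_w.items()}


probs = classify(rng.standard_normal(150))
```

With 'valid' padding the signal length goes 150, 144, 72, 66, 33, 27, 13, 7, 3, so each of the 5 kernels contributes 3 values to the feature vector. For the two-input case, two separate `features` functions with their own weights would be evaluated and their outputs concatenated before the dense layers.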
Because of the limited amount of training data and the complexity of the network, it is susceptible to overtraining, so we use a modified version of the early-stopping algorithm [50], which determines a stopping time based on generalization loss and selects the network based on its performance on a separate validation data set, which prevents overfitting to the training data. In every iteration, 75 new data vectors are derived from the designated training data for each experiment, and stochastic gradient descent is used to minimize the loss. The location from which these 5 s time series data vectors are extracted in the longer training time series is random and allowed to overlap with past and future data vectors, so the total number of 5 s windows that can be constructed from 10 min of experiment data is vastly greater than the 120 that would be allowed without overlap. Every five training iterations, the cross-entropy loss is calculated for 150 time series from the validation data, and the lowest value seen so far, denoted $e_i$, is stored with its corresponding network weights. The training is terminated at iteration $k$ when $e_k > 1.2\,e_i$, which indicates that the validation error has passed its minimum and overfitting has begun. The network weights corresponding to the lowest validation loss value are taken as the optimal classifier. This methodology has two potential problems that make comparison of the results between different classifiers difficult: it is possible that the stochastic gradient optimization becomes trapped in a local minimum, and noise is introduced into the validation error because the error calculated on a subset of the validation data is similar but not equal to the true validation error over the entire set of possible validation data vectors, which is too expensive to compute.
These problems are both mitigated by repeating this procedure 30 times, and selecting the best of the 30 resulting network weights by their performance on a large validation data set of 60 000 overlapping time series vectors.
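The stopping rule can be illustrated without a network by applying it to a sequence of validation losses. This is a sketch of the rule as described in the text only: the function name `early_stopping` and the loss values are hypothetical, and in practice the network weights would be saved wherever the comment indicates.

```python
def early_stopping(validation_losses, tolerance=1.2):
    """Return (best_index, stop_index) for a sequence of validation losses.

    Training stops at the first check k whose loss exceeds
    tolerance * (lowest loss seen so far).
    """
    best_loss = float("inf")
    best_index = None
    for k, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_index = loss, k      # store the weights here in practice
        elif loss > tolerance * best_loss:
            return best_index, k
    return best_index, len(validation_losses) - 1


# Hypothetical validation losses, one per check (every five training iterations).
losses = [1.00, 0.70, 0.55, 0.50, 0.52, 0.58, 0.61, 0.66]
best, stop = early_stopping(losses)
```

Here the lowest loss (0.50) occurs at check 3 and training stops at check 6, the first check whose loss exceeds 1.2 × 0.50. Repeating the whole procedure from different random initial weights, as the text does 30 times, reduces the chance that a single noisy validation estimate or a poor local minimum determines the final classifier.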
To perform the above procedure, three data sets are needed: training, validation, and testing. These must be selected carefully to avoid overlap between windows, as the classifier will likely be more accurate on data that it has already seen, even if only partially, and that could lead to overconfidence in the classifier's accuracy on truly new data. The simplest method to avoid overlap is to split the time series into three portions of different lengths, from which each type of data can be drawn. We designated the first 70% of each experiment as training data, the next 20% as testing data, and the final 10% as validation data.
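The split and the random, overlapping window extraction can be sketched as follows. The helper names `split_series` and `sample_windows` and the random stand-in series are assumptions for illustration; the 70/20/10 proportions, the 150-point window, and the batch of 75 windows come from the text.

```python
import numpy as np

WINDOW = 150  # 5 s at 30 Hz


def split_series(series):
    """Contiguous split: first 70% training, next 20% testing, last 10% validation."""
    n = len(series)
    i_train, i_test = (7 * n) // 10, (9 * n) // 10
    return series[:i_train], series[i_train:i_test], series[i_test:]


def sample_windows(segment, count, rng, window=WINDOW):
    """Draw `count` windows at random start indices; windows may overlap."""
    starts = rng.integers(0, len(segment) - window + 1, size=count)
    return np.stack([segment[s : s + window] for s in starts])


rng = np.random.default_rng(0)
series = rng.standard_normal(18000)          # stand-in for 10 min at 30 Hz
train, test, val = split_series(series)
batch = sample_windows(train, 75, rng)       # 75 windows per training iteration
```

Because windows are drawn only from within one contiguous portion, no training window can share samples with a testing or validation window, even though windows within a portion overlap freely.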

Wake classification results
Training on the different datasets resulted in different training rates and convergence to significantly different loss values, as demonstrated in figure 5, where the evolution of the distribution of loss values for the 30 trained networks for each dataset is shown. The fixed assembly data causes convergence to a high loss value, the head and tail data classifiers both converge to a similar medium loss value, and the classifier using both head and tail data as input reaches the lowest overall loss. However, this loss is the sum of the time period, amplitude, and flow velocity losses, and it also gives no indication of which specific parts of the parameter space each classifier has difficulty with. To visualize these details, we use confusion matrices.
The performance of each deep network is quantified through a confusion matrix $C$ whose element $C_{ij}$ represents the fraction of wake $i$ samples that were classified as wake $j$. The diagonal elements $C_{ii}$ of this matrix represent the fraction of correctly classified wakes of label $i$. By definition $0 \leq C_{ij} \leq 1$ for all $i$ and $j$. For an effective classifier, large values are found along a diagonal, and for our axis labeling convention that diagonal runs from the top-left to the bottom-right. Values off this diagonal represent incorrect classification; a large $C_{ij}$ with $i \neq j$ means that wake label $i$ is frequently confused with wake label $j$.
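A row-normalized confusion matrix of this kind, together with the mean of its diagonal as a single summary number, can be computed as in the sketch below. The label sequences are made up for illustration and the function name `confusion_matrix` is hypothetical.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_labels):
    """C[i, j] = fraction of samples with true label i classified as label j."""
    counts = np.zeros((n_labels, n_labels))
    for i, j in zip(true_labels, predicted_labels):
        counts[i, j] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_totals == 0, 1, row_totals)  # avoid 0/0 for unused labels


# Hypothetical true and predicted label indices for ten samples.
true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 2, 1, 2]
C = confusion_matrix(true, pred, 3)
average_accuracy = np.mean(np.diag(C))
```

Each row sums to 1, so an off-diagonal entry such as `C[2, 1]` reads directly as the fraction of label 2 samples mistaken for label 1.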

Time period, amplitude, and flow velocity
Wakes can be categorized by the parameters of the motion of the leading hydrofoil generating these wakes. Separate confusion matrices for the classification of wakes based only on the individual parameters of time period $T$, amplitude $A$ and free stream velocity $u_\infty$ were created. The confusion matrices for the forcing period are shown in figure 6. Accuracy appears roughly consistent for the different forcing periods, and no individual forcing period has an accuracy of less than 99%. The only bodies to not accurately classify the period to within rounding distance of 100% are the coupled and uncoupled tails, which can likely be attributed to their small inertia and streamlined shape causing excessive sensitivity to components of the wake with frequencies different from the driving frequency. Given the clearly evident forcing period in the frequency spectra in figure 3, the upstream forcing period was clearly encoded into the kinematics of the body, so the result that the CNN was able to extract the forcing period from the kinematics with high accuracy is not surprising.
The oscillation amplitude of the upstream hydrofoil is more prone to classification errors, with classification accuracy varying between 78% and 97% as shown in figure 7. The kinematics of the head body with the attached tail enable better classification for every amplitude than either the uncoupled head or uncoupled tail, indicating that the richer dynamics induced by the coupling between the bodies allows more effective encoding of wake structures corresponding to amplitude into the kinematics. When data from both the head and tail kinematics are used simultaneously, the accuracy improves further for every label, indicating that while the kinematics are coupled, the kinematics of each body still hold information that cannot be easily extracted from the other. Free stream velocity classification has a higher accuracy than that of classifying amplitude but is less accurate than that of forcing period classification, with figure 8 showing peak classification accuracy for each dataset falling between 91% and 99%. Similar to the amplitude classification problem, the head kinematics and the tail kinematics yield similar classifications, but the data encoded by each is different enough that the combined classification can greatly outperform both. The coupling of the two bodies yields benefits to both, though this effect is most clear for the coupled head link, which has higher accuracy for all amplitude values than either uncoupled body or the fixed assembly. The coupling increases the classification accuracy of the tail on the low amplitude, but decreases it on the medium and large amplitudes. This may indicate that the tail is oversensitive to the flow due to its small mass and sharp edge, and the additional information passed through the coupling is counterproductive when the high amplitude upstream oscillations are already inducing rich tail oscillations.
However, when the forcing amplitude and corresponding tail oscillation amplitude are lower, the increased sensitivity to the wake provided by the coupling provides a net benefit.

Strouhal number classification
In this problem, the desired information about the upstream hydrofoil is encoded twice: it is first encoded into the wake structure, which then encodes it into the downstream body through a complex fluid-body interaction. In the previous section we demonstrated that different downstream bodies have substantially different classification accuracies when placed in identical wakes, indicating a loss of information in the second encoding step. There is also a loss of information in the first step, because a given wake Strouhal number, which is a scalar quantification of wake structure, cannot be mapped back to a unique combination of parameters for the upstream hydrofoil. Performing a classification of St directly should reduce the loss of information in the first encoding step and allow the second encoding step, which is the main interest of this work, to be investigated more directly.
We repeat the classification procedure from the previous section, with identical network hyperparameters (excluding the output layer) and with 30 networks trained per kinematic dataset. As each unique set of $(A, T, u_\infty)$ has a unique Strouhal number, there are half as many labels as there are experiment runs (due to the distance parameter $D$ not affecting St), which are not necessarily distributed evenly. Confusion between cases with similar Strouhal numbers but dissimilar underlying parameters indicates confusion due to similar wake structures, and the ability of different classifiers to discern the differences between similar wakes can be observed.

Figure 7. Confusion matrices for the forcing amplitude of the upstream hydrofoil in degrees, divided into three labels. Kinematic data is derived from (a) the head and (b) the tail of the pinned assembly, (c) both the head and the tail, (d) the fixed assembly, (e) the uncoupled head, and (f) the uncoupled tail. The higher amplitudes appear more difficult to classify than the lower amplitude, with most mistakes involving mistaking the high amplitude as medium, and conversely the medium amplitude as high.
The confusion matrices for the Strouhal number classification are provided in condensed form in figure 9, while the expanded images with exact probabilities are given in the appendix as figures 12-17. A more compact representation of the classification is the accuracy, defined as the diagonal elements of a confusion matrix. This accuracy of the Strouhal number classification for all six cases, shown in figure 10, is lower than the accuracy of classifying any of the other three parameters, as in the best case identifying the specific experiment run correctly is equivalent to simultaneously identifying all of the other parameters correctly. Additionally, because there are many labels, experiments with very similar Strouhal numbers must be differentiated, which increases misattribution error.
The variation in overall St classification accuracy between the sets of kinematic data follows a familiar pattern: the classification based on the fixed assembly kinematics is substantially less accurate than that based on either the coupled head or coupled tail kinematics, again indicating that the existence of the tail passively improves the encoding of wake data into the head kinematics on average. Additionally, the coupled head kinematics enable higher accuracy than either the uncoupled head or tail, indicating the positive effect of the coupling.
The larger bodies (uncoupled head and rigid foil) appear to be the least accurate at this task, but the coupling appears to raise the classification accuracy of the coupled head above that of either uncoupled body.

The results shown in the confusion matrices use the best neural network, selected from the 30 networks trained from random initial weights on each of the kinematic datasets. To represent and rank the results of such a large number of networks more compactly, we choose a single number: the average accuracy of each confusion matrix generated by each neural network. The average accuracy is simply the mean of the diagonal elements of a confusion matrix, and the mean of the average accuracies of the T, A, and u∞ confusion matrices is used to select the representative 'best' overall network, which was used to generate the results in figures 6-8. The average accuracy of classification of time period, amplitude, free stream velocity and Strouhal number by each of the networks is shown as a scatter plot in figure 11.

With this measure, the classification results in figure 11 show that every network yields an accuracy of at least 79% for any parameter on any of the downstream bodies. The accuracy does vary substantially depending on the parameter measured: all of the kinematics can be used to classify frequency with accuracy greater than 98% and flow velocity with accuracy greater than 91%, but the lower bound for the accuracy in classifying forcing amplitude is 79% and for Strouhal number it is 77%. The high accuracy of the frequency estimation was expected because of the efficient transmission of frequency information by the wake between the forcing hydrofoil and the forced hydrofoil: the forcing hydrofoil sets the dominant frequency of the wake, which in turn sets the dominant frequency of the downstream hydrofoil. This dominant frequency can be seen clearly in figure 3.
By comparison, the encoding of A and u∞ in the wake manifests itself as a change in wake structure and intensity, which has a far more complex and nonlinear effect on the downstream hydrofoil, making these parameters more difficult to classify from the angular motion of the trailing hydrofoil alone.
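The contrast between frequency and the other parameters can be illustrated with a simple spectral estimate. The sketch below is not the convolutional classifier used in this work; it only shows, on a synthetic signal, that a dominant frequency can be read off an angular velocity time series with a plain discrete Fourier transform. The sampling rate, forcing frequency and harmonic content are all assumed values chosen for illustration.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest non-zero-frequency peak
    in the power spectrum, computed with a direct DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate / n


# Synthetic angular velocity: a fundamental at the (assumed) forcing
# frequency plus a weaker third harmonic standing in for nonlinear effects.
sample_rate = 50.0        # Hz, assumed
forcing_frequency = 2.0   # Hz, assumed
n_samples = 200
omega = [
    math.sin(2 * math.pi * forcing_frequency * i / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 3 * forcing_frequency * i / sample_rate + 0.5)
    for i in range(n_samples)
]

estimated = dominant_frequency(omega, sample_rate)
print(estimated)
```

No comparably direct estimator exists for A or u∞, whose influence is spread across the amplitude and shape of the whole response, which is consistent with their lower classification accuracy.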
The significant qualitative result that emerges from the average accuracy of multiple networks is that the angular velocity data of just the head of the two-segment hydrofoil encodes more information about the ambient wakes than the angular velocity of a single-segment hydrofoil, or of either segment of the two-segment hydrofoil when tested individually and uncoupled. Using two time series, those of both the coupled head and the coupled tail, further improves the classification accuracy. We further performed statistical hypothesis testing on the classification shown in figure 11 to assess how likely it is that the improved classification was merely a chance outcome. We took the null hypothesis, H0, to be that 'the angular velocity of only a head segment of the pinned assembly does not result in higher average classification accuracy of the wake Strouhal number than does the angular velocity of the fixed assembly'. We used the p-value to accept or reject the null hypothesis, with the significance level α set at the commonly used value of 5%. Using the χ² test, the p-value was found to be p ≈ 0, which is below the chosen α level, and therefore the null hypothesis is rejected. The very low value p ≈ 0 arises because the St classification accuracies of the best network for each dataset are substantially different, with average accuracies of 0.94 and 0.88 for the head data and fixed assembly data, respectively. Combined with the very large number of test data sets (48 000) used to evaluate these accuracies, the probability that the underlying networks are not significantly different becomes negligibly small.

Figure 11. The accuracy of each of the 30 trained networks for the coupled and uncoupled head and tail, combined data for the pinned body, and fixed assembly classification tasks on the testing dataset, shown for forcing period, amplitude, flow velocity, and Strouhal number from left to right. Accuracy is defined here as the fraction of time-series vectors for which the highest probability corresponds to the known correct value. The large difference in outcomes for networks trained with the same procedure but with different random weights and stochastic gradient descent choices indicates the non-convexity of this problem, with classifiers in some local minima having roughly double the loss of the best classifier found, even when trained and evaluated on the same sets of kinematic data.
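To give a sense of why the p-value is so small, the following sketch applies Pearson's χ² test to a 2×2 table of correct and incorrect classifications built from the quoted accuracies. It rests on several assumptions that are not stated in the text: that 48 000 test samples were used for each of the two classifiers, that the test is the uncorrected Pearson test on a 2×2 contingency table, and that the test is two-sided. For one degree of freedom the p-value is erfc(sqrt(χ²/2)), which is evaluated here with the standard library.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (no continuity correction) comparing two
    classifiers through a 2x2 table of correct/incorrect counts."""
    a, b = correct_a, total_a - correct_a
    c, d = correct_b, total_b - correct_b
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


n_test = 48000                        # assumed to apply to each classifier
head_correct = round(0.94 * n_test)   # coupled head, average accuracy 0.94
fixed_correct = round(0.88 * n_test)  # fixed assembly, average accuracy 0.88

statistic, p_value = chi_square_2x2(head_correct, n_test, fixed_correct, n_test)
print(statistic, p_value)
```

Under these assumptions the test statistic is of the order of a thousand, so the p-value is far below any conventional significance level, in line with the reported p ≈ 0.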

Discussion and conclusion
The results in this paper show that the hydrofoil with a passive tail acts as a superior reservoir that generates a kinematic response encoding more useful information about the ambient wake than an equivalent hydrofoil without the additional degree of freedom. This is not just because the additional degree of freedom provides more kinematic information: the mere presence of a tail modulates the response of the head in such a manner that the head's own kinematics encode more information about the wake. The classification results are based on training 30 neural networks, each of which was randomly initialized. The qualitative result, that the kinematics of the body with two segments can better classify wakes, is therefore independent of any one particular network. This result is significant for understanding the role of passive tail-like or fin-like segments on a fish-like robot: the resulting kinematics of these passive segments can provide useful information about the ambient flow to the robot and enhance real-time multi-modal sensing by underwater robots.
The results in this paper, as well as those in [44], are valid under the assumption that the sensing body is directly behind the forced one in the flow. Some lateral displacement (offset) of the trailing body from the center line of the reverse Kármán vortex wake created by the leading body occurs naturally in the experiments, due to the short tether with which the trailing foil is connected to the water tunnel. However, the more general setting where the trailing body has significant lateral displacement from the center line of the vortex wake can make the classification more challenging, and may first require an estimation of the lateral offset distance, such as in [51]. While it is likely that passive degrees of freedom will still confer a sensing advantage in that setting, we leave further investigation to future work.
Future work on the sensing and classification of wakes and the identification of flows in water could combine model-based and data-driven methods, for example through operator methods such as in [51], or through physics-informed machine learning that makes use of concepts like Local Interpretable Model-Agnostic Explanations [52], Layer-wise Relevance Propagation or Taylor Decomposition [53]. With such tools, the parts of the signal (and their combinations) that led to a classification, as well as the evolution of the weights and layers in the network, can be identified. Uncertainties (epistemic and aleatory) due to real flow conditions can be better handled by Bayesian neural networks, and the work in this paper can be a starting point towards such Bayesian classification under uncertainty.
Besides the aspects of machine learning, the problem of proprioceptive wake identification and similar sensing problems can lead to questions and a verifiable pathway for experimental investigation of the role of the shape, placement and stiffness of tails or fins of a fish. Such structural or morphological aspects have usually been investigated from the perspective of swimming efficiency, speed and agility and less so from their role in sensing flow structures.

Data availability statement
The data that support the findings of this study will be openly available following an embargo at the following URL/DOI: http://people.clemson.edu/ ptallap. Data will be available from 30 October 2023.

Figure 14. Confusion matrix for the classification of the Strouhal number using combined coupled head and coupled tail kinematics. The additional data allows better classification accuracy than using either set of kinematic data alone.