Performance of neural networks for localizing moving objects with an artificial lateral line

Luuk H Boulogne; Ben J Wolf; Marco A Wiering; Sietse M van Netten

doi:10.1088/1748-3190/aa7fcb

1. Introduction

Along the sides of their body, fish have a mechanoreceptive lateral line organ that enables them to detect nearby moving underwater objects producing local water flow (Dijkgraaf 1963, Coombs et al 1988, pp 554–5).

This sensory lateral line organ consists of an array of individual detectors, called neuromasts, that are sensitive to water flow velocity. Each neuromast contains hair cells that detect the movement of the water at the location of that neuromast (Flock and Wersäll 1962). The lateral line organ is used for a variety of different tasks. It allows fish, also when no light is available, to detect e.g. prey (Hoekstra and Janssen 1985) and predators and facilitates schooling (Partridge and Pitcher 1980). The related sensory modality is sometimes described as in between touch and hearing and is sensitive to the near field component of pressure gradients (Kalmijn 1988, van Netten 2006).

The present work is inspired by the lateral line organ and is intended to be used in the signal processing and interpretation of excitation profiles measured along artificial arrays of individual flow velocity sensors to efficiently localize moving objects.

In previous research, dipole fluid flow models are used which predict excitation patterns along a stationary array, given a source that is vibrating in a direction with a specific angle with respect to the array (Ćurčić-Blake and van Netten 2006, Goulet et al 2008, Dagamseh et al 2010, Abdulsadda and Tan 2013b) or moving in a specific direction (Franosch et al 2005), all under conditions of potential flow.

Using these models, excitation patterns for different locations and directions of a moving spherical source can be generated. Several neural network architectures are considered in the present work for their ability to accurately decode the location x, y of a moving object from the excitation patterns along a stationary artificial sensor array.

For both simulated and physical artificial lateral line arrays, several algorithms have been put forth to decode a dipole-like source location from excitation patterns. A data-matching approach where a measured excitation pattern is compared to a large set of templates was used by Pandya et al (2006). In later studies (Yang et al 2006) these excitation patterns were matched to a gaussian which is a crude approximation of its wavelet nature (Ćurčić-Blake and van Netten 2006). This matched (gaussian) filter approach was later shown to be outperformed by Capon's beamforming algorthm (Yang et al 2010).

In Abdulsadda and Tan (2012), a relatively small multilayer perceptron (MLP) neural network with maximally 24 hidden nodes and 6 input nodes was found able to decode a dipole source location. With the length of the array as BL (body length) and in a $2BL\times BL$ area, their reconstructed location has an average Euclidian error of 1.5% BL on a sparse ( $<$ 100 samples) data set. From the supplied typical sensor signals (Abdulsadda and Tan 2012, p 234), we estimate a signal to noise ratio of about 30 dB. In later research, the authors remark that ‘due to the black-box nature, that approach requires a lot of training data unless the dipole vibration amplitude and orientation are known’ (Abdulsadda and Tan 2013a), but concerns a limited parameter set and optimization scheme.

The present work focuses on processing lateral line excitation patterns using neural networks. By exploring and optimizing several neural network architectures for a generic stationary velocity sensor array, the results may therefore be considered to be independent from particular stationary sensor array characteristics.

An attractive property of neural networks when used in combination with operational velocity sensing arrays, is that the neural network also can take into account and correct for variations in the individual physical sensors characteristics and noise. Furthermore, these biomimetic signal processing methods also allows for rectified parallel processing as observed in fish (Chagnaud and Coombs 2013).

The network types used in this research are multilayer perceptrons (MLP) trained with the back-propagation algorithm (Rumelhart et al 1985), extreme learning machines (ELM) (Huang et al 2006) and echo state networks (ESN) (Jaeger 2002).

We have selected these three different neural network methods, because they are well established and have different advantages and disadvantages, which allows for interesting comparisons. Multilayer Perceptrons have the advantage that the features which the hidden units extract from the inputs are learned by using the back-propagation algorithm. This makes them slower than Extreme Learning Machines, which initialize the input to hidden unit weights to random values which are then not trained further. The Echo State Network was chosen, because it has the property that it can use previous inputs from the time-series signals and therefore it can use more information. This is at the cost of being governed by more complex dynamics than the other models. With these choices, we will have the opportunity to observe which method performs best and whether partial training or memory affects localization performance.

2. Methods

2.1. Data generation

The MLPs, ELMs and ESNs are trained using a training set and their performance assessed by a test set. With the MLPs, a third set of data (validation set) is also used to avoid overfitting during training. For these three sets, three different trajectories of source object movement are used, see section 2.1.2.

2.1.1. Computing water velocities.

The data sets used in this research resemble the information perceived by the artificial lateral line organ and consist of simulated hydrodynamic data. For the construction of the data sets the fluid velocities caused by a source object (sphere) moving in a 2D plane in a 3D volume through water are calculated. Figure 1 shows this scheme.

**Figure 1.** Top view of geometry. The spherical source moves with a fixed velocity in the x,y-plane in a direction that has a variable angle ϕ with the direction of the stationary artificial lateral line array. The position of a sensor on the simulated array is denoted by s.
Download figure:
Standard image High-resolution image

The sensor array is located along the trajectory running from coordinate ( $-D$ , 0) to (D, 0). The sensors are equally spaced along this line. The first sensor is located at coordinate ( $-D$ , 0) and the last sensor at (D, 0).

An experimental justification for using potential flow in a similar set up as assumed in the present study has been obtained in studies on fish. Both the shape of the wavelets and specifically the distance coding in its spatial characteristics predicted by potential flow have been observed in the biological lateral line responses of fish (Ćurčić-Blake and van Netten 2006). It was shown that the boundary layer of the stationary fish was not affecting the flow field to a high extent, which can also be expected using artificial lateral line sensors. This observation is supported by a theoretical study by Goulet et al (2008) and by a computational fluid dynamics study (Rapo et al 2009).

In Franosch et al (2005) a similar method to find the fluid velocity distribution for a moving source object (sphere) with a constant speed is considered. This method too was used for the case of inviscous potential flow. The velocity field distribution described in that work can be shown to be equal with the distribution in equation (2.1). This implies that the present work can equally well be applied to a vibrating source at different locations as well as moving sources.

The fluid velocity component parallel to the array at position s on the sensor array, $v(s)$ , is given by

$\begin{eqnarray}&&v(s)=C\left({{\psi}_{o}}\sin \phi -{{\psi}_{e}}\cos \phi \right),\end{eqnarray} \tag{ 2.1 }$

with

$\begin{eqnarray}&&C=\frac{W{{a}^{3}}}{2{{y}^{3}}},\end{eqnarray} \tag{ 2.2 }$

where W is the velocity of the source object and a is the radius of the sphere. The angle of the source with respect to the sensor array in radians (see figure 1) is ϕ, and the even wavelet ${{\psi}_{e}}$ and the odd wavelet ${{\psi}_{o}}$ are described respectively by

$\begin{eqnarray}&&{{\psi}_{e}}\left(s,x,y\right)=\frac{1-2{{\left(\frac{s-x}{y}\right)}^{2}}}{{{\left[1+{{\left(\frac{s-x}{y}\right)}^{2}}\right]}^{\frac{5}{2}}}},\end{eqnarray} \tag{ 2.3 }$

$\begin{eqnarray}&&{{\psi}_{o}}\left(s,x,y\right)=\frac{-3\left(\frac{s-x}{y}\right)}{{{\left[1+{{\left(\frac{s-x}{y}\right)}^{2}}\right]}^{\frac{5}{2}}}}.\end{eqnarray} \tag{ 2.4 }$

Here (x,y) denotes the instantaneous coordinate of the moving object. The equations (2.3) and (2.4) show that the shape of the even and odd wavelets solely depends on the location of the source object with respect to the sensor location. The shape of these wavelets is shown in figure 2.

**Figure 2.** The even and odd wavelets ${{\psi}_{e}}$ and ${{\psi}_{o}}$ for a moving source located at coordinate (0, 0.2D). The wavelets are defined in equations (2.3) and (2.4) respectively and scale spatially with the distance to the array y.
Download figure:
Standard image High-resolution image

**Figure 2.** The even and odd wavelets ${{\psi}_{e}}$ and ${{\psi}_{o}}$ for a moving source located at coordinate (0, 0.2D). The wavelets are defined in equations (2.3) and (2.4) respectively and scale spatially with the distance to the array y.
Download figure:
Standard image High-resolution image

In Ćurčić-Blake and van Netten (2006, p 1551) it was noted that the spatial variations along the x direction, as described by the even and odd wavelet functions, scale linearly with the distance of the source y. It was also shown that the maximum amplitude of the even wavelet is reached at the point of the lateral line that is closest to the source and that the odd wavelet is zero at this position. This is the location on the lateral line that is equal to the x coordinate of the source object.

2.1.2. Paths of sphere movement in the data set.

Given the implicit memory present in the ESN architecture, the hydrodynamic data for all architectures is presented in trajectories rather than discrete locations so that for predicting the current location, the internal representation of past detections can be used.

The source object starts at time $t=0$ at a random location (x₀,y₀) in the Cartesian system where x₀ is taken from a uniform distribution with range $\left[-D,D\right]$ and y₀ is taken from the uniform distribution with range $\left[0,D\right]$ . The object remains located in a $2D\times D$ area to one side of the lateral line (see figure 1).

For the data sets it is assumed that the source object moves with a constant velocity of 0.1D per time step. The direction ${{\phi}_{t}}$ in radians in which the object moves at time step t is

$\begin{eqnarray}&&{{\phi}_{t+1}}={{\phi}_{t}}+A.\end{eqnarray} \tag{ 2.5 }$

Here, A, in radians, is taken from the uniform distribution with range $\left[-1,1\right]$ . The change in angle per time step is therefore limited to about 60 degrees.

The next location at time $t+1$ is selected by moving the sphere in the direction denoted by angle ${{\phi}_{t}}$ over a distance 0.1D.

When the source object would move outside of the area boundaries, the next movement direction is altered as shown in figure 3. If the object would still cross a boundary when it is moved in the resulting direction, the movement direction is similarly changed again.

**Figure 3.** The source object at time t (gray sphere) would cross the field boundaries if it would be moved in the original movement direction along the dashed movement vector. Therefore, instead it is moved in a different direction along the different solid movement vector. This new (solid) movement vector is obtained by mirroring the original (dashed) movement vector, where the mirror (grey vertical) line is centered on the source object and parallel to the field boundary that is about to be crossed. The solid white sphere shows the location of the sphere at time $t+1$ .
Download figure:
Standard image High-resolution image

2.1.3. Splitting water velocities into rectified inputs.

Each data point in the data sets represents the fluid velocities detected by sensors that are equally spaced along a one-dimensional sensor array. This concatenation of fluid velocities is used as input for the neural networks.

In fish neuromasts, two types of velocity detecting hair cells are present. The first type only detects water flow in one direction, while the second type only detects flow in the opposite direction (Flock and Wersäll 1962). The two sorts of information perceived by the two types might be transmitted separately to the central nervous system (Münz 1989, p 290). If this is so, the fish is able to discern between positive and negative fluid velocities.

As a preprocessing step, this biological parallellarization of information is imitated; each sensor reading is represented using two values. The first value only represents positive velocities. It is zero for negative velocities. For the second value, the same applies, but vice versa. Since lateral lines with 16 and 32 neuromasts are simulated, this results in using 32 and 64 inputs. We used this input doubling because of the observed enhanced performance in source localization.

2.1.4. Adding noise.

To test the noise robustness of the networks, different levels of noise are added to the fluid velocities computed for test sets.

The noise is taken from a normal distribution with a mean of zero. Different noise levels are applied via variation of the standard deviation of the noise. The noise level is defined in this paper in terms of the signal noise ratio (SNR), which is given as:

$\begin{eqnarray}&&\text{SNR}=10{{\log}_{10}}\frac{{{A}^{2}}}{{{\sigma}^{2}}}~\text{dB}\end{eqnarray} \tag{ 2.6 }$

In this equation, A is the maximal magnitude of deviation from zero of an input excitation pattern after scaling (section 2.1.5) has been applied. ${{\sigma}^{2}}$ is the variance of the normal distribution out of which the noise values are taken.

The distance from the source to the artificial lateral line y affects the amplitude of the signal in the excitation pattern, see equation (2.2). To investigate the effects of noise independent of y, noise is added to all sensor signal inputs to obtain the required signal to noise ratio and then scaled, according to equation (2.7).

Adding noise affects the range of the values in an input excitation pattern. In real-life applications in which noise quantities are unknown, all excitation patterns would be scaled to the same value ranges, regardless of the quantity of noise that is present. As a consequence, the contribution and level of the original signal to the normalized excitation pattern differs for each noise level.

2.1.5. Scaling the input.

In order to make the localization process robust to amplitude variance and to let it focus on the spatial characteristics of the excitation pattern, it is first scaled before the input is presented so that the largest magnitude in the excitation pattern becomes 1. For this, all excitation pattern values are changed according to equation (2.7).

$\begin{eqnarray}&&\text{new}\;q[n]=\frac{q[n]}{\max \mid q\mid}\end{eqnarray} \tag{ 2.7 }$

The discrete signal q is the excitation pattern at each individual sensor calculated with equation (2.1) and index n is the sensor number. This scaling causes the values of q to always be within the range [ $-1$ ,1]. This causes information about the y coordinate to be present only in the spatial scaling of the normalized excitation pattern, while information about the x coordinate is present in the location of the pattern along the sensor array.

2.2. Neural network algorithms

2.2.1. Multilayer perceptron.

The MLP used in this paper is a fully connected network (see figure 4). The input layer has a variable size (32 or 64 nodes) and the output layer has two nodes to represent the location x, y of the source. Equation (2.8) shows the way in which the node activation values are computed.

$\begin{eqnarray}&&x_{i}^{m+1}=f\left(\underset{j=1}{\overset{{{N}^{m}}}{\sum}}\,\left(w_{ij}^{m}x_{j}^{m}\right)+b_{i}^{m+1}\right)\end{eqnarray} \tag{ 2.8 }$

**Figure 4.** Visual representation of the feed forward networks used in this research (MLP and ELM). Weights are represented by arrows. The activation functions are indicated in the nodes. The grey nodes represent bias nodes with a fixed value of 1.
Download figure:
Standard image High-resolution image

Here, $x_{i}^{m}$ denotes the activation of the ith node of the mth layer, where counting starts at the input layer. $w_{ij}^{m}$ denotes the weight connecting the jth node in the mth layer to the ith node in layer $m+1$ . $b_{i}^{m}$ denotes the bias of the ith node in the mth layer. N^m is the number of nodes in the mth layer. The activation function $f(x)=tanh(x)$ is used to calculate the activation of the hidden nodes. For the output nodes, the identity function is used as activation function (see figure 4). During training, when computing the activation value of a node in the hidden layer, some noise sampled from a uniform distrubution with range $\left[-{{10}^{-3}},{{10}^{-3}}\right]$ is added, which amounts to 46 dB.

The network was trained with the incremental learning version of the backpropagation algorithm (Rumelhart et al 1985). The weights are initialized according to the normalized initialization procedure described in Goulet and Bengio (2010). The bias weights are initialized to zero.

When overfitting occurs, a network is trained to specifics of a training set instead of general features. This causes the network to perform better on the specific training set, but worse in general cases.

To determine whether overfitting occurs, during training, the Mean Squared Error (MSE) on the training and a validation set was monitored. This is the usual error that is minimized during training neural networks. The MSE for a network on a specific data set is

$\begin{eqnarray}&&\text{MSE}=\frac{1}{{{D}^{2}}M}\underset{n=1}{\overset{M}{\sum}}\,\frac{1}{N}\underset{i=1}{\overset{N}{\sum}}\,{{\left({{t}_{i}}(n)-{{o}_{i}}(n)\right)}^{2}}.\end{eqnarray} \tag{ 2.9 }$

In this equation D is half the length of the array, M is the number of samples in the data set, $N=2$ is the dimensionality of an output sample, t is the target output and o is the network output.

An additional measure for reporting is the Mean Euclidian Distance (MED):

$\begin{eqnarray}&&\text{MED}=\frac{1}{DM}\underset{n=1}{\overset{M}{\sum}}\,\sqrt{\underset{i=1}{\overset{N}{\sum}}\,{{\left({{t}_{i}}(n)-{{o}_{i}}(n)\right)}^{2}}}.\end{eqnarray} \tag{ 2.10 }$

The MED provides a more intuitive relative distance measure of error as it is defined as a fraction of the depth of field D.

2.2.2. Extreme learning machine.

Like an MLP, an ELM is a feedforward neural network (see figure 4). It differs from the MLP in that only the weights from the hidden to output layer are trained. Weights from the input to the hidden layer are initialized randomly and not altered during training (Huang et al 2006).

An ELM with a perfect performance on the training set would have a weight matrix W, that describes the weights from the hidden layer towards the output layer, that satisfies:

$\begin{eqnarray}&&WH=T.\end{eqnarray} \tag{ 2.11 }$

Here, the teacher output T is the matrix that is built up from a consecutive series of columns, in which each column consists of the correct output for all output nodes to the training pattern. Hidden layer output matrix H is built up from the activation values of the units in the hidden layer at all the time instances during the training that are the result of presenting the training data to the ELM.

Training of the ELM consists of finding a least squares solution for W from this linear system, given H and T. This is computed using:

$\begin{eqnarray}&&W=T{{H}^{\dagger}},\end{eqnarray} \tag{ 2.12 }$

where ${{H}^{\dagger}}$ is the MoorePenrose generalized inverse of matrix H.

Because determining the matrix W is very fast, ELMs can be efficiently trained in little time compared to MLPs (Huang et al 2006).

2.2.3. Echo state networks.

While experimentally determining parameter settings, ELMs with internal recurrent connections were also tested. The resulting ESN (Jaeger 2002) architecture effectively introduces memory into the neural network, which could help in predicting the current position of a source based on past detections.

Figure 5 shows the schematic representation of an ESN that consists of an input layer, a dynamic resevoir (DR) and an output layer. Each input node connects to each node in the DR and each node in the DR connects to the output layer through weights. The hidden nodes in the DR are sparsely connected with each other and themselves. This causes the DR to contain a high dimensional representation of the input.

The weight matrix representing the internal weights of the DR must be constructed so that the spectral radius, which is the largest absolute eigenvalue of the weight matrix, is smaller than 1. As a result input is echoed and dies out over time, which is called the Echo State Property. This gives the ESN short term memory.

Due to the short term memory, the output of the first few steps is often inaccurate. To wash out the effects of the initial network state, the network output of the first 50 samples is discarded during training and testing.

3. Results

To optimize the meta-parameters of the neural network models, we performed many trials with different values and selected the best parameters for the final experiments. After determining the optimal meta-parameter settings of the networks, new data sets from new trajectories were generated to train and test the networks that produced the results in section 3. The training and test sets used each contain 15000 samples. A large number of samples was chosen to make sure that enough source locations were well represented. The validation set contains 10000 samples.

Table 1 indicates the time required for training the neural network; the time it takes for the test set to be parsed and finally the calculation time for a single excitation pattern on a 2.5 GHz Intel i5 core processor.

Table 1. Computation times for network architectures on a 2.5 GHz Intel i5 core processor. The train and test sets both contain 15 000 samples (excitation profiles). The average computation time per sample for a trained network is listed as t_update.

Network	t_train	t_test (s)	t_update (ms)
$\text{ML}{{\text{P}}_{16}}$	9 h	4.6	0.3
$\text{ML}{{\text{P}}_{32}}$	9 h	4.7	0.3
$\text{EL}{{\text{M}}_{16}}$	100 s	2	0.1
$\text{EL}{{\text{M}}_{32}}$	240 s	3	0.1
$\text{ES}{{\text{N}}_{16}}$	160 s	48	3.2
$\text{ES}{{\text{N}}_{32}}$	382 s	100	6.6

3.1. Parameter settings

Below, we first show the performance of networks with the best parameter settings found in the parameter study for high signal to noise data (46dB).

3.1.1. MLPs.

For optimizing the MLPs, the varied parameters were the learning rate, number of hidden layers and layer sizes. All MLPs in the initial parameter study were trained for 300 epochs. The best performing networks for both 16 and 32 sensors were then trained with a learning rate of 0.01. They had two hidden layers of 80 and 40 nodes respectively. For these optimal parameter settings, new networks were trained for 3000 epochs.

3.1.2. ELMs and ESNs.

The best performing and thus chosen hidden layer sizes were 5000 and 7000 nodes for ELMs with 16 and 32 sensors respectively. The performance of the ESN architecture increased when the recurrent connection weights were chosen smaller. Since the performance was best in a neural network architectures without recurrent connections, the best performing ESNs are effectively ELMs, and only ELMs are discussed for high signal to noise input.

In order to ascertain and compare the noise robustness of the ESN architecture, ESNs with DR sizes of 5000 for 16 inputs and 7000 nodes for 32 inputs were also tested with a spectral radius of $0.1$ .

3.2. Overall performance on high signal to noise input

To investigate the differences in performance between the different network architectures and different array sizes, 5 ELMs and 5 MLPs for both 16 and 32 input sizes, indicated with a subscript, with the best performing parameter settings were trained and tested with new data sets.

The random initialization of the weights of the networks resulted in different networks per combination of network type and input size. Using multiple networks per combination of input size and network type, maps this variety to the output error per individually generated and tested network.

The boxplots in figure 6 show the MSE distribution per network type on respectively the x and y coordinates as well as the average MSE. The network instances with the lowest average error are listed in table 2. For comparison, when forcing a spectral radius of 0.1 on an ESN, the MED localization error compared to the ELM approximately doubled to $0.71 \% D$ .

Table 2. Best performing localization results for the MLP and ELM and their respective MEDs.

Network	MSE	MED
$\text{ML}{{\text{P}}_{32}}$	$5.23\ast {{10}^{-5}}$	$0.68 \% D$
$\text{EL}{{\text{M}}_{32}}$	$3.66\ast {{10}^{-5}}$	$0.41 \% D$

Figure 7 gives an indication of source localization performance of the best performing ELM and MLP under high signal to noise conditions.

**Figure 7.** Actual source location and absolute Euclidian error (as a fraction of D) on a part of the test set of individual networks of the $\text{EL}{{\text{M}}_{32}}$ and $\text{ML}{{\text{P}}_{32}}$ sets for x and y coordinate. To enable a clear visual comparison of the neural network errors, the absolute errors of the MLP and ELM are shown in mirrored y-axes. The peak in error around sample 1700 coincides with the source being located in a corner. The MLP is less affected by this corner effect than its ELM counterpart. The ELM error is however generally lower.
Download figure:
Standard image High-resolution image

In figure 8, the effect of the source location on the MED performance is separately shown for both the x and y coordinate.

3.3. MLP overfitting

For all $\text{ML}{{\text{P}}_{16}}$ and $\text{ML}{{\text{P}}_{32}}$ networks, the MSEs on the training set were compared with the MSEs on the validation set for every epoch during training. An example of such a comparison can be seen in figure 9. For no MLP trained and tested with the new data set, the MSE of the validation set increased continuously, or to a stable value at any point in training. Therefore, we conclude that overfitting does not occur in these networks.

**Figure 9.** An example of the MSE on the validation set and on the training set for the best performing $\text{ML}{{\text{P}}_{32}}$ .
Download figure:
Standard image High-resolution image

3.4. Noise robustness

The MSE of all three network architectures was found for different SNRs. The SNRs were chosen to lie in the interval ( $-6$ dB, 46 dB). The results are plotted in figure 10(a).

Figures 10(b) and (c) show the noise robustness separately for respectively the x and y coordinates of the source object.

For comparison, figure 10(d) shows the noise robustness of ELMs and ESNs with a smaller hidden layer size than optimal (see section 4.2).

4. Discussion

4.1. Practical implementation neural networks

From the results shown in figure 7 it can be concluded that neural networks are able to reliably extract information about the location of a moving source in a field with a depth D equal to half the sensor array length with an accuracy of the order of percents of D, given the simulated dipole fields.

On average, network architectures with 32 sensors outperformed those with 16 sensors, as seen in figure 6. The limited variance of performance within the network types suggests that the networks don't suffer from local minima.

Depending on the computing power and response requirements, trained networks may be used under real time conditions. On our system, in combination with either an ELM, MLP, or ESN we have typical response times of 0.2, 0.3, or 6 ms respectively for updating a source location.

The consequence of having a finite update time is that estimates on source location can only be generated with a delay t_update, which results in the source having moved a distance $\Delta d$ , equal to $\Delta d=V\ast {{t}_{\text{update}}}$ , with respect to the position it was during the array velocity measurement. It is useful to compare this distance $\Delta d$ , with the spatial accuracy of the sources location. If we accept a similar inaccuracy of distance $\Delta d$ , as the inaccuracy produced by the neural network, given by the MED (i.e. $\Delta d=$ MED $\ast D$ ) we arrive at an upper bound of velocities ( ${{V}_{\max}}$ ) which may reliably be detected:

$\begin{eqnarray}&&{{V}_{\text{max}}}=\frac{\text{MED}\ast D}{{{t}_{\text{update}}}}.\end{eqnarray} \tag{ 4.1 }$

Clearly this velocity is proportional to array length (2D) and inversely proportional to the update time.

For instance, with a MED of 0.05 (at least feasible with an ELM and a SNR velocity input of 22 dB and considering ${{t}_{\text{update}}}=0.2$ ms with our computer platform) we arrive at an upper bound for the velocity equal to ${{V}_{\max}}=250D/s$ . This example indicates that a neural network under realistic conditions will allow the processing of source velocities two orders of magnitude higher than the array length per second, which entails that most likely other factors than the update speed will be limiting in the detection of a source's position.

4.2. Overall performance on x and y coordinates

Given that x and y information of a source is encoded differently in the excitation pattern (Ćurčić-Blake and van Netten 2006), performance differences in the detection of the x and y coordinate can be expected. In all cases, in noiseless and noisy conditions (see figure 6), detection of the source distance y is more reliably determined than its lateral position x.

The MED error is dependent on the x and y coordinate of the sphere, as can be seen in figures 8(a)–(d). The error is larger at the left and right edges of the $2D\times D$ field and relatively high when the source is close to the array. These two effects can be explained by the following observations.

The shape of the wavelets as shown in figure 2, and thus wavelet ψ, is only partially detected when the source moves close to either side of the field, because the center of ψ is equal to the x coordinate of the sphere. The partially observed wavelets lead to less input information and thus leads to a decrease in performance. Additionally, given the way of sampling training data using trajectories through the sampling space, locations at the borders are slightly underrepresented in train, test and validation data sets.

As can be seen in figures 8(a)–(d), the MED increases significantly when the sphere moves close to the sensor array. This result might be partially explained by the distance between the sensors. In (Ćurčić-Blake and van Netten (2006), p 11) it is shown that a source should not be closer than approximately twice the inter-sensor distance to be detected, since the sampling of spatial details in the excitation profile becomes inadequate. For a lateral line with length $2D$ , this minimum distance is $\frac{2D}{16}\ast 2=0.25D$ for an array of 16 sensors and $\frac{2D}{32}\ast 2=0.125D$ for an array of 32 sensors. Figures 8(a)–(d) do indeed show decreased accuracy when the y coordinate becomes smaller than these values.

4.3. Noise robustness

The MLPs show better noise robustness than the ELMs and ESNs. This is at least partially due to the fact that there are less weights present in MLPs. Figure 10(d) shows that a smaller hidden layer size results in better noise robustness for ELMs and ESNs comparable to MLPs, at the cost of lost precision for high signal to noise data. Furthermore, networks with 32 sensors are slightly more robust to noise than networks with 16 sensors.

5. Conclusion

The current work is one of the very first attempts to investigate the optimal parameter settings for multiple neural network architectures for hydrodynamic imaging using a stationary artificial lateral line. A smaller version of an MLP was previously tested for a sensor array consisting of 6 sensors (Abdulsadda and Tan 2012). Under similar noise levels, but a smaller area, our networks yields a comparable result with an average MED of 2% of D.

5.1. Practical implementation neural networks

The practical implementation of the neural networks investigated here, when processing flow data measured with an array of velocity sensitive flow detectors, may follow various scenarios. Depending on the particular choice of flow detection of the individual sensors and specifically their signal-to-noise performance using the present results, the optimally performing network may be selected.

To utilize the predictive nature of previously perceived excitation patterns, we explored the use of ESNs containing recurrent connections which outperformed otherwise identical ELMs in noisy conditions. This indicates that memory might help in predicting source locations in simple architectures. Nevertheless, the ESN is outperformed by the MLP, which has no form of (short term) memory. Other types of neural network architectures with recurrent connections may also be helpful in this respect. The network input could for example be extended with excitation patterns of previous time steps or the optional recurrent connections from the output layer to the input layer could be added.

All considered neural network architectures, once trained on simulated data, may respond very fast to velocity input signals of the array and perform localization in real time on standard hardware. In case of the feed forward architectures this is at a rate of several thousand Hz and in case of the ESN several hundred Hz.

5.2. Further research on single source localization

A specific task that was not trained in the networks investigated so far, is to also estimate the angle ϕ at each time step. The information on ϕ is present in the excitation pattern and provides instantaneous input for estimating the next position.

Extending the input layer with nodes of which the activation is determined by something else than the measured water velocities might also improve the network performance. These indicator nodes can hold information about e.g. the width of the wavelet or the location of minima and maxima in the excitation pattern.

Therefore, other neural networks than considered here also may be considered as alternatives for source localization.

In this research, the network output only gives the 2D location of a source, because it only uses water velocities in the direction parallel to the simulated lateral line. If the excitation pattern along an additional orthogonal lateral line is added to the input, the output can be extended to the location of a moving source in a 3D volume. This orthogonality of neuromast arrays can also be found on the heads and sometimes along the sides of fish (Coombs et al 1988). On the heads, canals with neuromasts and arrays of superficial neuromasts with different angles to each other are present. Also, fish approach behaviour (Coombs and Conley 1997) suggests that fish tend to zig–zag towards a source to sense orthogonal projections of flow.

This work cannot readily be used for detecting multiple sources, although it can be altered in at least two ways that might make multiple source localization possible. Firstly, a network for every plausible amount of moving sources could be trained. A problem of this approach is that an accurate criterion for which network output should be used is needed. Secondly, instead of producing the coordinates of the source, the networks could be trained to output a 2D grid in which only nodes that represent locations where a moving source is present have large activation values.

A common additional limitation for source localization algorithms, inluding these neural networks, is that they assume a stationary artificial lateral line for monitoring sources, rather than a lateral line mounted on a moving body. This latter case requires a different approach since the viscous boundary layer plays an important role (Windsor and McHenry 2009) and may affect the locally perceived fluid flow (DeVries et al 2015).

We expect that the use of neural networks will be greatly extended, especially in parallel with the development of alternative and more extended biomimetic velocity sensitive arrays than those reported so far.

Acknowledgments

This research has been partly supported by the Lakhsmi project (BJW, SMvN) that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 635568.

Performance of neural networks for localizing moving objects with an artificial lateral line

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. Data generation

2.1.1. Computing water velocities.

2.1.2. Paths of sphere movement in the data set.

2.1.3. Splitting water velocities into rectified inputs.

2.1.4. Adding noise.

2.1.5. Scaling the input.

2.2. Neural network algorithms

2.2.1. Multilayer perceptron.

2.2.2. Extreme learning machine.

2.2.3. Echo state networks.

3. Results

3.1. Parameter settings

3.1.1. MLPs.

3.1.2. ELMs and ESNs.

3.2. Overall performance on high signal to noise input

3.3. MLP overfitting

3.4. Noise robustness

4. Discussion

4.1. Practical implementation neural networks

4.2. Overall performance on x and y coordinates

4.3. Noise robustness

5. Conclusion

5.1. Practical implementation neural networks

5.2. Further research on single source localization

Acknowledgments

Performance of neural networks for localizing moving objects with an artificial lateral line

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. Data generation

2.1.1. Computing water velocities.

2.1.2. Paths of sphere movement in the data set.

2.1.3. Splitting water velocities into rectified inputs.

2.1.4. Adding noise.

2.1.5. Scaling the input.

2.2. Neural network algorithms

2.2.1. Multilayer perceptron.

2.2.2. Extreme learning machine.

2.2.3. Echo state networks.

3. Results

3.1. Parameter settings

3.1.1. MLPs.

3.1.2. ELMs and ESNs.

3.2. Overall performance on high signal to noise input

3.3. MLP overfitting

3.4. Noise robustness

4. Discussion

4.1. Practical implementation neural networks

4.2. Overall performance on x and y coordinates

4.3. Noise robustness

5. Conclusion

5.1. Practical implementation neural networks

5.2. Further research on single source localization

Acknowledgments