Classifying continuous, real-time e-nose sensor data using a bio-inspired spiking network modelled on the insect olfactory system

Published 18 February 2016 © 2016 IOP Publishing Ltd
Citation: A Diamond et al 2016 Bioinspir. Biomim. 11 026002. DOI 10.1088/1748-3190/11/2/026002

Abstract

In many application domains, conventional e-noses are frequently outperformed in both speed and accuracy by their biological counterparts. Exploring potential bio-inspired improvements, we note a number of neuronal network models have demonstrated some success in classifying static datasets by abstracting the insect olfactory system. However, these designs remain largely unproven in practical settings, where sensor data is real-time, continuous, potentially noisy, lacks a precise onset signal and accurate classification requires the inclusion of temporal aspects into the feature set. This investigation therefore seeks to inform and develop the potential and suitability of biomimetic classifiers for use with typical real-world sensor data. Taking a generic classifier design inspired by the inhibition and competition in the insect antennal lobe, we apply it to identifying 20 individual chemical odours from the timeseries of responses of metal oxide sensors. We show that four out of twelve available sensors and the first 30 s (10%) of the sensors' continuous response are sufficient to deliver 92% accurate classification without access to an odour onset signal. In contrast to previous approaches, once training is complete, sensor signals can be fed continuously into the classifier without requiring discretization. We conclude that for continuous data there may be a conceptual advantage in using spiking networks, in particular where time is an essential component of computation. Classification was achieved in real time using a GPU-accelerated spiking neural network simulator developed in our group.

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

1.1. Background

Chemosensing electronic nose ('e-nose') technology has great potential for applications in everyday life, ranging from drug detection to food quality assessment [1] and even diagnosis of illness [2, 3]. A typical e-nose device might employ doped tin oxide sensors that generate time series of resistance values that change selectively in the presence of certain chemicals. Typically, pre-processing and encoding methods are applied to convert the collected time series into discrete numerical representations of each odour that are amenable to separation by classical classifier algorithms such as k-means clustering or support vector machines (SVMs) [4]. However, e-nose devices have had limited success in some applications because odour detection and classification comprise a highly challenging domain characterised by high dimensionality, unknown organisation of the vast 'odour space' of all volatile chemicals and complex dynamics of odour plumes [5]. To compound these difficulties, current sensor technology continues to exhibit distinct shortcomings in speed, sensitivity, selectivity, recovery, and drift avoidance [3]. As a result, e-nose devices remain substantially outperformed in both speed and accuracy by biological noses, both mammalian and insect. This is particularly true in realistic field settings outside the controlled environment of the laboratory. Whilst development of novel bio-based sensors has begun to show promise [6–8], we focus here on another challenge, the classification of continuous incoming olfactory data, leveraging the intrinsic temporal nature of spiking network approaches. In the work presented here, we use a benchmark data set of 20 chemicals measured by 12 metal oxide e-nose sensors [4] to investigate an odour classifier employing a spiking network design abstracted from the insect olfactory system.

1.2. The insect olfactory system

Olfactory regions in the insect CNS (figure 1(a)) follow a common organisation, comprising three distinct stages of sensing, transformation and association (classifying) [9]. At the sensing stage, neural activity (spiking) is triggered in olfactory receptor neurons (ORNs) located along the antennae. Each ORN expresses one type of olfactory receptor (OR) protein which binds a different and specific range of ligands that may appear in a variety of airborne chemicals [5, 9, 10]. The net effect is that each odour will trigger a fast, widespread spiking response across the set of ORNs, with varying intensity in each ORN type, depending on the odour's chemical constituents. It is likely that this distribution of response profiles has been tuned through evolution to best aid discrimination between odours.

Figure 1. (a) Schematic of the insect antennal lobe (AL). Olfactory receptor neurons (ORN) in the antennae innervate 'glomeruli' clusters of projection neurons (PN) in the AL. Local inhibitory neurons (LN) trigger competition between glomeruli (two illustrated) causing odour-modulated activation patterns of spiking to be passed to the mushroom body (MB) for association. (b) Conceptual bio-inspired classifier design abstracted from the insect olfactory system. Virtual receptors (VRs) distributed across input space simulate the wide-field response of the olfactory receptors (see text). Higher regions abstracted as association neuron clusters representing the classes present in the dataset. (c) Classifier model implemented as three populations of neurons on a GPU based spiking simulation. Clustering is implemented using appropriate connectivity matrices to demarcate divisions within a population. The action of LN neurons is abstracted out as inhibitory connections between clusters. Weight plasticity is actioned on the workstation.

The transformation stage takes place in the antennal lobe (AL) structure where axons from the ORNs synapse onto a limited set (e.g. 43 in Drosophila) of spatially distinct sites of high synaptic interaction called glomeruli [11, 12]. ORNs that express the same type of receptor [9] provide excitatory input to subpopulations of projection neurons (PNs) and local interneurons (LNs) within a given glomerulus. PNs generate the output from the AL to higher brain centres. LNs, in contrast, inhibit PNs in other glomeruli in the AL, shaping patterns of glomerular activation as they compete. Whilst the exact coding scheme enacted by this competition remains contentious [13–15], it is evident that the broadly tuned ORN responses are modulated by this lateral inhibition to generate a sparser and more granular encoding intended to enable separation of even closely related odours.

Activity from the AL is projected to higher brain centres like the mushroom body (MB) and the lateral horn, where classification and multisensory integration take place [16]. Here, the activation patterns of the glomeruli are ultimately associated with an output that is relevant (useful) to the insect, for example the presence of a foodstuff.

1.3. Bio-inspired and biomimetic spiking neural classifiers

A number of neuronal network classifiers based on the insect olfactory system have been developed and have demonstrated some success in classifying static datasets comprising either standard benchmark sets [14, 17] or artificial data [18]. However, these models remain largely unproven in a practical setting where input sensor data is real-time, continuous, potentially noisy, lacks a precise onset signal and accurate classification requires the inclusion of temporal aspects into the feature set. This investigation seeks to inform and develop the potential and suitability of biomimetic classifiers for use with typical real-world e-nose sensor data.

2. Model design

We base our spiking neuronal network model design on the approach of Schmuker et al [14] which offers a straightforward and re-useable abstraction of some key features, in particular of the AL, whilst having a demonstrated ability to classify several standard datasets. Although not claiming a high degree of biorealism, our model (figure 1(b)) comprises the same three stages described above for the insect CNS: sensing, transformation and association.

The general aim behind this design is to provide a generic spiking network model for multivariate pattern classification. Since the model operates in the firing rate domain, it can only deal with non-negative, bounded input variables. The first challenge in the model design therefore is to transform, in a generic way, real-valued data into a non-negative, bounded representation. This is achieved in the 'sensing' stage using 'virtual receptors' (VRs), in analogy to ORs in insects. These VRs operate like a radial-basis function encoding of the multivariate data set. The 'receptive field' of a VR is defined by an exponentially decaying function (see methods). In order to eliminate 'gaps' in data space that are not covered by any VR, we chose these receptive fields to be very wide.
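As an illustration of this encoding, the following minimal Python sketch maps an input point to one bounded response per VR. It assumes a response of the form exp(-d/width), where d is the Euclidean distance to the VR centroid. The exact function is specified in the methods, so the function name `vr_responses`, the `width` parameter and the example centroids are hypothetical choices made only for this sketch.

```python
import math

def vr_responses(x, centroids, width):
    """Return one response in (0, 1] per virtual receptor (VR).

    Assumed form: exp(-distance / width). `width` is a hypothetical scale
    parameter; larger values give wider receptive fields.
    """
    responses = []
    for c in centroids:
        dist = math.dist(x, c)  # Euclidean distance in input space
        responses.append(math.exp(-dist / width))
    return responses

# Three illustrative VR centroids in a 2D input space (not from the paper).
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
r = vr_responses((0.0, 0.0), centroids, width=5.0)
```

With a wide field (large `width`), even the most distant VR returns a clearly non-zero response, which is the property used to avoid uncovered gaps in data space.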

The centroids of the VR receptive fields are placed in data space by unsupervised learning with a self-organising process (a neural gas [19]). This is necessary to ensure that the VR centroids are located in proximity to the data they should encode, which is particularly important when dealing with high-dimensional data. Note that due to the purely unsupervised nature of this process, no class information of the data is used in this stage. This process mimics the evolution of insect odorant receptors over time, since they evolve to cover the wide space of odorants that are relevant for the organism's survival.
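The self-organising placement can be sketched with the standard rank-based neural gas update, in which every unit moves towards each sample by an amount that decays with its distance rank. This is a generic textbook version written for illustration only. The annealing schedule and all parameter values (`n_epochs`, `eps_start`, `eps_end`, `lam_end`) are assumptions and are not taken from the paper.

```python
import math
import random

def neural_gas(data, n_units, n_epochs=20, eps_start=0.5, eps_end=0.01,
               lam_end=0.1, seed=0):
    """Place n_units centroids in data space without using class labels."""
    rng = random.Random(seed)
    # Initialise units on randomly chosen data points.
    units = [list(p) for p in rng.sample(data, n_units)]
    lam_start = n_units / 2.0
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            # Exponentially anneal the step size and the neighbourhood range.
            eps = eps_start * (eps_end / eps_start) ** frac
            lam = lam_start * (lam_end / lam_start) ** frac
            # Rank units by distance to the sample (rank 0 = closest).
            ranked = sorted(units, key=lambda u: math.dist(u, x))
            for rank, u in enumerate(ranked):
                h = eps * math.exp(-rank / lam)
                for d in range(len(u)):
                    u[d] += h * (x[d] - u[d])  # in-place move towards x
            step += 1
    return units

# Toy data: two well separated 2D blobs (illustrative, not e-nose data).
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
units = neural_gas(blob_a + blob_b, n_units=2)
```

In this toy case the two units should settle near the two blob centres, illustrating how centroids end up close to the data they encode.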

A subpopulation ('cluster') of receptor neurons (RNs) is assigned to each VR, which respond collectively to a given input by producing a net spiking rate modulated by the proximity of the receptor point to the input. The net result is an abstraction of the insect OR/ORN architecture, namely a limited number of overlapping wide-field receptors with response profiles distributed over the high-dimensional input space.

For the transformation stage of our model, each RN cluster is set to excite a corresponding PN cluster—a 'glomerulus'. Mutual inhibition in a soft winner-take-all (WTA) configuration is then added between these glomeruli. This acts to suppress excitation invoked by more distant VRs, focusing activity onto the closest VRs, whilst emphasising their response differences. The result is a glomerular (i.e. PN cluster) level activation pattern that is amenable to an associative learning rule.

The final association stage of the model heavily abstracts the role of the insect MB and higher brain areas. One cluster of association neurons (AN) is assigned to represent each output class in the data set—the identity of the 20 test chemicals in our case. Spiking activity in these clusters will indicate the classifier's decision, by means of the total spike count over a defined period. Dense plastic connectivity is added between the PN and AN layers and a simple reinforcement-style learning rule is applied while the training dataset is presented. The 'reward' for a correct classification decision comprises the strengthening of synapses linking spiking PN neurons to correctly spiking ANs, whilst conversely, 'punishment' for an incorrect decision weakens synapses between involved active PNs and ANs. The result is an association between activation of particular glomeruli and the activation of a corresponding output class cluster. Finally, to create a clear final 'decision' in the AN layer, a WTA inhibition is applied between the AN clusters.
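The reward and punishment scheme described above can be sketched at the level of clusters rather than individual synapses. The sketch below is a simplified abstraction, not the implementation detailed in the methods. The function name `update_weights`, the fixed `step` size and the clipping bounds are all hypothetical.

```python
def update_weights(weights, pn_active, winner, target,
                   step=0.05, w_min=0.0, w_max=1.0):
    """Apply one reinforcement-style update after a training presentation.

    weights[p][a] is the strength from PN cluster p to AN cluster a.
    pn_active[p] is True if PN cluster p spiked during the presentation.
    Connections from active PN clusters to the winning AN cluster are
    strengthened if the decision was correct and weakened otherwise.
    """
    delta = step if winner == target else -step
    for p, active in enumerate(pn_active):
        if active:
            w = weights[p][winner] + delta
            weights[p][winner] = min(w_max, max(w_min, w))  # keep bounded
    return weights

# Three PN clusters, two AN (class) clusters, all weights starting at 0.5.
weights = [[0.5, 0.5] for _ in range(3)]
update_weights(weights, [True, False, True], winner=1, target=1)  # reward
update_weights(weights, [True, False, True], winner=0, target=1)  # punish
```

After these two presentations, the active PN clusters have stronger links to class 1 and weaker links to class 0, while the inactive PN cluster is unchanged.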

Figure 1(c) illustrates our implementation of the conceptual model in a fast GPU-based spiking model simulator. Note that complete details of the model and learning implementation are provided in the methods and materials section.

3. E-nose recordings

For training and test data we used recordings of the responses to 20 chemical compounds of a combination of six classical doped tin oxide and six novel chromium titanium oxide (five zeolite-coated) sensors, obtained under laboratory conditions using a FOX e-nose [4]. The tested compounds were taken from four chemical groups: alcohols, aldehydes, esters and ketones, with five chemicals per group. Each compound was prepared and recorded ten times as previously described [4] (figure 2(a)). The task we set the neural classifier was the identification of the 20 individual compounds based on the (300 s, 2 Hz) timeseries response of up to 12 sensors at once. Furthermore, given the impractical slowness of the full 5 min response curve, a secondary goal was to investigate the potential to bring the decision point significantly forward in time without a large cost in accuracy.

Figure 2. (a) Representative example of timeseries response of 12 metal oxide sensors mounted in a Fox e-nose upon controlled laboratory exposure to one of 20 chemical classes (see text for details). The y-axis indicates the absolute value of the change in the relative resistance of the sensor, Δr/r0 where r0 is the baseline resistance. (b) Defining a 16 point (dot markers) temporal signature of one response to a chemical exposure using four samples (at 9, 16, 23, 30 s) on a subset of four selected sensors. (c) Using a delay line to provide three delayed versions (dashed traces) of the live input (solid trace) allows all 16 data points to be captured in each timestep (four per sensor). This approach is used to enable classification using continuous sensor input with no onset signal. The figure illustrates this principle acting on one sensor to capture four samples.

3.1. Encoding for classification

Previous classifying work with this data set [4] has shown that downsampling the data but retaining temporal features in the encoding process produces compact representations of the data that allow successful recognition of odours by a linear SVM classifier. This approach also out-performed a range of non-temporal encodings [4]. For example, creating a single 16-dimensional vector representation by sampling a response at only four time points on just 4 sensors (figure 2(b)) was sufficient to classify the dataset to more than 99% accuracy using the SVM. Furthermore, restricting the sample times to the early stages of response—for example the first 30 s out of 300 s total—caused very little reduction in performance [4]. Based on these results, we chose a similar encoding approach to create a static multivariate representation as input to our own bio-inspired neural model, namely, four sensors, sampled at four time points from odour onset (figure 2(b)).
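The four-sensor, four-sample encoding can be sketched as follows, using the sample times of 9, 16, 23 and 30 s shown in figure 2(b) and the 2 Hz sampling rate of the recordings. The sketch assumes that odour onset coincides with sample index 0. The sensor indices, the sensor-major ordering of the vector and the synthetic recording are illustrative assumptions.

```python
def encode_static(recording, sensors, sample_times_s, rate_hz=2.0):
    """Build a static feature vector from a multi-sensor timeseries.

    recording[s][i] is the response of sensor s at sample index i, with
    odour onset assumed at index 0. Values are ordered sensor by sensor.
    """
    vector = []
    for s in sensors:
        for t in sample_times_s:
            vector.append(recording[s][int(round(t * rate_hz))])
    return vector

# Synthetic 12-sensor recording of 600 samples (300 s at 2 Hz); each value
# encodes its own sensor and sample index so the result is easy to inspect.
recording = [[s * 1000 + i for i in range(600)] for s in range(12)]
vector = encode_static(recording, sensors=[0, 3, 5, 7],
                       sample_times_s=[9, 16, 23, 30])
```

At 2 Hz the four sample times correspond to sample indices 18, 32, 46 and 60, giving 4 × 4 = 16 values per recording.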

We generated our set of VR points to represent the 16-dimensional input space by using a self-organising 'neural gas' process [19] against the complete training dataset. Note that this is unsupervised, i.e. makes no use of class labelling. A representative example of the mapping results is provided in figure 3(a).

Figure 3. (a) The distribution of a training set of 180 e-nose recordings after encoding each as a single 16-dimensional vector, plotted using the two principal components of the set. Matching markers imply recordings of the same class. Red crosses denote the locations of 43 virtual receptor (VR) points selected by the self-organising process. The red and green circles show the loci of the receptive field around a VR at 0.5 and 0.2 attenuation. (b) The response function (0,1] of the VR receptors with distance (adjusted for the two-dimensions).

3.2. Classifying continuous data without onset timing

One of the practical problems introduced by this timed-sample encoding strategy is that, in the field, an e-nose device will rarely have access to accurate odour onset timing. The requirement for acquiring precisely timed temporal features from data that is continuously varying without timing cues therefore appears particularly problematic.

However, by adding delay lines to the input, it becomes possible to access time-shifted versions of the input signal which correspond to the required intervals between the chosen sample points (figure 2(c)). When the input signals are read, the delayed signals are also accessed from the delay line, generating a compound input vector comprising both current and past samples. If the delays match the inter-sample intervals used in training, then at some point in time, as the sensors' response unfolds, this compound vector will reproduce the same timed sample set used to train the classifier.
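A software delay line of this kind can be sketched with a fixed-length buffer per sensor. With 7 s between sample points and a 2 Hz sampling rate, the three delayed taps lie 14, 28 and 42 samples behind the live value. The class name `DelayLine` and the choice to return `None` until the buffer has filled are assumptions made for this sketch.

```python
from collections import deque

class DelayLine:
    """Return the live sample together with delayed copies of one signal."""

    def __init__(self, delays_samples):
        self.delays = list(delays_samples)
        # Keep just enough history to serve the longest delay.
        self.buffer = deque(maxlen=max(self.delays) + 1)

    def push(self, value):
        """Add the newest sample; return the taps, or None while filling."""
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        # buffer[-1] is the live value; buffer[-1 - d] is d samples old.
        return [self.buffer[-1 - d] for d in self.delays]

# Taps ordered oldest first: 42, 28, 14 and 0 samples behind the live input.
line = DelayLine([42, 28, 14, 0])
signal = list(range(600))  # stand-in for one sensor; value == sample index
outputs = [line.push(v) for v in signal]
taps_at_30s = outputs[60]  # sample index 60 corresponds to t = 30 s
```

At the 30 s mark the taps correspond to sample indices 18, 32, 46 and 60, that is, the same 9, 16, 23 and 30 s points used for the static encoding, without any onset signal being supplied to the delay line.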

This approach is not, we suggest, a purely engineering convenience that diverges from the bio-inspired basis, as a conceptually similar approach to capturing temporal features has also been proposed to form part of the insect strategy for odour classification [20]. For example, the axonal 'delay-line' effect introduced by the spaced layout of the receptor array in the cricket Acheta domesticus has been shown to modulate the neural coding of odours [20]. Although these delays are clearly far shorter than the sample intervals employed here, it should be noted that the sensor responses of insects are orders of magnitude faster than those of our metal oxide sensors and would indeed require very fast sampling to implement temporal feature encoding.

Although this delay-line strategy can address the sample timing issue with continuous data, there is also the problem of knowing when to appropriately begin and end evaluation of spiking activity at the output. This scenario can clearly occur in a detector or alarm deployment where sensors are left open indefinitely, awaiting the presence of a target odour. To address this, we tested a simple spiking activity-based algorithm, which identifies sufficiently active periods and demarcates appropriate evaluation time windows over which a comparative spike count can be made (figure 4). This allows the classifier to function without any use or knowledge of a global or recording-based time variable.

Figure 4. Generating an odour onset time-independent evaluation window for classification. The average spiking rates of the output class clusters are tracked. When the most active cluster crosses a trigger rate threshold σT (red dot), an evaluation window (grey shaded region) is generated from time t1 to t2, the times at which the spike rate crossed the value σT/k (blue dots), denoting a fixed fraction of the trigger rate. In the presented results k was set at 2 with σT = 80 Hz. This dual-level approach addresses the issues of (i) too narrow a window if σT is set too high or (ii) premature triggering if σT is set too low.

4. Results

4.1. Classifier training

The classifier was trained and tested using a stratified ten-fold cross validation strategy. On each of ten runs, one (different) member of every class is allocated to the test set. The remaining 180 observations (recordings) are randomly drawn for presentation from each class in turn, with random class ordering per run.
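The fold allocation described above can be sketched as follows for 20 classes with ten recordings each. The function name `stratified_folds` and the use of a seeded shuffle are assumptions. The sketch covers only the split into test sets, not the randomised presentation order used during training.

```python
import random

def stratified_folds(labels, n_folds=10, seed=0):
    """Return one list of test indices per fold, balanced across classes."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the members of this class out across the folds in turn.
        for k, idx in enumerate(members):
            folds[k % n_folds].append(idx)
    return folds

# 20 classes x 10 recordings = 200 observations.
labels = [c for c in range(20) for _ in range(10)]
folds = stratified_folds(labels)
train_sizes = [len(labels) - len(test) for test in folds]
```

Each fold then holds 20 test recordings (one per class), leaving 180 recordings for training.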

The response to the temporal signature (static representation) of the selected recording is applied as input to the Poisson 'receptor' neurons and, with plasticity enabled, the spiking network simulation is run for a (simulated) 500 ms using an integration timestep of 0.5 ms.
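The rate-coded input can be sketched with a per-timestep Bernoulli approximation of a Poisson process, using the 500 ms presentation and 0.5 ms timestep stated above and a cluster of 60 neurons. The linear mapping from VR response to firing rate and the value of `MAX_RATE_HZ` are hypothetical and are not taken from the paper.

```python
import random

MAX_RATE_HZ = 125.0  # hypothetical peak firing rate per receptor neuron

def poisson_cluster_spikes(rate_hz, n_neurons=60, duration_ms=500.0,
                           dt_ms=0.5, seed=0):
    """Count the spikes emitted by one cluster during one presentation.

    Each neuron fires independently in each timestep with probability
    rate * dt, which approximates a Poisson process when rate * dt << 1.
    """
    rng = random.Random(seed)
    p_spike = rate_hz * dt_ms / 1000.0
    n_steps = int(round(duration_ms / dt_ms))  # 1000 steps for 500 ms
    total = 0
    for _ in range(n_steps):
        for _ in range(n_neurons):
            if rng.random() < p_spike:
                total += 1
    return total

# Assumed mapping: firing rate proportional to the VR response in (0, 1].
vr_response = 0.8
count = poisson_cluster_spikes(vr_response * MAX_RATE_HZ)
silent = poisson_cluster_spikes(0.0)
```

Under these assumptions a response of 0.8 gives a 100 Hz rate and an expected count of about 3000 spikes across the cluster in 500 ms.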

Exposure to the training set is repeated a fixed N times until performance on the training set has converged, without straying into overfitting (performance on test set drops). The value of N was set during initial investigations—see figure 6(d).

4.2. Performance without a widefield receptor correlate

Without using a correlate of the widefield receptor response, the AL model acting directly on a stratified 20 class dataset (encoded as RN spiking activity driven by the normalised response per sensor) was able to classify a test dataset with an average accuracy of 39% (s.d. 2.7%) across ten-fold cross-validation. This suggests that the classifier is functional but that the dataset comprises a non-trivial classification problem that the relatively simple strategy enacted by the AL structure alone does not solve easily.

4.3. Performance using VRs as a widefield receptor correlate

The input was switched to the proximity response to a set of VR points in input space generated independently from each training dataset in the cross validation. All further results were elucidated employing this two stage model.

4.4. Inhibition, competition and learning

Figures 5(a) and (b) provide a representative example of the spiking activity and cluster interaction of the classifier when presented for 500 ms with the static encoding of an unseen sample of isopentylacetate (odour class 13), after training is completed. The pattern of spiking activity in the RN clusters demonstrates how an input elicits a wide-ranging response encompassing a large proportion of the VRs. The 2D projection of the response (figure 5(c)) illustrates how the response is proportional to the proximity to the input. The activity in the PN clusters demonstrates the action of the inter-cluster mutual inhibition to reduce the activity to the closest VRs whilst emphasising the difference between these. Finally, the AN cluster activity illustrates how the correct class cluster has been associated, via the appropriate strengthening of PN–AN connections during training, with the VRs responding most to samples from this class. The lack of activity in rival AN class clusters also reflects the effect of the WTA inhibition in this layer.

Figure 5. Spiking and cluster interaction in the network when presented with an unseen sample of isopentylacetate (Class 13) after training is completed. (a) Raster plot showing the spiking behaviour across the RN, PN and AN layers when sample is presented for 500 ms. The banding shows spiking activity within clusters of 60 neurons. The RN and PN clusters correlate with VR activity before and after mutual inhibition. The AN clusters correspond to the odour class selected by the classifier. The spiking indicated with arrows are artifacts resulting from the preceding recording presentation (the network is not reset, input VR response data is simply replaced between timesteps with the new set before simulation continues). (b) Bar graph comparing the spikes counted within each cluster during the sample presentation. (c) 'Heatmap' of response in 2D space. The coloured squares indicate the cluster spiking activity in PN (above) and RN(below) illustrating the VR response with distance from the input (green cross) when projected into two-dimensional space via PCA. See figure 3 for the full 2D projection map of input data and VR points.

4.5. Classifier test scenarios

After training, the classifier performance was tested in three scenarios of increasing difficulty. Firstly, plasticity is disabled and, as with the training, the static representations of the test set recordings are simply applied to the classifier in turn for 500 ms. The classifier's winner decision for each recording is awarded to the AN class cluster with the maximum spike count over the 500 ms presentation. This simpler scenario was used primarily for parameter tuning and to characterise the classifier performance.

In the second scenario, performance is tested by replaying the continuous timeseries recording data from the e-nose. The sensors' response was originally captured at 2 Hz (i.e. 500 ms between samples) totalling 600 samples over a 300 s recording. The recording is played into a software delay line to generate three additional delayed versions at 7 s intervals. From these four inputs sampled together the VR response is obtained and applied to the spiking network input for 500 ms. The winner is awarded to the most active class cluster over the whole 300 s presentation.

Finally, performance is tested by adhering to a more realistic 'detector' scenario where there are no preset discrete periods to evaluate over and a positive odour detection (i.e. classifying decision taken) is decided solely on the basis of sufficient activity in the network taking place.

To achieve this, the current average spiking rate of each cluster is calculated on-the-fly and an algorithm is applied to trigger the start and end of an appropriate 'evaluation window' in the time domain (figure 4). The winner is evaluated from the most active class cluster within this window.
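The dual-threshold window of figure 4 can be sketched as below, using the reported values σT = 80 Hz and k = 2. For simplicity this version works offline on a stored trace of the most active cluster's rate and looks backwards from the trigger point. An on-the-fly version would need to buffer recent activity instead. The function names and the example traces are hypothetical.

```python
def evaluation_window(max_rates, sigma_t=80.0, k=2.0):
    """Find the evaluation window in a trace of the top cluster's rate.

    max_rates[i] is the rate (Hz) of the most active class cluster at
    step i. Returns inclusive (start, end) indices, or None if the trigger
    threshold sigma_t is never reached.
    """
    low = sigma_t / k
    trigger = next((i for i, r in enumerate(max_rates) if r >= sigma_t), None)
    if trigger is None:
        return None
    start = trigger
    while start > 0 and max_rates[start - 1] >= low:
        start -= 1  # extend back to where the rate rose above sigma_t / k
    end = trigger
    while end < len(max_rates) - 1 and max_rates[end + 1] >= low:
        end += 1  # extend forward until the rate falls below sigma_t / k
    return start, end

def winner_in_window(spike_counts, window):
    """Return the index of the class cluster with most spikes in the window."""
    start, end = window
    totals = [sum(row[start:end + 1]) for row in spike_counts]
    return max(range(len(totals)), key=totals.__getitem__)

rates = [10, 30, 45, 60, 85, 90, 70, 50, 35, 20]
window = evaluation_window(rates)
# Per-step spike counts for two class clusters (illustrative values).
spike_counts = [[5, 9, 3, 2, 1, 1, 2, 3, 9, 9],
                [0, 1, 4, 6, 8, 9, 7, 5, 1, 0]]
winner = winner_in_window(spike_counts, window)
```

In this example the trigger fires at step 4 and the window spans steps 2 to 7. Only spikes inside that window contribute to the decision, so activity outside it is ignored.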

4.6. Characterising performance

Figure 6 characterises the performance of the classifier in the initial testing scenario. Three important configuration settings are explored which impact the resource and processing load required by the classifier. Firstly, the number of VRs employed (figure 6(a)) affects the resolution of the input space mapping but each one adds two extra neuron clusters and associated synapses to the model. Results suggest that, for this data set distribution and self-organising algorithm combination, little performance is gained beyond 45 VRs.

Figure 6. Performance of classifier under a range of configurations. For all graphs, performance is shown as the correctly classified percentage of the test set (20 observations) of a ten-fold stratified cross-validation. The error bars show the standard deviation across the ten repeats. The orange dashed line shows a reference performance achieved with a support vector machine (see [4]) using the same four sensor—four sample encoding scheme (sample times 20, 40, 80, 100 s). (a) Performance change as number of VRs mapping input space increases (cluster size 60, sample window 30 s). (b) Performance variation as the number of neurons per cluster is increased (43 VRs, sample window 30 s). (c) Performance change as the time interval between four sample points is increased. The first sample is taken at 10 s and the last (4th) is taken at the time shown on the x-axis (43 VRs, cluster size 60). (d) Performance improvement when the classifier is re-exposed to the training set a 2nd, 3rd,.., nth time.

Secondly, the use of probability-based rate coding of the input means that the cluster size (figure 6(b)) affects the reliability and consistency of the mutual inhibition behaviour that is an important part of the classifier strategy. This comes about through issues such as synchrony artifacts caused by the resolution of the integration timestep (see [14] for a detailed discussion). Results suggest that 30 Poisson neurons is a minimum cluster size and that 50–60 neurons can improve performance further by a few percentage points (from 90% up to 94.55%).

Thirdly, it is clearly advantageous in many situations for an e-nose device to make an identification as soon as possible. Whilst the sensor recordings show that their response does not settle for at least 2 min after onset, results suggest (figure 6(c)) that taking four samples in the period from 10 to 30 s is sufficient for a strong classification performance and relatively little improvement is available by widening this time window.

Finally, performance can be improved by re-exposing the training set a number of times as the association can continue to be refined by the learning rule. However, more repetitions slow the learning process and more importantly, risk over-fitting to the training data over the test data. Results suggest (figure 6(d)) that little performance is gained beyond three repetitions, and that over-fitting sets in at six repeats.

Overall, we conclude that, drawing upon only the first 30 s of this 200 observation, 20-class data set, the described classifier can perform creditably whilst still lagging an SVM (we achieved 94.55% avg. accuracy, 10 × 10-fold stratified cross-validation, against 99% for SVM) when using 43 VRs, a cluster size of 60 neurons and four exposures to the training data.

Considering the question of why our classifier is outperformed by an SVM approach, we examine the specifics of the error cases. We find that 50% of errors are accounted for by the three most common class pairs that are confused when using 30, 43, or 50 VRs (Z2-hexen-1-ol and furfural, acetone and 2-heptanone, pentanol and hexanal). These can all be shown to involve overlapping or closely positioned clusters in input space (see the figure 3(a) PCA plot for a visualisation).

4.7. Classifying continuous recordings

Figure 7 provides a representative example illustrating the spiking activity in the network when presented with the first 40 s of a 300 s previously unseen continuous timeseries recording of the sensors' responses. As discussed in the methods, the network response is elicited after training has been completed using 500 ms presentations of the static downsampled encodings of the training set, using four samples of four sensors taken at 9, 16, 23 and 30 s after odour onset (i.e. with a 7 s inter-sample interval).

Figure 7. With training completed on static encodings, the figure shows the first 40 s of spiking activity in the network when the delay line input is presented in real time with an unseen example of the original 300 s continuous timeseries recording of the sensors' responses, sampled at 2 Hz. (a) Raster plot showing the spiking behaviour across the RN, PN and AN layers. The banding shows spiking activity within clusters of 60 neurons. The RN and PN clusters correlate with VR activity before and after mutual inhibition. The AN clusters correspond to the odour class selected by the classifier. The green band highlights the region where the four sample times used to encode the recording (and subsequently used for training) will align. (b) Heatmap showing the same spiking data as (a) using colour to convey more accurately the relative intensity of cluster spiking within each 500 ms sample presentation. The red square indicates the evaluation window chosen by an algorithm (see figure 4) to identify a zone wherein an odour response is detected and classification made. (c)–(f) Four further representative heatmap examples focusing on the important 30 s mark (grey vertical line). These show how the evaluation window (red square) is shifted and stretched by the algorithm to identify a region across which the correct class should be the most active, even under conditions where the most active cluster is varying over time in both the AN and PN layers. Note that (d) illustrates one example where the algorithm is incorrect.

Following training, an unseen continuous recording is tested as input, using a delay line to recreate the sample intervals used in training, and the 4 × 4 result is played as input into the classifier network. From figure 7 it can be clearly observed that the response of the network builds to a peak at the 30 s mark before dying away. This peak corresponds, as expected, to the point where the four input copies should align to the last of the four sample times (30 s) used as the input space for the distribution of VRs and where training of VR-to-class association has taken place. In particular, the heatmap (figure 7(b)) shows the intensity of the response in the AN layer, indicating a dominant class cluster. If the 'winning' class is inferred from the AN cluster spike counts over the full available 300 s presentation, this results in a 92% (std.dev. 0.68% across 10 × 10-fold cross-validation assay) accurate classification of the test set.

Disallowing use of these discrete 300 s windows for spike counting makes for a more realistic scenario without onset information. In this scenario, windows of high activity from the network are used to infer that a valid classification is being made. The red square in figure 7(b) indicates the evaluation window chosen by the algorithm described in figure 4 and demonstrates the identification of a zone of interest without use of absolute time or onset information. If the 'winning' class is now inferred from the AN cluster spike counts within these evaluation windows, this results in the same 92% classification of the test set (std.dev. 0.56% across 10 × 10-fold cross-validation assay). This result compares favourably with the 94.55% achieved using static encoding of the test set, which requires precise onset timing as a prerequisite.

4.8. Generalisation of classifier design

A final test to validate any classifier and discount effects of inadvertent overfitting due to multiple re-use of data when developing the model is to expose it to an entirely unseen data set and to report the results with no further parameter modifications beyond retraining with the new training data set [21]. Applying the classifier to a separate but directly comparable data set, which was produced one year later using the same procedures, resulted in 92.65% (0.74% std.dev., 10 × 10-fold cross-validation) accuracy for static encoding and 89% (0.61% std.dev., 10 × 10-fold cross-validation) when continuous input was supplied and the evaluation window was applied. The small but significant drop in the mean performance (confirmed by t-test) suggests that, within bounds of expectation, the classifier generalises well to unseen data.

5. Discussion and conclusion

We used a combination of the bio-inspired concept of wide-field VRs and a spiking network model inspired by the architecture of the insect olfactory system to determine the presence and identity of odorants from e-nose recordings. The classifier design comprising 40 VRs (approximating the number of chemical receptor types in Drosophila) and a 6000 spiking neuron, bio-inspired abstracted model of competition and inhibition in the insect AL can deliver 92% accurate classification of 20 different chemicals employing only 30 s of e-nose sensors' continuous responses and without access to an odour onset signal. The observed performance is based on both bio-inspired pre-processing and the spiking neural network classifier. When we used the spiking neural network directly on normalised e-nose measurements, performance dropped to 39%. Conversely, it has been shown previously that VR filtering paired with a minimal classifier (naïve Bayes) performed less well than with our spiking classifier [14]. In the implementation presented here, VR positions are still calculated 'offline', using conventional computing to map out the data space. Whilst this arguably provides a correlate of the evolution of static biological receptor responses, an e-nose based deployment using a complete neuromorphic-based implementation would require a spiking network that is also able to learn prototypes in multivariate data. A candidate for this task, which is compatible with existing neuromorphic hardware, has been presented by Nessler and colleagues [22]. We will investigate the use of such a design in forthcoming work.

When using a non-continuous static encoding of the data, the network achieves a creditable classification performance on this dataset of 94.55% (SVM 99%, see [4]). However, performance aside, for continuous data there is a conceptual advantage in using a spiking network: time is an essential component of computation in spiking networks. Hence, dealing with continuous timeseries data, such as sensor readings, is natural. In contrast to our previous approaches [4, 14], once training is complete, the sensor signals can now be fed continuously into the classifier, with no discretisation required, thus removing one element of abstraction (the specific time of signal onset) between the data and the classification. Therefore, while in this case the network's performance on discretised, static encodings did not surpass what could be obtained with an SVM, this implementation is an important step towards real-time processing of continuous sensor data with spiking networks.

The demonstrated ability to classify based on early response curve data is important because, although metal oxide sensor speed is improving [23], it still lags insect ORN response by at least two orders of magnitude. This difference becomes most apparent in field settings where the spreading of odour plumes from diverse sources will result in rapidly altering concentration pockets reaching a sensor array [24]. Whilst bio-based sensors are now on the horizon [68], the cost and availability of conventional sensors confer a clear advantage.

The success of a parallel GPU-based implementation of this classifier is significant as it offers the potential of a simple and scalable route to tackle real time and continuous engineering problems with a neuromorphic engineering approach. Affordable GPU power continues to rise rapidly, with platforms available from supercomputer clusters to mobile.

However, our approach remains to be tested against more realistic continuous odour data, i.e. chemical mixtures, varied concentrations, masking by uninteresting background odours, intermittent and intermingled odour plumes. These cases are particularly relevant when considering the problems encountered by potential mobile and embedded applications of e-nose systems [25]. Although some interesting results have been achieved with binary mixtures using a similar bio-inspired model [15], it has also been suggested that the components of an odour mixture emanating from separate sources are easier to disentangle when analysing odour space in continuous time, instead of averaging over large temporal windows [24]. In this domain, the fact that time is intrinsic to this classifier could bear real benefits in performance.

6. Methods and materials

6.1. Implementation

Data here refers to the GPU-based implementation (figure 1(c)) of the conceptual classifier model (figure 1(b)).

We use clusters of 60 neurons throughout to ensure consistent averaged spike rate coding and to minimise potential synchrony effects occurring in inter-cluster competition [14]. Using 40 VRs, this implies 6000 neurons and some 6 million active synapses. To achieve real-time classification at this scale, a GPU-accelerated neural simulator (GeNN, [26, 27]) was chosen, using CUDA C to run at 8 times real time on a standard NVIDIA card (GeForce GTX 760, 1152 cores). Classifier code is implemented in C++ running on a high-end Linux workstation (8-core, 3.7 GHz Xeon, 32 GB RAM) with the g++ compiler. Both model and classifier are available as examples in the GeNN repository [26].
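The neuron count follows from the cluster layout: one RN cluster and one PN cluster per VR, plus one AN cluster per odour class. A small sketch of the arithmetic, with hypothetical constant and function names:

```python
CLUSTER_SIZE = 60   # neurons per cluster (from the text)
N_VR = 40           # number of VRs (from the text)
N_CLASSES = 20      # number of odour classes (from the text)


def total_neurons(n_vr=N_VR, n_classes=N_CLASSES, cluster_size=CLUSTER_SIZE):
    """Total neurons across the RN, PN and AN layers."""
    # One RN and one PN cluster per VR, one AN cluster per class.
    n_clusters = 2 * n_vr + n_classes
    return n_clusters * cluster_size


print(total_neurons())  # (40 + 40 + 20) * 60
```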

6.2. Input encoding details

E-nose recordings comprise 300 s, 12-sensor timeseries data at 2 Hz sampling. Recordings are zeroed when the chemical is released. A basic smoothing filter is applied using a simple moving average. Timeseries data is then reduced to a static representation comprising a single 16-dimensional vector constructed from four sensors, sampled at four points. Sample time and sensor choice were informed by previous classification work on this data set [4], resulting in a choice of two doped tin oxide sensors and two novel chromium titanium oxide (CTO) sensors, with sample times set at 9, 16, 23 and 30 s (see figure 2(b)).
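The static encoding step can be sketched as below. The smoothing width and the sensor column indices are not given in the text and are placeholders here; the recording is assumed to start at odour release, so that a sample time t maps to row t × 2.

```python
import numpy as np


def moving_average(signal, width=5):
    """Trailing simple moving average over `width` samples, per column.

    ASSUMPTION: width=5 is illustrative; the start of the signal is padded
    with its first sample so the output has the same length as the input.
    """
    kernel = np.ones(width) / width
    padded = np.vstack([np.repeat(signal[:1], width - 1, axis=0), signal])
    cols = [np.convolve(padded[:, c], kernel, mode="valid")
            for c in range(signal.shape[1])]
    return np.stack(cols, axis=1)


def static_encoding(recording, sensor_idx, sample_times_s=(9, 16, 23, 30),
                    rate_hz=2, width=5):
    """Reduce a (n_steps, 12) recording to a 16-dimensional vector.

    sensor_idx: list of four column indices (hypothetical; the actual
    sensor selection is described in [4]).
    """
    smoothed = moving_average(recording[:, sensor_idx], width)
    rows = [int(t * rate_hz) for t in sample_times_s]
    return smoothed[rows].ravel()
```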

With each recording in the training set now represented by a point in 16-dimensional input space, a set of VR points is selected to representatively span this space, generated for each cross validation dataset using the 'neural gas' algorithm [19].

6.3. VR response

A proximity response r ∈ (0,1] is elicited from each VR according to its distance d in data space to the current input point. The response r is made unit free by scaling with the average distance d_avg between all input points in the set. The wide distribution of relatively tight class clusters in data space (figure 4(a)) means that using a linear response function (see [14]) performs poorly on this data set. An exponential decay function is employed instead. The parameters k and m are used to widen the response field whilst emphasising the nearest VR (figure 3(b)). The rate for the ith VR is given by

Equation (1)

where d is calculated as the 'Manhattan' distance between points, i.e. the sum of the absolute coordinate differences. The response curve is shown in figure 3(b), for k = 5, m = 0.7.
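As an illustration only, a response with these properties can be sketched as follows. The functional form exp(−k (d/d_avg)^m) is an assumption chosen to be consistent with the description (exponential decay, range (0,1], parameters k and m, distance scaled by d_avg); the published Equation (1) may differ. The function names are hypothetical.

```python
import numpy as np


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return float(np.sum(np.abs(np.asarray(a, float) - np.asarray(b, float))))


def vr_response(point, vr_points, d_avg, k=5.0, m=0.7):
    """Proximity response of every VR to one input point.

    ASSUMED form: r_i = exp(-k * (d_i / d_avg) ** m). This is a plausible
    stand-in, not necessarily the equation used in the paper.
    """
    d = np.array([manhattan(point, v) for v in vr_points])
    return np.exp(-k * (d / d_avg) ** m)
```

Under this assumed form, a VR located exactly at the input point responds with r = 1, and the response decays towards (but never reaches) zero with increasing distance.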

6.4. Spiking model details

We model an abstracted insect olfactory system in GeNN [26] as three main 'layers': RNs, PNs and ANs as shown in figure 1(c). Sub-population 'clusters' represent VR activation (RN), glomeruli (PN) and recognised classes (AN).

The RN input neurons implement Poisson rate coding, producing spike trains with an average spike rate that tracks the current response level of their assigned VR. The required average spike rate ρ is set as:-

Equation (2)

where r is the (0,1] response of the associated VR and ρmax = 70 Hz, ρmin = 5 Hz are the maximum and minimum spike rates.
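A minimal sketch of this rate coding is given below. The linear interpolation between ρmin and ρmax is an assumption (Equation (2) is not reproduced here), as is the use of the 0.5 ms timestep for spike generation; the function names are hypothetical.

```python
import numpy as np

RHO_MIN_HZ = 5.0    # minimum spike rate (from the text)
RHO_MAX_HZ = 70.0   # maximum spike rate (from the text)
DT_S = 0.0005       # ASSUMPTION: same 0.5 ms timestep as the map neurons


def spike_rate(r):
    """ASSUMED linear map from VR response r in (0, 1] to a rate in Hz."""
    return RHO_MIN_HZ + r * (RHO_MAX_HZ - RHO_MIN_HZ)


def poisson_spikes(r, n_neurons, n_steps, rng):
    """Boolean spike raster of shape (n_neurons, n_steps).

    Each neuron spikes independently in each timestep with probability
    rate * dt, the usual discrete-time approximation of a Poisson process.
    """
    p = spike_rate(r) * DT_S
    return rng.random((n_neurons, n_steps)) < p
```

For example, `poisson_spikes(1.0, 60, 20000, np.random.default_rng(0))` simulates one 60-neuron RN cluster for 10 s at the maximum rate.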

PN and AN layers are implemented as 'map' neurons [28]. This model employs a phenomenological, discrete time dynamical map with a fixed timestep Δt set at 0.5 ms. These can be computed much faster than conductance-based models (HH), enabling modelling of up to 10^5 neurons in real time.

Synapses are considered to have zero axonal transmission delay and the synaptic current Isyn at time t flowing from a presynaptic to a postsynaptic neuron is defined as:-

Equation (3)

where gsyn is the synapse conductance set via the connectivity matrix and is potentially updateable through plasticity. S(t) represents the amount of neurotransmitter available at the postsynaptic neuron. Each presynaptic spike delivers a fixed amount which is decreased exponentially on each timestep. Vpost(t) represents the membrane potential of the postsynaptic neuron at time t whilst Vrev denotes the synapse reversal potential (Vrev fixed to 0 mV for excitatory, −92 mV inhibitory).
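The synapse description can be sketched as follows. The current is written in the standard conductance-based form I_syn = −g_syn S(t) (V_post(t) − V_rev), which is an assumption consistent with the definitions above rather than a transcription of Equation (3). The decay time constant and the amount of transmitter per spike are placeholders.

```python
import math

V_REV_EXC_MV = 0.0     # excitatory reversal potential (from the text)
V_REV_INH_MV = -92.0   # inhibitory reversal potential (from the text)
DT_MS = 0.5            # timestep (from the text)


def step_transmitter(s, presyn_spiked, tau_ms=10.0, quantum=1.0):
    """Advance S by one timestep.

    ASSUMPTION: tau_ms and quantum are illustrative values only.
    S decays exponentially and each presynaptic spike adds a fixed amount.
    """
    s = s * math.exp(-DT_MS / tau_ms)
    if presyn_spiked:
        s += quantum
    return s


def synaptic_current(g_syn, s, v_post_mv, v_rev_mv):
    """ASSUMED standard form: I = -g * S * (V_post - V_rev)."""
    return -g_syn * s * (v_post_mv - v_rev_mv)
```

With this sign convention, a postsynaptic neuron at −60 mV receives a positive (depolarising) current from an excitatory synapse and a negative (hyperpolarising) current from an inhibitory one.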

6.5. Model topology details

Topology comprises RN–PN connectivity, PN–PN, PN–AN and AN–AN (see figure 1). Note that all connectivity is inter-cluster; there are no internal connections between neurons of a single cluster. Fed by a VR, each RN cluster excites its corresponding PN cluster (a notional 'glomerulus') using a common fixed weight with 50% random connectivity.

Competition between glomeruli is implemented by PN–PN inhibitory synapses. PN neurons connect only to those in other glomeruli (random 50% connectivity) using a fixed weight chosen to damp activity outside of the most active clusters.
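The PN–PN connectivity pattern (random 50% connectivity, no connections within a glomerulus) can be sketched as a boolean mask. This is an illustrative dense-matrix version with hypothetical names, not the sparse representation a GPU simulator would normally use.

```python
import numpy as np


def inter_cluster_mask(n_clusters, cluster_size, p_connect, rng):
    """Random connectivity mask with all intra-cluster entries removed.

    Entry [i, j] is True if neuron i projects to neuron j.
    """
    n = n_clusters * cluster_size
    mask = rng.random((n, n)) < p_connect
    # Neurons are assumed to be numbered cluster by cluster.
    cluster_id = np.arange(n) // cluster_size
    same_cluster = cluster_id[:, None] == cluster_id[None, :]
    mask[same_cluster] = False
    return mask
```

For the PN layer described here, this would be called with `n_clusters=40`, `cluster_size=60` and `p_connect=0.5`.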

Associating VR activation with output classes is controlled by the PN–AN synapse population, set at 50% random connectivity. Weights are initialised and modified in training according to the plasticity rule (see next section).

The final synapse population, AN–AN, acts to provide a single clear 'winning' class cluster in the AN output layer. A WTA configuration is constructed by lateral inhibition between neuron clusters representing each class. In contrast to lateral inhibition between PNs in different glomeruli, the inhibitory weight is set higher in the WTA to eliminate activity outside the winning cluster.

6.6. Plasticity and learning rule

Before training runs, all PN–AN synapses are initialised randomly between minimum and maximum weights, wmin and wmax. Simple reinforcement style learning is applied to strengthen PN–AN connections that lead to correct classifications and weaken those producing incorrect classifications. To this end, after each recording presentation, weights are adjusted up or down according to spikes in the 'winning' AN class cluster. For the set Swin comprising every AN neuron that spiked in that cluster, each incoming synapse connection from the PN layer is considered and its weight is adjusted by a fixed amount ±Δw (w bounded to [wmin, wmax]). The sign of the adjustment follows whether the classification decision was correct or not. Thus the weight wij from the ith PN neuron to the jth AN neuron is adjusted as:-

Equation (4)
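The learning rule as described in the text can be sketched as follows. This is a reading of the prose, not a transcription of Equation (4); the function name, argument names and dense weight matrix are assumptions made for illustration.

```python
import numpy as np


def update_weights(w, connected, an_spiked, winning_cluster, correct,
                   cluster_size, delta_w, w_min, w_max):
    """Return PN-AN weights after one recording presentation.

    w:          (n_pn, n_an) weight matrix.
    connected:  boolean (n_pn, n_an) mask of synapses that exist.
    an_spiked:  boolean (n_an,) flags for AN neurons that spiked.
    """
    start = winning_cluster * cluster_size
    cols = np.arange(start, start + cluster_size)
    s_win = cols[an_spiked[cols]]          # spiking neurons in winning cluster
    sign = 1.0 if correct else -1.0        # reward or punish
    w = w.copy()
    for j in s_win:
        rows = connected[:, j]             # all incoming PN synapses of neuron j
        w[rows, j] = np.clip(w[rows, j] + sign * delta_w, w_min, w_max)
    return w
```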

Acknowledgments

Dr R Binions of Queen Mary, London provided the novel CTO-based sensors used for half of the underlying recordings. This work was supported by the EPSRC (eFuturesXD initiative for collaborative research ID: EFXD13024) and the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements n° 331892 (Marie Curie IEF to MS) and 604102 (Human Brain Project to TN). The underlying Fox ENose dataset is publicly available at the CSIRO Data Access Portal via the DOI:10.4225/08/552C4424EE51E.
