Human activity recognition: suitability of a neuromorphic approach for on-edge AIoT applications

Human activity recognition (HAR) is a classification problem involving time-dependent signals produced by body monitoring, and its application domain covers all aspects of human life, from healthcare to sport, from safety to smart environments. As such, it is naturally well suited for on-edge deployment of personalized point-of-care analyses or other tailored services for the user. However, typical smart and wearable devices suffer from significant limitations regarding energy consumption, and this seriously hinders the successful employment of edge computing for tasks like HAR. In this paper, we investigate how this problem can be mitigated by adopting a neuromorphic approach. By comparing optimized classifiers based on traditional deep neural network architectures as well as on recent alternatives like the Legendre memory unit, we show how spiking neural networks can effectively deal with the temporal signals typical of HAR, providing high performance at a low energy cost. By carrying out an application-oriented hyperparameter optimization, we also propose a methodology flexible enough to be extended to different domains, to enlarge the field of neuro-inspired classifiers suitable for on-edge artificial intelligence of things applications.


Introduction
Fast growth and widespread availability of smart devices integrating a high number of sensors have significantly changed the idea of body sensor networks (BSNs) during the last few years, emphasizing the actual feasibility of the concept and successfully leading it towards a simplified and non-invasive monitoring of physiological and activity signals. Thanks to the wearable devices increasingly present in our daily life, BSNs consisting of a number of body-worn sensor nodes wirelessly collaborating can be shrunk down to a single device [1][2][3]. As a consequence, the coupling of device miniaturization and BSN dimension reduction opened the way to a wide range of applications for wearable sensors, including, but not limited to, healthcare, elderly assistance, fitness, and gesture recognition [4][5][6][7]. Nonetheless, critical challenges still have to be faced, as new constraints must be taken into account when dealing with the limitations imposed by devices intended to be as small and portable as possible [8]. As extensively pointed out in [9], the ultimate goal of edge computing for wearable devices requires a change of paradigm materializing in a reduction of the computational effort. Typical wearable sensors are indeed affected by severe limitations in terms of power, and the conventional approach, based on data transmission to off-chip, remote servers in charge of processing the acquired signals, introduces an additional limitation on the temporal side. Meeting these challenges would mean setting the scene for effective real-time processing of data, suited to enlarge the range of personalized services to be efficiently and widely deployed on smart edge devices [10][11][12][13][14].
Among the number of possible tasks related to body monitoring, human activity recognition (HAR) stands out for its relevance due to the inherent richness of information and the consequent adaptability to different applications. A reliable and responsive classification of ongoing user activity, besides being useful in monitoring activities of daily living, can also be decisive in safety-critical situations [15]. As a general definition, HAR is the analysis and classification of signals related to human actions, and it can be divided into categories according to the type of devices and sensors employed to acquire those signals [16]. However, due to the wide diffusion of smart devices and to their ease of use with just minor installation constraints, wearable sensors have been attracting the most attention in HAR research over the last decade [17,18]. At the same time, from the algorithms standpoint, great efforts have been devoted to developing new machine learning (ML) models for more and more accurate classification, employing either traditional ML techniques or deep learning (DL) methods [19][20][21][22].
In this race to the best classification result, a vast majority of the proposed solutions have focussed only on accuracy, not taking into account the possibility of shifting towards more biologically inspired models and thus leaving aside the alternative perspective offered by a neuromorphic approach. Nevertheless, when such a neural-inspired paradigm is adopted, spiking neural networks (SNNs) [23], thanks to their event-based asynchronous operations, could represent a valuable candidate for energy-efficient solutions [24]. SNNs are indeed bio-inspired models offering practicable trade-offs between biological conformity and simulation runtimes, which can provide low-power computation as a result of their temporally sparse activity based on binary spikes [25,26]. This hallmark, which makes SNNs a clean-cut set of artificial neural networks (ANNs) with increased bio-plausibility, can be directly traced back to the distinguishing feature of biological neural networks with respect to non-spiking ANNs: in the brain, neurons communicate by spreading information in the form of spikes, which are referred to as action potentials [27][28][29]. Relying on spike-based activity, SNNs also feature an intrinsic suitability for temporal information processing, which allows them to treat time as an additional dimension of the input signals [30,31]. As a consequence of their brain-like, or at least brain-inspired, properties, these network models are of primary interest as a natural programming paradigm in neuromorphic computing.
Complementary to neural computing, whose primary aim is the implementation of ANNs to deal with practical tasks, neuromorphic computing brings the attention to the mimicking of neural processes in new and alternative computer architectures [32,33], with the direct consequence of driving efforts towards the development of specific neuromorphic hardware [34][35][36][37][38][39][40][41]. Besides these platforms, neuromorphic simulators have also attracted significant interest, resulting in powerful tools able to sustain the growth of neuromorphic computing even in cases where dedicated hardware is not readily available [42]. Among them, an intriguing example is represented by Nengo. Based on the Neural Engineering Framework (NEF) [43], it allows networks to be built from single neuron models, providing the keys to access low-level neural archetypes to perform high-level functional tasks [44]. Additionally, the front-end API is designed to make it flexible and easy to adapt to specialized neuromorphic platforms, such as Intel's Loihi [45], as well as to deep learning models [46].
In this work, we present a comparative analysis of different models and architectures for the HAR task. We adopted the Wireless Sensor Data Mining (WISDM) smartphone and smartwatch activity and biometrics dataset [47,48] to investigate a classification approach based on raw data only. In particular, employing Nengo, we benchmark the beneficial effect of adopting a neuromorphic paradigm as an alternative to classical deep learning solutions, thus presenting, to the best of our knowledge, the first evaluation of bio-inspired models for HAR directly from raw data.

Background
In recent years, a rich literature has been produced in the HAR domain, and deep learning techniques have been extensively applied in a number of works testing them on different datasets [49]. Similarly, neuromorphic computing has been attracting growing interest, leading to an ongoing increase of attention on bio-inspired networks and dedicated hardware [50].

Human activity recognition
Independently of the targeted application, ranging from healthcare to surveillance, body monitoring focussed on movement can be classified according to the sensors or sensor network characteristics. In this regard, during the last decade, following the increasing amount of data easily recorded and collected through personal and non-invasive devices, a natural selection in favour of wearable sensors, generally embedded in smart devices, started to take place in the HAR field. As a result, many works adopting deep learning techniques, with solutions relying on convolutional neural networks (CNNs), recurrent neural networks (RNNs) or their combinations, have been produced for HAR tasks based on, or intended for, portable platforms [51][52][53][54][55][56][57]. In a high number of cases, smartphone sensors, typically inertial measurement units (IMUs), are employed in the gathering of datasets, as in the case of WISDM [58] and UCI-HAR [59] (also referred to as SBHAR [60]). Besides these well-established ones, a newer version of the WISDM dataset [47,48] is also attracting growing interest, due to its more balanced classes and to the addition of smartwatch signals. Other wearable devices and sensors are then taken into account by datasets like PAMAP2 [61,62], MHEALTH [63,64], OPPORTUNITY [65,66], PUC-Rio [67], WHARF [68], USC-HAD [69], and UTD-MHAD [70].
An example of benchmarking for network architectures on various datasets can be found in [71], where an extensive analysis is carried out involving different techniques and data. Furthermore, interesting insights into the impact of data segmentation on the classification accuracy are given. Identifying an optimal window size for time-varying signals like those treated in HAR is indeed of key importance from a twofold perspective. It can lead to high levels of accuracy, thus providing more reliable classifiers, while in terms of time-to-classify it can be crucial to assess suitability for real-time applications.
In [72], a summary of representative window sizes employed in the HAR task is reported, showing that typical choices fall between 1 s and 10 s. Exceptions can be found in works by Ordoñez et al [73], by Wan et al [74] and by Xia et al [75], where temporal windows down to 0.25 s are employed for different datasets. Mekruksavanich and co-workers [76,77], as well as Oluwalade et al [78] and Ihianle et al [79], adopted instead a signal segmentation of 10 s.

WISDM dataset
In 2019, the Wireless Sensor Data Mining (WISDM) Lab published the WISDM smartphone and smartwatch activity and biometrics dataset [47,48]. Composed of data from 51 subjects performing 18 activities, this dataset collects signals from both the accelerometer and the gyroscope of a smartphone and a smartwatch. Each activity is recorded for 3 min with an acquisition rate of 20 Hz. With respect to the older WISDM dataset [58], this version is not only enriched in number of activities but also improved in terms of class balance, with each activity represented in the dataset with a relative contribution ranging from 5.3% to 5.8% of the 15 630 426 total samples. Additionally, three subsets can be identified within the dataset according to activity type: non-hand-oriented, general hand-oriented and eating hand-oriented. As an example, the kernel density estimation of the 3D smartwatch data from accelerometer and gyroscope of the 'general, hand-oriented' subset of the WISDM dataset is shown in figure 1. Here, an overlap between the raw signal values can be appreciated.

Benchmarks and IoT applications of neuromorphic solutions
The actual chances of success for neuro-inspired and neuromorphic approaches promising energy efficiency improvements have been increasingly tested over the last few years. In [80], Blouw and co-workers benchmark different hardware performing keyword spotting tasks, showing a significantly reduced energy consumption when using Intel's Loihi chip. Researchers' achievements using Loihi are summarized in [33], and the platform has also been compared with a SpiNNaker 2 prototype by Yan et al [81].
In [82], the effectiveness of an ANN-to-SNN conversion addressing the heartbeat classification task and subsequently deployed on Loihi is evaluated. An extended benchmarking of neuromorphic hardware is then provided by Azghadi et al [83], who tested multiple platforms on biomedical applications.
In [84], the benefits of a neuromorphic approach are highlighted, assessing the computational cost reduction provided by SNNs developed in Nengo with respect to architecturally identical deep neural networks (DNNs).
In [85,86], authors have investigated the advantages of using the SpiNNaker neuromorphic architecture [34] for executing massively parallel general-purpose algorithms such as PageRank and DNA sequence matching, implemented with the MPI paradigm.
The internet of things (IoT) is forecast to be one of the fields which will most benefit from the development of neuromorphic models and technologies. A survey of IoT platforms enabling artificial intelligence (AI) applications has been proposed by Kim et al [87], while the impact of neuromorphic systems on Industry 4.0 has been investigated in [88]. The role of edge computing in the artificial intelligence of things (AIoT) field, as well as in healthcare and other smart environments, has been instead reviewed by Chang et al [89]. Promising results for event-driven on-edge applications of AI in the IoT field have been shown in [41], presenting the neuromorphic IC μBrain.

Nengo
Relying on the Neural Engineering Framework [43] as the guiding principle to build neural models accounting for functional objectives as well as anatomical constraints, Nengo was built as a simulator able to provide sophisticated networks featuring cognitive abilities starting from single neuron models [44].
The three NEF principles, namely representation, transformation and dynamics, are translated by Nengo into the fundamental units for network construction, defining three core objects called ensemble, node and connection. Their combinations produce two further objects, network and model, while probe is defined as the object used to gather data during simulations. This set of six front-end objects represents the toolkit to build the neural model to be passed to the simulator, which in turn encloses the back-end logic for the network simulation.
A key feature of Nengo is the flexibility of its simulator, ensured by the possibility of adapting it to specific, and possibly specialized, hardware [46]. For instance, NengoLoihi is a specialized backend for running Nengo models on Intel Loihi. Furthermore, as a result of this adaptability, models from different frameworks can be simply integrated through NengoDL's converter, which translates deep learning models by replacing standard activation functions with Nengo's spiking neurons.

Legendre memory unit
Neural communication relies on complex processes resulting in transmission and filtering of spikes through synapses. These mechanisms can be modelled by means of ordinary differential equations (ODEs) integrated over time, which make it possible to approximate the behaviour of time cells [90,91]. The Legendre memory unit (LMU) is a recurrent architecture able to perform such an approximation for a continuous-time delay [92]. The main property of the LMU network is the capability of decoding a delayed signal u(t − θ), contained within a sliding window of length θ, through a high-dimensional projection of the input u(t) that is orthogonalized using the shifted Legendre polynomials [93]. The ith shifted Legendre polynomial is given by equation (1), and it is used to delay the input signal through equation (2), where the highest order d − 1 in the series expansion is related to the dimension of the state vector m(t). The state vector is driven by the input u(t) according to equation (3), with A and B representing the ideal state-space matrices derived using the Padé approximants through equations (4) and (5).
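For reference, equations (1) to (5) can be written as follows in the standard LMU formulation of [92]. The evaluation point θ′ within the window and the index ranges are notation assumed here for illustration.

```latex
\begin{align}
P_i(r) &= (-1)^i \sum_{j=0}^{i} \binom{i}{j} \binom{i+j}{j} (-r)^j \tag{1} \\
u(t-\theta') &\approx \sum_{i=0}^{d-1} P_i\!\left(\frac{\theta'}{\theta}\right) m_i(t),
  \qquad 0 \le \theta' \le \theta \tag{2} \\
\theta\,\dot{m}(t) &= A\,m(t) + B\,u(t) \tag{3} \\
A &= [a]_{ij} \in \mathbb{R}^{d \times d}, \qquad
  a_{ij} = (2i+1)
  \begin{cases} -1 & i < j \\ (-1)^{i-j+1} & i \ge j \end{cases} \tag{4} \\
B &= [b]_{i} \in \mathbb{R}^{d \times 1}, \qquad
  b_i = (2i+1)(-1)^i, \qquad i, j \in [0, d-1] \tag{5}
\end{align}
```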
Although little literature has been produced so far on LMU applications, remarkable results have already been reported, showing state-of-the-art outcomes in terms of accuracy and interestingly small numbers of parameters when performing keyword spotting [94].
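To make the LMU memory more concrete, the following is a minimal sketch in plain Python, assuming the usual closed-form expressions for the state-space matrices, a_ij = (2i+1)·(−1 if i < j, else (−1)^(i−j+1)) and b_i = (2i+1)(−1)^i, and a simple forward-Euler integration of θ·dm/dt = A·m + B·u. The function names are hypothetical, and the integration scheme is chosen only for readability; it is not the discretization used in Nengo or in this work.

```python
from math import comb


def lmu_matrices(d):
    """Build the (A, B) state-space matrices of an order-d LMU as nested lists."""
    A = [[0.0] * d for _ in range(d)]
    B = [0.0] * d
    for i in range(d):
        B[i] = (2 * i + 1) * (-1.0) ** i
        for j in range(d):
            sign = -1.0 if i < j else (-1.0) ** (i - j + 1)
            A[i][j] = (2 * i + 1) * sign
    return A, B


def shifted_legendre(i, r):
    """Evaluate the i-th shifted Legendre polynomial at r in [0, 1]."""
    return (-1) ** i * sum(
        comb(i, j) * comb(i + j, j) * (-r) ** j for j in range(i + 1)
    )


def run_memory(inputs, theta, d, dt):
    """Integrate theta * dm/dt = A m + B u with forward Euler; return the final m."""
    A, B = lmu_matrices(d)
    m = [0.0] * d
    for u in inputs:
        dm = [
            (sum(A[i][j] * m[j] for j in range(d)) + B[i] * u) / theta
            for i in range(d)
        ]
        m = [m[i] + dt * dm[i] for i in range(d)]
    return m


def decode(m, relative_delay):
    """Approximate u(t - relative_delay * theta) from the memory state m."""
    return sum(
        shifted_legendre(i, relative_delay) * m_i for i, m_i in enumerate(m)
    )
```

For a constant input, the state settles on the first Legendre coefficient only, so decoding at any delay inside the window returns the input value; this gives a quick sanity check of the matrices.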

Methodology
The unprecedentedly huge amount of data produced in the IoT era is posing renewed challenges to cloud-based solutions relying on back-and-forth transmission from end devices. Possible ways to face these difficulties are offered by the so-called fog computing and edge computing. The latter particularly aims at bringing data processing close to the sensors, moving computation down from the application layer to edge devices [95]. To accomplish this goal, the identification of lighter and less demanding computing solutions is of key importance, and the neuromorphic paradigm can provide well-suited tools.
To investigate this aspect, we propose a comparison of different neural networks of both recurrent and convolutional type, spiking and non-spiking. We also adopt neuro-inspired approaches to the HAR task through innovative solutions like the LMU, pointing out the differences between traditional DNNs and SNNs from a twofold perspective: besides the classification performance, we also evaluate the computational effort and memory demand. Such comparison is performed at the end of the optimization pipeline graphically summarized in figure 2. Here, vertical arrows identify preliminary steps, specifically involving dataset selection (a) and design of the optimization experiment (c) and (d), while horizontal arrows depict the subsequent phases along the backbone of the whole study: the selection of the neural network architectures (b), the hyperparameter optimization (e) and the final achievement of classifiers specifically tailored to HAR (f). In the following subsections we provide more details about each step of our investigation procedure.

Activity subset and time window
HAR, straightforwardly belonging to classification problems, begins with selection of the data, which can be either acquired on purpose or already collected in a dataset. In this work, we employed the data from smart devices available in the WISDM dataset. Specifically, due to the increasing spread of wearable devices and their suitability for tailored applications in different domains, we decided to focus on smartwatch data, glimpsing the opportunity for future adaptation of the proposed neuro-inspired approach to other wrist-worn devices, possibly employed for personalized point-of-care monitoring or other customized purposes to be brought as close as possible to the user. In this perspective, from the whole dataset we selected (step (a) in figure 2) the subset of general, hand-oriented activities: (1) dribbling in basketball, (2) playing catch with a tennis ball, (3) typing, (4) writing, (5) clapping, (6) brushing teeth and (7) folding clothes. With the aim of reducing as much as possible the required computational effort in view of on-edge deployment, the only preprocessing step performed was segmentation.
Figure 3 shows an example of raw 3-axial accelerometer and gyroscope data available in the WISDM dataset. It highlights how classification of the raw input signals is not trivial when they are examined in real time, in the absence of any elaboration, filtering or data aggregation. We divided the signals into temporal windows, without overlap, with a length of 2 s. This choice was the result of an initial exploration including longer windows of 5 s and 10 s. With respect to these, the 2 s window offered a valuable trade-off between the need for a sufficiently high number of temporal data points for each sample and the goal of providing fast-response classifiers in an anthropocentric definition of real time. The resulting 36 201 samples, based on raw data only, without any feature extraction, have then been split into training, validation and test sets with a 60:20:20 proportion.
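The segmentation and split described above can be sketched as follows. This is a minimal illustration in plain Python: the function names are hypothetical, and the random shuffling before the split is an assumption, since the text does not state how samples were assigned to the three sets.

```python
import random


def segment(recording, sampling_rate_hz=20, window_s=2.0):
    """Cut a recording (list of per-timestep samples) into non-overlapping windows."""
    window_len = int(round(sampling_rate_hz * window_s))  # 40 time steps for 2 s at 20 Hz
    n_windows = len(recording) // window_len  # a trailing partial window is dropped
    return [
        recording[k * window_len:(k + 1) * window_len] for k in range(n_windows)
    ]


def split_60_20_20(samples, seed=0):
    """Shuffle (assumed) and split samples into training, validation and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = len(samples) * 60 // 100
    n_val = len(samples) * 20 // 100
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test
```

With these settings, a nominal 3 min recording at 20 Hz (3600 time steps) yields 90 windows of 40 time steps each, and each window is one input sample for the classifiers.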

Network architectures
As previously introduced, HAR can be successfully performed employing either convolutional or recurrent architectures. At step (b) in figure 2 we accounted for both of these architecture types in order to benchmark possible alternatives offered by neuro-inspired solutions. The CNN comprises two convolutional layers followed by a max pooling layer, a flattening layer and two dense layers, as sketched in figure 4(a). We employed the same structure for both non-spiking and spiking convolutional neural networks (figure 4(b)), in the following also referred to as CNN and sCNN respectively. On the other hand, we implemented a recurrent architecture with a structure consisting of a sequence of two long short-term memory (LSTM) layers, each connected to a dropout layer, followed by a dense layer (figure 4(c)).
Differently from the case of the convolutional architectures, our spiking implementation of recurrent networks does not rely on the same architecture adopted in the non-spiking domain: as summarized by figure 4(f), we used the LMU in place of LSTMs, with a single LMU layer instead of the repeated LSTM-dropout pair. To further enrich the network comparison and benchmarking, we employed the LMU in a non-spiking network as well (figure 4(d)). The sCNN, the LMU-based network and its spiking version (sLMU) have been implemented by means of the Nengo neural simulator, employing the NengoDL converter to build the spiking CNN directly from its non-spiking counterpart.
Figure 4 (partial caption). The recurrent architectures have instead different structures in the two domains: LSTM units followed by a dropout layer have been employed for the non-spiking implementation (c), while the recurrent SNN has been obtained using the LMU (f). The latter has first been adopted in the non-spiking domain (d). An additional variation with the introduction of a frequency filtering on the input has been explored for both the non-spiking (e) and the spiking (g) LMU-based architectures.
We also worked with Nengo to investigate the impact of a human-inspired feature, borrowed from the auditory system, on networks based on the LMU: by analogy with the cochlea, we introduced a frequency filter (ff) on the input (figures 4(e) and (g)), decomposing the original signals into five channels through the application, differently from the biological system, of a Butterworth filter bank. In all of the spiking networks under investigation we adopted the rectified integrate and fire neuron model available in Nengo and supported in the Loihi neuromorphic chip.
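As an illustration of the neuron model mentioned above, the following is a simplified sketch of a rectified integrate and fire neuron in plain Python. It is not Nengo's implementation: the function name is hypothetical, and the model is reduced to its essentials, assuming a membrane voltage that integrates the rectified input and emits one spike each time it crosses 1, so that the firing rate equals the rectified input.

```python
def rectified_if_spikes(input_rates_hz, dt=0.001):
    """Return the number of spikes emitted at each time step.

    input_rates_hz: input drive per time step, interpreted as a target rate in Hz.
    dt: simulation time step in seconds (1 ms assumed here).
    """
    voltage = 0.0
    spikes = []
    for rate in input_rates_hz:
        voltage += max(rate, 0.0) * dt  # rectification: negative input is ignored
        n_spikes = int(voltage)         # whole threshold crossings in this step
        voltage -= n_spikes             # keep the sub-threshold remainder
        spikes.append(n_spikes)
    return spikes
```

Over one second of simulation, a constant drive of 250 Hz produces about 250 spikes, while a negative drive produces none, which is the spiking analogue of a rectified linear activation.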

Hyperparameter optimization
ANNs can be characterized and described from two complementary perspectives. On the one hand, there is the architecture, namely the number and the type of layers employed and how they are connected to each other; on the other hand, there are the hyperparameters, which specifically identify each network, determining its inherent behaviour. Consequently, as also pointed out in [96], hyperparameter optimization (HPO) must be accounted for when different network topologies are investigated and compared, especially in cases where unnecessary complexity must be prevented.
Steps from (c) to (e) in figure 2 summarize the procedure for the hyperparameter tuning we performed by means of the neural network intelligence (NNI) toolkit [97], using the built-in annealing algorithm. For each network, we designed an NNI optimization experiment, carried out within a proper search space defined at step (c). Each optimization experiment is composed of 1000 trials, with 4 evenly spaced random re-initializations of the tuner, intended to partially mitigate the problem of local minima affecting annealing algorithms [98]. At the end of every trial, consisting of 100 training epochs, the weights providing the best training accuracy have been extracted to evaluate the test accuracy, defined to be the optimization objective of the experiment. For all the investigated networks, training has been performed employing the Adam optimizer with constant learning rate, with the latter included among the hyperparameters optimized throughout the experiment trials. All these settings for the NNI experiments are defined in step (d) of figure 2. A summary of the hyperparameters contained in the search spaces employed for the HPO is reported in table 1.
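The structure of such an experiment, 1000 trials with 4 evenly spaced random re-initializations, can be sketched with a generic simulated-annealing loop. The code below is a plain-Python illustration and does not reproduce the NNI tuner: the function name, the neighbour rule, the temperature schedule and the search-space format are all assumptions made for the example.

```python
import math
import random


def anneal_search(objective, search_space, n_trials=1000, n_restarts=4,
                  initial_temperature=1.0, cooling=0.97, seed=0):
    """Maximize objective(config) over a discrete search space.

    search_space maps each hyperparameter name to a list of candidate values.
    """
    rng = random.Random(seed)
    names = sorted(search_space)
    restart_every = max(1, n_trials // (n_restarts + 1))

    def random_config():
        return {name: rng.choice(search_space[name]) for name in names}

    current = random_config()
    current_score = objective(current)
    best, best_score = dict(current), current_score
    temperature = initial_temperature

    for trial in range(1, n_trials):
        if trial % restart_every == 0:
            # Evenly spaced random re-initialization to escape local optima.
            current = random_config()
            current_score = objective(current)
            temperature = initial_temperature
        else:
            # Propose a neighbour by resampling a single hyperparameter.
            candidate = dict(current)
            name = rng.choice(names)
            candidate[name] = rng.choice(search_space[name])
            score = objective(candidate)
            delta = score - current_score
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature decreases.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, score
            temperature *= cooling
        if current_score > best_score:
            best, best_score = dict(current), current_score
    return best, best_score
```

In an actual experiment, `objective` would train the network for the configured number of epochs and return the accuracy used as optimization target; here it can be any function mapping a configuration to a score.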

Comparison criteria
At the end of the proposed pipeline, labelled as step (f) in figure 2, we obtained a trained classifier with optimized hyperparameters for each network architecture. We then set out to compare these classifiers, with the goal of assessing the advantages offered by neuro-inspired approaches without taking the risk of evaluating them from a narrow perspective mainly focussed on accuracy.
Table 1 (fragment). Step size for weights update in each learning iteration. a The default value in Nengo of 1 ms is used. b All the hyperparameters for the non-spiking LMU are specifically re-optimized for the spiking implementation.
In order to make a comprehensive comparison between networks which rely not only on different architectures but also on different inherent working principles, we adopted multiple metrics besides the classification accuracy. The number of parameters and the memory footprint have been considered for all the networks. In the case of non-spiking networks, we evaluated the number of floating point operations (FLOPs) and the corresponding estimated energy consumption on Intel's Movidius Neural Compute Stick 2. For spiking networks, instead, we assessed the number of neurons, the number of synaptic operations (SOPs), and the corresponding estimated energy consumption on Intel's Loihi. Our energy evaluations rely on the results presented in [84].
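The energy estimate described above amounts to multiplying an operation count by a per-operation energy cost for the target hardware. A minimal sketch of this arithmetic follows; the function name is hypothetical, and the per-operation energy must be taken from [84], so the value used in the usage note is a placeholder rather than a measured figure.

```python
def energy_per_inference_uj(n_operations, energy_per_operation_j):
    """Estimate the energy per inference in microjoules.

    n_operations: FLOPs (non-spiking) or SOPs (spiking) for one inference.
    energy_per_operation_j: energy cost of one operation on the target
        hardware, in joules (to be taken from the reference measurements).
    """
    return n_operations * energy_per_operation_j * 1e6
```

For instance, with a placeholder cost of 1 nJ per operation, a network requiring one million operations per inference would be estimated at 1000 μJ.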

Results and discussion
Neural network benchmarking, and classifier comparison in general, is naturally prone to the risk of an oversimplification taking shape in the evaluation of accuracy as the only meaningful metric; and in the neuromorphic domain, such a simplistic approach can become even more deceptive. Although it certainly plays a decisive role, classification accuracy cannot be considered as independent of other figures of merit like energy consumption or memory footprint. Both of these quantities indeed bring with them crucial information for a deeper evaluation, and knowledge, of neuro-inspired solutions to classification problems. Especially when SNNs achieve classification performances comparable to other non-spiking DNNs, accuracy alone might not be enough to propose a fair comparison and a valuable benchmarking. Consequently, as already introduced, in this work we accounted for multiple metrics. For each network, they have been evaluated taking into account the optimal hyperparameter configuration provided by specifically designed NNI experiments, each of them carried out performing 1000 trials, thus guaranteeing a comparable development effort in optimizing the parameters for the different solutions.
Table 2 summarizes the considered metrics together with the corresponding values for each network. The optimized hyperparameters of the different architectures are instead reported in the supplementary material in tables S1 and S2 for the non-spiking and the spiking networks respectively (available online at https://stacks.iop.org/NCE/2/014006/mmedia). In order to highlight the advantages of a neuromorphic approach to temporal signal classification, an energy vs accuracy diagram can be employed to effectively show how relevant the gain in terms of energy reduction is with respect to a possible drop in classification accuracy. The results presented here highlight that all the investigated SNNs and all the LMU-based networks ensure an energy consumption at least one order of magnitude smaller than that of traditional DNNs. Concerning the memory footprint, a similar conclusion can be drawn, with CNN and LSTM turning out to be the largest networks. From the accuracy standpoint, instead, spiking LMUs provide performances comparable to both CNN and LSTM, even overcoming the former.
Following its undeniable, although not unrivaled, leading role, classification accuracy is however the first quantity to be taken into account. The best result from this perspective is given by the LSTM-based network (table 2), which scores (96.42 ± 0.03)%. Interestingly, the second highest accuracy is achieved by its spiking counterpart. The recurrent network based on the spiking implementation of the LMU, relying on rectified integrate and fire neurons, indeed provides a test accuracy of (94.51 ± 0.15)%, which turns out to overcome the performances of the convolutional architecture regardless of whether it is adopted in the spiking or non-spiking domain. Similarly, the spiking LMU enriched with a frequency filtering inspired by the auditory system outperforms both spiking and non-spiking CNNs, with a test accuracy of (94.39 ± 0.13)%. To extend these results with reference to the employed dataset, and to complete the picture offered by the classification accuracies, the confusion matrices produced by all the investigated networks on the test set are reported in the supplementary material in section S2. Additionally, in section S3 a comparison with other works adopting DL and ML techniques on the WISDM dataset is presented, showing that the SNNs investigated in this work can match state-of-the-art results, even overcoming them in some cases.
Figure 6. A radar chart of the presented results allows an easy comparison of the investigated networks, focussing on each of the evaluated metrics. It is worth noting that the traditional DNN architectures considered here, namely the LSTM and the CNN, are outperformed by the alternative ones, based on the LMU, in all the metrics related to energy and memory. Similarly, the spiking CNN also provides significant improvements with respect to LSTM and non-spiking CNN in terms of both energy and memory.
Further along the rows of table 2, the second metric we considered is the total number of parameters, which directly leads to the memory footprint. From this perspective, what seemed a plausible forecast looking at the classification accuracy, namely the LSTM-based network as the optimal solution, is overturned. With more than two million parameters, this architecture is indeed by far the most demanding in terms of memory footprint, with a size of 8.50 MB. At the opposite end is the network built on the non-spiking LMU, which is more than one order of magnitude smaller, with only 0.30 MB of memory footprint. Similar values are also found for LMU (ff) and sLMU, while the spiking LMU with frequency filtering slightly exceeds these values, almost reaching those of the convolutional architectures. The relative size of the different networks can be further appreciated from the circle diameters in figure 5.
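The link between parameter count and memory footprint can be checked with a back-of-the-envelope calculation. The sketch below assumes 4 bytes per parameter (32-bit floats) and 1 MB = 10^6 bytes; neither assumption is stated in the text, and the function names are hypothetical.

```python
def footprint_mb(n_parameters, bytes_per_parameter=4):
    """Memory footprint in MB, assuming 1 MB = 1e6 bytes and 32-bit parameters."""
    return n_parameters * bytes_per_parameter / 1e6


def parameters_from_footprint(size_mb, bytes_per_parameter=4):
    """Approximate parameter count corresponding to a footprint in MB."""
    return round(size_mb * 1e6 / bytes_per_parameter)
```

Under these assumptions, the 8.50 MB reported for the LSTM-based network corresponds to roughly 2.1 million parameters, in line with the "more than two million" quoted above, while 0.30 MB corresponds to about 75 000 parameters.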
Combining the results discussed above, it is straightforward to identify the best network in terms of accuracy and the one with the smallest memory footprint. However, these two networks do not coincide, and they are even far from each other on both metrics, so that only partial conclusions can be drawn unless energy consumption is assessed. Taking this further step, namely quantifying the advantage of adopting a neuromorphic approach for the considered task, a triplet of fundamental quantities is eventually extracted from each network, making the reported benchmarking not only a comparison of values but also a tool to target possible future applications of the proposed neuro-inspired approach. The bottom two rows of table 2 highlight that energy consumption is assessed with reference to two different, specialized hardware platforms: Intel Movidius Neural Compute Stick 2 for the non-spiking networks and Intel Loihi for the spiking networks. In both cases, quantitative evaluations are made through the results of [84], which provide the energy cost of a single operation. In table 2, both the number of operations and the required energy per inference are reported for all the investigated networks. The same results are also presented in figure 5, where the energy is on the y-axis and the number of operations defines the colour of the circles. As clearly shown, and as expected, all the spiking networks are less computationally expensive, with the spiking CNN providing the lowest energy value of 5.49 μJ. It is also worth noting that the assessed energy consumption of all the LMU-based networks is at least one order of magnitude smaller than that of CNN and LSTM. Once again, the highest value is provided by the latter, even though in this case, as for the memory footprint, the highest value does not correspond to the best result. With more than 3000 μJ, this architecture is indeed almost three orders of magnitude more energy-hungry than the sCNN. Within the range defined by these two opposite ends, the sLMU turns out to be the one with the highest accuracy at low energy cost: it requires 50.66 μJ to achieve the second best accuracy reported here, which means it is about two orders of magnitude lower in energy than the LSTM-based architecture while being comparable in accuracy.
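The order-of-magnitude statements above can be checked with a few lines of arithmetic. The sketch below uses only the energy values quoted in the text; since the LSTM figure is given as "more than 3000 μJ", 3000 μJ is used as a lower bound, so the resulting gaps are lower bounds as well. The dictionary and function names are illustrative.

```python
import math

# Energy per inference in microjoules, as quoted in the text.
# The LSTM value is a lower bound ("more than 3000 uJ").
ENERGY_UJ = {"LSTM": 3000.0, "sLMU": 50.66, "sCNN": 5.49}


def orders_of_magnitude(high: float, low: float) -> float:
    """Base-10 logarithm of the ratio between two positive quantities."""
    return math.log10(high / low)


lstm_vs_scnn = orders_of_magnitude(ENERGY_UJ["LSTM"], ENERGY_UJ["sCNN"])
lstm_vs_slmu = orders_of_magnitude(ENERGY_UJ["LSTM"], ENERGY_UJ["sLMU"])

print(f"LSTM vs sCNN: at least {lstm_vs_scnn:.2f} orders of magnitude")
print(f"LSTM vs sLMU: at least {lstm_vs_slmu:.2f} orders of magnitude")
```

With these inputs the LSTM/sCNN gap is a little over 2.7 orders of magnitude ("almost three") and the LSTM/sLMU gap is close to 1.8 ("about two").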
The trade-off between high classification accuracy and small energy consumption offered by the spiking LMU, coupled with its reduced memory footprint, makes this architecture a relevant candidate for possible on-edge applications of neuromorphic classifiers for real-time tasks. With a view to evaluating the different architectures for specific tasks and applications, the presented results are also summarized in figure 6, where a radar chart is used to further highlight the strengths and weaknesses of the investigated networks.
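One possible way to turn the accuracy, energy and memory triplet into an application-driven choice is a Pareto filter followed by deployment constraints. The sketch below is not the procedure used in this work: the candidate names and all numerical values are hypothetical placeholders, and the constraint thresholds stand in for whatever a target device would impose.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    accuracy: float   # higher is better
    energy_uj: float  # lower is better
    memory_mb: float  # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every metric and better on at least one."""
    no_worse = (a.accuracy >= b.accuracy and a.energy_uj <= b.energy_uj
                and a.memory_mb <= b.memory_mb)
    better = (a.accuracy > b.accuracy or a.energy_uj < b.energy_uj
              or a.memory_mb < b.memory_mb)
    return no_worse and better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def select(cands: list[Candidate], max_energy_uj: float,
           max_memory_mb: float) -> Optional[Candidate]:
    """Most accurate non-dominated candidate that fits the device budget."""
    feasible = [c for c in pareto_front(cands)
                if c.energy_uj <= max_energy_uj and c.memory_mb <= max_memory_mb]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Hypothetical placeholder values, NOT the results of table 2.
candidates = [
    Candidate("net_A", accuracy=0.95, energy_uj=1000.0, memory_mb=8.0),
    Candidate("net_B", accuracy=0.93, energy_uj=50.0, memory_mb=0.5),
    Candidate("net_C", accuracy=0.88, energy_uj=5.0, memory_mb=2.0),
    Candidate("net_D", accuracy=0.85, energy_uj=80.0, memory_mb=3.0),
]

front = pareto_front(candidates)
choice = select(candidates, max_energy_uj=100.0, max_memory_mb=1.0)
print([c.name for c in front])
print(choice.name if choice else "no feasible network")
```

In this toy setting net_D is discarded because net_B is better on all three metrics, and the budget then excludes the most accurate but most demanding network, which mirrors the kind of reasoning that leads to a compromise architecture.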

Conclusion
HAR is a time-dependent task whose application domain extends to all aspects of human life, from healthcare to sport, from safety to smart environments. In this paper, starting from the HAR problem, we have proposed an operational strategy to identify the optimal solution given a target application. Specifically, we have shown how a neuromorphic approach can be adopted to deal with time-varying inputs while accounting for possible deployment constraints. By performing multiple optimization experiments, we have investigated the characteristics and the performance of multiple neural networks, highlighting advantages and drawbacks of recurrent and convolutional architectures with both spiking and non-spiking implementations. In this regard, we have reported a significant reduction in energy consumption for all the investigated spiking networks with respect to their non-spiking counterparts. In more detail, among these SNNs, the spiking implementation of the LMU has been identified as the optimal solution to achieve high classification accuracy with low energy consumption. With the analysis presented in this work, we have hence shown a suitable procedure to evaluate the possible benefits of a neuromorphic classifier for on-edge AIoT applications.

Figure 1. Kernel density estimation of the values recorded from the smartwatch on the 6 IMU sensor axes for the 7 classes in the 'general, hand-oriented' subset of the WISDM dataset.

Figure 2. The procedure we adopted can be divided into two complementary phases. On the one hand, depicted by the vertical arrows, there are the preliminary steps: dataset selection in (a), hyperparameter search space definition in (c) and optimization experiment configuration in (d). On the other hand, there is the backbone of the pipeline, summarized by the horizontal arrows: neural network architecture selection in (b), hyperparameter optimization in (e) and classifier evaluation in (f).

Figure 3. A comparison of representative 10 s samples recorded by the smartwatch on the 6 IMU sensors for the 7 classes in the 'general, hand-oriented' subset of the WISDM dataset.

Figure 4. Summary of the investigated networks. The convolutional architecture adopted in the non-spiking domain (a) has been translated into the spiking domain (b) by means of the converter in NengoDL. The recurrent architectures instead have different structures in the two domains: LSTM units followed by a dropout layer have been employed for the non-spiking implementation (c), while the recurrent SNN has been obtained using the LMU (f). The latter was first adopted in the non-spiking domain (d). An additional variation, introducing frequency filtering on the input, has been explored for both the non-spiking (e) and the spiking (g) LMU-based architectures.

Figure 5. In order to highlight the advantages of a neuromorphic approach to temporal signal classification, an energy vs accuracy diagram can be employed to show how relevant the gain in terms of energy reduction is with respect to a possible drop in classification accuracy. The results presented here highlight that all the investigated SNNs and all the LMU-based networks ensure an energy consumption at least one order of magnitude smaller than that of traditional DNNs. Concerning the memory footprint, a similar conclusion can be drawn, with CNN and LSTM turning out to be the largest networks. From the accuracy standpoint, instead, spiking LMUs provide performance comparable to both CNN and LSTM, even surpassing the former.

Table 1. Summary and description of the optimized hyperparameters. For the spiking networks, all the hyperparameters reported for the corresponding non-spiking implementation are taken into account as well.
… the synaptic low-pass filter on the input connection of the LMU
synapse_out: Time constant of the synaptic low-pass filter on the output connection of the LMU
Tau: Time constant of the discretized synaptic low-pass filter on the internal connections to memory
Spiking LMU b
n_neurons: In place of units, size of the neuron ensembles (whose number is defined by order)
synapse_all: Time constant of the synaptic low-pass filter on the connections between neuron ensembles
max_rate: Firing rate for neuron input equal to 1