Deep Unsupervised Learning Using Spike-Timing-Dependent Plasticity

Spike-Timing-Dependent Plasticity (STDP) is an unsupervised learning mechanism for Spiking Neural Networks (SNNs) that has received significant attention from the neuromorphic hardware community. However, scaling such local learning techniques to deeper networks and large-scale tasks has remained elusive. In this work, we investigate a Deep-STDP framework where a rate-based convolutional network, that can be deployed in a neuromorphic setting, is trained in tandem with pseudo-labels generated by the STDP clustering process on the network outputs. We achieve $24.56\%$ higher accuracy and $3.5\times$ faster convergence speed at iso-accuracy on a 10-class subset of the Tiny ImageNet dataset in contrast to a $k$-means clustering approach.


I. INTRODUCTION
With high-quality AI applications permeating our society and daily lives, unsupervised learning is gaining increased attention as the cost of procuring labeled data has been skyrocketing concurrently. The ever-more data-hungry machine learning models usually require a very large amount of labeled data, sometimes requiring expert knowledge, to achieve state-of-the-art performance today. Since manual annotation requires a huge investment of resources, unsupervised learning is naturally emerging as an attractive alternative.
One of the most prominent unsupervised learning methods is clustering. The main concept of clustering is to compress the input data (like images in the case of computer vision problems) into lower dimensions such that the low-dimensional features can be clustered into separable groups. The efficiency of the sample clustering process improves with better representations of the compressed features. Since the quality of features depends only on the dimension reduction algorithm, the design and choice of the clustering method are critical to the success of unsupervised learning. However, most real-world tasks are not easily represented as separable low-dimensional points. Earlier attempts include classical PCA reduction before clustering [1], while others attempt to augment more features with "bags of features" [2]; these, however, are mostly constrained to smaller tasks. Recent works like DeepCluster have explored scaling of unsupervised learning approaches by incorporating the k-means clustering algorithm with a standard Convolutional Neural Network (CNN) architecture that can learn complex datasets such as ImageNet without any labels [3]. Some works have also shown that pre-training the network, even unsupervised, is beneficial to building the final model in terms of accuracy and convergence speed [4]-[6].
The focus of this article, however, is on scaling unsupervised learning approaches in a relatively nascent, bio-plausible category of neural architectures: Spiking Neural Networks (SNNs). SNNs have been gaining momentum for empowering the next generation of edge intelligence platforms due to their significant power, energy, and latency advantages over conventional machine learning models [7], [8]. One of the traditional mechanisms of training SNNs is through Spike-Timing-Dependent Plasticity (STDP), where the model weights are updated locally based on firing patterns of connecting neurons, inspired by biological measurements [9]. STDP based learning rules have been attractive to the neuromorphic hardware community, where various emerging nanoelectronic devices have been demonstrated to mimic STDP based learning rules through their intrinsic physics, thereby leading to compact and resource-efficient on-chip learning platforms [10]. Recent works have also demonstrated that unsupervised STDP can serve as an energy-efficient hardware alternative to conventional clustering algorithms [11].
However, scaling STDP trained SNNs to deeper networks and complex tasks has remained a daunting task. Leveraging insights from hybrid approaches to unsupervised deep learning like DeepCluster [3], we aim to address this gap to enable deep unsupervised learning for SNNs. Further, while techniques like DeepCluster have shown promise to enable unsupervised learning at scale, the impact of the choice of the clustering method on the learning capability and computational requirements remains unexplored. The main contributions of the paper can therefore be summarized as follows:
(i) We propose a hybrid SNN-compatible unsupervised training approach for deep convolutional networks and demonstrate its performance on complex recognition tasks going beyond toy datasets like MNIST.
(ii) We demonstrate the efficacy of STDP enabled deep clustering of visual features over the state-of-the-art k-means clustering approach and provide justification through empirical analysis using statistical tools, namely the Fisher Information Matrix trace, to show that STDP learns faster and more accurately.
(iii) We also provide preliminary computational cost estimate comparisons of the STDP enabled Deep Clustering framework against conventional clustering methods and demonstrate the potential for significant energy savings.

II. RELATED WORKS
Deep Learning: Unsupervised learning of deep neural networks is a widely studied area in the machine learning community [12], [13]. It can be roughly categorized into two main methods, namely clustering and association. Among many clustering algorithms, k-means [14], or any variant of it [15], [16], is the most well-known and widely used method that groups features according to their similarities. Its applications can be found in practice across different domains [17], [18]. Other approaches focus on associations to learn data representations which are described by a set of parameters using architectures such as autoencoders [19], [20] (where the data distribution is learnt by encoding features in latent space).
In more recent works, such unsupervised learning methods have been applied to larger and more complex datasets [3], making them applicable to more difficult problems. Further, recent advances in generative models have also provided opportunities for mapping unlabeled data to its underlying distribution, especially in the domain of image generation using Generative Adversarial Networks (GANs) [21] with reconstruction loss directly [22] or using the auto-encoded latent space [22]-[24]. Dumoulin et al.'s recent effort at combining GAN and auto-encoder has demonstrated even better performance [22].

Bio-Plausible Learning: Visual pattern recognition is also of great interest in the neuromorphic community [25], [26]. In addition to standard supervised vision tasks, SNNs offer a unique solution to unsupervised learning: the STDP learning method [9]. In this scheme, the neural weight updates depend only on the temporal correlation between spikes without any guiding signals, which makes it essentially unsupervised. While it offers a bio-plausible solution, it is rarely used beyond MNIST-level tasks [9], [27], [28] and is primarily used for single-layered networks. Going beyond conventional STDP based learning, Lee et al. [27] proposed an STDP-based pre-training scheme for deep networks that greedily trained the convolutional layers' weights, locally using STDP, one layer at a time, but it was limited to MNIST. Similarly, in Ferre et al.'s work [29], the convolutional layers were trained on CIFAR10 and STL-10 with simplified STDP, but the layers were also trained individually with complex mechanisms. Beyond the STDP framework, some studies draw inspiration from alternative biological mechanisms such as local learning (DECOLLE) [30], equilibrium-state-based learning (Equilibrium Propagation) [31]-[33], and Implicit Differentiation [34]-[36], among others, to achieve bio-plausible learning with much less dependence on the gradient. However, most of these works involve significantly more complex hardware implementation than STDP based learning approaches. For instance, DECOLLE requires the local computation of loss and backpropagation of errors at each layer, thereby introducing additional overhead. Similarly, Equilibrium Propagation requires the determination of the rate of change of the spiking rate of the neurons to perform local weight updates [32].
Our work explores a hybrid algorithm design based on a merger of the above two approaches. Our proposed framework provides a global training signal for the CNN using a straightforward and end-to-end STDP-based SNN implementation. We demonstrate significant accuracy improvement and computation savings for the VGG-15 architecture on the Tiny ImageNet dataset in contrast to state-of-the-art deep clustering approaches.

III. PRELIMINARIES

A. Deep Clustering with k-means Algorithm
Deep Clustering [3] enabled unsupervised training of visual features primarily relies on the ability of clustering algorithms like k-means to group together similar data points. k-means is a popular unsupervised algorithm for separating data points into distinct clusters. Given a user-specified value of k, the algorithm will find k clusters such that each data point is assigned to its nearest cluster. The vanilla implementation of the k-means algorithm iteratively calculates the Euclidean distance between points for comparison and updates the cluster centroids to fit the given distribution.
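As a minimal sketch of the vanilla k-means iteration described above (not the DeepCluster implementation), the following pure-Python code alternates between nearest-centroid assignment and centroid re-centering. The function names, the random-sample initialization, and the default of 20 iterations are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=20, seed=0):
    """Vanilla k-means sketch; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumption: centroids are initialized at k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [-1] * len(points)
    for _ in range(max_iter):
        # Clustering step: assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assignments
```

Calling `kmeans(points, k=2)` on two well-separated groups of points should return one shared label per group; the label values themselves are arbitrary.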
Deep Clustering utilizes the traditional CNN architecture to obtain the features to be used for clustering. The reason behind this feature reduction choice hinges upon the fact that a randomly initialized and untrained CNN outperforms a simple multilayer perceptron network by a considerable margin [37]. Driven by this observation, the main idea behind this framework is to bootstrap the better-than-chance signal to teach the network and learn the features. This teaching signal is transformed into a 'pseudo-label' so that the network can learn from it. The 'pseudo-labels', which may or may not be the same as the ground truth labels, reflect the direction in which the network weights should be updated. By doing so, the feature extraction layers may become slightly better at recognizing certain features, thereby producing more representative features. The improved features can ideally be more separable, thereby generating higher quality 'pseudo-labels'. By repeating this process iteratively, the CNN should ideally converge by learning the 'pseudo-labels' [3].
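The iterative bootstrapping described above can be sketched as a simple alternating loop. This is only a structural outline: the three callables (`extract_features`, `cluster`, `train_step`) are hypothetical placeholders that a caller would supply, and none of them corresponds to a specific function in the DeepCluster codebase.

```python
def deep_cluster_training(images, extract_features, cluster, train_step, epochs):
    """Alternate between pseudo-label generation and network updates.

    Hypothetical caller-supplied callables:
      extract_features(images)   -> list of feature vectors (network forward pass)
      cluster(features)          -> list of integer pseudo-labels
      train_step(images, labels) -> None (one epoch of backpropagation)
    Returns the pseudo-labels produced at every epoch.
    """
    history = []
    for _ in range(epochs):
        features = extract_features(images)  # 1. compress images into features
        pseudo_labels = cluster(features)    # 2. cluster features into pseudo-labels
        train_step(images, pseudo_labels)    # 3. train the network on the pseudo-labels
        history.append(pseudo_labels)
    return history
```

Keeping the clustering routine as a plug-in argument reflects the point of the comparison in this work: the same loop can be driven by k-means or by any other clustering mechanism.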
Note that the CNN layers used for feature-reduction purposes can be converted into SNN layers with various methods as shown in many recent studies [7], [38]-[41], or trained from scratch using backpropagation through time (BPTT) [42], [43], which opens up the potential for adopting the entire feature reduction in a low-power neuromorphic setting. In this work, we therefore do not focus on the CNN-SNN conversion and train the network by backpropagation without unrolling through time.

B. STDP Enabled Neuromorphic Clustering
STDP is an unsupervised learning mechanism that learns or unlearns neurons' synaptic connections based on spike timings [44]. In particular, the synaptic connection is strengthened when the post-synaptic neuron fires after the pre-synaptic neuron, and the connection is weakened if the post-synaptic neuron fires before the pre-synaptic neuron. The intuition behind STDP follows the Hebbian learning philosophy, where neurons that are activated together and sequentially are more spatio-temporally correlated and thus form a pattern, and vice versa. This learning rule enables the encoding of complex input distributions temporally without the need for guiding signals such as the label. The weights of the neuronal synapses are updated based on spike timings [9] as follows:

$$\Delta w = \begin{cases} A_{+}\, e^{-\Delta t / \beta_{+}}, & \Delta t > 0 \\ -A_{-}\, e^{\Delta t / \beta_{-}}, & \Delta t < 0 \end{cases} \tag{1}$$

where $w$ is the weight, $A_{+/-}$ are the learning rates, $\Delta t$ is the exact time difference between post-neuron and pre-neuron firing, and $\beta_{+/-}$ are the time-constants for the learning windows. In practical implementations, the exact spike timing is usually replaced with a spike trace (see Section IV-B) that decays over time to reduce memory storage for STDP implementation [45].
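To make the timing dependence concrete, the sketch below evaluates a pair-based STDP window for a single pre/post spike pair. It assumes the commonly used exponential window with $\Delta t = t_{post} - t_{pre}$; the default learning rates and time constants are placeholder values, not the hyper-parameters used in this work.

```python
import math


def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, beta_plus=20.0, beta_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre.

    Assumes an exponential learning window; all default values are illustrative.
    """
    if dt > 0:  # post fires after pre: the synapse is strengthened
        return a_plus * math.exp(-dt / beta_plus)
    if dt < 0:  # post fires before pre: the synapse is weakened
        return -a_minus * math.exp(dt / beta_minus)
    return 0.0  # simultaneous spikes are left unchanged in this sketch
```

The magnitude of the update shrinks as the two spikes move further apart in time, which is the property the spike trace approximates in practical implementations.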
STDP training is predominantly explored in Winner-Take-All networks in the literature, which consist of an excitatory layer of neurons with recurrent inhibitory connections [9] (see "STDP Enabled SNN for Clustering" sub-panel in Fig. 2). Such connections create a mechanism called 'lateral inhibition', where activated neurons inhibit other neurons' activities and therefore help the activated neurons accentuate the learning of their own weights. To prevent any neuron from dominating the firing pattern, the second key mechanism is 'homeostasis', which balances the overall activities of the neurons. Homeostasis prevents neurons from runaway excitation or total quiescence. One popular way to achieve this is through adaptive and decaying thresholding, in which after every firing event the firing threshold increases such that the firing neuron requires a higher membrane potential to fire again in the future. Consequently, this will provide opportunities for other neurons in the network to fire and learn the synaptic weights. The critical balance of these two mechanisms ensures stable learning of the SNN. Fig. 1 shows an example of STDP-trained weights of the excitatory neuron layer of an SNN where representative digit shapes are learnt without any label information for the MNIST dataset [46]. Each neuron in the network represents a cluster. By running inferences on the STDP network, we can cluster the inputs according to their corresponding most activated neuron. The learnt weights of each neuron are equivalent to the centroid of the cluster represented by that neuron. The ConvNet is subsequently trained through backpropagation using the pseudo-labels.
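The "most activated neuron" read-out described above amounts to an argmax over per-sample spike counts. The helper below is a hypothetical illustration of that read-out; how ties are broken is not specified in the text, so this sketch simply keeps the lowest neuron index.

```python
def assign_pseudo_labels(spike_counts):
    """Map each sample to the index of its most activated excitatory neuron.

    spike_counts[n][j] is the number of spikes neuron j fired for sample n.
    Ties resolve to the lowest neuron index (an assumption of this sketch).
    """
    return [
        max(range(len(counts)), key=lambda j: counts[j])
        for counts in spike_counts
    ]
```

For example, a sample whose counts are `[0, 3, 1]` would be assigned to cluster 1.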

IV. METHODS

A. Proposed Deep-STDP Framework
As mentioned previously, the convolutional layers of the network compress the input images to a lower dimensional feature space as a one-dimensional vector. In abstract terms, the framework solves the following optimization problem [3]:

$$\min_{w \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{n=1}^{N} \min_{y_n \in \{0,1\}^{k}} \left\| f_\theta(\mathrm{img}_n) - w_{y_n} \right\|_2^2 \quad \text{such that} \quad y_n^{\top} 1_k = 1 \tag{2}$$

where $N$ is the total number of training samples, $y_n$ is the $n$-th optimal neuron assignment encoded as a one-hot vector, $f_\theta$ is the ConvNet forward pass output parameterized by its weights $\theta$, $\mathrm{img}_n$ is the $n$-th input sample, $w_{y_n}$ is the STDP-learnt synaptic weight map of the most activated neuron, $d$ is the feature dimension of the ConvNet output, and $k$ is the number of neurons/clusters in the network. By minimizing the difference between the weights of the neurons and the patterns of the features, we can obtain an SNN that generates optimal assignments of $y_n$ parameterized by weights $w$, which act as the pseudo-labels for our algorithm.
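For intuition, the sketch below evaluates this kind of assignment objective for a fixed weight map: every feature vector is matched to its closest weight vector and the squared distances are averaged. It only scores a given set of weights; it does not perform STDP learning, and the function name and data layout (one weight vector per neuron) are assumptions made for illustration.

```python
def assignment_objective(features, weights):
    """Mean squared distance of each feature to its closest weight vector.

    features: N vectors of length d (network outputs).
    weights:  k vectors of length d (one per neuron/cluster).
    Returns (objective value, list of chosen neuron indices).
    """
    total = 0.0
    assignments = []
    for f in features:
        # Squared Euclidean distance from this feature to every neuron's weights.
        dists = [sum((a - b) ** 2 for a, b in zip(f, w)) for w in weights]
        best = min(range(len(weights)), key=lambda j: dists[j])
        assignments.append(best)
        total += dists[best]
    return total / len(features), assignments
```

A lower objective value indicates that the weight vectors sit closer to the feature patterns they represent.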
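With the pseudo-labels, the network training can be accomplished through the standard minimization problem of network loss, which can be described by:

$$\min_{\theta, \rho} \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}\left( g_\rho\left( f_\theta(\mathrm{img}_n) \right),\, y_n^{*} \right) \tag{3}$$

where $\theta$, $\rho$ are parameters of the ConvNet $f_\theta(\cdot)$ and classifier $g_\rho(\cdot)$ respectively, $\mathcal{L}(\cdot)$ is the loss function, $\mathrm{img}_n$ again is the $n$-th image input, and $y_n^{*}$ is the $n$-th optimal pseudo-label for this iteration.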
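As an illustration of the pseudo-label loss, the sketch below averages a softmax cross-entropy over classifier outputs. The choice of cross-entropy is an assumption (the text only refers to a generic loss function), and the logits are taken as given rather than produced by an actual ConvNet and classifier.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def pseudo_label_loss(all_logits, pseudo_labels):
    """Average loss over all samples; pseudo_labels take the role of targets."""
    losses = [cross_entropy(z, y) for z, y in zip(all_logits, pseudo_labels)]
    return sum(losses) / len(losses)
```

Minimizing this quantity with respect to the network and classifier parameters is what the backpropagation step would do in each epoch.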
However, SNNs only accept discrete spikes as input and therefore the ConvNet feature outputs in floating-point representation (after appropriate pre-processing like PCA reduction and $\ell_2$-normalization [3]) are subsequently rate encoded by a Poisson spike train generator, where the feature values are used as the Poisson distribution rate and sampled from the respective distribution. At the end of the pseudo-label assignment, the STDP enabled SNN resets for the next iteration. This is intuitive since after the ConvNet weight update process, the feature distribution gets shifted and hence a new set of neuron/cluster weights should be learnt by the STDP framework. Algorithms 1-2 describe the overall structure of the proposed Deep-STDP framework shown in Fig. 2.
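A rate encoder of this kind can be sketched as follows. The code uses a per-timestep Bernoulli draw as a discrete-time approximation of a Poisson spike train, treats the feature magnitude (clipped to 1) as the firing probability, and routes positive and negative features to separate spike channels. The function name, the clipping, and the `max_rate` scaling are assumptions for illustration only.

```python
import random


def poisson_encode(features, timesteps, max_rate=1.0, seed=0):
    """Rate-encode signed features into positive and negative spike trains.

    Returns (pos_spikes, neg_spikes), each a list of `timesteps` rows of 0/1
    values with one entry per feature.
    """
    rng = random.Random(seed)
    pos_spikes, neg_spikes = [], []
    for _ in range(timesteps):
        pos_t, neg_t = [], []
        for f in features:
            # Assumption: |f| clipped to [0, 1] acts as the spike probability.
            p = min(abs(f), 1.0) * max_rate
            fired = 1 if rng.random() < p else 0
            pos_t.append(fired if f > 0 else 0)  # positive-valued channel
            neg_t.append(fired if f < 0 else 0)  # negative-valued channel
        pos_spikes.append(pos_t)
        neg_spikes.append(neg_t)
    return pos_spikes, neg_spikes
```

Larger feature magnitudes therefore produce denser spike trains, while a zero-valued feature stays silent.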

B. STDP Enabled SNN for Clustering
Clustering in the SNN is mediated through the temporal dynamics of Leaky Integrate-and-Fire neurons in the excitatory layer. In the absence of any spiking inputs, the membrane potential of neurons in the excitatory layer is represented by $V_{exc}$ at timestep $t$, or simply $V^{t}_{exc}$. It initializes with $V^{t=0}_{exc} = V_{rest}$ and decays according to Eq. 4, where $V_{rest}$ is the resting potential and $V_{decay}$ is the potential decay constant.
Prior works [9] on using SNNs for clustering have mainly dealt with simple datasets without negative-valued features. This is in compliance with the nature of STDP learning for positive-valued spikes. However, in our scenario, we consider negative-valued spiking inputs as well in order to rate encode the negative features provided as output of the ConvNet. In order to enable STDP learning for negative inputs, we decompose the weight map into positive and negative components to learn positive and negative spike patterns respectively. Therefore, in the presence of spikes, the excitatory layer's neuron membrane potential dynamics is updated according to Eq. 5, where the membrane potential is denoted by $V^{t}_{exc}$ at timestep $t$, and the input spikes and pre-synaptic weights are represented by $s^{pre}$ and $w$ respectively (with their positive and negative counterparts). It is worth mentioning here that pre-neurons refer to the input neurons and post-neurons refer to the excitatory layer neurons, since the synapses joining them are learnt by STDP. Further, there is a refractory period parameter $L$ for every neuron, which will only allow execution of Eq. 4 and 5 if the refractory counter, $l$, equals '0'. A spike will be generated when the membrane potential at the current timestep is greater than the membrane threshold:

$$s^{post} = \begin{cases} 1, & V^{t}_{exc} > V_{thr} + \epsilon \ \text{and} \ l = 0 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where $V_{thr}$ is the membrane threshold to fire a spike, $\epsilon$ is the adaptive threshold parameter, and $l$ is the refractory period counter, which is reset to $L$ upon a firing event and decays by 1 otherwise (thereby preventing neurons from firing for $L$ timesteps after a spike). $V^{t}_{exc}$ resets to $V_{reset}$ after firing a spike. The adaptive threshold parameter acts as a balancer to prevent any neuron from being over-active (homeostasis) and is incremented by parameter $\alpha$ upon a firing event and otherwise decays exponentially at every timestep similar to Eq. 4: $\exp\left(\frac{1}{\epsilon_{decay}}\right)\epsilon$. Every spike generated by a post-neuron triggers a membrane potential decrement by an amount $w_{inh}$ for all the other neurons except itself.

Algorithm 2 (excerpt):
    for Neurons with (l == 0) do // Non-refractory
        V^t_exc ← Update using Eq. 5
        Fire spikes when (V^t_exc > V_thr + ϵ) ⇒ s_post
        for Neurons with (V^t_exc > V_thr + ϵ) do
            ...
            Update using Eq. 7 and 8 ⇒ ∆w_+, ∆w_−
            V^t_exc excluding itself ← s_post • w_inh // Lateral inhibition
        end for
    return labels_pseudo ⇒ most activated excitatory neuron IDs
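The neuron dynamics described in this subsection (leak, integration, refractory period, adaptive threshold, and lateral inhibition) can be sketched as a single timestep update. This is an illustrative approximation rather than the simulator code used for the experiments: the exponential decay terms are represented by precomputed multiplicative factors between 0 and 1, the input is a pre-summed synaptic current per neuron, and every value in `PARAMS` is a placeholder rather than a Table I hyper-parameter.

```python
# Hypothetical parameter values, chosen only to make the sketch runnable.
PARAMS = {
    "v_rest": -65.0, "v_reset": -60.0, "v_thr": -52.0,
    "v_decay": 0.99,        # multiplicative membrane leak factor per timestep
    "theta_decay": 0.999,   # multiplicative adaptive-threshold decay per timestep
    "alpha": 0.05,          # adaptive-threshold increment on a spike
    "refrac_period": 5,     # refractory period L, in timesteps
    "w_inh": 10.0,          # lateral inhibition strength
}


def lif_step(v, theta, refrac, inputs, p=PARAMS):
    """Advance all excitatory neurons by one timestep (lists updated in place)."""
    k = len(v)
    spikes = [0] * k
    for j in range(k):
        theta[j] *= p["theta_decay"]  # homeostatic threshold decays every step
        if refrac[j] > 0:             # refractory: no leak or integration
            refrac[j] -= 1
            continue
        # Leak towards the resting potential, then integrate the input.
        v[j] = p["v_rest"] + p["v_decay"] * (v[j] - p["v_rest"]) + inputs[j]
        if v[j] > p["v_thr"] + theta[j]:
            spikes[j] = 1
    for j in range(k):
        if spikes[j]:
            v[j] = p["v_reset"]               # reset after firing
            refrac[j] = p["refrac_period"]    # start the refractory period
            theta[j] += p["alpha"]            # raise this neuron's threshold
    n_spikes = sum(spikes)
    for i in range(k):
        # Lateral inhibition: every spike lowers the potential of all other neurons.
        v[i] -= p["w_inh"] * (n_spikes - spikes[i])
    return spikes
```

Running `lif_step` repeatedly over the encoded spike train and counting the returned spikes per neuron yields the activity statistics from which the most activated neuron can be read out.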
In the context of our implementation, we used the spike trace $\tau$ to represent the temporal distance between two spikes. The spike trace value peaks at $\tau_o$ upon firing and decays exponentially as time lapses: $\exp\left(\frac{1}{\tau_{decay}}\right)\tau$. The weight updates are similarly separated into positive and negative parts. The pre-synaptic update (Eq. 7) and the post-synaptic update (Eq. 8) are expressed in terms of the following quantities: $\Delta w$ are the weight updates, $\eta_{pre}$, $\eta_{post}$ are the learning rates for pre- and post-synaptic updates respectively, $\tau$ is the spike trace, and $s$ is the spiking pattern. The superscripts ($pre$), ($post$) indicate whether the trace or spike is from the pre- or post-synaptic neuron respectively, and the subscripts ($+$), ($-$) indicate whether the operation is for positive or negative input spikes. Note that the negative $s^{pre}_{-}$ can be flipped easily by the distributive property of matrix multiplication.
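A trace-based update of this kind can be sketched as below. The exact form of the pre- and post-synaptic updates is an assumption here: the sketch follows the common pattern in which a pre-synaptic spike depresses a weight in proportion to the post-synaptic trace, and a post-synaptic spike potentiates it in proportion to the pre-synaptic trace. The decay factor, trace peak, and learning rates are placeholder values.

```python
def update_traces(trace, spikes, decay=0.95, peak=1.0):
    """Traces jump to `peak` on a spike and otherwise decay multiplicatively."""
    return [peak if s else t * decay for t, s in zip(trace, spikes)]


def stdp_step(w, pre_spikes, post_spikes, pre_trace, post_trace,
              eta_pre=1e-4, eta_post=1e-2):
    """Apply one trace-based STDP update in place; w[i][j] links input i to neuron j."""
    for i, s_pre in enumerate(pre_spikes):
        for j, s_post in enumerate(post_spikes):
            if s_pre:
                # Pre-synaptic update: depress using the post-synaptic trace.
                w[i][j] -= eta_pre * post_trace[j]
            if s_post:
                # Post-synaptic update: potentiate using the pre-synaptic trace.
                w[i][j] += eta_post * pre_trace[i]
    return w
```

To mirror the positive/negative decomposition, the same routine could be called once with the positive spike channel and positive weight map and once with the negative counterparts.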

A. Datasets and Implementation
The proposed method was evaluated on the Tiny ImageNet dataset, which is a center-cropped subset of the large-scale ImageNet dataset [47]. Unlike the ImageNet 2012 dataset, which contains 1000 object categories, the Tiny ImageNet dataset comprises only 200 categories. Due to computation constraints, we selected the first 10 classes from the Tiny ImageNet dataset by the naming order and considered both the training and testing sets for those corresponding classes in this work. All images were normalized to zero mean and unit variance and shuffled to avoid any bias. We chose VGG15 as the baseline network architecture with randomly initialized weights. Simulations were conducted using the PyTorch machine learning library and a modified version of the BindsNet toolbox [45] as the base platform for the experiments. The results reported for the DeepCluster framework [3] were obtained without any modification to the open-source codebase associated with the work, and its hyperparameters were unchanged unless mentioned in this work. The ConvNet learning rate was set to 1e-2 and the number of clusters was set to 10 times the number of classes (recommended as optimal in Ref. [3] and also found optimal in the Deep-STDP framework). The training was performed for 200 epochs. All results were obtained on 2 GTX 2080Ti GPUs, and the associated hyperparameters used for the Deep-STDP framework can be found in Table I.
Numerous cluster re-assignment frequencies were explored and '1' ('2') was found to be optimal for Deep-STDP (DeepCluster), i.e. the pseudo-labels were generated by passing the entire dataset once (twice) every epoch. Note that this frequency represents the number of dataset iterations per epoch. Following the evaluation method proposed by Zhang et al. [48], we froze all network parameters and trained a linear layer at the output to evaluate the efficiency of the model in capturing the distribution of images in the training set as well as its usage as a pre-trained model for general use cases. We fixed the random seeds in each experiment such that the clustering process is deterministic for a particular run. To preserve generality, all accuracy results reported here represent the average value over 5 independent runs with different sets of random seeds.

B. Evaluation Metrics
1) Fisher Information: The Fisher information (FI) quantitatively measures the amount of information retained in a statistical model after being trained on a given data distribution [49]. Many prior works have used this metric to measure different aspects of deep learning models including SNN models [50], [51]. Unlike prior works, we use pseudo-labels to generate FI instead of ground-truth labels. FI reflects the impact of weight changes on the ConvNet output. If the FI of model parameters is small, we can conclude that the model's learning efficiency is poor since the weights can be pruned without affecting the output, and vice versa. Therefore, this metric implicitly measures the quality of the pseudo-labels.
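Let us consider that the network tries to learn $y$ from a distribution $p$ parametrized by a set of weights $\theta$. Given samples $x$, the posterior distribution is $p_\theta(y|x)$. The Fisher information matrix (FIM) is defined as:

$$F = \mathbb{E}_{x \sim X}\, \mathbb{E}_{y \sim p_\theta(y|x)} \left[ \nabla_\theta \log p_\theta(y|x)\, \nabla_\theta \log p_\theta(y|x)^{\top} \right] \tag{9}$$

where $X$ is the empirical distribution of the actual dataset. However, the exact FIM is usually too large to be computed directly and therefore the value is usually approximated by its trace, which is given by:

$$\mathrm{Tr}(F) = \mathbb{E}_{x \sim X}\, \mathbb{E}_{y \sim p_\theta(y|x)} \left[ \left\| \nabla_\theta \log p_\theta(y|x) \right\|^2 \right] \tag{10}$$

in which the expectations can be replaced by the averaged observation from the dataset of $N$ samples:

$$\mathrm{Tr}(F) = \frac{1}{N} \sum_{n=1}^{N} \left\| \nabla_\theta \log p_\theta(y_n|x_n) \right\|_2^2 \tag{11}$$

where $\mathrm{Tr}(F)$ is the trace of the FIM and $\nabla$ is the partial derivative operator. We follow the same implementation as the algorithm specified in Ref. [51].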
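To illustrate the empirical trace estimate, the sketch below computes it for a small linear softmax classifier, for which the gradient of the log-likelihood has a closed form. This model is a stand-in chosen to keep the example self-contained; it is not the ConvNet used in this work, and the function names are hypothetical. The labels passed in play the role of pseudo-labels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fim_trace(weights, samples, pseudo_labels):
    """Empirical FIM trace for p(y|x) = softmax(W x).

    weights: k rows of length d; samples: N vectors of length d.
    For this model, d log p(y|x) / d W[c] = (1[c == y] - p[c]) * x, so the
    squared gradient norm factorizes into two simple sums.
    """
    total = 0.0
    for x, y in zip(samples, pseudo_labels):
        logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
        p = softmax(logits)
        err_sq = sum(((1.0 if c == y else 0.0) - p[c]) ** 2 for c in range(len(p)))
        x_sq = sum(x_i ** 2 for x_i in x)
        total += err_sq * x_sq
    return total / len(samples)
```

For a deep network the same average would be taken over squared gradient norms obtained by backpropagation instead of the closed form used here.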
2) Normalized Mutual Information: Further, following the Deep Clustering work [3], we also measured the Normalized Mutual Information (NMI) metric to evaluate mutual information between two consecutive assignments of the STDP-enabled SNN, given by Eq. 12.

$$\mathrm{NMI}(y_p, y_{p-1}) = \frac{I(y_p; y_{p-1})}{\sqrt{H(y_p)\, H(y_{p-1})}} \tag{12}$$

where $y_p$, $y_{p-1}$ are label assignments for epoch $p$ and $p-1$ respectively, $I(\cdot)$ is the mutual information function, and $H(\cdot)$ is the entropy function. Since the assignments $y_p$, $y_{p-1}$ are consecutive and are generated from the same inputs, a high NMI value indicates a high correlation between the two sets of assignments as well as stable assignments of the pseudo-labels.

C. Performance Evaluation
Fig. 3 demonstrates that Deep-STDP based unsupervised feature learning significantly outperforms the DeepCluster approach based on k-means clustering. The superior quality of pseudo-labels generated by Deep-STDP is also explained empirically by the FIM trace variation over the learning epochs (see Fig. 4). While both algorithms perform similarly during the initial stages, the accuracy and FIM trace start improving significantly for the Deep-STDP approach over subsequent epochs. Performance evaluation metrics (NMI, FIM and Accuracy) for the two approaches at the end of the training process are tabulated in Table II. As detailed in the previous section, NMI is one of the popular metrics measuring the performance of unsupervised learning methods and we observe 0.29 units higher NMI in the CNN trained using our proposed framework. In addition to better clustering quality, this metric implies a much higher degree of shared information between the learned clustering and ground truth clustering, which in turn shows that the model with a higher NMI is better at extracting the underlying pattern. Further, a loss-less conversion from the rate-based CNN to a spiking network is attempted for the Deep-STDP trained network following the process reported in Ref. [39]. We achieved a similar accuracy (0.5662) in 200 timesteps. Further co-optimization of the SNN accuracy and inference latency can be performed using prior proposals [40].

In addition to training an additional linear layer for numerical performance analysis, we also visualized the convolutional filter activations of the CNN trained using our proposed framework. We can observe from Fig. 5 that the network forms distinct filters specialized for completely different visual patterns in different layers without using any ground truth label information. On the other hand, similar visualization performed on the DeepCluster trained network yielded similar simple patterns in the shallow layers without any complex patterns represented in the deeper layers, further substantiating the efficacy of the Deep-STDP approach.

Fig. 5: Deep-STDP filter activations of Gaussian random noise from layers 6, 12, 24, and 36 (panels L6, L12, L24, and L36). We have used the unit-level visualization method proposed in Ref. [52].
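As a concrete illustration of the NMI metric used in the evaluation, the sketch below computes it for two lists of cluster assignments. It assumes the normalization by the geometric mean of the two entropies, as in the DeepCluster work; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of cluster assignments."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two assignments of the same samples."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """Normalized mutual information; returns 0.0 if either entropy is zero."""
    norm = math.sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / norm if norm > 0 else 0.0
```

Because the metric depends only on how samples are grouped, two assignments that differ merely by a renaming of cluster IDs still score close to 1, which is the behavior needed when comparing pseudo-labels across epochs.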

D. Computational Cost Estimation
While a detailed system level hardware analysis for the two approaches is outside the scope of this work, we provide motivation for neuromorphic deep clustering by performing a comparative analysis of the computational cost of the two approaches.
1) Cost of k-means Clustering: To find the new centroid of a particular cluster, the algorithm calculates the averaged center of all the data points assigned to that cluster using the following equation:

$$c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i \tag{13}$$

where $c_j$ is the averaged coordinates of the $j$-th centroid, $|C_j|$ is the number of data points assigned to that corresponding cluster, and $x_i$ is the $i$-th data point. Subsequently, the algorithm calculates the Euclidean distance between every data point and every centroid and assigns each data point to the cluster with the shortest distance to its centroid. The goal is to solve the optimization problem:

$$\underset{C}{\arg\min} \sum_{j=1}^{k} \sum_{x_i \in C_j} \left\| x_i - c_j \right\|_2^2 \tag{14}$$

where $\arg\min_C$ solves for the optimal centroids and $k$ is the total number of clusters. The above two calculations will be repeated until convergence is achieved or until a maximum number of iterations is reached. Hence, the number of mathematical operations can be summarized as follows:
• Clustering Step: Compute the distance $\|x_i - c_j\|_2^2$ from every point to every centroid and assign to $k$ clusters
• Update Step: Re-center the centroids in new clusters by averaging over $|C_j|$ for all clusters
• Repeat $it$ times
To calculate the distance of a point $x_i$ from $c_j$:

$$\left\| x_i - c_j \right\|_2^2 = \sum_{m=1}^{d} \left( x_{i,m} - c_{j,m} \right)^2 \tag{15}$$

where $d$ is the number of dimensions in the feature. Hence, the number of multiplications (the number of squaring operations) in order to calculate the Euclidean distance is:

$$k \cdot d \cdot N \cdot it \tag{16}$$

and the number of addition operations involved is:

$$\left[ k \cdot (d - 1) + k \cdot d + d \right] \cdot N \cdot it \tag{17}$$

where $k$ is the number of clusters, $N$ is the number of training samples, and $it$ is the number of maximum iterations in the k-means algorithm. In Eq. 17, the $k \cdot (d - 1)$ component arises from the summation of individual distance along each dimension while another $k \cdot d$ component arises from the subtraction operation for distance calculation along each dimension. The last $d$ component arises from updating the new cluster coordinates (which in the worst case will iterate through all data points, see Eq. 13). Given that the cost of a float ADD operation is 0.9 pJ and a float MULT operation is 3.7 pJ in a 45 nm CMOS process [53], we estimated the total computational cost in the clustering process for every training epoch to be 14.1 mJ (considering $it = 20$). Considering 175 epochs of DeepCluster training to reach peak accuracy, the total computational cost is 2467.5 mJ.
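The per-epoch estimate can be checked with a short calculation. The sketch below counts $k \cdot d \cdot N \cdot it$ multiplications and $[k(d-1) + kd + d] \cdot N \cdot it$ additions, matching the component-wise description in the text, and weights them with the quoted per-operation energies. The values $k = 100$ (10 times the 10 classes) and $it = 20$ come from the text; the feature dimension $d = 256$ and sample count $N = 5000$ are assumptions that are not stated in this section, although they happen to reproduce the reported per-epoch figure.

```python
ADD_PJ = 0.9   # energy of one float ADD in pJ (45 nm CMOS, as quoted in the text)
MULT_PJ = 3.7  # energy of one float MULT in pJ (45 nm CMOS, as quoted in the text)


def kmeans_ops(k, d, n, it):
    """Return (multiplications, additions) for `it` k-means iterations."""
    mults = k * d * n * it
    adds = (k * (d - 1) + k * d + d) * n * it
    return mults, adds


def kmeans_energy_mj(k, d, n, it):
    """Estimated clustering energy per epoch in millijoules."""
    mults, adds = kmeans_ops(k, d, n, it)
    return (mults * MULT_PJ + adds * ADD_PJ) * 1e-9  # 1 pJ = 1e-9 mJ


# k and it follow the text; d = 256 and n = 5000 are assumed values.
energy = kmeans_energy_mj(k=100, d=256, n=5000, it=20)
print(round(energy, 1))  # 14.1
```

Under these assumptions, multiplications account for roughly two thirds of the estimate, since each one costs about four times as much as an addition.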
2) Cost of STDP Clustering: In the STDP based clustering approach, the computations can be summarized into the following parts:
• Feedforward Step: Integrate the input Poisson spike train through the synapses connecting the input and excitatory layers
• Learning Step: Update the excitatory layer weights based on pre- and post-synaptic spiking activities
• Inhibition Step: Update the neuron membrane potential based on lateral inhibitory connections
• Repeat $T$ times
Although multiplication symbols were used in Algo. 2, computation with spike signals can always be reduced to a summation operation since the spike magnitude is always '0' or '1' [7]. Further, the addition operation is conditional upon the receipt of spikes, thereby reducing the computation cost by a significant margin for a highly sparse spike train. For instance, the average spiking probability per neuron per timestep in the excitatory layer of the network is only 0.19%. Hence, the total number of addition operations can be summarized by Eq. 18. It is worth mentioning here that we primarily focus on the computationally expensive portions of both algorithms for these calculations. In Eq. 18, the $p_{input} \cdot |w_{exc}|$ component arises from the feedforward propagation of input spikes, the $(p_{input} + p_{exc}) \cdot |w_{exc}|$ component arises from the learning step, and $p_{exc} \cdot |w_{inh}|$ arises from the inhibition step. Therefore, the total computational cost for Deep-STDP per epoch is 55.34 mJ and, considering 50 epochs of training (iso-accuracy comparison as shown in Fig. 3), the total energy consumption is estimated to be 2767.2 mJ, which is comparable to the DeepCluster framework.
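The addition count of Eq. 18 can be written as a small helper, shown below as a sketch. It sums the feedforward term $p_{input} \cdot |w_{exc}|$, the learning term $(p_{input} + p_{exc}) \cdot |w_{exc}|$, and the inhibition term $p_{exc} \cdot |w_{inh}|$, and scales the result by the number of timesteps $T$ and samples $N$. The function and argument names are illustrative, and no attempt is made to reproduce the reported energy figure because the input spiking probability and synapse counts are not given in this section.

```python
def stdp_additions(p_input, p_exc, n_w_exc, n_w_inh, timesteps, n_samples):
    """Expected number of additions per epoch for STDP clustering.

    p_input, p_exc: average spiking probability per neuron per timestep.
    n_w_exc: number of input-to-excitatory synapses.
    n_w_inh: number of lateral inhibitory connections.
    """
    per_timestep = (
        p_input * n_w_exc               # feedforward step
        + (p_input + p_exc) * n_w_exc   # learning step
        + p_exc * n_w_inh               # inhibition step
    )
    return per_timestep * timesteps * n_samples


def additions_to_mj(additions, add_pj=0.9):
    """Convert an addition count to millijoules at `add_pj` pJ per ADD."""
    return additions * add_pj * 1e-9
```

Because every term is multiplied by a spiking probability, the estimate scales directly with spike sparsity, which is where the efficiency of event-driven computation comes from.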
3) System Level Cost Comparison: We note that the STDP based framework does not change the computational load of the clustering framework significantly. However, the computational load at the system level will also be dependent on the computational load for feature extraction in the ConvNet. For instance, Ref. [3] mentions that a third of the time during a forward pass is attributed to the clustering algorithm while the remainder is attributed to the deep ConvNet feature extraction. Therefore, we expect the Deep-STDP based framework to be significantly more resource efficient than the DeepCluster based approach due to the 3.5× reduction in the number of training epochs, which equivalently reduces the ConvNet feature extraction computational cost.

VI. CONCLUSIONS
In conclusion, we proposed an end-to-end hybrid unsupervised framework for training deep CNNs that can be potentially implemented in a neuromorphic setting. We demonstrated significant benefits in terms of accuracy and computational cost by leveraging bio-plausible clustering techniques for deep unsupervised learning of visual features and substantiated our claims by empirical analysis through statistical tools like Fisher Information and Normalized Mutual Information. Our work significantly outperforms prior attempts at scaling bio-inspired learning rules like STDP to deeper networks and complex datasets. Future work can focus on further scaling of the approach and delving deeper into the mathematical underpinnings of the superior performance of STDP as a deep clustering mechanism.

Fig. 1: STDP learns generic features of input patterns (MNIST dataset) in the excitatory synapses of the Winner-Take-All network. Each neuron represents a cluster and its learnt weights represent the corresponding cluster centroid.

Fig. 2: Overall structure of Deep-STDP: The ConvNet compresses the input images to a lower-dimensional feature vector which is mapped by STDP clustering to a pseudo-label. The ConvNet is subsequently trained through backpropagation using the pseudo-labels.

Fig. 3 demonstrates that Deep-STDP based unsupervised feature learning significantly outperforms the DeepCluster approach based on k-means clustering. The superior quality of pseudo-labels generated by Deep-STDP is also explained empirically by the FIM trace variation over the learning epochs (see Fig. 4).

$$\left[ p_{input} \cdot |w_{exc}| + (p_{input} + p_{exc}) \cdot |w_{exc}| + p_{exc} \cdot |w_{inh}| \right] \cdot T \cdot N \tag{18}$$

where $p_{input}$, $p_{exc}$ are the average (per neuron per timestep, averaged over the entire training process) spiking probabilities of the input and excitatory neuronal layers respectively, $|w_{exc}|$ is the number of synaptic connections between the input and excitatory layer (either $|w_{+}|$ or $|w_{-}|$, since the input can be either positive or negative), $|w_{inh}|$ is the total number of inhibitory connections in the network, $T$ is the number of timesteps used for the STDP training process, and $N$ is the number of training samples.
1 DeepSTDP
Require: Table I parameters // See Table I for details
Require: timesteps = 100 // Simulation steps
Require: featureSet // Features to be trained on
    Initialize w_+, w_− randomly
    Initialize k neurons with resting potential V_rest

TABLE I: Hyper-parameters for STDP training

TABLE II: Evaluation Metrics Comparison