Spike sorting in the presence of stimulation artifacts: a dynamical control systems approach

Abstract Objective. Bi-directional electronic neural interfaces, capable of both electrical recording and stimulation, communicate with the nervous system to permit precise calibration of electrical inputs by capturing the evoked neural responses. However, one significant challenge is that stimulation artifacts often mask the actual neural signals. To address this issue, we introduce a novel approach that employs dynamical control systems to detect and decipher electrically evoked neural activity despite the presence of electrical artifacts. Approach. Our proposed method leverages the unique spatiotemporal patterns of neural activity and electrical artifacts to distinguish and identify individual neural spikes. We designed distinctive dynamical models for both the stimulation artifact and each neuron observed during spontaneous neural activity. We can estimate which neurons were active by analyzing the recorded voltage responses across multiple electrodes post-stimulation. This technique also allows us to exclude signals from electrodes heavily affected by stimulation artifacts, such as the stimulating electrode itself, yet still accurately differentiate between evoked spikes and electrical artifacts. Main results. We applied our method to high-density multi-electrode recordings from the primate retina in an ex vivo setup, using a grid of 512 electrodes. Through repeated electrical stimulations at varying amplitudes, we were able to construct activation curves for each neuron. The curves obtained with our method closely resembled those derived from manual spike sorting. Additionally, the stimulation thresholds we estimated strongly agreed with those determined through manual analysis, demonstrating high reliability (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^2 = 0.951$\end{document}R2=0.951 for human 1 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^2 = 0.944$\end{document}R2=0.944 for human 2). Significance. Our method can effectively separate evoked neural spikes from stimulation artifacts by exploiting the distinct spatiotemporal propagation patterns captured by a dense, large-scale multi-electrode array. This technique holds promise for future applications in real-time closed-loop stimulation systems and for managing multi-channel stimulation strategies.


Introduction
Bi-directional neural interfaces (BNIs) play an increasingly important role in neurotechnology to communicate with the nervous system using multi-electrode arrays (MEAs).These interfaces promise to revolutionize scientific discovery and clinical therapeutics through closed-loop neuromodulation [14,35,37,49,70].Specially, a BNI performs two major tasks.On the one hand, it stimulates neurons to produce targeted patterns of neural activity that are useful for the scientific or clinical application [20,33,47].On the other hand, it performs electrical recording to observe natural neural activity and to calibrate the activity evoked by the interface [17,36,42,48,59,60,66,73].
A critical challenge in recording electricallyevoked activity is that the voltage produced by injecting the current into the electrode-electrolyte impedance produces a stimulation artifact that is often large enough to obscure the evoked neural signal of interest [28].Because of the large time constants of the electrode-tissue impedance, the artifact can last for several milliseconds after stimulation [23,45] and can thus overlap in time with evoked spikes.This substantially complicates the process of identifying and segregating spikes from different cells (spike sorting).Therefore, the artifact and the neural activity of interest must be distinguished [50,56].
Several approaches have been proposed to use the temporal properties of spikes and artifacts to perform spike sorting [6,8,17,38,58,67,68].In template subtraction methods, the estimated artifacts are subtracted from the measurements to isolate neural activity [13,23,46,65] and identify spikes [40].However, obtaining templates of the artifact in isolation is not always possible [51].
However, relatively little has been done to exploit the distinct spatiotemporal propagation of electrical artifacts and spikes [45,54].Here, we propose a novel approach using dynamical control systems to model the spatiotemporal propagation of spikes and artifacts and exploit their differences to identify evoked neural activity as shown in figure 1.Specifically, we design a unique dynamical model for the stimulation artifact and for each neuron recorded during spontaneous activity.Then, to identify evoked spikes after stimulation, we estimate which combination of dynamical models (i.e. which neurons firing) were most likely to produce the recorded voltage response across all electrodes.Notably, the method does not require recordings from the stimulation electrode itself, which typically has an artifact that saturates recording electronics, enabling lower-power electronics.We demonstrate the effectiveness of the proposed approach on large-scale multi-electrode ex vivo recordings from primate retina, and compare the results to human-supervised spike sorting.

Experimental setup and data description
We analyzed voltage recordings from primate retinal ganglion cells (RGCs) during epiretinal electrical stimulation, from a single retinal preparation.Eyes were obtained from terminally anesthetized macaque monkeys (Macaca mulatta, Macaca fascicularis) used by other researchers, in accordance with Institutional Animal Care and Use Committee guidelines.Further details on the experimental preparation have been described previously [21,22,44].
The retina was isolated from the retinal pigment epithelium and placed RGC side down on a custom high-density MEA system with 512 electrodes (60 µm pitch, 8-15 µm diameter) [28,42].The retina was then stimulated with a white noise visual stimulus using a computer display and lenses while recording on each electrode simultaneously-see figure 2. Raw signals were amplified, filtered , multiplexed, digitized (20 kHz) and stored for offline analysis.A custom spike sorting procedure [42] was applied to the recordings to identify and segregate spikes from individual RGCs.The spike-triggered average (STA) stimulus was then computed for each RGC to classify functionally-distinct cell types [10,11,21,61].For each cell, the electrical image (EI), or the average spatiotemporal pattern of activity associated with a cell's spike, was computed by averaging the voltage traces during the time of each cell's spike [42,57].EIs for 25 distinct neurons recorded over 71 samples (3.55 ms) were examined (figure 3).
To characterize RGC responses to epiretinal electrical stimulation, the retina was electrically stimulated by injecting a brief current pulse (charge balanced, positive first, triphasic, 50 µs per phase in relative ratios of 2:-3:1) through one electrode at a time in a random sequence while recording on all electrodes simultaneously [21,22,33,44].Thus, 39 current amplitudes (0.1-4.1 µA on the second phase, log spacing) were applied, with each amplitude repeated 25 times.
The electrical artifact associated with electrical stimulation (figure 4) precludes the use of standard spike sorting techniques because it is temporally correlated with and occupies a similar frequency band as neuronal spikes [45].Furthermore, particularly on the stimulating electrode, the duration of the artifact is nearly 2 ms (figure 4(a)), exceeding the typical response latency of RGCs to electrical stimulation (0.4-0.6 ms) [45], complicating further the analysis of responses.To establish a human-curated set of labeled responses to electrical stimulation against which to compare the algorithm developed here, a semi-automated procedure was performed by two human observers, as described previously [33].First, recorded traces after stimulus onset (55 samples or 2.75 ms) were considered for analysis.For each cellelectrode pair, clustering was performed on the traces from the trials at a single current level.The trials were grouped into two clusters: one that elicited spikes and another that resembled artifact only.An estimate of the artifact was then calculated from this second cluster and subtracted from trials containing putative spikes.This procedure was repeated for each current amplitude.Then, the artifact-subtracted traces, along with template waveforms obtained from the EI, were visually inspected by each human observer to determine whether the artifact-subtracted data resembled  the template of the cell of interest (i.e. an electricallyelicited spike) or not (i.e.no response).Each human observer analyzed a set of 10 cell-electrode pairs from five different neurons.

Dynamical systems approach
We use dynamical systems to model the spatiotemporal evolution of the voltage recorded by the MEA in the presence of spikes from one or more neurons and in the presence of a stimulation artifact [30,32,52].

EI models
The EI model for a given neuron n is a set of equations (e.g.equation ( 1)) that describes the propagation of the voltage across all E = 512 electrodes when the neuron fires a spike.The model consists of a predefined input vector u t n ∈ R 2 that initiates the dynamical system, the state of the neuron x t n ∈ R En at each point in time defined over a subset of the electrodes relevant for the cell (E n < E, see green region in figure 3), and two matrices A n and B n that describe the evolution of x t n over time.Finally, the output y t n ∈ R E indicates the voltage contribution of neuron n to all electrodes on the array (the output is set to zero for the electrodes that are not relevant for the cell).Thus, the dynamical model of the EI can be described as where the matrix A n defines how the state of the neuron at the next time step depends on the current  state, and captures the spatiotemporal correlation between the electrodes.The matrix B n defines how the next state depends on the input, and C n defines how the output depends on the current state.
The model parameters are learned such that if the input templates r t n are injected, the model generates outputs (i.e. the voltage on each electrode) that approximately match the EI of neuron n.The input templates are two unit-magnitude pulses that trigger rising and falling phases of the spike, respectivelysee appendix A for details on modeling and parameter learning.
The input templates were determined based on the dynamics of extracellular action potentials.Specifically, we observed that the recorded spike exhibits distinct upward and downward patterns due to the opening and closing of potassium and sodium channels in the cell's membrane.To capture these dynamics, we used two heuristic-based templates, drawing upon expected input-output responses of linear systems that align with the shapes of the recorded spike-see figure 5.

Artifact model
The artifact model describes the propagation of voltage across the electrodes caused by artifacts due to electrical stimulation.Similarly to the EI model, the output y t a ∈ R E indicates the voltage recorded on all electrodes immediately after stimulation.The empirically observed artifact propagates radially outward from the stimulating electrode and decays over space.Hence, only the closest electrodes E a < E to the stimulating electrode will record the artifact and are considered as the states of the model, namely x t a ∈ R Ea at time t.Moreover, the stimulating electrode shows discontinuities with respect to the stimulus amplitude due to the design of the stimulation system (see figure 4(b)).Thus, we discard the stimulating electrode from the state variables.As with the EI model, y t a is equal to state variable x t a for electrodes E a , and zero for the other electrodes.As with the EI model, the artifact model is learned such that an approximation to the artifact is observed in the output y a if the input u t a has the template r t a ∈ R 3 (figure B5) linearly scaled by the stimulus amplitude q.Hence, for the stimulus amplitude q, the input to the model is u t a = qr t a .In summary, the dynamical model of the artifact is given by where the vector u t a ∈ R 3 is the input of the model, the B a matrix defines how the next state depends on the input of the model, the matrix A a defines how the next state depends on the current state, and the  matrix C a defines how the output depends on the current state.A a , B a and C a are obtained from the average of stimulation data across-see appendix B for details on modeling and parameter learning.Figure 6(b) demonstrates how the artifact model characterizes the propagation of the artifact on MEAs.
For designing the input templates, we fitted the recorded steady-state response during stimulation in our experimental setup.This resulted in a template with three inputs specific to our experimental setup.A similar curve-fitting exercise should be repeated for a different experimental setup, which might result in different input templates.

Aggregate model
The aggregate model describes the propagation of the voltage across electrodes in the presence of both stimulation artifact and neurons firing.This model is a linear combination of the EI model for all neurons, the artifact model and the measurement noise (superposition assumption in MEA recordings [58]).As shown in figure 7(a), the aggregate model outputs the electrode measurements for multiple input templates.Specifically, the observed outputs on the electrodes are caused by injecting templates of the different sub-models (EI and/or artifact models) in the aggregate model.Thus, the aggregate model is given by ] ⊤ is the aggregate input of the EI models, u t a is the input of the artifact model, and w t ∈ R E denotes the measurement noise.[ Ã, B, B′ , C] are defined such that the aggregate output y t satisfies y t = ∑ N n=1 y t n + y t a + w t -see appendix C for calculation of the matrices.

Input estimator
The aggregate model described in section 2.2.3 determines how the electrodes' voltage is generated based on the input sequence (figure 7(a)).Conversely, the input estimator leverages the aggregate model to infer (or, equivalently, estimate) the input sequence that likely generated the observed data (figure 7(b)).Notably, the estimated input sequence can be compared to the EI templates to perform spike detection and sorting-i.e. if u n = r n , then the neuron n is firing a spike.We propose a method to estimate the set of input templates that led to a specific measurement (see details in section 2.2.4).The input estimator requires the electrode measurements immediately after stimulation, the aggregate model, and the list of corrupted electrodes to ignore (typically, the stimulating and nearby electrodes shown in red in the grid).Since we know a priori that stimulation happened, the artifact template is automatically estimated.By inferring which inputs are active (i.e. they are estimated to be equal to the input template), we can identify which neuron(s) are firing; hence, we can perform spike sorting.
The input estimator utilizes the output data y t:t+L = [ (y t ) ⊤ , . . ., (y t+L ) ⊤ ] ⊤ within a forward window L to estimate ût of ũt , and xt of xt .The estimated EI inputs has the structure ût = [ (û t 1 ) ⊤ , . . ., (û t N ) ⊤ ] ⊤ where ût n denotes estimated inputs corresponding to EI n.Also, it uses the artifact inputs u t a as known inputs since the timing of the stimulation is known.Additionally, I indicates the indices for the corrupted electrodes that we desire to exclude from the input estimator (i.e. the stimulating electrode in our case).Therefore, the input estimation procedure can be described as where the function f(y t:t+L ,x t−1 , u t a , I) is the so-called input estimator-see appendix D for how the input estimator is designed.
To illustrate and validate the performance of the input estimator for spike sorting, we first consider an example using synthetic data for a single neuron firing, generated by feeding the template for the EI model of one cell to the aggregate model.The generated data is then passed to the input estimator to retrieve which neuron(s) fired.As expected, the estimated input template completely overlaps with the input template of the single EI model used to generate the synthetic data (figure 8(a)).Hence, the input estimator can infer from the synthetic data that the neuron was firing a spike.Section 3.3 validates the input estimator with real noisy data and section 3.4 validates the spike sorting method based on the input estimator with real data contaminated by the stimulation artifact.
The proposed methodology can also detect neural activation while multiple neurons are firing.Figure 8(b) shows the performance of the input estimator on synthetic data.Similar to the results in figure 8(a), the input estimator retrieves the inputs without error for both single and multiple neurons' activation.To illustrate the sensitivity of input estimation to the electrodes' noise, we added normal noise to the synthetic data with different standard deviations.As presented in figure 8(b), it is possible to detect three neurons simultaneously despite the fact that higher levels of noise are expected in comparison with single-neuron detection.However, the error also depends on the arrangement of neurons.In particular, the further apart the neurons are, the easier their activation can be detected under possible stimulation artifacts.To perform spike sorting, the input sequence estimated by the input estimator is compared to the input template for each neuron using a similarity function.If its value exceeds a predefined threshold, then we infer that a spike occurred for that specific neuron-see details in appendix G. Figure 8(b) shows that the proposed method is robust The simulation uses synthetic data generated by injecting input templates for one, two, and three neurons (out of 25 total neurons) into the aggregate model.The synthetic data is fed to the input estimator, and the RMSE shows how much the estimated inputs deviate from their templates.The simulation is repeated for different measurement noise levels (normal noise with various standard deviations added to synthetic data).As expected, the input estimator does not introduce errors in the noiseless case, even for multiple neurons firing (similar to (a)).As the measurement noise increases, the RMSE increases.Besides, the results suggest that the input retrieval for multiple neurons is more vulnerable to noise than the single-neuron case.
enough to perform spike sorting when multiple neurons fire simultaneously.

Results
This section shows the performance of EI and artifact models, and the input estimator.First, we provide evidence that the EI models are suitable for representing real data accurately.Next, we show the performance of the artifact model for different stimulus amplitudes.We then provide evidence of the input estimator's ability to predict the correct input sequence with spontaneous activity (i.e.without stimulation artifact) and evoked activity (i.e. with stimulation artifact).Finally, we demonstrate the effectiveness of the proposed spike sorting for different stimulation amplitudes and compare the results to humansupervised spike sorting.

EI models
To analyze the performance of the EI models, we compare the output of the EI model (i.e. the predicted voltage for all electrodes) with the real measurements of the EI for all neurons.To quantify the quality of the proposed EI models, we use spike normalized root mean square error (SNRMSE): the root mean square of the difference between the real output and estimated output divided by the root mean square of the EI signal-see appendix E. Figure 9 shows the prediction results for the neurons with the lowest (9(b)) and the highest (9(a)) SNRMSE.
The generated output from the EI models follows the EI original data for the different electrodes.Notice that the electrode near the soma shows a lower SNRMSE than the other electrodes along the axon.The higher error for the neuron in figure 9(b) may be caused by the location of the corresponding neuron at the edge of the MEA where there are fewer electrodes to capture its dynamical evolution.The right-skewed SNRMSE distribution across all neurons (figure 10) provides evidence that most EI models incur SNRMSE less than the average.

Artifact model
To analyze the performance of the artifact model, we compare the model output to the real measurements for the electrodes close to the stimulating electrode (except for the stimulating electrode itself, which is discarded from our model) for different stimulus amplitudes. Figure 11 shows how the output generated by the simulated model follows the real measurement for two electrodes.At low stimulation amplitude (figure 11(a)), the model follows the average stimulation data for the electrodes.At high stimulation amplitude (figure 11(b)), the model exhibits higher error in a specific interval, possibly due to the presence of an evoked spike superimposed in the recording.This possibility is consistent with the fact that higher stimulation amplitude usually leads to a higher probability of neuron activation.
To quantify the performance of the model, we used artifact normalized root mean square error (ANRMSE): the root mean square of the difference between the real output and estimated output divided by the root mean square of the artifact signal-see appendix F. Figure 12 shows the ANRMSE as a function of the stimulation amplitude.The ANRMSE is highest for low stimulation amplitudes, when the artifact is small, and the output is dominated by noise figure 12.However, ANRMSE is reduced for higher stimulation amplitudes since the artifact becomes the dominant contributor to the output, showing that the model is capable of predicting the artifact shape.

Input estimator
The input estimator can retrieve the input sequences based on the electrodes' measurements.By comparing the input sequence to the input templates for each neuron, we can retrieve which neuron fired a spike, i.e. perform spike sorting.In section 2.2.4,we showed that the input estimator can retrieve the correct input sequence from synthetic data generated by the aggregate model (figure 8).Here, we show that the input estimator can still recover the correct input sequence from real EI data (figure 13).Notice that this analysis is for measurements without a stimulation artifact.In particular, when analyzing data from EI 1 (i.e.data where only neuron 1 is firing a spike), the input estimator retrieves the input for neuron 1 with a similar shape to its predefined template, suggesting that neuron 1 is firing a spike.In contrast, the estimated input for EI 25 is zero, indicating that neuron 25 is not firing a spike.Additionally, when analyzing data from EI 25 (i.e.data where only neuron 25 is firing  Now, let us consider measurements immediately after stimulation that contain a stimulation artifact and can contain one or more spikes (figure 14).In this example, the stimulation electrode is near neuron 25 and farther from neuron 1.Hence, it is expected that electrical stimulation is more likely to elicit a spike in neuron 25 than in neuron 1.At low stimulation amplitude, neither neuron fires a spike.At high stimulation amplitudes, however, the retrieved input sequence for neuron 25 suggests that it fired a spike, while still no spike was retrieved for neuron 1.

Spike sorting with stimulation artifact
Here, we perform spike sorting as described in section 2.2.4-see further details in appendix G.By extending this procedure to all stimulation amplitudes over multiple trials, we obtain the activation curve for each neuron-electrode pair: the probability of a neuron firing as a function of the stimulus amplitude applied on a given electrode.
Figure 15 shows an example of the activation curve of neurons 25 and 1 when stimulating using an electrode close to neuron 25 for both the proposed and human spike sorting.The proposed method results in a similar activation threshold to the human result for neuron 25 (defined as the amplitude for which the spiking probability is 50%) and no activation for neuron 1.The threshold is extracted by fitting a sigmoid curve to the scatter plot.Although the proposed method finds approximately the same threshold as human spike sorting, it presents certain differences in the sigmoid fit: (1) non-zero probability for low amplitudes, (2) higher variability, and (3) shallower curvature.
As explained in section 3.2, the artifact model is based on the average of all the trials in the stimulation data.To confirm that this is not a problem in the validation of the algorithm, we performed a K-fold analysis (K = 5) to validate the algorithm using data unseen during training of the model (see K-fold curve in figure 15(a)).
To show an overview of the performance of the proposed spike sorting, we analyzed 10 different datasets in which there are 5 neurons stimulated by 10 different stimulating electrodes (10 neuron-electrode pairs).Figure 16 shows the activation thresholds obtained by humans and the proposed method.The stimulation thresholds estimated using the approach closely tracked those obtained with two sets of manual analyses (R 2 = 0.951 for    human 1 and R 2 = 0.944 for human 2).For reference, the R 2 between human 1 and human 2 is 0.998.

Spatio-temporal spike sorting
This paper proposes to exploit the different spatiotemporal characteristics of spikes and electrical stimulation artifact (figures A3 and 6(b)) to identify electrically evoked spikes in neural recordings.Our work contrasts with previous work that only exploits the different latency and amplitude of spikes and artifacts, failing to leverage the distinct spatiotemporal progression captured by high-density large-scale MEAs [23,67].We provided evidence that these spatiotemporal characteristics can be effectively modeled using the dynamical systems in (1) and ( 2), and that an input estimator can be used to retrieve the correct input sequence from MEA recordings.The activation thresholds obtained for 10 electrode-neuron pairs were similar to those obtained with human spike sorting (figure 16).

Exploiting redundancy in high-density recordings
The proposed method can also exploit the redundancy in high-density large-scale MEA recordings to remove data from contaminated electrodes while still recovering the spatiotemporal characteristics needed to perform spike sorting.The method made it possible to ignore the stimulating electrode because it presents the largest artifact and instead exploits information available on other electrodes.This approach can increase the power and area efficiency of future implementations since the analog front-end does not need to record the large artifact.Furthermore, our approach overcomes the need for modeling the artifact in the stimulating electrode, which is a non-trivial task and depends on the stimulation setup.For instance, [45] needs a different model for the stimulating electrode because it presents discontinuities and is dramatically different than the other electrodes.
Since the proposed dynamical model has more outputs than inputs, the inputs could be retrieved even in the absence of one or more electrodes.Hence, this approach could generalize to different scenarios where more than one electrode is contaminated.

Caveats 4.2.1. Model selection
Linear models offer computational efficiency, making them appropriate for large datasets and real-time applications.Their simplicity allows for straightforward interpretation, aiding in understanding variable relationships.They also serve as a useful baseline for comparing more complex models.Additionally, linear models demonstrate robustness even when multiple models are considered, thanks to applying the superposition principle in underlying linear dynamical systems.However, they assume linear relationships, which may not always align with true system dynamics, potentially leading to inaccuracies with nonlinear data and limited performance [7].
In future research, exploring nonlinear models holds promise for capturing intricate dynamics that are not captured by linear analysis.However, working with nonlinear models may demand more resources and advanced analytical techniques for interpretation.Therefore, balancing increased complexity with potential gains in explanatory power will be crucial.

Model requirements and limitations
The proposed approach relies on predefined input templates that are used to initiate the dynamical system and generate an output that resembles the EI of a given neuron, figure A2.Additionally, the artifact model requires input templates that fit our specific hardware system, figure B5.These input templates can be flexibly designed and adapted to match the characteristics of a specific hardware system.
We have shown that the EI models can faithfully represent the EI recordings with small errors (figure 10).However, for the neurons located on the edges of the MEA, the model is less accurate.This is likely due to incomplete information on the EI spatiotemporal propagation, which generates discontinuities in the MEA recordings.

Computational efficiency
The low computational complexity for spike sorting is important in real-time and closed-loop applications [69].For instance, the work proposed in [19,45,55,58,75] use computationally complex matching pursuit methods for spike sorting, where whole recordings are compared to the spiking templates of each cell.In contrast, our approach only exploits a short time window of recordings to recover input templates via the input estimator.Hence, the input estimator functions as a computationally efficient filter over the recordings similar to the Wiener filters in [51,65].However, further research is required to adapt this method to a real-time hardware-friendly implementation.

Background noise
Our approach relies on the availability of EIs for each neuron obtained by the spike sorting method in [58] in the absence of stimulation.This paper does not focus on low signal-to-noise ratio (SNR) data because EIs with a peak amplitude less than 30 µV are discarded.Thus, the discarded spikes are present in the background noise and may worsen the performance of the approach [63].At the same time, the background noise is colored with spatiotemporal dependencies [16,74], and the methods in [18,58] can estimate its covariance.Therefore, it provides a chance to research the correlation in the background noise that may be useful in reducing the range of SNR that can be accommodated by the algorithm.
Another issue for the proposed algorithm is bundle activation, which is the result of stimulating a bundle of axons belonging to cells whose somas are outside the region covered by the MEA [22].These spikes are also part of the background noise and may worsen the performance of the proposed method.Furthermore, if the goal is to precisely stimulate neurons at single cell and cell type resolution, then bundle activation needs to be avoided.This can be done by combining the proposed method with the method developed in [71] that detects bundle activation based on its spatiotemporal propagation characteristics.

Artifacts
In modeling the artifact, a crucial concern is that the artifact signal is unsupervised and unknown beforehand.The stimulation data consists the evoked neural activity, the artifact, and the background noise.Thus, we have to recover artifacts from the stimulation data to fit the artifact model.To remove the effect of noise and spontaneous neural activity, we used the average of the stimulation data from different trials similar to [45].This works well for low-amplitude stimulation because it does not systematically lead to evoked spikes.However, as shown in figure 15(a), stimulation with high amplitudes leads to a higher probability of neuron activation.Hence, the artifact cannot be estimated by averaging for high stimulation amplitude.In literature, to model the artifact, many approaches rely on assumptions on artifact timing, lack of saturation, linearity, and decline with distance from the stimulating electrode [54,72,77].Here, we simplify this task by proposing a method that does not need to model the artifact in the stimulating electrode where it is most significant and can be nonlinear.

Hyperparameters
The models, the input estimator, and the spike detector all require hyperparameters.For the models, we experimentally fitted the hyperparameters to minimize the SNMRSE and AMRSE.For the input estimator, we tuned the hyperparameters based on the performance when estimating synthetic and real data (figures 8 and 13).For the spike detector, we relied on human annotations for adjusting hyperparameters such as detection thresholds.Hence, we require multiple human-annotated labels, which are costly and time-consuming [3,45].

Towards closed-loop multi-channel stimulation
We conjecture that spatiotemporal spike sorting is suitable to enable real-time stimulation and recording at single-cell resolution in closed-loop applications.Towards this goal, future work should focus on automating the proposed method (e.g. for the hyperparameter tuning) and designing a real-time hardware implementation.
In addition, we believe that the proposed framework can cope with multi-channel stimulation scenarios.This could have a large impact in basic neuroscience and clinical applications since multi-channel stimulation enables an effective strategy to spatially control the spiking of multiple neurons [34,39].To address the multi-channel stimulation scenario, the proposed method could factor in multiple artifact models in the aggregate model and retrieve the input templates of EIs in the presence of different artifacts simultaneously.
A G would like to acknowledge that this research was supported by the National Institutes of Health (NIH) National Institute of Mental Health Grant T32-MH-020016, NIH National Eye Institute (NEI) Grant F31-EY-033636, the Foundation Bertarelli, and the Stanford Neurosciences Graduate Program.A L would like to acknowledge that this publication is part of the Neural Systems Research Consortium, University of California, Santa Cruz (AML).P H would like to acknowledge that this publication is part of the Polish National Science Centre Grant DEC-2013/10/M/NZ4/00268. E J C would like to acknowledge that this publication is partially supported by the Wu Tsai Neurosciences Institute and NIH grant EY032900.D M would like to acknowledge that this publication is part of the project OCENW.XS22.1.007 of the research program Open Competition Science XS, which is financed by the Dutch Research Council (NWO).

Software routines
To maintain transparency and facilitate reproducibility in line with the FAIR guiding principles, we provide full access to the software routines used to generate the empirical results presented in this paper.
The code repository, hosted on GitHub, can be found at the following link: https://github.com/mohammadshokriacct/electrical_spike_sorting.Our routines are written in Python, leveraging the robust functionalities of TensorFlow and Keras libraries, which are well-suited for complex computational tasks such as those performed in our study.

Appendix A. Electrical image model
The electrical image (EI) of a neuron is the average time-series recording across all electrodes when the neuron is firing a spike.Here, we record from E = 512 electrodes for T = 71 time samples corresponding to 3.55 ms (sampling rate equal to 20 kS s −1 ) for each EI, and we collect EIs from N = 25 distinct neurons.To represent the EI for neuron n, we use the following dynamical system: (A1)

A.1. Input, states and output
Here, we define the EI for neuron n as the EI model output {y t n } T t=1 ∈ R E .The E n < E electrodes that exhibit a maximum absolute value larger than a pre-defined state-threshold θ S n > 0 are selected as the EI model state variables x t n ∈ R En (green region in figure A1).The input to the model is u t n ∈ R 2 .the input u t n is identical to the input template r t n ∈ R 2 (figure A2), the output is the EI of neuron n.The input template is defined by four parameters pertaining to the starting and ending points of the two pulses.These parameters are tuned to minimize the mean square error between the model output and the original EI.

A.2. Matrices A n and B n
A n determines how each state (i.e.selected electrode) influences the future states, including itself.Here, we assume that a given electrode is unlikely to influence further away electrodes due to the slow spatiotemporal progression of the neural signal across the MEA [1,5,29,53,72].Hence, if d ij is the distance between electrode i and j, the element of A n on the row i and column j is set to zero, providing that d ij > θ A n (predefined threshold).
B n determines how the input influences the future states.The input is assumed to influence only the electrodes close to the soma of the neuron.The electrode i ⋆ that records the maximum amplitude in the EI is considered the base electrode.If d i ⋆ j is the distance between the base electrode and the electrode j, the elements of B n on the column j are set to zero, providing that d i ⋆ j > θ B n (predefined threshold).To find A n and B n , we perform system identification using the EI data as states and the input templates as inputs [30,43,52,76].To obtain the parameters, we minimize the error of the dynamical equation for all data points.Notice that this minimization is performed subject to the structure of the matrices, where some elements of A n and B n are set to zero based on the thresholds θ A n and θ B n .Figure A3 illustrates the effect of A n and B n .

A.3. Matrix C n
Here, the output corresponds to the state for the selected electrodes and is approximated to zero for the other electrodes.The matrix C n is an identity matrix for the columns corresponding to the selected electrodes and zero for the columns corresponding to the non-selected electrodes.
The procedure for the EI model is described in algorithm 1.First, we determine the state variables of the dynamical system.Then, we determine the structure of the matrices of the model.Finally, based on the structure of the model, we fit the matrices using the EI data for the states.

Appendix B. Artifact modeling
Stimulation data is a time-series {y t,q k } T t=1 ∈ R E at the stimulation amplitude q ∈ Q for trials k = 1, . . ., K, where K = 20.Q represents the set of 39 different stimulation amplitudes from 0.11 µA to 3.71 µA.To represent the stimulation artifact, we use the following dynamical system: (A2)

B.1. Input, states, and output
To model the artifact, we use the average time-series data across different trials as the output of the system {y t,q } T t=1 = 1 K ∑ K k=1 y t,q k .The average signal is used to mitigate the effect of spontaneous, ubiquitous spikes on the MEA.Here, we select the electrodes close to the stimulation electrode as the states of the model (see the green region in figure B4).Specifically, the states are the electrodes at a distance less than the predefined threshold θ S a that belong to the set E S a with E a members.Hence, x t a ∈ R Ea stands for the states of the artifact model at time t.Notably, the stimulating electrode is discarded from the states because of its nonlinear behavior at different amplitudes.The input to the model is u t a ∈ R 3 .When the input u t a is identical to the input template r t a ∈ R 3 (figure B5), the output is the artifact.The input template is defined by six parameters pertaining to the starting and ending points of the three pulses.These parameters are tuned to minimize the mean square error between the model output and the artifact for the lowest amplitude.For the other amplitudes, the input templates are scaled by the stimulation amplitude.

B.2. Matrices A a , B a and C a
The matrices for the artifact model are obtained similarly to the ones for the EI model.Distance  thresholds θ A a and θ B a are set to define the effect of the current states and inputs on the future states (figure B6).Matrices A a and B a are designed to minimize the error of the output for different stimulation amplitudes using system identification.
The procedure for the artifact model is detailed in algorithm 2. First, we determine the electrodes to be used as states.Then, we determine the structure of the matrices A a and B a .In the end, the matrices of the model are fitted with the state data.
Algorithm 2. System identification for artifact modeling.

Algorithm's inputs
The set of stimuli amplitudes Q Average of all trials for stimulated data {y t,q } T t=1 ∈ R E at all stimuli amplitude q ∈ Q Artifact input template {r t a } T t=1 ∈ R

Appendix C. Aggregate model
The aggregate model describes the propagation of the voltage across electrodes in the presence of both stimulation artifact and neurons firing.This model is a linear combination of the EI model for all neurons, the artifact model and the measurement noise (superposition assumption in MEA recordings [58]).The model output is assumed to be the sum of the output of all EI models, the artifact model and noise (w), i.e. y t = ∑ N n=1 y t n + y t a + w t at time t.We can reformulate the final output as follows: ] ⊤ is the aggregate state.To describe how the aggregate state evolves, we aggregate the dynamics of each model into the following model: ) ] ⊤ is the aggregate input for all neurons, and the matrices are calculated as , and

Appendix D. Input estimator
The input estimator aims to estimate the input sequence and the states that likely generated the electrode recordings using the proposed aggregate model.
, and If data from y t:t+L , xt , and u t:t+L a is available, we can estimate ũt:t+L from (A6) and obtain the state estimation for the next time based on the aggregate model.In [9], this problem is addressed by using an unknown input observer (UIO) that aims to estimate the states when the input signal is unknown.The UIO estimates the next state as follows: where z t+1 is the unconstrained estimation of the states at time t + 1, and E t , F t , and F t a are the filters on the available data for the last estimation, the output sequence, and the artifact input (the derivation of these filters is explained later).Note that the inputs to the artifact model are available, and the inputs to the EI models are the unknown inputs in our setting.Since each EI input is in the interval [0, 1], the unconstrained estimation of the states z t+1 from (A8) may be wrong.Therefore, we obtain the estimated input ût from the following optimization: where N is the number of neurons.The optimization (A9) enforces that the estimated input is the nearest input in the interval [0, 1] that leads to the unconstrained estimated state.From ût , the estimated state at time t + 1 can be calculated as: Algorithm 3 shows the procedure of state and input estimation from the output data.In the algorithm, we considered that we have the matrices E t , F t , and F t a from the UIO design (computed by the function G(I, t, t + L)).I stands for the sets of unavailable measurements.where e t x = xt − xt is the estimation error, and e t+1 z = xt+1 − z t+1 is the unconstrained estimation error.In order to force the error to go to zero regardless of the state and input, the following equations should hold: , and (A12) If (A12) holds, we conclude that E{e t+1 z } = E t E{e t x } for estimation errors according to (A11).To obtain the matrices that satisfy (A12), we follow the procedure of [9] for the fully-observed situation where the data from all electrodes is available.In this scenario, the matrices E t , F t , and F t a can be calculated regardless of time because the other matrices in the equation (A12) are time-invariant.In this work, we first calculate the parameters of the UIO for a fully-observed situation where the matrices are timeindependent (E * , F * and F * a denote the solution), and then, we extend it to a partially-observed situation where some electrodes are excluded.
To solve (A12), the authors in [9] propose a method where the matrices M, S 1 , and S 2 are calculated such that , and where I 2N is the identity matrix with 2N dimension.Hence, the solution of (A12) can be calculated as M, and where X is the degree of freedom for the solution.To find X, we solve the following optimization problem:

Appendix G. Spike sorting
To detect spikes from the estimated inputs, we propose some indices that show the similarity of the estimated inputs and their templates.Since this similarity may happen at any time, we compare the estimated input signal and input templates for T sample interval (for our case, K = 15).To do so, we used the squared error (SE) as a similarity function as follows: where SE t n for neuron n t.The smaller SE t , the higher the similarity between estimated inputs and the templates.the threshold θ SE is considered for the spike detection such that the satisfaction of SE t n < θ SE leads to spike presence in the estimated input.

Figure 1 .
Figure 1.Eye-bird view of spike sorting in the presence of stimulation artifacts.A charge-balanced tri-phasic waveform (indicated in red) is used by the stimulator in this paper.The neural interface records the voltage immediately after the stimulation ends.The recordings include the residual artifact(s), and evoked and/or spontaneous spikes (MEA regions of interest are shown in red and blue, respectively).Post-processing is needed to analyze the recordings and separate the effect of different phenomena (spikes and artifacts).

Figure 2 .
Figure2.Experimental setting illustration.The retina was isolated from the retinal pigment epithelium and placed RGC side down on a custom high-density multi-electrode array system with 512 electrodes (60 µm pitch, 8-15 µm diameter).The retina was then stimulated with a white noise visual stimulus using a computer display and lenses while recording on each electrode simultaneously.

Figure 3 .
Figure 3. EI data for neuron 18: (left) time-series for three different electrodes, (right) MEA heatmap with the neuron depicted in shaded blue and green contours for the relevant electrodes.The heatmap shows the footprint of the neuron on the MEA; the color encodes the root mean square (RMS) of each electrode's recordings when a spike from the putative neuron is present.

Figure 4 .
Figure 4. Recordings of a stimulating electrode and two non-stimulating electrodes for different stimuli amplitudes (color-coded).

Figure 5 .
Figure 5. Input templates and their effect on the EI model.(a) The input template includes two pulses.(b) The red and blue circles show the effect of the input templates on the electrodes, as defined by matrix Bn.The arrows show the spatial propagation of the model across electrodes, as defined by matrix An.

Figure 6 .
Figure 6.Input templates and their effect on the artifact model.(a) The input template includes three pulses.(b) The red and blue circles show the effect of the template inputs on the electrodes, as defined by matrix Bn.The arrows show the spatial propagation of the model across electrodes, as defined by matrix An.

Figure 7 .
Figure 7. Overview of the proposed spike sorting using dynamical systems and input estimator.(a) We assume that the measured output is the superposition of the output of dynamical models for different EIs and artifacts as driven by predefined input templates.The combination of the EI models and artifact models is the aggregate model.(b) We propose a method to estimate the set of input templates that led to a specific measurement (see details in section 2.2.4).The input estimator requires the electrode measurements immediately after stimulation, the aggregate model, and the list of corrupted electrodes to ignore (typically, the stimulating and nearby electrodes shown in red in the grid).Since we know a priori that stimulation happened, the artifact template is automatically estimated.By inferring which inputs are active (i.e. they are estimated to be equal to the input template), we can identify which neuron(s) are firing; hence, we can perform spike sorting.

Figure 8 .
Figure 8.(a) Performance of the input estimator when analyzing synthetic data.The input templates of EI 1 are injected into the aggregate model to generate the synthetic data.The input estimator retrieves inputs that completely overlap with the input templates used to generate the synthetic data (shown on the right).(b) Root mean square error (RMSE) between the input templates and their estimations (top) and spike sorting accuracy (bottom) when one, two, and three neurons fire simultaneously.The simulation uses synthetic data generated by injecting input templates for one, two, and three neurons (out of 25 total neurons) into the aggregate model.The synthetic data is fed to the input estimator, and the RMSE shows how much the estimated inputs deviate from their templates.The simulation is repeated for different measurement noise levels (normal noise with various standard deviations added to synthetic data).As expected, the input estimator does not introduce errors in the noiseless case, even for multiple neurons firing (similar to (a)).As the measurement noise increases, the RMSE increases.Besides, the results suggest that the input retrieval for multiple neurons is more vulnerable to noise than the single-neuron case.

Figure 9 .
Figure 9. Prediction results from the best and worst performing EI models for three different electrodes.The spatial map of the EI is shown on the MEA heatmap.

Figure 10 .
Figure 10.SNRMSE distribution for all EI models.The MEA heatmap shows the neurons' location based on their SNRMSE.The average SNRMSE is depicted with a blue dashed line.

Figure 11 .
Figure 11.Prediction results of the artifact model for two electrodes at low and high amplitudes.The blue circle indicates the position of the stimulating electrode.The model output is compared with the average stimulation data across all trials.For the high amplitude cases, a putative elicited spike is shown in a red dashed line.

Figure 12 .
Figure 12.ANRMSE of the artifact model as a function of the stimulation amplitude.

Figure 13 .
Figure 13.Comparison between retrieved inputs and input templates when ex vivo recordings from EI 1 and EI 25 are fed to the input estimator.

Figure 14 .
Figure 14.Retrieved inputs of neuron 25 and neuron 1 when ex vivo stimulation data is fed to the input estimator at low amplitude (0.45 µA) and high amplitude (3.71 µA).Input templates for each neuron are shown in dashed lines.Neuron 25 is located behind the stimulating electrode (indicated by a circle), and neuron 1 is further away.Using a similarity function, a spike is detected only for neuron 25 at high stimulation amplitude (bottom-right panel).

Figure 15 .
Figure 15.Probability of activation as a function of the stimulation amplitude for neuron 25 and neuron 1.The stimulation threshold (shown with a vertical dashed line) is defined as the amplitude for which the probability of activation is 50% using a sigmoid fit of the scatter plot.The spatial map of the neurons is shown in the MEA, and a blue circle indicates the position of the stimulating electrode.Results are based on the ex vivo dataset described in section 2.1.

Figure 16 .
Figure 16.Comparison of activation thresholds between human spike sorting and the proposed method for two sets of manual analyses.The dashed line shows the identity line.Results are based on the ex vivo dataset described in section 2.1.

Figure A1 .
Figure A1.Recorded and predicted EI for neuron 11.The heatmap shows the footprint of the neuron on the MEA; the color encodes the root mean square (RMS) of each electrode's recordings when a spike from the putative neuron is present.The green region encloses the electrodes selected as states in the model.

Figure A2 .
Figure A2.Input templates r t11 for neuron 11.The blue and red graphs correspond to the first and second entries of the vector r t n , respectively.

Figure A3 .
Figure A3.Modeled dynamics of neuron 11.The shaded red and blue colors on the electrodes show the absolute values of Bn, which determines how the input template influences the dynamics of the model.The arrows show the spatial propagation of the model across electrodes, as defined by matrix An.

Figure B4 .
Figure B4.The heat-map of the artifact on the MEA.The blue circle indicates the stimulating electrode, and the green circle encloses the electrodes considered as part of the state.

Figure B5 .
Figure B5.Input templates r t a of artifact.The blue, red, and green graphs are the first, second, and third entries of the vector r t a , respectively.

Figure B6 .
Figure B6.Dynamics of the artifact on MEAs.The arrows show the direction in which each electrode's measured activity is propagated according to Aax.The shaded colors on the electrodes show the absolute values of Ba, which determines how three input templates of the EI model change the activity of the electrodes.

Algorithm 1 .
System identification for modeling of EI n.
Ea×3where b ij = 0 for j = 1, 2, 3 andelectrodes i ∈ E S a such that d i i ⋆ > θ B a B structure: B ∈ R 2subject to the matrices' structures Set Ca = I E , and remove the columns i / ∈ E S a of Ca such that Ca ∈ R E×Ea [41]estimated inputs are from different models and allow us to differentiate which models (i.e.neurons) contribute to the measurement.The structure of the input estimator is depicted in figure7(b).Here, the input estimator leverages the aggregate model to estimate the input sequence and the states of the system using a window of size L of the output data (i.e.electrode recordings).Based on the aggregate model, we can define the input-to-output relationship as follows: ⊤ is the measurement noise within a forward window L[41].The matrices in (A6) are computed as ⊤ , . . ., (y t+L ) ⊤ ] ⊤ is the output data, ũt:t+L = [ (ũ t ) ⊤ , . . ., (ũ t+L ) ⊤ ] ⊤ , . . ., (w t+L ) ⊤ ]

Algorithm 3 .
Procedure of input estimation.2N ||z − Ãx t Bu − B ′ u t a || 2 = Ãx t + t + B u 1]xt + F t y + F t a t:t+L ût = argmin u∈[0,1] t a as the prediction of the artifact model.Therefore, ANRMSE is calculated as follows: where Y n is a diagonal matrix where its diagonal element is equal to 1, except for the elements regarding the states of neuron n from Ã pattern, and ||.|| F is the Frobenius norm of matrices.Also, y t n is the EI data of neuron n, and α is a positive us define z