Non-linear adaptive control inspired by neuromuscular systems

Current paradigms for neuromorphic computing focus on internal computing mechanisms, for instance using spiking-neuron models. In this study, we propose to exploit what is known about neuro-mechanical control, exploiting the mechanisms of neural ensembles and recruitment, combined with the use of second-order overdamped impulse responses corresponding to the mechanical twitches of muscle-fiber groups. Such systems may be used for controlling any analog process, by realizing three aspects: Timing, output quantity representation and wave-shape approximation. We present an electronic based model implementing a single motor unit for twitch generation. Such units can be used to construct random ensembles, separately for an agonist and antagonist ‘muscle’. Adaptivity is realized by assuming a multi-state memristive system for determining time constants in the circuit. Using SPICE-based simulations, several control tasks were implemented which involved timing, amplitude and wave shape: The inverted pendulum task, the ‘whack-a-mole’ task and a handwriting simulation. The proposed model can be used for both electric-to-electronic as well as electric-to-mechanical tasks. In particular, the ensemble-based approach and local adaptivity may be of use in future multi-fiber polymer or multi-actuator pneumatic artificial muscles, allowing for robust control under varying conditions and fatigue, as is the case in biological muscles.


Introduction
The current successes in deep learning neural networks (Sejnowski 2018) also pose new challenges. Whereas the carbon footprint of a training procedure in artificial neural networks is very high, the required computations are far from efficient on a von Neumann computer architecture. Large vectors containing input and output patterns need to be copied between storage and processor modules. However, the essential operation is a mere matrix-vector multiplication, once the data sits present. The computation of a weighted sum and a resulting final non-linear operation, however, can be realized by patterns streaming through a neuromorphic material, at a fraction of the energy dissipation. The biological brain is an example of such a complex material. However, there are more examples of impressive, analog processing than the neural computation within the neocortex of the brain itself. A large component of neural activity is concerned with the non-linear, analog control of movement. Biological evolution in vertebrates has lead to a versatile system for neuromechanical control (i.e. the muscular system) consisting of contractile fibers and the associated spinal and cerebellar control mechanisms. In this study, we will propose an electronic system for the control of complex analog systems that is inspired by a number of essential properties of its biological counterpart. The basic tenet is not so much in realizing a final mechanical pathway but in proposing possible mechanisms for control, which potentially can be reused at several electrical as well as soft-mechanical levels in a neuromorphic control system performing an analog control task.

Related work in biologically inspired movement control in robotics
In recent literature, there is an increased interest in biologically inspired methods for movement control (DeWolf 2021). Apart from the minimization of energy requirements, biological systems provide highly effective examples of adaptive and faulttolerant control under heterogeneous external conditions and in the face of internal resource depletion. The biological inspiration has an influence on several levels in the general movement-control architecture. In DeWolf et al (2016), a spiking neuron model of the motor cortices and cerebellum of the motor control system for a three-link simulated planar arm is presented. The recurrent error-driven adaptive control hierarchy model addresses the overall control architecture and the spike-based learning process. However, the model does not explicitly incorporate the essential electro-biomechanical nature of the end effector, the muscular system. Spike-based control also has attractive properties for the adaptive control of, e.g. a robotic arm mounted on a wheel chair (Ehrlich et al 2022) in robotic support of activities of daily living. In this study, Intel's Loihi chip (Davies et al 2018) was used. Although it is spike based and interesting because of its adaptive properties, the general approach heavily relies on direct control of electric motors per joint. We believe it is important to explore the possibilities for ballistic, twitch-based control as is the case in both biological muscles and would be required for new contractile materials (Mirfakhrai et al 2007, Leng et al 2021 which allow for the exploitation of intrinsic compliance. The intrinsic stiffness of traditional joint-angle control by electric motors has an advantage in industrial applications. However, the downside is that compliant control (Liang et al 2014, Li et al 2017 for service cobots must be computed in detail, centrally. New actuator technologies will require innovative control schemes. In Abadía et al (2021), the important aspect of movement control and planning in systems with intrinsic delays is addressed. Such time delays play an important role in sensorimotor anticipation and in learning. The paradigm concerns force control of a teleoperated Baxter robot arm, where safety requires an adequate handling of delays. This research (Abadía et al 2021) mainly addresses central control mechanisms. We will also address the issue of time delay in the current study, which is aimed more at peripheral processes. These (DeWolf et al 2016, Abadía et al 2021, Ehrlich et al 2022 and other studies (Ehrlich and Tsur 2021) would be suitable in delivering a central control architecture that acts as a front end for the peripheral biomimetic control proposed in the current study. It should be noted that peripheral end-effector control is not the only possible application domain for the work proposed here. An additional goal of this study is to provide an interface from short-lasting spike events to internal state responses with a prolonged temporal effect, as occurs in the biological system, when a discrete spike induces a smooth gradient in the state of the receiving system. We expect that there are many application possibilities for adaptive, neuromorphic approximation of time functions. In the next sections we will explore properties of the biological motor control system.

Ensemble models in biological motor control
Motor control in vertebrates is characterized by an architecture where spiking control signals from a 'higher' processing level in the central nervous system, i.e. the primary motor cortex, are sent downward (Betts et al 2013) (figure 1). Passing through the spinal cord, the terminal connections end up within a vertebra of the spine, in the ventral horn, where a pool of alpha motoneurons receives the excitation signals from the upper control level. At this point, the motoneuron pool is then activated, leading to (a) recruitment (Liddell and Sherrington 1925) of silent neurons when their activation threshold is exceeded and (b) leading to a modulation of the spiking frequency of neurons that are already active (Kanosue et al 1979, Van Boxtel andSchomaker 1983). These two mechanisms allow for coarse, recruitment-based force control as well as fine regulation of required intermediate-force levels by means of altering the firing rate. The recruitment process is organized according to the 'size principle' (Henneman 1957), with smaller motor neurons connected to small fibers have a low threshold and are resistant to fatigue, whereas large motor neurons have a high threshold, activating high-force, fast-twitch (FT), but fatigable muscle fibers. The force jump that would be caused by a discrete recruitment step of a strong motor unit (MU) can be smoothed, i.e. filled in, by spike, i.e. twitch trains of increasing frequency, from other MUs, yielding smooth control that is realized by means of a discrete-event and discrete-force system. In order to obtain stable and predictable results, the system is also equipped with a negative feedback system, measuring the actual mechanical effect of a motor command in terms of muscle-fiber length variations (Hulliger 1984) as well as the force exerted on muscle tendons (Houk and Henneman 1967). In this way, the system is able to generate the required force, acceleration, speed and positional control patterns, even in the face of external disturbances and resource depletion in the muscle fibers (Sahlin 1986, Prochazka 1996, Schlink et al 2021. In short: The biological neural system has solved the problem of continuous, analog control where the basic actions are discrete short-lasting spike events; It has solved the problem of activating groups of contractile elements while avoiding step-wise jumps in the delivered output force; and Figure 1. Schematic overview of the biological system for motor control. Details are left out and anatomy is not to scale. Spikes from upper motor neurons (a) in the primary motor cortex are sent down the spinal cord (b) to a segment (vertebra) in the spine (c), where an exit cavity is located for the axons towards the target muscle. The excitatory signals from the cortex make synaptic contact with a pool of lower motor neurons (alpha motoneurons). Each motor neuron ((d), orange) is connected with a long axon (e) to a large number of muscle fibers (f), where an arriving electrical spike is converted into mechanical action, the twitch. The combined final system at the bottom, consisting of a single lower motor neuron, its axon, the fan-out branches to neuromuscular junctions and the contractile fibers is called a MU. A muscle can contain hundreds of such MUs. it has solved the problem that output elements may be subject to perturbations and drift during task execution. Last but not least, the biological motor system is adaptive as regards new motor tasks, the effects of resource depletion during task execution, and, at a slower pace, skeletal growth and ageing of the biological systems involved. While not all of these properties are necessary or easy to implement in new neuromorphic systems, we will explore a number of facets that lend themselves to a hybrid emulation in analog electronics and digital simulation.
An early ensemble approach to modeling muscular control in the electrical domain was given by De Luca (1979) 3 . The muscle consists of many separate groups of fibers, MUs, in humans ranging from a few dozen MU in small facial muscles up to 1900 MU in the calf muscle (m. gastrocnemius) (Feinstein et al 1955). A single MU actually consists of the combination of: (1) one motoneuron in the spine; (2) its axon towards the muscle; (3) the branches of the terminal arborization within the muscle; 4) the end plates as 3 ibi, figure 5, p 321. the neuromechanical junctions positioned on 5) the spatially distributed contractile muscle fibers. A single motoneuron excites from 100 to 1000 muscle fibers, depending on the muscle (Buchthal and Schmalbruch 1980). A discharge of the motoneuron in the ventral horn (figure 1) leads to a spike traveling towards the end plates, there leading to a spatiotemporally dispersed depolarisation which can be electrically measured on the surface of the muscle as a MU action potential (MUAP). Mechanically, the spike arrival can be measured as a muscle shortening, in the free-hanging condition, or as a force pulse in the mechanically constricted (isometric) condition. The shape of the MU twitch corresponds to a secondorder overdamped impulse response (Crochetiere 1967, Milner-Brown et al 1973, Tax andvan der Gon 1991) (figure 2). Whereas the summation, over motor units, of the MUAPs yields a signal measured as the electromyogram (EMG), the summation of the twitches (Burke et al 1976, Simony et al 2010, Gerritsen 2020, Smith et al 2020, similarly, leads to the compound force output of the muscle as shown in figure 3. Example of a second-order overdamped impulse response to an input spike (not shown) which occurs at t = 1. This function approximates the summed mechanical force effect of the contracting muscle fibers belonging to a MU, on the arrival of a spike from the alpha motoneuron. Figure 3. The muscle, modeled as a motor-unit pool receiving spikes from an upper controlling level, where each MU can be viewed as a filter operator in the electrical and force domain. The total muscle activity is spatiotemporally summed, leading to an observable EMG and an overall force-time function. The interesting aspect for neuromorphic computing is that the input can be seen as gross multi-channel spike-train control on a pool of units, with an overall functional effect in the force domain characterized by a carefully timed and shaped wave pattern. Short inter-spike intervals, whether within a MU spike train, or as the consequence of spatio-temporal summation over input spike trains, lead to a temporal fusion of twitches, allowing a wide variety of output force patterns.

How to exploit mechanisms in biological control within neuromorphic computing?
The aim of this study is not to model all forms of continuous, sustained control, but to focus on motor tasks involving short-lasting ballistic movements or analog bursting. By using the second-order overdamped impulse response (figure 2) to a spike, i.e. a pulse with a very short duration, the effect of that input event is spread over time, with a smeared-out duration which is useful in relation to the analogcontrol requirements of a particular motor task. In brief, the idea is to use the 2nd order impulse response as a base function for a neuromorphic universal function approximator. In an electronic implementation, this can be realized by putting two 'RC' filters in series, realizing an impulse response: where τ 1 and τ 2 are time constants. Advantages of this approach: • Bridge the gap between short-lasting spikes on the input and long-lasting analog wave output as required by a movement task; • Ability of the twitch impulse response to be used as a base function for general function approximation; • Requiring only simple electronic circuits; • In which the behavior is controlled by adaptive memristors. Similarly, twitches (i.e. 2nd order overdamped impulse responses) will be unipolar assuming a positive input impulse, as required by the electronics. Therefore, modeling using only positive 'humps' is not sufficient, because in control problems, the controller output needs to be signed (i.e. requiring a bipolar control action a ∈ [−s, +s]). Twitches, i.e. 2nd order 'humps' (figure 2) are a very skewed approximation of a Gaussian, but cheap to generate because the only requirement is a cascade of two RC filters which can be realized by a wide range of neuromorphic materials having capacitive, or more generally, statepersistent properties of the leaky-integrator type. The required bipolarity for control can be realized by assuming both an agonist and an antagonist MU pool, with a reversed sign on the base function. In muscular systems, the sign reversal is the natural consequence of a geometric structure that is organized around a joint, the agonist and antagonist muscles being connected to opposing locations around a hinge (joint) in a kinematic chain (Hartenberg and Denavit 1964).

Relation with traditional machine-learning methods for function approximation
We will propose a computing architecture for analog control tasks requiring short-lasting (ballistic) wave shapes with appropriate timing, amplitude, and adaptive wave shapes. The modeling will be done in two levels: (1) an electronic circuit is proposed using NGSPICE (Cox et al 1992, Strasnick et al 2021 to model a single MU, allowing for the timed generation of 2nd-order overdamped impulse responses; (2) an overall architecture is proposed, solving bipolar (positive and negative) control signals by summing outputs over ensembles of MUs; (3) an overall simulation is realized in Python to instantiate the NGSPICE simulations and collect their output; (4) heuristic, STDPinspired learning rules are presented to adapt memristor resistances to a task; (5) several tasks will be used to evaluate the capabilities of this biologically inspired analog control system.
The proposed approach has three desired properties in mind: • The total system of circuits needs to be flexible in the sense that it needs to be able to learn to perform multiple analog control tasks involving timings and wave shapes; • The circuit needs to operate using current memristive devices having a limited set of resistance values, usually only two; • The circuit has low power requirements; To show that ensembles of MU circuits can learn to perform dynamic control tasks, we will test whether it can be trained to control: (i) A cart-pole (inverted pendulum) task (Donaldson 1960, Blitzer 1965, Barto et al 1983 Task focus: (timing and amplitude in 1D); (ii) Playing the whack-a-mole game (Gutiérrez et al 2006, Wikipedia 2022 Task focus: (timing and amplitude in 1D); (iii) We also want to show the ability of the circuit to generate wave shapes representing pen-tip trajectories for handwritten characters (Schomaker 1992) Task focus: (trajectory formation in 2D).
Finally, we will discuss the modeling results and the possible implications for future neuromorphic and biomimetic control systems.

Circuit
The basic component of the approach is an electronic circuit representing a single MU, the MU circuit, for generating a delayed twitch with a particular amplitude. An ensemble of several such circuits is assumed to represent a 'muscle' , i.e. a complete analog controller for one output dimension. Two such ensembles can operate in an agonist/antagonist mode, allowing for bipolar control.

Delayed-pulse generator
The first part of the circuit is the time-delayed spike generator. This part of the circuit will turn a basic activation stimulus (a step pulse, from rest) into a delayed pulse, represented by a narrow, high amplitude block pulse. This brief-lasting spike is assumed   to be a physically realizable version of the idealized (Dirac 1947) pulse, with an area of one and a duration approaching zero seconds, as is used in signals & systems theory. It represents a single input spike of the input trains as shown in figure 3. The delayed pulse generator can be seen in figure 4. A memristor-capacitor combination realizes a charging curve leading to a step response by the first opamp, which is translated into a spike by the second opamp, not unlike a leaky-integrate and fire (LIF) neuron model (Abbott 1999).

Muscle-twitch generator
The second subcircuit is the twitch generator, which converts the short-lasting spikes from the first part of the circuit into longer-lasting spikes that look like the tension over time of a muscle twitch. The conversion from a spike to a twitch is realized by filtering the narrow block pulse with a second-order RC filter. The double RC filter generates a second-order overdamped impulse response. The subcircuit can be seen in figure 5.

Complete single-MU circuit
The complete 'single-MU' twitch generator circuit is shown in figure 6. It has only three memristors. In total, the single circuit has 16 passive components and three standard opamps (LM741). The components have been selected for a proof of concept, where notably some capacitors (C1, C5) are dimensioned to allow for modeling a relatively slow real-time process. When embedded in an ensemble, additional memristors are needed to obtain a weighted-sum output over MU circuits to allow a variation of output patterns.

Memristive devices
The memristor, coined by Chua (1971) and produced in the HP Labs (Strukov et al 2008) is typically a nonlinear device, of which the microscopic internal state can be modified and measured as a change in the resistance. Their ability to co-locate memory and compute in the same element makes them akin to biological synapses.
The memristor is a two-terminal device whose resistance can be switched between several states depending on the applied voltage. The classical memristor consists of two electrodes that sandwiches a binary resistive oxide, such as TiO 2 as seen in figure 7.
Resistive switching is activated when the electric field due to the applied voltage leads to the movement of oxygen vacancies whose density at either interface with Pt can be controlled by the applied voltage. The higher concentration of oxygen vacancies at TiO 2−x (x denoting the vacancy concentration) on the left side leads to the formation of an ohmic contact with Pt and a Schottky interface with Pt on the right side. The barrier height and width modulation related to the movement of the oxygen vacancies, with the applied voltage is associated with the different resistance states that such memristors exhibit.
A non-volatile memristor will retain its resistive state when no voltage is applied. This way, the memristor can be regarded as a component with memory. Non-volatile memristors are usually bistable, although research has been done on memristive systems with a higher number of states (Ielmini and Waser 2016, Sun et al 2021). Bistable memristors have an ON/Low resistive and OFF/High resistive state, also referred to as LRS and HRS respectively.
In neuromorphic systems, the goal is to realize local adaptation at individual synapses. In this study, we first want to investigate whether memristors with a limited number of resistive states can be used for analog tasks. It is not clear, a priori, that a coarse quantization of resistances (read: 'neural-network weights') still allows for sufficiently fine control in such tasks. Therefore, updates to the memristor parameters are, for the time being, realized by using an external controller which allows for a stochastic exploration of resistance configurations. The external controller can control the weight changes of the entire system or subsystems within the system. The controller receives input information from devices that measure the physical properties of the system, which are different across different tasks. These properties are converted into a control signal, based on heuristics, that contain a direction of update (i.e. an increase or decrease in resistance) and what subset of resistors to update. The resistances then change with only a set value. The controller's stochasticity determines which resistors are chosen to be updated, namely; at every instance where updates are performed, depending on the task, the resistors have a 20% or 50% chance of being adjusted. A flowchart of the process can be seen in figure A1.

Ensemble architecture
Similar to the muscle-modeling approach, a summation over individual twitches from MU circuits is used to obtain the driving signal. The architecture used for this contains multiple circuits with a common activating step input, where only the variable resistance values will be different between circuits. The circuits are split up into antagonistic groups, meaning one group of circuits will contribute positively while the other contributes negatively. By means of a long-lasting block pulse stimulus, the circuits in an ensemble are stimulated and will produce differently shaped and delayed second-order overdamped voltage pulses. The individual pulses are summed per group and represent the force an actuator has to exert, i.e. in the simulation the exerted force is assumed to be equal to the voltage (i.e. value Three different ensemble architectures are used; single-bin homogeneous updating (SBHoU) ensemble, multi-bin homogeneous updating (MBHoU) ensemble and the MU recruiting-inspired MBHeU ensemble. The SBHoU ensemble consists of one group of an antagonistic pair of circuits where either all agonist or all antagonist circuits will activate. The MBHoU ensemble consists of eight groups of circuits (four agonists and four antagonists) of which each activates depending on circumstances. The MBHeU ensemble consists of eight groups of circuits in which one or more activate depending on the circumstances. A schematic visualization can be found in figure B1 together with a more detailed explanation in appendix B.

Control tasks
In this section, our main task: the cart-pole or inverted-pendulum task is described. The whack-amole task is described in appendix C and a trajectorygeneration task for pen-tip movement in handwriting is described in appendix D. The control tasks address approximation problems in timing, amplitude and wave-shape domains. The cart-pole task represents a typical invertedpendulum control problem. It is used to evaluate the timing and amplitude control capabilities of our model. The cart-pole system consists of a pendulum attached to a cart using a hinge weight. The pendulum needs to be balanced by the movement of the cart, initiated by either turning the wheels or applying a force to the sides of the cart. In the current model, the task does not require an initial upswing from a down-hanging orientation. The cart-pole is simulated using the OpenAI Gym cart-pole environment (Brockman et al 2016). This environment utilizes the frictionless version of the equations of motions of the cart-pole described in Barto et al (1983), shown in equations (2) and (3).
where g = gravitational acceleration (9.8 (m s −2 )), m c = mass of the cart, m p = mass of the pole, l = distance from hinge to center of mass of the pole, and F = horizontal force applied to the center of mass of the cart. Different machine learning algorithms to control the inverted pendulum have been explored in simulation under which fuzzy logic (Becerikli and Celik 2007), reinforcement learning (Lillicrap et al 2019) and spiking neural networks (Kang and Banerjee 2018). The problem is also tackled in real life by reinforcement learning and a hybrid system of memristors and transistors (Wang et al 2019), where the memristors were used to speed up matrix multiplications and introduce stochasticity. The cart-pole is controlled in a simulated environment. We use the circuit outputs to generate force, and the OpenAI Gym cart-pole environment (Brockman et al 2016) to simulate the cart-pole. The environment with annotations and outputs of force-generating circuits can be seen in figure 8. The cart in this environment is moved by a pushing force that pushes to the left or the right. We call the ensemble pushing right the agonist ensemble and the ensemble pushing left the antagonist ensemble.
When the pole flips from a vertical position to a positive angle compared to this position in a clockwise direction, the agonist circuits that have to push the cart receive a 10 V block pulse until the pendulum flips to the other side. The circuits that are not activated also do not contribute in pushing the cart anymore (e.g. when the pole flips from a positive to a negative angle, the agonist circuit ensemble does not contribute any force anymore).
The pole is assumed to fall, ending a run in a failure, whenever the pole exceeds an angle of 20 • relative to the vertical. A run is assumed to finish successfully whenever the pendulum is balanced for 60 s. Unlike in the common benchmark setups, we do not set a horizontal limit for where the cart is able to move.
Monte Carlo search will be performed to explore the performance of different ensemble schemes in pole balancing. The physical properties are arbitrarily chosen; The mass of the pole is 0.2 kg, the mass of the cart is 0.3 kg and the pole length is 1 m. The activated bins with their corresponding angular velocity ranges can be seen in table B2. One run consists of 90 balancing attempts where the 90 initial conditions for the pole are each of the elements of the Cartesian product between the following two sets; The score of each run is the average balancing time over the attempts. By evaluating the results from the Monte Carlo search we will hypothesize what ensemble could be used best to utilize in combination with a more goaloriented learning rule which we call event timingbased plasticity. We will compare these combinations of ensembles and learning rules by means of two different tasks. The first task is a simple pendulum balancing task where the pendulum has to be balanced from an initial angle of approximately 0 • and an angular velocity between −0.05 and −1.5 or 0.05 and 1.5 (rad s −1 ). The second task is balancing the pendulum with initial positions as seen in the Monte Carlo search (i.e. an angle-angular velocity combination from the set of 90 initial states).
In both tasks, when the pendulum either crosses the vertical again or falls, the resistances are updated. Taking the time between two consecutive pendulum crossings to be t, the resistances update according to the following rules; • Resistances should increase when the pendulum is pushed too fast and too hard, resulting in lower amplitudes and bigger time delays. • Resistances should not change when t is within the ideal region. • Resistances should decrease when the pendulum is pushed too slow and not hard enough, resulting in higher amplitudes and smaller time delays.
An ideal time between two crossings is assumed to be between 0.1 s and 1 s. Resistances are only allowed to change −10 kΩ|10 kΩ or 0 Ω. The probability of change will be determined by the external controller's stochasticity. Afterward, the task in which the pendulum has a vertical initial position is extended to determine the working range of the system. This is done by varying the physical properties of the cartpole system. Besides controlling the pendulum, playing whack a mole and performing handwriting will also be explored. The details of the tasks will be described in appendix B.

Optimization methods
Optimization methods are used to teach or optimize a system with respect to a goal. Several optimization methods are used in this research. This section explains the optimization methods used; Monte-Carlo search (random search), the Widrow-Hoff learning rule (Widrow and Hoff 1960) and a rule inspired by STDP.

Spike-time and event-time dependent plasticity
Neurons are plastic in nature and update their connection strengths based on local information in the form of relative spike timings. Relative time differences in spikes can either cause an increase in connection strength, a decrease in connection strength or can leave the connection strength unchanged. Different STDP schemes are available, of which each has its own advantages and disadvantages. The STDP algorithms avoid global optimization and optimize locally. Local updating is an attractive property because it eliminates the need for additional connections or 'third leads' , into an electronic circuit. However, the current study is first focused on the general principle of neuromuscular-inspired analog control, evaluating it on a number of tasks. However, it is also useful to borrow from STDP the event-time principle as the basis for memristor updates. Actually, there exists a wide range of proposed STDP curves beyond the classic '1/x' shape (Buchanan and Mellor 2010).
In this study, we will use a more abstract version, dubbed event-time dependent plasticity (ETDP), which is based on deriving update signals for a memristor update from relative timing requirements of a particular task. The basic and discrete update actions are: 'resistance up' , 'resistance unchanged' , and 'resistance down' . The assumption of ETDP is that it is possible to determine an event in the task execution, of which the (relative) timing is related to success or failure. Because our proposed MU circuit in itself does not require a large crossbar with many memristors on the crossings, the disadvantage of the circuit-global nature of ETDP is manageable, although admittedly not negligible. For the tasks where ETDP is used, an explicit time event can be detected and used as an update signal.

Overall approach
We can model a single MU by treating it as a mechanism that produces a delayed spike with respect to a stimulus after which it filters this spike using a second-order overdamped transfer function (figure 2). This model is inspired by the neural firing timing property of which the importance is suggested by Mainen and Sejnowski (1995). Controlling a dynamical system using only a single signal however, may not be viable for systems that require fine-grained control. In order to achieve more finegrained control, it is preferable to sum the activation over an ensemble of units. As seen in figure 3, multiple MUs can either be added up in the the electrical domain in terms of MUAPs or they can be added up in terms of their twitch forces. We will use the latter to control our system. By timing the twitch forces we perform ballistic control on the system. We do this by simulating an ensemble of MUs, which work together to control systems. Each MU in our system will be simulated by an NGSPICE model (Cox et al 1992, Nagel et al 2011, which is an open-source tool from the SPICE 4 family of algorithms for detailed simulation of electronic circuits. Similar to the model that adds up multiple twitches to generate more complex force patterns, we will add up multiple twitch-like circuit outputs to generate more complex voltage patterns. These voltage patterns are then digitally transformed into force patterns, used as driving forces in the simulations.

Validation and calibration
Before being able to perform the three major experimental tasks, some validation and calibration steps need to be performed on the MU-circuit ensemble approach.

Validating the timing capability of a single motor-unit circuit
In order to test the basic timing capability of the first stage of a single motor-unit circuit, a number of pilot evaluations were done to verify the dependency on the resistance of memristor R1. The circuit is fed with a step function, from the resting state (0 V). A monitoring controller sets a predefined delay and starts an iterative update of R1 until the timing of the output signal is correct, starting with a predefined initial resistance (100 kΩ). The resistance value for a delay in the range of 0-1200 ms can be estimated in about 20 steps for coarse update steps of + or −10 kΩ. After convergence from 21 steps onward, the obtained MAE was 16.5 ms and the RMS error 20.8 ms. An approximation with |∆R| values which gradually become smaller leads to a more accurate timing but required a longer iteration sequence of 30 steps and assumes a finer memristance resolution. More details can be found in appendix E.1. It was concluded that this performance was a sufficient basis for the actual timing tasks (inverted pendulum and whack-a-mole).

Validating the shape-fitting capability of an ensemble of twitch generators
Similarly, before embarking on more complicated tasks, we needed to evaluate the capability of an ensemble of electronic twitch-generator circuits to approximate an arbitrary wave shape. Five different transient time functions were used. Three unipolar wave shapes: sawtooth pulse (increasing), reverse sawtooth (decreasing) and block pulse. Furthermore, a bipolar wave shape was evaluated, i.e. a damped sinusoidal impulse response. All signals were approximated using random weight initialization and the Widrow-Hoff learning rule. Please note that the basic architecture allows for bipolar signals in all cases because there is both an agonist and antagonist pool. For the unipolar targets, the system needs to learn to suppress negative output. Results show that with 100 circuits (50 for positively signed twitches, 50 for negatively signed twitches, a decent fit can be reached in about 1000 iterations. Figure 9 shows the results for the damped sinusoid. The approximation was also acceptable for the difficult ramp slope and the discrete jumps in the sawtooth and block pulse targets. Table 1 gives the RMS errors for different wave shapes. An overview of these pilot evaluations can be found in appendix E.2.

Validating a basic inverted-pendulum task ('1 s experiment')
The cart-pole task represents a typical invertedpendulum control problem. It is used to evaluate the timing and amplitude control capabilities of our model. The goal was set to push the cart such that the  pole crosses the vertical, in one second from an initial angle of 0.15 (rad), and initial pendulum angular velocity of 0 (rad s −1 ). The goal event time window for the pendulum crossing the vertical was from 1.00 s to 1.01 s. After every attempt, when the pendulum fell or reached the vertical, resistances change either 2 kΩ| − 2 kΩ or 0 Ω with an equal probability (i.e. 50/50). The rules can be seen in table 2. The timing and cumulative circuit output of a single run can be seen in figure 10. After iteration 8, the system has learned to let the pendulum cross the vertical. The final signal of the run can be seen in figure 10(b).
Here we can see that the cart is nudged just a little, in the beginning. Just before the 1 s mark however, at t = 0.8 s, the cart is pushed hard by two peaking forces. In summary, the system initially attempts to push the pendulum to the vertical and when this is achieved it optimizes the timing given the inertia characteristic of the pendulum, yielding a stable performance after ten trials. The experiment is also performed using different numbers of circuits in order to see whether the number of signals increases or decreases convergence. The average convergence time with error bars can be seen in figure E4. Here we  (2) and (3)). can see that with an increasing number of circuits, the convergence time (number of steps) decreases as well as its variation over training sessions. We have now shown that besides timing the delayed step, we can also time and shape the pendulum-control action using an ensemble of circuits. After embarking on the validation of basic functionality with acceptable to good results, we can now address the results for the major tasks.

Inverted-pendulum task: ensemble configurations
All three ensemble architectures are compared. For every ensemble type we performed 3 tests using a different number of circuits. For every run (one ensemble setting), we sampled 50 000 times from the 10 000 randomly generated circuit outputs (section 3.1.2). For every sample we let this combination of outputs balance the pendulum starting with 90 different combinations of angles and angular velocities as mentioned in section 2.3. We thus performed 50 000 * 90 = 4500 000 trials to get an impression of every setting's performance. For every sample (containing 90 trials) the score is equal to the average balancing time over all 90 trials. For each ensemble composition, the histogram of obtained balancing times over all the trials is shown in figure 11. Please note that vertical axes are on a logarithmic scale.
Since we are only interested in balancing the pendulum for a long time, comparing the three different methods is done by looking at the number of occurrences in the higher balancing times (>20 s).
The first histogram shows the SBHoU ensemble and can be seen in figure 11(a). We can see that there are a vast number of trials that end up in the one or two-second balancing time bin. We can also see that especially the 25 circuits per side ensemble resulted in several trials that balanced the pendulum longer than 20 s.
For the MBHoU ensemble, seen in figure 11(b), we can see that again there are a large number of combinations that result in one to two-seconds average balancing time. The 25 circuits ensemble performs best, like in the SBHoU experiment. This time however there were fewer trials resulting in a balancing time higher than 20 s.  I  21  4  2  II  50  8  7  III  56  15  21  Total  127  27  30 The third histogram, seen in figure 11(c), shows the MBHeU ensemble. Also here, many combinations of circuits were capable of reaching a one to twosecond average balancing time. However, the setting with the largest number of circuits (40) scores best but there are only a few solutions yielding balancing times that are higher than 20 s.
The numbers of top balancing times are shown in table 3. Here we can see that in the extreme cases, the SBHoU ensemble performs best, followed by the MBHeU solution, which is in turn followed by the MBHoU ensemble type by only a small margin. We can also see that the setting with the most circuit performs best for every ensemble type.
When selecting from the range of acceptable solutions (min. 20 s of balancing), the singlebin/homogeneous ensemble (SBHoU, panel (a) in figure 11) thus seems to perform best (56 acceptable solutions). Of the multi-bin architectures, the MBHoU approach requires more circuits than the MBHeU approach, but surprisingly the latter approach performs better (21 vs 15 solutions). Therefore, the multi-bin homogeneous ensemble with many circuits and an anticipated high energy demand (MBHoU) will not be chosen, in the next experiments.

Pendulum balancing using ETDP
In this experiment, we will compare two ensemble types on a pendulum-balancing task. As opposed to plain Monte-Carlo search, we will apply a more goaldirected form of plasticity, ETDP. The reason for also exploring the MBHeU scheme while the SBHoU ensemble performed best is not only because of its similarity to the biological process, but also due to the expectation that our event-time approach would potentially be able to learn to balance the pendulum for a wider dynamic range of initial settings (initial angle, initial angular velocity, cart mass, pendulum mass and pendulum length) than is the case in a homogeneous ensemble.
The best scheme for the MBHeU approach is the 25 → 25 + 5 → 25 + 5 + 5 → 25 + 5 + 5 + 5 circuits scheme (see table F1 and appendix B for an explanation). In order to compare the SBHoU approach in a fair manner, we choose the same number of circuits for the SBHoU approach as for the MBHeU approach, which is 80 (40 per side).
Ten different runs consisting of a maximum of 500 resistance change opportunities are performed Table 4. Fraction of perfectly balanced states given initial pendulum condition and circuit ensemble architecture.

SBHoU MBHeU
Vertical start 1 ± 0 1 ± 0 Non-vertical start 0.7 ± 0.2 0.97 ± 0.02 for both a random starting angle and starting angular velocity as well as a vertical start. The pendulum has a length of 1 m, a weight of 0.2 kg and the weight of the cart is 0.3 kg. Resistances are only allowed to change −10 kΩ|10 kΩ or 0 kΩ with a 20% chance of changing, which is determined by an external controller. After every update, the performance is measured by taking the average balancing time, calculated over all 90 states in the case of the random starting state and calculated over 60 different initial angular velocities 5 rad s −1 in the case of the vertical starting state. The best performing combination of circuits (i.e. the combination of circuits of which the average balancing time is the highest) are saved for every run. The number of perfectly balanced (60 s) states is counted for each of those circuits and is then averaged. The comparison can be seen in table 4. We can see that in the vertical start setting, both architectures are able to perfectly balance all test settings. In the non-vertical start setting, however the single bin homogeneous update architecture is able to only balance 2/3rd of the settings while the multi bin heterogenous update architecture is able to balance 97% of the initial conditions. The MBHeU architecture is significantly better in the non-vertical start setting than the SBHoU architecture t(18) = −4.266, p = 0.0005.
From the same experiment, the average balancing time per update is kept track of and is plotted in figure 12.
In the vertical start runs, we can see that the homogeneous updating architecture learns faster than the heterogeneous one, while both will converge to a configuration where the system is able to perfectly balance all starting conditions. In the harder task where the initial pole position is one of 90 states as seen in the Monte Carlo search, the heterogeneous system seems to peak higher and generally seems to have a higher variance. It appears that the working range (pendulum angle and angular velocity) needs to be limited during the training procedure, because configuration which might be impossible to optimize for while also keeping the system optimized for other configurations, may cause divergence in the training due to a competition between opposing controllers.  0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3,  1.35, 1.4, 1.45, 1.5}.  This divergence can be seen especially in the heterogeneous system by means of the decreasing trend after peaking. By using two ensembles that interchangeably learn until they perform better than the other while the other does not learn, learning is more stable. Performance of dual ensembles, consisting of combinations of all 10 previous runs for all 4 settings can be seen in figure 13.
In the case of the vertical starting state, we can see that on average the homogeneous updating system learns faster but the heterogeneous updating system does not lag far behind and reaches the 60 s mark approximately 100 updates later. In the case of the 90 starting states, the homogeneous system again learns faster on average but is surpassed by the heterogeneous system after around 350 updates. the heterogeneous architecture on average balances better than the homogeneous one.
A single run consisting of 500 optimization steps with a pendulum length of 1 m, a pendulum weight of 0.2 kg and a cart weight of 0.3 kg is performed. Every optimization step consists of one or more updates, which happen when the pendulum drops or crosses the middle. The resistances are allowed to change −10 kΩ|10 kΩ depending on the required direction or 0Ω with a 20% probability. One step is over when the pendulum falls or is balanced for 60 s. An example of a single run can be seen in appendix F. The starting angles of the pendulum are approximately 0 • and the starting angular velocities are randomly initialized between −1.5 to −0.05 (rad s −1 ) and 0.05 to 1.5 (rad s −1 ). Using the same settings, 30 runs have been performed. The average and standard error of these 30 runs are shown in figure 14. We can see that especially in the first few steps, the pendulum takes some time to be balanced. Later, the learning suddenly speeds up and finally slows down again. Over time, the system is able to balance the pendulum fairly well with an average of the maximally reachable 60 s on a regular basis.  In order to show that the system can also learn to balance the pendulum for a longer amount of time, a single run with the same settings as the previous runs is performed with a cap of one hour (3600 s). This run can be seen in figure F3.
The working range of the system is explored by varying the simulated physical properties of the cart and the pole (Brockman et al 2016). Twentyseven (27) different combinations of parameters are explored. Three parameters are varied; length of the pendulum, mass of the pendulum and mass of the cart. The parameters and their values can be seen in table 5.
For every combination of values, the optimization process is run five times. Meaning, in total 135 simulations are performed. The system is judged on three criteria; consistency, learning speed and average balancing time.
• The consistency is a measure of how well the system performs once it has reached a balancing time of 60 s the first time. This is calculated by dividing the total number of times it reaches 60 s by the total number of trials performed after the first time the system reached 60 s. It thus shows how well the system is able to keep performing once it has found its first solution. • The learning speed is the number of trials the systems needs before it is able to reach 60 s for the first time. • The average balancing time is the average balancing time over the entire run.
The evaluations are plotted in figures 15, G1 and G2. The same data, but instead plotted per pendulum length can be seen in appendix G. These plots are less dense and easier to comprehend. We will review the average balancing time in figure 15 in more detail. We can see that a fairly wide range of physical parameter settings results in an average balancing time of over 30 s. The system starts to fail to learn when the cart mass is 1 kg and the pendulum length is 0.4 m (i.e. where the pendulum length is the longest and the cart mass is the heaviest). This is indicated by green dots on the bottom of the plot. It should be noted that this physical parameter setting requires fast, powerful actions, due to the combination of a short arm and the relatively large mass. On the other side of the spectrum (i.e. a large pendulum and a light cart indicated by red triangles), the system shows signs of a harder time learning.

Handwriting task
The handwriting task is performed on all letters of the alphabet, for 3 different writings per letter, ten times per writing for both an ensemble of block pulses as well as an ensemble of twitches. An example of fitting a single written letter variant can be seen in figure 16.
Here we can see that there is quite some variability in the individual fits. Despite this, the average of all fits comes close to the original letter. We can also see that the letter fit by block pulses has sharp corners, while the letter fit by the twitches appears to be more smooth. A histogram of the euclidean distances which shows that the twitches fit the letters better than the block pulses as well as a comparison between individual letter fits can be seen in appendix I).

Discussion
In this study, we propose to take inspiration from biological neuromuscular control mechanisms, both for 'internal' neuromorphic computing and for external, multi-actuator mechanical control. A basic electronic circuit is introduced (in NGSPICE) as a synthetic MU, which is able to implement timing, amplitude and a 2nd order overdamped impulse response, by means of a limited number of n-state memristors. By combining several such MUs into an ensemble, arbitrary wave shapes for short-lasting control actions can be realized. The problem of bipolar control signals is solved by defining two pools of motor neurons, one for the 'agonist' and one for the 'antagonist' direction. The concept is tested on three motor-control tasks involving timing, amplitude and wave-shape requirements: The inverted-pendulum task, the whack-amole task, and a pen-tip trajectory generator for handwritten characters.

Modeling results
The proposed approach shows, that in simulation, the three tasks are performed well, with a limited number  of motor-unit circuits in an ensemble. Whereas the exact shape of a force burst is less important in the inverted-pendulum task and the whack-a-mole task, the advantage of the 'twitch' modelization as a second-order impulse response becomes clear in the handwriting task, where basic block pulses provided a higher RMS error than the twitch function as the basis function in the ensemble.

Electronics implementation
The current circuit is not a minimal setup and can be simplified in many ways. The goal of this publication is to propose the general principle and leave fundamental improvements to the domain of electronics research. One possibility would be to replace the opamps with field-effect transistors, which also will allow for a good range of usable RC-value combinations due to their high input impedance (10 7 -10 12 Ω). The use of MOSFETS would also reduce the power requirement of the circuit. It would be interesting to find solutions with a minimized component count and optimized routing in a multi-unit design. In analog neuromorphic computing, the temporal task requirements cannot be overlooked. The component parameters (RC-value combinations) are dependent on the speed of optimal system responses in a particular task. Here, the parameters were geared to simulations of relatively slow processes. Notably, the capacitor is an 'expensive' component in a circuit. However, if the spike rates are high and twitch durations are short, low-capacitance MOSCAPs (Arora 1993) could be used in a VLSI design. Alternatively, volatile memristors can be used which have large time-constant values for the decay of a parameter value in the circuit, e.g. resistance (Wang and He 2017).

Implications for systems with adaptive memristive devices
One of the advantages of a 'real' STDP framework is the local learning, on board. In the proposed ETDP model, memristor adaptation needs to be realized by a non-local, external controller. For the MU circuit, five (5) leads would be needed to inject resistance-switching voltages. If these are kept within a reasonable range, damage to the semiconductors can be avoided. An alternative solution would consist of optically-controlled memristors (Hu et al 2021), which have an advantage due to the decoupled operational and training modes and the opportunity for out-of-plane access to the memristive devices on a flat surface, i.e. without additional wiring and routing problems. Since the model is basically concerned with a timed generation of transients, it does not require, in itself, a large-sized 2D crossbar. A 1D linear array of memristors is needed to realize the weighted sum per ensemble, whereas the realization of the bipolar control signal just requires the summation over agonist and agonist ensemble, for a single control output channel. This does not preclude the use of a crossbar at later or earlier processing stages, it just is not needed within the stage of wave-shape composition by timed twitches.

Implications for biomimetic actuator design
Whereas the primary goal of this study is to propose neuromorphic approaches for analog control in general, the availability of ensemble-based motor control will be useful for biomimetic control systems, especially if these involve the use of many small actuators with an uncertain, non-linear behavior. For example, in McKibben actuators (Gaylord 1955) or Hasel actuators (Rothemund et al 2021), especially if implemented with a large number of contractile elements. A wide range of solutions and patents for (electro-)contractile materials are known (Brock 1991, Mirfakhrai et al 2007, Bhatti et al 2020, Kuramaya and Ito 2020, Leng et al 2021, Phan et al 2021, Liu et al 2022. Despite a stable research interest over the years, it is difficult to create a breakthrough, because of the non-linearity of control and possible drift and temperature dependencies. A twitch-based controller for individual thin contractile elements that are grouped in parallel, similar to the biological muscle can theoretically compensate for the imperfections (Moghadam et al 2015) in the material. For some materials, the second-order filter component in our model can be replaced by the electroactive fiber group, while retaining the population-based timing, amplitude and wave-shape control. In Kuramaya and Ito (2020) an example is given of an artificial muscle exhibiting a second-order overdamped impulse response that appears to be very similar to its counterpart in biological muscles. In this study, we did not address the biological feedback control, which uses muscle length-variation sensors (muscle spindles) and tendon force. Ensemble-based control models (Georgopoulos 1991) play a role, both at the output level and the input level. This means that error signals can be directed back into a motor-unit controller, to compensate for locally non-ideal actuation. As an example, an electro-active polymer with a transparent core can be gauged using laser pulsing using Bragg grating (Hill et al 1978, Wolf et al 2018), such that the contractile state can be sensed. Other sensing mechanisms with strain gauges or piezo materials can be envisaged, giving local feedback from individual regions within an artificial muscle. The essence is to design a fine-grained multi-actuator system that is intrinsically prepared for internal wear and tear and external disturbances. This can be realized by point-to-point feedback in an electro-mechanical ensemble system, such that sensors report on the corresponding local regions of contractile elements. This approach is opposed to the traditional technical actuator design which aims at a limited number of presumably perfect components, e.g. using one electromotor per joint. Evidently, a biologically-inspired system will not be the choice if micrometer precision is needed. For many materials, even attaining millimeter precision will be difficult at the scale of humanoid robotics. However, for control of ballistic movements over a wide range of tasks, a spiking analog control model does have attractive properties.

Incorporation of reflexive feedback
In this work, the overall architecture and the tasks that were used address the feedforward, ballistic type of control, i.e. without considering real-time feedback. In the biological motor apparatus, there is a wide range of sensory (afferent) channels, concerning muscle-fiber length changes from the muscle-spindle sensors, absolute force-level information from the Golgi organ, and nociceptive (pain) signals. For a mechanical agonist/antagonist muscle setup, homonymous excitation and reciprocal inhibition allow for a precise control of mechanical impedance. An example of the highly advanced control mechanisms in the biological system is that the sensitivity of the muscle-spindles is controlled from within even the high level of the motor cortex via the γ motor neuron or fusimotor system, encompassing 30% of the total control signal (Macefield and Knellwolf 2018). Such feedforward mechanisms are especially important for anticipatory control in a system with several delays, in signal propagation as well as in the mechanical force buildup. However, for a fully computational neuromorphic application of the paradigm that uses the twitch ensemble for time function approximation, lack of feedback may not be a limitation and it avoids the complexity in an electronic implementation. Interestingly, the sensory information collection in the dorsal root of the spine also appears to obey recruitment principles, similar to the size principle on the output, i.e. motor-neuron side (Idlett et al 2019). Ensemble recruitment also plays a role in central (non-motor) computation (Crowe et al 2010, Carrillo-Reid andYuste 2020). In Khubieh et al (2015), it is shown that the recruitment mechanism increases dynamic range and allows for an exploitation of the noise that is present, in pyramidal cortical neural computation, i.e. not only in efferent motor control but also in internal computational processes. Representing an information quantity using a number of neurons leads to an improved signal-to-noise ratio and dynamic range.

Gross versus detailed control
An aspect of the fine-grained control in the biological motor system has not yet been addressed. Implicitly, the control signals coming from the upper level, i.e. the primary motor cortex, were assumed to have a gross effect on the motoneuron pool. This means that the details of local adaptation in space (the tissue) and time are left to the computational resources within the segmental level of the spine, allowing the usual recruitment order to take place. This view is consistent with the general approach in, e.g. motor planning in robotics, with a separation of 3D task-space and n-df joint-space planning and abstracted set-point values to be reached by the actuator. However, in the biological system, the central controller can exert, apart from gross influence, also a more detailed control in the recruitment of MUs (Basmajian 1963). If needed, the central controller can choose to modulate the actuator in detail, if a task would require so. In Formento et al (2021), subjects were able, after six days of training, to skillfully and independently control three MUs in the biceps to complete a 2D centering task. For intelligent controller systems this means that corrective action for non-ideal components can be realized at two levels: The basic low-level control loop and a higher, strategy-based adaptation of controlling signals, from above. Also here, an ensemble implementation may show its virtue: In the case of many MUs, compensatory strategies by a central controller may be applied with graded effects. The combination of strict hierarchical control in combination with delegated local steering is known as heterarchical control (Cohen 1992).

Future research
We have presented a first step in the direction of ensemble-based generation of control functions. Several important elements of the biological systems may play a role in more advanced versions of the proposed approach. As an example, for reasons of simplicity, the 'size principle' (Henneman 1957) was not deployed in the synthetic MU ensemble. A non-homogeneous distribution of amplitude and time-constant parameter values may allow for achieving a larger dynamic range of responses, both in amplitude and speed of reaction. The distributions of such parameters are highly specific (Elder et al 1982) in the human muscular system. In case of new contractile materials, it is evident that the properties of such materials would determine optimal parameter values that need to be determined. Whether a non-homogeneous pool of twitch generators is also beneficial in the case of generic time-function approximation tasks for internal neuromorphic computation is an open research question. In a small experiment using the 20%/80% rule for m. soleus (Elder et al 1982), for slow (Type 1) vs fast (Type 2) synthetic MU distribution in our inverted-pendulum task, no clear benefit of a heterogeneous composition of the ensemble was found (appendix J). The argument of a beneficial increased dynamic range can be countered by the argument that the learning process is simplified in the linear setup of a homogeneous ensemble, where the contribution of the smallest MUs is not larger than the step sizes required for accurate control or precise function approximation. More research will be needed, because there clearly is a potential for any mechanism that is helpful in the reduction of resources (energy, number of electronic components) in neuromorphic control. In artificial systems, the optimalities will likely not fully overlap with those found in the biological system (Dideriksen et al 2012). Finally, the approach presented here may have useful implications for neuroprosthetics (Unal et al 2018, Lan et al 2023, Zhang et al 2022.

Conclusion
This study provided a proof of concept, using hybrid simulation, for the use of a motor-unit twitch circuit that can be used as the basic component in an ensemble setup for generating analog control signals that are timed, with a specific amplitude and wave shape. Results show, that the circuits were well trainable and performed well on three different analog-control tasks: The inverted pendulum, the 'whack-a-mole' task and the generation of pen-tip trajectories for letters in handwriting. While the results are preliminary, they open a new view on computational processes within neuromorphic computing hardware but also on control modes for multi-fiber artificial muscle actuators.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).

Acknowledgment
This project was funded by the CogniGron Center, University of Groningen.

Appendix B. Ensemble compositions for the cart-pole task
Within the biological muscle, there exist many ways in which MUs are grouped and are involved in a recruitment scheme. For the cart-pole system, we test three different ensembles types, abbreviated as SBHoU, MBHoU and MBHeU.
The SBHoU ensemble of which an example can be seen in figure B1(a), consist of one ensemble for the agonist and one ensemble for the antagonist side. The agonist ensemble will activate when the pole crosses the vertical in the positive angular direction and the antagonist ensemble will activate when the pole crosses the vertical in the negative angular direction.
The MBHoU ensemble ( figure B1(b)) consists of multiple ensemble bins per side, where one bin consists of multiple circuits. When the pendulum flips from one side of the vertical to the other, one of the multiple ensemble bins is activated. Which one is activated depends on the instantaneous angular velocity at the point in time of the pendulum crossing the vertical. In practice this can be seen as follows: the angular velocity will be measured at the point of crossing by a tachometer. This meter generates a certain current. This current then operates a switch. The switch will in turn activate a specific ensemble bin depending on the height of the current, and thus the magnitude of the angular velocity. For the MBHoU ensemble we choose four different circuit bins per side (we thus have eight bins). The first bin will activate when the instantaneous angular velocity is [0-0.5) rad s −1 , the second bin will activate when the instantaneous angular velocity is (0.5-1] rad s −1 , the rest is visible in table B1 whereΘ is the instantaneous angular velocity at the point of crossing the vertical. The third approach is the recruitment-based MBHeU approach ( figure B1(c)). This approach works the same as the MBHoU ensemble approach except that, with an increasing velocity range, an increasing number of bins will activate (e.g. when the angular velocity is between 0 and 0.5 rad s −1 , a subset of circuits will activate. When the angular velocity is between 0.5 and 1 rad s −1 , the same subset will activate together with an additional subset). This method thus activates extra circuits when more force is needed. The activated ensembles per angular velocity range is shown in table B2.  Activated bin number (side)Θ (rad s −1 ) Activated bin number (side)Θ (rad s −1 )

Appendix C. Whack-a-mole task
In this children's game, a mole pops out from a board with holes and has to be hit with a hammer to gain points. We have simplified the game to one dimension, with a single mole that pops up at a fixed distance from the origin. Rather than ensembles of circuits that alternately activate, all circuits activate simultaneously and voltages are added up to generate a net control voltage. This voltage will drive the hammer forward while net positive and backward while net negative. The mole is assumed to be hit when the hammer is located above the mole, however, with the requirement of low velocity, implying that a braking action needs to be learned. The update-rules can be seen in table C1. The resistances are only allowed to change −50 kΩ|50 kΩ or 0 kΩ which will again be determined by the external controller's stochasticity, which allows for a 20% chance of change. Because of the mole's static position, no recruitment is needed to deal with a large dynamic range.  Figure D1. Decomposition of the handwritten letter 'a' into a horizontal stroke and vertical stroke. The handwriting is sampled at 100 Hz and interpolated using cubic interpolation which smooths the letter. Combining the amplitudes at the same points in time results in the original letter. The letter is a smoothed letter from the PluColl dataset (Schomaker 1994).

Appendix D. Handwriting task
In some control tasks, block pulses as control signals might suffice. Using a smooth task like handwriting we explore the differences between the twitch shape and a block pulse. Writing a letter can be decomposed into a horizontal and a vertical stroke pattern. Combining the two results in the original letter. An example of the decomposition can be seen in figure D1. By fitting both the horizontal and the vertical stroke, handwriting can be learnt. Please note that small deviations along either one of the axes can already result in an illegible letter if the differential timing is disturbed (Schomaker et al 1989). A dataset containing handwritten letters is the PluColl dataset (Schomaker 1994, Vuurpijl andSchomaker 1997), which is available on the Zenodo repository. For all letters of the alphabet, 3 different writing instances per alphabet letter are arbitrarily picked out. The letters are then fit using randomly generated pulses and the Widrow-Hoff learning rule, 10 times per writing pattern for both an ensemble of block pulses as well as an ensemble of twitches. The latter is done to evaluate the assumed advantage of the smooth twitches as opposed to the block pulses as the basis function.

Appendix E. Preliminary evaluations E.1. Single-circuit timing capability, R1 adaptation
The signal timing experiment is performed to see whether we could learn to generate a step signal at a specific moment in time. In order to see whether the circuit is able to generate a step delay at multiple different times, multiple timing goals have been set. Two different schemes have been used to update resistance R1. In both cases, the initial resistance is set at 100 kΩ and is updated such that the time delay comes as close as possible to the target time.
The first scheme used is the linear updating scheme where the change in resistance is 10 kΩ per step or iteration. The time delay of the circuit, taught by a linear updating scheme, seen in figure E1, first goes to a value close to the desired time. It then exceeds this time and starts wiggling around the desired time value because the standard update of 10 kΩ does not allow for convergence unless the perfect timing is achieved with a resistance value that is a multiple of 10 kΩ.
The second scheme is the decaying update scheme. In this scheme the initial resistance change is 10 kΩ and this decays linearly over time, shown in equation (E.1).
In the equation, ∆R is the change in resistance, 'iterations' is the maximum number of iterations and 'iteration' is the current iteration number. The timing results of the decaying scheme can be seen in figure E2.
Using this scheme we can see that at first, the system starts with relatively big updates, getting closer to the desired time fast. Over time the update size decreases, which allows the system to converge.

E.2. Evaluation of basic wave-shape fitting
In order to perform a variety of tasks, the system should be able to fit various shapes. Five different shapes have been fit using the Widrow-Hoff learning rule. Before fitting the functions, 10 000 circuit outputs have been generated using random resistance values for R1, R4 and R6. For every waveform, 2000 learning iterations have been performed. For one learning iteration, 100 circuits are sampled, where 50 of the circuits are used as agonist (contributing positively) and 50 as antagonist (contributing negatively). After sampling, the Widrow-Hoff learning rule is performed for 1000 steps to optimize 102 coefficients (100 circuit outputs, one agonist bias and one antagonist bias) that scale the outputs of every circuit respectively. After every coefficient update, the negative weights are clipped to zero in order to make it more physically plausible (i.e. the agonist should strictly only contribute positively and the antagonist should strictly only contribute negatively). An issue in fitting the sawtooth is the straight diagonal line. Since the basic twitches only consist of a rounded shape, straight diagonal or horizontal lines are more difficult to fit. This is also visible in for example the block pulse ( figure E3(d)). Generally however, considering the variety of signals, the signals fit reasonably well.

E.3. Cart-pole task, '1 s experiment'
The '1 s' experiment was performed using a different number of circuits in order to see whether the number of signals increases or decreases convergence. The average convergence time with error bars can be seen in figure E4. The convergence duration improves from over 250 steps at 10 circuits down to just above 50 steps when using 25 circuits. Figure E4. Average time it took to learn to get the pendulum straight exactly after one second including error bars. This average time is taken for four ensembles with 10, 15, 20, and 25 circuits, and is performed ten times per ensemble. Convergence is speeding up and becoming more predictable, with an increasing number of circuits in an ensemble. A single run consisting of 500 optimization steps with a pendulum length of 1 m, a pendulum weight of 0.2 kg and a cart weight of 0.3 kg is performed. Every optimization step consists of one or more updates, which happen when the pendulum drops or crosses the middle. One step is over when the pendulum falls or is balanced for 60 s. The starting angles of the pendulum are approximately 0 • and the starting angular velocities are randomly initialized between −1.5 to −0.05 (rad s −1 ) and 0.05-1.5 (rad s −1 ). Figure F1 shows the update scenario for the ETDP rule in the pendulum task. For vertical-line crossings that occur much too early (t < 0.1 s) the direction of the memristor adaptation obtains the sign −1. Within a dead zone (0.1 s < t < 1.0 s) no updates occur. For t > 1.0 s, the direction of adaptation has the sign +1. The reason 0.1 s is chosen as the lower time limit is that otherwise the learning rule will only allow for an increase in resistance, meaning there is no regularization and it is possible for the resistances to decrease to their minimum value (which results in all circuits generating an ≈8 V voltage spike at t = 0.
The single run can be seen in figure F2. From this run we can see that the system is helpless initially. It then slowly starts learning and it reaches the balancing time of 60 s more frequently, after around step 180 it barely drops the pendulum. At around 300 steps however, it seems to have gotten into a less stable state, which is quickly corrected as can be seen in the density of lines in the bottom graph. We can also see that even though in some trials, the system has reached its maximum balancing time, resistances are still updated during balancing. This means that even though it knows how to balance the pendulum for a given starting condition, it will still keep optimizing toward the ideal 0.1-1 s range as seen in figure F1.

Appendix G. Pendulum working range
From the learning time graph in figure G1, we can see that it takes a relatively long time to learn to balance the pendulum when the cart mass is the highest (1 kg) and the pendulum is shortest (0.4 m). The rest of the settings seem to be learned fairly fast. By looking at the consistency plot in figure G2 we can see that again the highest cart mass and shortest pendulum setting performs worse. Besides this, the setting with the shortest pendulum length and second to highest cart mass also does not seem to be consistent.
Combining these two observations it seems that in the extreme case the system barely learns. Additionally, when the system is less hard to stabilize (i.e. with a cm of 0.5 kg and a pl of 0.4 m, the system does learn fast but is not able to generalize for all starting states. From the consistency plot in figure G2 we can see that for most physical parameter-value combinations the system seems to be consistent: When it has learned to balance the pendulum for 60 s once, it generally is able to keep balancing the pendulum for various starting conditions. Figure G3. Consistency of the system balancing the pendulum for different pendulum lengths: a) 0.4m, b) 1m, c) 2m. Standard error bars are included. The cart mass (0.2, 0.5 and 1 kg) is indicated by color (see legend).

H.1. Whack-a-mole
In the whack-a-mole game the mole, with a length of 0.05 m is put 0.5 m away from the starting point of the hammer. The mole thus covers 0.5 m-0.55 m relative to the starting point of the hammer. The hammer is a 'point in space' with a mass of 0.3 kg. The time limit to reach the mole is 1 s and the position limits are −10 cm and 75 cm. When exceeding the limits, the resistances update. We also ran a simulation of multiple runs using a different number of circuits. The force on the hammer and the trajectory of the hammer of one run, is shown in figure H1.
We can see the hammer speeding up when the force is high, the hammer then overshoots, slows down and is pulled back. This allows the hammer to arrive between 0.5 and 0.55 m with a very low velocity. The comparison between the average of runs with different numbers of circuits can be seen in figure H2. In this figure we can see that although there is a large variability in the attempts, the general trend seems to be that the more circuits are used to learn to play the game, the faster the system  learns to play in terms of the number of attempts. The difference in mean however is not statistically significant when performing the one-way ANOVA test F(2, 27) = 0.183, p = 0.872 calculated as an error measure. Errors of all fits can be seen in figure I1. Here it seems that the distribution of the twitch fits is more left-leaning than the distribution of the block fits. By means of a two-way ANOVA test and the plots in figures I1 and I2, it can be confirmed that the twitches significantly fit the letters better F(1, 1508) = 21.3, p = 4.3 * 10 −6 .
( Figure I2). Here the average and standard error per letter are calculated. We can see that fitting using twitches always has a better or equal performance compared to fitting with block pulses except for the letters i, k and t. Figure I1. Euclidean distances of all fits categorized by fits using block pulses and fits using twitches. Figure I2. Euclidean distance and standard error per letter. The average Euclidean distance shown is the average for a letter over three writers, and 10 model fit attempts per letter, yielding 30 fits. Error bars represent the standard error. Letters differ in their degree of difficulty.

Appendix J. Evaluation of using a heterogeneous pool with slow and FTs on the inverted-pendulum task
As a first step to future research we evaluated the effect of applying the 'size principle' (Henneman 1957) in the composition of the motor-unit pool. Please note that in the case of electronic control there are no constraints that are dictated by the mechanical actuator (e.g. contracting fibers). A distinction was made between FT MUs, using capacitors of 200 nF, and slow-twitch (ST) MUs, using capacitors of 1 µF. The FT MUs had an amplitude range of 0.72-10.2 V. The ST MUs had an amplitude range of 0.1-1.9 V. This check involved a replication of the inverted pendulum task (section 2.3), using 90 possible pendulum starting states and 16 replications. The obtained balancing duration for both a single homogeneous MU pool and a heterogeneous ST/FT pool are measured. Figure J1 shows the result of the simulations for '1 pool' and '2 pools' . At the end of the training (500 updates) there are no clear differences. More research is required to explore the benefits of specific twitch amplitude and duration distributions.