Central pattern generators evolved for real-time adaptation to rhythmic stimuli

For a robot to be both autonomous and collaborative requires the ability to adapt its movement to a variety of external stimuli, whether these come from humans or other robots. Typically, legged robots have oscillation periods explicitly defined as a control parameter, limiting the adaptability of walking gaits. Here we demonstrate a virtual quadruped robot employing a bio-inspired central pattern generator (CPG) that can spontaneously synchronize its movement to a range of rhythmic stimuli. Multi-objective evolutionary algorithms were used to optimize the variation of movement speed and direction as a function of the brain stem drive and the center of mass control respectively. This was followed by optimization of an additional layer of neurons that filters fluctuating inputs. As a result, a range of CPGs were able to adjust their gait pattern and/or frequency to match the input period. We show how this can be used to facilitate coordinated movement despite differences in morphology, as well as to learn new movement patterns.

gait frequency. A cat, for example, may need to walk at a slow pace in order to ambush its prey. For robots, being able to adapt to humans in their vicinity is an important goal, particularly for caring and collaborative applications [11]. For early humans, it is widely thought that the ability to synchronize movements with others was a crucial step in the evolution of social cognition [12,13,14]. This is therefore a relevant capability for socially responsive and intelligent robotics.
In vertebrates, gait pattern and frequency modulation are indirectly controlled by the intensity of the current from the brain stem, to which the spiking rates of locomotor neuron populations are sensitive [15]. Early bio-inspired robotics work attempted to replicate this emergent pattern generation, typically with continuous-time recurrent neural networks [16,17]. This connectionist approach was largely abandoned in favour of more manageable coupled limit-cycle oscillators, where each parameter's effect on the overall behaviour is predictable [1,18,19], particularly if the CPG outputs are mapped to workspace trajectories [20]. Hence, the problem of gait adaptation has become a matter of designing appropriate feedbacks or learning schemes for the designated control parameters. These continuous learning approaches have been highly successful for learning stable locomotion and adapting to unseen physical environments [21,22].
Adapting to a social environment, however, is qualitatively different to adapting to a physical environment. One example of a social adaptation is imitation or learning by demonstration. This is now common in, for example, compliant robotic arms, where there are few constraints on movement [23]. However, for most CPG-based legged robot controllers, the range of available limit cycles is prescribed by design. Therefore, the potential for imitation and synchronization is relatively limited in most current legged robots.
The problem of large parameter spaces that comes with more flexible neuron models can be tackled using the same method by which nature succeeded to make animals roam the earth. Evolutionary methods are a useful tool for robotics, making use of the abundance of computing power now at our disposal [24,25] and new algorithms for promoting diversity of designs [26,27]. This has led to advances in morphology design [28], modular and soft robots [29,30], and generative encodings [31].
While evolution and real-time adaptation may at first glance seem like unrelated processes working at very different time-scales, there are several ways in which evolution can facilitate adaptation. Evolution can, for example, globally optimize in the high-dimensional space of connection weights in recurrent neural networks so that a lower dimensional space of inputs encompasses a wide repertoire of output patterns [32,33]. When applied to locomotion patterns, this dimensionality reduction can therefore simplify the task of online learning of which behaviours are most suitable in which state and environment, as is done in reinforcement learning [34], or the task of Hebbian learning of connection weights for higherlevel control [35,36]. Importantly, reactive controllers can also be optimized in advance for susceptibility to a wide range of inputs, allowing spontaneous compliant motion in an open-loop scenario.
Single-objective evolutionary algorithms were used in early work on connectionist CPGs, generally to optimize some combination of walking speed, regularity and stability measures [17,37]. These measures are often involved in trade-offs, meaning that despite the "handsoff" nature of evolutionary algorithms, fitness functions still needed to be carefully designed to weight each measure appropriately. The advent of multi-objective algorithms [26] allows these measures to be separated into their own fitness functions, so that a diverse range of controllers is generated along the Pareto front of non-dominated solutions [38,39,40]. Therefore, in addition to the advantages of evolutionary algorithms for highly flexible neural controllers, multi-objective evolution in particular also allows for correlational studies about their emergent properties, and for controllers to be hand-picked for different sets of capabilities after a single optimization process.
In this paper, we use multi-objective evolution to test the assumption that evolving CPGs for flexibility can facilitate rapid real-time gait adaptation. This contributes to bio-inspired robotics in two ways. Firstly, we demonstrate novel virtual robot quadrupeds that can entrain their locomotion to a range of rhythmic external inputs without physical coupling and without explicit feedback. This kind of automatic adaptation to social environments via audio and visual perception is common in humans, such as the tendency to synchronize when walking together [41]. While rhythmic entrainment to social partners has been demonstrated in virtual robots [42], this used relatively slow phase-based feedbacks in linear oscillators. Therefore, using neuromorphic CPGs can increase the naturalistic quality of multi-robot and human-robot interaction at the level of basic behaviours.
Secondly, we show how examining correlations between emergent properties of the CPGs can aid future design. Connectionist control systems are once again being increasingly employed in robotics, and these generally require searching through a large parameter space using automated processes such as reinforcement learning or genetic algorithms that target a desired ability [43,44]. Identifying statistical trends between such abilities (costs or fitness functions) and features that can be more easily manipulated prior to optimization can greatly increase the efficiency of this time-consuming process [45].
For example, previous work with disembodied CPG architectures has suggested that both gait type and the sensitivity of oscillation period to neuron bias are important factors for entrainment ability [46]. We further test this correlation in an embodied context, where there is a simultaneous goal of upright walking. To test whether our results are dependent on the morphology of the embodied system, we use two quadrupeds that differ in limb length. Wide applicability is important for interaction between robots optimized for different environments, or those that adapt their morphology in real time [47].
First, using a multi-objective genetic algorithm and a fitness evaluation during which two control parameters are swept, we evolve populations of CPGs for flexible walking speed and direction. Then, for a subset of CPGs that emphasize different components of the objective function, we incrementally evolve robust filters for rhythmic (such as audio) input. We analyze the real-time entrainment performance of the robots as a function of the CPG properties, in particular the flexibility of the gait period and pattern. Finally, we discuss the implications of our results for understanding adaptive and imitative behaviour, as well as the potentials of our approach for multi-robot systems, human-robot interaction and autonomous learning.

Neuron model
The neural model is based on the Matsuoka neuron [48], a simple and popular model for robotics [8,20,49,50,51]. This is a biologically motivated yet abstract two-variable model: where h(u) is a rectified linear unit: h(u) = 0 for u ≤ 0 and h(u) = u for u > 0. Like many biological models, there is a fast "spiking" variable (u i ) and a slow "recovery" variable (v i ). While this model produces patterns of spiking in a simple way, its linearity leads to a poor ability to adapt oscillation frequency [52]. Importantly, unlike biological neurons, the spiking rate is insensitive to changes in tonic input [53]. To address this shortcoming, we add a sigmoidal "deactivation" function to the fast variable u i where S(x) = 1/(1 + exp(x)). In addition we introduce I DC , a control parameter modelling the global brain stem input separately from fluctuating inputs I ACi and a constant offset c i . For a certain parameter range satisfying this reproduces the general nullcline shape, as well as the input-dependent firing rate, of biological neuron models [54]. The fast input is modelled: where I fb,i is sensory feedback and I in is the external input, G i is input sensitivity, and w ij is the synaptic weight and τ ij the threshold for a connection from neuron j to neuron i.

CPG and Filter modules
The layout of the quadruped is shown in Figure 1A. This is a simplified version of the CPG layout in [15] containing one flexor-extensor pair per limb and several interneuron types. Our simplified model consists of three neurons per limb: one interneuron, a leg joint neuron (A) and a knee joint neuron (B). Each limb has identical parameters, and the connection weights are constrained to obey lateral symmetry. The ranges of the 23 neuron parameters and connection weights are given in Supplementary Table S1. For the CPG module, G i = 0 and τ ij = 0 within the CPG. Depending on the connection weights, which could be excitatory or inhibitory, and the brain stem drive, the CPG can autonomously generate coordinated oscillations in the motor neurons. Flexible movement was targeted for this autonomous behaviour in the first stage of evolution (see section 2.4).
A subset of the evolved CPGs with high fitness had filter modules evolved on top (see Section 2.5). The purpose of this layer is to preprocess and distribute descending rhythmic signals I in (t) for the CPG to entrain its oscillations to. This module had only inhibitory connections, no lateral symmetry, no feedback and no brain-stem control (d i = 0). A nonzero threshold τ ij = τ 0 was imposed between the filter outputs and CPG inputs so that the CPG received no input from the filter when I in (t) = 0. The modules are unidirectionally linked by the weight matrix M, with a wider range than the within-module weights w ij (see Supplementary Table S2).

Robot simulation
The quadruped robots were simulated in the Unity game engine on a flat planar surface. The full-scale robot was based on the specifications of the Open Dynamic Robot [55]. The 'short-legged' version was identical, apart from a 40% reduction in upper leg length and a 33% reduction in lower leg length. The controllers were written in Python and interfaced with Unity using the Unity ML-agents package [56].
The CPG used a time interval of 8 ms while the physics simulation used a time interval of 20 ms, with decisions made every ∆t =100 ms. At each decision point, the CPG sent joint positions based on the changes in rectified motor neuron outputs since the previous simulation step ∆h(u A (t)) and ∆h(u B (t)): where S(x) is a logistic function, limiting the half-amplitude of motion of the upper leg and limb joints to θ lim,leg and θ lim,leg , respectively, both of which are set to 90 degrees. The coefficients A and B, and the zero-angles are allowed to evolve, however with leg zero-angles θ 0,leg always positive and θ 0,knee always negative, corresponding to a full-elbow pose. The hip joint, perpendicular to the leg and knee joints, was kept at a constant but evolvable parameter θ 0,hip . See Supplementary Table S3 for the ranges of these evolvable parameters. A control parameter θ C was also added to the leg joint angle so that the forward position of the center of mass could be controlled in real time (see next section). At each decision point, the CPG also received inputs processed from the body's tilt, to use as stabilizing feedback inputs I fb,i . The sideways tilt (the sideways component of the unit vector normal to the top of the body) was input with opposite signs to the left and right limb motor neurons, with separate coefficients q A,side and q B,side for neurons A and B, respectively. Likewise, the front-back tilt (the upwards component of the unit vector normal to the front of the body) was input with opposite signs to the front and back limb inputs, with coefficients q A,front and q B,front . These four coefficients were also evolved along with the CPG. The entire set of 32 parameters for the CPG and body was encoded as a sequence of integers, each taking a value between 1 and 10.

CPG evolution
We used the NSGA3 algorithm [26] in the DEAP Python package [57] to perform multiobjective optimization. This genetic algorithm preferentially selects non-dominated individuals (i.e. those on the Pareto front) for propagation to the next generation. Parameters used for the NSGA3 algorithm are given in Supplementary Table S4.
Unlike in [46] where each CPG was iteratively evaluated to explicitly select for change in period as a function of the brain stem drive I DC , we instead swept I DC during a single evaluation, and partially selected for variability in speed. This was to reduce the computational load of running several evaluations with constant parameters as is required for reliable period estimation, and to ensure stable walking over a wide region of parameter space. In addition, we adjusted the center of mass parameter θ C that is added to the leg standing angle during the evaluation, in order to select for flexibility of movement direction. Measurements of forward and perpendicular distance covered (y j and x j respectively) were made over three stages (j = 1 to j = 3) of length 10 seconds each, as shown in Figure 2A.
We used four fitness functions to simultaneously select for desired capabilities of the robot. The first three correspond to the targeted behaviours for each stage of the evaluation (fast backward motion, steady forward motion and fast forward motion, respectively), while the fourth selects for overall stability. The fitnesses are given as: where x 0 = √ 5 m is a parameter to punish sideways movement, y 0 = 2.5 m is an optimal forward distance for steady motion, defining the maximum F 2 and F 4 , H tot is the mean height over the entire evaluation (normalized for a maximum of ≈ 1), and t tot is the root mean square body tilt (mean length of the cross productn ×ĝ wheren is the unit vector normal to the top of the robot andĝ is the unit vector normal to the ground).
Note that F 1 and F 3 are unbounded and identical but with opposite signs for the forward distance. F 2 , meanwhile, is a quadratic function that is positive only between y 2 = 0 and y 2 = 2y 0 . To optimize all four fitness functions simultaneously, the robot must first walk backwards with a negative center of mass parameter, then walk forwards 2.5 m with a positive center of mass parameter and brain stem input of I DC = 0.5, and then accelerate with I DC increasing.
During the evolution, the evaluation was run three times per individual with random initial u i values, and the median of each fitness was taken. For each morphology, five independent populations of 168 individuals (8 individuals per edge of the reference Pareto front) were evolved for 200 generations. The final populations were then evaluated 15 times with different random number generator seeds and the median fitnesses were calculated again, as well as cross-correlations between limbs, and oscillation periods from the largest autocorrelation peak (see Supplementary Material).

Filter evolution
In order to reduce the total number of evolutions for the filter layer, a subset of CPGs was chosen from each population's Pareto front in order to capture a numerically small but diverse range of solutions. Each population was first reduced to a set of CPGs for which all fitnesses were positive, and then four were chosen from each using the maxima of four weighted fitness functions F * m . These are defined as a combination of F m and the total sum of fitnesses: where z was incremented in intervals of one until the maximum of each F * m was unique. For each of these CPGs, a 6-neuron, a filter module was evolved using the NSGA3 algorithm. The parameters consisted of 6 input weights, 24 output weights, 30 inhibitory connection weights within the layer, a shared bias term c i , and the time constant of a low-pass filter for the initial input.
The filter evolution comprised two stages of increasing complexity. The input consisted of evenly spaced impulses with every fourth impulse missing, over a total of 40 seconds. For the first 50 generations, the timings had no noise, in order to facilitate the random generation of suitable filters. After the 50th generation, a random timing offset (Gaussian distributed with a standard deviation of 2% of the period) was applied to each impulse's timing. Evaluations were made with constant control parameters θ C = 0 and I DC = 0.5.
Three input periods were used for each evaluation: T 0 /φ, T 0 and φT 0 , where T 0 is the CPG period at θ C = 0 and I DC = 0.5, and φ = 0.618. The latter was chosen so that 1/φ ≈ 1 + φ and hence the low-period and high-period inputs are equidistant from an integer multiple of T 0 .
The fitness function for each input period T in,k , with measured walking period T out,k , was calculated as where [.] indicates rounding to the nearest integer, σ 0 is the mean standard deviation of the filter output with no input, and σ t and are scaling thresholds, both set to 0.1 for the current study. Hence, the fitness is maximized for upright walking with a period of a half-integer or integer multiple of the input period. The filter evolution used a population of 92 individuals (12 individuals per edge of the reference Pareto front) for 150 generations. At the final generation, the population was evaluated 5 times and the median fitnesses were calculated. This final evaluation included two additional periods at T 0 / √ φ and √ φT 0 . The filter with the highest k Q k was then chosen for each CPG, conditional on each H tot,k being above 0.75.

CPG evolution
From the CPG evolution, 710 unique individuals were produced. As shown in Figure 2B, the fitnesses with upper limits (F 2 and F 4 ) reached these within a few generations, while the others reached a plateau close to the 200 generation mark.

Gait characteristics
After filtering out CPGs with an average height of < 0.75 (below which robots were typically judged to be crawling rather than walking, see Supplementary Figure S1), CPGs were classified into gait types. Walking gaits were defined as those with maximum inter-limb correlation < 0.3; above this threshold, trotting gaits were defined as those with diagonally opposite limbs maximally correlated, pacing gaits had left or right leg pairs maximally correlated, and bounding gaits were defined as those with front or back limbs maximally correlated. Of the 420 non-crawling individuals at I DC = 0.5, θ C = 0.016, 19.8% were classed as walking, 73.8% as trotting, 2.4% as pacing and 4.0% as bounding. The longer legged robot was more likely to develop a walking gait (29% vs 6%) or bound gait (7% vs 0%), while the short-legged variant was more likely to develop a pacing gait (6% vs 0%).

Predictors of F3
Many CPGs (14%) were able to have all-positive fitnesses, which means that they could walk backwards and then forwards at a controlled speed as the leg standing angle θ C is switched from negative to positive. Of particular interest for entrainment is F 3 , the target for acceleration with increasing I DC . This was found to be significantly larger on average in walking and pacing gaits compared to trotting and bounding, contrary to the typical order of quadruped gaits (see Figure 3A). We examined whether F 3 selected for period or gait flexibility as predicted. For both morphologies, period and F 3 had a non-monotonic relationship with interlimb correlation at I DC = 0.5, as shown by Figure 3B. A clear correlation was seen, however, between F 3 and the change in a CPG's inter-limb correlation as the brain-stem drive was changed from I DC = 0.5 to I DC = 1.0. This was shown by a linear mixed-effect model with sums and differences of the periods and inter-limb correlations as fixed effects, and replicate as a random effect (longlegged: z = −2.17, P = 0.03; short-legged: z = −3.26, P = 0.001, see Supplementary Figure  S3). This indicates that acceleration leading to a high F 3 could be achieved by moving from a correlated trotting gait towards a more efficient walk-like gait. In addition, for the shorter legged morphology, a period that shortens with increasing I DC is associated with higher F 3 score (z = −2.72, P = 0.007, see Supplementary Material).

Direction and speed tuning
The trained robots typically show regions of smooth change of speed and direction within the space of control parameters. These regions often extend outside the region of the control parameter sweeps. Two examples are shown in Figure 4. For the CPG that maximized the average CPG evolution fitnesses (Eqs 9-12), the brain stem drive I DC has relatively little effect on the movement characteristics. This illustrates that high average fitness is itself not a guarantee of general flexibility. Further outside the region of the swept control parameters, the behaviour becomes more unpredictable. For example, the low measured height in Figure  4A for low drive parameter and negative θ C indicates an instability that coincides with a gait transition, as shown by an abrupt change in inter-limb correlation in the same region. Panel (B) shows the total entrainment ability score 3 k=1 Q k /3 for all robots over the height threshold for each k, as a function of period flexibility (gradient in period vs brain stem drive, with ∆I DC = 0.2). The dotted line is the best fit from the linear regression.

Filter evolution
Each morphology had 19 CPGs chosen for filter evolution out of the planned 20. One longlegged replicate only produced three unique CPGs from Equation 13, while one selected CPG from the short-legged replicates had no measurable walking period, so an input period could not be determined.
The evolution of the filter module is shown in Figure 5A. The short-legged morphology converged to the maximum fitness sooner than the long-legged morphology. In general, it was more difficult to entrain to a rhythmic input shorter than the natural walking period, compared to a longer period input.

Predictors of entrainment
When taking the highest eligible mean of the entrainment performance Q k , a correlation was found between the period tunability and entrainment performance (linear model: t = −2.26, P = 0.03, see Figure 5B). Faster oscillation with increasing I DC therefore facilitates entrainment, while faster oscillation with decreasing I DC appears to inhibit this ability. For the highest fitness filter and CPG combinations, entrainment could be generalized to stimulus periods other than those used during evolution, as shown in Figure 6. Adjustment to the stimulus turning on or off typically occurred within a few motion cycles. To show this, time series for the leg joint angles were convolved with a Morlet wavelet at the input period, with a resolution parameter σ = 1.5, and then a Gaussian filter was applied with a width of 0.5 s.
Interestingly, the robot could generate a response that created a polyrhythm with the input (namely, two steps for every three impulses), as shown in Figure 6A.

Discussion
Our highly nonlinear, bio-inspired central pattern generator combined with multi-objective optimization was successful in both generating a variety of gait profiles and properties, as well as flexibility in these gaits for a large subset of individuals. This was despite the fact that the fitness functions did not straightforwardly translate to specific gait properties.
Trots were the most favoured gait type for both morphologies, seemingly due to this gait's ability to transition from backwards to forwards motion. The short-legged robot was more predictable, as evidenced by its smaller range of fitnesses for each gait type, and evolved faster in the case of the filter layer. Hence, it can be used as a starting point for building complex behaviour in stages [58].
Notably, the gait frequency emerged in a self-organized fashion from the interactions in the CPG network and -due to the added nonlinearities of the neural model -was also sensitive to inputs. Oscillation periods could therefore be tuned both manually via the brainstem drive parameter, and automatically via spontaneous entrainment to fluctuating sensory input. Due to the fully self-organized nature of the entrainment, this occurred much more rapidly than feedback-based approaches [5,59,42]. We expect similar results if utilizing other neuron models with input-dependent frequency, such as the Fitzhugh-Nagumo model [60] or the Rowat-Selverston model [36]. Phase synchronization to stimulus may be achieved in the future by adding feedback from reaction forces, such as with the Tegotae approach [61].
The specific results in this article can also be used to further direct the evolution of desired behaviours. For example, entrainment appears to benefit from a period that decreases with input, as occurs in single biological neurons. Restricting parameters to ensure this can increase the speed of evolution and the likelihood of entrainment. While the fitness targeting fast forward movement was significantly correlated with period flexibility, the period ranges exhibited were not as wide as when directly using the latter explicitly as a fitness function, as was done in [46]. Therefore, a combined approach where a disembodied CPG is first evolved may be beneficial in future work.
Rhythmic entrainment is a complex behaviour seen only in very few species [62], and is thought to be an integral part of the evolution of human social behaviour. Our CPG network mediated by a filter layer successfully captures this important example of cortical shaping of cyclical movements. Our heavily bio-inspired approach offers a path towards testing theories of human cognitive processes, such as beat perception, that are still not well understood [63,64]. The results we present show that dynamic attending theory, based on synchronization of endogenous rhythms [65,66], is a viable explanation for beat perception also when involving an entire distributed sensorimotor system.
On the engineering side, this approach is highly relevant to the current push towards adaptive robot behaviour [2,11,23]. In this study, the filter layer was optimized by a genetic algorithm. However, by instead adding Hebbian plasticity [67], reinforcement learning [34] or another mechanism for longer-term adaptation, it may be possible for robots to learn suitable new movement patterns through repeated imitation of robot or human demonstrators in an unsupervised manner, and hence develop useful behaviours autonomously [68]. Future work will implement physical robots with performance metrics addressing movement efficiency and stability in different physical environments. We will also explore mutual adaptation of multiple robots by using foot sensors to transmit impulses to neighbours. Notably, the fact that the communication medium is a simple time series means that coordination can occur across differing morphologies. This is also likely to increase the complexity of the behaviours, as is typical in studies of collective motion [69], and may allow for creative uses, such as human-robot musical ensembles [70].

Supplementary Material
Parameter ranges Table S1 shows both fixed parameters (single numbers) and evolvable ranges (inside square brackets) for the CPG module. Connection weights w ij are zero between limbs, apart from when i and j are both interneurons (see main text Figure 1A).
Parameter ranges for the filter module are shown similarly in Table S2. Here, Γ is the time constant of an exponential low-pass filter for the impulse stimulus.
Finally, parameter ranges for the joint activation and sensory feedback are given in Table  S3. These are evolved in the same genotype as the CPG parameters.
Evolution parameters NSGA-III parameters are shown in Table S4. Here, p c is the crossover probability and p m is the mutation probability per allele.

Period and interlimb correlation
The period was determined using the autocorrelation of a complex-valued time series z(t) = x(t) + iy(t) where x(t) and y(t) are the leg and knee neuron outputs before the sigmoidal transform respectively, and where t begins halfway through the evaluation. The time lag for the maximum correlation was extracted, with a minimum of t 0 ≈ 0.05 seconds. During the filter evolution, a maximum lag of 2.25 times the natural period was imposed. If no peak was found in this region, the CPG was excluded from further analysis.
The location of the highest cluster of average height was determined to be above 0.75, and these CPGs were judged to be upright for the entire trial (see Figure S1). The cluster of lower average heights corresponds to crawling with knee joints touching the ground, while in between were upright for only part of the trial. All CPGs below 0.75 average height were excluded from further analysis.
The interlimb correlation was determined by the Pearson product-moment correlation over the second half of the evaluation period. Only the leg neuron output was used. A four by four matrix C ij of limbs was set up for the correlation between the ith and jth limb, and only off-diagonal elements were calculated, with diagonal elements C ii being manually set to zero. Therefore if all pairs had negative correlations, the CPG gait was classified as a walk (see Figure S2).

Statistics
Statistical tests were performed using the Python statsmodels package. Figure S3 shows the lines of best fit from the linear mixed-effect model for F 3 . Normality of residuals was tested using the Kolmogorov-Smirnov test.