Global system errors to simultaneously improve the identification of subsystems with mixed-data Gaussian process regression

This paper explores the use of Gaussian process regression for system identification in control engineering. It introduces two novel approaches that use the data from a measured global system error. The paper demonstrates these approaches by identifying a simulated system with three subsystems: a one-degree-of-freedom mass actuated by two antagonistic muscles. The first approach uses this whole-system error data alone, achieving accuracy on the same order of magnitude as subsystem-specific data (9.28 ± 0.87 N vs. 6.96 ± 0.32 N total model error). This is significant, as it shows that a single data set can be used to identify distinct subsystems, as opposed to requiring a set of data descriptive of only a single subsystem. The second approach demonstrated in this paper mixes traditional subsystem-specific data with the whole-system error data, achieving up to 98.71% model improvement.


Introduction
In many control engineering problems, a model of the system can be extremely useful. Often, a parametric model is used, in which basis functions are chosen by the engineer and parameters are optimized to best fit previous observations of the system being modeled. However, this requires some approximation of the system being modeled, which cannot always be obtained or estimated. Gaussian process regression (GPR) is a parameter-free modelling technique based in Bayesian statistics [1]. Rather than attempting to fit parameters to a predefined model approximation, GPR places a prior over functions and makes predictions directly from the relationships between previously observed data.
In practical cases, a variety of subsystems often interact to form a much larger system. Muscles, bones, and tendons all work together to make the human arm. An engine, suspension, transmission, and chassis create a vehicle. A variety of sources burning various fuels all provide power to civilization. In each of these cases, the control of the larger system would benefit from accurate subsystem models.
Subsystem identification is the process of using data to create models of various subsystems which interconnect to create a larger system [2]. The identification of subsystems has practical uses in a variety of fields, such as meteorology [3], biology [4], and physics [5]. In recent years, subsystem identification has been used many times to study and develop an understanding of the high-level internal human control system, both as a pilot for vehicles [6][7][8][9] and in human postural control [10]. While it is possible to identify various subsystems simultaneously [11], current subsystem identification techniques rely on prior parametric models of the various subsystems. These parameters are then iteratively updated through interaction with known subsystems [12]. For example, the much-lauded sparse identification of nonlinear dynamics (SINDy) regression framework [13] requires an engineer to assemble a 'library' of potential underlying functions, and then parameters are fit based on observational data. In contrast, the method we present in this paper is a simultaneous, parameter-free subsystem identification technique which leverages data from the whole system rather than individual subsystems.
It is sometimes possible to obtain data for an individual subsystem. However, in many cases this is nearly impossible. In [14], researchers made several subsystem models with GPR to control a human arm actuated with electrically stimulated muscles. The goal was to stimulate the muscles of the arm of a paralyzed person using electricity, so that the muscles were able to hold the paralyzed research participant's arm in a pre-determined static arm configuration. The researchers used a subsystem model of the arm's inverse statics, as well as one subsystem model for every electrically stimulated muscle group. The inverse statics subsystem model could use data that was measured directly; it was possible to measure the wrench, or six-axis forces and torques, required to hold an arm static by measuring the wrench the arm applied to a robot with no electrical stimulation. In contrast, it was impossible to directly and non-invasively measure the force-producing capabilities of the muscles, only the effects of the muscles on the arm held static by a robot. To overcome this limitation, the researchers made the models with indirect data combined with engineered model manipulation.
To estimate the force-producing capabilities of the muscles, the force effects of each individual muscle group on the arm held static by a robot were measured, but these measurements contained information about both the muscle group to be modeled and the inherent inverse statics of the human arm, such as gravitational effects and joint stiffness. The researchers subtracted the predicted inverse statics of the arm from the collected muscle and inverse statics data to generate their muscle models, but made no comment on the compounding uncertainties that would result from the subtraction of Gaussian distributions [14]. Additionally, this assumes that the length of the force-producing element of the muscles is only correlated with arm position. This is a fair assumption in the static, fully stimulated state of the muscles during system identification, but is potentially a worse assumption during a dynamic movement, where the muscles will likely never be fully stimulated. The same researchers used this same static modelling technique to optimize dynamic trajectories for an electrically stimulated arm [15]. In this case, the average wrist (end-effector) error was 8.5 ± 2.8 cm, which is potentially outside the acceptable range of functional reaching accuracy. In both studies, the authors indicated that improved modelling may be critical to improving controller performance [14, 15].
Using current GPR modelling methods, data unique to each subsystem must be collected to model each subsystem. In many cases, this is not possible; data can only be collected for the whole system. Moreover, the collected data often does not fully represent the system model, instead providing an incomplete or outdated representation of the system. This can happen for myriad reasons: data may be missing from an area of the input space, the environment may have changed around the system, or the system itself may have changed. To address these cases, evolving Gaussian processes (GPs) [16] can be used. In summary, evolving GPs sample new data and tailor the training data set to be as informative as possible. Evolving GPs can be used with an output feedback model predictive control scheme for efficient online learning [17].
For evolving GPs to work, the training data collected at different times must be of the same type. For example, if we were modelling the inverse dynamics of a human arm, the inputs to such a model might be the human arm state and the outputs might be the joint torques. To evolve this GP model of the human arm, any additional training data must be of that same type, in which the inputs are arm states and the outputs are measurements of that exact same arm's torques. Similarly, modelling the biceps would require data specific to the biceps, which is not as readily available as global system data.
A method for generating any number of subsystem models using only data from the whole system would be extremely valuable. The error of the system, specifically the difference between the behavior predicted by the models and the actual behavior of the system, is a logical choice for the new type of training data. The error of the system compounds the errors of all of the subsystems.
In this paper, we demonstrate that system identification can be improved using two correlated types of training data: independent subsystem identification data and whole-system error. We first show that our novel method of GPR can be used to simultaneously and accurately identify a number of subsystems using only the error of the whole system as training data. Additionally, we illustrate that the error measurements can also be used with data unique to each subsystem, which improves the subsystem models.
First, we present our novel GPR modelling technique using whole-system error to predict all subsystems. Then, we demonstrate our technique by using it to control a simple dynamic simulation, a mass pulled by two antagonistic muscles. We use this simple dynamic simulation so that prediction results can be compared to the ground truth, which is easy to obtain in a simulation. Lastly, we show that our technique can be extended to include mixed data, where a single prediction uses two different types of data. The prediction of a single subsystem model using mixed data has some training data that is unique to the subsystem and some data that is the whole-system error.
This paper has two goals. The first goal of this paper is to demonstrate that whole-system error can be used to generate models for any number of its subsystems. The second goal is to demonstrate that whole-system error can be used with subsystem-specific data to improve prediction accuracy.

Material and methods
In order to accomplish our goals of demonstrating that system error can be used for discrete subsystem identification, we first summarize modelling with GPR. We then use that summary to derive a method for making the predictions of a discrete subsystem with data that only comes from the error of the whole system. Additionally, we also derive a method for using both whole-system error and subsystem-specific data types to make the same predictions.
We then use our novel prediction technique to identify a dynamic system with three subsystems: a mass actuated by two muscles. It is worth noting again that the choice to use a simulation was intentional, so that it would be possible to calculate improvement relative to a known ground truth, which can only be known easily in a simulation.
All dynamic subsystems and models were created using MATLAB 2019b (Mathworks, Natick, MA) on a MacBook Pro with an Apple M1 chip (Apple, Cupertino, CA).

Simulation set-up

Plant
The representation of the simulation can be seen in figure 1. The differential equations governing the musculoskeletal dynamics were integrated using the forward Euler method [18].
In this formulation, a is the acceleration of a mass M with one degree of freedom pulled by two muscles, the biceps and triceps. The muscles are Hill-type muscles [19], modified so that the activation dynamics and series elastic element were removed. These modifications were made to simplify the analysis and improve the tractability of the experimental GPR models. Removing the activation dynamics negates the need for an additional mapping from stimulation to activation state, as developed in [20]. Removing the series elastic element ensures that the system state is directly correlated with the biceps and triceps contractile element lengths during dynamic movements. This ensures that the inputs to the predictive models are deterministic, with no uncertainty. While unrealistic for a real human arm, these modifications aid in establishing the principles of our novel modelling technique, the goal of this paper. As such, each muscle's capability depends on its activation, a force-length curve (position x), and velocity ẋ, where the latter two are combined such that x = [x, ẋ]. The total muscle force F_i is the summation F_i = F_KP,i + F_CE,i of a nonlinear parallel elastic element K_P,i and a contractile element F_CE,i, which generates force given a muscle activation α_i ∈ [0, 1] acting as the control signal. The maximum force a muscle can produce is a constant F_Max,i; the contractile force is also scaled by a force-length relationship h(x) and a force-velocity relationship g(ẋ), such that F_CE,i = α_i F_Max,i h(x) g(ẋ). The physical constants (F_Max, K_P,i, x_slack, etc) for the simulation were all taken from [21].
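To make the plant concrete, a minimal sketch of one pass of the simulation loop is shown below. The force-length and force-velocity shapes and all constants are illustrative placeholders rather than the values from [21], and the sign convention assumes the biceps pulls in the positive direction and the triceps in the negative direction.

```matlab
% Minimal sketch of the mass-muscle simulation loop (forward Euler).
% h(), g(), and all constants are illustrative placeholders, not the values
% from [21]; the biceps is assumed to pull in +x and the triceps in -x.
M      = 1.0;                 % mass (kg)
Fmax   = [500, 500];          % maximum contractile forces, [triceps biceps] (N)
Kp     = [200, 200];          % parallel-elastic stiffnesses (N/m)
xslack = [-0.2, 0.2];         % parallel-element slack positions (m)
sgn    = [-1, 1];             % pull directions, [triceps biceps]
dt     = 0.001;               % time step (s)

h = @(x)  exp(-(x/0.15).^2);              % placeholder force-length scaling
g = @(xd) max(0, 1 - xd/0.5);             % placeholder force-velocity scaling

x = -0.15;  xd = 0;                       % initial mass position and velocity
alpha = [0.3, 0.7];                       % example activations in [0, 1]
for k = 1:1000
    Fce = alpha .* Fmax .* h(x) .* g(sgn.*xd);   % contractile (active) forces
    Fkp = Kp .* (xslack - x);                    % parallel-elastic (passive) forces
    a   = (sum(sgn.*Fce) + sum(Fkp)) / M;        % acceleration of the mass
    xd  = xd + a*dt;                             % forward Euler integration
    x   = x  + xd*dt;
end
```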

Control
To control the mass actuated by two muscles illustrated in figure 1, three subsystem models made with GPR are used, following a similar procedure to [22]. In this experiment, we use simulated muscles as our ground truth, but we follow a modelling procedure suitable for the identification of a person's electrically stimulated arm. Two models predict the active component of each muscle that is scaled by the control signal α in (3), i.e. the product F_Max,i h(x) g(ẋ). The third model used to control this system is the inverse dynamics model. This model predicts the forces acting on the mass which are not directly controllable, such as the parallel elastic forces and inertial effects. The controller for this mass actuated by two muscles is illustrated in figure 2, and is similar to that used by [14]. A desired state is chosen from an arbitrary trajectory. The inverse dynamics model f*_I.D. is used to determine the amount of force required to move the mass along the trajectory from the queried state x* = [x*, ẋ*]. The maximum force each muscle can produce in the queried state x* is determined using the two muscle models: the triceps model f*_t and the biceps model f*_b. After these three values are predicted using GPR, they are used to solve the optimization problem (4), in which the muscle activations α = [α_t, α_b]^⊤ are calculated such that the predicted muscle capabilities, scaled by activation, meet the predicted force required to move the mass. Both the triceps and the biceps have their own model predictions and their own muscle activations associated with them. We solved this optimization problem using quadratic programming [23].
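A minimal sketch of this per-time-step activation solve is shown below. The minimum-squared-activation objective is an assumption on our part (the exact cost in (4) is not reproduced above), and the predicted values are placeholders; only the equality constraint and the 0-1 activation bounds follow directly from the text.

```matlab
% Sketch of the activation optimization, solved with quadprog (Optimization
% Toolbox). The minimum-squared-activation objective and numerical predictions
% below are illustrative assumptions, not the paper's exact formulation.
fID = 12.0;     % f*_I.D.: predicted force required at the queried state (N)
ft  = -180.0;   % f*_t: predicted maximum triceps force (pulls in -x) (N)
fb  =  150.0;   % f*_b: predicted maximum biceps force (pulls in +x) (N)

H   = 2*eye(2);          % minimize alpha'*alpha
f   = zeros(2, 1);
Aeq = [ft, fb];          % equality constraint: alpha_t*ft + alpha_b*fb = fID
beq = fID;
lb  = [0; 0];  ub = [1; 1];                          % 0-100% activation bounds

alpha = quadprog(H, f, [], [], Aeq, beq, lb, ub);    % alpha = [alpha_t; alpha_b]
```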
After the muscle activations have been calculated, they are passed to the plant, the muscle-mass simulation in figure 2. If there is any error in the state of the mass due to uncertainty in the three identified models, an additional force is applied to correct the error. This additional applied force from the simulated caregiver in figure 2 is equal to the error in the force applied to move the mass.
In the simulation, the simulated corrector is simply a block which applies the force needed to sum to the ground truth. In reality, force error is hard to measure when the ground truth is unknown, but a human caregiver correcting the mass to a position is potentially a very convenient way to measure the force error. An alternative might be a robotic exoskeleton [24]. For the purposes of this simulation, the simulated caregiver always provides an accurate correction, and thus an accurate force error measurement.

GPR with subsystem-specific training data
In order to facilitate an understanding of modelling with GPR, we will present the procedure for creating predictive models of the simulated system depicted in figure 1. For a full presentation of GPR, please refer to [1].
Generally, GPR is a Bayesian modelling technique that assumes a function

$$y = f(\mathbf{x}) + \epsilon, \qquad (5)$$

where we are able to measure the function output y at inputs x, but with the addition of Gaussian noise ϵ.
However, the goal is to learn the underlying function f. We generate our models such that the output y is a force, and the inputs are the mass position and velocity, such that x = [x, ẋ] as defined in figure 1.
It is worth noting that our inputs are system states, not subsystem states. This is both critical for mixing data types, as we will present later, and convenient. System states tend to be measurable, whereas subsystem states may not be. For example, measuring arm joint angles is fairly straightforward, but measuring muscle lengths during dynamic movements is challenging.
It is assumed that the noise itself in (5) has a Gaussian distribution,

$$\epsilon \sim \mathcal{N}(0, \sigma_n^2). \qquad (6)$$

This expresses that ϵ has a Gaussian distribution with mean 0 and variance σ_n². We assume a joint prior Gaussian distribution as defined in [1], expressed

$$\begin{bmatrix} \mathbf{y}_i \\ \mathbf{f}^*_i \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\ \begin{bmatrix} k_i(X_i, X_i) + \sigma_n^2 I & k_i(X_i, X_*) \\ k_i(X_*, X_i) & k_i(X_*, X_*) \end{bmatrix}\right). \qquad (7)$$

This is the joint prior of noisy outputs y_i measured at inputs X_i and function outputs f*_i at query inputs X_*. The ith model uses only the set of training inputs and outputs for the ith subsystem. We will occasionally call this data set the ith set of data in this paper. For this derivation, we have three subsystems, each with unique system identification training data. The inverse dynamics model predicts the sum of external forces required to drive the mass along a trajectory. The triceps muscle model predicts the maximum triceps force at a given state, just as the biceps muscle model predicts the maximum biceps force at a given state. These points of data are noisy observations of the underlying 'true' function f, for which we predict f* at some query inputs X_*.
Conditioning this joint prior Gaussian distribution (7) on the training data yields the prediction equation below, with further details found in appendix A.2 of [1]:

$$\bar{\mathbf{f}}^*_i = k_i(X_*, X_i)\left[k_i(X_i, X_i) + \sigma_n^2 I\right]^{-1}\mathbf{y}_i. \qquad (8)$$

This is the equation used to make predictions using a GPR model. Note that the data are used directly, as GPR is a direct method.
The functions expressed k(·, ·) indicate the covariance of the output of the function at two different inputs. Specifically, we use the squared exponential covariance function with automatic relevance detection [25],

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}')\,M\,(\mathbf{x} - \mathbf{x}')^{\top}\right) + \sigma_n^2\,\delta_{\mathbf{x},\mathbf{x}'}. \qquad (9)$$

In this presentation, δ_{x,x′} is the Kronecker delta [26] and x ∈ R^{1×2} is written as a row vector to match (8). Each unique model also has a set of hyperparameters {M, σ_f², σ_n²} associated with it. The matrix M is composed of the length-scale hyperparameters l such that M = diag(l)^{-2} ∈ R^{2×2}. This formulation is what truly implements automatic relevance detection, as the inverse of the length-scale determines how relevant an input is to the prediction. σ_f² is the signal variance and σ_n² is the noise variance, which 'sometimes is not considered a hyperparameter; however it plays an analogous role and is treated in the same way, so we simply consider it a hyperparameter' [1].
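A minimal sketch of a single subsystem-specific prediction with this covariance function is given below; the training data, query grid, and hyperparameter values are illustrative placeholders rather than those identified in the experiments.

```matlab
% Sketch of a subsystem-specific GPR prediction, (8), with the SE-ARD kernel (9).
% Training data, query grid, and hyperparameter values are placeholders.
ell = [0.1, 0.5];    % length-scales for [position, velocity]
sf2 = 100;           % signal variance
sn2 = 9;             % noise variance (3 N standard deviation)

sqd   = @(A, B) sum((A./ell).^2, 2) + sum((B./ell).^2, 2)' - 2*(A./ell)*(B./ell)';
seard = @(A, B) sf2 * exp(-0.5 * sqd(A, B));          % (9) without the delta term

Xi = [linspace(-0.15, 0.2, 25)', zeros(25, 1)];       % training states [x, xdot]
yi = 50*Xi(:, 1) + 3*randn(25, 1);                    % noisy force observations
Xs = [linspace(-0.15, 0.2, 101)', zeros(101, 1)];     % query states

Kxx   = seard(Xi, Xi) + sn2*eye(25);                  % k(X, X) with noise on diagonal
Ksx   = seard(Xs, Xi);                                % k(X*, X)
fstar = Ksx * (Kxx \ yi);                             % predictive mean, (8)
vstar = sf2 - sum(Ksx .* (Kxx \ Ksx')', 2);           % predictive variance (diagonal)
```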
These hyperparameters are optimized during model identification, or training. During model identification, the negative log marginal likelihood is minimized over the hyperparameters.
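For reference, a minimal statement of this objective in the standard form of [1] is given below; the specific optimizer and settings used for training are not reproduced here.

$$-\log p(\mathbf{y}_i \mid X_i, \boldsymbol{\theta}) = \tfrac{1}{2}\,\mathbf{y}_i^{\top} K_y^{-1}\mathbf{y}_i + \tfrac{1}{2}\log\lvert K_y \rvert + \tfrac{N}{2}\log 2\pi,$$

where θ = {M, σ_f², σ_n²} and K_y is the covariance (9) evaluated between all N training inputs, including the σ_n² term on its diagonal.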

GPR with whole-system error training data
We have just presented the fundamentals of making predictions with a model made from GPR. Training inputs and outputs, unique to the ith model, are collected. Then hyperparameters are identified, and the data are used directly for predictions.
Our goal is to use an observation of the error as training data. We first need to determine 'error' by determining how our controller will use the subsystem models. The equality constraint in (4) is the perfect candidate for defining a useful error, i.e. an error in the output space of our models. If the constraint equation in (4) is satisfied, the following equation should be true:

$$f_{ID}(\mathbf{x}) - \sum_{k} \alpha_k f_k(\mathbf{x}) = 0, \qquad (10)$$

where the sum runs over the muscles. Note that for all subsystems, we are expressing the underlying ground-truth function f_i rather than the subsystem model prediction f*_i. If our model predictions are perfectly correct and exactly match the underlying ground-truth subsystems, then (10) will hold. However, this is often not the case, so there will be a residual. This residual is

$$y_E = f_{ID}(\mathbf{x}_E) - \sum_{k} \alpha_k f_k(\mathbf{x}_E) + \epsilon. \qquad (11)$$

We can use this error as an additional training pair, where x_E is the system state where the error was measured, and y_E is the measure of the error itself. This is why it is critical that system states, not subsystem states, are chosen as the inputs to the models. We express the inputs as x_E to emphasize that this is a measured data point that is distinct from the query inputs x*. Ultimately, we want an expression analogous to (8) which leverages whole-system error training data. To start, we assume a joint prior between our measurements of whole-system error and the ith subsystem prediction,

$$\begin{bmatrix} \mathbf{y}_E \\ \mathbf{f}^*_i \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\ \begin{bmatrix} \mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) + \sigma_n^2 I & \mathrm{cov}(\mathbf{f}_E, \mathbf{f}^*_i) \\ \mathrm{cov}(\mathbf{f}^*_i, \mathbf{f}_E) & k_i(X_*, X_*) \end{bmatrix}\right). \qquad (12)$$

Note that we use the 'underlying whole-system error function' f_E in this formulation. As before, this is the theoretical function that governs our noisy measurements y_E. We can condition this joint prior, as in (8), on the whole-system error data observations, yielding a new predictor equation.
Using this equation, we are able to make predictions for any subsystem using just the whole-system error:

$$\bar{\mathbf{f}}^*_i = \mathrm{cov}(\mathbf{f}^*_i, \mathbf{f}_E)\left[\mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) + \sigma_n^2 I\right]^{-1}\mathbf{y}_E. \qquad (13)$$

Additionally, we can write the new equation for the variance of the predictions,

$$\mathrm{cov}(\mathbf{f}^*_i) = k_i(X_*, X_*) - \mathrm{cov}(\mathbf{f}^*_i, \mathbf{f}_E)\left[\mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) + \sigma_n^2 I\right]^{-1}\mathrm{cov}(\mathbf{f}_E, \mathbf{f}^*_i). \qquad (14)$$

This is a particularly attractive feature of GPR, as cov(f*_i) quantifies the uncertainty of every prediction. We now have the expressions to generate new predictions (13) and to know the uncertainty in those predictions (14), but cov(f_E, f_E) and cov(f_E, f*_i) must still be calculated as functions of inputs instead of outputs. These are derived in appendix A, but are presented in summary here:

$$\mathrm{cov}(\mathbf{f}_E, \mathbf{f}^*_{ID}) = k_{ID}(X_E, X_*), \qquad \mathrm{cov}(\mathbf{f}_E, \mathbf{f}^*_i) = -\,\mathrm{diag}(\boldsymbol{\alpha}_i)\, k_i(X_E, X_*) \ \text{for the ith muscle}, \qquad (15)$$

$$\mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) = k_{ID}(X_E, X_E) + \sum_{i} \mathrm{diag}(\boldsymbol{\alpha}_i)\, k_i(X_E, X_E)\, \mathrm{diag}(\boldsymbol{\alpha}_i), \qquad (16)$$

where k_i indicates the squared exponential covariance function with automatic relevance detection (9) evaluated using the hyperparameters optimized for the ith model. Likewise, α_i is the vector of N muscle activations for the ith muscle collected at the N whole-system error data points; the activations for the biceps α_b and triceps α_t that were delivered to the muscles during the whole-system error data collection are used directly. This gives us some insight that validates the equations presented here. In (15) and (16), if a muscle is not used during a movement (α_i ≈ 0), then it is likely not the source of the whole-system error. However, if a muscle is used significantly (α_i ≈ 1), then its data should carry more weight in the prediction. Along the same lines, the inverse dynamics subsystem model is always used 'fully', as springs and inertial forces are not changed by muscle activations. Therefore, every whole-system correction gives information about the inverse dynamics of the whole system.
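A minimal, self-contained sketch of these whole-system-error-only predictions is shown below. The per-subsystem covariance functions, hyperparameter values, and data are placeholders; the sign convention follows the error definition in (11), where the inverse dynamics term enters with coefficient +1 and each muscle with −α.

```matlab
% Sketch of subsystem predictions from whole-system error alone, (13)-(16).
% Kernels, hyperparameters, and data below are illustrative placeholders.
sqd = @(A,B,l) sum((A./l).^2,2) + sum((B./l).^2,2)' - 2*(A./l)*(B./l)';
kID = @(A,B) 100*exp(-0.5*sqd(A,B,[0.1, 0.5]));   % per-subsystem SE-ARD kernels
kt  = @(A,B) 100*exp(-0.5*sqd(A,B,[0.1, 0.5]));
kb  = @(A,B) 100*exp(-0.5*sqd(A,B,[0.1, 0.5]));
sn2 = 9;                                          % noise variance (3 N std)

N  = 200;                                         % whole-system error data set
XE = [0.35*rand(N,1) - 0.15, 0.6*randn(N,1)];     % states where errors were measured
at = rand(N,1);  ab = rand(N,1);                  % activations applied at those states
yE = 3*randn(N,1);                                % placeholder error measurements
Xs = [linspace(-0.15, 0.2, 101)', zeros(101,1)];  % query states

KEE = kID(XE,XE) + (at*at').*kt(XE,XE) + (ab*ab').*kb(XE,XE) ...  % (16)
      + sn2*eye(N);
fID_star = kID(Xs,XE)          * (KEE \ yE);      % (13) with (15), inverse dynamics
ft_star  = (-kt(Xs,XE) .* at') * (KEE \ yE);      % (13) with (15), triceps
fb_star  = (-kb(Xs,XE) .* ab') * (KEE \ yE);      % (13) with (15), biceps
```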

GPR with mixed training data: subsystem-specific data and whole-system error observations
We have just presented a method for making predictive models of numerous subsystems using only the whole-system error. In some cases, there is no feasible way of measuring subsystem-specific training data. In other cases, there may be some subsystem-specific data. We will now show how to generate predictions by mixing both of these types of data together to predict the behavior of an individual subsystem.
We begin by assuming a joint prior distribution with data specific to the ith subsystem, whole-system error, and the ith subsystem prediction for some query inputs,

$$\begin{bmatrix} \mathbf{y}_i \\ \mathbf{y}_E \\ \mathbf{f}^*_i \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\ \begin{bmatrix} k_i(X_i, X_i) + \sigma_n^2 I & \mathrm{cov}(\mathbf{f}_i(X_i), \mathbf{f}_E) & k_i(X_i, X_*) \\ \mathrm{cov}(\mathbf{f}_E, \mathbf{f}_i(X_i)) & \mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) + \sigma_n^2 I & \mathrm{cov}(\mathbf{f}_E, \mathbf{f}^*_i) \\ k_i(X_*, X_i) & \mathrm{cov}(\mathbf{f}^*_i, \mathbf{f}_E) & k_i(X_*, X_*) \end{bmatrix}\right).$$
As before, we can then condition this joint prior on both the initial and the error data observations, yielding a new predictor equation,

$$\bar{\mathbf{f}}^*_i = \begin{bmatrix} k_i(X_*, X_i) & \mathrm{cov}(\mathbf{f}^*_i, \mathbf{f}_E) \end{bmatrix} \begin{bmatrix} k_i(X_i, X_i) + \sigma_n^2 I & \mathrm{cov}(\mathbf{f}_i(X_i), \mathbf{f}_E) \\ \mathrm{cov}(\mathbf{f}_E, \mathbf{f}_i(X_i)) & \mathrm{cov}(\mathbf{f}_E, \mathbf{f}_E) + \sigma_n^2 I \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{y}_i \\ \mathbf{y}_E \end{bmatrix}, \qquad (18)$$

with the corresponding variance equation obtained by subtracting the same quadratic form from k_i(X_*, X_*), analogous to (14).
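Continuing with the placeholder variables defined in the previous sketch, a mixed-data prediction of the triceps model per (18) might look like the following; the subsystem-specific triceps data here is again illustrative.

```matlab
% Sketch of a mixed-data triceps prediction, (18): placeholder subsystem-specific
% triceps data (Xt, yt) stacked with the whole-system error data (XE, yE) above.
Xt = [linspace(-0.15, 0.2, 10)', zeros(10,1)];    % placeholder triceps training states
yt = -150*ones(10,1) + 3*randn(10,1);             % placeholder max-triceps-force data

Ktt = kt(Xt,Xt) + sn2*eye(size(Xt,1));            % triceps data vs. triceps data
KtE = -kt(Xt,XE) .* at';                          % triceps data vs. error data
Kst = kt(Xs,Xt);                                  % queries vs. triceps data
KsE = -kt(Xs,XE) .* at';                          % queries vs. error data

Kmix    = [Ktt, KtE; KtE', KEE];                  % middle block matrix in (18)
ft_star = [Kst, KsE] * (Kmix \ [yt; yE]);         % mixed-data prediction
```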

Experiments
In order to demonstrate our subsystem modelling technique, we compare predictive GPR models of the mass actuated by two muscles presented in section 2.1 to the known simulated ground truth defined by the Hill muscles. This ground truth can be known in simulation, so we will actually have a quantitative measure of accuracy: prediction error.
The first goal of this paper is to demonstrate that whole-system error can be used to generate models for any number of its subsystems. To achieve this first goal, we will show that, with increasing data, the prediction error of subsystem models decreases until it is comparable with 'classic' models made with subsystem-specific data. The second goal is to demonstrate that whole-system error can be used with subsystem-specific data to improve prediction accuracy. In order to demonstrate this, we will show that the prediction error of subsystem models decreases when whole-system error data is included in the prediction.
It is worth noting that each subsystem uses its own set of hyperparameters in the squared exponential covariance function with automatic relevance detection (9). Across all trials, there is only one unique set of hyperparameters for inverse dynamics subsystem predictions, one unique set for triceps subsystem predictions, and one unique set for biceps subsystem predictions. All sets of hyperparameters were calculated outside the experiment in a data-rich (N = 50,000 each) environment.

Whole-system error for subsystem identification alone
We must demonstrate that the whole-system error can accurately be 'disentangled' into its contributions from each individual subsystem. The system is simulated, so it is possible to know the ground truth against which to compare accuracy. To illustrate that our method can correctly use one set of data to learn about several subsystems, we show that the sum of all models' root-mean-squared prediction errors decreases as the size of the whole-system data set increases.
Using the system shown in figure 1, we defined a desired trajectory that started and stopped with ẋ = 0. The position of the mass in this trajectory was x_d(t) = −0.15 + 0.35(6t⁵ − 15t⁴ + 10t³), where t ∈ [0, 1] seconds, and the trajectory velocity and acceleration were calculated with simple differentiation. For simulation purposes, we used a time-step size of 0.001 seconds, which means the trajectory has 1001 time-steps.
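For concreteness, the desired trajectory and its analytic derivatives can be generated as a direct transcription of the polynomial above.

```matlab
% Desired trajectory x_d(t) and its derivatives over the 1001 time steps.
dt = 0.001;  t = (0:dt:1)';
xdes = -0.15 + 0.35*(6*t.^5 - 15*t.^4 + 10*t.^3);   % desired position (m)
vdes =         0.35*(30*t.^4 - 60*t.^3 + 30*t.^2);  % desired velocity (m/s)
ades =         0.35*(120*t.^3 - 180*t.^2 + 60*t);   % desired acceleration (m/s^2)
```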
We had no prior model, so solving the optimization control law (4) for this system was not possible. Instead, for each time step, α_t and α_b were drawn from a uniform distribution between 0 and 1, i.e. 0% to 100% muscle activation. This random stimulation drove the mass off the desired trajectory by applying an erroneous force. The error y_E in the forces applied to the mass and the system states X_E were collected. Gaussian noise with a standard deviation of 3 Newtons was added to each error measurement y_E. Note that in a physical experiment the force error y_E would be measured with a sensor. In this formulation we assume a perfect correction, as defined in (11).
After collecting the whole-system error data, we generated model predictions using (13) and compared them to the ground truth. These predictions were made using the same trajectory as the test inputs. We increased the size of the whole-system error data set (y_E and X_E) to see whether the model prediction error declines with increased data, as expected.
To provide insight and context, we also modelled the three subsystems using 'classic' predictions made with only subsystem-specific data. These also used increasing data, but in a different way. When using whole-system error, all three subsystem models can have a prediction made from a single input-output data pair. However, when using subsystem-specific data, each subsystem has a unique data set. Therefore, to generate all three subsystem models we would need three unique, subsystem-specific input-output pairs. The specific data set sizes used in the trials are shown in table 1.
There were two independent variables in this experiment: the type of data (whole-system error or subsystem-specific) and the amount of data. For each combination of variables, 1024 trials were performed. We performed a 2-way ANOVA to determine which factors were significant, followed by Tukey's honestly significant difference test [27] to find which means differed significantly. Note that when performing subsystem identification with subsystem-specific data, each model has its own data set.
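A minimal sketch of this analysis is shown below, with placeholder data standing in for the per-trial total model errors; anovan and multcompare come from MATLAB's Statistics and Machine Learning Toolbox, and the group sizes and error values here are illustrative, not the paper's.

```matlab
% Sketch of the 2-way ANOVA and Tukey's HSD follow-up with placeholder data.
nPer    = 8;                                              % trials per cell (1024 in the paper)
types   = repmat({'whole-system error'; 'subsystem-specific'}, 4*nPer, 1);
amounts = repelem([60; 210; 1200; 2100], 2*nPer);         % total data points per trial
err     = 5 + randn(numel(types), 1);                     % placeholder total model errors

[p, ~, stats] = anovan(err, {types, amounts}, ...
                       'model', 'interaction', ...
                       'varnames', {'Data type', 'Data amount'});
c = multcompare(stats, 'Dimension', [1 2], 'CType', 'hsd');   % Tukey's HSD comparisons
```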

Whole-system error mixed with subsystem-specific identification data
The second goal is to demonstrate that whole-system error can be used with subsystem-specific data to improve prediction accuracy. In order to demonstrate this, we will show that the prediction error of subsystem models decreases when whole-system error data is included in the prediction.
We began with a subsystem-specific data set for each of the three models. To obtain our subsystem-specific data, we used the model identification procedure presented in [22], section 2.1. Though we have direct access to subsystem data in the simulated environment, we want to make our models using a process that mirrors the use of a real person's electrically stimulated arm.
To obtain the data for the inverse dynamics model, we simulated the force required to move the mass along the same trajectory, x_d(t) = −0.15 + 0.35(6t⁵ − 15t⁴ + 10t³), where t ∈ [0, 1] seconds. There is no gravity acting along the single degree of freedom, so the active forces on the mass were the spring elements of both muscles and the external 'robot' force driving the mass along the trajectory. There was also an inertial force.
The data for the two muscle subsystem models were obtained using a similar procedure. As the 'robot' drove the simulated mass along the trajectory, the triceps muscle was fully activated during the triceps model data collection and the biceps muscle was fully activated during the biceps model data collection. After we collected the subsystem-specific data for our three predictive models, we were able to generate predictions based on system state using (8).
We devised a 2-factor experiment to evaluate the effects of using mixed data for subsystem identification. The first independent variable was the amount of subsystem-specific data per model, increasing by orders of magnitude. Each model started with either 1, 10, 100, or 1000 points of subsystem-specific data. Note this is the data that is 'initially' in a model before receiving whole-system error, so we will refer to this as the 'initial subsystem-specific data.' The second independent variable of the experiment was the manner in which the muscles were controlled. This factor had two levels; the muscles were either stimulated according to the optimization control law (4), or they were randomly stimulated. In the previous section, in which we used whole-system error alone for subsystem identification, we had no prior model, as we had no subsystem-specific data. We are now mixing the initial subsystem-specific data with whole-system error data, so it is possible to determine the error based on some initial model. If the constraints of (4) could not be met and the optimization was infeasible based on the erroneous subsystem model predictions, then no stimulation (α = 0) was used for that time-step. The full set of eight experimental cases is presented with factors, levels, and results in table 3.
For each experimental trial, the error y_E in the forces applied to the mass and the system states X_E were collected. A random 1000 points of this whole-system error data were used in (18) to generate new predictions. Each set of 1000 points was regenerated between trials to provide the correct whole-system error measurements. The query space was the desired trajectory states, X* = [x_d, ẋ_d], calculated at the 1001 time-steps.
We hypothesized that whole-system error can be mixed with subsystem-specific data to generate subsystem model predictions, so models with whole-system error would vary significantly from models which use the same amount of subsystem-specific data and no whole-system error. To test this hypothesis, we used a 2-way ANOVA with 1024 trials per experimental case to first determine which factors were significant. Afterwards, we performed Tukey's honestly significant difference test [27] to find which means differed significantly.

Results
In order to test our subsystem modelling technique, we devised two experiments. The first experiment shows that with increasing whole-system error data, the prediction error of subsystem models decreases until it is comparable with 'classic' models made with subsystem-specific data. The second experiment shows that the prediction error of subsystem models decreases when whole-system error data is included in the prediction.

Whole-system error for subsystem identification alone
To demonstrate that the novel subsystem identification method presented here can correctly use one set of data to learn about several subsystems, we show that the sum of all models' root-mean-squared prediction error decreases as the size of the whole-system data set increases. To provide insight and context, we also modelled the three subsystems using 'classic' predictions made with subsystem-specific data. These results are summarized in figure 3 and presented with more detail in table 2. As a reminder, models made with whole-system error use the same data to generate predictions, so it is possible to make three subsystem models from a single point of data; when making models with subsystem-specific data, each model has its own data set. We present our results in the form of mean ± standard error of the mean.
The results presented in figure 3 show that with increasing data, the prediction error of models using whole-system error data alone decreases. This aligns with intuition, where increasing data on a system would lead to a better understanding of that system. The errors of the models which use subsystem-specific data also decrease, typically resulting in more accurate models. In the plot with the larger vertical axis, figure 3(a), we see that the predictive models made with 60 points of whole-system error were, on average, more accurate than predictive models made with 20 points of subsystem-specific data per model (the same 60 points in total). In figure 3(a), it seems as though using 210 points of data results in similar error, but figure 3(b) reveals that the models made with subsystem-specific data are slightly more accurate.
The errors of the three subsystem models are presented in table 2, along with their sum. The 'Total' column in table 2 is the sum of the three model errors in each row, and is plotted against the 'Total data pts.' column to create figure 3. Looking at this table, we can validate the results from figure 3 at the model level. The average prediction errors of all subsystem models using limited whole-system error are large, but the average errors decrease with more data. We again see that using subsystem-specific data tends to result in more accurate models.
The results of the 2-way ANOVA indicate that both the data type (p < 0.001) and data amount (p < 0.001) were significant factors. Additionally, the results of Tukey's test indicate that models made with whole-system error and 210, 1200, or 2100 points of data did not significantly differ from each other. Similarly, models made with subsystem-specific data and 210, 1200, or 2100 points of data did not significantly differ from each other. All other models had significantly different marginal means.

Whole-system error mixed with subsystem-specific identification data
In addition to being used as data for subsystem identification alone, we showed that whole-system error can be mixed with subsystem-specific data to generate predictions. This might be necessary in cases where some subsystem-specific data was previously gathered, but it may be more efficient to gather whole-system error data, such as when performing system identification on an electrically stimulated human arm. To test our technique, we created subsystem-specific data sets for our three subsystems, ranging from 1 to 1000 points of subsystem-specific data per model. We then calculated the prediction error using these subsystem-specific data sets, and the prediction error for data sets with the same subsystem-specific data but with the addition of 1000 points of whole-system error data. The results are presented in table 3, where we show the error reduction with the inclusion of whole-system error data. For these purposes, 'error reduction' is the difference in the prediction error before and after including whole-system error data, normalized by the prediction error before including whole-system error data. In this case, a negative error reduction means the model was made worse by the inclusion of whole-system error data.
When the whole-system error data was collected while the system was randomly stimulated, the subsystem models improved in a manner that generally matches intuition. For example, experimental rows 1-4 in table 3 show the percent reduction in error after including 1000 points of whole-system error data. In the cases where there was a low amount of initial subsystem-specific data, there is a greater error reduction when the 1000 points of whole-system error are included; if there is less starting data, the addition of data improves the prediction by more. As the amount of initial subsystem-specific data grows, the improvement drops; when the model started off more accurate from a larger starting data set, adding whole-system error data had less of an impact.
To give insight into model improvement, we present representative results of the effect of mixing whole-system errors with subsystem-specific data for prediction in figure 4. These are models of the three subsystems these experiments try to predict, plotted as model prediction vs. input position. Note that the test space of the models includes position and velocity, but it is convenient to view the models in two dimensions. The green lines are the ground truths of each subsystem, so exactly covering the green line is indicative of a perfect model prediction. The red lines are predictions made with just 10 points of subsystem-specific data per model. The black lines use those same 10 points of subsystem-specific data (per model) with an additional 1000 points of shared whole-system error data. These three models are 1 of the 1024 trials represented by row 2 in table 3. The inverse dynamics model improves throughout the entire prediction space by 78.32%, as the black line is a much closer match to the green line. This is an outlier compared to the 20.93% mean inverse dynamics model improvement. There are still large prediction errors near the limits of the position input domain, yet this represents a great improvement in calculating the reference force for the controller. The triceps model, which has a much different vertical scale than the inverse dynamics model, is significantly improved by 98.68% with the addition of whole-system error training data. This is higher than the 86.38% mean improvement of the triceps models. In contrast, the biceps model prediction, with only 10 points of subsystem-specific data, is fairly close to the ground truth even before including whole-system error data. It is difficult to see with the scaling of the vertical axis, but the prediction is indeed improved by 56.63% with the addition of whole-system error training data, which is lower than the 81.18% mean improvement of the biceps models.
We hypothesized that whole-system error can be mixed with subsystem-specific data to generate subsystem model predictions. We performed a two-way ANOVA with repeated measures. Both the method of muscle activation (p < 0.001) and the amount of initial subsystem-specific data (p < 0.001) were significant factors in the error reduction from including whole-system error. We can confirm our hypothesis that whole-system error can be mixed with subsystem-specific data to generate subsystem model predictions. The results of Tukey's honestly significant difference test showed that when 1000 points of whole-system error generated while the muscles were randomly stimulated were included, there was no significant difference between starting with 100 or 1000 points of subsystem-specific data. Additionally, there was no significant difference in the marginal means of 100 and 1000 points of initial data when the whole-system error was generated according to the control law.

Discussion
This paper had two goals. The first goal of this paper was to demonstrate that whole-system error can be used to generate models for all of its subsystems in a system made of three subsystems. The second goal was to demonstrate that whole-system error can be used with subsystem-specific data to improve prediction accuracy.
First, we presented our novel GPR modelling technique. Our technique is different from classic GPR in that we can predict properties of subsystems using information about the error of the whole system. The key to using whole-system error to identify unique subsystems from a single shared data set is being able to (1) analytically define the error as in (11) and (2) find the covariance of this error with the output of any model as in (15). This has the potential to be incredibly useful; it could be possible to identify any number of subsystems from a single set of data. It could also be possible to model subsystems that are not directly measurable, such as an individual muscle's force-producing capabilities. We also derived a method for using whole-system error mixed with traditional subsystem data. This could be useful in a case where there are some observable subsystem properties, but it is more convenient to collect whole-system error.
The method we present here is unique among common simultaneous subsystem identification techniques in that it leverages the power of parameter-free data observations. This contrasts with the subsystem modelling techniques presented in [6][7][8][9][10][11][12], which rely on an engineer to pre-select models; parameters are then identified from data. If the wrong base model is chosen, the system may be improperly identified. The SINDy regression framework [13] seeks to overcome that limitation by considering a 'library' of basis functions. This allows physical laws to be explicitly modeled, which is not possible with a nonparametric approach such as GPR. However, in [11] the authors note that SINDy cannot handle large libraries of functions robustly. If the correct underlying functions are not included in the library, the system may still be improperly identified. Our technique leverages the nonparametric nature of GPR to remove the influence of possibly flawed basis functions. Additionally, it uses a single, shared data set from the whole system to simultaneously identify unique subsystems. This is promising for cases where the collection of subsystem data is challenging or infeasible. Nonetheless, the method presented here is nonparametric and cannot be used to explicitly identify physical laws. Semiparametric Gaussian process models may overcome this limitation; they have been shown to perform better at the identification of a person's electrically stimulated arm than either a parametric or a nonparametric model [22]. The method presented in this paper could potentially be used to strengthen this approach.
We tested our technique by modelling a simulated example with a known ground truth to measure accuracy. The biomechanical system was composed of a mass actuated by an antagonistic muscle pair moving with one degree of freedom. We used a simulated example so that we could compare our results to a ground-truth value to determine accuracy directly. We discuss our results here; we also use analytical math to facilitate the discussion. In the interest of readability, these analytical examples are presented in appendix B.

The whole-system error must be collected in different control states
We have demonstrated that the key to effectively using whole-system error seems to be randomly activating the muscles. In the first experiment, where all whole-system error data was collected during random stimulation, we saw improvement with increasing data. In the second experiment with mixed data, we once more saw improvement when using whole-system error data from random stimulation mixed with low amounts of subsystem-specific data. When using a low amount of subsystem-specific data and 1000 points of whole-system error data that was collected while the system was controlled according to the control law (4), the inverse dynamics model showed improvement. However, the muscle models improved by a lesser degree in these experimental conditions. In fact, even in the case where one might expect the greatest improvement with the addition of data (starting with a single point of subsystem-specific data), many muscle models were made significantly worse on average, as made evident by the standard deviation of the error reduction. As the initial model improves, whole-system error that was generated while the muscle activations were optimally chosen becomes less and less impactful to the prediction.
A full explanation of this phenomenon is given analytically in appendix B, culminating in (29). In summary, a control signal associated with a subsystem model (the triceps muscle activation, for example) is used to make model predictions. Similar control signals cause data from whole-system error observations to cancel out, so the system must be observed in different control signal states. This generally makes sense because the models are learning the contribution of different actuators at different states. If the models only see one state (activated at 100%, for example), they will only learn that single state.
In short, we assume whole-system error is a GP which is a linear combination of the underlying GPs of the subsystems (arm inverse dynamics and individual muscles). To 'disentangle' data from this single GP into the several underlying models of interest, it is helpful to collect drastically dissimilar whole-system data.

Global system errors can be used as training data for subsystem identification alone
We showed that, in addition to improving models which already had initial subsystem-specific data, global system errors can act as training data on their own. This is most evident in the results shown in figure 3, where increasing amounts of global system error data led to increasingly accurate model predictions for three subsystems. This indicates our math is correct; one set of system-level data can be used to identify indirectly observed subsystems. This is a significant contribution of this work; one type of data signal, the global system error, can be correctly untangled into the contributions of multiple subsystems. This is potentially more data efficient, as there is only one set of training data for generating these three models, as opposed to three subsystem-unique sets of data. Additionally, this is significant in the many cases where a subsystem can be controlled, but not directly observed, as is the case when controlling muscles with electrical stimulation in a person's arm.

Global system errors as training data can improve the prediction of all subsystem models simultaneously
When the whole-system error training data used for prediction with mixed data was collected while the system was randomly stimulated, the subsystem models improved when the initial subsystem-specific data set was small, i.e. when there was more room for improvement. However, subsystem-specific data still creates more accurate models. The models which started with high amounts of subsystem-specific data had less room for improvement. The addition of whole-system error to these models made them worse, as the models were very accurate before the addition of whole-system error, and the whole-system error data added additional noise.

Not all data are created equal
It is glaringly evident from both experiments that subsystem-specific data generally 'works' better. In the first experiment, the subsystem-specific data generates more accurate predictions for every case except when 60 points of data were used. We expect that the error of the models made with subsystem-specific data would converge to 0 if data were continuously added.
The idea that subsystem-specific data gives more information per data point is also evident in the results of the second experiment with mixed data. The amount of subsystem-specific data seems to be the better predictor of the accuracy of the three models. The improvement from including whole-system error (in the case of random muscle activations) is evident in the cases with low amounts of subsystem-specific data, but in cases with high amounts of subsystem-specific data, the additional noise added by the whole-system error actually made the models worse. We expect that there would be model improvement with additional points of whole-system error data in these 'high initial data' cases. The results of Tukey's test on the mixed data experiment support the idea that subsystem-specific data has more 'value' per data point; when there were high amounts of subsystem-specific data, there was virtually no value added by the additional whole-system error data.
The concept of information gain is critical to evolving GPs [16]. Generally, the goal of an evolving GP is to trim the least informative data to maintain some performance metric. To determine which data to remove from the set, the information gain of each data point should be evaluated. The data with the lowest scalar information gain score are removed. This concept illustrates that some data are more informative than others.
Using the idea of information gain can help elucidate further questions raised by the work presented here. What is the information gain per point of whole-system error data? What is the effect of the number of subsystems? In this work we had three subsystems; what if we had 10? 50? How many points of whole-system error data have the same information gain as one point of subsystem-specific data?
When making a prediction using whole-system error data, the muscle activation is used. In this simulated work, we maintained physiological realism by constraining each muscle activation between 0 and 1. What if we are working with a system where this is not a concern? In such a case, can a single point of whole-system error data have more information gain than a subsystem-specific data point?
Information gain is also potentially critical for connecting the basic concepts of the modelling technique presented here to a more practical implementation. In this work, stimulation and muscle activation were instantaneous, and therefore synonymous. Additionally, we removed the series elastic element. In reality, activating a muscle is a dynamic process that is a function of stimulation, current activation state, and time [19]. What are the effects of muscle activation dynamics? In this work, random stimulation is used to collect data from drastically different activation states. How can this be comfortably implemented in an individual?

Limitations

We assumed near-perfect whole-system error measurements
We defined the error of the whole system as the violation of our controller's equality constraint (11). In our simulation, we were able to exactly measure the error and add sensor noise with the same magnitude as the subsystem-specific data. In reality, this error measurement would be generated imperfectly. Potentially the easiest implementation for measuring this error is the use of a force-torque sensor on the wrist of a paralyzed person with an electrically stimulated arm. A caregiver or family member could visually detect the error in the arm's movement and apply a force to correct the arm at the force-torque sensor, thereby generating an error measurement and also aiding the paralyzed person in completing their reach. In this formulation, the human corrector is likely unaware of each time-step's reference position. Moreover, they are also unlikely to provide the exact force needed to ensure the arm remains on the trajectory calculated by some high-level controller. The human caregiver may be able to ensure the arm completes its trajectory at the correct final position, but the intermediate time steps will likely yield less accurate data. This would likely necessitate the identification of a noise hyperparameter specific to the whole-system error data.
A robotic exoskeleton has been shown to correct the trajectory errors of arms controlled with electrical stimulation [24]. The results of that study indicate that a robotic exoskeleton uses less power when combined with electrical stimulation, as the electrically stimulated arm moves itself in part. The authors indicate that the exoskeleton was used in part to correct the arm's movement resulting from imperfect electrical stimulation. The results also show considerable exoskeleton torque chattering when combined with electrical stimulation, indicating that the exoskeleton was used frequently over the short trajectories to correct the arm. These torque corrections were shown to be very accurate to the trajectory, but were used exclusively to correct movement. With the novel subsystem identification technique presented here, the researchers could also have been improving their arm subsystem models with every torque correction they applied.

The plant had physiological simplifications
The removal of the series elastic element in the Hill model was made to remove uncertainty in the inputs of the system. The ground-truth Hill contractile element depends exclusively on its length and velocity. If an additional spring element were added in series with the contractile element, there would be moments throughout the dynamic trajectory where the muscle length and velocity would change while the mass position and velocity would not change in any significant way for a time-step or two. While GPR can be performed with uncertain inputs (see chapter 9 of [1]), we decided that using deterministic inputs would be more beneficial in a study about the early principles of a novel modelling technique. Note that this assumption is implicitly made with the use of static subsystem identification in human arms in [14, 15], and [24]. We thought it best to remove this assumption by removing the series elastic element.

Hyperparameters were pre-computed
All trials used a standard set of pre-computed hyperparameters. In a real modelling scenario, hyperparameters would be computed to minimize the negative log marginal likelihood [1] of any given data set. However, hyperparameters which differ between data sets would introduce an extra dependent variable to both studies and increase the variability between experimental results with the same measures.

Scalability to larger systems
The system used to demonstrate our subsystem identification technique in this paper was composed of three subsystems acting in a single dimension. We used this system to be able to present and discuss specific subsystem results.
We expect this technique to scale directly to systems with larger numbers of uncorrelated subsystems and more dimensions. The key component of the methods presented here is the ability to write down the cross-covariance of the whole-system error with subsystem-specific predictions, as in (15) and (24) in appendix A. To be able to do this, the subsystem models' inputs should be written in terms of the whole system, so that the subsystems and the whole system share inputs. If the subsystem models are assumed to be uncorrelated, as was the case in the experiments presented here, then the cross-covariance between subsystems is 0 and terms simplify, as shown in (23), (24), and (26) in appendix A. A realistic example of such a multi-dimensional system is the electrically stimulated arm of the paralyzed person in [14]. This system is a whole arm composed of 10 subsystems: one inverse dynamics system and nine electrically stimulated muscles. These subsystems' output spaces are Cartesian forces in R³ measured at the wrist. The GPR models used in [14] are made such that each model's input is the arm's configuration, and the output is one of the three force directions. Therefore, each subsystem has three models, and a total of 30 models are used to generate arm predictions. To implement our technique in such a system, the whole-system error data would have to be decomposed into each of the three force directions to match the output spaces of the models. Then the whole-system error can be used to update all 10 models of the system which predict wrist force in a given direction; the x component of the whole-system error can update the 10 models that predict wrist force in the x direction, the y component of the error would be used for the 10 models in the y direction, etc.
The method presented here is also readily scalable to subsystems that are correlated, provided they share an input space with the whole system. In the derivation of cov(f_E, f_i) in appendix A, we make the assumption that the model outputs are uncorrelated. For the system used in this study, that seems to be a good assumption; the force capability of one muscle does not depend on another muscle. This might not always be the case across various systems. In the case where the subsystems cannot be assumed to be uncorrelated, the technique presented here can still be used. Rather than assuming cov(f_i(x_E), f_{k≠i}(x)) = 0, as we did in the derivation in appendix A, these cross-covariance terms should be kept in the formulation. If there is cross-covariance between subsystems, then the subsystem identification problem is likely best formulated as a multiple-output GPR problem [28]. Whole-system error, in such a case, can still be readily used as we have presented it here.
The method presented here is not readily scalable to systems which do not share an input space. Some systems, such as the one addressed in [29], are composed of subsystems whose inputs are the outputs of other subsystems. In the example presented in [29], the amount of sea ice varies with time, but also depends on the amount of atmospheric carbon dioxide and global temperatures. In turn, global temperatures depend on time as well as the amount of carbon dioxide in the atmosphere. In such a formulation, carbon dioxide only depends on time. The technique for learning subsystems from whole-system error presented in this paper is likely not sufficient to address these 'nested' subsystems, where the output of one subsystem is the input to another, f₃(f₂(f₁(t), t), t). In the GPR formulation presented in this paper, all subsystems share an input space with the whole system. This shared input space is how whole-system error measurements can be attributed to the various subsystems. Using whole-system error to improve the identification of 'cascading' subsystems, rather than the 'split' subsystems presented in this paper, is a fascinating future direction.
Appendix A. Deriving the covariances of the whole-system error

Recall that whole-system error is defined as the inverse-dynamics model minus the activation-weighted muscle models, all evaluated at the error input x_E:

f_E = f_ID(x_E) − Σ_{k=1}^{n} α_k f_k(x_E).

We can use this error in (18) to generate new predictions; however, cov(f_E, f_i), cov(f_E, f_*i), and cov(f_E, f_E) must still be calculated. These will be derived using the following property of covariances [30]:

cov( Σ_m a_m U_m , Σ_j b_j Y_j ) = Σ_m Σ_j a_m b_j cov(U_m, Y_j).    (21)

We can also define U to be f_E and Y to be f, so we can say

cov(f_E, f_i(x)) = cov( Σ_m a_m U_m , Σ_j b_j Y_j ).

From this, we see U_1 = f_ID(x_E) and U_{k+1} = f_k(x_E), for k muscles. For the constants of (21), we can see a_1 = 1 and a_{k+1} = −α_k. We have a single Y term, so Y_1 = f_i(x) and b_1 = 1. An important assumption we make is that the muscles are mutually uncorrelated, cov(f_i(x_E), f_k(x)) = 0 for k ≠ i, which lets us cancel every term except the one where the covariance of the i-th model with itself is taken. This lets us determine the covariance below:

cov(f_E, f_i(x)) = −α_i cov(f_i(x_E), f_i(x)).

This same derivation can be used to find cov(f_E, f_*i), so we can write that

cov(f_E, f_*i) = −α_i cov(f_i(x_E), f_i(x_*)).

The last component we need is cov(f_E, f_E). We again use the property given by equation (21). We begin by writing our observed torque twice, using general subscripts for the arm configurations to illustrate that they are not necessarily the same configuration:

f_E,1 = f_ID(x_E,1) − Σ_{k=1}^{n} α_{k,1} f_k(x_E,1),
f_E,2 = f_ID(x_E,2) − Σ_{k=1}^{n} α_{k,2} f_k(x_E,2).

In order to match the terms of (21), we say U_1 = f_ID(x_E,1), U_{k+1} = f_k(x_E,1), a_1 = 1, and a_{k+1} = −α_{k,1}; likewise Y_1 = f_ID(x_E,2), Y_{k+1} = f_k(x_E,2), b_1 = 1, and b_{k+1} = −α_{k,2}. With our terms defined, we can then substitute them into (21). Again, the assumption that the muscles are uncorrelated cancels every term except those where the covariance of a model with itself is taken:

cov(f_E,1, f_E,2) = cov(f_ID(x_E,1), f_ID(x_E,2)) + Σ_{k=1}^{n} α_{k,1} α_{k,2} cov(f_k(x_E,1), f_k(x_E,2)),

where n is the number of muscle models. This means that each model, whether it models a muscle or the passive arm dynamics, will have exactly the same cov(f_E, f_E) component.
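As a minimal numerical sketch of these two results (our own illustration, assuming for simplicity that every subsystem model uses the same squared-exponential kernel; the function names are not from the paper):

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * np.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cov_error_with_muscle(x_E, alpha_i, x, kernel=se_kernel):
    """cov(f_E, f_i(x)) for muscle model i: under the uncorrelated-subsystem
    assumption only the i-th term of the linear combination survives."""
    return -alpha_i * kernel(x_E, x)

def cov_error_with_error(x_E1, alphas1, x_E2, alphas2, kernel=se_kernel):
    """cov(f_E(x_E1), f_E(x_E2)): one inverse-dynamics term plus one
    activation-weighted term per muscle model."""
    value = kernel(x_E1, x_E2)              # inverse-dynamics term
    for a1, a2 in zip(alphas1, alphas2):    # n muscle terms
        value += a1 * a2 * kernel(x_E1, x_E2)
    return value

# Example: error observations at the same input with two muscle activations.
print(cov_error_with_muscle(x_E=0.0, alpha_i=0.7, x=0.0))       # -0.7
print(cov_error_with_error(0.0, [0.4, 0.7], 0.0, [0.4, 0.7]))   # 1 + 0.16 + 0.49
```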

Appendix B. Data sampled at similar activation states and similar inputs cancel out
The goal of this appendix is to give an analytical explanation for one of the key results of the study. One of our primary goals in this paper was to demonstrate that system identification can be performed using two correlated types of training data: independent subsystem-specific data and whole-system error. One of the results of this study shows that when the whole-system error is generated with optimized control signals, the model prediction does not improve, but rather remains the same or gets drastically worse. However, when the error is measured while control signals are applied randomly to the system, the data is informative and the predictions improve. This appendix shows the reason for this by stepping through simplified, representative cases of the predictions. We begin with the equation for making predictions at query data points, (18). Here, the covariances of the whole-system error with the query points X_* and the subsystem-specific training data X_i remain in their generalized form. We will use the covariance functions presented in (23), (24), and (26) for this example, but the principle at which we will arrive is likely true regardless of the system. Note the covariance functions are presented in matrix form in (15) and (16).
To simplify the various cases used as examples in this appendix, we set the hyperparameters of the covariance function (9) to {M, σ_f^2, σ_n^2} = {1^(−2), 1, 0}. There is a single length-scale hyperparameter, l = 1, reflecting the fact that the input space is R^1, and there is no noise on the signal.
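A small sketch of this simplified covariance function (assuming (9) is the standard squared-exponential form k(x, x') = σ_f^2 exp(−½ (x − x')ᵀ M (x − x')) with additive noise variance σ_n^2; this code is ours, not the paper's):

```python
import numpy as np

LENGTH_SCALE = 1.0   # l = 1, so M = l**-2 = 1
SIGNAL_VAR = 1.0     # sigma_f^2 = 1; the noise variance sigma_n^2 is 0

def k(x1, x2):
    """Simplified covariance used throughout appendix B (no noise term)."""
    return SIGNAL_VAR * np.exp(-0.5 * ((x1 - x2) / LENGTH_SCALE) ** 2)

print(k(0.0, 0.0))    # 1.0 for identical inputs
print(k(0.0, 10.0))   # ~2e-22 for far-apart inputs: the small phi of case B.2
```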
In the various cases in this appendix, we will predict an actuator (muscle) model. The choice of actuator (biceps or triceps) is arbitrary for this example. We will vary whether our query input x_* is close to our subsystem-specific training input x or to our whole-system error training input x_E.

B.1. Case 1: Triceps model, x = x_* = x_E
In this case, we are evaluating a prediction made at exactly the model input for which we have both classic, subsystem-specific training data and a whole-system error training point. We begin this case by evaluating the equations for covariance (9), (23), (24), and (26) using our simplified hyperparameters. Next, we follow the order of operations and evaluate the inverse of the middle 2 × 2 matrix. While matrix multiplication is associative, we then multiply the first 1 × 2 vector by the middle 2 × 2 matrix. It is at this point that we begin to see why using whole-system error observed during optimal control of the system results in predictions that are barely different from predictions which use the same initial training data but no error training data. In the example above, there was one input-output pair of triceps subsystem-specific training data and one training data point of whole-system error. The prediction f_*t reduces exactly to the triceps-specific training output, y_t: the first element of the left 1 × 2 vector in (29) becomes 1, and the second element becomes 0.
This example formulation queried a single point, the triceps-specific training data was composed of a single input-output pair, and the whole-system error training data was composed of a single input-output pair. If there were more query points for this case, where x = x_* = x_E and we are predicting an actuator model (specifically the triceps), then the left-most vector in (28) would become a row vector with more elements. The first element would be 1, and the second through (k + 1)-th elements would be determined by the different triceps control signals α_t used during the k observations of whole-system error. If each of the k observations used optimized control signals, they would arrive at the same predictions, because the control signals would be approximately the same. Randomized control signals, by definition, have no intent and therefore allow the system to be observed at different subsystem states. This is the beginning of the proof of one of the core principles: the data from error is a function of the control signals. Observations of the error at the same input with the same activation state, or the same control signal, cancel out. Therefore, the subsystems must be observed at different activation states as well as different model input states.
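A numerical check of this case (a sketch using the covariance expressions as reconstructed in appendix A, with arbitrary activations and observations of our own choosing):

```python
import numpy as np

# Case B.1: x = x_* = x_E, so every kernel value between these inputs is 1.
alpha_b, alpha_t = 0.4, 0.7   # arbitrary biceps / triceps activations
y_t, y_E = 12.0, -3.0         # arbitrary triceps-specific and error observations

cov_E_t = -alpha_t * 1.0                   # cov(f_E, f_t(x)), all inputs equal
cov_E_E = 1.0 + alpha_b**2 + alpha_t**2    # inverse dynamics + both muscle terms

left = np.array([1.0, -alpha_t * 1.0])     # [k(x_*, x), cov(f_t(x_*), f_E)]
middle = np.array([[1.0, cov_E_t],
                   [cov_E_t, cov_E_E]])    # training-data covariance matrix
weights = left @ np.linalg.inv(middle)
prediction = weights @ np.array([y_t, y_E])

print(weights)      # ~[1, 0] up to floating-point error
print(prediction)   # ~12.0, exactly the triceps-specific datum y_t
```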

B.2. Case 2: Triceps model, x ≠ x_* = x_E
In this case, we are evaluating a prediction made at exactly the model input for which we have a whole-system error training point, but which is very far (in input space) from where we collected our initial subsystem-specific training point. As such, k(x, x_*) = k(x, x_E) = φ, where φ is a very small number. We begin this case by evaluating the equations for covariance (9), (23), (24), and (26) using our simplified hyperparameters. As before, we follow the order of operations and evaluate the inverse of the middle 2 × 2 matrix. Next, we evaluate the multiplication of the first 1 × 2 vector by the middle 2 × 2 matrix; at this point we also make the approximation that φ ≈ 0, as we have completed the matrix inversion and are no longer at risk of dividing by 0. For the sake of completeness, we then show the final evaluation, recalling the definition of whole-system error, where f_t is the ground truth function governing the triceps. While it may be challenging to see how this formulation eventually reaches the desired prediction of f_*t ≈ f_t from this simplified case with limited training data, this does become true as the amount of training data increases. This is evident from the results presented in figure 3, where, with increased data from whole-system errors, the muscle contribution is untangled accurately.
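The corresponding numerical check for this case (the same sketch and assumptions as in case B.1, with the subsystem-specific training input moved far away):

```python
import numpy as np

# Case B.2: x is far from x_* = x_E, so k(x, x_*) = k(x, x_E) = phi ~ 0,
# while k(x_*, x_E) = 1.
alpha_b, alpha_t = 0.4, 0.7
y_t, y_E = 12.0, -3.0
phi = np.exp(-0.5 * 10.0**2)               # kernel value 10 length-scales apart

left = np.array([phi, -alpha_t * 1.0])     # [k(x_*, x), cov(f_t(x_*), f_E)]
middle = np.array([[1.0, -alpha_t * phi],
                   [-alpha_t * phi, 1.0 + alpha_b**2 + alpha_t**2]])
weights = left @ np.linalg.inv(middle)
prediction = weights @ np.array([y_t, y_E])

print(weights)      # ~[0, -alpha_t / (1 + alpha_b**2 + alpha_t**2)]
print(prediction)   # a scaled copy of the single error observation, not yet f_t
```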

Figure 1. This is the one-degree-of-freedom mass pulled by two muscles. Each muscle has a contractile element CE, which can pull with force proportional to the activation α, which is also the control signal. Each muscle also has a muscle stiffness in parallel, KP. The triceps can only pull in the negative direction; the biceps can only pull in the positive direction.

Figure 2. A simple controller was used to move the mass in one degree of freedom, leveraging predictions of muscle behavior.

Figure 3. The sum of the root-mean-squared errors of the 3 subsystem models is shown here, along with the standard errors of the mean. Both the experimental models made with whole-system error data alone and the control models made with subsystem-specific data start off with a large error. The error decreases similarly between the two types of models, but the models made with subsystem-specific data tend to have a lower error.

Figure 4. These are models of the 3 subsystems these experiments try to predict. The triceps model improves the most. The improvement of the biceps model is difficult to see, as the black line is plotted on top of the red, which is atop the green. This model also uses a large scale on the vertical axis.

Table 1. Data set sizes used to show that whole-system error can be used for subsystem identification.

Table 2. This table summarizes the experiment and results of testing the effects of using whole-system error data.

Table 3. This table summarizes the experiment and results of testing the effects of using mixed subsystem-specific and whole-system error data. Note that this is the amount of subsystem-specific data per model, so the actual data storage requirement is three times the number given here. We present our results in the form of mean ± standard error of the mean.