Coordinating upper limbs for octave playing on the piano via neuro-musculoskeletal modeling

Understanding the coordination of multiple biomechanical degrees of freedom in biological organisms is crucial for unraveling the neurophysiological control of sophisticated motor tasks. This study focuses on the cooperative behavior of upper-limb motor movements in the context of octave playing on the piano. While the vertebrate locomotor system has been extensively investigated, the coherence and precise timing of rhythmic movements in the upper-limb system remain incompletely understood. Inspired by the spinal cord neuronal circuits (central pattern generator, CPG), a computational neuro-musculoskeletal model is proposed to explore the coordination of upper-limb motor movements during octave playing across varying tempos and volumes. The proposed model incorporates a CPG-based nervous system, a physiologically informed mechanical body, and a piano environment to mimic human joint coordination and expressiveness. The model integrates neural rhythm generation, spinal reflex circuits, and biomechanical muscle dynamics while considering piano-playing quality and energy expenditure. Based on experiments with human subjects, the model has been refined to study tempo transitions and volume control during piano playing. This computational approach offers insights into the neurophysiological basis of upper-limb motor coordination in piano playing and its relation to expressive features.


Introduction
Biological organisms have evolved over billions of years to adapt flexibly to complex environments with high uncertainty and richness. Understanding how multiple biomechanical degrees of freedom are coordinated to achieve sophisticated motor tasks is a central topic in neurophysiological motor control [1][2][3]. The control of a biological system is not merely the outcome of internal control but arises from a tight coupling among the agent's brain (e.g. the central nervous system), its biomechanical body, and the environment in which it is situated [4,5]. Neurophysiological studies have shown that vertebrates such as the lamprey [6,7], the decerebrate cat [8,9] and the salamander [10,11] can exploit their own bodies to achieve complex locomotion without input from the brain. For natural organisms, it is critical to rapidly coordinate different locomotor patterns to mitigate the impact of perturbations and recover from failures [12,13]. Compared with robotic systems, biological systems are more adaptive and versatile in handling complex and dynamic conditions [1,14,15].
For mammals (e.g. cats, dogs, and humans), a number of essential behaviors such as respiration, mastication, walking, and swimming are controlled by a reciprocal and dynamical coupling among the rhythmic activities of the nervous system, the musculoskeletal system and the environment [12]. It is commonly held that the vertebrate locomotor system is organized hierarchically, with central pattern generators (CPGs) in the spinal cord generating basic rhythmic patterns and higher-level centers such as the motor cortex, cerebellum, and basal ganglia adjusting these patterns in response to environmental perturbations [16][17][18][19][20][21]. Inspired by these spinal cord neuronal circuits, CPG-based multi-motor coordination has been extensively studied in biological systems [7,10,16] and implemented in robots for swimming [22][23][24][25], hopping [26], biped walking [27][28][29][30], and arm control [31][32][33]. Although the coordinated vertebrate locomotor system has been widely investigated, our understanding of the coherence, harmony and precise timing of rhythmic movements in the upper-limb system is still far from complete.
Figure 1. (A) The concept of coordinated motor movements within the piano-playing task. The nervous system has efferent connections (via the axons of motor neurons) to the muscles through the neuromuscular junction and receives afferent feedback. The muscle fiber bundles contract, allowing the skeletal system to act on the environment (the piano-playing task). The synergy of the coupling among the controller, body and environment can achieve self-organized pattern transitions for multiple expressive styles by tuning simple control parameters that are nonspecific to the movements. (B) The ground-truth human piano player and (C) the proposed simulated pianist with coordinated multi-motor upper limbs in a virtual prototype.
Piano playing is a dynamic, rhythmic behavior that requires complex coordination of upper-limb motor movements and contractions of the upper-arm and forearm muscles during rhythmic keystrokes. Skilled pianists can produce complex finger sequences with high spatial, serial, and temporal precision and accuracy over a wide range of performance rates [34][35][36][37][38]. Humans employ the primate corticospinal system to control individual finger and arm motions when mastering the piano [39][40][41]; the nervous system activates the corresponding muscles through the neuromuscular junction so that the skeletal system can interact with the piano (figure 1(A)). The final keystroke is produced by coordinated movements of the entire upper limb, including wrist flexion/extension, forearm pronation/supination, shoulder elevation, and finger abduction/adduction [42]. Human pianists are known to modulate their movement patterns to produce different expressive dynamics, such as attack speed, attack touch, articulation, key sustain and release, and tempo at the microstructural level [43,44]. However, it is still unclear how upper-limb movement patterns relate to expressive features in piano performance.
A series of physiological experiments has been conducted to characterize human piano keystroke behavior [1,37,45,46]. In these studies the movements of pianists were investigated extensively, in particular the motor coordination involved in rhythmic octave strikes at different tempi (the 'Bernstein Problem' [1]). However, only a qualitative pattern transition of upper-limb movement between distinct keystroke tempi was observed: at slow and medium tempi, the arm swing exhibits a pattern of two coupled active oscillators, in which both the hand and the forearm move under the action of active muscles, whereas at a fast tempo (6.5 strikes per second or more [45]), the hand motion turns into a forced elastic oscillation. Since then, there has been no quantitative explanation of the origin of the pattern at slow and medium tempi or of how the pattern transition happens. In this work, we aim to explain the neural activity and muscle recruitment underlying this pattern transition and to reveal how the different motor movement patterns influence piano expressiveness.
In this paper, we propose a computational neuro-musculoskeletal model to study coordinated human upper-limb motor movements in various expressive piano-playing behaviors. We pose the following hypotheses. First, the coupling among the nervous system, the mechanical body, and the piano environment offers an adjustable motor coordination mechanism, which has a significant impact on the quality of piano performance, especially regarding tempo and volume during octave playing, as evidenced by MIDI tests. Second, by modulating the parameters of the spinal cord-inspired CPG, the pianist can spontaneously switch among a range of coordinated piano-playing patterns. Third, the double-hump pattern observed specifically at slow and medium tempi results from an attempt to modulate the musical quality. Verifying these hypotheses is difficult because it is extremely challenging to track sophisticated movements and neuromuscular activity accurately in vivo [47,48]. An in silico neuro-musculoskeletal model with a narrowed sim-to-real gap is a promising way to address this challenge. Piano playing is a particularly interesting challenge because it requires extreme dexterity, adaptability, and behavioral richness to achieve a range of expressive playing styles [49]. Researchers have built physical prototypes [50][51][52][53][54][55][56][57], data-driven virtual prototypes [49,58] and non-data-driven simulations [59] to play this complex musical instrument. However, these approaches do not mimic biological neural and muscular activity or the complex passive dynamics of the interaction between a physiologically accurate body and a piano, resulting in a significant reality gap between the model and the physical world. To simulate the biological rhythm generator of multi-motor systems, a number of optimized CPG algorithms [31,32,60-63] have been proposed for energy-efficient motor synchronization.
Our proposed neurophysiological model is composed of a CPG-based nervous system (i.e. the controller), a physiologically informed mechanical body, and a piano environment, such that the simulated human musculoskeletal model can mimic multiple spontaneous pattern transitions for various expressive styles. First, we designed the CPG-inspired neural rhythm generator based on the recurrent synaptic connections (i.e. reciprocal excitation and inhibition) of Matsuoka's neuronal oscillatory network [64,65]. We simulated the signals of spinal reflex circuits, which mediate afferent and efferent connections to muscles via the neuromuscular junction [66]. For the biomechanical body, we employed the Hill-type muscle model [67], a computational model that can replicate the concentric and eccentric contractions of real human arm muscles with accurate force-length and force-velocity relations. We implemented an optimization framework that considers piano-playing quality (characterized by MIDI) and energy expenditure [68]. In addition, we replicated Bernstein's experiments by investigating a human pianist's playing performance; benchmarked on these data, we optimized our model and studied a wider range of biologically spontaneous pattern transitions.

Anthropomorphic piano playing
The simulated pianist is established in a virtual environment based on a genuine piano-playing scenario and includes the neural oscillatory network, the anthropomorphic upper limbs, the muscle-tendon complex (MTC) and the piano environment. A full-sized piano is reproduced with the same design specifications as a real grand piano to ensure geometrically precise placement of torques and forces during keystrokes. We then introduced a physiologically realistic upper-limb model based on Hill-type muscles and an anatomically correct arm model with realistic flexion-extension behavior during keystroke arm-swing activities, together with compliant dynamics for all hinged upper-limb joints (the fingers, wrist, elbow, and shoulder). Finally, acoustic music production is approximated by a MIDI-based parameterization. All anthropomorphic keystroke actions are simulated with the numerical simulation platform Simscape Multibody (MATLAB).

Piano environment
A full-scale modern piano is modeled in the simulation with a row of 88 keys that includes seven full C-major scale octaves, i.e. 52 white keys and 36 shorter black keys in chromatic arrangement. Each key is modeled independently as a cuboid with the density of wood. The dimensions are based on the actual geometric specifications of a grand piano, with black keys 13.7 mm wide and white keys 23.5 mm wide. Each piano key is hinged at one end by a rotational joint with an equilibrium position of 0°, permitting only downward rotation up to a maximum angle of 5°. The parameters of this joint, which is modeled as a mass-spring-damper system, are determined by model identification [49]. The finger-key interaction is modeled as a collision between the distal segments of the hand and the key blocks, with friction resisting relative sliding.
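The key joint described above can be illustrated by a single hinged key modeled as a rotational spring-damper with a one-sided 0° to 5° travel range, integrated with semi-implicit Euler. This is a minimal sketch: the inertia, stiffness, damping and applied torques are hypothetical placeholders rather than the identified parameters of [49], and the hard clamp at the travel limits is a simplification of the contact handling in Simscape Multibody.

```python
import math

# Hypothetical key parameters; NOT the identified values from [49].
INERTIA = 2e-3                  # kg m^2, key inertia about its hinge
STIFFNESS = 0.5                 # N m/rad, restoring spring
DAMPING = 0.01                  # N m s/rad, viscous damping
MAX_ANGLE = math.radians(5.0)   # downward travel limit (5 degrees)


def step_key(theta, omega, applied_torque, dt=1e-4):
    """Advance one hinged key by a semi-implicit Euler step.

    theta is the downward key angle in rad (0 = rest), omega its rate.
    """
    alpha = (applied_torque - STIFFNESS * theta - DAMPING * omega) / INERTIA
    omega += alpha * dt          # update the velocity first ...
    theta += omega * dt          # ... then the angle with the new velocity
    # One-sided travel: the key can only rotate between 0 and 5 degrees.
    if theta < 0.0:
        theta, omega = 0.0, max(omega, 0.0)
    elif theta > MAX_ANGLE:
        theta, omega = MAX_ANGLE, min(omega, 0.0)
    return theta, omega


def simulate_press(torque, duration=0.2, dt=1e-4):
    """Apply a constant fingertip torque and return the final key angle (rad)."""
    theta, omega = 0.0, 0.0
    for _ in range(int(duration / dt)):
        theta, omega = step_key(theta, omega, torque, dt)
    return theta
```

With these illustrative numbers, a 1 N m press drives the key onto its 5° stop, a torque pulling upward leaves it at rest, and a 0.02 N m press settles near 0.02/0.5 = 0.04 rad (about 2.3°).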

Humanoid limbs and joints
To simulate muscle activity during keystrokes, a computational forward-dynamic model [67] is used to replicate the relevant muscles biomechanically. Figure 2(A) depicts the upper-limb motions involved in piano playing. The Hill-type muscle model comprises four components: a contractile element (CE), a parallel elastic element, a series elastic element, and a serial damping element. Figures 2(B) and (C) illustrate the four-element Hill-type muscle model and its force output, respectively. The CE represents the muscle's active fiber bundles, and the series elastic element represents the tendon and the intrinsic elasticity of the myofilaments. The model accounts for the force-length and force-velocity relationships, damping of high-frequency oscillations, and shock absorption. Its output is a one-dimensional force, and its inputs are the MTC length, the contraction velocity, and the neural excitation.
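Because the muscle is described only qualitatively here, the sketch below shows one common way to compute a Hill-type fiber force: a Gaussian active force-length curve, Hill's hyperbolic force-velocity relation for shortening with a simple saturating branch for lengthening, and an exponential parallel elastic element. All constants are hypothetical, the tendon is treated as rigid (so the series elastic and damping elements of the four-element model are omitted), and the functional forms are generic choices rather than those of [67].

```python
import math

# Hypothetical muscle constants, not the parameters of the model in [67].
F_MAX = 1000.0      # N, maximum isometric force
L_OPT = 0.10        # m, optimal fiber length
V_MAX = 10 * L_OPT  # m/s, maximum shortening velocity
WIDTH = 0.45        # width of the Gaussian force-length curve (in L_OPT units)
K_HILL = 0.25       # curvature of Hill's force-velocity hyperbola
ECC_MAX = 1.5       # eccentric force plateau (multiples of isometric force)
K_PE = 4.0          # shape factor of the passive exponential
STRAIN_PE = 0.6     # fiber strain at which the passive force equals F_MAX


def force_length(l_ce):
    """Active force-length factor; equals 1 at the optimal length."""
    return math.exp(-(((l_ce / L_OPT) - 1.0) / WIDTH) ** 2)


def force_velocity(v_ce):
    """Force-velocity factor; v_ce < 0 is shortening (concentric)."""
    v = v_ce / V_MAX
    if v <= 0.0:
        v = max(v, -1.0)
        return (1.0 + v) / (1.0 - v / K_HILL)  # Hill hyperbola, 0 at -V_MAX
    # Simplified eccentric branch: rises from 1 toward ECC_MAX when lengthening.
    return ECC_MAX - (ECC_MAX - 1.0) / (1.0 + 7.5 * v)


def passive_force(l_ce):
    """Parallel elastic element: exponential, zero below the optimal length."""
    strain = l_ce / L_OPT - 1.0
    if strain <= 0.0:
        return 0.0
    return F_MAX * (math.exp(K_PE * strain / STRAIN_PE) - 1.0) / (math.exp(K_PE) - 1.0)


def muscle_force(activation, l_ce, v_ce):
    """Fiber force for activation in [0, 1], assuming a rigid tendon."""
    a = min(max(activation, 0.0), 1.0)
    return a * F_MAX * force_length(l_ce) * force_velocity(v_ce) + passive_force(l_ce)
```

At full activation, optimal length and zero velocity the sketch returns F_MAX; shortening lowers the force (concentric) and lengthening raises it (eccentric), which is the qualitative behavior shown in figure 2(C).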
The simulation platform uses the anatomical structure of real human upper limbs with detailed morphological characteristics. The joints are modeled as revolute joints that are torque-actuated by the Hill-type muscles. The ligaments, which provide shock absorption, are represented by imposing stiffness and damping on the joints [69,70]. To mimic the multi-joint control of human arms, numerous constraints and dynamics must be considered to replicate the force-length and force-velocity relations of the simulated biomechanical body with high fidelity [71,72]. By using the same geometrically correct placement and an anthropomorphic musculoskeletal model for octave playing, the virtual prototype is expected to reduce the morphological sim-to-real gap between the simulated humanoid limbs and their ground-truth counterpart.

Anthropomorphic fingers
The finger joints have human-like anatomical features. In the hand, each of the four fingers comprises four bones and the thumb three. In principle, each joint can be actuated by either torque or prescribed motion, which gives the hand many degrees of freedom. In this study, however, we did not actuate the hand's joints but retained their elastic constraints, rendering the hand elastic and compliant in its interactions with the piano. The entire hand is represented by 20 3D bodies: one palm, three thumb bones, and sixteen bones for the remaining four fingers. The hand is posed in an octave-striking gesture, with the thumb and little finger stretched and the other three fingers raised. Interphalangeal joints are one-DoF revolute joints that permit only flexion-extension, whereas metacarpophalangeal joints are two-DoF universal joints that additionally allow abduction-adduction. The relative motion between two adjacent phalanges is constrained by rotational spring-damper joints.

Wrist, elbow and shoulder joints
The mechanical keystroke results from the rhythmic coordination of the entire upper limb, including the shoulder, upper arm, forearm, and hand. In practice, upper-limb actuation is a complex synergy of multiple muscle groups such as the deltoid, latissimus dorsi, biceps brachii and brachioradialis. In this work, we focus on the motions that move the hand downward, allowing the piano keys to be pressed with varying patterns. Each joint is actuated independently by muscles that generate both active and passive forces based on the force-length relationship depicted in figure 2(C). Table 1 lists the motions of the upper limbs and the associated muscles. The upper limbs in the simulation are modeled from real bones with a humanoid morphology. We simplify the anatomical structure and focus on three muscles that have a fundamental impact on piano-playing motions. For instance, the shoulder joint is torque-actuated in flexion-extension and returns to its equilibrium position when no torque is applied. The muscle torque acts solely on the shoulder's flexion-extension, while the passive constraints remain active in the other rotational directions. Anthropomorphic mass and inertia distributions (with empirically set densities [73]) are assigned to the limbs.
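A minimal sketch of how an antagonistic muscle pair and the passive joint constraints could combine into the net torque on one revolute joint (for example the shoulder in flexion-extension) is shown below. Constant moment arms and linear ligament-like stiffness and damping are assumptions made for illustration; all default values are hypothetical.

```python
def joint_torque(f_flexor, f_extensor, theta, omega,
                 r_flexor=0.03, r_extensor=0.03,
                 stiffness=2.0, damping=0.1, theta_rest=0.0):
    """Net torque (N m) on a revolute joint driven by an antagonistic pair.

    Positive torque flexes the joint. Moment arms r_* (m), passive stiffness
    (N m/rad) and damping (N m s/rad) are hypothetical constants standing in
    for the ligament-like joint compliance.
    """
    # Active part: flexor pulls in the positive direction, extensor opposes it.
    muscle = r_flexor * f_flexor - r_extensor * f_extensor
    # Passive part: restoring spring toward theta_rest plus viscous damping.
    passive = -stiffness * (theta - theta_rest) - damping * omega
    return muscle + passive
```

With no muscle force, a joint displaced by 0.5 rad feels a restoring torque of -1.0 N m, so it reverts toward its equilibrium position as described above; equal co-contraction produces no net muscle torque.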

MIDI representation
The MIDI protocol is used to parameterize the musical quality of keystroke actions. The mechanical finger-key contact is converted into MIDI events specifying each note's pitch, timing and loudness. For every keystroke, MIDI information is captured with a threshold-triggering model (see figure 2(D)): whenever the key displacement reaches a predefined threshold, a MIDI event is generated and the corresponding keystroke velocity (ω) and time instant are recorded. Two MIDI events occur within one keystroke, where the subscripts p and r denote key press and key release, respectively. For two adjacent keystrokes, the time gap between the two key-press instants, T_gap, describes how fast the key is struck in repetitive keystrokes (i.e. the regular interval between sounds). Note that tempo should not be confused with rhythm, which describes the pattern of note durations (i.e. a recognizable pattern of how long each note is played). The tempo is a fundamental metric for articulation control in

CPG-inspired keystroke controller
Expressive piano playing requires tempo transitions, which emerge from repetitive musculoskeletal coordination. The proposed CPG controller uses synaptic mutual inhibition and self-inhibition to mimic the rhythm-generating circuits of the human spinal cord. Tuning the controller parameters, which is itself a challenge, is carried out within an optimization framework that accounts for musical quality and energy expenditure.

Oscillatory neurons
We use coupled oscillatory neurons to generate the rhythmic movements of the upper limbs, so that the aforementioned muscles are synchronized at steady desired keystroke frequencies. We adopt a six-neuron network based on Matsuoka's oscillatory network, composed of three identical pairs of neurons [65]. Within each pair, the two neurons suppress each other's activity, and each neuron drives either the flexion or the extension of one joint; each pair thus forms a basic model of the stroke of a single limb segment. The neurons receive excitatory and inhibitory stimuli of equal magnitude from outside and inside the network. The output, an analog signal, is the firing rate of the neuron (y in equation (2)).
The oscillatory neurons mimic the drive that, in vertebrates, motor neurons transmit to muscle fibers across the neuromuscular junction [66], and they send neural stimulation to the Hill-type muscles according to the desired musical style. The muscles in turn actuate the three joints, so that the musculoskeletal system works in a synchronized manner and the entire upper limb is coordinated. The human musculoskeletal model performs the piano playing, and the resulting musical output is compared with the desired sheet music. The neural rhythm generator is described by the following differential equations:

$$
\begin{aligned}
T_r \dot{x}_i &= -x_i - \sum_{j \neq i} a_{ij} y_j - b f_i + s_i, \\
T_a \dot{f}_i &= -f_i + y_i, \\
y_i &= \max(0, x_i).
\end{aligned} \tag{2}
$$

In equation (2), i, j ∈ N with j ≠ i. The terms x_i and f_i denote the membrane potential and the fatigue (adaptation) property observed in biological systems. T_r and T_a are time constants that determine the reaction times of x_i and f_i, respectively. s_i is the excitatory tonic input, and a_ij and b denote the strengths of the reciprocal and self inhibition. Initial states are provided to the oscillatory system to activate the network. By tuning the inhibition strengths and time constants, we can modulate the frequency and offset of the rhythmic motions, allowing the pianist to generate a range of expressive styles in piano playing.
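To make the rhythm generator concrete, the sketch below integrates the standard Matsuoka equations T_r ẋ_i = -x_i - Σ_{j≠i} a_ij y_j - b f_i + s_i, T_a ḟ_i = -f_i + y_i, y_i = max(0, x_i) with forward Euler for three flexor/extensor pairs. It assumes mutual inhibition only within each pair and uses illustrative values (T_r = 0.25, T_a = 0.5, a = 2.5, b = 2.5, s = 1) chosen to satisfy Matsuoka's oscillation conditions 1 + T_r/T_a < a < 1 + b; none of these are the paper's parameters, and muscle and body feedback are omitted.

```python
def matsuoka(n_pairs=3, t_r=0.25, t_a=0.5, a=2.5, b=2.5, s=1.0,
             duration=20.0, dt=1e-3):
    """Euler integration of a Matsuoka network of flexor/extensor pairs.

    Neurons 2k and 2k+1 form the antagonistic pair for joint k. Returns the
    firing rates y(t) as a list of per-step lists.
    """
    n = 2 * n_pairs
    # Hypothetical coupling: mutual inhibition a only within each pair.
    A = [[0.0] * n for _ in range(n)]
    for k in range(n_pairs):
        A[2 * k][2 * k + 1] = A[2 * k + 1][2 * k] = a
    x = [0.1 if i % 2 == 0 else 0.0 for i in range(n)]  # asymmetric start
    f = [0.0] * n                                         # fatigue states
    trace = []
    for _ in range(int(round(duration / dt))):
        y = [max(0.0, xi) for xi in x]                    # firing rates
        for i in range(n):
            inhibition = sum(A[i][j] * y[j] for j in range(n) if j != i)
            dx = (-x[i] - inhibition - b * f[i] + s) / t_r
            df = (-f[i] + y[i]) / t_a
            x[i] += dx * dt
            f[i] += df * dt
        trace.append(y)
    return trace


def oscillation_period(trace, dt=1e-3):
    """Mean period of pair 0 from upward zero crossings of y0 - y1,
    discarding the first half of the crossings as start-up transient."""
    d = [y[0] - y[1] for y in trace]
    ups = [k * dt for k in range(1, len(d)) if d[k - 1] < 0.0 <= d[k]]
    ups = ups[len(ups) // 2:]
    return (ups[-1] - ups[0]) / (len(ups) - 1)
```

The two neurons of each pair fire alternately, as in figure 6(B). Because scaling T_r and T_a by the same factor only rescales time, doubling both roughly doubles `oscillation_period`, in line with the positive correlation between T_gap and the time constants reported for figure 5.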

Controller architecture
Expressive piano playing requires complex modulation of keystroke dynamics and articulation. Coordinating the neural and muscular systems requires a connection between the neuromodulatory oscillator and the musculoskeletal biomechanical system. We have designed a CPG-driven controller that coordinates the actuation of the three upper-limb joints. It has a hierarchical configuration composed of an internal CPG core and a second-level controller with fine-tuned parameters. Figure 3(A) depicts the control architecture. The metric for pattern transition, which is the tempo in expressive piano playing, is computed from the parameterized MIDI messages. The controller is embedded in the simulation framework so that the human musculoskeletal model can play the piano in various expressive styles; we focus on the distal finger-key interaction, in particular the magnitude, frequency, and offset of the key's angular displacement. The tempo is determined by the multi-hump waveform of the piano key trajectory. Figure 3(B) illustrates the typical cyclogram of a single piano keystroke, composed of seven stages. Before the actual keystroke, the piano keys remain still and the arm and fingers are aligned above the corresponding notes (positioning); this positioning is pre-set by the user rather than determined by the CPG controller. The entire arm then starts to swing and accelerates toward the keyboard (initial arm swing), driven jointly by active control of the arm and by gravity. Once the finger contacts the key (mid arm swing, initial contact), the key accelerates until the internal hammer strikes a string and the vibrating string produces sound (loading response). The key is held down for a specific amount of time for timing purposes, followed by arm withdrawal (late arm swing). Finally, the upper limbs and keys return (bounce) to their initial orientation, ready for subsequent repetitive keystrokes.

Parameter optimization
Human pianists naturally play with energy-efficient arm-swing patterns [38,74]. To identify the optimal CPG parameters for the simulated human musculoskeletal model, we propose a Bayesian optimization (BO) framework that optimizes the neuron parameters (e.g. time constants and coupling strength) while taking into account both musical quality and energy expenditure.

BO framework
We optimize the two time constants and the inhibitory coefficient of the CPG controller, as they have an essential impact on the efficiency of the keystroke patterns. Figure 4 depicts the workflow of parameter optimization. The virtual prototype receives the initial CPG parameters. The simulation then checks for contact between the finger and key blocks and detects whether the key is triggered. If not, a penalty is imposed; otherwise, the keys' angular displacement is measured and the corresponding keystroke actions are converted into MIDI events using the MIDI parameterization shown in figure 2(D). During the simulated piano playing, the energy consumption of the Hill-type muscle model is calculated. The computed MIDI tempo and energy consumption are then fed into the objective function. The stopping criterion is defined by the number of iterations and the number of consistently repeated key presses.
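The workflow above can be sketched with a minimal Gaussian-process Bayesian optimizer that uses a squared-exponential kernel and the expected-improvement acquisition, maximized over random candidate points in a box. The objective is any function of a parameter vector such as (T_r, T_a, b); in the actual framework each evaluation would run the Simscape simulation, whereas here it is a placeholder. The kernel length scale, candidate count and iteration budget are assumptions, and a maintained BO library would normally replace this hand-rolled loop.

```python
import math

import numpy as np


def rbf(A, B, length=0.2):
    """Squared-exponential kernel on inputs scaled to the unit cube."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_predict(X, y, Xs, noise=1e-6):
    """GP posterior mean and std at Xs (zero-mean prior, standardized y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^-1 y
    Ks = rbf(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.maximum(1.0 - (v**2).sum(axis=0), 1e-12)
    return mu, np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, bounds, n_init=5, n_iter=30, n_cand=2000, seed=0):
    """Minimize objective over a box; bounds is a list of (low, high)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.random((n_init, len(bounds)))            # unit-cube samples
    y = np.array([objective(lo + x * (hi - lo)) for x in X])
    for _ in range(n_iter):
        ys = (y - y.mean()) / (y.std() + 1e-12)      # standardize targets
        cand = rng.random((n_cand, len(bounds)))
        mu, sd = gp_predict(X, ys, cand)
        x_next = cand[np.argmax(expected_improvement(mu, sd, ys.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(lo + x_next * (hi - lo)))
    best = int(np.argmin(y))
    return lo + X[best] * (hi - lo), y[best]
```

For example, `bayes_opt(f, [(0.1, 1.0), (0.1, 1.0), (1.0, 3.0)])` would search hypothetical ranges for T_r, T_a and b, where `f` returns the penalized objective of one simulated run.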

Objective function
The controller is designed for tempo control, which is essentially characterized by repetitive keystrokes. The objective function comprises two components: the error in musical articulation (f_M) and the energy expenditure (f_E). Piano playing involves two primary forms of energy, gravitational potential energy and kinetic energy, both of which fluctuate with recurring keystrokes. During the arm swing, when the upper limbs move upward, the arm works against gravity and its potential energy increases. Conversely, when the arm moves downward, it is partially accelerated by this potential energy, i.e. the upper limbs exploit gravity and their own mass. Because the player both works against and makes use of gravity, we explicitly calculate the total mechanical work of the musculoskeletal model, since muscle contraction is the only source of actuation. A weight coefficient (λ) allows us to prioritize the different metrics.
In equation (5), T_des denotes the desired timing for user-defined tempo patterns. The energy metric is the total energy that the muscles inject into the upper-limb system. The terms τ and ω denote the torque and angular velocity of the joints, and the subscripts s, e and w denote the shoulder, elbow and wrist joints, respectively. Note that the simulation runs until five repetitive keystrokes are steadily generated with the same pattern. To compute the MIDI signals and the corresponding energy expenditure, we use the average over the steadily repeated humps of the key's angular movement. The quality of piano playing is quantified with the MIDI protocol, which allows us to evaluate dynamics and articulation, directly related to the force of the keystrokes and the musical tempo. In this assessment, we focus on timing, i.e. whether the simulated player consistently strikes the piano keys at the desired tempo. The deviation between the produced tempo and the desired tempo is used as the evaluation criterion. The reference tempi are generated with the score-writing software MuseScore.
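As a sketch of how f_M and f_E could be combined, assume a weighted sum f = λ f_M + (1 - λ) f_E, with f_M = |T̄_gap - T_des|, where T̄_gap is the mean of the last five steady inter-press gaps, and f_E the positive mechanical work Σ_{k∈{s,e,w}} ∫ max(τ_k ω_k, 0) dt injected at the shoulder, elbow and wrist. The weighted-sum form, the penalty value and the 5 % steadiness tolerance are assumptions, not the paper's definitions.

```python
PENALTY = 1e3  # hypothetical cost when a run never produces steady keystrokes


def steady_gap(gaps, n=5, rel_tol=0.05):
    """Mean of the last n inter-press gaps if they agree within rel_tol, else None."""
    if len(gaps) < n:
        return None
    last = gaps[-n:]
    mean = sum(last) / n
    if max(abs(g - mean) for g in last) > rel_tol * mean:
        return None
    return mean


def positive_work(torques, omegas, dt):
    """Energy injected by the actuators: sum over joints of the time integral
    of max(tau * omega, 0). torques/omegas map joint name -> sampled values."""
    return sum(max(t * w, 0.0) * dt
               for joint in torques
               for t, w in zip(torques[joint], omegas[joint]))


def objective(gaps, torques, omegas, dt, t_des, lam=0.5):
    """Assumed weighted sum of tempo error f_M and energy f_E."""
    t_gap = steady_gap(gaps)
    if t_gap is None:              # key never triggered, or timing not steady
        return PENALTY
    f_m = abs(t_gap - t_des)
    f_e = positive_work(torques, omegas, dt)
    return lam * f_m + (1.0 - lam) * f_e
```

Because f_M (seconds) and f_E (joules) have different units, the choice of λ effectively sets the exchange rate between timing accuracy and energy.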
Figure 5 illustrates the impact of the control parameters on the MIDI tempo. In general, T_gap is positively correlated with the two CPG time constants (T_r and T_a), implying that the keystroke frequency decreases as the time constants increase. Conversely, increasing the inhibition strength (b) decreases T_gap, indicating faster piano key strikes (a higher tempo).
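For the isolated oscillator, the effect of the time constants has a simple partial explanation. Assuming the standard Matsuoka form T_r ẋ_i = -x_i - Σ_{j≠i} a_ij y_j - b f_i + s_i, T_a ḟ_i = -f_i + y_i, y_i = max(0, x_i), scaling both time constants by the same factor k > 0 stretches every solution in time by k. Defining x̂_i(t) = x_i(t/k) and f̂_i(t) = f_i(t/k):

$$
\begin{aligned}
kT_r \frac{d\hat{x}_i}{dt} &= T_r \dot{x}_i(t/k) = -\hat{x}_i - \sum_{j \neq i} a_{ij} \hat{y}_j - b \hat{f}_i + s_i, \\
kT_a \frac{d\hat{f}_i}{dt} &= T_a \dot{f}_i(t/k) = -\hat{f}_i + \hat{y}_i,
\end{aligned}
\qquad \Longrightarrow \qquad P \mapsto kP .
$$

So the neural period P, and with it the target spacing of keystrokes, grows in proportion to the time constants. This sketch ignores the coupling to the body and the piano, which can entrain the rhythm, and it does not cover changing T_r or T_a individually.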

Expressive octave playing
To demonstrate that the human musculoskeletal model can handle multiple expressive patterns, the simulated player is instructed to actively change the tempo during the performance. We investigated the neural activity, the motions of the multi-joint upper limb, the fingertip trajectories and the corresponding MIDI output in a tempo-dependent octave-playing scenario. Specifically, the model uses the thumb to press the C4 key and the little finger to press the C5 key. The orientation of all fingers relative to the wrist is set at the beginning and remains constant, while each finger joint retains its passive dynamics. The two notes are pressed simultaneously and repeatedly over a 20 s period.

Multi-joint coordination
The simulated pianist can change the musical tempo dynamically while playing. We performed real-time, tempo-variant piano playing in three typical expressive octave-playing styles: a slow tempo of 60 BPM (Lento), a medium tempo of 120 BPM (Allegretto) and a fast tempo of 176 BPM (Allegro Vivace). A series of real-time transitions between pairs of these tempos was performed by imposing three levels of inhibitory strength on the neural oscillatory network (figure 6(A)).
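Assuming one octave strike per beat, the inter-press gap targeted at each tempo follows from T_gap = 60/BPM, which also allows comparison with the fast regime of Bernstein's experiments (6.5 strikes per second or more [45]):

$$
\begin{aligned}
T_{gap}^{\mathrm{Lento}} &= 60/60 = 1.000\ \mathrm{s}, \\
T_{gap}^{\mathrm{Allegretto}} &= 60/120 = 0.500\ \mathrm{s}, \\
T_{gap}^{\mathrm{Allegro\ Vivace}} &= 60/176 \approx 0.341\ \mathrm{s}, \\
T_{gap}^{\mathrm{Bernstein,\ fast}} &\le 1/6.5 \approx 0.154\ \mathrm{s} \quad (\approx 390\ \mathrm{BPM}).
\end{aligned}
$$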
We recorded four groups of signals: neuronal activity, upper-limb joint angles, the corresponding vertical displacement of the two fingertips, and the resulting musical output. Figure 6(B) depicts the firing-rate trajectories of the six neurons as the inhibitory strength follows a 'slow-medium-fast-medium-slow-fast-slow' sequence. The two neurons in each pair exhibit alternating peaks, implying that they inhibit each other reciprocally. Increased inhibitory strength results in more frequent neural activity, albeit with reduced magnitude. The three pairs produce a rhythmic neural output with a constant offset, which is required for repetitive keystrokes.
Figure 6(C) depicts the continuous trajectories of the three joints. It shows that the musculoskeletal system naturally increases its arm-swing frequency for transitions from slow to fast tempo, and decreases it for the reverse, as the neural input signal is adjusted in real time. For example, all three joints show more frequent angular displacement when switching from Lento to Allegretto, despite slight variations, whereas the frequency decreases for fast-to-slow transitions such as Allegretto-Lento and Allegro Vivace-Lento. The fingertip trajectories in figure 6(D), which largely overlap for the two fingers, show the same pattern transitions. This demonstrates good coherence between the keystrokes on the C4 and C5 keys, as well as tempo-dependent pattern transitions similar to those of the three joints (see supplementary video S1). Note that the motor movements are a synergy of active muscular control and the passive dynamics imposed by physical constraints such as gravity, joint stiffness, density and body morphology. This synergy shows that the coupling of the neural system, the physiological body, and the environment provides a mechanism for the human musculoskeletal model to shape the musical flow at various tempos.
Figures 6(C) and (D) demonstrate that the entire upper limb is effectively coordinated for rhythmic piano keystrokes, suggesting that the piano-playing task can be achieved with a spinal cord-inspired oscillatory neural network. Note that only one neural parameter (the self-inhibitory strength), which is nonspecific to the motor movements, has been tuned. This demonstrates that a complex motor coordination task can be accomplished by modifying a few simple parameters of a neural oscillatory network.
Figure 6. Tempo transitions among three different piano-playing tempos. (A) Three levels of the neurons' inhibitory strength are tuned to vary (B) the neural activity of the six-neuron oscillatory network; the indices of the neurons in charge of the flexors and extensors are indicated in figure 3(A). (C) The angular displacement θ of three upper-limb joints (shoulder, elbow and wrist) resulting from the musculoskeletal constraints. (D) Fingertip trajectories (Z-axis displacement) of the thumb and the little finger, in line with the joint movements. (E) The resulting MIDI output for the tempo transitions among the Lento, Allegretto and Allegro Vivace styles, produced by the interaction between the fingertips and the piano; colored rectangles denote steady tempo output.

MIDI output
The coordinated arm-swing movements described above ensure consistent repetitive keystrokes at the finger-key interface: the piano notes are pressed down repeatedly, and a MIDI event is created for each keystroke according to figure 6(D). Figure 6(E) illustrates the resulting MIDI output for the complete set of tempo transitions among the three patterns, in line with the coordinated motor movements. The keyboard is played smoothly, switching correctly between any two of the expressive tempos, which implies that the human musculoskeletal model achieves fine timing control of keystrokes (audio in supplementary video S1). Because there is no recurring arm-swing activity in the first few seconds, the first keystroke is treated as an outlier and excluded from the sequence of MIDI tempos.
For slow-to-fast transitions such as Lento-Allegretto and Allegretto-Allegro Vivace, the first few keystroke tempos deviate slightly from the expected value but quickly stabilize around the desired tempo. An anomaly was observed in the first keystroke after the large jump from Lento to Allegro Vivace, where the musculoskeletal model takes longer to settle at the target tempo. For fast-to-slow transitions, on the other hand, the model switches tempo smoothly.

Optimization for CPG parameters
The CPG-based controller can effectively actuate the upper-limb system to perform expressive piano playing. However, parameter tuning is a challenge in itself. The workflow of the parameter optimization is illustrated in figure 4. To validate the effectiveness of the proposed BO framework, we minimized the energy consumption of a three-neuron-actuated piano player under the prerequisite that the desired tempo pattern is reproduced.
Figure 7 details the optimization procedure. Prior to optimization, the three CPG parameters are derived from the initial version of the Matsuoka oscillator [64,65] with proportional scaling. As figures 7(A)-(C) show, a different combination of CPG parameters was evaluated in each of the 300 trials, and the optimal control parameters matching the target tempo were found at trial 157. Figure 7(D) shows the discrepancy between T_gap and its intended value (T_des); over the iterations, the resulting tempo converges closely to the target of 75 BPM. Energy consumption is shown in figure 7(E).
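The BO loop of figures 4 and 7 can be illustrated with a deliberately small sketch: a Gaussian-process surrogate with a squared-exponential kernel and an expected-improvement acquisition, minimizing a one-dimensional objective over a grid. Everything here is an assumption for illustration. The paper tunes three parameters (T_r, T_a, b) over 300 trials with the objective of equation (4); this sketch tunes one normalized parameter, uses a toy quadratic in place of that objective, and makes no claim to match the surrogate or acquisition function the authors used. All function names are hypothetical.

```python
import math
import random


def rbf(p, q, length=0.15):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((p - q) / length) ** 2)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small Gram matrices)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expected_improvement(mu, sigma, best):
    """Expected improvement below the incumbent `best` (minimisation)."""
    if sigma <= 0.0:
        return 0.0
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(objective, n_trials=20, n_init=3, noise=1e-6, seed=0):
    """Minimise objective(x) for x in [0, 1]: GP surrogate + EI over a fixed grid."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial design
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]
    for _ in range(n_trials - n_init):
        n = len(xs)
        k_inv = invert([[rbf(xs[i], xs[j]) + (noise if i == j else 0.0)
                         for j in range(n)] for i in range(n)])
        alpha = [sum(k_inv[i][j] * ys[j] for j in range(n)) for i in range(n)]
        best = min(ys)

        def acquisition(x):
            k = [rbf(x, xi) for xi in xs]
            mu = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean
            quad = sum(k[i] * k_inv[i][j] * k[j] for i in range(n) for j in range(n))
            return expected_improvement(mu, math.sqrt(max(1.0 - quad, 0.0)), best)

        x_next = max(grid, key=acquisition)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

`bayes_opt(lambda x: (x - 0.3) ** 2)` should return a point near 0.3 after 20 evaluations. In the paper's setting, each objective call would instead run the musculoskeletal simulation and combine the musical-quality (f_M) and energy (f_E) terms.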
Figure 8(A) shows that, in both scenarios, the elbow undergoes larger displacements than the other two joints. Within a 20 s window, 15 keystrokes occur before optimization, whereas only ten occur after optimization. Figure 8(B) compares the tempos of two notes (C4 and C5) obtained with the initial and the optimized parameters. Before optimization, the notes do not strictly adhere to the desired tempo of 75 BPM. After 300 iterations of the BO framework with the tempo set to 75 BPM, the tempo is achieved accurately, although fewer keystrokes take place in the same period; within a fixed duration, playing at a slower tempo naturally results in fewer key presses. Finally, following equation (4), our primary objective was to tune the controller to minimize energy consumption while ensuring accurate keystrokes, which is essential for reproducing the intended MIDI pattern in an energy-efficient manner. The BO framework successfully optimized the controller parameters. During a 20 s simulation, the energy consumption of the human musculoskeletal model rose steadily (figure 8(C)). Table 2 lists the CPG parameters before and after the application of the BO framework. All three parameters (T_r, T_a and b) change noticeably, enabling the human musculoskeletal system to play at a tempo of 75 BPM. The objective function decreases by 66.27 in the musical-quality term (f_M) and by 1.06 in the energy-expense term (f_E). The proposed BO framework is therefore effective in reducing energy expenditure, since the optimized parameters produce accurate yet low-cost piano-playing patterns at the specified tempo.

Analysis of coordinated movement
Human player
We replicated Bernstein's piano octave-strike task [45] with one concert pianist (female, 29 years old), who was instructed to perform a series of right-hand staccato octave keystrokes in 16th notes (four notes per beat) at 33 BPM and 115 BPM (keystroke rates of 2.20 Hz and 7.67 Hz, respectively). Each tempo condition was repeated three times. During the experiment, upper-limb movement was recorded at 960 Hz with an electromagnetic motion tracking system. Sensors were attached to the sternum, the acromion, the lateral surface of the humerus, the posterior distal surface of the forearm, the dorsal side of the hand, and the nails of the little finger and thumb (supplementary video S2). The position and attitude data from the magnetic tracking system were used to calculate the time courses of the joint rotation angles corresponding to the seven degrees of freedom of the arm [75]: abduction-adduction and flexion-extension of the wrist; pronation-supination and flexion-extension of the elbow; and abduction-adduction, flexion-extension and rotation of the shoulder.
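The keystroke rates quoted above follow from four 16th notes per beat, f = BPM × 4 / 60. A one-line check (the function name is hypothetical):

```python
def keystroke_rate_hz(bpm, notes_per_beat=4):
    """Keystrokes per second for a metronome tempo; 16th notes give 4 notes per beat."""
    return bpm * notes_per_beat / 60.0
```

At 33 BPM this gives 132 / 60 = 2.20 Hz, and at 115 BPM it gives 460 / 60 ≈ 7.67 Hz.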
To assess the motor control regime, we used discrete relative phase analysis [76], in which the phase of the rhythmic wrist flexion movement relative to the elbow extension movement was computed at each instant of maximal elbow extension. Negative relative phase values indicate that the wrist lags the elbow, whereas perfect 'in-phase' coordination would yield 0°.
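A minimal sketch of the discrete relative phase described above, assuming the joint-angle traces have already been oriented so that elbow maximal extension and wrist maximal flexion both appear as local maxima (the sign conventions of the tracking system are not given here). The helper names are hypothetical. Each elbow cycle is paired with the first wrist event that falls inside it, and phases are wrapped to (−180°, 180°] with a wrist lag negative, as in the text.

```python
import math


def local_maxima(signal):
    """Sample indices i with signal[i-1] < signal[i] >= signal[i+1]."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def discrete_relative_phase(elbow_events, wrist_events):
    """Discrete relative phase (degrees) of wrist events with respect to elbow events.

    For each elbow cycle [e0, e1) the first wrist event inside it is used:
    phase = -360 * (w - e0) / (e1 - e0), so a wrist lag is negative.
    Results are wrapped into (-180, 180]. Event positions may be sample
    indices or times, as long as both lists use the same unit.
    """
    phases = []
    for e0, e1 in zip(elbow_events, elbow_events[1:]):
        inside = [w for w in wrist_events if e0 <= w < e1]
        if not inside:
            continue
        phi = -360.0 * (inside[0] - e0) / (e1 - e0)
        if phi <= -180.0:
            phi += 360.0
        phases.append(phi)
    return phases


if __name__ == "__main__":
    fs = 960  # Hz, the sampling rate of the motion tracking system
    t = [i / fs for i in range(4 * fs)]
    elbow = [math.cos(2 * math.pi * ti) for ti in t]          # 1 Hz reference
    wrist = [math.cos(2 * math.pi * (ti - 0.1)) for ti in t]  # lags by 0.1 cycle
    print(discrete_relative_phase(local_maxima(elbow), local_maxima(wrist)))
```

In the synthetic demo the wrist trace lags the elbow by a tenth of a cycle, so each complete elbow cycle yields a relative phase of about −36°.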
In the fast tempo condition (115 BPM), the pianist exhibited relatively small flexion-extension amplitudes at the wrist and elbow (1.50 ± 0.38° for the elbow and 4.70 ± 0.78° for the wrist) with a single-peaked, sinusoidal pattern (figures 9(A) and (C)). The shoulder flexion movement (figure 9(B)) was less regular than those of the wrist and elbow, although the timing of its peaks appeared to be synchronized with the peaks and valleys of the wrist flexion movement. On average, the phase of the wrist flexion movement lagged that of the elbow extension movement by 20.70°.
If the hand performs forced oscillations driven by movements of the forearm (i.e. one oscillator plus a passive element), as observed by Bernstein and Popova [45], wrist flexion and elbow extension should exhibit simple sinusoidal movements, with the phase of wrist flexion slightly behind that of elbow extension. This is exactly what we observed in our replication experiment.
In the slow tempo condition (33 BPM), the pianist exhibited a much more complicated joint movement pattern: both the wrist and the elbow showed double-peaked flexion-extension profiles, with amplitudes roughly 1.5 to 2.7 times those in the fast tempo condition (4.04 ± 0.64° for the elbow and 7.28 ± 1.78° for the wrist) (figure 10(A)). Although one of the two peaks of the wrist and elbow flexion-extension profiles appeared to coincide, a brief counter-directional maximal extension of the wrist occurred at the beginning of the downward extension of the elbow and shoulder toward a keystroke, so that the downward maximal flexion of the wrist lagged considerably behind the maximal extension of the elbow. In other words, the downward movement generally followed a proximal-to-distal sequence, from shoulder extension through elbow extension to wrist flexion (figure 10(B)). The phase of the wrist flexion lagged that of the elbow extension by 71.70° on average, but the standard deviation was high (29.76°), suggesting that the phase relation was not locked but varied over time (figure 10(C)).
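Whether a phase relation is locked can also be quantified with circular statistics, which avoid wrap-around artifacts near ±180°. This is an alternative we suggest for illustration, not the analysis reported above (which gives a linear mean and standard deviation); the function name is hypothetical. The circular standard deviation is sqrt(−2 ln R), where R is the mean resultant length of the unit phase vectors.

```python
import math


def circular_stats_deg(phases_deg):
    """Circular mean and circular standard deviation of angles given in degrees."""
    rad = [math.radians(p) for p in phases_deg]
    c = sum(math.cos(r) for r in rad) / len(rad)
    s = sum(math.sin(r) for r in rad) / len(rad)
    resultant = min(math.hypot(c, s), 1.0)  # mean resultant length R in [0, 1]
    if resultant == 0.0:
        return float("nan"), math.inf       # no preferred phase at all
    mean = math.degrees(math.atan2(s, c))
    sd = math.degrees(math.sqrt(-2.0 * math.log(resultant)))
    return mean, sd
```

For phases of 170° and −170°, a linear average gives 0°, whereas the circular mean is ±180°; a small circular standard deviation indicates a locked phase relation and a large one a drifting relation.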
The double-peaked flexion-extension profiles of the wrist and elbow in the slow tempo condition agree with the observation of Bernstein and Popova [45], who argued that, at medium or slow tempos, the synergy between impulses at the wrist and impulses from the elbow muscles produces a double-peaked joint-movement profile like that of a complex pendulum.

Human vs. musculoskeletal model
Based on the analysis of human movement, the expert human pianist was compared with the simulated player using the computational piano-playing simulation platform. The simulated pianist was set up to replicate the repetitive rhythmic octave-playing task described above, with varying levels of neural stimuli (supplementary video S3). Figure 11(A) compares the motions of the elbow and wrist joints between the human and the musculoskeletal model. Note that the human joint angles were normalized to highlight the shape of the joint motion. The modeled player exhibits motions similar to those of the human at both tempos. In particular, at the slow tempo of 33 BPM, both the elbow and wrist joints show a double-hump waveform, which transforms into a single-hump pattern for high-speed keystrokes. The angular displacements of the elbow and wrist joints at the slow tempo (Grave) are 1.07° and 1.24°, respectively, more than twice those at the fast (Allegro) tempo (0.42° and 0.45°). This is consistent with the results of the human experiments. The human musculoskeletal model thus performs tempo-dependent keystrokes. For Grave and Allegro, the relative phase between the wrist and the elbow is −86.77° and −13.83°, respectively, indicating that wrist flexion lags elbow extension. As the tempo was varied, the modeled pianist exhibited a transition from one motor coordination pattern to another, similar to that of the human. As the tempo increased, the simulated player reduced the degree of active wrist oscillation and increased the inhibitory strength between the elbow and wrist movements, replicating the phase transition observed in human pianists. It is worth highlighting that this switching of the control regime was not prescribed by micro-management of movement patterns, but emerged spontaneously through the entangled synergy among gravity, morphology, the mass distribution of each body segment, the characteristics of muscle activation, and the central pattern generators.
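Single- versus double-hump cycles, as compared in figure 11, can be told apart automatically. This sketch assumes min-max normalization (the text does not specify how the human traces were normalized) and a trace already cut into one keystroke cycle; it counts humps with a hysteresis threshold `delta` so that small ripples are ignored, and a final peak that never drops by `delta` before the cycle ends is not counted. Function names and the default `delta` are hypothetical.

```python
def normalise(signal):
    """Min-max scale a trace to [0, 1]; a flat trace maps to all zeros."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in signal]


def count_humps(cycle, delta=0.1):
    """Count humps in one (normalised) keystroke cycle using hysteresis.

    A hump is registered once the signal falls at least `delta` below its
    running maximum; the next hump can only start after the signal has risen
    at least `delta` above the following running minimum.
    """
    humps = 0
    rising = True
    extreme = cycle[0]
    for x in cycle[1:]:
        if rising:
            if x > extreme:
                extreme = x
            elif extreme - x >= delta:
                humps += 1
                rising, extreme = False, x
        else:
            if x < extreme:
                extreme = x
            elif x - extreme >= delta:
                rising, extreme = True, x
    return humps
```

Applied cycle by cycle across the inhibition levels of figure 11(B), the hump count would locate the level at which the double-hump pattern gives way to a single peak.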
The pattern transition was achieved merely by tuning a constant input at the level of the nervous system that is unrelated to joint movements. To investigate how the elbow and wrist movements transform from one pattern to another during the Grave-Allegro transition, we drove the CPG controller with eight levels of self-inhibitory strength and measured the angular displacement of each joint. The motions of the elbow and wrist joints during continuous keystroke cycles are depicted in figure 11(B). Within a fixed time window (6-10 s; the human musculoskeletal model does not converge to stable oscillations during the first few seconds), more keystrokes are produced as the inhibition strength increases, because increasing the constant input results in proportionally more frequent neuromuscular stimulation, while the amplitude of the trajectory decreases slightly. At low levels of neuronal stimulation, both joints exhibit a double-hump trajectory, which gradually changes to a single-peak pattern as the stimulation increases. This change emerges spontaneously from the behavior of the simulated pianist rather than from tuning any joint parameters.

Implications of the double-hump trajectory
Musical effects of varied motor movements
According to the aforementioned results, the computationally simulated pianist can reproduce piano-playing actions while capturing the invariants of neuro-musculoskeletal synchronization.
According to Bernstein's research, the hand exhibits a complicated pattern at slow and medium tempos, in contrast to a forced elastic oscillation at fast tempos [45]. The origin of this 'complex pattern', however, remains unknown. To explore the emergence of the double-hump pattern, we first generated two distinct double-hump patterns at the same keystroke frequency (tempo). This comparison was qualitative and aimed to show that the shape of the double-hump waveform affects the MIDI velocity. The CPG parameters were kept constant, and only the blending ratio of the neuronal outputs was tuned. Given the hierarchical nature of the controller architecture, the CPG oscillator produces the fundamental signal that governs the frequency of the controller network and thereby the tempo. Once the CPG parameters are optimized, the amplitudes of individual neuronal outputs and the blending ratio can be adjusted to alter the waveform while preserving the fundamental frequency of the rhythmic movement. In section 4.3.2, we then pursue a more in-depth quantitative analysis of the correlation between the double-hump configuration and volume. Following previous studies on expressive features in piano performance [43,44], music quality was assessed by taking into account both the articulation and the dynamics of keystrokes according to the parameterized MIDI protocol.
Figure 12 shows the angular displacement of two piano keys for two different double-hump octave-playing patterns. Both keystroke patterns yield the same tempo (the time gap between two key-press events is the same) according to the MIDI parameterization. The two double-hump trajectories were generated by fine-tuning the blending ratio of neuronal pairs in the CPG controller. Because the two notes of an octave (C4 and C5) are played simultaneously, their MIDI traces overlap. Although the keystroke frequencies are the same, the MIDI threshold-triggering velocities (ω_p and ω_r in figure 2(D)), which reflect keystroke loudness rather than tempo, differ. Note that MIDI velocity is a unitless value ranging from 0 to 127 that represents the volume of a given keystroke for a specific MIDI note, i.e. how strongly the pianist strikes the key. For the double-hump pattern in figure 12(A), the key-press velocity is 50.4 and the key-release velocity is 115.5. For the second double-hump pattern (figure 12(B)), the key-press velocity increases by 24.8% to 62.9, while the key-release velocity decreases to 106.6, indicating that the articulation and loudness of the produced music are substantially shaped by the double-hump trajectory used during keystrokes.

Double-hump pattern vs. MIDI
To better understand how the double-hump trajectory pattern influences the musical sound produced, we simulated 11 double-hump patterns in silico and investigated how these complex coordinated keystroke behaviors shape music quality. We hypothesized that humans exploit the double-hump trajectory to regulate the quality of music at slow or medium tempos. Music quality here relates not only to musical articulation but also to dynamics (acoustic loudness). For the simulated pianist, the tempo can easily be altered by changing the level of cerebral stimulation and the inhibitory strength. We therefore concentrate on how a specific double-hump waveform elicits variance in musical dynamics.
We defined parameters that represent the double-hump trajectory pattern (figure 13(A)). The heights of the left and right peaks relative to the valley are denoted by h1 and h2, respectively, and the difference between the two humps is d = h2 − h1. The human musculoskeletal model was configured to play octaves at the same tempo (33 BPM) with 11 different double-hump patterns. The parameters and waveforms of these 11 patterns are shown in figure 13(B) and at the top of figure 13(C), respectively. The 11 patterns are ordered by increasing d; note that d can be negative. Figure 13(C) depicts the key-press and key-release velocities for the various double-hump trajectories. The variation in the double-hump pattern has a substantial impact on the MIDI key-press velocity, whereas the MIDI key-release velocity is less sensitive to it. This is because finger retraction occurs faster than the self-bounce of the piano key: at the given tempo, the key-release action depends more on the mechanical properties of the piano keys than on the finger's acceleration or deceleration.
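The definitions of h1, h2 and d can be computed from one sampled double-hump cycle as follows. Taking the two largest local maxima as the humps and the minimum between them as the valley is our assumption about how figure 13(A) is measured; the function name is hypothetical.

```python
def double_hump_parameters(cycle):
    """Return (h1, h2, d) for one keystroke cycle containing two humps.

    h1 and h2 are the heights of the left and right peaks above the valley
    between them, and d = h2 - h1. The peaks are the two largest local
    maxima; the valley is the minimum between them.
    """
    peaks = [i for i in range(1, len(cycle) - 1)
             if cycle[i - 1] < cycle[i] >= cycle[i + 1]]
    if len(peaks) < 2:
        raise ValueError("cycle does not contain two humps")
    left, right = sorted(sorted(peaks, key=lambda i: cycle[i])[-2:])
    valley = min(cycle[left:right + 1])
    h1 = cycle[left] - valley
    h2 = cycle[right] - valley
    return h1, h2, h2 - h1
```

A cycle without a second hump, as for P10 and P11, raises `ValueError`, which matches their exclusion from the key-press fit.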
Figure 13(D) depicts the relationship between d and the two types of piano-playing velocity. The left subplot shows that the double-hump pattern (represented by d) has an exponential relationship with key-press velocity. Curves were fitted to the MIDI velocities of the various patterns, and the solid curves show the fitted exponential model. The fitted equations for C4 and C5 are given in equation (6), with R² values of 0.934 and 0.938, respectively. Similarly, the relationship between the double-hump trajectory and the key-release velocity is fitted in equation (7), with corresponding R² values of 0.984 and 0.989. In equations (6) and (7), ω_p and ω_r denote the velocities of the key-press and key-release events of an individual keystroke, respectively. The intended MIDI velocity therefore depends strongly on the complex double-peaked motor movements, which show exponential and proportional relationships with the key-press and key-release velocities, respectively. A plausible explanation is that the double-hump trajectory observed in humans tailors the acceleration and force imparted to the keys, allowing the pianist to exert delicate control over musical dynamics.
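The two fits can be sketched with ordinary least squares: a log-linear fit for an exponential key-press model ω_p = A·exp(k·d) and a straight line (our reading of "proportional") for the key-release model, each scored by R² and RMSE. Fitting a line to log(ω_p) is a common shortcut that generally gives slightly different coefficients from nonlinear least squares on ω_p itself; the paper's fitting procedure and its coefficients in equations (6) and (7) are not reproduced here, the function names are hypothetical, and the data in the check are synthetic.

```python
import math


def fit_linear(d, w):
    """Ordinary least-squares line w = c + m * d; returns (c, m)."""
    n = len(d)
    md, mw = sum(d) / n, sum(w) / n
    m = sum((x - md) * (y - mw) for x, y in zip(d, w)) / sum((x - md) ** 2 for x in d)
    return mw - m * md, m


def fit_exponential(d, w):
    """Fit w = A * exp(k * d) by a straight-line fit to log(w); returns (A, k)."""
    c, k = fit_linear(d, [math.log(wi) for wi in w])
    return math.exp(c), k


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error of predictions against observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

Because log(ω_p) must be defined, this shortcut requires strictly positive velocities, which holds for MIDI key-press velocities of sounding notes.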

Discussion
Understanding the coordination of human motor movements has been a long-standing challenge. Acquiring piano skills requires extensive neural plasticity and complex coordination of muscle control. We explored Bernstein's problem by investigating how humans coordinate multiple degrees of freedom in the piano-playing task. Bernstein studied the movements of pianists and observed characteristic keystroke patterns at different tempos; building on these observations, we proposed a computational neurophysiological model to analyze the emerging patterns and explain how the upper-limb movement pattern relates to expressive features in piano performance. The simulated pianist is composed of a CPG-inspired neural system, a physiologically realistic mechanical body, and a piano environment. The CPG-based neural rhythm generator uses reciprocally inhibitory oscillatory neurons to generate rhythmic stimuli for an anthropomorphic upper-limb system. Through the interaction among the biomechanical body (including the anthropomorphic upper-limb morphology and the specific density/mass distribution), the activation of the Hill-type muscle model, and the environment (including gravity and the stiffness and passivity of the piano keys), the human musculoskeletal model performs various spontaneous pattern transitions. The contraction of each muscle is modeled using human arm force-length and force-velocity relationships. Both the energy expenditure of piano playing and the MIDI-based musical quality were considered when optimizing the controller.
We replicated Bernstein's arm coordination experiment in a piano-playing scenario at various tempos with both a human concert pianist and our human musculoskeletal model. The tempo transition experiment among three typical expressive styles demonstrates that complex coordination of the upper limbs can be achieved by exploiting a spinal cord-inspired oscillatory neural network. The various tempos within expressive piano playing result from the complex coupling among the neural system, the mechanical body and the environment. Moreover, the motor movements exhibit emergent patterns as the tempo varies. It is worth highlighting that the transition of the joint pattern takes place spontaneously, as a result of adjusting only the inhibitory strength in the neural oscillatory network.
Notably, the intended MIDI velocity was shown to depend on the complex double-peaked motor movements observed at slower tempos, which display exponential and proportional relationships with the key-press and key-release velocities, respectively. Previous studies on expressive piano playing have indicated that the attack and release dynamics of the keystroke are among the performance features that best characterize pianists' individuality, which affects the timbral nuances of their performances at the microstructural level [43]. Taken together, these results suggest that the double-hump trajectory observed in humans may serve to tailor the acceleration and force imparted to the keys, allowing the pianist to exert delicate control over musical dynamics, with direct consequences for expressive piano playing.

Figure 1. The neuro-musculoskeletal coupling for expressive piano playing. (A) The concept of coordinated motor movements within the piano-playing task. The nervous system has efferent connections (via the axons of motor neurons) to the muscles through the neuromuscular junction and receives afferent feedback. The muscle fiber bundles contract, allowing the skeletal system to act on the environment (the piano-playing task). The synergy of the coupling among the controller, body and environment can achieve self-organized pattern transitions for multiple expressive styles by tuning simple control parameters that are nonspecific to the movements. (B) The ground-truth human piano player and (C) the proposed simulated pianist with coordinated multi-motor upper limbs in a virtual prototype.

Figure 2. Modeling of an anthropomorphic piano player. (A) Upper-limb motions and the corresponding joints in piano playing. (B) Schematic of the four-element Hill-type muscle model. (C) The force-length relation of the muscle model. (D) MIDI parameterization for two adjacent keystrokes.

Figure 3. Framework of the CPG-based motor control for rhythmic keystrokes, and the cyclogram. (A) Control architecture of the CPG-based keystroke coordination for a virtual piano player with anthropomorphic musculoskeletal properties. (B) Cyclogram of the piano keystroke action. One keystroke comprises seven phases: (1) positioning, (2) initial arm swing, (3) initial contact, (4) mid-arm swing, (5) loading response, (6) late arm swing and (7) bounce; the activities of the corresponding muscles are illustrated for each phase.

Figure 4. Flowchart of the CPG parameter optimization.

Figure 5. Correlation between T_gap and its control parameters: (A) T_r, (B) T_a and (C) b.

Figure 7. The optimization process: BO selection of the three parameters (A) T_r, (B) T_a and (C) b, and (D) T_gap and (E) energy cost versus optimization trial.

Figure 8. Comparison of the MIDI readings and energy expenditure before and after applying the BO framework. (A) Angular displacement of the individual joints of the upper-limb system. (B) Resultant MIDI output of the simulated player. (C) Total energy consumption of the human musculoskeletal model.

Figure 9. Excerpts of the movement of a pianist performing a series of octave keystrokes in 16th notes at 115 BPM (7.67 Hz). (A) Elbow and wrist flexion angles as a function of time (elbow amplitude: 1.50 ± 0.38°; wrist amplitude: 4.70 ± 0.78°). (B) Magnified plot of wrist, elbow and shoulder flexion angles. Joint angles were normalized to a common scale. Vertical lines mark the timing of keystrokes. (C) Relative phase between elbow and wrist flexion at the instants of maximal elbow extension. A negative value indicates a phase lag of the wrist flexion with respect to the elbow extension. (D) Histogram of the relative phase angle.

Figure 10. Excerpts of the movement of a pianist performing a series of octave keystrokes in 16th notes at 33 BPM (2.20 Hz). (A) Elbow and wrist flexion-extension angles as a function of time (elbow amplitude: 4.04 ± 0.64°; wrist amplitude: 7.28 ± 1.78°). (B) Magnified plot of wrist, elbow and shoulder flexion angles. Joint angles were normalized to a common scale. Vertical lines mark the timing of keystrokes. (C) Relative phase between elbow and wrist flexion at the instants of maximal elbow extension. A negative value indicates a phase lag of the wrist flexion with respect to the elbow extension. (D) Histogram of the relative phase angle.

Figure 11. Analysis of coordinated upper-limb movement. (A) Comparison of the elbow and wrist joint coordination of the human and the agent for two tempo styles, where the angular displacement of the human joints has been normalized. (B) Activity of the elbow (left) and wrist (right) joints during the Grave-Allegro transition, showing the spontaneity of the pattern transition under a single constant input (inhibitory strength).

Figure 12. Comparison between the keys' trajectories and the corresponding MIDI information for two double-hump patterns.

Figure 13. Various double-hump trajectories and the resultant music quality. (A) Characterization of the double-hump trajectory, where d = h2 − h1 is the difference between the right and left humps. (B) The parameters (h1 and h2) of the 11 double-peaked patterns tested. (C) The MIDI key-press and key-release velocities (ranging from 0 to 127), which represent the loudness of each piano stroke and the decay of the produced sound. The subplot depicts the waveforms of the 11 patterns and the corresponding MIDI velocities. (D) Relationship between d and the MIDI key-press (left) and key-release (right) velocities. Note that P10 and P11 have no second hump; for these two patterns, the MIDI key-press velocity is therefore treated as an outlier in the left subplot. Curve fitting was applied to capture the relationship between d and MIDI velocity. The left subplot reveals an exponential relationship for the two piano keys, with R² values of 0.934 and 0.938 (RMSE = 0.101 and 0.105). The right subplot shows a proportional relationship, with R² values of 0.984 and 0.989 (RMSE = 0.062 and 0.055, respectively).

Table 1. Movements of the upper limbs and the associated human muscles, according to the coordinate system in figure 2(A).
musical flows. For a melody at constant tempo, the tempo is measured from two consecutive keystrokes: the time between the two key-press instants is denoted T_gap, i.e. T_{2,p} − T_{1,p} in figure 2(D). For two adjacent key-press instants (T_{i+1,p} and T_{i,p}), the tempo is measured in beats per minute (BPM) as

Table 2
lists the CPG

Table 2. A comparison of CPG parameters and objective functions before and after optimization.