Learning from Demonstration: Generalization via Task Segmentation

In this paper, a motion segmentation algorithm design is presented with the goal of segmenting a learned trajectory from demonstration such that each segment is locally maximally different from its neighbors. This segmentation is then exploited to appropriately scale (dilate/squeeze and/or rotate) a nominal trajectory learned from a few demonstrations on a fixed experimental setup such that it is applicable to different experimental settings without expanding the dataset and/or retraining the robot. The algorithm is computationally efficient in the sense that it allows facile transition between different environments. Experimental results using the Baxter robotic platform showcase the ability of the algorithm to accurately transfer a feeding task.


Introduction
One of the key challenges of learning from demonstration is to generalize a learned task to different situations and environments. This generalization should preserve the key features and constraints of the task (i.e., the key motions required to accomplish it), so that the task is implemented successfully in the new environment. In [1], a programming-by-demonstration framework is presented to generically extract a trajectory having the relevant features of a given task. The authors in [1] use Gaussian Mixture Model/Regression (GMM/GMR) to model the task and generate new trajectories. The GMM/GMR approach has also been used in [2] for manipulation tasks and is robust to the movement of obstacles included in the demonstrations. A number of methods have previously attempted to generalize a learned trajectory to different environments. The GMM/GMR is modified in [3] to adapt the means and covariances of a GMM to different locations and orientations of the objects involved in a task. In [4], [5], and [6], the authors present demonstration-guided motion planning (DGMP) to generate plans that avoid obstacles in a new environment while maintaining the critical features of the task. Another work finds sub-optimal controllers that reproduce a set of constraints of a given task [7]; however, the constraints are selected from a predefined set, e.g., certain orientations for the end-effector. In [8], the authors use an invariant trajectory representation to separate essential motion information from context-specific information in the recorded demonstrations. Another work [9] uses dynamic movement primitives learned from human demonstrations, with generalization to different start and goal points for a task.
In this paper, we propose an alternative approach to generalize a learned trajectory of a given task to different environments. In the majority of the approaches discussed above, learning and executing a task requires a relatively large dataset spanning the space in which the task would be performed. Our method does not require a large dataset; instead, a small number of demonstrations is collected with a robot for a fixed setup of the environment (no change in the position/orientation of the objects involved in the task). Consequently, the amount of data needed for training is greatly reduced. Our approach takes the learned trajectory of the task for a fixed environment as an input and outputs an appropriate trajectory which successfully implements the learned task in a different environment. The approach consists of two parts: 1) segmentation of the learned trajectory with respect to the different motions that take place in each part of the trajectory, followed by 2) proper rotation and scaling of each segment based on the changes between the new environment and the environment in which the robot has been trained. The learned trajectory (input) is derived using the approach in [1]. Unlike neural-network-based methods, whose application is limited to interpolation within the space over which they have been trained, our method is capable of generalizing the task to different environments, even outside the training environment, without the need for retraining the robot. Because our approach only scales and rotates different segments of the trajectory, it has lower computational complexity than the approaches in [4]-[6]. Our method has been validated on the task of feeding cereal from a bowl to a user.

Notations and Definitions
The set of real $n$-vectors is denoted by $\mathbb{R}^n$ and the set of real $m \times n$ matrices is denoted by $\mathbb{R}^{m \times n}$. Matrices and vectors are denoted by capital and lower-case bold letters, respectively. The set of natural numbers is denoted by $\mathbb{N}$. Uppercase letters are used for constants and lowercase letters for scalar variables. The end-effector's trajectory is denoted by $X(t) = [X_{pos}(t); X_{ori}(t)] \in \mathbb{R}^{6 \times N}$, where $X_{pos}(t) = [x(t); y(t); z(t)] \in \mathbb{R}^{3 \times N}$ is a matrix containing 3 time-dependent vectors for the Cartesian coordinates (positions), $X_{ori}(t) = [\theta_x(t); \theta_y(t); \theta_z(t)] \in \mathbb{R}^{3 \times N}$ is a matrix containing 3 time-dependent vectors for the orientation of the end-effector (e.g., Euler angles), and $N$ is the length of the trajectory (number of time samples). $R_{\beta,a} = [c_1 \; c_2 \; c_3] \in \mathbb{R}^{3 \times 3}$ represents the rotation matrix around an arbitrary axis, $a$, by an angle, $\beta$, where $c_k \in \mathbb{R}^{3 \times 1}$ represents the $k$-th column vector of the rotation matrix $R_{\beta,a}$.
Definition (Landmark)
Any entity relevant to the task that influences the behavior of the robot while the task is being performed. It is assumed that the shape of every landmark does not change over the task. Landmark $i$ is denoted by $l_i$.

Definition (Nominal Setup for Landmarks)
Given a certain task, we define as nominal a known fixed setup in which the robot has been trained to perform that task. Specifically, the nominal setup, denoted by $s_{nom}$, comprises the fixed positions and/or orientations of all landmarks related to the task.

Definition (Nominal Trajectory)
A given task is modelled as a Gaussian Mixture Model (GMM) using data collected from demonstrations (via kinesthetic teaching) performing the task in $s_{nom}$. The nominal trajectory of a given task, denoted by $X_{nom}(t) = [X_{nom_{pos}}(t); X_{nom_{ori}}(t)] \in \mathbb{R}^{6 \times N}$, is the estimated mean of the GMM, containing all of the essential relevant features (key motions) of the task.

Motivation and Problem Definition
One can obtain a nominal trajectory for a given task by using the methodology described in [1]. One interesting question is how to generalize the nominal trajectory to a new setup different from the nominal setup (outside the training set) without expanding the training set and retraining the robot. While most machine learning/neural network techniques are capable of interpolation, they are not able to produce a proper trajectory for setups outside the subspace spanned by the training samples [10]. Hence, we need other solutions to the following problem:
Let $X_{nom}(t) \in \mathbb{R}^{6 \times N}$ be the nominal trajectory of a given task, $T$, performed in the nominal setup, $s_{nom}$. Given a new setup $s$ with the same landmarks, find $X_{new}(t) \in \mathbb{R}^{6 \times N}$ performing $T$ in $s$, using $X_{nom}(t)$, $s_{nom}$, and $s$, without retraining the robot.
Please note that in the problem above, no assumptions are imposed on the distances between, or the positions of, the landmarks in the new setup $s$ compared to $s_{nom}$. Let us consider a trajectory $X_{nom}(t)$ of an arbitrary given task $T$, with $n$ landmarks in the nominal setup $s_{nom}$. Now consider a new setup $s$. At first glance, the solution to the above problem is to rotate and scale each part of $X_{nom}(t)$ lying between two consecutive landmarks $l_i$ and $l_{i+1}$ according to their rotation and scaling between $s_{nom}$ and $s$. However, this method would not preserve the key motions of $X_{nom}(t)$. For example, consider a feeding task with two landmarks: a bowl full of cereal and a human user (the spoon is rigidly attached to the end-effector). Scooping cereal is a key motion of a different nature from the other key motions of this task (e.g., keeping the loaded spoon level). Using the technique in [1], we can find $X_{nom}(t)$ of this task in $s_{nom}$. Now, let us move the bowl away from the user, defining the new setup $s$.
As discussed above, if the entire trajectory between the only two existing landmarks is stretched uniformly, the robot would continue scooping even outside of the bowl, since the segment of the trajectory within the bowl is also stretched; this would result in the end-effector tipping the bowl over. Hence, uniform scaling and/or rotation of a nominal trajectory does not, in general, guarantee the successful implementation of a given task. To avoid this problem, a segmentation technique is needed to divide the nominal trajectory of a given task into different segments. This technique should distinguish the different key motions taking place in each part of the task and segment the nominal trajectory accordingly. With such segments, the nominal trajectory can be scaled and rotated intelligently (i.e., segment by segment) during execution according to the differences between the two setups $s$ and $s_{nom}$. Hence, we can state our problem definition as follows: Let $X_{nom}(t) = [X_{nom_{pos}}(t); X_{nom_{ori}}(t)] \in \mathbb{R}^{6 \times N}$ be the nominal trajectory of a certain task, $T$, in the nominal setup, $s_{nom}$.
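A. Find all the segments of $X_{nom}(t)$ corresponding to the different key motions of the task $T$.
B. Given a new setup $s$ for the landmarks, adapt the segmented nominal trajectory off-line to obtain $X_{new}(t)$ performing $T$ in $s$.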

Problem A: Segmentation of the Nominal Trajectory
In the previous section, we divided the problem into two parts. The first part segments the nominal trajectory according to the different key motions of the given task. A motion is constructed from a set of velocity commands (both linear and angular). In each segment, certain velocities, having specific directions and amplitudes, are commanded to perform the motion required for that part of the task. Thus, for the purpose of segmentation, we base our analysis on the nominal velocity trajectory $\Lambda(t) \in \mathbb{R}^{6 \times N}$. In lieu of a binary motion/no-motion representation, we embed the motion in a higher-dimensional space by representing $\Lambda(t)$ as a quantized level-valued signal. Next, we discuss quantization and segmentation.

Quantization
The nominal velocity trajectory $\Lambda(t) \in \mathbb{R}^{6 \times N}$ is quantized as follows: we uniformly quantize the linear and angular velocity components into $q_v$ and $q_\omega$ levels, respectively. The spacing between two consecutive levels is denoted by $\delta_v$ and $\delta_\omega$ for linear and angular velocity, respectively. Thus, linear velocities in the interval $(-\delta_v, +\delta_v)$ and angular velocities in $(-\delta_\omega, +\delta_\omega)$ are mapped to level 0, denoting no motion. The choice of quantization steps can be formulated as a detection problem in which $\delta_v$ and $\delta_\omega$ are calculated for a chosen small probability of false alarm. For more details on this subject, readers are referred to the literature on detection theory [11]. The numbers of levels, $q_v$ and $q_\omega$, are then derived accordingly. The quantization levels of the linear velocities are labeled with $d_{v_x}$, $d_{v_y}$, and $d_{v_z}$, corresponding to $v_x$, $v_y$, and $v_z$, respectively (e.g., $d_{v_x}(i) = 2$ means that $v_x$ at time stamp $i$ falls into the second level of our quantization). Similarly, we use $d_{\omega_x}$, $d_{\omega_y}$, and $d_{\omega_z}$ for the quantization level labels of the angular velocities. We thus have a quantized nominal velocity trajectory of the end-effector, denoted by $D_{nom} = [d_{nom}(i)]_{6 \times N}$ for $i = 1, 2, \ldots, N$, where $d_{nom}(i) = [d_{v_x}(i); d_{v_y}(i); d_{v_z}(i); d_{\omega_x}(i); d_{\omega_y}(i); d_{\omega_z}(i)] \in \mathbb{R}^6$ is the quantized nominal velocity vector of the end-effector at time stamp $i$. It is important to mention that the quantization levels take into account both the magnitude and the sign of the velocities. This is required because two motions of the same magnitude in opposite directions should be distinguished from each other.
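The quantization step described above can be illustrated with a minimal sketch. The signed level indexing (negative labels for negative velocities) and the function names `quantize` and `quantize_sample` are assumptions made for illustration; the text only states that levels encode both sign and magnitude and that values inside $(-\delta, +\delta)$ map to level 0.

```python
from typing import List


def quantize(value: float, step: float) -> int:
    """Map a velocity to a signed level; |value| < step gives level 0 (no motion).

    Signed integer labels are an assumption of this sketch.
    """
    if abs(value) < step:
        return 0
    level = int(abs(value) // step)  # 1 for [step, 2*step), 2 for [2*step, 3*step), ...
    return level if value > 0 else -level


def quantize_sample(velocity: List[float], delta_v: float, delta_w: float) -> List[int]:
    """Quantize one 6-vector [vx, vy, vz, wx, wy, wz].

    The first three entries use the linear step delta_v,
    the last three use the angular step delta_w.
    """
    linear = [quantize(v, delta_v) for v in velocity[:3]]
    angular = [quantize(w, delta_w) for w in velocity[3:]]
    return linear + angular


# One time stamp of a hypothetical velocity trajectory.
sample = [0.25, -0.05, -0.31, 0.0, 0.6, -1.2]
labels = quantize_sample(sample, delta_v=0.1, delta_w=0.5)
print(labels)
```

Applying `quantize_sample` to every time stamp yields the columns of the quantized trajectory. Opposite motions of equal magnitude receive labels of opposite sign, which is the property the text requires.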

Segmentation Algorithm
$D_{nom}$ is segmented as follows: we first uniformly divide $D_{nom}$ into $r$ mutually exclusive, equal-length units in time called search-segments, $D_{nom_i} \in \mathbb{R}^{6 \times N/r}$ for $i = 1, 2, \ldots, r$. The number $r$ is a parameter, chosen such that it is at least the size of the smallest acceptable task segment but significantly smaller than the data size $N$; one good choice would be $r \approx N/10$. The center of each search-segment is defined as the average of its members, i.e., if search-segment $D_{nom_i}$ starts at time stamp $f_i$ and ends at time stamp $f_i + N/r$, its center $\bar{d}_{nom_i}$ is the average of the quantized velocity vectors $d_{nom}(t)$ over that interval.
As previously stated, we aim to merge these search-segments to form the larger segments representing the different key motions taking place in different parts of the nominal trajectory. To do so, a measure of similarity between two consecutive search-segments must be defined to decide whether they should be merged. This measure is defined as the absolute value of the difference between the centers of search-segments $i$ and $i + 1$: $\eta(i) = |\bar{d}_{nom_{i+1}} - \bar{d}_{nom_i}|$, $\forall i = 1, 2, \ldots, r - 1$. To reduce the effect of noise, for any two consecutive search-segments $i - 1$ and $i$, the mean of the defined measure at time stamp $i - 1$, i.e., $\bar{\eta}(i - 1)$, is compared with $\eta(i)$. If $\bar{\eta}(i - 1) \le \eta(i)$, the velocities in search-segment $i$ are more similar to the mean of the members of the very last segment of this step than to the velocities in search-segment $i + 1$. Thus, search-segment $i$ merges with the very last segment of this step; otherwise, the very last segment of this step is closed as a complete segment and the search for a new segment is initialized with search-segment $i$ as its first member. The complete algorithm returns $seg_{final}$ as its output.
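The merge rule described above can be sketched as follows. This is an illustrative reading of the prose, not the authors' pseudo code: the use of a summed absolute difference to obtain a scalar measure, the ceiling-based search-segment length, the rule that the final search-segment always merges, and all function names are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Vector = List[float]


def center(samples: List[Vector]) -> Vector:
    """Component-wise average of a list of quantized velocity vectors."""
    count = len(samples)
    return [sum(s[k] for s in samples) / count for k in range(len(samples[0]))]


def dissimilarity(a: Vector, b: Vector) -> float:
    """Sum of absolute component differences (this scalarization is an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def merge_search_segments(d_nom: List[Vector], r: int) -> List[Tuple[int, int]]:
    """Return the merged segments as half-open (start, end) sample-index pairs."""
    length = math.ceil(len(d_nom) / r)  # samples per search-segment
    chunks = [d_nom[k:k + length] for k in range(0, len(d_nom), length)]
    centers = [center(c) for c in chunks]

    segments = []
    start = 0                  # first sample of the segment being grown
    current = list(chunks[0])  # members of the segment being grown
    for i in range(1, len(chunks)):
        # Distance of search-segment i to the mean of the open segment.
        eta_bar = dissimilarity(centers[i], center(current))
        # Distance of search-segment i to its successor; the last
        # search-segment has no successor, so it always merges.
        eta = (dissimilarity(centers[i + 1], centers[i])
               if i + 1 < len(chunks) else math.inf)
        if eta_bar <= eta:
            current.extend(chunks[i])
        else:
            end = start + len(current)
            segments.append((start, end))
            start, current = end, list(chunks[i])
    segments.append((start, start + len(current)))
    return segments


# Toy input: 20 samples moving in +x followed by 20 samples moving in -x.
plus = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
minus = [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(merge_search_segments([plus] * 20 + [minus] * 20, r=4))
```

On this toy input the four search-segments collapse into two segments that split exactly where the direction of motion reverses. Because the length is rounded up, the number of chunks actually produced can be smaller than `r` when `N` is not a multiple of `r`.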

Problem B: Off-line Adaptation of The Nominal Trajectory to a New Setup for Landmarks
Now, with the segmented nominal trajectory available, we can properly scale and rotate each segment based on the differences between $s$ and $s_{nom}$. Since the nominal trajectory needs to be rotated, we use the rotation matrix representation for $X_{nom_{ori}}(t)$ rather than the Euler angle representation, i.e., at each time stamp $t$ we have $\tilde{X}_{nom_{ori}}(t) = [c_1(t) \; c_2(t) \; c_3(t)] \in \mathbb{R}^{3 \times 3}$ denoting the rotation matrix for the orientation of the end-effector. Now, suppose a segment $k$ starts at time index $j$ and finishes at time index $j + N/r$. The new segment in $X_{new}(t)$ is derived from the corresponding segment in $\tilde{X}_{nom}$ as follows:
$$\tilde{X}_{new_{ori}}(t) = R_{\theta_k, a_k} \, \tilde{X}_{nom_{ori}}(t), \quad t = j, j + 1, j + 2, \ldots, j + N/r - 1,$$
where $A_k = \mathrm{diag}(\alpha_{x,k}, \alpha_{y,k}, \alpha_{z,k})$ is the scale factor matrix calculated using the corresponding scale factors in each Cartesian component and $t$ is the time index. The initial condition for this equation, i.e., $X_{new}(j - 1)$, is equal to the last point of the previous segment $k - 1$, which has already been rotated and scaled using (1). Note that since the shapes of the landmarks are assumed to be constant and both positions and orientations are tied to time stamps, it is sufficient to scale only the positions. By proper rotation and scaling of all segments, the characteristics of the motions in the different segments (the key motions taking place in each segment) remain intact. Finally, by projecting $\tilde{X}_{new_{ori}}(t)$ back to the Euler angle representation, i.e., $\tilde{X}_{new_{ori}}(t) \rightarrow X_{new_{ori}}(t)$, the appropriate trajectory of the task for setup $s$ is obtained as $X_{new}(t) = [X_{new_{pos}}(t); X_{new_{ori}}(t)] \in \mathbb{R}^{6 \times N}$.
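The per-segment adaptation can be sketched as below. The orientation update follows the rotation equation given in the text. The position update is not available in this text, so the incremental form used here (each nominal displacement is scaled by $A_k$, rotated by $R_{\theta_k, a_k}$, and accumulated from the last adapted point of the previous segment) is an assumption, as are the order of scaling and rotation and the names `rotation_matrix` and `adapt_segment`.

```python
import math
from typing import List, Tuple

Vec3 = List[float]
Mat3 = List[List[float]]


def rotation_matrix(angle: float, axis: Vec3) -> Mat3:
    """Rotation by `angle` (radians) about `axis`, via Rodrigues' formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def adapt_segment(nom_pos: List[Vec3], nom_ori: List[Mat3],
                  prev_nom: Vec3, prev_new: Vec3,
                  angle: float, axis: Vec3, scale: Vec3
                  ) -> Tuple[List[Vec3], List[Mat3]]:
    """Adapt one segment of the nominal trajectory.

    prev_nom / prev_new are the nominal and adapted positions at index j - 1.
    The incremental position rule is an assumption of this sketch.
    """
    rot = rotation_matrix(angle, axis)
    new_pos, new_ori = [], []
    for p, o in zip(nom_pos, nom_ori):
        # Scale the nominal displacement (diagonal A_k), then rotate it.
        step = [scale[k] * (p[k] - prev_nom[k]) for k in range(3)]
        moved = mat_vec(rot, step)
        prev_new = [prev_new[k] + moved[k] for k in range(3)]
        prev_nom = p
        new_pos.append(prev_new)
        # Orientation: left-multiply the nominal rotation matrix.
        new_ori.append(mat_mul(rot, o))
    return new_pos, new_ori


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pos, ori = adapt_segment(
    nom_pos=[[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]], nom_ori=[identity, identity],
    prev_nom=[0.0, 0.0, 0.0], prev_new=[0.0, 0.0, 0.0],
    angle=math.pi / 2, axis=[0.0, 0.0, 1.0], scale=[2.0, 1.0, 1.0])
print(pos)
```

In this toy call, a straight nominal motion along $x$ is doubled in length and turned by 90 degrees about $z$, so the adapted points lie along $y$ (up to floating-point error). Chaining calls segment by segment, with `prev_new` taken from the end of the previous adapted segment, keeps the adapted trajectory continuous.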

Experimental Results
To demonstrate the validity of our method, we have chosen the task of feeding cereal to a human user. There are two landmarks involved in the task: landmark $l_1$ is a bowl full of cereal and $l_2$ is the mouth of a user, which is considered to be a point in space. The nominal setup is $s_{nom} = \{x_{nom_b} = 0.806,$ ... the base frame of the robot. Figure 1(a) shows the nominal setup for the task. We use a 7-DOF Baxter robot to conduct our experiment. An arbitrary coordinate frame is placed at the center of the bowl, and the positions and orientations of the spoon are recorded with respect to this coordinate frame. Eleven demonstrations of the task are recorded by a human teacher performing the task with the robot in the fixed nominal setup $s_{nom}$ using kinesthetic teaching. Using [1], the nominal trajectory is derived from the demonstrations. Next, we run our segmentation algorithm on $X_{nom}(t)$, consisting of 870 samples. We test with two different numbers of initial search-segments, $r_1 = 28$ and $r_2 = 68$, which set the length of each search-segment to $\lceil 870/28 \rceil = 32$ and $\lceil 870/68 \rceil = 13$ samples, respectively. The algorithm results in 6 segments for $r_1 = 28$ and 7 segments for $r_2 = 68$, as shown in Figure 1. As illustrated in Figure 1(b), the algorithm successfully captures a total of 6 segments: 3 segments for the bowl, namely approaching the bowl (blue segment), scooping (red segment), and leaving the bowl (green segment); 1 segment for the translation (yellow) of the spoon from the bowl to the mouth of the user; another segment for the reversed translation (black); and finally the last segment (light blue) for resetting the spoon in preparation for the next feeding cycle.
For the case where $r_2 = 68$, we get 7 segments; however, the additional segment shown in Figure 1(c) is a very small blue segment at the beginning of the trajectory, likely because increasing the number of search-segments reduces the window size and may introduce artifacts. Nonetheless, the fact that $r_1 = 28$ and $r_2 = 68$ yield almost identical segmentations shows that the algorithm is robust to the initial number of search-segments. Next, we discuss the results of the experiments on 3 different setups, $s_1$, $s_2$, and $s_3$. Since the size of the bowl has not changed, as assumed in the definition of landmarks, only the parts of the nominal trajectory corresponding to the first and the second translation phases (as described above) are scaled by their corresponding scale factor matrices $A_1$ and $A_2$, respectively. Moreover, the whole trajectory is rotated using $R_{\theta,a}$ (see equation (1)). We successfully tested our algorithm on these 3 setups. The new trajectories for $s_1$, $s_2$, and $s_3$ are shown in Figure 2. For better comparison, all the trajectories are plotted in their new bowl frames. As can be seen, the segments corresponding to the bowl have remained intact while the other segments have been scaled properly to implement the task successfully in each setup.

Conclusions
In this paper, we presented a method (a segmentation algorithm along with an off-line adaptation law) to generalize a learned nominal trajectory of a given task to different setups without expanding the training set or retraining the robot. The method was successfully validated on 3 different setups for a feeding task.