• Keine Ergebnisse gefunden

Adaptive synthesis of dynamically feasible full-body movements for the humanoid robot HRP-2 by flexible combination of learned dynamic movement primitives

N/A
N/A
Protected

Academic year: 2022

Aktie "Adaptive synthesis of dynamically feasible full-body movements for the humanoid robot HRP-2 by flexible combination of learned dynamic movement primitives"

Copied!
19
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

HAL Id: hal-01484935

https://hal.archives-ouvertes.fr/hal-01484935

Submitted on 8 Mar 2017

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Adaptive synthesis of dynamically feasible full-body movements for the humanoid robot HRP-2 by flexible

combination of learned dynamic movement primitives

Albert Mukovskiy, Christian Vassallo, Maximilien Naveau, Olivier Stasse, Philippe Souères, Martin Giese

To cite this version:

Albert Mukovskiy, Christian Vassallo, Maximilien Naveau, Olivier Stasse, Philippe Souères, et al..

Adaptive synthesis of dynamically feasible full-body movements for the humanoid robot HRP-2 by

flexible combination of learned dynamic movement primitives. Robotics and Autonomous Systems,

Elsevier, 2017, 91, pp.270-283. <10.1016/j.robot.2017.01.010>. <hal-01484935>

(2)

Adaptive synthesis of dynamically feasible full-body movements for the humanoid robot HRP-2 by flexible combination of learned dynamic movement

primitives

Albert Mukovskiya, Christian Vassallob, Maximilien Naveaub, Olivier Stasseb, Philippe Sou`eresb, Martin A. Giesea

aSection for Computational Sensomotorics, Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research&Centre for Integrative Neuroscience, University Clinic, T¨ubingen, Germany

bGepetto Lab, LAAS/CNRS, Universit´e de Toulouse, Av. du Colonel Roche 7, F-31400, Toulouse, France

Abstract

Skilled human full-body movements are often planned in a highly predictive manner. For example, during walking while reaching towards a goal object results in steps and body postures are adapted to the goal position already multiple steps before the goal contact. The realization of such highly predictive behaviors for humanoid robots is a challenge because standard approaches, such as optimal control, result in computation times that are prohibitive for the predictive control of complex coordinated full-body movements over multiple steps. We devised a new architecture that combines the online-planning of complex coordinated full-body movements, based on the flexible combination of learned dynamic movement primitives, with a Walking Pattern Generator (WPG), based on Model Predictive Control (MPC), which generates dynamically feasible locomotion of the humanoid robot HRP-2. A dynamic filter corrects the Zero Moment Point (ZMP) trajectories in order to guarantee the dynamic feasibility of the executed behavior taking into account the upper-body movements, at the same time ensuring an accurate approximation of the planned motion trajectories. We demonstrate the high flexibility of the chosen movement planning approach, and the accuracy and feasibility of the generated motion. In addition, we show that a na¨ıve approach, which generates adaptive motion by using machine learning methods by the interpolation between feasible training motion examples fails to guarantee the stability and dynamic feasibility of the generated behaviors.

Keywords:

robotics, navigation, walking pattern generator, goal-directed movements, movement primitives, motor coordination, action sequences

1. Introduction

The modeling and the synthesis of the online-reactive multi-action sequences is an extremely important topic in both, computer graphics and humanoid robotics. The most challenging problem in the online control of com- plex whole-body behaviors, which is solved apparently effortless by humans, is the flexible coordination of goal-directed movements with the maintenance of dy- namic balance during locomotion. The solution of this

Email addresses:Corresponding author:

albert.mukovskiy@medizin.uni-tuebingen.de(Albert Mukovskiy),christian.vassallo@laas.fr(Christian Vassallo), maximilien.naveau@laas.fr(Maximilien Naveau),

ostasse@laas.fr(Olivier Stasse),philippe.soueres@laas.fr (Philippe Sou`eres),martin.giese@uni-tuebingen.de(Martin A. Giese)

problem requires simultaneously the flexible adaptation of executed upper-body behaviors, e.g. to changing po- sitions of goal objects or obstacles, combined with a control of dynamic balance during walking, in order to avoid that the robot falls. In addition, the robot’s joint torques have to be kept in a feasible range. The detailed analysis of human behavior shows that their motor con- trol is highly predictive, and often optimizes complex behaviors over long time horizons, e.g. lasting over multiple steps. This is fundamentally different from many solutions of this problem in humanoid robotics [Siciliano and Khatib (2016)].

The realization of such complex behaviors for hu- manoid robots with long time horizons for predictive control is a challenging problem. A standard method for the computation of control signals for such high-

(3)

dimensional robots is optimal control. However, the solution of optimal control problems over such long temporal horizons is computationally extremely costly.

With state-of-the-art methods [Koschorreck and Mom- baur (2012)] even the optimization of a single step for a humanoid robot can take minutes, and multi-step se- quences would take even multiple hours. This ren- ders such methods inappropriate for the real-time plan- ning and control of such complex full-body motion se- quences in humanoid robots.

In this paper, we present an approach for the solu- tion of this problem that combines two components.

The first component is an online-capable planning algo- rithm that is based on learned dynamic movement prim- itives, which generates human-like full-body motion se- quences that flexibly adapt to changes in the action space, e.g. displacements of the goal object. The sec- ond component is a nonlinear Model Predictive Control (MPC) system [Naveau et al. (2016, in press)] for the humanoid robot that combines the outputs of a Walking Pattern Generator (WPG) with the panned upper-body motion in a way that guarantees the the dynamic feasi- bility of the resulting full-body motion. An essential el- ement of this architecture is adynamic filterthat appro- priately modifies the planned Center of Mass (CoM) and Zero Moment Point (ZMP) trajectories in dependence of the planned upper-body motion. We demonstrate the novel approach by the control of multi-step sequences that realize highly adaptive reaching and walking to- wards goal objects at different distances, where the sys- tem implements human-like highly predictive control over multiple-steps. The resulting computational com- plexity of this control system is not much higher than the one of the state of the art WPG algorithm [Naveau et al. (2014)].

Our article is organized as follows: In the following section, we give an overview of related work in the ar- eas of computer graphics and humanoid robotics. Sec- tion 3 describes the developed system. This includes a short description of the underlying human trajectory data from a drawer-opening task, of the online planning algorithm that is based on a special form of dynamic movement primitives, and a more detailed description of the integration of this planning algorithm with the MPC-based control architecture of the robot. In Sec- tion 4 we present a variety of results obtained with the OpenHRP robot simulator, which evaluate the proposed approach also in comparison with simpler solutions, and a preliminary test that implemented the planned trajec- tories on the HRP-2 humanoid robot. Limitations and extensions of the developed approach are discussed in the final section.

2. Related Work

Work related to the developed system can be found in biological motor control and related robotics approaches, computer animation, and in humanoid robotics.

2.1. Control of multi-step sequences in biological sys- tems

Biological systems effortlessly coordinate locomo- tion with other goal-directed tasks [Weigelt and Schack (2010)]. A relevant example are studies on the coordi- nation of walking and reaching. The kinematics of this behavior can be approximated by two separate under- lying movement components, which mainly model the periodic locomotion and the non-periodic goal-directed movement [Chiovetto and Giese (2013)]. A recent study [Land et al. (2013)] investigated in detail the underly- ing coordination, using a task where participants had to walk towards a drawer and to reach for an object.

Participants showed highly predictive control in their motor behavior, where within multi-step sequences al- ready the first step was adapted dependent on the posi- tion of the goal object. In addition, participants adjusted their behavior in a way that ensured comfortable reach- ing during in the last step. This behavior is compatible with themaximum end-state comforthypothesis that has been formulated in human motor control [Rosenbaum (2008); Weigelt and Schack (2010)]. In recent work we have tried to reproduce this behavior by an algorithm for trajectory synthesis that is based on learned dynamic movement primitives [Mukovskiy et al. (2015)]. A sim- ilar problem has also been solved by [Gienger et al.

(2010)], who computed optimized stance locations with respect to the position of a reaching target, using a dy- namical systems approach for the generation of reach- ing behavior.

An influential idea in the field of biological mo- tor control has been the concept of movement primi- tives [Flash and Sejnowski (2001); Flash and Hochner (2005)]. According to this hypothesis the coordina- tion of complex movements is based on the combina- tion of lower-dimensional control units, strongly reduc- ing the dimensionality of the underlying control prob- lem. Such primitives have been extracted by unsuper- vised learning from kinematic and EMG data. This idea has been transferred to robotics. [Ta¨ıx et al. (2013)]

extracted primitives from human reaching movements using principle component analysis (PCA), success- fully implementing reaching behavior on an HRP-2 hu- manoid robot. Movement primitives, including the use of force feedback, have also been proposed by [Gams

(4)

et al. (2009, 2013)]. A related important idea is the concept of dynamic movement primitives that generates planned trajectories by appropriately designed nonlin- ear dynamical systems [Schaal et al. (2003); Ijspeert et al. (2003)]. Systems based on dynamic movement primitives have been proposed for the generation of complex movements in real-time [Ijspeert et al. (2013), Ajallooeian et al. (2013)]. But all these online DMP- based methods of modeling the kinematic trajectories do not guarantee the dynamic feasibility of the resulting motion, which is a critical issue.

2.2. Modeling of whole-body movement sequences in computer graphics

The problems of kinematic synthesis of complex whole body movements has been addressed extensively in computer graphics, e.g. [Levine et al. (2012)], and many learning-based approaches have been pro- posed that provide low-dimensional parameterizations of classes of whole body motion [e.g. Hsu et al. (2005);

Safonova et al. (2004); Wang et al. (2008); Li et al.

(2002)]. The generated individual movements can be automatically concatenated into longer sequences, tak- ing into account additional task constraints [Kovar et al.

(2002)]. A relevant example is [Huang and Kallmann (2014)] who modeled the coordination between loco- motion and arm pointing in the final step, by blending and selecting arm pointing primitives dependent on the actual gait phase. All these methods model the move- ment kinematics without taking dynamic constraints into account. A recent example is [Feng et al. (2012)]

who blended motion-captured example motion priori- tized ’stack of controllers’. [Shoulson et al. (2014)]

presented a method where controllers for different body parts are blended, where their prioritization is changed sequentially over time, dependent on the actual action within a longer sequence.

Other work in this domain has developed dynamic filtering techniques in order to adjust such synthesized motion to fulfill dynamic constraints derived from phys- ical models, e.g. for the Zero Moment Point (ZMP), in order to increase the generalization regime of such learning-based methods [Liu et al. (2005)].

2.3. Related approaches in humanoid robotics

In humanoid robotics numerous approaches have been proposed for the synthesis of walking in combi- nation with grasping movements. An example is the DARPA robotic challenge valve manipulation task. For this problem, [Ajoudani et al. (2014)] proposed a hy- brid controller with a goal-driven fast foot step planner

that is combined with visual servoing for the reaching and grasping of the valve. [Kuindersma et al. (2015)]

proposed a control architecture for the humanoid robot Atlas that automatically finds foot steps around and over obstacles, in order to reach for a goal object and to real- ize more complex actions. Other solutions for the com- bination of walking and vision-controlled reaching of a static and mobile targets during walking have been proposed in [Stasse et al. (2008)] and [Brandao et al.

(2013)].

Some researchers have used randomized motion planning algorithms for whole-body walking combined with manipulation tasks in constrained environments [Dalibard et al. (2013)]. For example, [Kanoun et al.

(2011)] proposed a method that is based on a virtual kinematic tree for the planning of foot placements, which was successfully implemented on the HRP-2 robot. A framework that decomposes reach-to-grasp hu- man movements into sequences of kinematic tasks has been developed in [Sreenivasa et al. (2012)]. Further work applied imitation learning [M¨uhlig et al. (2010)], where walking and grasping were modeled as a se- quence of separate actions. A task priority approach based on a generalized inverse kinematics was applied in [Yoshida et al. (2007)] in order to organize several sub-tasks, including stepping and hand motion.

The control of human-like multi-joint systems taking into account contact constraints and guaranteeing dy- namic balance is a challenging approach. Current solu- tions range from near real-time whole body Model Pre- dictive Control with regularized modeling of contacts in order to decrease the associated computational cost [Tassa et al. (2012); Koenemann et al. (2015)] to ap- proaches based on optimal control with precise model- ing of contact phases, requiring typically hours of off- line computation time (e.g. [Koschorreck and Mombaur (2012)]).

A solution based on prioritized IK, that integrates DMPs with MPC for individual actions has been pro- posed by [Vuga et al. (2013)].

3. System architecture

In the following we first give a brief overview of the human data that was used for the training of the prim- itives of our online planning algorithm, and which also provides evidence of the highly predictive coordination of complex human goal-directed movements. Subse- quently, we describe briefly our movement primitive- based online motion planning algorithm and discuss how this planning algorithm can be integrated with the

(5)

Normal step (repeated)

Short step with reaching

Drawer opening and grasping

Action 1 Action 2 Action 3

Parameters optimized according to “end-state comfort hypothesis Step length inferred from

residual target distance

Figure 1:Illustration of the different phases of the human behavior, illustrated in terms of intermediate postures: 1) normal step; 2) step with initiation of reaching; and 3) standing with opening of the drawer and reaching for the object.

model predictive control architecture for the humanoid robot HRP-2.

3.1. Human data

3.1.1. Drawer opening task

The modeling of the coordination of walking and reaching was based on a motion capture data set from humans who opened a drawer. The participants walked towards a drawer, opened it with their left hand and reached for an object inside the drawer with right hand.

The initial distance from the drawer and the position of the object inside it were varied [Mukovskiy et al.

(2015)]; see Fig. 1. The recorded sequences consist of three subsequent actions: 1) a normal walking step (starting with left heel strike and ending with left heel strike); 2) a shortened step with the left-hand reaching towards the drawer. This step showed a high degree of adaptability, and its length was typically adjusted in or- der to create an optimum distance from the drawer for the final reaching movement. This behavior is consis- tent with themaximum end-state comfort hypothesisin motor control, which assumes that motor planning op- timizes the comfort of the end state of planned move- ments [Land et al. (2013)]; 3) the drawer opening com- bined with the reaching for the object while standing.

An example trial is shown in [movie1].

The data set consists of the trajectories of ten trials of single participant, recorded in Univ. Bielefeld with optical motion capture system (Vicon Motion Systems, Oxford, UK) consisting of 12 MX-F20 CCD cameras at a frame rate of 200 fps with a spatial accuracy of about 1.5 mm. PluginGait marker set was used with 41 mark- ers. The length of the individual steps (actions) for the

1https://goo.gl/5HKiG7

individual motion-captured sequences is shown in Fig.

2. This figure very nicely illustrates the predictive na- ture of the motor planning. The size of the second step (yellow) is strongly adapted dependent on the distance of the starting point from the goal position. The lengths of the other steps is much less variable and shows sys- tematic dependence on this distance. (See [Mukovskiy et al. (2015)] for further details about this data set.)

-2.5 -2 -1.5 -1 -0.5 0 0.5

5 9 8 2 7 6 3 10 1 4

1st action 2nd action Dist. to drawer Dist. to object

Distances in meters

Trajectories

Figure 2: Predictive planning of real human trajectories. Dis- tances from the pelvis to the front panel of the drawer (green, yellow, red), and the distance between the front panel and the object (blue) for different trials. (Reproduced from [Mukovskiy et al. (2015)].)

3.2. Preprocessing of training trajectories

The recorded motion capture data was processed and animated in MotionBuilder (Autodesk), using an ’actor’

puppet whose geometric parameters were adapted to the recorded subject. The trajectories were cut, starting at the first heel strike and ending with the object reach- ing. A kinematic model of the HRP-2 robot was created in Maya (Autodesk), neglecting joint angle constraints.

All ten trajectories were retargeted to the HRP-2 model using MotionBuilder. During retargeting the feet posi- tions of HRP-2 were constrained to level ground, and the step sizes were reduced proportionally to the height

(6)

Retargeted human trajectory

Segmentation, IK reconstruction of robot angles

Angles and angular velocities of upper-body Base position, velocity, angle, angular velocity

Yaw angle compensation, arm angles corrected for self-collision

Computation of base velocity and yaw angular velocity

Training data

Figure 3:Offline pre-processing of motion capture data from humans.

of the robot. The resulting joint frame trajectories were exported, using the Denavit-Hartenberg (DH) conven- tion. Trajectories were then segmented by hand, and the durations each action and the corresponding step sizes were stored separately.

The computed trajectories were further analyzed in Matlab (MathWorks) and resampled, resulting in a nor- malized duration of 1.6 sec for each action. The data was split into two subsets, separating the stored pelvis trajectories (time course of pelvis position and pelvis direction in the horizontal plane), and the upper body trajectories (HRP-2 joint angles extracted from DH representation). The pelvis position trajectories were rescaled, ensuring the maximally admissible propaga- tion velocity for the HRP-2 (0.5 m/sec). The pelvis yaw- angle trajectories were rescaled by a constant factor, and a fraction of the yaw angle trajectory was added back to the trunk yaw-angle for compensation. After this com- pensation, customized inverse kinematics (IK) methods were applied to correct the upper body arm reaching motion in order to satisfy joint limit constraints. As in- put to the Walking Pattern Generator (WPG) (see be- low) we used the time courses of pelvis velocities in the horizontal plane, and of the pelvis yaw angular velocity.

An overview of the pre-processing steps is given in Fig.

3.

An illustration of this preprocessing is given in [movie2], which shows the angular trajectories, animat- ing a human avatar, and the corresponding retargeted trajectories for a HRP-2 kinematic model in Motion- Builder.

3.3. Primitive-based online motion planner

The first core component of our architecture is an on- line motion planning algorithm that is based dynamic movement primitives, which are derived from classes of human trajectories by unsupervised learning. Because of space limitations we can here only briefly sketch the

2https://goo.gl/ucbVA2

structure of the online planning algorithm. Further de- tails can be found in [Mukovskiy et al. (2015)].

3.3.1. Learning of kinematic primitives

The first step in the construction of dynamic move- ment primitives is the learning of a low-dimensional representation of a set of motion-captured trajectories that spans the space of relevant behaviors, by a super- position of a small number of source or basis functions.

By this form of dimension reduction the relevant behav- iors can be generated by a very small number of cou- pled dynamic movement primitives (s.b.). Contrasting with many related approaches for the modeling of tra- jectories, which exploit for example PCA or ICA (e.g.

[Safonova et al. (2004)]), we fit such trajectory sets by a generative model that is known in acoustics asanechoic mixing model. Opposed to the instantaneous mixing model that underlies PCA and ICA models for trajec- tory representation, the anechoic model allows for time shifts of the superposed components. We have shown elsewhere [Omlor and Giese (2011)] that the anechoic mixing model for many types of movements result in representations with a much smaller number of source functions (typically by factor two), for equal approx- imation quality in comparison with standard methods such as PCA or ICA. This low dimensionality is essen- tial since it determines the dimensionality of the state space of the nonlinear dynamical system that generates behaviors adaptively online.

Mathematically, the anechoic mixture model is defined by the equation:

ξi(t)

|{z}

angles

=mi+X

j

wi jσj

t−τi j

| {z }

sources

The joint angle trajectoriesξi(t) were derived from the original motion capture data that is temporally seg- mented into the three subsequent actions (s.a.). The nor- malized action trajectories are approximated by a linear mixture of the source signalsσj(t), weighted with the mixing weightswi j. The individual sources are shifted

(7)

1 2 3 4 5 6 7 0.7

0.75 0.8 0.85 0.9 0.95 1

Number of source signals

Approximation quality, Q

Q = 1

Anechoic PCA Fixed delays (action 1 only) Fixed delays (for actions 2 and 3 with extra sources

Figure 4: Approximation quality as function of the number of sources for all three actions, comparing anechoic demix- ing without constraints (blue) and PCA (green). Thepurple dottedline indicates approximation quality for the first action with fixed delays across trials. Thered dashedline indicates approximation quality with two additional sources (with fixed delays).

in time with the delays τi j. The means of the angle trajectories are indicated by the variables mi. Source functions and model parameters were learned, apply- ing anechoic demixing algorithms described in [Om- lor and Giese (2011); Chiovetto et al. (2013)]. For the application presented in this paper, in addition to the learned source functions, which approximate the peri- odic signals components, we used an additional non- periodic source component, which was pre-specified.

This component was given by the fixed functions0(t)= cos(πt/T), whereT was the cycle time of the learned periodic source function with the lowest fundamental frequency.

In order to model the multi-step sequences we learned such a representation using a step-wise regression ap- proach. The whole training data was first used to fit the mean valuesmi and the weights corresponding to the non-periodic source function. The residuals of ac- tion 1, the normal walking step, were then approximated by three periodic source functions, applying a modified algorithm that constrains all time delays for the same source function to be equal across all trials (but allow- ing different delays for different joint angles). This con- straint simplifies the spatio-temporal blending between different motion patterns, at the cost that more sources have to be introduced for an accurate approximation (cf.

Fig. 4). The second and third action then were approxi- mated using the sources introduced for the normal walk- ing step, and two additional periodic sources that were added in order to account for the residuals, where the same constraint was applied to the estimated time de- lays. Fig. 4 shows the obtained approximation quality in

time time Additional sources: actions 2+3

nonperiodic source

-1.5 -1 -0.5

0 0.5

1 1.5 -2 -1 0 1

2 Sources: 1st action

Figure 5:Estimated source functions.

comparison with a PCA model, and with models with- out constraints for the time delays. It is possible to ac- complish a quite accurate approximation with a total of four sources for the first action and two additional pe- riodic sources for actions 2 and 3. And the resulting shapes of the learned source functions for the approxi- mation scheme of 4+3 sources are shown in Fig. 5.

3.3.2. Online planning of multi-action sequences Our online planning algorithm for whole-body move- ments generates the trajectories as solutions of nonlin- ear dynamical systems that is based on dynamic move- ment primitives, which are derived from the kinematic primitives described in section 3.3.1. Dynamic move- ment primitives have been proposed in robotics before [Schaal et al. (2003); Ijspeert et al. (2003)], and similar approaches have been described in [Gams et al. (2009);

Petriˇc et al. (2011); Buchli et al. (2006)]. These pre- vious approaches, however, exploit no dimension re- duction for the learning of the kinematic primitives.

We have previously demonstrated the suitability of our approach for the adaptive online generation of com- plex multi-step sequences, coordinated with arm move- ments, and for the animation of coordinated crowds of agents [Mukovskiy et al. (2015); Giese et al. (2009)].

We constructed dynamic movement primitives (DMPs) from the kinematic primitives described in the previous subsection. For this purpose, we mapped the

(8)

Canonical DS Periodic sources

time

Mixing model

Joint angles

3D positions x(t)

SVR

s (t)0

time

Non-periodic source Kinematic model

xi(t) = m + i

S

j w sij j(t - tij) s(t - t)

j ij

timing control

DMP-based online synthesis

Figure 6:Architecture for the online synthesis of body move- ments using dynamic primitives, [Giese et al. (2009)].

state space of simple nonlinear canonical dynamical system onto the values of the learned source functions.

These nonlinear mappings were learned using Support Vector Regression (using a Radial Basis Function ker- nel, exploiting the LIBSVM Matlabr library [Chang and Lin (2001)]). In this way the source signals can be generated online as solutions of a nonlinear dynamical system. The canonical dynamical systems related to different primitives were dynamically coupled in order to ensure a synchronization of the corresponding states.

The resulting architecture is summarized in Fig. 6. A more detailed discussion of the design of this coupling and its relationship to the stability of the resulting dynamics is given in [Mukovskiy et al. (2013)]. The online generated source signals are then used as inputs for the anechoic mixing model, which defines the planned joint angle trajectories. An overview of the underlying architecture is shown in Fig. 6.

As canonical dynamics for the periodic DMPs we chose a limit cycle oscillator (Andronov-Hopf oscil- lator), which is given by the equations (ω defining the eigenfrequency), and the pair of state variables [x(t),y(t)]:

˙

x(t)=[1−(x2(t)+y2(t))]x(t)−ωy(t)

˙

y(t)=[1−(x2(t)+y2(t))]y(t)+ωx(t))

Since the attractors of this nonlinear systems can be mapped onto circle in the phase plane, delays can be represented by a rotation of the vectors in state space by

an angle that is proportional to the delay. In this way, we are able to model coupled networks with delays between different CPGs by a set of coupled set of differential equations without explicit delay times (see [Giese et al.

(2009)] for further details). The instantaneous phase of the leading DMP, which generates the periodic solution with the lowest frequency also was used to control the timing of the non-periodic source.

In order to plan online highly flexible behaviors, with an adaptation of steps and reaching behavior to the goal position, we learned nonlinear mappings from task pa- rameters onto the mixing weights of the anechoic mix- ing model. The task parameters were the were steps lengths and durations. Mappings were learned from training data, applying Locally Weighted Linear Re- gression (LWLR) [Atkeson et al. (1997); Mukovskiy et al. (2015)].

For the synthesis of multi-step step sequences the step lengths was computed from the actual estimated target distance. For this purpose we tried to reproduce the dependencies between the individual step sizes and the distance to the goal. In the human data, the reaching dis- tance of the arm (action 3) is positively correlated with the distance to the object inside the box, and negatively with the length of the previous step. These dependen- cies were modeled by linear regression and exploited for the computation of the reaching distances while stand- ing. For the second step, the step length was adjusted in order to realize a maximum-comfort distance for reach- ing. The length of the other steps then was adjusted accordingly. Step ranges were computed from the train- ing data, and an appropriate number of additional steps was automatically introduced when the target could not be reached within three steps. A more detailed descrip- tion of the algorithms for the smooth interpolation of the mixing weights, ensuring smooth transitions between the different steps is given in [Mukovskiy et al. (2015)].

Figure 7 illustrates the high degree of flexibility of this online planning algorithm. For this example, the goal (drawer) jumps away from its original position while the agent is approaching it. The algorithm adapts online to this perturbation, generating very human-like adaptive coordination, and it includes automatically an additional step in order to compensate for the suddenly increased distance to the drawer. See alsomovie3. 3.4. Integration of online-planning with model-

predictive control of the HRP-2

The central innovation of our work is the integration of the described online-planning algorithm with a con-

3https://goo.gl/9fLzO7

(9)

t=0.00s

t=1.18s

t=1.95s

t=3.30s t=2.36s t=1.94s

Figure 7: Adaptivity of online planning. If the goal (drawer) jumps away during the approaching behavior, automatically an additional step is introduced. Overall, a very smooth human-like whole-body coordination pattern emerges.

trol architecture for the HRP-2 humanoid robot, which is based on nonlinear model predictive control (NMPC).

This does not only involve the combination of trajecto- ries derived from human data, as described in section 3.2, but it requires specifically the approximation of hu- man data by dynamically feasible trajectories, exploit- ing NMPC framework. These feasible trajectories form a novel training set, from which a new set of optimized dynamic primitives was derived.

3.4.1. Overview of control architecture

The control architecture for the HRP-2 robot is shown in Fig. 8. It consists of three main building blocks. The online kinematic synthesis algorithm, which was laid out in section 3.3, provides input to the control archi- tecture (shaded box in Fig. 8) in terms of two sets of variables: the velocity and angular velocity of the Cen- ter of Mass (CoM) (variablesvrefandωref), and the joint angles of the upper bodyqupper body.

The first building block is a Walking Pattern Gener- ator (WPG) that computes from the variables vref and ωref, for one gait cycle, foot placementsxfeetand trajec- tories of the CoMxCoMand of the Zero Moment Point (ZMP)xZMPthat ensure the dynamic stability of the gait [Vukobratovi´cand Stepanenko (1972)]. This computa- tion is based on model predictive control (MPC), and further details about the underlying computations can be found in section 3.4.2 and in [Naveau et al. (2016, in press)].

The second block is a Dynamic Filter (DF) that cor- rects the preplanned foot, CoM, and ZMP trajectories, taking into account the planned upper-body motion, re- sulting in the corrected trajectoriesxcorfeet,xcorCoMandxcorZMP. The DF operates in closed-loop together with WPG, and further details about the underlying algorithms are described in [Naveau et al. (2016, in press); Stasse (2013)].

The third building block is generalized inverse kine- matics (IK) module that implements a ’Stack-of-Task’

approach. This module combines the corrected CoM and ZMP trajectories, and the upper-body motion (spec- ified by the joint angles). This module outputs joint angle trajectories for the legs and the upper-body that respect the dynamic stability constraints of the robot, at the same time approximating, as far as possible, the planned behavior of the upper body. For this purpose the task of stabilizing the locomotion is given the high- est priority, and the approximation of the planned tra- jectories is realized in the null-space of the control sig- nals for this prioritized task. The resulting optimization problem is solved by a sequential quadratic program- ming approach (QP solver).

(10)

Dynamic Kinematic

Pattern

Synthesis Walking Pattern Generator

Dynamic Filter

Genera- lized

IK ,

SoT Robot

(position control)

Estimator

(robot and objects

relative positions) sensor data

scene parameters q, q, q . ..

[v , w

ref ref

]

upper body

q

x

CoM

x

ZMP

x

feet

x

CoM

x

ZMP

x

feet

cor cor

Figure 8:Control system for the humanoid robot HRP-2. The Walking Pattern Generator computes foot positions and CoM and ZMP trajectories, which are further adjusted by the Dynamic Filter, dependent on the planned upper body motion. The resulting trajectories are consistent with the dynamic stability constraints of the robot. The approximation of the planned upper body movement and dynamic stability of walking are guaranteed by a Stack of Task approach, where optimal trajectories are computed by sequential quadratic programming. (See text for further details.)

The resulting optimal trajectoriesq are dynamically feasible and can be realized by the low-level controllers of the HRP-2 robot. During motion execution, the real- world environmental and task parameters and the cur- rent state of the robot are fed back to the kinematic plan- ner, closing the control loop for an adaptive interaction between online planning and MPC in the real world.

3.4.2. WPG based on optimal predictive control The Walking Pattern Generators (WPG) based on Model Predictive Control (MPC). The first WPG of this class was proposed by [Kajita et al. (2003)]. This method computed the reference nominal Zero Moment Point (ZMP) trajectory from the desired placements of feet during the gait cycle. A simplified linear inverted pendulum dynamics (’Cart-Table Model’) was used to link the Center of Mass (CoM) and the ZMP. Preview control was exploited for computing the CoM trajec- tory from the desired ZMP. Due to the model simplifica- tions, the real ZMP trajectory deviates from the desired one. This deviation is the result of neglecting the iner- tial and Coriolis forces generated by the leg swing and by fast movements of the upper-body. In order to alle- viate this problem, the authors ran the full body inverse dynamics in order to compute a better approximation of the real ZMP. This new ZMP can be computed for the preview horizon in real-time. The resulting ZMP error was transformed into a resulting CoM error via the Preview Control, following the approach proposed by [Kajita et al. (2003)]. This result can then be ex-

ploited to correct the CoM trajectory. The described two steps of preview control combined with an evalua- tion of the inverse dynamics can be repeated iteratively, successively reducing the ZMP error. This approach for the dynamic correction can be interpreted as a kind of Newton-Raphson iteration [Stasse (2013)], and was re- ferred to as Dynamic Filter in section 3.4.1.

Another improvement of MPC-based WPG is the in- tegration of the computation of the optimal ZMP tra- jectory within the constrained quadratic optimization framework that computes the optimal CoM trajectory [Wieber (2006)]. This approach requires only the speci- fication of the preplanned foot positions as input, return- ing the optimal trajectories for the ZMP and the CoM.

Our approach for nonlinear MPC relies in addition on another improvement of the same framework made by [Herdt et al. (2010)], which is the further extension of the approach by [Wieber (2006)]. This reformulation of the optimization framework allows to exploit positional and angular velocities of the CoM as reference trajecto- ries (for a time horizon of the next two steps), returning the foot placements and the optimal ZMP trajectories as result of the nonlinear predictive control problem. This framework (which is described in detail in [Naveau et al.

(2014, 2016, in press)]) was exploited in our system.

3.4.3. Generation of the dynamically feasible training data

In order to link the described approach for the online synthesis of movements with the NMPC approach de-

(11)

scribed above, we transform a set of human-compatible movement trajectories that were generated by interpola- tion from the original human data into trajectories that result in dynamically feasible behavior of the robot. For this purpose, we approximated the human-like trajec- tories by ones generated by physics-based simulations, exploiting the NMPC framework discussed in section 3.4.1. This training of our learning-based approach us- ing dynamically feasible training data is one of the key concepts of our proposed approach. The details of retar- geting and transformation in dynamically feasible tra- jectories of the training data are discussed in section 4.3.

4. Results

In the following we first briefly discuss the compu- tation time of our approach. Then we present some re- sults on the online kinematic planning algorithm. We then present results of the performance of the method in the off-line mode, where it was used to reproduce the behaviors of retargeted training trajectories with- out adaptation to new step sizes or goal distances. We demonstrate that the obtained behaviors indeed are dy- namically feasible and can be implemented on the real HRP-2 robot. We then demonstrate the performance of the system in case when adaptive behavior is planned dependent on the actual goal positions. In the last sec- tion we compare the robustness of the proposed method that integrates MPC with learning-based online plan- ning with a simpler machine learning-based approach, which realizes control by interpolation between learned whole-body angle trajectories which have been derived from training examples that were dynamically feasible.

It turns out that such a more na¨ıve machine learning ap- proach in many cases results in instability and infeasi- bility of the produced behavior.

4.1. Computation time

The kinematic pattern synthesis algorithm has a com- putation time around 81-86 ms for the whole trajectory (1280 time steps, each 5 ms) on modern a CPU (Intel(R) Xeon(R) CPU E3-1241 v3, 3.50GHz, Ubuntu 14.04).

The kinematic synthesis required only when target goal changes its position, and this computation time is be- low the buffer size for the preview control (100 ms).

The average computation time of the optimization prob- lem involving WPG-DF iterations is 4 ms (see Naveau et al. (2016, in press)), which is below the duration of control time-step (5 ms). The whole algorithm is thus realtime-capable. In contrast, the optimal control ap- proaches typically require several hours for the off-line synthesis of multi-step sequences.

4.2. Primitive-based synthesis of kinematic trajectories In order to validate the primitive-based online plan- ning algorithm we generated a set of highly human-like novel full-body movement behaviors, varying the initial distance of the actor from the goal object, including a spectrum of distances that were not part of the training set. In order to judge the human-likeness of the inter- polated behaviors, we did not retarget the movements to the robot kinematics and illustrate them as movies, using a human avatar. We learned 3 sources for the approximation of the first step, and another two extra sources for approximating the residuals of the second and third step (”3+2 sources”). The training data set consisted of 10 human joint-angle trajectories.

From the model trained with this data new trajectories were generated by interpolation, and the agent’s propa- gation velocity and rotation of the base (pelvis) were computed from the feet-ground contact events. The step distances from this simulation served as task parameter for the Linearly Weighted Regression (LWLR).

For all tested novel goal distances the algorithm gen- erated very human-like highly coordinated three-step sequences. This is illustrated in themovie4that shows step sequences for total goal distances between 2.34 and 2.94 m, all of which were not in the training set. Be- haviors for goal distances above 3 meters are shown in movie5, where the algorithm introduces automatically additional gait steps in order to ensure that the agents reaches the goal. Also these behaviors look amazingly human-like.

The capability of online replanning is demonstrated inmovie6(see also Fig. 7). I this case, the goal jumps away from the agent during the approaching behavior towards a more distant position, where it cannot be reached anymore with the originally panned number of steps. The online planning algorithm automatically in- troduces additional steps and adjusts the others, so that the behavior can successfully accomplished, again re- sulting in quite human-like appearance of the generated behavior.

4.3. Approximation of training trajectories by robot movements

In order to validate our new architecture, we first tested the system by the realization of open-loop con- trol, simulating a physical model of HRP-2 robot that

4https://goo.gl/Pn7atI

5https://goo.gl/JBz216

6https://goo.gl/9fLzO7

(12)

was implemented using the OpenHRP simulator, and also testing generated behaviors on the real robot.

In a first set of simulations the robot started from a parking position and makes a transition to a normal step.

At the end of this step the pelvis velocities (propagation and angular) were determined and used as initial condi- tions for the generation of a three-action sequence. At the end of the last action, a step back to the final park- ing position was generated by spline interpolation of the pelvis angular and positional coordinates between the final state of the last step of the action sequence and the final position, introducing two additional steps on the spot. We also generated examples of four-action sequences. For this purpose, the retargeted trajectories were extended by an additional normal walking gait cy- cle. In order to augment the training data set for the learning of the mappings between the task parameters and the model parameters, we generated additional ar- tificial kinematic data by scaling of the pelvis forward propagation velocities for all gait cycles uniformly (by the factors 0.8, 0.92, and 1.2), while keeping the up- per body trajectories fixed. In this way we generated a total of 30 training examples from the original 10 mo- tion capture trials. Examples of the generated three- and four-action sequences are shown inmovie7.

These trajectories were dynamically feasible for the robot, but still based on movement primitives learned from human data. In order to construct optimized prim- itives for the control of the robot, we generated 30 tra- jectories that were simulated with the OpenHRP physics simulator of the robot as novel training data and learned from this novel optimized movement primitives. For this purpose, the trajectories were approximated using 4 sources for the approximation of the first step, and 3 additional ones for the approximation of the residuals of the other steps, because this resulted in the best ap- proximations with a small number of sources (Fig. 5).

A systematic validation of the approximation qual- ity, dependent on the number of learned sources, is pre- sented in Fig. 9. This figure shows histograms of the re- production errors of the step sizes of the first two actions and the resulting arm reaching distance for the last ac- tion for different choices of the number of source func- tions. In all cases the spatial errors of the parameters, realized by the full control system, are small, always below 10 mm and often below 5 mm. This shows that in spite of the high complexity of the operations that are necessary to transform the original human motion

7https://goo.gl/7IZ0P1

-6 -4 -2 0 2 4 6

0 5 10 15 20

reconstruction error [mm]

Step length: 1st action

-6 -4 -2 0 2 4 6

0 2 4 6 8 10 12 14 16 18

“3+2”

“4+3”

“5+4”

reconstruction error [mm]

Step length: 2nd action

-8 -6 -4 -2 0 2 4 6

0 2 4 6 8 10 12 14 16 18

reconstruction error [mm]

Arm reaching distance: 3rd action

number of trialsnumber of trialsnumber of trials

“3+2”

“4+3”

“5+4”

“3+2”

“4+3”

“5+4”

Figure 9:Reconstruction accuracy of the step-sizes and reach- ing goal distances for different numbers of sources. The figure shows the histograms of the spatial errors. The first number in- dicates the number of sources learned from the first step (ac- tion 1), and the second number the number of sources used to approximate the residuals of the other actions (2 and 3).

(13)

into a motion sequence that is feasible for the robot the final control system produces movements that approxi- mate the planned step sizes and reaching distances quite accurately.

1 2

3 4

5 6

Figure 10: Real HRP-2 robot performing a 4-action walking- reaching sequence in the laboratory of LAAS/CNRS.

Some of these feasible re-synthesized trajectories were also tested using the real HRP-2 robot (cf. Fig.

10). A demonstration of the resulting behaviors for the three-action sequence is shown inmovie8, and a four- action sequence is shown inmovie9.

8https://goo.gl/jjAVfT

9https://goo.gl/RqT6Q3

0 1 2 3 4 5 6 7 8 9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

6.0 6.5 7.0 7.5 8.0 8.5

0.70 0.71 0.72 0.73 0.74 0.75 0.76

time (s) time (s)

meter (m)

meter (m)

x

ZMP

x

ZMP

x

ZMP

x

ZMP reference unfiltered DF (legs only) DF (full body)

Figure 11: Trajectories of the Zero Moment Point (ZMP) (in walking direction, x coordinate) for different architectures.

The blue curvexreferenceZMP indicates the reference ZMP trajec- tory computed from the linear inverted pendulum model. The green curvexunfilteredZMP shows the ZMP trajectory without filter correction. The trajectory with filter correction of all degrees- of-freedom is indicated in orange color xDF(full body)

ZMP , and the case where the dynamic filter was only applied to the lower- body degrees-of-freedom is indicated by the magenta trajec- toryxDF(legs only)

ZMP .

We also quantified the improvement of the behav- ior resulting from the inclusion of the dynamic filter in comparison with an architecture without this stage. Fig.

11 shows thex-coordinate trajectories of the Zero Mo- ment Point (ZMP) for different model variants: 1) the idealized inverted pendulum model, which provides a reference trajectory for the underlying MPC approach (solid blue line xreferenceZMP ); 2) the architecture without the dynamic filter correction (green dashed-dotted line xunfilteredZMP ); 3) application of the dynamic filter only to the lower body degrees of freedom, assuming the up- per body degrees-of-freedom to be freezed (magenta dashed-dotted line xDF(legs only)

ZMP ) and 4) when the dy- namic filter takes in account the full body motion (or- ange dashed-dotted linexDF(full body)

ZMP ). The trajectory of a model without dynamic filter correction (green) devi- ates significantly from the planned reference trajectory (blue). The inclusion of the dynamic filter results in a much better approximation of the reference trajectory (orange color curve). This correcting effect of the dy- namic filter is significantly reduced when it is only ap- plied to the lower body degrees-of-freedom (magenta curve). This implies that only if the dynamic filter is ap- plied to all degrees of freedom the robot motion is close to the planned dynamically feasible motion.

4.4. Inference of adaptive behaviors for novel gait dis- tances

In order to test the architecture, with an online gener- ation of new behaviors (step lengths and reaching move-

(14)

ments) dependent on the actual state of the robot, we synthesized the control signals for 30 different 4-action sequences, where a spectrum of step sizes was gener- ated by linear morphing of the sources’ weights. The first normal walking step length spanned 30 values in the range of 50.5 and 56.1cm, and the size of the second step was linearly sampled within the interval between 16.3 and 35.9cm. The reaching distance of the box in the last step varied in the interval of 66.3 to 75.5cm.

The distance between the object and the front of the drawer was varied within the interval between 12.4 and 27.3 cm. The generated behaviors for the most extreme step sizes (smallest and largest) are shown in Fig. 12.

Movie10 shows these action sequences. For all tested intermediate step sizes that were not part of the initial training set very human-like coordinated behavior was generated.

In order to validate more precisely whether the gener- ated closed-loop behaviors reproduce details of human grasping-reaching behavior we quantified the generated step sizes for different goal distances and starting po- sitions. Figure 13 shows the step sizes generated for 10 different combinations of the two task parameters:

distance of initial standing positions from the goal, and position of the object in the drawer. Consistent with the results in humans, the generated behavior shows a weak positive correlation between position of the object in- side the drawer (blue) and the reaching distance of the arm (red). This reaching distance is almost constant, realizing the principle of the optimization of end-state comfort. The bars in the other colors indicate the dura- tions of the starting step, the initial walking step, and the stopping step, which are changed in an adaptive manner similar to the behavior shown in Fig. 2. The starting gait cycle is not present in the human behavior, and is required in order to ensure a correct initiation of the first step from the parking position.

A more quantitative assessment of the performance is given in Fig. 14, which shows the variability of the ZMP in the lateral plane. The figure compares feasible trajectories, which are generated by the WPC from orig- inal trajectories without interpolation to novel step sizes or goal distances, with the behaviors of the system for novel goal distances that were not part of the training set and that required adaptation of the behavior using the online planning architecture. We compared again the behaviors for the choices of different number of sources for the anechoic mixing model (in total between 5 and 9 sources). The analysis is based on 30 newly synthesized four-action sequences for novel goal distances.

10https://goo.gl/IcwrXb

shorter steps longer steps

Figure 12: Synthesized behaviors with the full close-loop ar- chitecture, simulated with the OpenHRP simulator for the two most extreme goal distances.

The ZMP trajectory in the lateral plane was computed within all stance intervals, and the standard deviation (STD) of the difference between this trajectory and the reference ZMP trajectory was computed. The figure shows error bars with mean and variances as well as the maximum ranges of the variation. The ZMP variability is relatively independent of the number of sources for the reconstruction of trajectories and even for an infer- ence of novel step distances the variability is not sig- nificantly higher that for original trajectories generated with the WPG. This shows that the closed-loop system produces highly stable behaviors in terms of the varia- tion of the ZMP.

(15)

-2 -1.5 -1 -0.5 0 0.5 1

2 3 4 5 6 7 8 9 10

Starting gait cycleWalking gait cycleStopping gait cycleDistance to the boxObject in the box

Distances in meters

Trajectories

Figure 13: Simulation of multi-step sequences with the full closed-loop architecture using the OpenHRP simulator. The behavior reproduces details of the step-length relationships in humans, Specifically it reproduces adjustments which result in a relatively constant reaching distance during the last step, compatible with themaximum end-state comfort hypothesis.

The results are shown for 10 simulated trials, aligning the po- sitions according to the front of the drawer. Different colors refer to different steps within the sequence. (See text.)

4.5. Comparison with simple machine learning ap- proach

One might ask if the proposed complex architecture is really necessary, and if one could not just learn dynam- ically feasible trajectories generated with the WPG and interpolate between the corresponding full-body kine- matic angle trajectories using machine learning tech- niques. This approach would be based on the hope that the generated interpolations of the control signals also result in dynamically feasible behaviors when the train- ing trajectories were dynamically feasible. We tested our method against such a simpler approach.

For this test we created training data consisting of 30 dynamically feasible walking-reaching trajectories, which were directly generated by the MPC-based WPG.

Each of these trajectories results in dynamically stable behavior of the robot. The resulting full-body angle tra- jectories were again approximated with anechoic mix- ing models with different numbers of sources (between 5 and 9). Based on this training data 30 new trajecto- ries for the new goal distances were computed, using either the simple machine learning approach discussed above, or with our method of learning upper-body and base trajectories separately.

The behaviors generated with the simple machine learning approach often result in falling of the robot, specifically during the last action (box opening and reaching for the object, where both arms are extended).

The instability frequently also emerges earlier, already after the robot stops during the reaching step. A demon-

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

min, max values standard deviation reconstructed trajectories

2 + 3 sources

3 + 4 sources

4 + 5 sources

Inferred trajectories for new step distances Standard deviation of ZMP [mm]y Trajectories generated by WPG (training data)

Figure 14: ZMP variability in the lateral plane (y-direction).

See the text for more explanations.

stration of this behavior is given in movie11, which compares the behavior generated by the na¨ıve machine learning approach with the stable behavior obtained with our method. The parameters of the target behav- iors are exactly identical for the two simulations.

A further quantitative analysis is given in Tab. 1, that shows how often the robot fell down out of the 30 novel synthesized behaviors. The simulations are grouped according to the speed of the walks. In addition, we tested interpolations generated with different numbers of source functions for the machine learning approach, and compared this with our method using 4+3 sources.

For the low speed behaviors the machine learning ap- proach leads to stable behaviors in some cases, and to falling in others, where the success of the method varies in a non-systematic manner with the number of source functions used for the approximation. For the fast speed movements the simple machine learning approach al- ways results in falling in a significant number of cases.

Opposed to this result, our method always results in sta- ble behaviors without falling.

The superiority of our approach is also confirmed by an additional analysis of mechanical parameters that de- termine whether the behaviors can be realized on the real HRP-2 robot. Figure 15 shows the peak values of the ankle pitch torques for behaviors created directly us- ing the MPC-based WPG, behaviors generated with the

11https://goo.gl/6hbX6g

(16)

The distribution of falling events

Algorithm slow speed (tr.1-10) middle speed (tr.11-

20) fast speed (tr.21-30)

”ML 3+2” 0 10 7

”ML 4+3” 1 0 2

”ML 5+4” 0 5 10

Our method 0 0 0

Table 1: Fraction of trials with falls of the robot within 30 test trials with novel goal distances that were not part of the training set. Simple interpolation using machine learning techniques, approximating the trajectories with different numbers of sources (ML) is compared with our method that integrates online planning with the MPC control system. (For the ML conditions, the first number indicates the number of sources for the approximation of the fist action, and the second number the additional sources introduced for the approximation of the other steps).

na¨ıve machine learning approach (ML) of approximat- ing the full body angle trajectories, and our method. For the na¨ıve machine learning approach almost all torque peak values exceed 30 Nm, which is infeasible for the robot (red shaded region in Fig. 15). This is especially true if this approach is used for the learning-based in- ference of the new trajectories. Contrasting with this result, the torques for behaviors generated directly with the WPG and the ones generated with our method are always in the feasible range. This is true for both, for the off-line reconstruction and for the learning-based in- ference using our method, and independent of the used number of source functions.

A similar result emerges for the analysis of the ground reaction forces (maximal normal force of the feet during the 4-action sequence). The maximum ad- missible force for the real HRP-2 is 800 N. Figure 16 shows that for the na¨ıve ML approach in many cases the ground reaction force is too large compared to this limit, except for the reconstruction with 9 sources. Especially for the synthesis of new inferred behaviors, the peak ground reaction forces are always infeasible. This con- trasts with the results obtained with our method. Here in all cases, for the off-line reconstruction and for the learning-based inference, the ground reaction forces are always in the feasible range and quite similar to the peak values that are obtained when the behavior is directly computed by the WPG using MPC.

Summarizing, we think that these results convinc- ingly show that the proposed architecture provides a sig- nificant benefit over simpler approaches that just inter- polate between control signals obtained from training data that corresponds to stable behaviors of the robot.

The integration of online planning with the MPC-based control architecture in combination with the dynamic filter results in always stable and robust behavior, even largely independently of the accuracy of the learned tra- jectory model (number of source functions).

5. Conclusions

We have presented an architecture that combines the highly flexible online planning of coordinated full-body movements, based on learned dynamic movement prim- itives, with a control architecture that is based on a Walking Pattern Generator, which exploits nonlinear Model Predictive Control. The proposed architecture is suitable for the online generation of human-like highly coordinated full-body movements with long planning horizons. It generates dynamically feasible behavior of the robot, ensuring appropriate balance control during walking in presence of fast online replanning.

To our knowledge, no other presently existing ap- proach allows the realization of such human-like long- term predictive motion planning in combination with a guarantee of dynamic balance during walking in combi- nation with other tasks for the upper body of bipedal robots. Common alternative approaches, such as the optimization of such complex behavior by model-based optimal control approaches [Koschorreck and Mombaur (2012)] are presently computationally too costly to al- low the online generation of such complex, where even the optimization of short multi-step sequences can take easily hours of computation time with the presently available optimization methods. The functionality and flexibility of the proposed architecture was demon- strated by simulation using the OpenHRP physics sim- ulator and also in trials on the real HRP-2 robot. In addition, the proposed system realizes predictive motor behavior that is compatible with the end-state comfort hypothesis [Weigelt and Schack (2010); Rosenbaum (2008)]. Similar approaches have been proposed for the off-line optimization of reaching behaviors before (e.g.

[Gienger et al. (2007, 2008)]).

The shown results represent only a first feasibility test of the proposed architecture, and they demonstrate that a single highly complex behavior can be robustly im- plemented on the HRP-2, resulting in robust behavior,

Referenzen

ÄHNLICHE DOKUMENTE

As all experimental conditions had a highly significant, medium to large main effect on both the torque amplitude and variation (Tables 3, 4, respectively), it can be assumed that

These projects deal with the neurobiological correlates of different NEUROGES values (Deutsche For- schungsgemeinschaft DFG LA 1249/1-1, LA 1249/1-2;

During the study, the participants increased the proportion of work they did at home or in tran- sit from 24 to 33 percent (while working the same number of hours per week).. It

It was proved in [13] that these bodies coincide if and only if L is centrally symmetric about z (the ‘if’ part follows easily from Brunn–Minkowski theorem).. In view of Theorem 1,

Since the body model measurements were more precise than the Vicon cali- bration, the marker trajectories were scaled by the factor lBM/lMC, where lBM is the distance of two markers

Our findings suggest that (a) an intuitive graphical user interface for providing discrete feedback can be used for robot learning of complex movement skills when using

Our experiments evaluate the generalization capabilities of the parameterized skill for forward signals that reduce the tracking error of the feedback controller as well as

Identification performance (mean ± SD) of the Fourier-based anechoic demixing algorithm (FADA), fast independent compo- nent analysis (factICA), and non-negative matrix