
Parameterized Skills for Dynamic Action Primitives

[Figure 6.2 block diagram: task-specific generalization (parameterized skill, e.g. ELM, with encoding, e.g. DMP, over joint space and task parameters), iterative optimization (feed-forward optimization for the current primitive, e.g. ILC forward signal), and low-level control (pressure-difference controller (PIDF) with equilibrium model (EQ-model), comparing desired and real joint positions) on Affetto or in robot simulation; numbered elements 1❖–9❖ are referenced in the text. Previous work on the kinematic representation is covered in Chapter 3.]

Figure 6.2: System overview of the proposed skill learning framework. The parameterized skill PS(τ) is the core component and mediates between high-level task parameters and feed-forward signals that represent the dynamic properties of the robot system. Background color indicates functional grouping and the nested loop structure of task parameterization, feed-forward signal optimization and primitive execution.

sequenced. Training samples are gathered by iterative optimization of the initial guess of the parameterized skill. The experiments evaluate the generalization capabilities of the parameterized skill for forward signals that reduce the tracking error of the feedback controller as well as the iterative optimization of forward signals and online learning.

Figure 6.2 shows the structure of the proposed learning framework: Target trajectories in relation to the task parameterization (Figure 6.2-1❖) are assumed to be given, as highlighted in red in Figure 6.2-2❖. The generalization of feed-forward signals p_i^FFWD(t) for the first iteration i = 1 is performed by the parameterized skill PS(τ) (Figure 6.2-3❖) and its encoding (Figure 6.2-4❖). The iterative optimization of the generalized feed-forward signal p_{i+1}^FFWD(t) for one task instance (defined by τ) is given by Figure 6.2-5❖. Optimization is performed until convergence of the tracking error has been achieved. The feed-forward signal giving the lowest tracking error, p^FFWD(t), is used as the training target for an incremental update of PS(τ). For action execution, a feedback controller (Figure 6.2-6❖) estimates a control signal p_i^PID(t) based on the current tracking error e_i(t). The utilized low-level controller is the PIDF EQ I RESET controller, as it shows the best performance on the robot platform. Details on the low-level controller are presented in Section 5.3.2. The additional equilibrium-model-based forward signal is denoted by p̂_i^PD, as shown in Figure 6.2-7❖. The resulting signal that is processed by the outer loop of the PIDF controller is given by p_i(t) = p_i^PID(t) + p_i^FFWD(t) + p̂_i^PD.
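The composition of the control signal described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the PID gains, the time step and the function names are assumptions, and the feed-forward and equilibrium-model terms are simply passed in as precomputed values.

```python
# Sketch of p_i(t) = p_i^PID(t) + p_i^FFWD(t) + p_hat_i^PD (see Figure 6.2).
# Gains and time step are illustrative assumptions, not tuned values.

def pid_step(e, e_sum, e_prev, kp=1.0, ki=0.1, kd=0.05, dt=0.01):
    """One step of the outer PID loop on the tracking error e_i(t)."""
    e_sum += e * dt                      # integral term accumulator
    de = (e - e_prev) / dt               # finite-difference derivative
    return kp * e + ki * e_sum + kd * de, e_sum

def control_signal(e, e_sum, e_prev, p_ffwd, p_eq, dt=0.01):
    """Combine feedback, generalized feed-forward and equilibrium-model terms."""
    p_pid, e_sum = pid_step(e, e_sum, e_prev, dt=dt)
    return p_pid + p_ffwd + p_eq, e_sum
```

With zero tracking error, the output reduces to the sum of the feed-forward and equilibrium-model contributions, which is the intended behavior: the feedback loop only acts on residual errors.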

The parameterized skill does not estimate the complete inverse dynamics of the robot system and its environment, as would be done in classical robot control applications for the estimation of p_i^FFWD(t). Instead, the generalization of the optimized p_i^FFWD is based on the high-level task parameterization and is intended to support the feedback controller.

In the case of the Affetto robot, it is not possible to directly command joint torques or accelerations. To abstract from the antagonistic control signals that represent the opening of the valves of the pneumatic chambers, the PIDF controller [Todorov et al., 2010] is utilized, as shown in Figure 6.2-8❖. Further, the low-level controller incorporates an additional equilibrium model, as discussed in Section 5.3.2, to enhance the precision of the system. This allows operating with u(t) in the domain of desired pressure differences that correlate with torques at the end effector (Figure 6.2-9❖). The overall system incorporates three nested loops: 1) generalization of forward signals and the respective joint angle trajectories for each new task instance; 2) iterative optimization of the generalized forward signals; 3) execution of the joint trajectory by the low-level controller. A more detailed view of the loop structure of the skill learning framework is presented in Section 2.2.
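The three nested loops can be sketched schematically as below. All names (`ps`, `optimize_ffwd`, `execute`, the tolerance and iteration budget) are assumptions for illustration; the sketch only shows the control flow of generalization, iterative optimization and execution, not the thesis API.

```python
# Schematic of the three nested loops of the skill learning framework.
# ps.generalize / ps.update stand in for the parameterized skill PS(tau);
# execute runs the low-level controller and returns a tracking error.

def run_framework(tasks, ps, optimize_ffwd, execute, max_iters=10, tol=1e-3):
    for tau in tasks:                      # loop 1: new task instances
        p_ffwd = ps.generalize(tau)        # initial guess from PS(tau)
        best_err, best_ffwd = float("inf"), p_ffwd
        for i in range(max_iters):         # loop 2: iterative optimization
            err = execute(tau, p_ffwd)     # loop 3: low-level execution
            if err < best_err:
                best_err, best_ffwd = err, p_ffwd
            if err < tol:                  # tracking error converged
                break
            p_ffwd = optimize_ffwd(p_ffwd, err)
        ps.update(tau, best_ffwd)          # incremental training of PS(tau)
```

The inner break condition mirrors the text: optimization runs until the tracking error converges, and the feed-forward signal with the lowest error becomes the training target.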

A crucial requirement for the estimation of optimized feed-forward signals is the repeatability of the generated movements of the robot. As investigated in [Todorov et al., 2010] for a humanoid robot with comparable air valves and actuation principle, the resulting end effector trajectories showed proper repeatability under multiple executions of identical controller signals. Further, the system is faced with a multi-modal representation: the parameterization of the task affects the desired trajectory as well as the optimal feed-forward signal, e.g. caused by different loads at the end effector, variable stiffness of the actuator or changing trajectory durations. The evaluation metric is the generalization performance of the parameterized skill for feed-forward signals of unseen task parameterizations. It is expected that the more training samples have been presented to the parameterized skill, the better the generalized feed-forward signal becomes. Consequently, a gradually increasing tracking performance as well as a reduced number of optimization steps required to reach convergence of the tracking error minimization is expected.

6.2.1 Component & Task Selection

In the following, the chosen signal representation, the algorithm for feed-forward signal optimization, the selected learning method and the task variability are introduced. The component selection is closely related to the previously presented bootstrapping experiments in Section 3.2.1.

a) Feed-Forward Signal Representation:

The proposed method does not rely on a specific type of policy representation, i.e. a compact representation and encoding of forward signals that supports the execution of motion primitives. Many methods for compact temporal signal representation have been proposed, e.g. based on Gaussian Mixture Models (GMM) [Günter et al., 2007] or Neural Imprinted Vector Fields [Lemme et al., 2014], as discussed in Section 2.2.2. The presented work relies on a dynamical system representation based on Dynamic Motion Primitives (DMP) [Ijspeert et al., 2013], because they are widely used in the field of motion generation and show good task-related generalization capabilities. DMPs for point-to-point motions are based on a dynamical point attractor system. For the encoding of feed-forward signals as in Figure 6.2-4❖, a variant without scaling invariance is implemented. The feed-forward signal u_{j=1}^FFWD(t) as well as its velocity and acceleration profiles are defined as

ü_{j=1}^FFWD = k_S (g − u_{j=1}^FFWD) − k_D u̇_{j=1}^FFWD + f^FFWD(x, θ),   (6.1)

where the canonical system is typically given by a linear decay and the forcing term f^FFWD is defined as motivated in Section 3.2.1. For the experiments in this chapter, the number of Gaussians is set to K = 20 per DOF for the feed-forward signal representation. The DMP is parameterized by the mixing coefficients θ_k, which are generalized by the parameterized skill. Fixed variances V_k and a fixed distribution of centers C_k as in [Ijspeert et al., 2013; Reinhart and Steil, 2015] are assumed.
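A minimal numerical sketch of this point attractor system is given below, assuming Eq. (6.1) with K = 20 Gaussians, fixed centers and variances, and a linearly decaying canonical phase. The gains k_S, k_D, the variance value and the integration step are illustrative assumptions, not the parameters used on the robot.

```python
import numpy as np

# DMP-style point attractor without scaling invariance, sketching Eq. (6.1).
K, k_S, k_D, dt = 20, 25.0, 10.0, 0.002
C = np.linspace(0.0, 1.0, K)          # fixed centers C_k over the phase x
V = np.full(K, 0.02)                  # fixed variances V_k

def forcing(x, theta):
    """f^FFWD(x, theta): normalized Gaussian mixture, gated by the phase x."""
    psi = np.exp(-(x - C) ** 2 / (2.0 * V))
    return x * psi @ theta / psi.sum()

def rollout(theta, g=1.0, u0=0.0, T=1.0):
    """Euler-integrate u'' = k_S (g - u) - k_D u' + f(x, theta)."""
    u, du, x, traj = u0, 0.0, 1.0, []
    for _ in range(int(T / dt)):
        ddu = k_S * (g - u) - k_D * du + forcing(x, theta)
        du += ddu * dt
        u += du * dt
        x = max(x - dt / T, 0.0)      # canonical system: linear decay 1 -> 0
        traj.append(u)
    return np.array(traj)
```

With all mixing coefficients θ_k set to zero the forcing term vanishes and the signal converges smoothly to the goal g, which is the expected attractor behavior; nonzero θ_k shape the transient.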

The representation of the joint angle trajectories is performed in the same way as discussed in Section 3.2.1.

b) Selection of Feed-Forward Signal Optimization Algorithm:

For the optimization of feed-forward signals encoded by the policy parameters θ for a given task instance, Iterative Learning Control (ILC, [1984; Longman, 1998; Norrlöf and Gunnarsson, 2002]) is applied. Its integration into the framework is shown in Figure 6.2-5❖. ILC is a method for optimizing control signals and was initially proposed as a purely feed-forward approach.

An application in combination with feedback control was demonstrated as well in [Roover and Bosgra, 2000; Bristow et al., 2006]. A successive observation and update of the feed-forward signal leads to a reduction of the tracking error and thereby to a lower feedback controller response. An illustration of the working principle is shown in Section 2.2.2. ILC is widely used in industrial application areas, e.g. for enhancing the positioning precision of machines [Chen and Hwang, 2005; Kim and Kim, 1996]. A PD-type learning function was used for the presented experiments [Bristow et al., 2006]: the feed-forward signal is updated based on a proportional (P) and a derivative (D) gain of the current error. ILC is based on a Q-filter and a learning function L. The low-pass filter Q suppresses high-frequency learning and contributes to the stability of ILC. Further details can be found in the discussion in Section 2.2.2b and in Figure 2.6. In this case, the Q-filter is given by the representation of the feed-forward signal as the parameterization of a dynamical system representation (inherent smoothing); additionally, a Gaussian filter is applied to the error signal of the joint controller. The iterative adaptation, including the update law L of the forward signal, is defined as

u_{i+1}^FFWD(t) = u_i^FFWD(t) + k_P e_i(t+d) + k_D [e_i(t+d+1) − e_i(t+d)],   (6.2)

where the two error-dependent terms constitute the update law L(e_i(t)), for iteration i, proportional factor k_P, derivative factor k_D and system delay d.
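The PD-type update of Eq. (6.2), together with the Gaussian Q-filtering of the error signal and the delay estimation described in this section, can be sketched as follows. The gains, smoothing width and maximum delay are illustrative assumptions.

```python
import numpy as np

def ilc_update(u_ffwd, e, d=0, k_P=0.5, k_D=0.1, sigma=2.0):
    """u_{i+1}(t) = u_i(t) + k_P e_i(t+d) + k_D [e_i(t+d+1) - e_i(t+d)]."""
    # Gaussian low-pass on the error acts as the Q-filter discussed in the text.
    radius = int(3 * sigma)
    kernel = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    e_s = np.convolve(e, kernel / kernel.sum(), mode="same")
    e_d = np.roll(e_s, -d)                 # shift by the estimated system delay
    de = np.roll(e_d, -1) - e_d            # e(t+d+1) - e(t+d)
    return u_ffwd + k_P * e_d + k_D * de

def estimate_delay(q_target, q_real, d_max=20):
    """Pick d minimizing (1/T) sum_t ||q_target(t) - q_real(t+d)||."""
    T = len(q_target) - d_max
    errs = [np.mean(np.abs(q_target[:T] - q_real[d:d + T]))
            for d in range(d_max)]
    return int(np.argmin(errs))
```

When the tracking error is zero, the feed-forward signal is left unchanged, i.e. the update has converged; a constant offset between target and response is absorbed over iterations by the proportional term.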

The error e_i(t) over time t is defined as the difference between the target joint angles q̃_i(t) and the real joint angles of the current iteration q_i(t): e_i(t) = q̃_i(t) − q_i(t). For each joint, an independent ILC is executed. Due to the high compliance of the actuators and the pneumatic actuation principle, long and varying temporal delays between the control signal and the response of the actuator are to be expected. Therefore, the current temporal delay d of the system is estimated as the time shift that minimizes the error between the target and the actuator response:

d = argmin_d (1/T) Σ_{t=1}^{T} ||q̃(t) − q_j(t + d)||.

c) Selection of Learning Algorithm:

Figure 6.3: Shape variation at the end effector (trajectories #1–#100) that is used for evaluation.

To allow a comparison of the methods that are proposed in this chapter to the bootstrapping of parameterized skills as presented in Chapter 3, the learner configuration was kept unchanged. For learning of parameterized skills PS(τ), an incremental variant of the Extreme Learning Machine (ELM, [Huang et al., 2006]) was implemented, as discussed in Section 3.2.1. As before, the hidden layer size was set to N_H = 50 for generalization in joint space. Linear regression is applied on a random projection of the input W^inp ∈ R^{N_H×E}, a nonlinear transformation σ(x) = (1 + e^{−x})^{−1} and a linear output transformation W^out ∈ R^{F×N_H} that can be updated by incremental least squares algorithms. A more detailed discussion of the learning method and the parameter estimation of the readout weights is presented in Section 2.2.2.
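An incremental ELM readout of this kind can be sketched with a recursive least squares update of W^out, as below. The dimensions E and F, the regularization constant and the class interface are illustrative assumptions; only N_H = 50 and the logistic nonlinearity are taken from the text.

```python
import numpy as np

# Incremental ELM sketch: fixed random projection W_inp, logistic hidden
# layer, and a recursive-least-squares update of the readout W_out.
rng = np.random.default_rng(0)
N_H, E, F = 50, 3, 20                    # hidden size, input dim, output dim
W_inp = rng.standard_normal((N_H, E))
b = rng.standard_normal(N_H)

def hidden(tau):
    """sigma(x) = (1 + exp(-x))^-1 applied to the random input projection."""
    return 1.0 / (1.0 + np.exp(-(W_inp @ tau + b)))

class IncrementalELM:
    def __init__(self, lam=1e-2):
        self.P = np.eye(N_H) / lam       # inverse correlation estimate
        self.W_out = np.zeros((F, N_H))

    def update(self, tau, target):
        """One RLS step on a new (task parameter, forward signal) sample."""
        h = hidden(tau)
        Ph = self.P @ h
        k = Ph / (1.0 + h @ Ph)          # RLS gain vector
        self.P -= np.outer(k, Ph)
        self.W_out += np.outer(target - self.W_out @ h, k)

    def predict(self, tau):
        return self.W_out @ hidden(tau)
```

Because only W^out is trained, each new optimized feed-forward signal can be incorporated with a single cheap rank-one update, which matches the incremental training of PS(τ) described above.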

d) Selection of Parameterized Task:

For the experiments, an evaluation of parameterized 2D end effector tracking tasks is performed, as shown in Figure 6.3. Additionally, the end effector loads are varied in simulation, and the overall duration of the action primitives is varied for the real robot. As mentioned before, the learning of the feed-forward signals assumes that the joint angle trajectories are predefined.