Evaluation of the Dynamics Representation

Figure 6.3: Shape variation at end effector that is used for evaluation (shape parameterizations #1 to #100).

R^{F×N^H} that can be updated by incremental least squares algorithms. A more detailed discussion of the learning method and the parameter estimation of the readout weights is presented in Section 2.2.2.
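As an illustration of what an incremental least squares update of readout weights can look like, the following minimal sketch implements one recursive least squares (RLS) step for a single output dimension. RLS is only one member of this family and is chosen here as an assumption. The function name `rls_update`, the initialization of `P`, and the forgetting factor are hypothetical and are not taken from the text.

```python
def rls_update(weights, P, h, target, forgetting=1.0):
    """One recursive least squares step for a scalar output y = weights . h.

    weights: current readout weights (list of floats)
    P:       current inverse correlation estimate (symmetric list of lists)
    h:       feature vector of the new sample, e.g. hidden layer activations
    target:  desired output for this sample
    """
    n = len(h)
    # P h, reused for the gain and for the rank-one update of P.
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(h[i] * Ph[i] for i in range(n))
    gain = [x / denom for x in Ph]
    # Prediction error before the update.
    error = target - sum(w * x for w, x in zip(weights, h))
    new_weights = [w + g * error for w, g in zip(weights, gain)]
    # P <- (P - gain (h^T P)) / forgetting; h^T P equals Ph^T because P is symmetric.
    new_P = [[(P[i][j] - gain[i] * Ph[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_weights, new_P


# Usage: learn y = 2*h0 - 1*h1 sample by sample (toy data, assumed).
w = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P means weak prior on the weights
for h, y in [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]:
    w, P = rls_update(w, P, h, y)
```

For a readout with F output dimensions, the same step can be applied to each row of the weight matrix while sharing `P`, since `P` depends only on the features.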

d) Selection of Parameterized Task:

For the experiments, parameterized 2D end effector tracking tasks are evaluated, as shown in Figure 6.3. Additionally, the end effector load is varied in simulation, and the overall duration of the action primitives is varied for the real robot. As mentioned before, learning the feed-forward signals assumes that the joint angle trajectories are predefined.

parameterization of the task, the weight of a load that is attached to the end effector of the robot is varied. The evaluation of the generalization properties of forward signals optimized for single task instances is shown in Figure 6.3. The tracking performance of the PID controller with a zero forward signal (baseline) is compared to three conditions in which the low-level controller is supported by a forward signal obtained by ILC optimization for one specific shape parameterization (#1, #50 and #100, see Figure 6.3). The parameters of the iterative ILC update were set to K = [k_P, k_D] = [0.005, 0.04] by manual tuning, with a Gaussian window filter size of 100 time steps. As can be seen in Figure 6.4, the tracking error is much lower for a shape parameterization if the forward signal was optimized for this specific shape (colored vertical bars). The more the executed shape deviates from the shape for which the forward signal was optimized, the higher the tracking error. If the executed shape deviates strongly from the shape used for optimization, the tracking error of the controller that utilizes the forward signal can even exceed the error obtained without a forward signal. In this case, the forward signal perturbs the trajectory tracking and is not beneficial for the feedback controller. This experiment shows that optimized forward signals are beneficial in a local neighborhood of the task parameterization, which indicates that generalization across task parameterizations is feasible.
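The ILC update mentioned above can be sketched as follows for a single joint. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: it assumes a PD-type update in which the correction k_P e + k_D de/dt is smoothed by a normalized Gaussian window and then added to the previous forward signal. The function names, the window standard deviation, the border handling, and the finite-difference derivative are hypothetical choices.

```python
import math


def gaussian_window(size, sigma=None):
    """Return a normalized Gaussian window with `size` taps (weights sum to 1)."""
    if sigma is None:
        sigma = size / 6.0  # assumption: window spans roughly +/- 3 sigma
    center = (size - 1) / 2.0
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, window):
    """Centered convolution; border samples are repeated at both ends."""
    half = len(window) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(window):
            idx = min(max(t + j - half, 0), n - 1)  # clamp index at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def ilc_update(ffwd, q_target, q_actual, dt, k_p=0.005, k_d=0.04, filter_size=100):
    """One PD-type ILC iteration: returns the updated forward signal."""
    error = [qt - qa for qt, qa in zip(q_target, q_actual)]
    # Backward finite difference approximates the error derivative.
    d_error = [0.0] + [(error[t] - error[t - 1]) / dt for t in range(1, len(error))]
    correction = [k_p * e + k_d * de for e, de in zip(error, d_error)]
    correction = smooth(correction, gaussian_window(filter_size))
    return [u + c for u, c in zip(ffwd, correction)]
```

The default gains and filter size mirror the values quoted in the text. In a full loop, `ilc_update` would be called once per rollout with the joint trajectory recorded in that rollout, until the tracking error meets the convergence criterion.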

Figure 6.4: Evaluation of generalization capabilities of forward signals with respect to the task parameterization. Resulting tracking error of the 2-DOF arm with zero forward signal (black) is compared to conditions in which the optimized forward signal (FFWD) for a specific shape parameterization is used (#1, #50 and #100).

Based on the aforementioned observations, the generalization capabilities of the parameterized skill are evaluated in the second experiment. To evaluate the system performance during the presentation of random training tasks, a fixed test set of parameterizations over shapes and loads (0-2 kg) has been generated.

For each iteratively presented training task instance, the generalization of the feed-forward signals of the parameterized skill is evaluated. Starting from this initial feed-forward signal, the forward signal is iteratively updated by ILC for optimization.

Iterations are performed until a convergence criterion of the joint tracking error is fulfilled. Subsequently, the optimized forward signal for the given task instance is used as a training sample for the iterative update of the parameterized skill.

[Figure 6.5 plots: (a) mean number of optimization iterations over the number of training samples; (b) tracking error MSE [rad^2] over the number of training samples. Both panels show the mean of all trials and the 95% confidence interval.]

Figure 6.5: Results of 2-DOF arm experiment. (a) The mean number of rollouts that are necessary for optimization by ILC until convergence and (b) the tracking error for parameterized tasks for forward signals decoded from θ_PS = PS(τ) in relation to the number of presented training samples. Results and confidence intervals are based on ten repeated experiments.
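The incremental procedure (predict an initial forward signal with the parameterized skill, refine it by ILC until convergence, then consolidate the result as a new training sample) can be sketched with a toy model. Everything in this sketch is a hypothetical stand-in: the forward signal is reduced to a single number, `optimize_by_ilc` merely shrinks the gap to a known optimum, and `NearestNeighborSkill` replaces the learned parameterized skill PS(τ) with a nearest-neighbor lookup.

```python
def optimize_by_ilc(u_init, u_optimal, gain=0.5, tol=1e-3, max_iter=100):
    """Toy stand-in for ILC: shrink the gap to a known optimum until convergence."""
    u, iterations = u_init, 0
    while abs(u_optimal - u) > tol and iterations < max_iter:
        u += gain * (u_optimal - u)
        iterations += 1
    return u, iterations


class NearestNeighborSkill:
    """Hypothetical parameterized skill: returns the signal of the closest stored task."""

    def __init__(self):
        self.samples = []  # list of (task parameter, optimized forward signal)

    def predict(self, tau):
        if not self.samples:
            return 0.0  # zero forward signal before any training
        return min(self.samples, key=lambda s: abs(s[0] - tau))[1]

    def update(self, tau, u):
        self.samples.append((tau, u))


def train_incrementally(tasks, optimum_for):
    """Present tasks one by one and record the ILC iterations each one needs."""
    skill, iteration_counts = NearestNeighborSkill(), []
    for tau in tasks:
        u_init = skill.predict(tau)                                # generalize
        u_opt, n_iter = optimize_by_ilc(u_init, optimum_for(tau))  # refine
        skill.update(tau, u_opt)                                   # consolidate
        iteration_counts.append(n_iter)
    return skill, iteration_counts


# Usage with an assumed toy relation between task parameter and optimal signal.
skill, counts = train_incrementally([0.0, 1.0, 0.5, 0.55], lambda tau: 2.0 * tau + 1.0)
```

In this toy setting, a task close to an already solved one starts from a better initial signal and needs fewer refinement steps, which is the bootstrapping effect the experiment measures.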

The generalization capabilities are evaluated by estimating the tracking error on the test set. The results of this procedure are shown in Figure 6.5: the MSE of the trajectory tracking task decreases with an increasing number of presented training task instances. Additionally, it can be observed that the number of iterations required to achieve convergence of the ILC for new training tasks decreases as more task solutions have been consolidated by the parameterized skill. This allows for a bootstrapping of the learning process: the more experience the system has in solving task instances, the faster it can find solutions for unseen instances. Figures 6.7c-6.7k show the tracking performance of the end effector for three shape parameterizations as more samples are presented to the parameterized skill. The results reveal that learning is successful, as the system gradually enhances the precision of the task execution. After the presentation of only two samples, a higher variance in the generated samples can be observed, which is caused by the high shape variability of the randomly selected training tasks.

The second part of the evaluation is performed on the Affetto robot platform, as shown in Figure 6.1. Further information on the robot platform is presented in Section 5.2. The Affetto is a child-like humanoid robot that is driven by pneumatic actuators, as introduced in [Ishihara et al., 2011; Ishihara and Asada, 2015]. For the following experiments, 6 of the 8 DOF (#1-#6, see Figure 5.3) of one side of the upper body of the Affetto robot are utilized. The generated joint angle trajectories are parameterized by the shape of the resulting end effector trajectory of the right arm. The remaining 2 DOF are treated as optional joints and are neglected in the following evaluation. As before, experiments are based on parameterized end effector trajectories as described in Section 6.2, but instead of a load, the duration of the actions (1.6-26.6 seconds) is varied by a second task parameter. As for the 2-DOF experiment, a kinematic model and the inverse kinematics solver of the VREP simulator are utilized. It is ensured that the generated joint angle trajectories do not contain multiple solutions of the redundancy resolution and can be represented as parameterized functions. The simulation of the kinematics is shown in Figure 6.8a.

The PIDF controller [Todorov et al., 2010] is used as a basis for the control of the pneumatically driven joints of the robot and is extended according to Section 5.3.2 by an equilibrium model and a reset of the integral component. The controller parameters are obtained by automatic optimization and hand tuning on a test trajectory that includes sine waves and step responses. Further details regarding the low-level control can be found in Section 5.3. A grid search was performed to estimate appropriate parameters for the iterative PD update step of ILC as well as the filter width, as introduced in Section 6.2. The result of the grid search is shown in Figure 6.8b; the tracking performance was evaluated for shape parameterization #50.

Based on this evaluation, the width of the Gaussian window filter was set to 20 time steps and the update rate factor to 0.75K. As presented in Figure 6.8b, smaller filter widths or larger step sizes do not result in lower tracking errors but increase the risk of instabilities during ILC optimization.
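A grid search of this kind can be sketched as below. This is a minimal sketch under assumptions: `run_trial` is a hypothetical callback that runs one ILC optimization with the given step size and filter width and returns the minimum MSE it reached, and the selection rule simply picks the lowest mean MSE. The choice described in the text additionally weighs the risk of instability, which this sketch does not model.

```python
from itertools import product
from statistics import mean


def grid_search(step_sizes, filter_widths, run_trial, n_trials=3):
    """Evaluate every (step size, filter width) pair and return the best one.

    Returns (best_pair, best_mean_mse, results), where results maps each
    (step size, filter width) pair to its mean MSE over n_trials runs.
    """
    results = {}
    for step, width in product(step_sizes, filter_widths):
        results[(step, width)] = mean(run_trial(step, width) for _ in range(n_trials))
    best = min(results, key=results.get)
    return best, results[best], results


# Usage with an assumed toy objective in place of real ILC rollouts.
def toy_trial(step, width):
    return (step - 0.75) ** 2 + ((width - 20) / 100.0) ** 2


best, best_mse, table = grid_search([0.25, 0.75, 1.5], [10, 20, 40], toy_trial)
```

Keeping the full `results` table, and not only the minimum, makes it possible to inspect neighboring settings before committing to one, for example to prefer a slightly worse but more stable configuration.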

For the evaluation of the system performance, the same scenario as for the 2-DOF experiment of Section 6.3.1 was selected. The low-level controller for the antagonistic actuators is defined as

$$ u_i^{+} = k_F \left( p_i^{PID} + p_i^{offset} + \hat{p}_i^{PD}(q) + \hat{p}_i^{FFWD} - p_i^{PD} \right). \qquad (6.3) $$

It is based on the PIDF EQ I RESET controller as introduced in Section 5.3.2 and extended by the forward signal \hat{p}_i^{FFWD}. As Figure 6.6 shows, the real robot experiments reproduced similar results as the simulation of the 2-DOF arm. The parameterized skill is able to enhance the generalization incrementally for unseen task parameterizations. The more samples have been used for training of the parameterized skill, the lower the tracking error for the test set. Additionally, the same bootstrapping effect as in the previous experiment occurs: the number of required ILC iterations decreases significantly as the parameterized skill is gradually enhanced. As in the previous experiment, the kinematics model was used to visualize the tracking performance of the end effector for three shape parameterizations during the presentation of training samples to the parameterized skill, see Figures 6.8c-6.8k.
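Equation (6.3) can be written as a small function to make the role of the forward signal explicit. This is a sketch only. The argument names are hypothetical, and the reading of the individual terms (an equilibrium model estimate that depends on the joint angle, and a measured pressure difference that is subtracted) is an assumption based on the notation, since the terms are defined in Section 5.3.2 and not here.

```python
def control_signal(p_pid, p_offset, p_eq_estimate, p_ffwd, p_pd, k_f):
    """Evaluate Eq. (6.3) for one joint i.

    p_pid:         output of the PID part
    p_offset:      constant offset term
    p_eq_estimate: estimate \\hat{p}^{PD}_i(q), assumed to come from the equilibrium model
    p_ffwd:        forward signal \\hat{p}^{FFWD}_i
    p_pd:          term p^{PD}_i, assumed to be the measured pressure difference
    k_f:           gain k_F
    """
    return k_f * (p_pid + p_offset + p_eq_estimate + p_ffwd - p_pd)


# Setting the forward signal to zero gives the baseline controller of Eq. (6.3).
baseline = control_signal(1.0, 0.5, 0.2, 0.0, 0.4, k_f=2.0)
with_ffwd = control_signal(1.0, 0.5, 0.2, 0.3, 0.4, k_f=2.0)
```

Because the forward signal enters additively inside the bracket, the difference between the two outputs is k_F times the forward signal, independent of the feedback terms.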

[Figure 6.6 plots: (a) mean number of optimization iterations over the number of training samples; (b) tracking error MSE [%^2] over the number of training samples. Both panels show the mean of all trials and the 95% confidence interval.]

Figure 6.6: Results of Affetto experiment. (a) The mean number of rollouts that are necessary for optimization by ILC until convergence and (b) the tracking error for parameterized tasks for forward signals decoded from θ_PS = PS(τ) in relation to the number of presented training samples. Results and confidence intervals are based on ten repeated experiments.

[Figure 6.7 panels: (a) scenario overview; (b) kinematic chain of actuator; (c)-(k) end effector trajectory plots with axes from -1.5 to 1.5.]

Figure 6.7: (a) Experimental setup of the compliant 2-DOF arm experiment. Due to the high compliance of the robot, tracking tasks on the 2D target plane (black line) result in perturbed trajectories (red line). (b) Kinematic chain of the simulated actuator. (c-k) Examples of the generalization of PS(τ) to unseen tasks. For three shape parameterizations and a fixed load, resulting target trajectories for zero forward signal (c-e), with a parameterized skill trained with two samples (f-h) and with 10 presented training samples (i-k) are shown.

[Figure 6.8 panels: (a) scenario image; (b) heat map over step size (0.25, 0.75, 1.5) and filter width, color scale from 80 to 200, with cell values given as mean (range): 144.15 (0.46), 131.24 (3.24), 137.99 (3.96), 167.57 (3.62), 216.31 (2.69), 78.75 (9.18), 73.14 (5.53), 70.64 (7.51), 91.43 (7.42), 124.96 (11.52), 66.71 (9.14), 73.05 (5.99), 70.02 (2.09), 67.20 (4.58), 101.92 (4.38); (c)-(k) end effector trajectory plots with axes from -1.5 to 1.5.]

Figure 6.8: (a) Experimental setup of the Affetto experiment. Tracking tasks on the 2D target plane (black line) result in perturbed trajectories (red line). (b) Results of parameter grid search of ILC filter width and step size. Mean minimum reached MSE of three trials and range that includes all trials. (c-k) Examples of the generalization of PS(τ) to unseen tasks. For three shape parameterizations and a fixed load, resulting target trajectories for zero forward signal (c-e), with a parameterized skill trained with two samples (f-h) and with 20 presented training samples (i-k) are shown.

6.4 Interaction in Dynamic Environments by

In the document Multi-modal Skill Memories for Online Learning of Interactive Robot Movement Generation (pages 154-161)