Experts as Controllers - Guiding with Reward


6.1.5 Experts as Controllers

The extraction of behaviors discussed above can only be considered successful if the experts can reproduce the behaviors reliably. For that reason we analyze the quality of the acquisition in terms of the behaviors exhibited by the robot when the experts are used as controllers. To obtain behavioral primitives, the behaviors must be combinable and robust. Thus, it is important that an attractor behavior is obtained when a particular expert network is controlling the robot. Additionally, the basin of attraction of each attractor behavior is of particular interest. This is the region of the behavior space from which the attractor behavior is reached. If the basin of attraction is large, or even spans the entire space of initial configurations, then the expert can be activated independently of the state of the robot. This makes the sequencing of experts especially easy. Recall that the experts are closed-loop controllers; hence, the transition from one expert to the next will most likely happen on a smooth transient, as we will see at the end of this section.
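The notion of a global basin of attraction can be illustrated with a minimal sketch. It does not use the robot or the expert networks of this section. Instead it assumes a hypothetical one-dimensional closed-loop system in which the controller `expert_controller` pulls the state toward a single target behavior. The names `expert_controller`, `run_closed_loop`, and `has_global_basin`, as well as the gain, step size, and tolerance, are illustrative assumptions.

```python
import random


def expert_controller(state, target=1.0, gain=0.5):
    """Hypothetical closed-loop expert: the control signal depends on the
    current state and pulls it toward the expert's own target behavior."""
    return gain * (target - state)


def run_closed_loop(initial_state, controller, steps=200, dt=0.1):
    """Integrate the toy system with explicit Euler steps and return the
    state after the transient phase."""
    state = initial_state
    for _ in range(steps):
        state += dt * controller(state)
    return state


def has_global_basin(controller, initial_states, target, tol=1e-2):
    """Return True if every sampled initial state ends up near the target,
    i.e. the sampled region lies inside the basin of attraction."""
    finals = [run_closed_loop(s, controller) for s in initial_states]
    return all(abs(f - target) < tol for f in finals)


rng = random.Random(0)  # fixed seed so the sampling is repeatable
initial_states = [rng.uniform(-5.0, 5.0) for _ in range(100)]
print(has_global_basin(expert_controller, initial_states, target=1.0))
```

In this toy setting every sampled initial state converges to the same behavior, which is the property that allows an expert to be activated regardless of the current state. Sampling can only support, not prove, that a basin is global.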

FourWheeled Robot

In the case of the FourWheeled robot we find stable attractor behaviors with global basins of attraction for all committed experts. This is not very surprising, because the control of a behavior consists essentially of constant motor values. We use the developed experts presented in Fig. 6.5 (p. 155) one by one as controllers for the robot. Starting from different initial conditions, we measured the behaviors exhibited after a transient phase.

6.1. Acquisition of Behavioral Primitives 163

Figure 6.14: Attractor behaviors and their basins of attraction of one expert controlling the Spherical robot in the under-represented case of 4 experts. Expert #4 of Fig. 6.11 is used as an example. (a) Initial conditions (yellow), original winning points (green), and final states of the robot (black); (b) final states are partitioned into 2 clusters as marked in (a); (c) initial conditions belonging to the two clusters. Cluster I (red) has a much larger basin of attraction.

For each expert a stable attractor behavior is reached. These attractor behaviors are displayed in Fig. 6.13 together with the original partitioning of the behavior space.

For the forward and backward driving experts, the behaviors shown are close to the original center of the training points. The training points are those where the particular expert won the competition and was allowed to learn the sensorimotor correspondence. For the experts controlling curved driving, a small deviation from the center of the training data is observed, but the quality of the behavior is mostly preserved. Since experts #4 and #7 show very similar behavior, the acquisition of 6 distinct behavioral primitives has been achieved.

Even if the learning of the experts continues, these primitives are not forgotten because the learning rate of the experts decreases exponentially, Eq. (6.21). Nevertheless, a certain fine-tuning of the represented behaviors can still occur.

Spherical Robot

In the case of the Spherical robot, the control of the behaviors is much more complicated than for the wheeled robot, because the weights have to be coordinated with the orientation of the robot. We again consider the developed experts as controllers for the robot. First, we analyze what happens if too few experts are provided, as in Fig. 6.11 (p. 160), such that multiple behaviors are represented by the same expert. Let us consider the robot behavior when controlled by expert #4 from Fig. 6.11 without additional noise in the sensor values. Starting from a large number of different initial orientations and initial rolling velocities, the behavior after a sufficiently long time was recorded, see Fig. 6.14(a),(b).

Figure 6.15: The main attractor behaviors of the three committed experts. For each committed expert (#1, #2, and #4) in Fig. 6.11 (p. 160), the attractor behavior with the dominant basin of attraction is presented. See also Fig. 6.14. [Plot in the space of angular velocities; axes Ω_x, Ω_y, Ω_z, each from -1 to 1; points labeled by expert number 1, 2, and 4.]

We observe two attractor behaviors which are not single points in the space of angular velocities, but appear as small clouds. On closer inspection we find small limit cycles that correspond to a slight precession⁴ of the Spherical robot. The first attractor cloud, marked with I, lies inside the original behavior and represents the rolling motion around the first internal axis. The second cluster, marked with II, represents a spurious behavior that is not connected to the training set. To determine the basins of attraction, we assign each initial condition to the cluster that its final state belongs to. We find that the first cluster has a large basin of attraction, whereas the behavior of the second cluster is only reached from a small subset of the initial conditions, see Fig. 6.14(c).
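The procedure of clustering final states and mapping them back to their initial conditions can be sketched as follows. This is a toy illustration and not the analysis code behind Fig. 6.14. It assumes a hypothetical one-dimensional system `dynamics` with two stable behaviors (at 1.0 and at -1.0) in place of the robot under expert control, and a simple greedy clustering with an assumed radius. All function names and parameters are illustrative.

```python
def dynamics(x):
    """Hypothetical stand-in for the robot under expert control: stable
    behaviors at x = 1.0 and x = -1.0, separated by an unstable point
    at x = -0.7."""
    return -(x - 1.0) * (x + 0.7) * (x + 1.0)


def final_state(x0, steps=3000, dt=0.01):
    """State reached after a sufficiently long time (explicit Euler)."""
    x = x0
    for _ in range(steps):
        x += dt * dynamics(x)
    return x


def cluster_final_states(finals, radius=0.1):
    """Greedy clustering: a final state joins the first cluster whose
    center is closer than `radius`, otherwise it opens a new cluster."""
    centers, labels = [], []
    for f in finals:
        for k, c in enumerate(centers):
            if abs(f - c) < radius:
                labels.append(k)
                break
        else:
            centers.append(f)
            labels.append(len(centers) - 1)
    return centers, labels


def basin_fractions(initial_states):
    """Fraction of initial conditions that end in each cluster, keyed by
    the rounded cluster center."""
    finals = [final_state(x0) for x0 in initial_states]
    centers, labels = cluster_final_states(finals)
    n = len(initial_states)
    return {round(c, 2): labels.count(k) / n for k, c in enumerate(centers)}


# Regular grid of initial conditions in [-1.5, 2.5], avoiding the unstable point.
initial_states = [-1.5 + 4.0 * (i + 0.5) / 40 for i in range(40)]
print(basin_fractions(initial_states))
```

The printed fractions estimate the relative sizes of the two basins within the sampled region. The estimate depends on how the initial conditions are sampled, so it describes the sampled region rather than the full state space.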

The remaining two committed experts have a very similar structure of attractors. In Fig. 6.15 the attractor behaviors with a large basin of attraction are displayed for all experts. We find that the main attractor behaviors represent the rotation around all three internal axes in one direction.

Let us now consider the case of many experts, as illustrated in Fig. 6.12 (p. 161). We find mostly one attractor per expert with a global basin of attraction. A selection of attractors is displayed in Fig. 6.16. For each of the three internal axes, a behavioral primitive for forward and backward rolling has developed. Besides that, there are behaviors that correspond to a rotation around an axis different from the internal axes, e.g. experts #5 and #6. The attractors of these two experts show their orbital structure in the space of angular velocities particularly well. This usually corresponds to a slow precession movement of the Spherical robot.

⁴ Precession refers to a cyclic change in the direction of the axis of a rotating object, e.g. a gyroscope.


Figure 6.16: Attractor behaviors for the case of 20 experts with the Spherical robot. The colored points show the attractor behavior for each expert. For clarity only a selection of experts is displayed. Each transparent sphere marks the center of the point cloud belonging to one expert. [Plot in the space of angular velocities; axes Ω_x, Ω_y, Ω_z, each from -2 to 2; point clouds labeled by expert number 3, 4, 5, 6, 7, 9, 11, 12, 13, and 14.]

Figure 6.17: Transient behavior when the Spherical robot is controlled by a sequence of experts. Each expert of the sequence #11, #4, #7, #14, #9, and #13 controlled the robot in turn for 30 s. The colors correspond to the controlling expert. The transparent spheres mark the attractor behaviors, see also Fig. 6.16. A smooth transient between the attractor behaviors is observed. [Plot in the space of angular velocities; axes Ω_x, Ω_y, Ω_z, each from -2 to 2; legend entries for experts 4, 7, 9, 11, 13, and 14.]

The attractor behaviors represent the repertoire of primitive behaviors that have been acquired during the online learning phase. In the examples considered here, the basins of attraction of these behaviors are global if sufficiently many experts are provided. This qualifies them as behavioral primitives, because they can be arbitrarily sequenced into complex behaviors. To study the sequencing and the transient behaviors between the primitives, we controlled the robot with a sequence of experts. Figure 6.17 shows the trajectory of the Spherical robot in the space of angular velocities for a sequence of six experts. Starting from rest (ω_x = ω_y = ω_z = 0), the robot reaches the attractor behavior of the first expert (#11) on a smooth transient. After a certain time the control was switched to the next expert in the sequence, and for each transition a smooth transient to the corresponding behavior is observed. This shows that the experts, which have been acquired in a self-organized way, serve as combinable behavioral primitives.
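The switching scheme can be sketched in a few lines. The sketch below is a simplified illustration and not the controller used for Fig. 6.17. Each hypothetical expert, created by `make_expert`, is modeled as a first-order closed-loop controller with a single attractor in the space of angular velocities. The target vectors, the gain, and the step size are assumptions chosen for illustration. Only the hold time of 30 s follows the figure caption.

```python
def make_expert(target, gain=0.3):
    """Hypothetical closed-loop expert with one attractor at `target`
    (an angular-velocity vector). The real experts are neural networks."""
    def controller(omega):
        return tuple(gain * (t - w) for t, w in zip(target, omega))
    return controller


def run_sequence(experts, omega0=(0.0, 0.0, 0.0), hold_time=30.0, dt=0.05):
    """Let each expert control the toy system for `hold_time` seconds in
    turn and record the whole trajectory (explicit Euler integration)."""
    omega = omega0
    trajectory = [omega]
    steps = int(round(hold_time / dt))
    for expert in experts:
        for _ in range(steps):
            d_omega = expert(omega)
            omega = tuple(w + dt * dw for w, dw in zip(omega, d_omega))
            trajectory.append(omega)
    return trajectory


# Illustrative targets only; they are NOT the attractors of Fig. 6.16.
targets = [(1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
experts = [make_expert(t) for t in targets]
trajectory = run_sequence(experts)

# Largest change of the state between two consecutive time steps.
max_step = max(
    max(abs(a - b) for a, b in zip(p, q))
    for p, q in zip(trajectory, trajectory[1:])
)
print(trajectory[-1], max_step)
```

Because each controller depends only on the current state, switching the active expert changes the direction of the flow but never the state itself. In this toy model that is why the trajectory stays continuous across the switches and `max_step` remains small.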

In the document Goal-Oriented Control of Self-Organizing Behavior in Autonomous Robots (pages 170-175)