6.1.4 Extraction in Action

Let us now apply the competing experts architecture to the behavior extraction task, using two robots as examples. We start with a wheeled robot and continue with the Spherical robot.

Application to the FourWheeled Robot

The first robot we consider is the FourWheeled robot, as depicted in Fig. 6.4(a) and described in Section 2.2.2. The robot has four independently driven wheels. To move the robot in a reasonable way, a certain degree of coordination between the wheels is required. To illustrate the physical properties of the robot we use predefined motor values that are described by a periodic impulse function as

\[
y_i =
\begin{cases}
1 & \sin(0.2t + i\rho) > 0.5, \\
-1 & \sin(0.2t + i\rho) < -0.5, \\
0 & \text{otherwise,}
\end{cases}
\tag{6.23}
\]

with the phase shift ρ. The responses of the robot for different values of ρ are plotted in Fig. 6.4(b). Essentially, the robot only moves if the wheels on one side rotate in the same direction, i.e. for ρ = dπ with d = 0, 1, ....
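The impulse function of Eq. (6.23) can be written down directly. The following is a minimal sketch, assuming that the wheels are indexed i = 0, ..., 3 (the indexing convention is not stated here) and using the hypothetical function names `impulse_motor_value` and `motor_values`.

```python
import math


def impulse_motor_value(t, i, rho):
    """Motor value y_i at time step t for wheel i and phase shift rho, Eq. (6.23)."""
    s = math.sin(0.2 * t + i * rho)
    if s > 0.5:
        return 1.0
    if s < -0.5:
        return -1.0
    return 0.0


def motor_values(t, rho, n_wheels=4):
    """Motor values of all wheels; wheel indices 0..n_wheels-1 are an assumption."""
    return [impulse_motor_value(t, i, rho) for i in range(n_wheels)]


if __name__ == "__main__":
    # With rho = 0 all wheels receive the same impulse; with rho = pi the
    # sign alternates from one wheel index to the next.
    for rho in (0.0, math.pi):
        print(rho, motor_values(5.0, rho))
```

Under this indexing, wheels whose indices differ by two receive identical values for ρ = dπ; which wheels lie on the same side of the robot depends on the wheel numbering of the actual robot.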

6.1. Acquisition of Behavioral Primitives 155

Figure 6.5: Distribution of 8 experts on the behavior space during a 30-minute experiment with the FourWheeled robot. Each point represents the state of the robot in terms of driving velocity v and rotation velocity ω at a certain time. (a) Data points as produced by the robot; (b) clustering with experts. The color represents the number of the winning expert (w). Only points with a low prediction error of the winning expert are plotted. The center of the point cloud for each expert is marked with a numbered disk. Parameters of homeokinetic controller: C = A = 0.05, 50 Hz update rate; for competition: F = 0.01, p = 10, τE = 20.

When controlled with the homeokinetic controller, a coordinated behavior develops very quickly, such that the robot shows straight driving and curved driving with different radii.

For the behavior extraction we use the following parameters: there are r = 8 expert networks with k = 2 hidden units (see Fig. 6.1). The learning rate for the experts is F = 0.01 and the penalty factor is p = 10. The timescale for the averaging of the prediction error is τE = 20. For the remaining parameters the default values are used, see Section 6.1.3.
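To illustrate the role of the averaging timescale τE, the following sketch smooths the prediction errors of the experts with an exponential moving average and picks the expert with the lowest averaged error as the winner. This is only an assumed reading: the actual competition rule, including the penalty p, is defined in Section 6.1.3 and is not reproduced here. The function names and the update form with weight 1/τE are hypothetical.

```python
def update_averaged_errors(avg_errors, new_errors, tau_e):
    """Exponential moving average of the prediction errors.

    Assumption: tau_e is measured in time steps and enters as weight 1/tau_e.
    """
    alpha = 1.0 / tau_e
    return [(1.0 - alpha) * a + alpha * e for a, e in zip(avg_errors, new_errors)]


def winning_expert(avg_errors):
    """Index of the expert with the lowest averaged prediction error (assumed rule)."""
    return min(range(len(avg_errors)), key=avg_errors.__getitem__)


if __name__ == "__main__":
    avg = [1.0, 1.0, 1.0]
    # Expert 1 predicts consistently better than the others.
    for _ in range(50):
        avg = update_averaged_errors(avg, [0.8, 0.2, 0.9], tau_e=20)
    print(avg, winning_expert(avg))
```

A larger τE makes the averaged error react more slowly, so short prediction failures of the current winner are less likely to cause a switch between experts.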

We let the robot drive for 30 min and recorded the translational and rotational velocity as well as the index of the winning expert. The behavior of the robot in terms of translational and rotational velocities is displayed in Fig. 6.5(a). The resulting clustering is presented in Fig. 6.5(b), where a clear partitioning of the behavior space is observed. There are two experts for forward and backward driving (#1, #2) and two for rotation in place in both directions (#4, #5). Experts #3, #6, and #7 represent curved driving with different radii, and expert #8 remains unused. The histogram of winning frequencies of the experts is plotted in Fig. 6.6. The experts do not win with the same probability. This shows that the extraction is not based on the duration and frequency of a behavior but rather on its qualitative properties. The trajectory of the robot is displayed in Fig. 6.7, where each part is colored according to the winning expert. A clear and stable segmentation of the different behaviors is visible from the start of the experiment onward.
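The per-expert cluster centers in the (v, ω) plane, as marked in Fig. 6.5(b), can be obtained from the recorded data by averaging over the points won by each expert. The sketch below assumes a hypothetical record layout of (v, ω, winner, prediction error) tuples and a hypothetical error bound `max_error` for selecting the points with low prediction error; neither is specified in the text.

```python
def expert_centers(samples, max_error):
    """Mean (v, omega) of the points won by each expert.

    samples: iterable of (v, omega, winner, error) tuples (assumed layout).
    Points whose prediction error exceeds max_error are ignored.
    """
    sums = {}
    for v, omega, winner, error in samples:
        if error > max_error:
            continue
        sum_v, sum_omega, count = sums.get(winner, (0.0, 0.0, 0))
        sums[winner] = (sum_v + v, sum_omega + omega, count + 1)
    return {w: (sv / n, so / n) for w, (sv, so, n) in sums.items()}


if __name__ == "__main__":
    recorded = [
        (1.0, 0.0, 1, 0.01),   # forward driving, won by expert 1
        (0.5, 0.0, 1, 0.02),
        (0.0, 1.0, 4, 0.01),   # rotation in place, won by expert 4
        (5.0, 5.0, 4, 0.90),   # high prediction error, skipped
    ]
    print(expert_centers(recorded, max_error=0.1))
```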

[Plot: relative winning frequency (rel. freq., 0.00 to 0.25) over the expert index w = 1, ..., 8.]

Figure 6.6: Histogram of winning. For each expert the relative frequency of winning the competition is plotted. The color code is the same as in Fig. 6.5(b). The gray line at 0.01 marks the threshold below which the experts are considered to be uncommitted, here #8.

Figure 6.7: Trajectory of the robot in physical space, colored according to the active expert. The right graph shows a magnified view of the rectangular area marked in the left graph. The color code is identical to Fig. 6.5.

Figure 6.8: Parameter dependence of the number of committed experts. The blue lines stand for the setup with independent robot behaviors and the red dotted lines stand for the case of a single recorded robot. For each parameter setting 10 independent simulations have been performed, where the initial weights of the expert networks are randomly chosen. The graphs show the dependence of the number of committed experts on: (a) the averaging timescale τE; (b) the penalty p (in log-linear scale); and (c) the learning rate F. Parameters (if not varied): τE = 5, p = 10, F = 0.005.

If there are many experts available then not all of them receive sufficient training data to learn a behavior. We define a threshold at 1% of the learning time, and those experts that win less than 1% of the time are considered to be uncommitted, see also Fig. 6.6. Let us study how the number of committed experts is influenced by the system parameters. Essentially, there are three parameters to check: the learning rate of the experts F, the averaging timescale τE, and the penalty p. We use r = 12 experts in order to always have room for uncommitted ones. We use two setups, one with a differently behaving robot³ for each trial and one with a single recorded robot behavior. In the first case the particular sensorimotor data differs in the sequence and duration of behaviors. However, the number of behaviors actually shown is roughly constant. In the case of the recorded robot behavior the only source of non-determinism is the random initialization of the synaptic weights (Wⁱ, qⁱ) of the experts. We found that the extraction algorithm has little dependence on the choice of τE and p, see Fig. 6.8(a),(b). Unsurprisingly, for a higher penalty we obtain a higher number of committed experts. The learning rate F, however, has a significant impact on the number of committed experts, see Fig. 6.8(c). For very low learning rates few experts are committed, which is due to slow learning progress. Thus, the minimal errors decrease slowly and the penalty for suboptimality is less effective. Nevertheless, a wide range of values of the learning rate results in a high number of committed experts, such that F does not require fine tuning. In general, we find no over-specialization, i.e. behaviors are not split into many small sub-behaviors, independent of the parameter choice, such that uncommitted and highly adaptive experts remain available for new behaviors.
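The commitment criterion used above is simple to state in code: an expert counts as committed if it wins at least 1% of the recorded time steps. The following is a minimal sketch, assuming that the winner indices are recorded as a list of integers 1, ..., r; the function names are hypothetical.

```python
from collections import Counter


def winning_frequencies(winners, n_experts):
    """Relative winning frequency of experts 1..n_experts."""
    if not winners:
        raise ValueError("no winner indices recorded")
    counts = Counter(winners)
    total = len(winners)
    return [counts.get(k, 0) / total for k in range(1, n_experts + 1)]


def committed_experts(winners, n_experts, threshold=0.01):
    """Experts that win at least `threshold` (1% by default) of the time."""
    freqs = winning_frequencies(winners, n_experts)
    return [k for k, f in enumerate(freqs, start=1) if f >= threshold]


if __name__ == "__main__":
    # Toy record: expert 3 wins only 0.5% of the steps, expert 4 never wins.
    winners = [1] * 600 + [2] * 395 + [3] * 5
    print(winning_frequencies(winners, n_experts=4))
    print(committed_experts(winners, n_experts=4))
```

Counting the length of the returned list for each parameter setting and repetition gives the quantity plotted in Fig. 6.8.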

Let us now have a closer look at the clustering with different numbers of committed experts. For better comparison we use the same recorded robot behavior as above (Fig. 6.8). In Fig. 6.9 the winning statistics and the clustering of the behavior space for three different values of the learning rate are displayed. When only 4 experts are committed we find a symmetric arrangement of experts covering the most prominent behaviors, Fig. 6.9(a),(d). When more experts are committed, a more fine-grained clustering occurs. For example, with 5 experts there are two experts for forward driving, one for slower (#5) and one for faster (#4) motion, Fig. 6.9(b),(e). With 7 experts we get different representations for curved behaviors, e.g. left curves with slow forward or backward speed covered by experts #1 and #5 in Fig. 6.9(c),(f).

³ The robot was controlled with an independently initialized homeokinetic controller with a different instance of the noise process; thus the behavior is not identical.

[Panel title: (a) 4 Exp. (F = 0.001)]

Figure 6.9: Different numbers of committed experts depending on the learning rate. The 12 experts are sorted according to their winning frequency. The winning statistics (a-c) and the clustering of the behavior space (d-f) for simulations with F = 0.001, 0.002, 0.005 are displayed. For F = 0.001 (a,d) only 4 experts are committed, whereas for F = 0.002 we find 5 and for F = 0.005 there are 7 committed experts. See also Fig. 6.5 and Fig. 6.8. The underlying robot behavior was identical (a recorded run was played back). Parameters: τE = 5, p = 10.

[Drawing labels: ωx, ωy, ωz]

Figure 6.10: Spherical robot with angular velocities around internal axes.

Before we turn to the Spherical robot, let us summarize. The clustering of the behavior space of the FourWheeled robot was successfully performed by the proposed competing experts algorithm. The parameter dependence turned out to be graceful and only the learning rate had to be chosen appropriately. It should be emphasized that no over-specialization was observed, such that uncommitted experts are available for future novel behaviors.

Application to the Spherical Robot

[3D scatter plot; axes: angular velocities ωx, ωy, ωz, each ranging from -1 to 1.]

Figure 6.11: Partition of the behavior space of the Spherical robot with 4 experts. Each point represents the state of the robot in terms of angular velocities around the three internal axes of the robot at a certain time. Note that the angular velocities are not directly accessible by the controller and by the experts. Nevertheless, a clear partition is observed, where only three experts are committed. Parameters of homeokinetic controller: C = A = 0.1, update rate 100 Hz, extended world model (Section 4.8.4); for competition: F = 0.001, p = 1, τE = 50.

A more interesting robot in terms of behavior is the Spherical robot, which was the subject of several experiments, e.g. in Sections 4.8.4 and 5.3.1. We will now apply the competing experts setup to this robot and extract primitive behaviors. In the following experiments we use the three axis-orientation sensors described in Section 2.2.5. Thus the robot has three motors and three sensors, which results in six inputs and six outputs for the expert networks, Eq. (6.5). As the number of hidden units we have chosen k = 4, instead of 2 in the case of the FourWheeled robot, because the behaviors have a more complex dynamical structure. For example, when the robot rolls on a flat surface, the sensor values perform harmonic oscillations with different amplitudes. The frequency of these oscillations depends on the velocity of the robot. If the rotation axis matches one of the internal axes (as in Fig. 4.28 (p. 107), A-C), then one sensor value has zero amplitude. The internal masses must perform an oscillation with a suitable frequency and phase shift in order to produce a coherent rolling behavior. In this way each particular behavior is characterized by an orbit in the sensorimotor space, and the orbits of different behaviors partially overlap. In order to obtain a suitable visualization, we consider the angular velocities ωx, ωy, ωz around the three internal axes, as depicted in Fig. 6.10. A rolling behavior with fixed rotation axis and fixed velocity is represented by a single point in the space of angular velocities.
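The harmonic sensor signals described above can be reproduced with a purely kinematic sketch. It assumes that each axis-orientation sensor reports the vertical (z) component of one internal axis in world coordinates, that the internal axes are aligned with the world axes at t = 0, and that the sphere rolls about a fixed horizontal axis at constant angular speed. These modelling choices and all function names are assumptions made for illustration; the sensors themselves are defined in Section 2.2.5.

```python
import math

# Internal axes of the robot at t = 0, assumed to be aligned with the world frame.
INTERNAL_AXES = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def rotate(v, axis, angle):
    """Rotate vector v about the unit vector `axis` by `angle` (Rodrigues' formula)."""
    kx, ky, kz = axis
    vx, vy, vz = v
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(vi * c + ci * s + ki * dot * (1.0 - c)
                 for vi, ci, ki in zip(v, cross, axis))


def sensor_values(roll_axis, speed, t):
    """Assumed sensor model: z-component of each internal axis at time t."""
    angle = speed * t
    return [rotate(e, roll_axis, angle)[2] for e in INTERNAL_AXES]


def amplitudes(roll_axis, speed, steps=200):
    """Oscillation amplitude of each sensor over one full revolution."""
    period = 2.0 * math.pi / speed
    samples = [sensor_values(roll_axis, speed, period * k / steps)
               for k in range(steps)]
    return [(max(col) - min(col)) / 2.0 for col in zip(*samples)]


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    print(amplitudes((1.0, 0.0, 0.0), speed=0.5))  # rolling about the internal x-axis
    print(amplitudes((s, s, 0.0), speed=0.5))      # rolling about a diagonal axis
```

In this model, rolling about the internal x-axis leaves the first sensor at zero amplitude, while a diagonal rolling axis yields three oscillations with different amplitudes. The rolling speed changes only the frequency of the signals, which is why a point in the space of angular velocities corresponds to a whole orbit in sensor space.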

In a first experiment we provide only r = 4 experts and select a low learning rate and low penalty, such that only 3 experts are committed. The resulting clustering in terms of angular velocities is depicted in Fig. 6.11. Each of the three experts occupies the rotation around one particular internal axis, however, without discrimination of forward and backward motion.

In a longer experiment with r = 20 experts and suitably selected parameters we find a much more fine-grained partition of the behavior space, as depicted in Fig. 6.12. Not only is the symmetry broken between forward and backward rolling around the different axes, but there are also experts for different speeds around one and the same axis, e.g. experts #3 and #4. It should be noted that the experts cannot sense the rotation speed directly, because their inputs are only axis orientations. So in terms of inputs and outputs (sensor and motor values) the clusters are actually periodic orbits in a six-dimensional space. This nicely demonstrates the potential of the competing experts approach, which does not depend on the structure of the input space but only on the prediction performance.


[3D scatter plot; axes: angular velocities ωx, ωy, ωz, each ranging from -2 to 2; legend: experts 2, 3, 4, 5, 10, 11, 13, 15, 16, 20.]

Figure 6.12: Partition of the behavior space of the Spherical robot with 20 experts. Each point represents the state of the robot in terms of angular velocities around the axes of the robot at a certain time. For clarity only a selection of experts is drawn. In contrast to Fig. 6.11 the experts specialize on specific velocities. Parameters of homeokinetic controller: C = A = 0.1, update rate 100 Hz, extended world model (Section 4.8.4); for competition: F = 0.005, p = 10, τE = 50.

Figure 6.13: Attractor behaviors of the experts for the FourWheeled robot. The colored points and the transparent discs show the training data for the experts (points where the expert won the competition), see Fig. 6.5 (p. 155) for a description. The black disks mark the attractor behavior of each expert.

From the document: Goal-Oriented Control of Self-Organizing Behavior in Autonomous Robots (pages 162-170)