
5.1.2 Direct Sensor Teaching

In this section we transfer the direct teaching paradigm from motor teaching signals (Section 5.1.1) to sensor teaching signals. This way of teaching is useful because desired sensor values can often be obtained more easily than motor values, for instance by passively moving the robot, or parts of the robot, provided that proprioceptive sensors are present.

This kind of teaching is also common when humans learn a new skill, e.g. a tennis trainer teaching a new stroke by moving the arm and the racket of the learner. In this way a series of nominal sensations can be acquired that serve as teaching signals. Setups where the desired outputs are provided in a different domain than the actual controller outputs are called distal learning [50, 74, 155]. Usually a forward model is learned that maps actions to sensations (or, more generally, to the space of the desired output signals). The mismatch between a desired and an observed sensation can then be backpropagated to obtain the required change of action. Alternatively, the backpropagation can be replaced by an inversion of the forward model. Another option is a backward model, which learns the mapping from sensations to actions. The main difference between backward models and inverted forward models is the handling of noisy subspaces: the inverted forward model expands these subspaces, whereas a backward model shrinks them.
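The different treatment of noisy subspaces can be made concrete with a small numerical sketch. The following example is purely illustrative (all names and numbers are assumptions, not part of the framework used in this thesis): it uses a linear forward model whose second sensor dimension is dominated by noise, so the pseudoinverse of the forward model amplifies errors in that dimension, while a least-squares backward model suppresses them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear forward model: 2 motor dims -> 2 sensor dims.
# The second sensor dimension barely depends on the action (gain 0.01),
# so it is dominated by the observation noise.
A = np.array([[1.0, 0.0],
              [0.0, 0.01]])

actions = rng.uniform(-1, 1, size=(1000, 2))
sensations = actions @ A.T + 0.1 * rng.normal(size=(1000, 2))

# Inverted forward model: the pseudoinverse amplifies the noisy sensor
# dimension by 1/0.01 = 100 (expansion of the noisy subspace).
A_pinv = np.linalg.pinv(A)

# Backward model: least-squares regression from sensations to actions
# discounts the noisy dimension (shrinkage).
B, *_ = np.linalg.lstsq(sensations, actions, rcond=None)

xi = np.array([0.0, 0.5])   # sensor error only in the noisy dimension
print(A_pinv @ xi)          # large action change (about [0, 50])
print(B.T @ xi)             # small action change
```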

In our case a forward model is already at hand, namely the internal world model, see Eq. (3.16). Instead of a backpropagation we can invert the world model directly, as done in Sections 4.1.2 and 4.1.6. Let the sensor teaching signal be given by $x^D_t$. The distal learning error is the misfit $\xi^D_t$ between desired sensations $x^D_t$ and predicted sensations $\tilde{x}_t$ (Eq. (4.2)), thus

$$\xi^D_t = x^D_t - \tilde{x}_t \,. \qquad (5.8)$$

From Eq. (5.8) and using the world model $M$ (Eq. (4.2)) we can calculate a misfit $\eta^S_t$ in motor space that satisfies

$$x^D_t = \tilde{x}_t + \xi^D_t \overset{!}{=} M(x_{t-1},\, y_{t-1} + \eta^S_t) \,. \qquad (5.9)$$

Using a linearization we can write

$$\xi^D_t = M'_y(x_{t-1}, y_{t-1})\, \eta^S_t + O\!\big((\eta^S_t)^2\big) \,, \qquad (5.10)$$

where $M'_y$ denotes the derivative of $M$ with respect to $y$. Using the pseudoinverse $M'^{+}_y$ of the derivative of the world model $M$ we can obtain $\eta^S_t$ in a linearized way as

$$\eta^S_t = M'^{+}_y(x_{t-1}, y_{t-1})\, \xi^D_t \,. \qquad (5.11)$$
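For a general (nonlinear) world model, the derivative $M'_y$ can be approximated numerically. The following sketch, with hypothetical function and variable names, computes $\eta^S_t$ according to Eq. (5.11) via a finite-difference Jacobian and its pseudoinverse; for the linear model of Eq. (4.19) this reproduces the exact result.

```python
import numpy as np

def eta_from_sensor_misfit(M, x_prev, y_prev, xi_D, eps=1e-5):
    """Motor-space misfit eta^S_t = M'_y^+(x_{t-1}, y_{t-1}) xi^D_t, Eq. (5.11).

    M maps (sensor values, motor values) to predicted sensations; its
    derivative with respect to the motor values is approximated by
    finite differences.
    """
    base = M(x_prev, y_prev)
    J = np.empty((len(base), len(y_prev)))
    for i in range(len(y_prev)):
        dy = np.zeros(len(y_prev))
        dy[i] = eps
        J[:, i] = (M(x_prev, y_prev + dy) - base) / eps  # i-th column of M'_y
    return np.linalg.pinv(J) @ xi_D

# Example with a linear world model M(x, y) = A y + b (cf. Eq. (4.19)):
A = np.array([[0.5, 0.1], [0.0, 0.8]])
b = np.zeros(2)
eta = eta_from_sensor_misfit(lambda x, y: A @ y + b,
                             np.zeros(2), np.zeros(2), np.array([0.2, -0.1]))
```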

Since our particular implementation of the world model is linear ($M(x_t, y_t) = A y_t + b$, cf. Eq. (4.19)), we can obtain the exact formula for $\eta^S_t$ as

$$\eta^S_t = A^{+} \xi^D_t \,. \qquad (5.12)$$

Alternatively, we can calculate the motor teaching signal as $y^S_t = A^{+}(x^D_t - b)$, which makes it easier to confine it to an appropriate interval, see Section 5.1.1. Now the update formulas for $C$ and $h$ from the direct motor teaching setup can be used, cf. Eqs. (5.3, 5.4).
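For the linear world model, both variants amount to a single pseudoinverse application. A minimal sketch, assuming motor values are confined to $[-1, 1]$ (the actual interval depends on the robot):

```python
import numpy as np

def motor_teaching_signal(A, b, x_D, bound=1.0):
    """y^S_t = A^+ (x^D_t - b) for the linear world model M(x, y) = A y + b,
    clipped to the admissible motor interval [-bound, bound]
    (the bound is an illustrative assumption, cf. Section 5.1.1)."""
    y_S = np.linalg.pinv(A) @ (x_D - b)
    return np.clip(y_S, -bound, bound)
```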

Note that the extended world model (Eq. (4.91)) used in Sections 4.8.4 and 4.8.5 has the same derivative with respect to $y$, such that the formulas remain identical. Analogously to Eqs. (5.5, 5.6) and by using Eq. (5.11) we find the following update rules for $C$ and $h$:

$$\frac{1}{\varepsilon_C}\,\Delta C = -\frac{\partial E}{\partial C} + \gamma_D \left( \left(JJ^{\top}\right)^{-1} A^{+} \xi^D_t \right) \circ g' \; x^{\top}_{t-1} \,, \qquad (5.13)$$

$$\frac{1}{\varepsilon_C}\,\Delta h = -\frac{\partial E}{\partial h} + \gamma_D \left( \left(JJ^{\top}\right)^{-1} A^{+} \xi^D_t \right) \circ g' \,, \qquad (5.14)$$

where $g'$ is evaluated at $C x_{t-1} + h$. The guidance factor, here called $\gamma_D$, regulates the strength of the additional drive.
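A compact sketch of these update rules, with hypothetical names: the homeokinetic gradients $\partial E/\partial C$ and $\partial E/\partial h$ are assumed to be supplied by the controller, and $(JJ^{\top})^{-1}$ is applied by solving a linear system rather than forming the inverse explicitly.

```python
import numpy as np

def guided_updates(dE_dC, dE_dh, J, A, xi_D, g_prime, x_prev, gamma_D, eps_C):
    """Sketch of Eqs. (5.13, 5.14): homeokinetic gradient plus teaching drive.

    g_prime is g'(C x_{t-1} + h), applied elementwise (the 'circ' product);
    J is the Jacobian appearing in the homeokinetic learning rule.
    """
    drive = np.linalg.solve(J @ J.T, np.linalg.pinv(A) @ xi_D) * g_prime
    delta_C = eps_C * (-dE_dC + gamma_D * np.outer(drive, x_prev))
    delta_h = eps_C * (-dE_dh + gamma_D * drive)
    return delta_C, delta_h
```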

The application to the TwoWheeled driving robot, as it was done in the previous section, is trivial since the world model consists essentially of a unit matrix. Among the previously considered robots the Spherical robot has a non-trivial relation between sensor and motor values, hence we will use it in the following experiment.


Figure 5.2: Spherical robot and its behavior in a distal learning setup. (a) Illustration of the robot with its sensor values; (b) behavior for the distal learning task, Eq. (5.15). The plot shows the percentage of rotation around each of the axes for different values of the guidance factor $\gamma_D$ (no teaching for $\gamma_D = 0$). The rotation around the red axis is clearly preferred for non-zero $\gamma_D$. Shown are the mean and standard deviation of 10 runs, each 60 min long, excluding the first 10 min (initial transient, no teaching). Parameters: $\varepsilon_C = \varepsilon_A = 0.1$, update rate 100 Hz.

Experiment

In this experiment we use the Spherical robot, as described in Section 2.2.5. For each axis we have one sensor value, namely the z-component of the axis vector in world coordinates, as illustrated in Fig. 5.2(a). We use the world model extension proposed in Section 4.8 with the bias towards the self-induced interpretation of sensor values, see Section 4.8.4.
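For reference, these sensor values can be read off a rotation matrix directly. A minimal sketch, assuming the columns of $R$ hold the robot's internal axes expressed in world coordinates:

```python
import numpy as np

def axis_sensors(R):
    """z-components of the three internal axis vectors in world coordinates,
    i.e. the sensor values of the Spherical robot described above."""
    return np.asarray(R)[2, :].copy()
```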

The objective of this experiment is to show that a simple teaching signal in terms of sensor values can effectively guide the behavior of the Spherical robot towards rotations around the first internal axis. To achieve that, we use a distal learning task that requests a low value of the first sensor. More precisely,

$$x^D_t = \big(0,\; \tilde{x}_{t,2},\; \tilde{x}_{t,3}\big)^{\top} \,, \qquad (5.15)$$

such that only the first component of the sensor value produces an error signal. We expect that the robot will preferably rotate around the first axis, since this keeps the first sensor value low, see Fig. 5.2(a). For the evaluation we performed, for different values of the guidance factor, 10 runs of 60 min each with the robot on level ground. The distal learning setup requires a well trained world model, therefore no teaching signal was provided during the first 10 min of each run. As a descriptive measure of the behavior we used the index of the internal axis around which the highest rotational velocity was measured at each moment of time. Figure 5.2(b) displays, for different values of the guidance factor $\gamma_D$ and for each of the axes, the percentage of time it was the major axis of rotation. Without teaching there is no preferred axis of rotation. With distal learning the robot shows a significant preference, up to 75% of the time, for rotation around the first axis.
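The descriptive measure can be computed from the recorded rotational velocities; a sketch assuming one row per time step and one column per internal axis:

```python
import numpy as np

def major_axis_percentages(omega):
    """Percentage of time steps at which each internal axis carried the
    highest (absolute) rotational velocity. omega has shape (T, 3)."""
    major = np.argmax(np.abs(omega), axis=1)
    return np.array([100.0 * np.mean(major == k) for k in range(3)])
```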

For overly strong teaching a large variance in the performance occurs. This is caused by an overly strong influence of the teaching signal on the learning dynamics. Remember that the rolling modes emerge due to the fine regulation of the sensorimotor loop to the working regime of the homeokinetic controller, which cannot be maintained for large values of $\gamma_D$. Why is it not possible with this method to force the controller to stay in the rotational mode around the first axis? The answer is rather intuitive: when the robot is in this rotational mode the teaching signal is negligible. However, the controller's sensitization will increase the impact of the first sensor, such that the mode becomes unstable again.

To summarize, the direct teaching mechanism proposed in Section 5.1.1 allows us to specify motor patterns that are followed more or less closely, depending on the strength of integration of the additional force into the learning dynamics. In this section we considered sensor teaching signals that were transformed into motor teaching signals using the internal world model. We have shown that the Spherical robot with the homeokinetic controller can be guided to locomote mostly around one particular axis by specifying a constant sensor teaching signal at one of the sensors. The supervised learning in terms of sensor signals is one step in the direction of imitation learning. Imitation learning [142] deals with how an autonomous robot can acquire behaviors from other agents or from humans. In this setting the robot typically perceives the movement via a camera, or perhaps via its joint sensors if the demonstrator moves the parts of the robot's body. In the latter case the methods proposed here can be used directly. If only visual information is available, the correspondence problem concerning the mismatch between the teacher's body and the robot's body needs to be resolved [34, 49, 134], which is not discussed here.