9. Emphasizing the “positive” in positive reinforcement: using nonbinary rewarding for

9.5 Results

9.5.3 Example 3: Teaching abstract cues

27E). We then applied an NB-PRT regime that only provided a low reward for RTs longer than 400 ms. Trials with RTs faster than 300 ms received a high reward, and all other RTs were rewarded as during the preceding block of binary rewarding. This regime is much more likely to allow the monkey to distinguish the association between different behaviors and different reward amounts. Note, however, that only 15.1% of trials were treated differently than during the preceding session with binary rewarding, whereas all other trials received exactly the same reward as before. Nevertheless, this NB-PRT regime had a strong effect on the monkey’s performance and caused a clear leftward shift of the entire RT distribution (Fig. 27E). Based on Mann-Whitney statistics (Z = 7.32; P < 10⁻¹², n = [1,352, 1,658], R = 0.13), the estimated probability EP for getting a faster response in the NB-PRT regime was 0.58.
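The reward rule described above can be sketched as a simple mapping from RT to reward level. This is a minimal illustration, not the authors' implementation: the 300-ms and 400-ms thresholds come from the text, whereas the names `Reward`, `graded_reward`, and `fraction_changed` are hypothetical, and the handling of RTs exactly at a threshold (treated here as standard reward) is an assumption.

```python
from enum import Enum


class Reward(Enum):
    LOW = "low"
    STANDARD = "standard"  # same amount as in the preceding binary block
    HIGH = "high"


# Thresholds taken from the text; reward levels are labels, not volumes.
FAST_RT_MS = 300.0
SLOW_RT_MS = 400.0


def graded_reward(rt_ms: float) -> Reward:
    """Map a reaction time (ms) to a reward level under the graded regime."""
    if rt_ms < FAST_RT_MS:
        return Reward.HIGH
    if rt_ms > SLOW_RT_MS:
        return Reward.LOW
    return Reward.STANDARD


def fraction_changed(rts_ms) -> float:
    """Fraction of trials rewarded differently than under binary rewarding."""
    changed = sum(graded_reward(rt) is not Reward.STANDARD for rt in rts_ms)
    return changed / len(rts_ms)
```

Applied to the RTs of a session, `fraction_changed` gives the share of trials whose reward differs from the binary block, which is the kind of quantity reported as 15.1% in the text.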

We applied a similar regime to M2 (Fig. 26F) after it had finished fixation training and obtained essentially the same result (Z = 6.77; P < 10⁻¹⁰, n = [851, 1,198], R = 0.15, EP = 0.59). Thus, even with a simple fixation paradigm, graded rewarding exerted clear effects on RT distributions, indicating that the trial-wise reward outcome has a direct impact on an animal’s performance. If properly chosen, graded rewarding can support the willingness of the animal to spend effort and provides a useful tool to guide the animal toward the desired behavior.
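The two statistics reported for M1 and M2 can be illustrated with a short sketch. The reported R values are consistent with the common convention r = Z/√N, where N is the total number of trials; that the authors used exactly this formula is an assumption. The function `prob_faster` shows one standard way to estimate the probability that a randomly drawn RT from one condition is faster than one from the other (equivalent to U/(n₁n₂) for the Mann-Whitney U statistic). Both function names are hypothetical.

```python
import math


def effect_size_r(z: float, n1: int, n2: int) -> float:
    """Effect size r = Z / sqrt(N) for a two-sample rank test (assumed convention)."""
    return z / math.sqrt(n1 + n2)


def prob_faster(rts_a, rts_b) -> float:
    """Estimate P(a < b) for random draws a from rts_a and b from rts_b.

    Ties count one half, so the result equals U / (n_a * n_b).
    """
    wins = 0.0
    for a in rts_a:
        for b in rts_b:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(rts_a) * len(rts_b))


# Z values and trial counts as reported in the text
print(round(effect_size_r(7.32, 1352, 1658), 2))  # M1 -> 0.13
print(round(effect_size_r(6.77, 851, 1198), 2))   # M2 -> 0.15
```

An EP of 0.5 would indicate no difference between regimes, so values of 0.58 and 0.59 correspond to a modest but consistent speed advantage under graded rewarding.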

introduce target events (change in color, motion direction, orientation, or the like) at uncued items, which have to be ignored by the animal. With binary rewarding, responses to such uncued events are considered errors, resulting in the termination of the trial [Galashan et al. 2013; Reynolds et al. 1999; Treue & Maunsell 1996; Wegener et al. 2004].

Fig. 28: Training of abstract cues. A: visual display. The monkey was required to gaze at the fixation point (FP) in the center of the display. A rectangular frame indicated the mutual position of the object to undergo the to-be-detected speed-change event (left). Gabor stimuli appeared subsequently at 2 FP-mirrored locations (right). The desired behavior was to use the cue information for allocating spatial attention (bold dashed lines). B: in valid trials, the speed change occurred at the cued object, whereas it occurred at the uncued object in invalid trials. Validity (indicated as color-coded percentages) was changed over the course of training sessions. C: empirical reaction-time (RT) distributions for cued and uncued changes. Insets depict the mean parameters (µ, mean; σ, variability) of ex-Gaussian fits to each of the RT distributions, depending on cue validity. Color code is same as in B. D: relative amount of reward obtained in response to cued and uncued changes, depending on cue validity. Trials with medium reward are disregarded for simplicity. E: average reward amount (vol.) per 100 trials during early (90% validity) and late (75% validity) training sessions. Error bars are SD.

Teaching the animal the meaning of such abstract cues frequently is a matter of patience. From the perspective of the trainer, the task rule might be straightforward because reward is only provided for responses to the cued item, yet, from the perspective of the animal, the situation is less clear. Even with selective attention directed to the cued item, a target event occurring at the uncued item usually is not out of sight but may be well perceived [Braun & Sagi 1990; Wegener et al. 2008]. Responding to this event is exactly what has been trained before; for the example in Fig. 28A, the monkey would be perfectly in line with the previously learned task rules if it indicated a speed change of any of the two stimuli, independently of the cue. With a binary reward, there is no clear alternative to guide the animal toward the desired behavior other than adjusting the number of errors by some means, assuming that at some point the animal makes the association between cue appearance at one of the stimulus locations and reward delivery following a correct response to that stimulus.

NB-PRT, in contrast, provides additional options for shaping the monkey’s behavior, relying on the monkey’s natural expertise: getting the highest reward. In this and the following example, we show how NB-PRT can be used to teach abstract cues within a few sessions.

The first example continues the training situation of monkey M1, described above (Figs. 25 and 27, A-C). The spatial cue indicating the position of the upcoming target had been introduced when the second stimulus was added to the display, as mentioned previously. Yet, the cue had no obvious meaning with respect to uncued events, because such events were not present in earlier sessions. To use nonbinary rewarding for teaching the meaning of the cue, we chose a Posner paradigm-like design [Petersen et al. 1987; Posner 1980] and presented speed changes at the uncued location in a fraction of trials. We rewarded responses to such uncued events in the same way as responses to cued events, with high, medium, and small reward for fast, medium, and slow RTs, respectively. The rationale of this scheme was to test whether the monkey would distribute attention over the whole display (likely resulting in RTs of medium length and a relatively safe amount of medium reward) or, alternatively, use the information provided by the cue and direct attention to one stimulus selectively (increasing the chance of a high reward for fast RTs, at the risk of getting a small reward for slow RTs if changes occur outside the attentional focus). If the monkey mainly relies on maximizing reward in the short term (i.e., in single trials) and/or the reward regime is chosen to provide a higher mean reward per trial in the long term (i.e., over the entire session), choosing the second of the two options is beneficial.
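The Posner-like design, in which the change occurs at the cued location in most trials and at the uncued location in the remainder, can be sketched as a trial-list generator. This is only an illustration under stated assumptions: the function name `make_trials`, the "left"/"right" location labels, the random choice of cued side, and the validity value used in the example are hypothetical and not taken from the experimental software.

```python
import random


def make_trials(n_trials: int, validity: float, seed: int = 0):
    """Return a shuffled list of (cued_side, change_side) tuples.

    A trial is valid when the speed change occurs at the cued side.
    The number of valid trials is fixed at round(n_trials * validity).
    """
    rng = random.Random(seed)
    n_valid = round(n_trials * validity)
    trials = []
    for i in range(n_trials):
        cued = rng.choice(["left", "right"])  # hypothetical location labels
        other = "right" if cued == "left" else "left"
        change = cued if i < n_valid else other
        trials.append((cued, change))
    rng.shuffle(trials)  # interleave valid and invalid trials
    return trials


# Example with an illustrative cue validity of 0.9
session = make_trials(n_trials=200, validity=0.9)
n_valid = sum(cued == change for cued, change in session)
print(n_valid / len(session))
```

Fixing the number of valid trials per session, rather than drawing validity independently on every trial, keeps the realized validity equal to the nominal one; either choice would be compatible with the description in the text.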

We started with a cue validity of 90% and then reduced it to 85%, 80%, and 75% (Fig. 28B).

Stepwise decreasing cue validity was chosen to slowly accustom the animal to uncued changes. Each validity was kept for six sessions, with the exception of the 80% condition, which was applied in only one session. To foster selective attention, the speed-change magnitude was reduced from 100% to 80% for all sessions. For different cue validities, RTs were analyzed by fitting ex-Gaussians to their distributions (Fig. 28C). With reduced validity, RTs to cued changes became increasingly faster [µ (σ) at 90%, 85%, 80%, and 75% validity: 303 (22), 295 (23), 291 (23), and 287 (19) ms; Wilcoxon rank sum test (90% vs. 75% validity), Z = 2.8, P = 0.0051, n = 6], whereas RTs to uncued changes became increasingly slower due to an increase in the exponential component of the distribution (τ at 90%, 85%, 80%, and 75% validity: 45, 66, 120, and 143 ms; Z = 2.8, P = 0.0051). Based on Mann-Whitney statistics, the estimated probability EP of getting a faster response for cued changes was 59.6% in the 90% validity condition and increased to 77% in the 75% validity condition.

This shows that the training by RT-dependent rewarding was highly effective, even though responses to uncued changes were not treated as errors. At the end of the training, the median RT difference between cued and uncued changes was as large as 50 ms. This is about the same magnitude as observed in human psychophysical experiments using the same stimuli and paradigm, after verbal instructions [Wegener et al. 2008].
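To make the role of the exponential component concrete, the ex-Gaussian density (a Gaussian with parameters µ and σ plus an independent exponential with mean τ) can be sketched as follows. The τ values of 45 and 143 ms are those reported above for uncued changes; the values µ = 300 ms and σ = 20 ms are purely illustrative assumptions, because the Gaussian parameters for uncued changes are not given in the text. Function names are hypothetical.

```python
import math


def exgauss_pdf(x: float, mu: float, sigma: float, tau: float) -> float:
    """Density of Gaussian(mu, sigma) plus an independent exponential(mean tau)."""
    z = (mu - x) / sigma + sigma / tau
    return (
        math.exp(sigma**2 / (2 * tau**2) - (x - mu) / tau)
        * math.erfc(z / math.sqrt(2))
        / (2 * tau)
    )


def prob_rt_above(threshold_ms, mu, sigma, tau, upper_ms=5000.0, step_ms=0.5):
    """Approximate P(RT > threshold) with a midpoint Riemann sum up to upper_ms."""
    n = int((upper_ms - threshold_ms) / step_ms)
    total = sum(
        exgauss_pdf(threshold_ms + (i + 0.5) * step_ms, mu, sigma, tau)
        for i in range(n)
    )
    return total * step_ms


MU, SIGMA = 300.0, 20.0  # illustrative assumptions, not reported values
for tau in (45.0, 143.0):  # tau for uncued changes at 90% and 75% validity
    mean_rt = MU + tau  # the mean of an ex-Gaussian is mu + tau
    print(tau, mean_rt, round(prob_rt_above(400.0, MU, SIGMA, tau), 3))
```

With µ and σ held fixed, a larger τ leaves the leading edge of the distribution largely unchanged but shifts mass into the slow tail, which is the pattern described for uncued changes.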

If reward amount is not considered, this significant, cue validity-associated RT effect may seem surprising at first glance, because a higher number of uncued changes is expected to promote distributed rather than spatially selective attention. However, a closer look at the reward schedule helps to explain the behavior of the monkey: the nonbinary rewarding allowed it to select a 50% reward benefit for fast RTs and a 33% loss for slow RTs, compared with medium RTs. With decreasing cue validity, considering the cue information and allocating attention to the cued object not only helps the monkey to get a high reward more often in a single trial but, at the same time, compensates for a probable loss in average reward amount in the long run. In fact, for cued changes, the monkey significantly increased its ratio of high-reward trials from 0.57 in the 90% validity condition to 0.74 in the 75% validity condition (Wilcoxon rank sum test, Z = 2.48, P = 0.013, n = 6; Fig. 28D). This was at the expense of reward for uncued changes, which showed an increase in the ratio of low-reward trials from 0.07 to 0.3 (Z = 2.81, P = 0.005). Thus, despite reduced cue validity and higher uncertainty of target location, focusing on the cued targets allowed the monkey not only to keep the same amount of reward obtained per trial but even to achieve a slight (although nonsignificant) increase in the average amount of reward per trial (Z = 1.52, P = 0.128; Fig. 28E).
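The trade-off described above can be made explicit with a small expected-value calculation. The reward ratios (50% benefit for fast RTs, 33% loss for slow RTs, relative to medium) and the proportions 0.74 and 0.3 at 75% validity come from the text. Everything else is a simplifying assumption for illustration: rewards are expressed in units of the medium reward, all remaining cued and uncued trials are assumed to earn the medium reward, and the function name is hypothetical. The result is therefore a rough sketch, not a reconstruction of the measured reward volumes in Fig. 28E.

```python
MEDIUM = 1.0                 # reward unit: one medium reward
HIGH = MEDIUM * (1 + 0.50)   # 50% benefit for fast RTs
LOW = MEDIUM * (1 - 0.33)    # 33% loss for slow RTs


def expected_reward(validity: float, p_high_cued: float, p_low_uncued: float) -> float:
    """Mean reward per trial, assuming all remaining trials earn MEDIUM."""
    cued = p_high_cued * HIGH + (1 - p_high_cued) * MEDIUM
    uncued = p_low_uncued * LOW + (1 - p_low_uncued) * MEDIUM
    return validity * cued + (1 - validity) * uncued


# Focused strategy at 75% validity, using the proportions reported in the text
focused = expected_reward(validity=0.75, p_high_cued=0.74, p_low_uncued=0.3)
# Idealized distributed strategy: every response earns the medium reward
distributed = MEDIUM
print(round(focused, 2), distributed)
```

Under these simplifications, the gain from frequent high rewards on cued trials outweighs the loss from occasional low rewards on uncued trials, so the focused strategy stays above the all-medium baseline even at 75% validity.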