
II. STUDIES

II.1 Study 1: Feedback-Related Brain Activity Predicts Learning from Feedback in

II.1.2. Introduction

II.1.4.1 The feedback-locked P300

The P300 has been shown to play an important role both in feedback processing and in memory encoding. In decision-making tasks, the feedback-locked P300 has been shown to vary with reward expectancies (Hajcak et al., 2005; Hajcak et al., 2006) and reward magnitude (Yeung & Sanfey, 2004), but not always with feedback valence (Yeung & Sanfey, 2004; but see Bellebaum & Daum, 2008; Bellebaum, Polezzi, & Daum, 2010; Hajcak, Moser, Holroyd, & Simons, 2007; van der Helden et al., 2010), which led to the idea that it is related to the update of outcome expectancies. In memory tasks, the P300 was related to learning success mainly under conditions where rote learning, rather than elaborate processing, was involved (Fabiani et al., 1990). Both findings are consistent with the idea that the P300 is related to the updating of working memory and, hence, to explicit encoding of new information (Donchin, 1981; Polich, 2007; Polich & Kok, 1995). The present results integrate findings from literature on feedback processing and memory. First, we showed that the P300 is directly related to the processing of feedback about the correct response. This is suggested by the finding that positive feedback is associated with a larger P300 than is negative feedback (e.g., Hajcak et al., 2007). Second, the P300 for negative feedback was predictive of learning from feedback, replicating the dm effect (Paller et al., 1987) in the context of feedback processing. Together, these findings suggest that the P300 reflects a learning process that integrates both feedback about the initial response and information about the correct response.

To account for these results, we propose that the P300 to corrective feedback represents a learning process triggered by a feedback-based evaluation of the initial response in working memory. This account is based on the idea that learning from feedback is particularly fast and efficient if feedback confirms that a response that is already held in working memory is correct. In this case, an immediate update of working memory is triggered in which the correct response is linked to the probe and further context information that facilitates retrieval. The P300 could represent either the working memory update itself or, alternatively, the decision process by which the feedback is evaluated (for a similar idea in the context of error detection, see Steinhauser & Yeung, 2010). Because participants normally hold only the selected response in working memory, fast learning indicated by a large P300 is obtained mainly for positive feedback trials. This can also account for the observation that performance is improved for items that were associated with a correct guess in the initial block. Under some conditions, however, fast learning can also occur for negative feedback trials – for example, because participants hold more than one response in working memory, or because participants decide on one answer but press another response. In this case, there is an increased P300 also for negative items, and this explains why the P300 is predictive of successful learning from negative feedback.

II.1.4.2 The frontal positivity

A second component that predicted successful learning from feedback was the early frontal positivity that immediately succeeded the P300. Butterfield and Mangels (2003) found a similar component following successful learning from corrective feedback in a semantic retrieval task. Because the frontal positivity varied with the expectedness of negative feedback, they assumed a functional relationship between this component and the P3a or novelty P3 – a component of the stimulus-locked ERP reflecting an attentional orienting response to a novel or unexpected stimulus (Simons et al., 2001). Whereas the P3a typically precedes the posterior P300 (or P3b), this order seems to be reversed in the present data. This might reflect that the feedback-locked P300 is related to a fast learning process, whereas the frontal positivity represents attentional orienting that precedes a slower, more elaborate learning following negative feedback. If this interpretation is valid, a reduced frontal positivity in E–E items could reflect that a lack of attentional orienting has increased the probability that learning from feedback failed.

The finding that learning from corrective feedback is predicted by a frontal positivity resembles the results from the literature on memory processing that reports a late frontal positivity that predicts later memory retrieval (Fabiani et al., 1990; Kim et al., 2009; Mangels et al., 2001; Weyerts et al., 1997). Although the attentional orienting response represented by the early frontal positivity is presumably related to later – perhaps more elaborate – memory processing, it is unlikely that both phenomena represent identical mechanisms. The present frontal positivity occurs in the time range of the P300, whereas the late frontal positivity typically starts at 600 ms or later. The fact that we did not obtain a late frontal positivity that predicts learning success might be related to the present stimuli. Because our Swahili words convey no meaning by themselves, it is difficult to improve learning by more elaborate processing.

II.1.4.3 The FRN

Finally, we obtained an FRN that was larger for negative feedback trials than for positive feedback trials. Given that the FRN has been assumed to reflect reinforcement learning (Holroyd & Coles, 2002), this result clearly demonstrates that feedback in the present task is processed as a reinforcer. However, the amplitude of the FRN was not positively correlated with learning success. Rather, larger FRN amplitudes were even associated with impaired learning, although this result was obtained only when peak-to-peak amplitudes were considered. These results could reflect that reinforcement learning is an automatic process, which cannot be prevented but which does not improve learning success in the present task, presumably because learning requires explicit memory processes. The increased FRN for E–E items could reflect that on some trials, participants adopted a more reinforcement-related strategy at the cost of explicit learning (e.g., by attending more strongly to feedback valence than to information about the correct response), which enhanced the FRN on these trials while reducing the probability that an item was learned.
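The peak-to-peak measure mentioned above can be illustrated with a minimal sketch. It assumes one common definition: the most negative point in an FRN search window minus the most positive point preceding it. The function name and both time windows are hypothetical choices for illustration and are not taken from the present study.

```python
def peak_to_peak_frn(times_ms, amplitudes_uv,
                     frn_window=(200, 350), pos_window_start=150):
    """Return a peak-to-peak FRN amplitude in microvolts.

    More negative values indicate a larger FRN. The window limits are
    illustrative assumptions, not the parameters used in the study.
    """
    samples = list(zip(times_ms, amplitudes_uv))
    # Most negative sample inside the FRN search window.
    frn_candidates = [(a, t) for t, a in samples
                      if frn_window[0] <= t <= frn_window[1]]
    if not frn_candidates:
        raise ValueError("no samples inside the FRN window")
    neg_amp, neg_time = min(frn_candidates)
    # Most positive sample that precedes the negative peak.
    pos_candidates = [a for t, a in samples
                      if pos_window_start <= t < neg_time]
    if not pos_candidates:
        raise ValueError("no samples before the negative peak")
    return neg_amp - max(pos_candidates)
```

Unlike a mean amplitude in a fixed window, a measure of this kind is referenced to the preceding positivity, which is one possible reason why the two measures can lead to different conclusions.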

These results are consistent with a recent finding from Mangels et al. (2011), who investigated the relation between feedback-related brain activity in a complex math test with multiple-choice items and the participants’ decision to voluntarily engage in further learning following negative feedback. They found that, at least in a stereotype threat condition, a strong FRN implied that participants were less willing to review the correct solution of a math problem. These and the present results are in accord with the common assumption made by dual-process theories that tasks typically involve both implicit and explicit processes and that performance sometimes reflects one or the other (Ashby et al., 1998; Frank & Claus, 2006). They further highlight the role of strategies for learning in multiple-choice testing and illustrate how these strategies are reflected by ERP components.

II.1.4.4 Conclusion

Taken together, the present results suggest that two feedback-locked ERPs predict successful learning from corrective feedback in multiple-choice testing: the feedback-locked P300 and the early frontal positivity. We suggest that these ERPs are related to two different stages of learning. The P300 reflects a fast learning process based on working memory processes. In contrast, the frontal positivity reflects an attentional orienting response that precedes slower learning of correct response information. Finally, we obtained an FRN, which, however, was not positively correlated with learning success. This finding suggests that feedback in multiple-choice testing is processed as a reinforcer, although reinforcement learning can even have detrimental consequences for future performance.

II.2. Study 2: Effects of Invalid Feedback on Learning and Feedback-Related Brain Activity in Decision-Making

This section was submitted for publication as

Ernst, B., & Steinhauser, M. (under revision). Effects of invalid feedback on learning and feedback-related brain activity in decision-making. Manuscript submitted for publication.

II.2.1 Abstract

The present study investigated how learning from feedback in decision-making is impaired when relevant feedback is combined with irrelevant and potentially invalid feedback. We analyzed electrophysiological markers of reinforcement learning (FRN, feedback-related negativity) and feedback processing in working memory (feedback-locked P300) in a simple decision-making task, in which participants processed feedback stimuli consisting of relevant and irrelevant feedback provided by the color and meaning of a Stroop stimulus. Whereas invalid, irrelevant feedback impaired learning, the absence of an FRN to irrelevant feedback indicated that this effect was not due to reinforcement learning. Rather, irrelevant feedback valence influenced the P300 to relevant feedback, suggesting that learning decrements reflect an interfering effect of irrelevant feedback on the processing of relevant feedback. These results indicate that detrimental effects of invalid, irrelevant feedback result from failures of feedback processing in working memory rather than from automatic reinforcement.

(143 words) Keywords: decision making; feedback processing; event-related potentials; feedback-related negativity; P300

II.2.2 Introduction

Optimal decision-making crucially relies on the ability to improve decisions based on the evaluation of feedback. However, feedback is often ambiguous, providing a mixture of valid and invalid information. For instance, a teacher may tell a student that her answer was correct while making an annoyed facial expression. Even if the student knows that the oral feedback is relevant and valid, the irrelevant and invalid facial expression might impair learning. The goal of the present study was to investigate whether learning is impaired when relevant and valid feedback is presented together with irrelevant and potentially invalid feedback. By considering electrophysiological indices of feedback processing, we aimed to examine two potential mechanisms underlying such an effect: First, we tested whether processing irrelevant feedback directly triggers erroneous learning. Second, we examined whether processing irrelevant feedback impairs learning from relevant feedback.

In recent years, it has been shown that feedback about the outcome of a simple decision triggers a cascade of event-related potentials (ERPs) that reflect different aspects of learning and feedback processing. The so-called feedback-related negativity (FRN) refers to a negative deflection reaching its maximum around 200 to 300 ms after feedback onset at fronto-central electrode sites (Miltner, Braun, & Coles, 1997; Holroyd & Coles, 2002).

Because the FRN is more pronounced for negative feedback than for positive feedback, Holroyd and Coles (2002) proposed that it reflects a negative prediction error conveyed by the midbrain dopamine system, which indicates that the outcome of a decision is worse than expected, and which is used as a reinforcement signal that guides learning in the basal ganglia. This account received support from the finding that the FRN is not only influenced by feedback valence but also by the expectedness of positive or negative feedback (Holroyd & Krigolson, 2007; Holroyd, Pakzad Vaezi, & Krigolson, 2008) and that the FRN amplitude predicts the strength of learning from feedback (Bellebaum & Daum, 2008; Cohen & Ranganath, 2007; Philiastides, Biele, Vavatzanidis, Kazzer, & Heekeren, 2010; van der Helden, Boksem, & Blom, 2010).
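The notion of a prediction error that is "worse than expected" can be made concrete with a generic delta-rule update. This is only a sketch of the general idea and not an implementation of the Holroyd and Coles (2002) model; the function name, the outcome coding (1 for positive, 0 for negative feedback), and the learning rate are assumptions made for illustration.

```python
def delta_rule_update(expected_value, outcome, learning_rate=0.1):
    """Return (prediction_error, updated_expected_value).

    outcome is coded 1.0 for positive and 0.0 for negative feedback
    (an illustrative assumption). A negative prediction error means the
    outcome was worse than expected.
    """
    prediction_error = outcome - expected_value
    updated_value = expected_value + learning_rate * prediction_error
    return prediction_error, updated_value
```

Under this sketch, negative feedback produces a larger negative prediction error when positive feedback was strongly expected than when it was not, which corresponds to the expectedness effects on the FRN described above.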

A second feedback-related component – the feedback-locked P300 – is a positivity peaking at posterior electrode sites between 200 and 600 ms after feedback onset. While the FRN is generally larger for negative feedback, effects of feedback valence on the P300 amplitude are rather inconsistent. Most studies found the P300 to be larger for positive feedback (Bellebaum & Daum, 2008; Bellebaum, Polezzi, & Daum, 2010; Ernst & Steinhauser, 2012; Hajcak, Moser, Holroyd, & Simons, 2007; Holroyd, Baker, Kerns, & Müller, 2008; Wu & Zhou, 2009; Zhou, Yu, & Zhou, 2010) while others showed a larger P300 for negative feedback (Mathewson, Dywan, Snyder, Tays, & Segalowitz, 2008; Frank, Woroch, & Curran, 2005) or no valence effect at all (Holroyd & Krigolson, 2007; Li, Han, Lei, Holroyd, & Li, 2011; Yeung & Sanfey, 2004). Whereas the P300 following stimuli in simple decision tasks has been associated with attentional processes or the updating of working memory (Donchin & Coles, 1988; Nieuwenhuis, Aston-Jones, & Cohen, 2005; Polich, 2007), the P300 following feedback has been interpreted as reflecting the evaluation of action outcomes (Squires, Hillyard, & Lindsay, 1973; Holroyd & Coles, 2002; Sato et al., 2005; Yeung & Sanfey, 2004). Despite the ongoing debate about the exact functional significance of these components, the evidence described above suggests that the FRN is more related to fast and automatic reinforcement learning in the basal ganglia whereas the feedback-locked P300 is associated with feedback evaluation in working memory (see Case, Swainson, Durham, Benham, & Cools, 2010).

In the present study, we considered these components to examine how irrelevant and potentially invalid feedback influences learning in decision-making. To achieve this, we constructed a simple task in which participants could optimize their decisions (and thus maximize their pay-off) by learning from feedback. The task required participants to decide which one of two characters was associated with a reward. Each character pair was presented a first time in a learning phase and a second time in a test phase. In the learning phase, the decision relied entirely on guessing and feedback had to be evaluated to learn the correct response. Then, in the test phase, correct responding was associated with a reward. In this way, performance in the test phase could be used as an indicator of how efficiently participants learned from feedback in the learning phase.

Crucially, the feedback stimulus presented in the learning phase was ambiguous and provided two types of feedback. On the one hand, there was a relevant feedback that validly indicated whether the response was correct or not. On the other hand, there was an irrelevant feedback that also provided information about the correctness of the response, but this information was valid on half of the trials only. Participants knew at any time which feedback was relevant and which was irrelevant. To ensure that the irrelevant feedback was still processed under these conditions, relevant and irrelevant feedback were realized using Stroop stimuli (Stroop, 1935), that is, colored words whose meaning also referred to a color (e.g., the word BLUE in yellow color). In the present case, the relevant feedback dimension was the word color (e.g., blue for correct, yellow for incorrect), whereas the irrelevant feedback was the word meaning, which could be valid or invalid depending on whether it referred to the same (e.g., BLUE in blue color) or to the alternative color (e.g., YELLOW in blue color).
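The construction of such a feedback stimulus can be sketched as follows. The color mapping follows the example given in the text (blue for correct, yellow for incorrect). The function names, the dictionary layout, and the use of an exactly balanced, shuffled validity schedule are assumptions made for illustration.

```python
import random

# Example mapping from the text: blue signals correct, yellow incorrect.
COLOR_FOR_OUTCOME = {True: "blue", False: "yellow"}


def make_feedback_stimulus(response_correct, word_valid):
    """Build a Stroop feedback stimulus (hypothetical data layout).

    The color is the relevant, always valid feedback. The word is the
    irrelevant feedback and names the other color when it is invalid.
    """
    color = COLOR_FOR_OUTCOME[response_correct]
    word_color = color if word_valid else COLOR_FOR_OUTCOME[not response_correct]
    return {"word": word_color.upper(), "color": color, "word_valid": word_valid}


def make_validity_schedule(n_trials, seed=0):
    """Return a shuffled list in which the word is valid on half of the trials."""
    if n_trials % 2:
        raise ValueError("n_trials must be even for an exact 50/50 split")
    schedule = [True] * (n_trials // 2) + [False] * (n_trials // 2)
    random.Random(seed).shuffle(schedule)
    return schedule
```

For a correct response with an invalid word, this yields the word YELLOW shown in blue, which matches the invalid example above.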

A first question was whether the irrelevant feedback has a detrimental effect on learning from relevant feedback, even if participants know that the word is irrelevant. The advantage of using Stroop stimuli is that it is virtually impossible to ignore the word meaning, which is demonstrated by the finding that speeded naming of the color is typically strongly affected by the nature of the word (Stroop effect; for a review, see MacLeod, 1991). However, even if the word is encoded automatically and even if this delays the identification of the color, this does not necessarily imply that it also impairs learning from feedback provided by the color. To examine whether this is the case, we first analyzed whether performance in the test phase was impaired if the feedback stimulus in the learning phase contained an invalid word as compared to when it contained a valid word.
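The behavioral comparison described above amounts to computing test-phase accuracy separately for items whose learning-phase feedback contained a valid versus an invalid word. A minimal sketch, assuming a hypothetical trial record with the keys `word_valid` and `test_correct`:

```python
def accuracy_by_word_validity(trials):
    """Return test-phase accuracy for valid-word and invalid-word items.

    trials: iterable of dicts with 'word_valid' (validity of the word in
    the learning phase) and 'test_correct' (accuracy in the test phase).
    Both key names are illustrative assumptions.
    """
    counts = {True: [0, 0], False: [0, 0]}  # validity -> [n_correct, n_total]
    for trial in trials:
        entry = counts[trial["word_valid"]]
        entry[0] += int(trial["test_correct"])
        entry[1] += 1
    return {
        ("valid" if validity else "invalid"): (correct / total if total else None)
        for validity, (correct, total) in counts.items()
    }
```

A detrimental effect of irrelevant feedback would appear as lower accuracy in the `invalid` entry than in the `valid` entry.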

Provided that such an effect exists, a second question was whether the detrimental effect of irrelevant feedback reflects either learning from irrelevant feedback, or impaired learning from relevant feedback, or both. To examine this, we analyzed feedback-locked ERPs elicited by the relevant and the irrelevant feedback dimension. To distinguish between ERPs elicited by words and colors, we separated the onsets of the two stimulus dimensions by first presenting the word in a neutral color, and after a delay, turning this neutral color into a color associated with positive or negative feedback. While this procedure has been shown to preserve the automatic encoding of the word (Glaser & Glaser, 1982), it allows for distinguishing ERPs elicited by the word onset from ERPs elicited by the color onset.
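Separating the two onsets means that each trial yields two epochs from the same recording: one time-locked to the word and one time-locked to the color change. The following sketch shows this for a single channel stored as a list of samples. The function names, the sample-based parameters, and the pre-onset baseline correction are assumptions and do not describe the actual analysis pipeline.

```python
def extract_epoch(signal, onset_index, pre_samples, post_samples):
    """Cut a baseline-corrected epoch around one onset (single channel)."""
    if pre_samples <= 0:
        raise ValueError("pre_samples must be positive to compute a baseline")
    start = onset_index - pre_samples
    stop = onset_index + post_samples
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recording")
    segment = signal[start:stop]
    # Subtract the mean of the pre-onset interval as baseline.
    baseline = sum(segment[:pre_samples]) / pre_samples
    return [sample - baseline for sample in segment]


def word_and_color_epochs(signal, word_onset, delay_samples,
                          pre_samples, post_samples):
    """Return (word-locked epoch, color-locked epoch) for one trial."""
    color_onset = word_onset + delay_samples
    return (extract_epoch(signal, word_onset, pre_samples, post_samples),
            extract_epoch(signal, color_onset, pre_samples, post_samples))
```

Averaging the word-locked epochs by word valence and the color-locked epochs by color valence would then give the two sets of ERPs that the hypotheses refer to.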

We hypothesized that if irrelevant feedback triggers learning, this should become evident from ERPs elicited by the irrelevant word. For instance, the irrelevant feedback might evoke an FRN that is larger for words associated with negative feedback than for words associated with positive feedback, thus indicating reinforcement triggered by the irrelevant feedback. In contrast, if irrelevant feedback impairs learning from relevant feedback, the meaning of the word should affect ERPs elicited by the relevant color. Depending on whether reinforcement learning or feedback evaluation is affected, the FRN or the P300 following the relevant color should be enhanced when the word provided valid feedback but should be reduced when the word provided invalid feedback.