II. STUDIES

II.2.5 Discussion

The aim of the present study was to investigate how irrelevant and potentially invalid feedback influences learning in decision making. To this end, we measured feedback-related ERPs in a simple decision-making task in which participants could maximize their pay-off in a test phase by learning from feedback provided during a learning phase. Crucially, the feedback stimuli included irrelevant, potentially invalid feedback and relevant, valid feedback. Irrelevant and relevant feedback was implemented by first presenting a color word (the irrelevant feedback) which then adopted a specific color (the relevant feedback). Using Stroop stimuli with delayed color onset allowed for distinguishing between ERPs elicited by relevant and irrelevant feedback while ensuring that the irrelevant feedback was still automatically encoded (Glaser & Glaser, 1989).

As expected, we found performance in the test phase to be influenced by irrelevant feedback provided during the learning phase. Learning from feedback was significantly impaired when the irrelevant word provided invalid feedback as compared to when it provided valid feedback. This clearly demonstrates that irrelevant feedback can impair learning even under conditions where the learner is fully aware that a stimulus provides irrelevant feedback. In a further step, we analyzed feedback-locked ERPs to reveal the source of this effect. We hypothesized that learning could be impaired either because the presentation of the irrelevant color word causes erroneous learning from irrelevant feedback, or because irrelevant feedback impairs learning from relevant feedback.

We first analyzed feedback-locked ERPs following the irrelevant word. The unfiltered waveforms (Fig. 9AB) suggest that feedback-locked ERPs consist of two overlapping waveforms: a more anterior FRN peaking at around 300 ms and a more posterior P300 ranging from about 150 ms to 400 ms. Figure 10CD shows that we successfully separated these components by applying 4-Hz high-pass filtering (for the FRN) and 4-Hz low-pass filtering (for the P300). According to the reinforcement learning theory by Holroyd and Coles (2002), feedback-based reinforcement should be reflected by an increased FRN following negative feedback as compared to positive feedback. However, neither the analysis of mean amplitudes nor the peak-to-peak analysis revealed such a difference. Because the FRN is typically defined as a negativity that is larger for negative feedback than for positive feedback, we conclude that this component might rather reflect a feedback-related N2, a component that is not related to reinforcement learning but overlaps with the FRN in feedback-locked ERPs (Baker & Holroyd, 2011; Holroyd et al., 2008). This suggests that automatic reinforcement induced by the irrelevant feedback stimulus is not responsible for the effect of irrelevant feedback on learning. Furthermore, we also found no significant effect of valence on the feedback-locked P300. Given that such a valence effect was obtained in other studies (Bellebaum & Daum, 2008; Bellebaum et al., 2010; Ernst & Steinhauser, 2012; Frank et al., 2005; Hajcak et al., 2007; Holroyd et al., 2008; Mathewson et al., 2008; Wu & Zhou, 2009; Zhou et al., 2010) as well as in our analysis of relevant feedback, there is no reason to conclude that the word stimulus was processed as feedback in working memory.
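The separation of a fast FRN-like deflection from a slow P300-like wave at a 4-Hz cutoff, followed by mean-amplitude and peak-to-peak measurement, can be illustrated with a minimal sketch. Everything beyond the 4-Hz cutoff is an assumption made for illustration: the text does not state the filter type, so a simple zero-phase one-pole filter stands in for it, and the sampling rate (500 Hz), the measurement windows, the synthetic waveform, and all function names are hypothetical.

```python
import math

FS_HZ = 500.0      # assumed sampling rate (not given in the text)
CUTOFF_HZ = 4.0    # cutoff frequency named in the text


def lowpass_zero_phase(signal, cutoff_hz, fs_hz):
    """One-pole low-pass run forward and then backward, so no phase shift.

    Stand-in for whatever filter was actually used (an assumption).
    """
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient of the RC filter

    def one_pass(values):
        out = [values[0]]
        for v in values[1:]:
            out.append(out[-1] + alpha * (v - out[-1]))
        return out

    return one_pass(one_pass(signal)[::-1])[::-1]


def split_components(erp, cutoff_hz, fs_hz):
    """Return (fast, slow): the high-pass residual and the low-pass part."""
    slow = lowpass_zero_phase(erp, cutoff_hz, fs_hz)
    fast = [e - s for e, s in zip(erp, slow)]
    return fast, slow


def ms_to_index(t_ms, fs_hz):
    """Sample index for a latency, with time 0 at feedback onset."""
    return int(round(t_ms * fs_hz / 1000.0))


def mean_amplitude(signal, t0_ms, t1_ms, fs_hz):
    """Average amplitude within the window [t0_ms, t1_ms]."""
    i0, i1 = ms_to_index(t0_ms, fs_hz), ms_to_index(t1_ms, fs_hz)
    window = signal[i0:i1 + 1]
    return sum(window) / len(window)


def peak_to_peak(signal, t0_ms, t1_ms, fs_hz):
    """Preceding positive peak minus the most negative peak in the window.

    Returns the amplitude difference and the latency of the negative peak.
    """
    i0, i1 = ms_to_index(t0_ms, fs_hz), ms_to_index(t1_ms, fs_hz)
    window = signal[i0:i1 + 1]
    i_min = i0 + window.index(min(window))
    preceding_max = max(signal[i0:i_min + 1])
    return preceding_max - signal[i_min], i_min * 1000.0 / fs_hz


def gaussian(t_ms, center_ms, width_ms, amplitude):
    return amplitude * math.exp(-((t_ms - center_ms) ** 2) / (2.0 * width_ms ** 2))


# Synthetic ERP (hypothetical): a broad positivity plus a brief negativity.
times_ms = [i * 1000.0 / FS_HZ for i in range(401)]  # 0 to 800 ms
erp = [gaussian(t, 350.0, 120.0, 8.0) + gaussian(t, 300.0, 20.0, -5.0)
       for t in times_ms]

frn_like, p300_like = split_components(erp, CUTOFF_HZ, FS_HZ)
frn_p2p, frn_latency_ms = peak_to_peak(frn_like, 200.0, 400.0, FS_HZ)
p300_mean = mean_amplitude(p300_like, 300.0, 400.0, FS_HZ)
```

Because a one-pole filter has a shallow roll-off, the two outputs still share some signal; a steeper filter from an established EEG toolbox would be the more realistic choice. The sketch only shows why the two measures are computed on differently filtered versions of the same waveform.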

In a next step, we analyzed feedback-locked ERPs following relevant feedback. Again, we successfully separated an FRN-like component and a feedback-locked P300 (Figs. 10CD). However, similar to the irrelevant feedback, we obtained no valence effect for what appears to be the FRN. Although the unfiltered waveforms at electrode FCz suggest that there is a difference between positive and negative feedback in the time range of the FRN, this effect seems to reflect the P300 rather than the FRN. Again, it seems that what we identified as the FRN is actually a feedback-related N2 that is unrelated to reinforcement learning (Baker & Holroyd, 2011; Holroyd et al., 2008). The absence of an FRN is surprising and might indicate that no reinforcement learning took place in our paradigm. On the one hand, this could be because our task required explicit learning rather than implicit, reinforcement-based learning. However, recent studies have found that even explicit learning tasks elicit an FRN, although this FRN did not predict later performance (e.g., Butterfield & Mangels, 2003; Ernst & Steinhauser, 2012). On the other hand, the presence of invalid feedback stimuli could have triggered a control mechanism that suppressed reinforcement learning. This raises important questions regarding the automatic nature of reinforcement learning, which should be investigated in future research.

In contrast to the FRN, we found a robust valence effect on the feedback-locked P300. Moreover, the P300 to relevant feedback was reduced if the preceding irrelevant feedback was invalid. This suggests that the detrimental effects of irrelevant feedback on learning are related to a reduced feedback-locked P300 to relevant feedback. Several accounts have been proposed to explain the mechanisms underlying the P300 in general and the P300 elicited by feedback stimuli. First, the P300 amplitude has frequently been shown to reflect the expectedness of a stimulus, with a smaller P300 obtained for more expected stimuli (for an overview, see Polich & Kok, 1995). In our task, the irrelevant word does not predict the valence of the relevant color stimulus. Even if participants erroneously generated the expectation that the valence of the irrelevant word predicts the valence of the color, we should have obtained a larger P300 following invalid feedback, which is the opposite of what we observed in our data.

Second, another frequent explanation of the feedback-locked P300 is that its amplitude reflects the response of the locus coeruleus-norepinephrine system, which varies according to the motivational significance of a stimulus (Nieuwenhuis, Aston-Jones, & Cohen, 2005). Some authors applied this idea to explain increased P300 amplitudes either for positive feedback (Wu & Zhou, 2009; Zhou et al., 2010) or for negative feedback (Mathewson et al., 2008). Although increased motivational significance of positive feedback provides a potential explanation for the valence effect on the feedback-locked P300 in the present study, it is unclear how this account could explain a reduced P300 on invalid feedback trials. It seems unlikely that an invalid word decreases the motivational significance of the subsequent color feedback.

Instead, we propose that the feedback-locked P300 reflects a feedback-based evaluation of the initial choice which supports explicit learning from feedback (Ernst & Steinhauser, 2012). To learn efficiently from feedback, participants have to keep the chosen stimulus in working memory and to integrate it with information about feedback valence. The feedback-locked P300 could reflect a decision process by which the initial choice is evaluated (for a similar idea in the context of error detection, see Steinhauser & Yeung, 2010). The presence of invalid feedback might influence this decision by activating the invalid feedback category in working memory. Whereas a valid word would facilitate feedback evaluation, an invalid word would interfere with feedback evaluation, and this interfering effect could be reflected by a reduced feedback-locked P300. This interference effect bears some resemblance to the well-known Stroop effect (Stroop, 1935; MacLeod, 1991), that is, the observation that an incongruent color word delays the naming of a color. Although the Stroop effect has typically been explained as reflecting response conflict (e.g., J. D. Cohen, Dunbar, & McClelland, 1990), a portion of the Stroop effect could be attributed to interference on the level of stimulus encoding (De Houwer, 2003; Steinhauser & Hübner, 2009). It might be this interference that impairs the identification of the relevant feedback and the proper evaluation of the initial choice in the present task (see footnote 19).

Taken together, the present results demonstrate that ambiguous feedback can impair learning even if the learner is fully aware of which feedback stimulus is valid and which is potentially invalid. However, this phenomenon is not due to learning from irrelevant feedback, e.g., by means of automatic reinforcement. Our data suggest that it rather reflects an interfering effect of irrelevant feedback on the processing of relevant feedback. As a consequence, these results have two important implications for research on the neural and cognitive bases of feedback processing and learning from feedback. A first implication is that even if a feedback-like stimulus is automatically encoded, it is not necessarily capable of triggering automatic reinforcement. In the present paradigm, we even failed to find an FRN to relevant feedback, suggesting that the presence of invalid feedback leads to a suppression of reinforcement. A second implication is that working-memory-based processes are crucially involved in learning from feedback in decision making. Whereas the literature on learning from feedback is currently strongly dominated by studies focusing on reinforcement and the FRN, the P300 but not the FRN indicated learning decrements in the present study (see also Chase et al., 2010). Future research is needed to specify the exact mechanism that underlies the role of working memory in feedback processing.

19 Note that although the naming of incongruent Stroop stimuli has also been shown to be associated with decreased P300 amplitudes (e.g., Ilan & Polich, 1999; Shen, 2006), this does not necessarily reflect the same process as the P300 effect in the present paradigm. Whereas the P300 in a Stroop task is presumably related to response selection, the valence effect suggests that the P300 in the present study reflects feedback-based evaluation of the initial choice.

II.3 Study 3: The effect of feedback validity and feedback