

I.2.2.1 A Model of Basal Ganglia Functioning – the BG-DA Model

Learning is the acquisition and modification of knowledge and ultimately serves adaptive behavior. On a basic level, this boils down to selecting one action over another – for instance, turning left at a crossroad or ticking answer A instead of B in a multiple-choice test.

Action selection in vertebrates, especially on the motor level, is associated with the basal ganglia (Redgrave, Prescott, & Gurney, 1999; Stocco, Lebiere, & Anderson, 2010). The nuclei that constitute the basal ganglia are interconnected with each other (Parent & Hazrati, 1995b) and with different areas of the cerebral cortex, as well as the thalamus and the limbic system (Parent & Hazrati, 1995a).

In a basal ganglia model proposed by Michael Frank and colleagues (e.g., Frank & Claus, 2006), information about stimuli and the environment is transmitted from the cerebral cortex to the basal ganglia and activates representations of different actions, or 'channels' (see Figure 2 for an illustration of the model). Because only one action can be executed at a time, it is the function of the basal ganglia to resolve the resulting conflict between action channels by further activating only the appropriate channel while inhibiting competing actions. Frank and colleagues (M. X. Cohen & Frank, 2009; Frank, 2004; Frank et al., 2004) assume that this involves three pathways: a direct, an indirect, and a hyperdirect pathway. While the direct pathway acts to reinforce the representation of a certain action, the indirect pathway inhibits inappropriate action representations.

Formulated more specifically in terms of the authors' computational model, cortical input units transmit information about encountered stimuli to the premotor cortex and the striatum. This leads to a basic activation of candidate actions in the premotor cortex.

Meanwhile, the so-called Go and NoGo units in the striatum are also activated. Every candidate action is represented by activation of a specific Go and NoGo unit. While the striatal Go units can activate the thalamus via the direct pathway (involving only the globus pallidus internus, GPi), NoGo units inhibit representations of actions in the thalamus via the indirect pathway (involving not only the GPi, but also the globus pallidus externus, GPe).

Because the thalamus activates or inhibits action representations in the premotor cortex, an action is very likely to be implemented when the respective Go unit is activated, whereas strong activation of the respective NoGo unit would prevent this action. Thereby, the thalamus and the basal ganglia act as a gate that activates and thus facilitates the execution of a certain response while inhibiting conflicting responses. Finally, the hyperdirect pathway involves the subthalamic nucleus and initially sends a NoGo signal to all action representations in the GPi and GPe, effectively suppressing premature responses due to random activity in these units (Frank, Samanta, et al., 2007).
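To make the gating logic concrete, the following toy sketch expresses action selection as a competition between Go and NoGo evidence per channel. It is a deliberate simplification, not Frank and colleagues' actual network: the weight matrices, the linear activations, and the winner-take-all readout are illustrative assumptions standing in for the full pathway dynamics.

```python
import numpy as np

def select_action(stimulus, w_go, w_nogo):
    # Striatal Go activations (direct pathway: striatum -> GPi -> thalamus).
    go = w_go @ stimulus
    # Striatal NoGo activations (indirect pathway: striatum -> GPe -> GPi -> thalamus).
    nogo = w_nogo @ stimulus
    # Net disinhibition of the thalamus per action channel; the channel with
    # the strongest Go - NoGo evidence is gated through to the premotor cortex.
    net = go - nogo
    return int(np.argmax(net))

# Toy example: two input features, three candidate action channels.
rng = np.random.default_rng(0)
w_go = rng.random((3, 2))
w_nogo = rng.random((3, 2))
stimulus = np.array([1.0, 0.0])
print("selected channel:", select_action(stimulus, w_go, w_nogo))
```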

Obviously, it is essential that the appropriate action is implemented in a given situation, that is, that a specific input activates the correct Go and NoGo units. In order to achieve this, the DA signal from the SNpc is used in the striatum to adjust synaptic weights. Go units are associated with D1 receptors, which have an excitatory effect on the unit when DA binds to them and when the unit itself is already activated, whereas NoGo units have D2 receptors, which have an inhibitory effect upon exposure to DA without additional prerequisites (Hikida, Kimura, Wada, Funabiki, & Nakanishi, 2010). While, according to Frank, this further supports action selection, it has a more important effect on learning. When a selected action has led to a reward, a DA burst will reach the striatum via the SNpc. Because both the Go unit associated with the implemented action and the input unit will still be activated, DA will further increase the activation of this Go unit by docking to its D1 receptors, eventually increasing the synaptic weight between both units6. The next time this specific stimulus and situation is presented, the input layer will activate this Go unit more strongly, which will increase the likelihood that the respective response will be selected over other actions and implemented. In contrast, if the response did not lead to a reward or even resulted in punishment, a DA dip will reach the striatum. In addition to the fact that the Go unit will not be activated, the NoGo unit will be disinhibited: due to the decrease in DA, less DA binds to the inhibitory D2 receptors and thus the NoGo unit is activated more strongly. Again, this results in a strengthening of synaptic weights, but this time between the stimulus representation in the input layer and the NoGo unit for the specific response. Thereby, the stimulus will activate the NoGo unit more strongly the next time it is encountered, effectively inhibiting the response. In brief, the temporal difference error coded by the dopamine signal results in changes in striatal connectivity that lead to progressively more adaptive responses to certain stimuli, while non-adaptive responses are inhibited and thus avoided.

Figure 2. Simplified overview of the initial BG-DA model and the augmented model (gray background).

6 The activation of the D1 receptor will not only result in increased activation of this unit, which leads to a short-term change in behavior, but will also trigger a second messenger cascade and thus structural changes that effectively result in long-term changes in connectivity.
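The learning scheme just described amounts to a three-factor plasticity rule: a weight changes only when presynaptic input, postsynaptic activity, and a DA signal coincide. The sketch below illustrates this principle; the learning rate, the sign convention for bursts and dips, and the rectified update are illustrative assumptions rather than the equations of the original computational model.

```python
import numpy as np

def da_update(w_go, w_nogo, stimulus, action, da, lr=0.1):
    """One step of a three-factor update: plasticity requires the coincidence
    of presynaptic input (the stimulus), postsynaptic activity (the selected
    action's striatal unit), and the DA signal."""
    if da > 0:
        # DA burst (better than expected): strengthen stimulus -> Go weights via D1.
        w_go[action] += lr * da * stimulus
    elif da < 0:
        # DA dip (worse than expected): the disinhibited NoGo unit is active,
        # so stimulus -> NoGo weights are strengthened via D2.
        w_nogo[action] += lr * (-da) * stimulus
    return w_go, w_nogo

# A rewarded trial (da = +1) followed by a punished one (da = -1) for action 0.
w_go, w_nogo = np.zeros((3, 2)), np.zeros((3, 2))
stimulus = np.array([1.0, 0.0])
da_update(w_go, w_nogo, stimulus, 0, da=1.0)
da_update(w_go, w_nogo, stimulus, 0, da=-1.0)
print(w_go[0], w_nogo[0])  # both connections now carry weight for this stimulus
```

Over repeated trials, this moves the net Go − NoGo evidence for each stimulus toward the rewarded response, which is the model's account of progressively more adaptive action selection.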

I.2.2.2 The Reinforcement Learning Model by Holroyd & Coles (2002)

As already indicated by the name, the reinforcement learning model was designed as a framework for the neural basis of reinforcement learning, like the aforementioned BG-DA model. Its unique feature, however, is its integration of and focus on the ACC, this brain region's strong link to the ERN and the FRN, and its functional role in error processing.

As these components will be discussed in detail later, it is for now sufficient to note that the ACC receives projections from multiple brain regions and is believed to be involved in attention (Weissman, Gopalakrishnan, Hazlett, & Woldorff, 2005), error monitoring (e.g., Bush, Luu, & Posner, 2000) and conflict resolution (e.g., Botvinick, Braver, Barch, Carter, & Cohen, 2001; D'Esposito et al., 1995; Posner & DiGirolamo, 1998). Holroyd and Coles (2002) propose that multiple, parallel motor controllers project to the ACC, which in turn is connected to the motor system. Motor controllers, in this model, correspond to different subsystems or brain regions (e.g., the OFC, the amygdala, the DLPFC, etc.) that direct the motor system to implement or avoid a certain behavior given a certain stimulus input from the sensory cortex. The ACC acts as a filter and decides which of the possibly conflicting behaviors put forward by the different motor controllers is enacted. For instance, in an Eriksen flanker task, a target letter is presented accompanied by distractor letters, and participants have to react to the target letter. Target and distractors can be incongruent, i.e., the distractors suggest a different response (e.g., pressing the left button) than the target stimulus (e.g., pressing the right button). Put in terms of the model, in this situation two different motor controllers demand control over the motor system. This conflict is resolved by the ACC, which ideally enables the correct motor controller to take command.

An important point is how the ACC is trained to delegate motor control appropriately. The RL model proposes an actor-critic architecture in which the basal ganglia, particularly the VTA, act as an adaptive critic that sends a temporal difference error signal to the actor, which is mainly the ACC, but also the motor controllers (Holroyd & Coles, 2002). Consequently, learning occurs not only in the motor controllers, but also in the ACC, which can use this reinforcement learning signal to adaptively select the most appropriate motor controller, optimizing task performance. Furthermore, the authors suggest that an efference copy of the motor command might be immediately fed back to the adaptive critic, which would allow for an on-the-fly adaptation of suboptimal behavior.
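The division of labor between critic and actor can be sketched with a generic tabular actor-critic under the standard TD(0) rule. This is not Holroyd and Coles' concrete implementation; the state coding, the softmax controller selection, and all parameters are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_controllers = 4, 3
V = np.zeros(n_states)                       # critic: state values (adaptive critic)
prefs = np.zeros((n_states, n_controllers))  # actor: ACC preferences over controllers
alpha, beta, gamma = 0.1, 0.1, 0.9

def learn_step(s, s_next, reward):
    # Actor (ACC): softmax selection of one motor controller in state s.
    p = np.exp(prefs[s] - prefs[s].max())
    p /= p.sum()
    c = rng.choice(n_controllers, p=p)
    # Critic: temporal difference error - the dopaminergic training signal.
    delta = reward + gamma * V[s_next] - V[s]
    V[s] += alpha * delta        # critic learns to predict outcomes
    prefs[s, c] += beta * delta  # ACC learns which controller to empower
    return c, delta

controller, td_error = learn_step(s=0, s_next=1, reward=1.0)
print(f"chose controller {controller}, TD error {td_error:+.2f}")
```

Because the same TD error trains both tables, the critic's predictions and the ACC's delegation policy improve together, which is the core of the actor-critic proposal.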

I.2.2.2.1 EEG correlates of ACC activity.

Initially, the RL model focused mainly on the effect of erroneous behavior. The mesencephalic dopamine signal sent to the ACC inhibits local pyramidal cells, whose apical dendrites in turn receive excitatory input from the motor controllers. As a consequence, the dopamine dip associated with an error leads to a disinhibition of these neurons if a motor controller is activated. Because of the simultaneous disinhibition and the parallel alignment of the pyramidal cells in the ACC, a strong negative scalp potential is generated at frontocentral scalp positions – the ERN or the FRN, respectively. In other words, the ERN reflects adaptive changes in neural activity in the ACC following internal feedback. When the correct response is obvious, motor behavior can be compared to an already established internal reference, which would make external feedback unnecessary – the cognitive system already 'knows' that the response was erroneous. In contrast, when the outcome of a behavior is still unclear, the system has to rely on external feedback to evaluate it and implement adaptive changes accordingly. In short, the ERN that follows an incorrect response reflects the evaluation of internal feedback, while the FRN that follows the presentation of performance information reflects the evaluation of external feedback. Both internal and external negative feedback signals indicate that an action resulted in an outcome that is worse than expected.

Further, Holroyd and Coles (2002) were able both to simulate in their computational model and to show empirically that the dopaminergic signal underlying the two ERP components "propagates back in time" (p. 682), as predicted by the temporal difference method. External negative feedback after an erroneous response initially evokes a strong frontocentral negativity in the EEG, whereas the response itself does not. Over the course of a learning session, the correct behavior is learned and the response itself becomes predictive of the outcome, whereas external feedback loses its informational value, as its valence is already predicted by the preceding response. Conversely, an incorrect response is immediately followed by a negativity during later trials in the session, while at the same time the feedback negativity is strongly reduced (Holroyd & Coles, 2002).
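This back-propagation of the error signal falls out of the temporal difference rule itself and can be reproduced in a few lines. The toy simulation below assumes a trial with a response stage followed by a feedback stage, values initialized at an uncertain 50% reward expectation, and a 10% error rate; all of these are illustrative choices, not parameters from Holroyd and Coles (2002). Early on, the TD error (and hence the simulated negativity) occurs at feedback; late in learning, it occurs already at the erroneous response.

```python
import numpy as np

# States: 0 = stimulus/response stage, 1 = correct response made,
# 2 = erroneous response made. Values start at 0.5, i.e., an uncertain
# 50% reward expectation (an assumption for illustration).
rng = np.random.default_rng(2)
V = np.full(3, 0.5)
alpha = 0.2

for trial in range(200):
    error = rng.random() < 0.1        # occasional incorrect response
    resp = 2 if error else 1
    reward = 0.0 if error else 1.0
    delta_resp = V[resp] - V[0]       # TD error at the response (-> ERN)
    delta_fb = reward - V[resp]       # TD error at feedback (-> FRN)
    V[0] += alpha * delta_resp
    V[resp] += alpha * delta_fb
    if error and (trial < 30 or trial > 150):
        print(f"trial {trial:3d}: delta(response) = {delta_resp:+.2f}, "
              f"delta(feedback) = {delta_fb:+.2f}")
```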

Later refinements of the RL theory focused on the role of positive feedback (Holroyd & Krigolson, 2007; Holroyd, Krigolson, & Lee, 2011a; Holroyd, Pakzad-Vaezi, & Krigolson, 2008; Pakzad-Vaezi, Krigolson, & Holroyd, 2006). Although the initial argument that the FRN is generated by the disinhibition of apical dendrites in the ACC appears compelling at first glance, it is challenged by the existence of the N200, a negativity similar in timing and scalp maximum. If, by default, feedback were followed by a strong negativity in the typical FRN time window (200–400 ms), then the absence of an FRN after positive feedback could reflect a positive component overlapping this N200. This positivity could potentially be a manifestation of the positive prediction error. Comparing the individual amplitudes of the N200 in an oddball task and the FRN in an Eriksen flanker task for each participant, Holroyd, Pakzad-Vaezi, et al. (2008) found that both components were not only strongly correlated, but that they also shared the same latency and scalp distribution, as revealed by a principal component analysis (PCA). With respect to the finding that neutral stimuli also evoke an FRN (Holroyd, Hajcak, & Larsen, 2006), the authors proposed that "these findings suggest that events that fail to indicate that a task goal has been achieved (including the occurrence of both neutral and error feedback stimuli) elicit the N200" (Holroyd, Pakzad-Vaezi, et al., 2008, p. 695), whereas a positive component, the "feedback correct-related positivity" (fCRP), represents a reward signal from the midbrain dopamine system when a task goal has been met.
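The overlap logic can be illustrated with purely schematic waveforms (all latencies and amplitudes below are invented for illustration and carry no empirical weight): if every feedback elicits an N200 and correct feedback additionally elicits a positivity in the same window, the correct-minus-incorrect difference isolates that positivity rather than an error-specific negativity.

```python
import numpy as np

t = np.arange(0, 600)                     # time after feedback onset (ms)
gauss = lambda mu, sd: np.exp(-0.5 * ((t - mu) / sd) ** 2)

n200 = -4.0 * gauss(280, 40)              # N200 elicited by any feedback (schematic)
fcrp = 6.0 * gauss(300, 50)               # positivity when the goal is met (schematic)

erp_incorrect = n200                      # error feedback: N200 alone
erp_correct = n200 + fcrp                 # correct feedback: N200 + overlapping fCRP
difference = erp_correct - erp_incorrect  # equals the fCRP, not an extra negativity
print(f"difference wave peaks at {t[np.argmax(difference)]} ms")
```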

I.2.3 Summary

Processing of the temporal difference error is central for reinforcement learning. In the vertebrate brain, the TD error is conveyed by a DA signal. Both the BG-DA model and the RL theory of the FRN provide a framework for how the DA signal is utilized in the basal ganglia and the ACC, respectively, to adapt future behavior. Consequently, knowledge of the effect of certain manipulations on the size of the DA signal is vital for understanding automatic feedback processing. As a correlate of the DA signal and thus the TD error, the FRN is therefore well suited for feedback research. However, as reinforcement learning is only one process that contributes to learning from feedback, it is important not to neglect controlled processes and to integrate them with existing models. Further, it is necessary, on the one hand, to understand whether and how the FRN is affected by controlled processes and, on the other hand, to identify a component that is more indicative of the quality of controlled feedback processes.

I.3 Controlled Feedback Processing