
I. INTRODUCTION

I.2 Automatic Feedback Processing

I.2.1 The Reward Prediction Error and the Dopamine System

I.2.1.1 The Reward Prediction Error

The idea of ‘pleasurable’ and ‘unpleasant’ outcomes was already rejected by Skinner (1953)3 and later replaced by the concept of reward expectancy. Based on research on the blocking effect (Kamin, 1969), Rescorla and Wagner (1972) introduced the idea that learning depends not on the mere pairing of stimulus or action and reward, but on the extent to which a reinforcer is unpredicted and to which a stimulus or action is predictive of the reinforcer. As further fleshed out in later theories (Schultz, 1998; R. S. Sutton & Barto, 1998), the degree to which a reward occurs unpredictably, the so-called prediction error, can inform a system to what extent associations between the cognitive representations of an event and the reward should be adjusted. When a reward is first presented in a novel situation, its occurrence is unpredicted and the prediction error is therefore large. When a stimulus preceded this reward, the prediction error can be used to strengthen the association between that stimulus and the reward by a certain degree. As a consequence of the newly established association, the reoccurrence of the stimulus activates the representation of the reward – that is, the stimulus predicts the reward. The association, however, is initially not perfect, and thus the reward is not fully predicted, which again leads to a prediction error, albeit a smaller one. Eventually, when stimulus and reward have been paired consistently, the stimulus predicts the reward perfectly, the resulting prediction error is zero, and the association is not altered any further. However, it is clearly adaptive to have a learning system that registers changes in contingency – for instance, that an event might no longer be predictive of a reward. Under circumstances where the reward fails to occur after the stimulus, the prediction error becomes negative and the association between stimulus and reward is reduced.

2 Throughout this thesis, the term “dopaminergic system” will also be employed. Note that while the dopamine system refers to DA neurons themselves, “dopaminergic system” also refers to brain regions that are affected by DA.

3 “If we then go on to say that a stimulus is reinforcing because it is pleasant, what purports to be an explanation in terms of two effects is in reality a redundant description of one” (Skinner, 1953, p. 82, italics in the original).
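To make these dynamics concrete, the following short Python sketch implements the update logic just described, in the spirit of Rescorla and Wagner (1972); the learning rate of 0.3 and the trial numbers are arbitrary illustrative choices, not values from the literature cited here:

# Prediction-error learning: the association V between stimulus and reward
# is adjusted in proportion to the prediction error (actual minus predicted reward).
def update(v, reward, learning_rate=0.3):
    prediction_error = reward - v      # large when the reward is unpredicted
    return v + learning_rate * prediction_error

v = 0.0                                # novel situation: reward fully unpredicted
for trial in range(10):
    v = update(v, reward=1.0)          # stimulus and reward paired consistently
print(round(v, 3))                     # approaches 1.0, prediction error near zero

v = update(v, reward=0.0)              # reward omitted: negative prediction error
print(round(v, 3))                     # association is reduced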

The concept of the prediction error proved to be very productive and was realized in neuronal network models as the so-called delta rule (Rumelhart, Hinton, & Williams, 1986; R. S. Sutton & Barto, 1981; Widrow & Hoff, 1960), where it represents the difference between a desired and an actual output and is used to adjust synaptic weights between nodes in the network. Over time, it was further refined in reinforcement learning algorithms that make use of the temporal difference error (R. S. Sutton & Barto, 1998), which also incorporates information about the exact timing of reinforcement, allowing for the integration of temporal discounting.
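In symbols, and using standard notation (following R. S. Sutton & Barto, 1998, rather than any particular formulation in the works cited above), the delta rule adjusts a weight $w_i$ in proportion to the mismatch between desired and actual output,

\[ \Delta w_i = \eta \, (y_{\text{desired}} - y_{\text{actual}}) \, x_i , \]

whereas the temporal difference error additionally looks one step ahead and discounts delayed reinforcement by a factor $\gamma$,

\[ \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) , \]

where $\eta$ is a learning rate, $x_i$ the activity of input node $i$, and $V(s)$ the predicted value of state $s$.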

In this model, it is assumed that a conditioned stimulus (CS), i.e., a stimulus that was previously associated with the reward, can act as a reinforcer itself and thereby instigate a prediction error. By this, “the effective reinforcement signal moves backward in time from the primary reinforcer to the earliest reinforcer-predicting stimulus” (Schultz, 1998, p. 13), a fact that will be important in the later discussion of correlates of the prediction error in this thesis (Holroyd & Coles, 2002).
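A minimal simulation illustrates this backward shift; the three-step trial (CS, delay, reward) and all parameter values in the following Python sketch are illustrative assumptions:

# TD(0) over one trial of three steps: CS onset, delay, reward delivery.
# Over episodes, predictive value propagates backward from the reward to the CS.
gamma, alpha = 0.9, 0.1
values = [0.0, 0.0, 0.0]              # V(CS), V(delay), V(reward step)
rewards = [0.0, 0.0, 1.0]             # primary reward only at the last step

for episode in range(500):
    for t in range(3):
        v_next = values[t + 1] if t < 2 else 0.0   # trial ends after reward
        delta = rewards[t] + gamma * v_next - values[t]
        values[t] += alpha * delta

print([round(v, 2) for v in values])  # e.g., [0.81, 0.9, 1.0]: the earliest
                                      # predictor now carries (discounted) value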

Moreover, the temporal difference error plays an important role in neuronal approaches addressing the question of control and decision making, namely actor-critic architectures (Barto, 1995; Witten, 1977). In these, a so-called ‘critic’ evaluates the choices of an ‘actor’ and sends reinforcement signals to it that serve to optimize the actor’s future choices. This reinforcement signal results from evaluating the outcome of the action the actor chose. The signal takes the form of a temporal difference error, that is, the critic uses a representation of the expected outcome and determines whether the actual outcome of an action, as conveyed by feedback from the environment, is better or worse than expected.
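A toy Python sketch of this division of labor follows (a two-action bandit with a softmax actor; the task, the parameters, and the critic’s simple global expectation are all illustrative assumptions, not details of the architectures cited above):

import math
import random

# Actor-critic on a two-armed bandit: the critic maintains an expected
# outcome; its prediction error (trials are one step long here) both
# updates that expectation and teaches the actor's action preferences.
prefs = [0.0, 0.0]                     # actor: action preferences
expected = 0.0                         # critic: expected reward
alpha_actor = alpha_critic = 0.1

def choose():
    weights = [math.exp(p) for p in prefs]          # softmax choice rule
    return random.choices([0, 1], weights=weights)[0]

for trial in range(2000):
    action = choose()
    reward = 1.0 if action == 0 and random.random() < 0.8 else 0.0
    delta = reward - expected          # critic: better or worse than expected?
    expected += alpha_critic * delta   # critic updates its expectation
    prefs[action] += alpha_actor * delta   # teaching signal optimizes the actor

print([round(p, 2) for p in prefs])    # preference shifts toward the richer action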

Actor-critic architectures play a central role when it comes to the function of the basal ganglia and the dopaminergic system and have recently been extended to describe the interplay of several brain areas, namely the basal ganglia, the anterior cingulate cortex (ACC), the orbitofrontal cortex (OFC), and the dorsolateral prefrontal cortex (DLPFC; Holroyd & Yeung, 2011). But before these models are described in further detail, the following section will give a rough overview of the dopamine system and its connection to the prediction error.

While the prediction error can explain learning on an algorithmic level, an important step was the discovery of how it is implemented on a neurobiological level. Schultz and colleagues (Schultz, Apicella, & Ljungberg, 1993) pointed out the connection between the prediction error and the dopaminergic system, and Schultz proposed in his seminal paper (Schultz, 1998) that a mesencephalic DA signal encodes the prediction error.

I.2.1.2 The Anatomy of the Dopamine System

Dopamine is a neurotransmitter and neuromodulator in the central nervous system, and early research on intracranial self-stimulation in rats (Olds & Milner, 1954) identified brain areas associated with DA as being involved in the processing of reward (for an overview, see Berridge & Robinson, 1998; Wise & Rompré, 1989). Central to the reward system are nuclei of the midbrain – the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNpc) – from which DA neurons project to other brain regions4. As depicted in Figure 1, there are three major projection pathways: the mesocortical DA system involves DA neurons that originate from the VTA and end in the prefrontal cortex (PFC), the cingulate and the perirhinal cortex; neurons of the mesolimbic system terminate in the nucleus accumbens (NAcc), but also innervate the hippocampus and the amygdala (for an overview, see Arias-Carrión, Stamelou, Murillo-Rodríguez, Menéndez-González, & Pöppel, 2010; Björklund & Dunnett, 2007); and, finally, the nigrostriatal pathway projects from the SNpc to the basal ganglia, specifically the striatum. Electrical stimulation of mesencephalic neurons – and, prominently, the NAcc – induces appetitive behavior in rats (e.g., Corbett & Wise, 1980; Olds, 1958; Olds & Milner, 1954; Wise, 1981), unless DA antagonists that abolish this behavior, like haloperidol (Wauquier, Niemegeers, & Lal, 1974) or clozapine (Atrens, Ljungberg, & Ungerstedt, 1976), are administered (see also Gallistel, 1986; Nakajima & McKenzie, 1986). Moreover, drugs like cocaine (Phillips, Stuber, Heien, Wightman, & Carelli, 2003; Wise, 1998) and amphetamines (Knutson et al., 2004), but also nicotine (Rice & Cragg, 2004) and, indirectly, opiates (Di Chiara, 1995; Di Chiara & Alan North, 1992) and cannabinoids (Cheer, Wassum, Heien, Phillips, & Wightman, 2004; Freund, Katona, & Piomelli, 2003), target the reward pathways and act as DA agonists, leading to increased activity in related regions like the NAcc and the striatum (Breiter et al., 1997; Dixon et al., 2005; Knutson et al., 2004; Völlm et al., 2004).

4 DA also plays a role in the hypothalamus (see Ben-Jonathan & Hnasko, 2001), which, however, is irrelevant for this thesis.

Figure 1. Overview of the basal ganglia and projections of dopamine neurons originating from the substantia nigra and the ventral tegmental area (yellow).

Finally, diseases like Parkinson’s disease (PD; Ehringer & Hornykiewicz, 1960) and schizophrenia (van Os & Kapur, 2009) that are marked by altered levels of DA reportedly involve problems in reward-learning (for findings in PD patients, see Frank, Samanta, Moustafa, & Sherman, 2007; Frank et al., 2004; for schizophrenia patients, see Gold, Waltz, Prentice, Morris, & Heerey, 2008; Waltz, Frank, Robinson, & Gold, 2007; Waltz, Frank, Wiecki, & Gold, 2010). Taken together, this indicates that DA is involved in reward-learning (see also Beninger, 1983; for an alternative perspective, however, see Redgrave & Gurney, 2006).

I.2.1.3 The Effect of Dopamine on the Neuronal Level

It is, however, crucial for the understanding of the dopaminergic system to be aware that DA is only the messenger, whereas its effect on postsynaptic neurons depends on the presence or absence of different DA receptors and usually involves other afferent neurons (e.g., Goldman-Rakic, Leranth, Williams, Mons, & Geffard, 1989; Smith, Bennett, Bolam, Parent, & Sadikot, 1994). All DA receptors are linked to second-messenger proteins that are released when DA binds to the receptor and that instigate short-term and long-term changes in the respective neuron related to long-term potentiation (LTP) or long-term depression (LTD; e.g., Kerr & Wickens, 2001). Roughly speaking, DA receptors can be categorized into two classes according to their function, with D1 and D5 receptors eventually inducing phosphorylation, and D2, D3, and D4 receptors opposing it (Lachowicz & Sibley, 1997). Phosphorylation increases AMPA receptor activity and availability and thereby temporarily increases the excitability of the neuron. In short, through phosphorylation, DA affects whether a neuron is activated more or less easily by an afferent neuron. Moreover, the cascade triggered by DA can induce protein synthesis in both neurons, leading to lasting changes in receptor availability and in the amount of neurotransmitter released, which increases the connectivity between pre- and postsynaptic neurons (Kelleher, Govindarajan, & Tonegawa, 2004).

Speaking in terms of neuronal modeling, DA can change synaptic weights.
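In a simple model, this can be written as a ‘three-factor’ update in which presynaptic activity, postsynaptic activity, and a DA-like scalar jointly determine the weight change; the following Python lines are a deliberately minimal illustration, not a biophysical model:

# Three-factor learning rule: the weight changes only when pre- and
# postsynaptic activity coincide with a DA signal; the sign of that
# signal decides between strengthening (LTP-like) and weakening (LTD-like).
def update_weight(w, pre, post, dopamine, lr=0.05):
    return w + lr * dopamine * pre * post

w = 0.5
w = update_weight(w, pre=1.0, post=1.0, dopamine=+1.0)   # DA burst: weight up
w = update_weight(w, pre=1.0, post=1.0, dopamine=-1.0)   # DA dip: weight down
print(round(w, 3))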

I.2.1.4 Reward Signaling by Dopamine Neurons

On the basis of this evidence and informed by prior research, Schultz and colleagues (Schultz, 1998; Schultz et al., 1997) devised an influential model of reward signaling by DA neurons. After the presentation of an unexpected reward, mesencephalic neurons briefly increase their firing rate above their baseline level, whereas the omission of an expected reward or the presentation of an aversive stimulus reduces the firing rate below the baseline. Importantly, this firing pattern is uninfluenced by the nature of the reward and its stimulus attributes, but depends on previously learned information which is stored in the system (Schultz et al., 1993). Multiple studies have shown that the extent of a burst (Ljungberg, Apicella, & Schultz, 1992; Mirenowicz & Schultz, 1994; Montague, Dayan, & Sejnowski, 1996) or a dip (Frank, Moustafa, Haughey, Curran, & Hutchison, 2007; Hollerman & Schultz, 1998; Schultz, 2002; Schultz et al., 1993) depends on the expectedness of the event, with larger bursts after the presentation of an unpredicted compared to a predicted reward, and greater dips after the omission of a strongly expected reward5. Schultz argues that, in this way, dopaminergic neurons bidirectionally encode the neuronal equivalent of the reward prediction error, with the firing rate and the resulting DA response being determined by the difference between the actual and the predicted reward. Furthermore, research showed that during conditioning the presentation of an initially neutral stimulus does not evoke a DA neuron response (e.g., Schultz, 1998; Schultz et al., 1995). Later, however, when this stimulus has become a CS, the presentation of the reward no longer evokes a strong burst – instead, the presentation of the CS does. Apparently, the prediction error “wandered back in time”, similar to the temporal difference error. However, if the reward is already fully predicted by a different CS, the DA neurons will not become sensitive to the newly added stimulus and no learning will occur, mirroring the well-established blocking effect (Schultz et al., 1993).

Moreover, the DA neurons show another property of the temporal difference error: they are sensitive to the timing of the CS onset and also exhibit a firing behavior reminiscent of temporal discounting. Compared to an early reward (after 2 s), later rewards (after 4–16 s) lead to monotonically reduced burst intensities, despite having the same objective value (Kobayashi & Schultz, 2008).
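Within the temporal difference framework sketched above, this delay sensitivity can be expressed by discounting the reward r with the delay d until its delivery, for instance exponentially,

\[ V(\text{CS}) = \gamma^{d} \, r , \qquad 0 < \gamma < 1 , \]

so that the same objective reward yields a smaller predicted value – and a smaller burst at CS onset – the later it occurs. The exponential form is the one used in the TD models above; other functional forms, such as hyperbolic discounting, are also discussed in this literature.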

Taken together, the dopaminergic neurons convey a temporal difference error to different areas of the brain by inducing a phasic increase in DA in the respective regions. Mirroring actor-critic models, the midbrain acts in conjunction with neurons at the terminal sites as the critic, which sends a teaching signal to the neurons that represent the actor, inducing a change in synaptic weights, i.e., in connectivity.

5 Recently, Matsumoto and Hikosaka (2009) argued that only a subgroup of DA neurons encodes the prediction error and that a larger subset actually reacts to the unpredictedness of an event, irrespective of feedback valence (see, however, Frank & Surmeier, 2009; Wang, Tsien, & Tanimoto, 2011).

I.2.2 Models of Automatic Feedback Processing

Schultz’ research on reward signaling by DA neurons contributed to the conception of several theories and models, of which two models of basic feedback processing proved to be influential in current research: the reinforcement learning theory of the FRN (RL theory; Holroyd & Coles, 2002) and the basal ganglia/dopamine model (BG-DA; Frank & Claus, 2006). Both provide an actor-critic framework of how dopamine-based reinforcement learning affects behavioral adjustment over time. However, they differ in their focus: the RL theory is mostly concerned with the effect of DA on the actor, which in this theory is the anterior cingulate cortex (ACC), and with two electrophysiological correlates of this effect, the error-related negativity or error negativity (ERN/Ne; Falkenstein, Hohnsbein, Hoormann, & Blanke, 1990; Gehring, Goss, Coles, Meyer, & Donchin, 1993) and the feedback-related negativity (FRN; Miltner et al., 1997). The BG-DA model focuses more on the dopamine-mediated updating of reinforcement expectations in the critic, the basal ganglia, and on the effects of psychopathology, neuronal diseases, and individual differences on learning from feedback. Together, they prove to be central for the understanding of automatic feedback processing and thus will be presented in the following.