Learning, ERN, FRN and N250 References
Bernie C. Till


Contents

Learning Curve Dynamics

Error Detection and Reward Expectancy: ERN and FRN

ERN and Theta Phase Resetting

Mesencephalic Dopamine Modulation and the Basal Ganglia

Attention Capture and Target Detection: Right Occipito-temporal N250

Attention Capture and Target Detection: Left Occipito-temporal N250

Attention Capture and Target Detection: Centro-parietal N250

Oddball Response: Centro-parietal N200

Context Updating: Centro-parietal P300

Reinforcement Learning in Practice - Human Learning

Reinforcement Learning in Theory - Machine Learning

Reinforcement Learning in Spiking Neural Networks

Reinforcement Learning in the Cerebellar Model Articulation Controller

Functional Neuroanatomy in General

Local Versus Global Processing

ERN Applications for BCI

Spatial Filtering of EEG

Functional Source Localization

ICA with Clustering

Clustering Methods


Learning Curve Dynamics

Smith, A C, Frank, L M, Wirth, S, Yanike, M, Hu, D, Kubota, Y, Graybiel, A M, Suzuki, W A, & Brown, E N, 2004: Dynamic Analysis of Learning in Behavioral Experiments. J. Neurosci., 24(2):447-461.

Understanding how an animal's ability to learn relates to neural activity or is altered by lesions, different attentional states, pharmacological interventions, or genetic manipulations is a central question in neuroscience. Although learning is a dynamic process, current analyses do not use dynamic estimation methods, require many trials across many animals to establish the occurrence of learning, and provide no consensus as to how best to identify when learning has occurred. We develop a state-space model paradigm to characterize learning as the probability of a correct response as a function of trial number (learning curve). We compute the learning curve and its confidence intervals using a state-space smoothing algorithm and define the learning trial as the first trial on which there is reasonable certainty (>0.95) that a subject performs better than chance for the balance of the experiment. For a range of simulated learning experiments, the smoothing algorithm estimated learning curves with smaller mean integrated squared error and identified the learning trials with greater reliability than commonly used methods. The smoothing algorithm easily tracked the rapid learning of a monkey during a single session of an association learning experiment and identified learning 2 to 4 days earlier than accepted criteria for a rat in a 47-day procedural learning experiment. Our state-space paradigm estimates learning curves for single animals, gives a precise definition of learning, and suggests a coherent statistical framework for the design and analysis of learning experiments that could reduce the number of animals and trials per animal that these studies require.
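
The learning-trial criterion described above (the first trial from which there is >0.95 certainty that performance exceeds chance for the balance of the experiment) can be illustrated with a short sketch. This is not the authors' state-space (EM) smoother; it applies the same stopping rule to a simple sliding-window Beta-binomial estimate, and the window size, chance level, and simulated learner are illustrative assumptions.

    # Not the authors' state-space smoother: applies the learning-trial rule
    # (first trial from which we are >0.95 certain that performance exceeds
    # chance for the rest of the experiment) to a sliding-window estimate.
    import numpy as np
    from scipy.stats import beta

    def learning_trial(correct, chance=0.25, certainty=0.95, half_window=5):
        """correct: 0/1 array of trial outcomes. Returns the first trial index
        from which the lower credible bound on p(correct) stays above chance,
        or None if no such trial exists."""
        n_trials = len(correct)
        lower = np.empty(n_trials)
        for t in range(n_trials):
            lo, hi = max(0, t - half_window), min(n_trials, t + half_window + 1)
            k, n = int(correct[lo:hi].sum()), hi - lo
            # Lower credible bound with a flat Beta(1, 1) prior on p(correct).
            lower[t] = beta.ppf(1 - certainty, k + 1, n - k + 1)
        for t in range(n_trials):
            if (lower[t:] > chance).all():   # above chance for the balance of the run
                return t
        return None

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        trial = np.arange(60)
        p = 0.25 + 0.70 / (1 + np.exp(-(trial - 25) / 4))   # simulated learner
        outcomes = (rng.random(60) < p).astype(int)
        print("estimated learning trial:", learning_trial(outcomes))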

Petrov, A A, Dosher, B A, & Lu, Z-L, 2005: The Dynamics of Perceptual Learning: An Incremental Reweighting Model. Psychological Review, 112(4):715-743.

The mechanisms of perceptual learning are analyzed theoretically, probed in an orientation-discrimination experiment involving a novel nonstationary context manipulation, and instantiated in a detailed computational model. Two hypotheses are examined: modification of early cortical representations versus task-specific selective reweighting. Representation modification seems neither functionally necessary nor implied by the available psychophysical and physiological evidence. Computer simulations and mathematical analyses demonstrate the functional and empirical adequacy of selective reweighting as a perceptual learning mechanism. The stimulus images are processed by standard orientation- and frequency-tuned representational units, divisively normalized. Learning occurs only in the "read-out" connections to a decision unit; the stimulus representations never change. An incremental Hebbian rule tracks the task-dependent predictive value of each unit, thereby improving the signal-to-noise ratio of their weighted combination. Each abrupt change in the environmental statistics induces a switch cost in the learning curves as the system temporarily works with suboptimal weights.
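
As a rough illustration of the selective-reweighting idea (fixed representational units, learning confined to the read-out weights of a decision unit), the following toy sketch uses an incremental Hebbian update that tracks each unit's predictive value. It is not the published model; the unit count, learning rate, and noise level are arbitrary assumptions.

    # Toy version of selective reweighting (not the published model): the
    # representational units are fixed; only the read-out weights to a decision
    # unit change, via an incremental Hebbian update that tracks each unit's
    # predictive value, so the signal-to-noise of the weighted sum improves.
    import numpy as np

    rng = np.random.default_rng(1)
    n_units, n_trials, lr = 20, 2000, 0.02
    signal = np.zeros(n_units)
    signal[:4] = 1.0                                 # only 4 units carry the signal

    w = np.zeros(n_units)
    correct = np.zeros(n_trials, dtype=bool)
    for t in range(n_trials):
        label = rng.choice([-1.0, 1.0])              # stimulus category
        a = label * signal + rng.normal(0.0, 1.0, n_units)   # fixed, noisy representation
        decision = 1.0 if w @ a >= 0 else -1.0       # decision unit output
        correct[t] = decision == label
        w += lr * (label * a - w)                    # running estimate of predictive value
    print("accuracy, first vs last 200 trials:",
          correct[:200].mean(), correct[-200:].mean())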

Santana, E, Barros, A K, & Freire, R C S, 2007: On The Time Constant Under General Error Criterion. IEEE Sig. Proc. Lett., 14(8):533-536.

Time constant along with misadjustment offers a manner of analyzing the convergence behavior of adaptive algorithms. In particular, there are some advantages of using nonlinear functions of the error instead of linear ones to have enhanced convergence behavior. However, some equations for the time constant suggested in the literature are noise dependent, yielding an infinite value for the noiseless case, which is obviously wrong. This problem may explain the fact that no works compared the time constants theoretically found to those derived in practice. In this letter, we derive a new time constant which depends on both the inputs and the noise. The results show that the found equation conforms to practical results.
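
For orientation, the notion of a learning-curve time constant can be made concrete with a standard LMS example. The sketch below uses the ordinary linear (squared-error) criterion, not the general error criteria analysed in the letter, and compares an empirically fitted time constant with the textbook small-step-size approximation for white inputs (about 1/(2*mu*sigma_x^2) iterations for the update used here); all parameters are illustrative assumptions.

    # Classical LMS with the ordinary squared-error criterion (not the general
    # error criteria analysed in the letter). An empirical time constant is
    # read off the ensemble-averaged learning curve and compared with the
    # small-step white-input approximation for the update w <- w + mu*e*x.
    import numpy as np

    rng = np.random.default_rng(2)
    n_taps, mu, n_iter, n_runs = 8, 0.01, 2000, 200
    h = rng.normal(size=n_taps)                      # unknown system to identify
    sigma_x, sigma_v = 1.0, 0.05                     # input and measurement noise levels

    mse = np.zeros(n_iter)
    for _ in range(n_runs):
        w = np.zeros(n_taps)
        x_buf = rng.normal(0.0, sigma_x, n_taps)     # pre-filled regressor buffer
        for n in range(n_iter):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = rng.normal(0.0, sigma_x)
            d = h @ x_buf + rng.normal(0.0, sigma_v)  # desired response
            e = d - w @ x_buf
            w += mu * e * x_buf                       # LMS update
            mse[n] += e**2 / n_runs

    excess = mse - mse[-500:].mean()                  # subtract steady-state error power
    tau_emp = int(np.argmax(excess < excess[0] / np.e))   # first 1/e crossing
    tau_theory = 1.0 / (2 * mu * sigma_x**2)
    print(f"empirical time constant ~ {tau_emp} iterations, "
          f"theoretical ~ {tau_theory:.0f} iterations")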

Error Detection and Reward Expectancy: ERN and FRN

Holroyd, C B, Pakzad-Vaezi, K L, & Krigolson, O E, 2008: The feedback correct-related positivity: Sensitivity of the event-related brain potential to unexpected positive feedback. Psychophysiology, in press.

The N200 and the feedback error-related negativity (fERN) are two components of the event-related brain potential (ERP) that share similar scalp distributions, time courses, morphologies, and functional dependencies, which raises the question as to whether they are actually the same phenomenon. To investigate this issue, we recorded the ERP from participants engaged in two tasks that independently elicited the N200 and fERN. Our results indicate that they are, in fact, the same ERP component and further suggest that positive feedback elicits a positive-going deflection in the time range of the fERN. Taken together, these results indicate that negative feedback elicits a common N200 and that modulation of fERN amplitude results from the superposition on correct trials of a positive-going deflection that we term the feedback correct-related positivity.

Holroyd, C B, & Coles, M G H, 2008: Dorsal anterior cingulate cortex integrates reinforcement history to guide voluntary behavior. Cortex, 44(5):548-559.

Two competing types of theory have been proposed about the function of dorsal anterior cingulate cortex (dACC): evaluative theories hold that dACC monitors ongoing behavior to detect errors or conflict, whereas response selection theories hold that dACC is directly involved in the decision making process. In particular, one response selection theory proposes that dACC utilizes reward prediction error signals carried by the midbrain dopamine system to decide which of several competing motor control systems should be given control over the motor system (Holroyd and Coles, 2002). The theory further proposes that the impact of these dopamine signals on dACC determines the amplitude of a component of the event-related brain potential called the error-related negativity (ERN). In the present study, we applied this theory to a decision making problem that requires participants to select between two response options in which an erroneous choice is not clearly defined. Rather, the reward received for a particular response evolves in relation to the individual's previous behavior. We adapted a computational model associated with the theory to simulate human performance and the ERN in the task, and tested the predictions of the model against empirical ERP data. Our results indicate that ERN amplitude reflects the subjective value attributed by each participant to their response options as derived from their recent reward history. This finding is consistent with the position that dACC integrates the recent history of reinforcements to guide voluntary choice behavior, as opposed to evaluating behaviors per se.

Bellebaum, C, & Daum, I, 2008: Learning-related changes in reward expectancy are reflected in the feedback-related negativity. Eur. J. Neurosci, 27(7):1823-1835.

The feedback-related negativity (FRN) has been hypothesized to be linked to reward-based learning. While many studies have shown that the FRN only occurs in response to unexpected negative outcomes, the relationship between the magnitude of negative prediction errors and FRN amplitude remains a matter of debate. The present study aimed to elucidate this relationship with a new behavioural procedure that allowed subjects to predict precise reward probabilities by learning an explicit rule. Insight into the rule influenced not only subjects' choice behaviour but also outcome-related event-related potentials. After subjects had learned the rule, the FRN amplitude difference between non-reward and reward mirrored the magnitude of the negative prediction error, i.e. it was larger for less likely negative outcomes. Source analysis linked this effect to the anterior cingulate cortex. P300 amplitude was also modulated by outcome valence and expectancy. It was larger for positive and unexpected outcomes. It remains to be clarified, however, whether the P300 reflects a positive prediction error.

Holroyd, C B, & Krigolson, O E, 2007: Reward prediction error signals associated with a modified time estimation task. Psychophysiology, 44(6):913-917.

The feedback error-related negativity (fERN) is a component of the human event-related brain potential (ERP) elicited by feedback stimuli. A recent theory holds that the fERN indexes a reward prediction error signal associated with the adaptive modification of behavior. Here we present behavioral and ERP data recorded from participants engaged in a modified time estimation task. As predicted by the theory, our results indicate that fERN amplitude reflects a reward prediction error signal and that the size of this error signal is correlated across participants with changes in task performance.

Mars, R B, Coles, M G H, Grol, M J, Holroyd, C B, Nieuwenhuis, S, Hulstijn, W, & Toni, I, 2005: Neural dynamics of error processing in medial frontal cortex. NeuroImage, 28:1007-1013.

Adaptive behavior requires an organism to evaluate the outcome of its actions, such that future behavior can be adjusted accordingly and the appropriate response selected. During associative learning, the time at which such evaluative information is available changes as learning progresses, from the delivery of performance feedback early in learning to the execution of the response itself during learned performance. Here, we report a learning-dependent shift in the timing of activation in the rostral cingulate zone of the anterior cingulate cortex from external error feedback to internal error detection. This pattern of activity is seen only in the anterior cingulate, not in the presupplementary motor area. The dynamics of these reciprocal changes are consistent with the claim that the rostral cingulate zone is involved in response selection on the basis of the expected outcome of an action. Specifically, these data illustrate how the anterior cingulate receives evaluative information, indicating that an action has not produced the desired result.

Nieuwenhuis, S, Slagter, H A, Alting von Geusau, N J, Heslenfeld, D J, & Holroyd, C B, 2005: Knowing good from bad: differential activation of human cortical areas by positive and negative outcomes. Eur. J. Neurosci., 21(11):3161-3168.

Previous research has identified a component of the event-related brain potential (ERP), the feedback-related negativity, that is elicited by feedback stimuli associated with unfavourable outcomes. In the present research we used event-related functional magnetic resonance imaging (fMRI) and electroencephalographic (EEG) recordings to test the common hypothesis that this component is generated in the caudal anterior cingulate cortex. The EEG results indicated that our paradigm, a time estimation task with trial-to-trial performance feedback, elicited a large feedback-related negativity (FRN). Nevertheless, the fMRI results did not reveal any area in the caudal anterior cingulate cortex that was differentially activated by positive and negative performance feedback, casting doubt on the notion that the FRN is generated in this brain region. In contrast, we found a number of brain areas outside the posterior medial frontal cortex that were activated more strongly by positive feedback than by negative feedback. These included areas in the rostral anterior cingulate cortex, posterior cingulate cortex, right superior frontal gyrus, and striatum. An anatomically constrained source model assuming equivalent dipole generators in the rostral anterior cingulate, posterior cingulate, and right superior frontal gyrus produced a simulated scalp distribution that corresponded closely to the observed scalp distribution of the FRN. These results support a new hypothesis regarding the neural generators of the FRN, and have important implications for the use of this component as an electrophysiological index of performance monitoring and reward processing.

Holroyd, C B, Yeung, N, Coles, M G H, & Cohen, J D, 2005: A Mechanism for Error Detection in Speeded Response Time Tasks. J. Experim. Psych.: Gen., 134(2):163-191.

The concept of error detection plays a central role in theories of executive control. In this article, the authors present a mechanism that can rapidly detect errors in speeded response time tasks. This error monitor assigns values to the output of cognitive processes involved in stimulus categorization and response generation and detects errors by identifying states of the system associated with negative value. The mechanism is formalized in a computational model based on a recent theoretical framework for understanding error processing in humans (C. B. Holroyd & M. G. H. Coles, 2002). The model is used to simulate behavioral and event-related brain potential data in a speeded response time task, and the results of the simulation are compared with empirical data.

Gehring, W J, & Willoughby, A R, 2002: The medial frontal cortex and the rapid processing of monetary gains and losses. Science, 295(5563):2279-2282.

We report the observation of neural processing that occurs within 265 milliseconds after outcome stimuli that inform human participants about gains and losses in a gambling task. A negative-polarity event-related brain potential, probably generated by a medial-frontal region in or near the anterior cingulate cortex, was greater in amplitude when a participant's choice between two alternatives resulted in a loss than when it resulted in a gain. The sensitivity to losses was not simply a reflection of detecting an error; gains did not elicit the medial-frontal activity when the alternative choice would have yielded a greater gain, and losses elicited the activity even when the alternative choice would have yielded a greater loss. Choices made after losses were riskier and were associated with greater loss-related activity than choices made after gains. It follows that medial-frontal computations may contribute to mental states that participate in higher level decisions, including economic choices.

Gehring, W J, Goss, B, Coles, M G H, Meyer, D E, & Donchin, E, 1993: A Neural System for Error Detection and Compensation. Psychological Science, 4(6):385-390.

Humans can monitor actions and compensate for errors. Analysis of the human event-related brain potentials (ERPs) accompanying errors provides evidence for a neural process whose activity is specifically associated with monitoring and compensating for erroneous behavior. This error-related activity is enhanced when subjects strive for accurate performance but is diminished when response speed is emphasized at the expense of accuracy. The activity is also related to attempts to compensate for the erroneous behavior.

Holroyd, C B, & Coles, M G H, 2002: The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109(4):679-709.

The authors present a unified account of 2 neural systems concerned with the development and expression of adaptive behaviors: a mesencephalic dopamine system for reinforcement learning and a "generic" error-processing system associated with the anterior cingulate cortex. The existence of the error-processing system has been inferred from the error-related negativity (ERN), a component of the event-related brain potential elicited when human participants commit errors in reaction-time tasks. The authors propose that the ERN is generated when a negative reinforcement learning signal is conveyed to the anterior cingulate cortex via the mesencephalic dopamine system and that this signal is used by the anterior cingulate cortex to modify performance on the task at hand. They provide support for this proposal using both computational modeling and psychophysiological experimentation.
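
The reward-prediction-error signal at the heart of this account can be summarised with a minimal tabular sketch: a learner whose prediction error is positive when outcomes are better than expected and negative when they are worse; in the theory it is the negative dips, conveyed by the dopamine system to anterior cingulate cortex, that generate the ERN. The two-choice task, learning rate, and cutoff below are illustrative assumptions, not the authors' simulation.

    # Generic tabular prediction-error learner (not the authors' simulation
    # code). delta is the reward prediction error; in the theory above, large
    # negative values of delta are the events proposed to elicit the ERN/FRN.
    import numpy as np

    rng = np.random.default_rng(3)
    p_reward = {"left": 0.8, "right": 0.2}            # hypothetical two-choice task
    V = {"left": 0.0, "right": 0.0}
    alpha, eps, large_dips = 0.1, 0.1, 0

    for trial in range(500):
        if rng.random() < eps:                        # epsilon-greedy response selection
            action = rng.choice(list(V))
        else:
            action = max(V, key=V.get)
        r = float(rng.random() < p_reward[action])    # probabilistic feedback
        delta = r - V[action]                         # reward prediction error
        V[action] += alpha * delta
        if delta < -0.5:                              # strongly "worse than expected"
            large_dips += 1                           # proposed ERN/FRN-eliciting events
    print("learned values:", V, "| large negative prediction errors:", large_dips)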

Nieuwenhuis, S, Ridderinkhof, K R, Talsma, D, Coles, M G H, Holroyd, C B, Kok, A, et al., 2002: A computational account of altered error processing in older age: Dopamine and the error-related negativity. Cogn. Affec. Behav. Neurosci., 2(1):19-36.

When participants commit errors or receive feedback signaling that they have made an error, a negative brain potential is elicited. According to Holroyd and Coles's (in press) neurocomputational model of error processing, this error-related negativity (ERN) is elicited when the brain first detects that the consequences of an action are worse than expected. To study age-related changes in error processing, we obtained performance and ERN measures of younger and high-functioning older adults. Experiment 1 demonstrated reduced ERN amplitudes in older adults in the context of otherwise intact brain potentials. This result could not be attributed to uncertainty about the required response in older adults. Experiment 2 revealed impaired performance and reduced response- and feedback-related ERNs of older adults in a probabilistic learning task. These age changes could be simulated by manipulation of a single parameter of the neurocomputational model, this manipulation corresponding to weakened phasic activity of the mesencephalic dopamine system.

Frank, M J, D'Lauro, C, & Curran, T, 2007: Cross-task individual differences in error processing: neural, electrophysiological, and genetic components. Cogn Affect Behav Neurosci., 7(4):297-308.

The error-related negativity (ERN) and error positivity (Pe) are electrophysiological markers of error processing thought to originate in the medial frontal cortex. Previous studies using probabilistic reinforcement showed that individuals who learn more from negative than from positive feedback (negative learners) had larger ERNs than did positive learners. These findings support the dopamine (DA) reinforcement-learning hypothesis of the ERN and associated computational models. However, it remains unclear (1) to what extent these effects generalize to tasks outside the restricted probabilistic reinforcement-learning domain and (2) whether there is a dopaminergic source of these effects. To address these issues, we tested subjects' reinforcement-learning biases behaviorally and recorded EEG during an unrelated recognition memory experiment. Initial recognition responses were speeded, but the subjects were subsequently allowed to self-correct their responses. We found that negative learners, as assessed via probabilistic learning, had larger ERNs in the recognition memory task, suggestive of a common underlying enhanced error-processing mechanism. Negative learners also had larger Pes when self-correcting errors than did positive learners. Moreover, the ERN and Pe components contributed independently to negative learning. We also tested for a dopaminergic genetic basis of these ERP components. We analyzed the COMT val/met polymorphism, which has been linked to frontal DA levels. The COMT genotype affected Pe (but not ERN) magnitude; met/met homozygotes showed enhanced Pes to self-corrected errors, as compared with val carriers. These results are consistent with a role for the Pe and frontal monoamines in error awareness.

Ridderinkhof, K R, Nieuwenhuis, S, & Braver, T S, 2007: Medial frontal cortex function: An introduction and overview. Cogn. Affect. Behav. Neurosci., 7(4):297-308.

The growing attention being given to medial frontal cortex (MFC) in cognitive neuroscience studies has fostered a number of theoretical and paradigmatic perspectives that diverge in important ways. This has led to a great deal of research fractionation, with investigators studying domains and issues in MFC function that sometimes bear (at least at the surface) little relation to the questions addressed by others studying the same brain region. The present issue of Cognitive, Affective, & Behavioral Neuroscience presents articles inspired by a conference bringing together views from across this diversity of research, highlighting both the richness and vibrancy of the field and the challenges to be faced in terms of integration, synthesis, and precision among the theoretical accounts. The present article presents a brief introduction, overview, and road map to the field and to the special issue devoted to MFC function.

Cohen, M X, Elger, C E, & Ranganath, C, 2007: Reward Expectation Modulates Feedback-Related Negativity and EEG Spectra. Neuroimage, 35(2):968-978.

The ability to evaluate outcomes of previous decisions is critical to adaptive decision-making. The feedback-related negativity (FRN) is an event-related potential (ERP) modulation that distinguishes losses from wins, but little is known about the effects of outcome probability on these ERP responses. Further, little is known about the frequency characteristics of feedback processing, for example, event-related oscillations and phase synchronizations. Here, we report an EEG experiment designed to address these issues. Subjects engaged in a probabilistic reinforcement learning task in which we manipulated, across blocks, the probability of winning and losing to each of two possible decision options. Behaviorally, all subjects quickly adapted their decision-making to maximize rewards. ERP analyses revealed that the probability of reward modulated neural responses to wins, but not to losses. This was seen both across blocks as well as within blocks, as learning progressed. Frequency decomposition via complex wavelets revealed that EEG responses to losses, compared to wins, were associated with enhanced power and phase coherence in the theta frequency band. As in the ERP analyses, power and phase coherence values following wins but not losses were modulated by reward probability. Some findings between ERP and frequency analyses diverged, suggesting that these analytic approaches provide complementary insights into neural processing. These findings suggest that the neural mechanisms of feedback processing may differ between wins and losses.

Cohen, M X, & Ranganath, C, 2007: Reinforcement Learning Signals Predict Future Decisions. J. Neurosci., 27(2):371-378.

Optimal behavior in a competitive world requires the flexibility to adapt decision strategies based on recent outcomes. In the present study, we tested the hypothesis that this flexibility emerges through a reinforcement learning process, in which reward prediction errors are used dynamically to adjust representations of decision options. We recorded event-related brain potentials (ERPs) while subjects played a strategic economic game against a computer opponent to evaluate how neural responses to outcomes related to subsequent decision-making. Analyses of ERP data focused on the feedback-related negativity (FRN), an outcome-locked potential thought to reflect a neural prediction error signal. Consistent with predictions of a computational reinforcement learning model, we found that the magnitude of ERPs after losing to the computer opponent predicted whether subjects would change decision behavior on the subsequent trial. Furthermore, FRNs to decision outcomes were disproportionately larger over the motor cortex contralateral to the response hand that was used to make the decision. These findings provide novel evidence that humans engage a reinforcement learning process to adjust representations of competing decision options.

Fletcher, P C, Anderson, J M, Shanks, D R, Honey, R, Carpenter, T A, Donovan, T, Papadakis, N, & Bullmore, E T, 2001: Responses of human frontal cortex to surprising events are predicted by formal associative learning theory. Nat. Neurosci., 4(10):1043-1048.

Learning depends on surprise and is not engendered by predictable occurrences. In this functional magnetic resonance imaging (fMRI) study of causal associative learning, we show that dorsolateral prefrontal cortex (DLPFC) is associated specifically with the adjustment of inferential learning on the basis of unpredictability. At the outset, when all associations were unpredictable, DLPFC activation was maximal. This response attenuated with learning but, subsequently, activation here was evoked by surprise violations of the learned association. Furthermore, the magnitude of DLPFC response to a surprise event was sensitive to the relationship that had been learned and was predictive of subsequent behavioral change. In short, the physiological response properties of right DLPFC satisfied specific predictions made by associative learning theory.

Miltner, W H R, Braun, C H, & Coles, M G H, 1997: Event-related brain potentials following incorrect feedback in a time-estimation task: Evidence for a "generic" neural system for error detection. J. Cognitive Neurosci., 9(6):788-798.

Examines the scalp-recorded event-related potentials (ERP) following feedback stimuli in a time-estimation task. Characteristics of error-related negativity; ERP response to feedback; Equivalent dipole analysis; Evaluation of errors in reaction time task.

Frank, M J, 2004: Dynamic Dopamine Modulation of Striato-Cortical Circuits in Cognition: Converging Neuropsychological, Psychopharmacological and Computational Studies, Ph.D. Thesis, University of Colorado.

How do we produce complex motor sequences? To what extent do we learn from the positive versus negative consequences of our decisions? How do we maintain task-relevant information in working memory while ignoring distracting information? This dissertation provides a mechanistic framework that explores how these seemingly unrelated processes recruit remarkably similar neural circuits linking the basal ganglia (BG) with frontal cortex. Drawing from neuroanatomical and biochemical considerations, this framework suggests that the BG facilitate or suppress cortical "actions" (e.g., motor responses and working memory updating) via separate Go and NoGo pathways projecting to frontal cortex, and that the relative balance of these pathways is dynamically modulated by dopamine (DA). Transient DA bursts and dips during positive and negative reinforcement support Go and NoGo learning via D1 and D2 receptors, respectively. Computational neural network models instantiate key biological properties and provide insight into the underlying role of BG/DA interactions during the learning and execution of cognitive tasks. These models account for complex medication-dependent cognitive deficits in Parkinson's disease, and make simple predictions for the underlying source of these deficits, emphasizing the importance of the dynamic range of DA signals. These predictions have been subsequently confirmed in medicated and non-medicated Parkinson's patients and in healthy individuals under pharmacologically-induced DA manipulation. In all of these studies, elevated levels of phasic DA release led to greater Go learning from positive outcomes of decisions, whereas diminished DA levels led to better NoGo learning to avoid negative outcomes. Tonic DA stimulation led to more overall Go responding. These effects extended to higher level cognitive function: tonic DA stimulation led to more overall working memory updating and concomitant distractibility, whereas enhanced phasic DA release led to greater selective updating for task-relevant (i.e., "positively-valenced") information, but difficulty in ignoring this information in a subsequent set-shift. Drug effects also interacted with baseline working memory span. Taken together, these results provide substantial support for a unified account of the role of DA in modulating cognitive processes that depend on the basal ganglia.
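
The Go/NoGo logic of this framework can be caricatured in a few lines: positive outcomes (dopamine bursts) increment Go weights for the chosen action, negative outcomes (dips) increment NoGo weights, and scaling the burst or dip size biases learning toward positive or negative outcomes. The sketch below is an abstract actor, not the dissertation's neural-network model, and the task probabilities, learning rate, and gains are made-up assumptions.

    # Abstract Go/NoGo actor in the spirit of the framework above (not the
    # dissertation's neural-network model). Bursts strengthen Go weights for
    # the chosen action; dips strengthen NoGo weights; scaling burst/dip size
    # crudely mimics an altered phasic dopamine range.
    import numpy as np

    def run(burst_gain, dip_gain, n_trials=400, seed=4):
        rng = np.random.default_rng(seed)
        p_win = np.array([0.8, 0.2])                  # two options, one mostly rewarded
        go, nogo, alpha = np.zeros(2), np.zeros(2), 0.05
        for _ in range(n_trials):
            propensity = go - nogo
            p = np.exp(propensity) / np.exp(propensity).sum()   # softmax choice rule
            a = rng.choice(2, p=p)
            if rng.random() < p_win[a]:
                go[a] += alpha * burst_gain           # burst -> Go (D1-like) learning
            else:
                nogo[a] += alpha * dip_gain           # dip -> NoGo (D2-like) learning
        return go, nogo

    for label, gains in [("balanced DA", (1.0, 1.0)),
                         ("large bursts, small dips", (1.5, 0.5)),
                         ("small bursts, large dips", (0.5, 1.5))]:
        go, nogo = run(*gains)
        net = (go - nogo)[0] - (go - nogo)[1]
        print(f"{label:26s} net preference for the better option: {net:+.2f}")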

ERN and Theta Phase Resetting

Trujillo, L T, & Allen, J J, 2007: Theta EEG dynamics of the error-related negativity. Clin. Neurophysiol., 118(3):645-668.

Objective: The error-related negativity (ERN) is a response-locked brain potential (ERP) occurring 80-100 ms following response errors. This report contrasts three views of the genesis of the ERN, testing the classic view that time-locked phasic bursts give rise to the ERN against the view that the ERN arises from a pure phase-resetting of ongoing theta (4-7 Hz) EEG activity and the view that the ERN is generated - at least in part - by a phase-resetting and amplitude enhancement of ongoing theta EEG activity.
Methods: Time-domain ERP analyses were augmented with time-frequency investigations of phase-locked and non-phase-locked spectral power, and inter-trial phase coherence (ITPC) computed from individual EEG trials, examining time courses and scalp topographies. Simulations based on the assumptions of the classic, pure phase-resetting, and phase-resetting plus enhancement views, using parameters from each subject's empirical data, were used to contrast the time-frequency findings that could be expected if one or more of these hypotheses adequately modeled the data.

Results: Error responses produced larger amplitude activity than correct responses in time-domain ERPs immediately following responses, as expected. Time-frequency analyses revealed that significant error-related post-response increases in total spectral power (phase- and non-phase-locked), phase-locked power, and ITPC were primarily restricted to the theta range, with this effect located over midfrontocentral sites, with a temporal distribution from ~150-200 ms prior to the button press and persisting up to 400 ms post-button press. The increase in non-phase-locked power (total power minus phase-locked power) was larger than phase-locked power, indicating that the bulk of the theta event-related dynamics were not phase-locked to response. Results of the simulations revealed a good fit for data simulated according to the phase-locking with amplitude enhancement perspective, and a poor fit for data simulated according to the classic view and the pure phase-resetting view.
Conclusions: Error responses produce not only phase-locked increases in theta EEG activity, but also increases in non-phase-locked theta, both of which share a similar topography.
Significance: The findings are thus consistent with the notion advanced by Luu et al. [Luu P, Tucker DM, Makeig S. Frontal midline theta and the error-related negativity; neurophysiological mechanisms of action regulation. Clin Neurophysiol 2004;115:1821-35] that the ERN emerges, at least in part, from a phase-resetting and phase-locking of ongoing theta-band activity, in the context of a general increase in theta power following errors.
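
The time-frequency measures used above (total power, phase-locked power, and inter-trial phase coherence) are straightforward to compute from single trials; the sketch below does so at a single theta frequency via convolution with a complex Morlet wavelet on simulated data. It is not the authors' pipeline, and the sampling rate, wavelet cycles, burst timing, and phase jitter are illustrative assumptions.

    # Single-frequency time-frequency sketch (not the authors' pipeline):
    # inter-trial phase coherence (ITPC), total power, and phase-locked power
    # at 6 Hz, computed by convolving simulated trials with a complex Morlet
    # wavelet.
    import numpy as np

    fs, f0, n_trials, n_samp = 250, 6.0, 100, 500     # 2-s epochs at 250 Hz
    t = np.arange(n_samp) / fs
    rng = np.random.default_rng(5)

    # Complex Morlet wavelet with 7 cycles at f0.
    wt = np.arange(-1.0, 1.0, 1.0 / fs)
    sigma = 7.0 / (2 * np.pi * f0)
    wavelet = np.exp(2j * np.pi * f0 * wt) * np.exp(-wt**2 / (2 * sigma**2))
    wavelet /= np.abs(wavelet).sum()

    # Simulated trials: noise plus a partially phase-locked theta burst at 1.3 s.
    trials = rng.normal(0.0, 1.0, (n_trials, n_samp))
    burst = np.exp(-(t - 1.3)**2 / (2 * 0.08**2))
    for k in range(n_trials):
        jitter = rng.normal(0.0, 0.6)                 # partial, not perfect, phase locking
        trials[k] += 2.0 * burst * np.cos(2 * np.pi * f0 * (t - 1.3) + jitter)

    analytic = np.array([np.convolve(tr, wavelet, mode="same") for tr in trials])
    itpc = np.abs(np.mean(analytic / np.abs(analytic), axis=0))   # 0 = random, 1 = locked
    total_power = np.mean(np.abs(analytic)**2, axis=0)            # phase- plus non-phase-locked
    phase_locked_power = np.abs(analytic.mean(axis=0))**2         # power of the trial average

    i = int(np.argmax(itpc))
    print(f"peak ITPC {itpc[i]:.2f} at t = {t[i]:.2f} s; "
          f"total vs phase-locked power there: {total_power[i]:.3f} vs {phase_locked_power[i]:.3f}")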

Yeung, N, Bogacz, R, Holroyd, C B, Nieuwenhuis, S, & Cohen, J D, 2007: Theta phase resetting and the error-related negativity. Psychophysiology, 44(1):39-49.

It has been proposed that the error-related negativity (ERN) is generated by phase resetting of theta-band EEG oscillations. The present research evaluates a set of analysis methods that have recently been used to provide evidence for this hypothesis. To evaluate these methods, we apply each of them to two simulated data sets: one set that includes theta phase resetting and a second that comprises phasic peaks embedded in EEG noise. The results indicate that the analysis methods do not effectively distinguish between the two simulated data sets. In particular, the simulated data set constructed from phasic peaks, though containing no synchronization of ongoing EEG activity, demonstrates properties previously interpreted as supporting the synchronized oscillation account of the ERN. These findings suggest that the proposed analysis methods cannot provide unambiguous evidence that the ERN is generated by phase resetting of ongoing oscillations.

Luu, P, & Tucker, D M, 2001: Regulating action: Alternating activation of midline frontal and motor cortical networks. Clin. Neurophysiol., 112(7):1295-1306.

Objectives: Focal electrical fields recorded over the midline prefrontal cortex have been found to index rapid evaluative decisions, including the recognition of having made an error in a speeded response task. The nature of these electrical fields and how they are related to cortical areas involved in response execution remains to be clarified.
Methods: As subjects performed a speeded response task the EEG was recorded with a 128-channel sensor array. By filtering out the large slow waves of the event-related potential, we found that the error-related negativity (Ne/ERN) arises from a midline frontal oscillation that alternates with oscillations over lateral sensorimotor cortex. Electrical source analyses were used to determine the brain sources involved in the generation of these oscillations.
Results: The results show that the midline and lateral oscillations have a period of about 200 ms (theta), and they are present for both correct and error responses. When an error is made, the midline error oscillation is recruited strongly, and it becomes correlated with the motor oscillation. Source analyses localized the midline error oscillation to centromedial frontal cortex and the lateral oscillation to sensorimotor cortices.
Conclusions: Because of the similarity between the midline oscillation observed in the present study and frontal midline theta, the nature of the Ne/ERN may be clarified by the frontal midline theta literature. The correlation between the midline and sensorimotor oscillations suggests a possible mechanism for how midline frontal evaluative and monitoring networks contribute to action regulation.

Luu, P, Tucker, D M, & Makeig, S, 2004: Frontal midline theta and the error-related negativity: Neurophysiological mechanisms of action regulation. Clin. Neurophysiol., 115(8):1821-1835.

Objective: The error-related negativity (ERN) is an event-related potential (ERP) peak occurring between 50 and 100 ms after the commission of a speeded motor response that the subject immediately realizes to be in error. The ERN is believed to index brain processes that monitor action outcomes. Our previous analyses of ERP and EEG data suggested that the ERN is dominated by partial phase-locking of intermittent theta-band EEG activity. In this paper, this possibility is further evaluated.
Methods: The possibility that the ERN is produced by phase-locking of theta-band EEG activity was examined by analyzing the single-trial EEG traces from a forced-choice speeded response paradigm before and after applying theta-band (4-7 Hz) filtering and by comparing the averaged and single-trial phase-locked (ERP) and non-phase-locked (other) EEG data. Electrical source analyses were used to estimate the brain sources involved in the generation of the ERN.
Results: Beginning just before incorrect button presses in a speeded choice response paradigm, midfrontal theta-band activity increased in amplitude and became partially and transiently phase-locked to the subject's motor response, accounting for 57% of ERN peak amplitude. The portion of the theta-EEG activity increase remaining after subtracting the response-locked ERP from each trial was larger and longer lasting after error responses than after correct responses, extending on average 400 ms beyond the ERN peak. Multiple equivalent-dipole source analysis suggested 3 possible equivalent dipole sources of the theta-bandpassed ERN, while the scalp distribution of non-phase-locked theta amplitude suggested the presence of additional frontal theta-EEG sources.
Conclusions: These results appear consistent with a body of research that demonstrates a relationship between limbic theta activity and action regulation, including error monitoring and learning.

Yeung, N, Bogacz, R, Holroyd, C B, & Cohen, J D, 2004: Detection of synchronized oscillations in the electroencephalogram: An evaluation of methods. Psychophysiology, 41(6):822-832.

The signal averaging approach typically used in ERP research assumes that peaks in ERP waveforms reflect neural activity that is uncorrelated with activity in the ongoing EEG. However, this assumption has been challenged by research suggesting that ERP peaks reflect event-related synchronization of ongoing EEG oscillations. In this study, we investigated the validity of a set of methods that have been used to demonstrate that particular ERP peaks result from synchronized EEG oscillations. We simulated epochs of EEG data by superimposing phasic peaks on noise characterized by the power spectrum of the EEG. When applied to the simulated data, the methods in question produced results that have previously been interpreted as evidence of synchronized oscillations, even though no such synchrony was present. These findings suggest that proposed analysis methods may not effectively disambiguate competing views of ERP generation.
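
The simulation recipe described above (phasic peaks superimposed on noise with an EEG-like spectrum) is easy to reproduce in outline. The sketch below builds epochs from 1/f-shaped noise plus a fixed-latency negative peak and then averages them; the spectral shaping, peak shape, and latency are illustrative choices rather than the published simulation parameters.

    # Epochs built as described above: a fixed-latency phasic peak added to
    # background noise with a roughly 1/f spectrum, then signal-averaged.
    import numpy as np

    fs, n_samp, n_trials = 250, 500, 200
    t = np.arange(n_samp) / fs
    freqs = np.fft.rfftfreq(n_samp, 1.0 / fs)
    rng = np.random.default_rng(6)

    def eeg_like_noise():
        """White noise reshaped in the frequency domain to a ~1/f spectrum, unit variance."""
        spec = np.fft.rfft(rng.normal(size=n_samp))
        scale = np.where(freqs > 0, 1.0 / np.maximum(freqs, 1.0), 0.0)   # flat below 1 Hz
        x = np.fft.irfft(spec * scale, n=n_samp)
        return x / x.std()

    peak = -np.exp(-(t - 1.0)**2 / (2 * 0.02**2))     # negative phasic peak at 1.0 s

    epochs = np.array([eeg_like_noise() + peak for _ in range(n_trials)])
    erp = epochs.mean(axis=0)                         # conventional signal average
    i = int(np.argmin(np.abs(t - 1.0)))
    print("single-trial SD vs ERP amplitude at the peak latency:",
          round(float(epochs[:, i].std()), 3), round(float(erp[i]), 3))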

Yeung, N, Botvinick, M M, & Cohen, J D, 2004: The neural basis of error detection: Conflict monitoring and the error-related negativity. Psychological Review, 111(4):931-959.

According to a recent theory, anterior cingulate cortex is sensitive to response conflict, the coactivation of mutually incompatible responses. The present research develops this theory to provide a new account of the error-related negativity (ERN), a scalp potential observed following errors. Connectionist simulations of response conflict in an attentional task demonstrated that the ERN-its timing and sensitivity to task parameters-can be explained in terms of the conflict theory. A new experiment confirmed predictions of this theory regarding the ERN and a second scalp potential, the N2, that is proposed to reflect conflict monitoring on correct response trials. Further analysis of the simulation data indicated that errors can be detected reliably on the basis of post-error conflict. It is concluded that the ERN can be explained in terms of response conflict and that monitoring for conflict may provide a simple mechanism for detecting errors.

Mesencephalic Dopamine Modulation and the Basal Ganglia

Ridderinkhof, K R, Ullsperger, M, Crone, E A, & Nieuwenhuis, S, 2004: The role of the medial frontal cortex in cognitive control. Science, 306(5695):443-447.

Adaptive goal-directed behavior involves monitoring of ongoing actions and performance outcomes, and subsequent adjustments of behavior and learning. We evaluate new findings in cognitive neuroscience concerning cortical interactions that subserve the recruitment and implementation of such cognitive control. A review of primate and human studies, along with a meta-analysis of the human functional neuroimaging literature, suggest that the detection of unfavorable outcomes, response errors, response conflict, and decision uncertainty elicits largely overlapping clusters of activation foci in an extensive part of the posterior medial frontal cortex (pMFC). A direct link is delineated between activity in this area and subsequent adjustments in performance. Emerging evidence points to functional interactions between the pMFC and the lateral prefrontal cortex (LPFC), so that monitoring-related pMFC activity serves as a signal that engages regulatory processes in the LPFC to implement performance adjustments.

Holroyd, C B, Nieuwenhuis, S, Mars, R B, & Coles, M G H, 2004: Anterior cingulate cortex, selection for action, and error processing. ed: Posner, M I, Cognitive Neuroscience of Attention, 219-231, Guilford Press, New York.

The concepts of "attention to action" and "selection for action" (Allport, 1987) refer to how particular cognitive intentions and sensory inputs are selected and coupled with the effector system for the control of action production. A central role in this process has been attributed to the anterior cingulate cortex (ACC). According to this view, the ACC contributes to executive or strategic aspects of motor control by allowing only particular sources of information access to the output system. More specifically, the ACC appears to be involved in selecting actions or action plans that are consistent with task goals, that is, to transform intentions into actions. This proposition has been supported by converging evidence from a broad array of empirical techniques, including functional neuroimaging, neuroanatomical, neurophysiological, intracranial stimulation, and lesion studies in humans and animals. This research has indicated that a caudal/dorsal area of the ACC appears to be involved in the cognitive control of motor behavior. Consistent with this role, the ACC is also thought to be involved in error processing. This position holds that the ACC is sensitive to incorrect or inappropriate behaviors, and suggests that one aspect of the ACC control function involves bringing erroneous behaviors in line with desired goals. The motivation for this proposal is due primarily to observations of the error-related negativity (ERN), a component of the event-related brain potential (ERP) associated with error commission, which appears to be generated in the ACC. In this chapter, we review these ERN studies and discuss what insights they have provided into ACC function. We begin by describing the initial investigations that demonstrated that ERN is produced by an error processing system. Then, we review studies that suggested that the ERN is generated in the ACC and, thus, that the ACC is involved in error processing. We next present a recent theory that holds that the ERN is produced by the impact of reinforcement learning signals conveyed by the mesencephalic dopamine system on the ACC, and that the ACC uses that information to improve performance on the task at hand. Lastly, we provide empirical support for this theory.

Luu, P, & Pederson, S M, 2004: The anterior cingulate cortex: regulating actions in context. ed: Posner, M I, Cognitive Neuroscience of Attention, 232-244, Guilford Press, New York.

There are many theories of ACC function. Common to most theories is the belief that the ACC is engaged when rapid changes in behaviors are required. That is, contributions from the ACC are required when ongoing actions are inadequate or do not match up with current demands. The concept of attention for action is used to describe the cognitive processes that are engaged under situations that require control, although cognitive control may not be implemented by the ACC. The results from ERP studies of action monitoring reviewed in this chapter reveal that this concept is still appropriate for describing ACC functions, particularly because it emphasizes the role of action in cognition. It is likely that the ACC has evolved to regulate behaviors such that they are adaptive to sudden changes in the environment and should be important to early stages of learning. Indeed, in animal models of associative learning, the ACC is a critical component of a network responsible for rapid association between a stimulus and a required response.

Klein, T A, Neumann, J, Reuter, M, Hennig, J, von Cramon, D Y, & Ullsperger, M, 2007: Genetically Determined Differences in Learning from Errors. Science, 318(5856):1642-1645.

The role of dopamine in monitoring negative action outcomes and feedback-based learning was tested in a neuroimaging study in humans grouped according to the dopamine D2 receptor gene polymorphism DRD2-TAQ-IA. In a probabilistic learning task, A1-allele carriers with reduced dopamine D2 receptor densities learned to avoid actions with negative consequences less efficiently. Their posterior medial frontal cortex (pMFC), involved in feedback monitoring, responded less to negative feedback than others' did. Dynamically changing interactions between pMFC and hippocampus found to underlie feedback-based learning were reduced in A1-allele carriers. This demonstrates that learning from errors requires dopaminergic signaling. Dopamine D2 receptor reduction seems to decrease sensitivity to negative action consequences, which may explain an increased risk of developing addictive behaviors in A1-allele carriers.

Smith, A J, Becker, S, & Kapur, S, 2005: A Computational Model of the Functional Role of the Ventral-Striatal D2 Receptor in the Expression of Previously Acquired Behaviors. Neural Computation, 17(2):361-395.

The functional role of dopamine has attracted a great deal of interest ever since it was empirically discovered that dopamine-blocking drugs could be used to treat psychosis. Specifically, the D2 receptor and its expression in the ventral striatum have emerged as pivotal in our understanding of the complex role of the neuromodulator in schizophrenia, reward, and motivation. Our departure from the ubiquitous temporal difference (TD) model of dopamine neuron firing allows us to account for a range of experimental evidence suggesting that ventral striatal dopamine D2 receptor manipulation selectively modulates motivated behavior for distal versus proximal outcomes. Whether an internal model or the TD approach (or a mixture) is better suited to a comprehensive exposition of tonic and phasic dopamine will have important implications for our understanding of reward, motivation, schizophrenia, and impulsivity. We also use the model to help unite some of the leading cognitive hypotheses of dopamine function under a computational umbrella. We have used the model ourselves to stimulate and focus new rounds of experimental research.

Allman, J M, Hakeem, A, Erwin, J M, Nimchinsky, E, & Hof, P, 2001: The Anterior Cingulate Cortex: The Evolution of an Interface between Emotion and Cognition. Ann. N.Y. Acad. Sci., 935:107-117.

We propose that the anterior cingulate cortex is a specialization of neocortex rather than a more primitive stage of cortical evolution. Functions central to intelligent behavior, that is, emotional self-control, focused problem solving, error recognition, and adaptive response to changing conditions, are juxtaposed with the emotions in this structure. Evidence of an important role for the anterior cingulate cortex in these functions has accumulated through single-neuron recording, electrical stimulation, EEG, PET, fMRI, and lesion studies. The anterior cingulate cortex contains a class of spindle-shaped neurons that are found only in humans and the great apes, and thus are a recent evolutionary specialization probably related to these functions. The spindle cells appear to be widely connected with diverse parts of the brain and may have a role in the coordination that would be essential in developing the capacity to focus on difficult problems. Furthermore, they emerge postnatally and their survival may be enhanced or reduced by environmental conditions of enrichment or stress, thus potentially influencing adult competence or dysfunction in emotional self-control and problem-solving capacity.

O'Connell, R G, Dockree, P M, Bellgrove, M A, Kelly, S P, Hester, R, Garavan, H, Robertson, I H, & Foxe, J J, 2007: The role of cingulate cortex in the detection of errors with and without awareness: a high-density electrical mapping study. Eur. J. Neurosci., 25(8):2571-2579.

Error-processing research has demonstrated that the brain uses a specialized neural network to detect errors during task performance but the brain regions necessary for conscious awareness of an error are poorly understood. In the present study we show that two well-known error-related event-related potential (ERP) components, the error-related negativity (ERN) and error positivity (Pe) have a differential relationship with awareness during performance of a manual response inhibition task optimized to examine error awareness. While the ERN was unaffected by the participants' conscious experience of errors, the Pe was only seen when participants were aware of committing an error. Source localization of these components indicated that the ERN was generated by a caudal region of the anterior cingulate cortex (ACC) while the Pe was associated with contributions from a more anterior ACC region and the posterior cingulate-precuneus. Tonic EEG measures of cortical arousal were correlated with individual rates of error awareness and showed a specific relationship with the amplitude of the Pe. The latter finding is consistent with evidence that the Pe represents a P3-like facilitation of information processing modulated by subcortical arousal systems. Our data suggest that the ACC might participate in both preconscious and conscious error detection and that cortical arousal provides a necessary setting condition for error awareness. These findings may be particularly important in the context of clinical studies in which a proper understanding of self-monitoring deficits requires an explicit measurement of error awareness.

Averbeck, B B, & Seo, M, 2008: The Statistical Neuroanatomy of Frontal Networks in the Macaque. PLoS Computational Biology, 4(4):e1000050.

We were interested in gaining insight into the functional properties of frontal networks based upon their anatomical inputs. We took a neuroinformatics approach, carrying out maximum likelihood hierarchical cluster analysis on 25 frontal cortical areas based upon their anatomical connections, with 68 input areas representing exterosensory, chemosensory, motor, limbic, and other frontal inputs. The analysis revealed a set of statistically robust clusters. We used these clusters to divide the frontal areas into 5 groups, including ventral-lateral, ventral-medial, dorsal-medial, dorsal-lateral, and caudal-orbital groups. Each of these groups was defined by a unique set of inputs. This organization provides insight into the differential roles of each group of areas and suggests a gradient by which orbital and ventral-medial areas may be responsible for decision-making processes based on emotion and primary reinforcers, and lateral frontal areas are more involved in integrating affective and rational information into a common framework.
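
The general approach of grouping areas by the similarity of their input-connection profiles can be sketched with ordinary agglomerative clustering. The example below uses SciPy's average-linkage clustering on Jaccard distances over a fabricated binary connectivity matrix; it is not the maximum-likelihood procedure used in the paper, and the toy data are made up.

    # Ordinary agglomerative clustering (not the paper's maximum-likelihood
    # method): group "areas" by the similarity of their binary input profiles.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(7)
    n_areas, n_inputs, n_groups = 25, 68, 5
    # Fabricate areas whose inputs are noisy copies of 5 group "prototypes".
    prototypes = rng.random((n_groups, n_inputs)) < 0.3
    membership = rng.integers(0, n_groups, n_areas)
    connections = prototypes[membership] ^ (rng.random((n_areas, n_inputs)) < 0.05)

    dist = pdist(connections, metric="jaccard")       # dissimilarity of input profiles
    tree = linkage(dist, method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    print("true groups:     ", membership)
    print("recovered groups:", labels)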

Postuma, R B, & Dagher, A, 2006: Basal Ganglia Functional Connectivity Based on a Meta-Analysis of 126 Positron Emission Tomography and Functional Magnetic Resonance Imaging Publications. Cerebral Cortex, 16(10):1508-1521.

The striatum receives projections from the entire cerebral cortex. Different, but not mutually exclusive, models of corticostriatal connectivity have been proposed, including connectivity based on proximity, parallel loops, and a model of a tripartite division of the striatum into motor, associative, and limbic areas. All these models were largely based on studies of anatomic connectivity in nonhuman mammals and lesion studies in animals and humans. Functional neuroimaging has the potential to discern patterns of functional connectivity in humans in vivo. We analyzed the functional connectivity between the cortex and the striatum in a meta-analysis of 126 published functional neuroimaging studies. We mapped the peak activations listed in each publication into stereotaxic space and used standard functional imaging statistical methods to determine which cortical areas were most likely to coactivate with different parts of the striatum. The patterns of functional connectivity between the cortex and the different striatal nuclei are broadly consistent with the predictions of the parallel loop model. The rostrocaudal and dorsoventral patterns of corticostriatal functional connectivity are consistent with the tripartite division of the striatum into motor, associative, and limbic zones.

Lawrence, A D, Sahakian, B J, & Robbins, T W, 1998: Cognitive functions and corticostriatal circuits: insights from Huntington's disease. Trends Cogn. Sci., 2(10):379-388.

The basic mechanisms of information processing by corticostriatal circuits are currently a matter of intense debate amongst cognitive scientists. Huntington's disease, an autosomal-dominant neurogenetic disorder characterized clinically by a triad of motor, cognitive, and affective disturbance, is associated with neuronal loss within corticostriatal circuits, and as such provides a valuable model for understanding the role of these circuits in normal behaviour, and their disruption in disease. We review findings from our studies of the breakdown of cognition in Huntington's disease, with a particular emphasis on executive functions and visual recognition memory. We show that Huntington's disease patients exhibit a neuropsychological profile that shows a discernible pattern of progression with advancing disease, and appears to result from a breakdown in the mechanisms of response selection. These findings are consistent with recent computational models that suggest that corticostriatal circuits compute the patterns of sensory input and response output which are of behavioural significance within a particular environmental context.

Koski, L, & Paus, T, 2000: Functional connectivity of the anterior cingulate cortex within the human frontal lobe: a brain-mapping metaanalysis. Exp. Brain Res., 133(1):55-65.

A database of positron-emission-tomography studies published between January 1993 and November 1996 was created to address several questions regarding the function and connectivity of the human anterior cingulate cortex (ACC). Using this database, we have previously reported on the relationship between behavioural variables and the probability of blood-flow response in distinct subdivisions of the ACC. The goal of the current analysis was to discover which areas of the frontal cortex show increased blood-flow co-occurring consistently with increased blood-flow in the ACC. Analyses of the frequency distributions of peaks in the ACC and the remaining frontal cortex (FC) yielded several important findings. First, FC peaks in the precentral gyrus, superior frontal gyrus, middle frontal gyrus, inferior frontal gyrus, medial frontal gyrus and orbitomedial frontal gyri were more frequent in subtractions that also yielded a peak in the ACC than in those that did not yield an ACC peak. Second, regional differences in the frequency distribution of these FC peaks were observed when the ACC peaks were subdivided into the rostral versus caudal ACC and supracallosal versus subcallosal ACC. Peaks in the precentral gyrus and in the vicinity of the supplementary motor area were more prevalent in subtractions with co-occurring peaks in the caudal than with the rostral ACC. Peaks in the middle frontal gyrus were more frequent in subtractions with co-occurring peaks in the paralimbic part of the supracallosal ACC, relative to the subcallosal or limbic supracallosal ACC. These observations are consistent with known differences in the anatomic connectivity in these cortical regions, as defined in non-human primates. Further analyses of the influence of behavioural variables on the relationships between the ACC and other regions of the frontal cortex suggested that this type of meta-analysis may provide testable hypotheses about functional and effective connectivity within the human frontal lobe.

Chang, C, Crottaz-Herbette, S, & Menon, V, 2007: Temporal dynamics of basal ganglia response and connectivity during verbal working memory. NeuroImage, 34:1253-1269.

Research on the neural basis of working memory (WM) has generally focused on neocortical regions; comparatively little is known about the role of subcortical structures. There is growing evidence that the basal ganglia are involved in WM, but their contribution to different component processes of WM is poorly understood. We examined the temporal dynamics of basal ganglia response and connectivity during the encoding, maintenance and response phases of a Sternberg WM task. During the encoding and maintenance phases, WM-load-dependent activation was observed in the left anterior caudate, anterior putamen and globus pallidus; activation in the right anterior caudate was observed only during the maintenance phase. During the response phase, the basal ganglia were equally active in both the high-load and low-load WM conditions. Caudate and putamen activations were primarily localized to the (rostral) associative parts of the basal ganglia, consistent with the putative role of these regions in cognitive processing. Effective connectivity analyses revealed increased WM-load-dependent interaction of the left anterior caudate with the left posterior parietal cortex during all three phases of the task; with the visual association cortex, including the fusiform gyrus and inferior temporal gyrus, only during the encoding phase; with the ventrolateral prefrontal cortex during the encoding and maintenance phases; with the pre-supplementary motor area during the maintenance and response phases; and with the dorsolateral prefrontal and anterior cingulate cortices only during the response phase. Taken together with known neuroanatomy of the basal ganglia, these results suggest that the anterior caudate helps to link signals in distinct functional networks during different phases of the WM task. Our study offers new insight into the integrative and adaptive role of the basal ganglia in higher cognitive function.

Makin, J G, & Abate, A, 2007: A Neural Hybrid-System Model of the Basal Ganglia, Technical Report UCB/EECS-2007-16, Electrical Engineering and Computer Sciences Department, University of California at Berkeley.

The basal ganglia (BG) are a set of functionally related and structurally interconnected nuclei in the human brain which form part of a closed loop between cortex and thalamus, receiving input from the former and outputting to the latter. The BG have been implicated in motor control and cognitive switching tasks; in particular, it is believed that the BG function as a controller for motor tasks by selectively disinhibiting appropriate portions of the thalamus and hence activating, via a feedback loop, cortical regions. These switching behaviors are perforce discrete, whereas the underlying dynamics of neuron voltages and neurotransmitter levels are continuous-time, continuous-state phenomena. To reconcile these two aspects, we propose and simulate a hybrid automaton for modeling individual neurons that affords explicit representation of voltage discharges and discrete outputs along with continuous voltage dynamics within a single, elegant model; and which is amenable both to the construction of large networks - in particular the cortico-basal-thalamic loops - and to analysis on such networks.
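
For readers unfamiliar with the hybrid-system idea, the sketch below conveys the general flavor in Python: a leaky integrate-and-fire neuron whose membrane potential follows continuous dynamics until a discrete spike-and-reset transition fires. All parameter values are illustrative assumptions; this is not the authors' automaton.

    # Minimal sketch (not the Makin & Abate model): a leaky integrate-and-fire
    # neuron treated as a hybrid system -- continuous membrane dynamics plus a
    # discrete spike/reset transition. Parameter values are illustrative only.
    import numpy as np

    def simulate_lif(i_input, dt=1e-4, tau=0.02, v_rest=-0.070,
                     v_thresh=-0.050, v_reset=-0.070, r_m=2e7):
        """Integrate dv/dt = (-(v - v_rest) + R*I) / tau; emit a discrete spike
        event and reset whenever v crosses threshold."""
        v = v_rest
        spikes, trace = [], []
        for t, i_t in enumerate(i_input):
            v += dt * (-(v - v_rest) + r_m * i_t) / tau   # continuous flow
            if v >= v_thresh:                             # discrete transition (spike + reset)
                spikes.append(t * dt)
                v = v_reset
            trace.append(v)
        return np.array(trace), spikes

    # Example: constant 2 nA input for 200 ms produces a regular spike train.
    v_trace, spike_times = simulate_lif(np.full(2000, 2e-9))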

Narayanan, S, 2003: The role of cortico-basal-thalamic loops in cognition: a computational model and preliminary results. Neurocomputing, 53-54:605-614.

Clinical and experimental research over the last decade has implicated neuroanatomic loops connecting the frontal cortex to the basal ganglia and thalamus in various aspects of planning and memory. We report on a computational model whose central aspects are: (1) a model of cortical-striatal-thalamic loops in planning and executive control, and (2) a fine-grained model of basal-ganglia function that exploits specific component connectivity and dynamics. The model is biologically plausible given current literature on the neurophysiology and disease pathology of the relevant brain regions. Specifically, our model has implications for subjects with diseases affecting the relevant brain regions (Parkinson's disease and Huntington's disease).

Eppinger, B, Kray, J, Mock, B, & Mecklinger, A, 2008: Better or worse than expected? Aging, learning, and the ERN. Neuropsychologia, 46(2):521-539.

This study examined age differences in error processing and reinforcement learning. We were interested in whether the electrophysiological correlates of error processing, the error-related negativity (ERN) and the feedback-related negativity (FRN), reflect learning-related changes in younger and older adults. To do so, we applied a probabilistic learning task in which we manipulated the validity of feedback. The results of our study showed that learning-related changes were much more pronounced (a) in a response-locked positivity for correct trials compared to the ERN and (b) in a feedback-locked positivity for positive feedback compared to the FRN. These findings provide an important extension to recent theoretical accounts [Holroyd, C. B., & Coles, M. G. H. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709; Nieuwenhuis, S., Ridderinkhof, K. R., Talsma, D., Coles, M. G. H., Holroyd, C. B., Kok, A., et al. (2002). A computational account of altered error processing in older age: Dopamine and the error-related negativity. Cognitive, Affective and Behavioral Neuroscience, 2, 19-36] since they suggest that positive learning signals on correct trials contribute to the reward-related variance in the response- and feedback-locked ERPs. This effect has been overlooked in previous studies that have focused on the role of errors and negative feedback for learning. Importantly, we did not find evidence for an age-related reduction of the ERN, when controlling for performance differences between age groups, which questions the view that older adults are generally impaired in error processing. Finally, we observed a substantial reduction of the FRN in the elderly, which indicates that older adults are less affected by negative feedback and rely more on positive feedback during learning. This finding points to an age-related asymmetry in the processing of feedback valence.

Suri, R E, Bargas, J, & Arbib, M A, 2001: Modeling functions of striatal dopamine modulation in learning and planning. Neuroscience, 103(1):65-85.

The activity of midbrain dopamine neurons is strikingly similar to the reward prediction error of temporal difference reinforcement learning models. Experimental evidence and simulation studies suggest that dopamine neuron activity serves as an effective reinforcement signal for learning of sensorimotor associations in striatal matrisomes. In the current study, we simulate dopamine neuron activity with the extended temporal difference model of Pavlovian learning and examine the influences of this signal on medium spiny neurons in striatal matrisomes. The modeled influences include transient membrane effects of dopamine D1 receptor activation, dopamine-dependent long-term adaptations of corticostriatal transmission, and effects of dopamine on rhythmic fluctuations of the membrane potential between an elevated "up-state" and a hyperpolarized "down-state". The most dominant activity in the striatal matrisomes is assumed to elicit behaviors via projections from the basal ganglia to the thalamus and the cortex. This "standard model" performs successfully when tested for sensorimotor learning and goal-directed behavior (planning). To investigate the contributions of our model assumptions to learning and planning, we test the performance of several model variants that lack one of these mechanisms. These simulations show that the adaptation of the dopamine-like signal is necessary for sensorimotor learning and planning. Sensorimotor learning requires dopamine-dependent long-term adaptation of corticostriatal transmission. Lack of dopamine-like novelty responses decreases the number of exploratory acts, which impairs planning capabilities. The model loses its planning capabilities if the dopamine-like signal is simulated with the original temporal difference model, because the original temporal difference model does not form novel associative chains. Transient membrane effects of the dopamine-like signal on striatal firing substantially shorten the reaction time in the planning task. The capability for planning is improved by influences of dopamine on the durations of membrane potential fluctuations and by manipulations that prolong the reaction time of the model. These results suggest that responses of dopamine neurons to conditioned stimuli contribute to sensorimotor reward learning, novelty responses of dopamine neurons stimulate exploration, and transient dopamine membrane effects are important for planning.
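
The dopamine-like reinforcement signal referred to above is, at its core, the temporal-difference prediction error. The Python sketch below illustrates that core signal only; the extended TD model, membrane effects and novelty responses of Suri et al. are not reproduced here, and all parameter values are illustrative assumptions.

    # Illustrative sketch of the temporal-difference (TD) prediction error that
    # is commonly equated with phasic dopamine activity; not the Suri et al.
    # model, whose "extended TD" and membrane-level effects are far richer.
    import numpy as np

    def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95, terminal=False):
        """One TD(0) step: delta plays the role of the dopamine-like signal
        that reinforces corticostriatal transmission."""
        target = r if terminal else r + gamma * V[s_next]
        delta = target - V[s]          # reward-prediction error
        V[s] += alpha * delta          # dopamine-dependent long-term adaptation
        return delta

    # Example: a cue (state 0) reliably followed by reward in state 1.
    V = np.zeros(2)
    for _ in range(100):
        td_update(V, 0, 0.0, 1)                       # cue, no reward yet
        td_update(V, 1, 1.0, None, terminal=True)     # reward delivered
    print(V)  # the cue state comes to predict the discounted reward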

Khamassi, M, Lachèze, L, Girard, B, Berthoz, A, & Guillot, A, 2005: Actor-Critic Models of Reinforcement Learning in the Basal Ganglia: From Natural to Artificial Rats. Adaptive Behavior, 13(2):131-148.

Since 1995, numerous Actor-Critic architectures for reinforcement learning have been proposed as models of dopamine-like reinforcement learning mechanisms in the rat's basal ganglia. However, these models were usually tested in different tasks, and it is then difficult to compare their efficiency for an autonomous animat. We present here the comparison of four architectures in an animat as it performs the same reward-seeking task. This will illustrate the consequences of different hypotheses about the management of different Actor submodules and Critic units, and their more or less autonomously determined coordination. We show that the classical method of coordination of modules by mixture of experts, depending on each module's performance, did not allow solving the task. Then we address the question of which principle should be applied to efficiently combine these units. Improvements for Critic modeling and accuracy of Actor-critic models for a natural task are finally discussed in the perspective of our Psikharpax project - an artificial rat having to survive autonomously in unpredictable environments.

Attention Capture and Target Detection: Right Occipito-temporal N250

Pierce, L, Krigolson, O, Tanaka, J, & Holroyd, C, 2008: Reinforcement learning and the acquisition of perceptual expertise in ERPs (abstract). Journal of Vision, 8(6):475.

In a category learning task, people are initially unaware when they have committed an error and therefore require corrective feedback to modify their category decisions. Once the categories are learned, however, external feedback is no longer necessary. Electrophysiologically, the two phases of category learning are indicated by different event-related brain potentials (ERPs): the feedback ERN that is elicited following the presentation of negative feedback and the response ERN that is generated following an incorrect response. In a study of perceptual categorization, participants were asked to discriminate between very similar families of novel geometric shapes (blobs). Participants who learned the perceptual categories (i.e., expert learners) demonstrated a shift from a feedback ERN to the response ERN. The expert learners also showed an enhanced N250 response to the blob families - a component that is thought to index subordinate level representations. For the experts, the buildup of the N250 component was correlated with the shift in the ERN. In contrast, participants who were unable to learn the object families (i.e., novice learners) neither showed a shift from the feedback ERN to the response ERN nor an increased N250. Collectively, these results suggest that accompanying the acquisition of the subordinate categories, there is a change from an external source of error monitoring to an internal source.

D'Lauro, C, Tanaka, J W, & Curran, T, 2008: The preferred level of face categorization depends on discriminability. Psychonomic Bulletin & Review, 15(3):623-629.

People usually categorize objects more quickly at the basic level (e.g., "dog") than at the subordinate (e.g., "collie") or superordinate (e.g., "animal") levels. Notable exceptions to this rule include objects of expertise, faces, or atypical objects (e.g., "penguin," "poodle"), all of which show faster than normal subordinate-level categorization. We hypothesize that the subordinate-level reaction time advantage for faces is influenced by their discriminability relative to other faces in the stimulus set. First, we replicated the subordinate-level advantage for faces (Experiment 1) and then showed that a basic-level advantage for faces can be elicited by increasing the perceptual similarity of the face stimuli, making discrimination more difficult (Experiment 2). Finally, we repeated both effects within subjects, showing that individual faces were slower to be categorized in the context of similar faces and more quickly categorized among diverse faces (Experiment 3).

Scott, L S, Tanaka, J W, Sheinberg, D L, & Curran, T, 2008: The role of category learning in the acquisition and retention of perceptual expertise: A behavioral and neurophysiological study. Brain Research, 1210:204-215.

This study examined the neural mechanisms underlying perceptual categorization and expertise. Participants were either exposed to or learned to classify three categories of cars (sedans, SUVs, antiques) at either the basic or subordinate level. Event-Related Potentials (ERPs) as well as accuracy and reaction time were recorded before, immediately after, and 1-week after training. Behavioral results showed that only subordinate-level training led to better discrimination of trained cars, and this ability was retained a week after training. ERPs showed an equivalent increase in the N170 across all three training conditions whereas the N250 was only enhanced in response to subordinate-level training. The behavioral and electrophysiological results distinguish category learning at the subordinate level from category learning occurring at the basic level or from simple exposure. Together with data from previous investigations, the current results suggest that subordinate-level training, but not basic-level or exposure training, leads to expert-like improvements in categorization accuracy. These improvements are mirrored by changes in the N250 rather than the N170 component, and these effects persist at least a week after training, so are conceivably related to long-term learning processes supporting perceptual expertise.

Scott, L S, Tanaka, J W, Sheinberg, D L, & Curran, T, 2006: A reevaluation of the electrophysiological correlates of expert object processing. J. Cognitive Neurosci., 18(9):1453-1465.

Subordinate-level object processing is regarded as a hallmark of perceptual expertise. However, the relative contribution of subordinate- and basic-level category experience in the acquisition of perceptual expertise has not been clearly delineated. In this study, participants learned to classify wading birds and owls at either the basic (e.g., wading bird, owl) or the subordinate (e.g., egret, snowy owl) level. After 6 days of training, behavioral results showed that subordinate-level but not basic-level training improved subordinate discrimination of trained exemplars, novel exemplars, and exemplars from novel species. Event-related potentials indicated that both basic- and subordinate-level training enhanced the early N170 component, but only subordinate-level training amplified the later N250 component. These results are consistent with models positing separate basic and subordinate learning mechanisms, and, contrary to perspectives attempting to explain visual expertise solely in terms of subordinate-level processing, suggest that expertise enhances neural responses of both basic and subordinate processing.

Tanaka, J W, Curran, T, Porterfield, A L, & Collins, D, 2006: Activation of preexisting and acquired face representations: the N250 event-related potential as an index of face familiarity. J. Cognitive Neurosci., 18(9):1488-1497.

Electrophysiological studies using event-related potentials have demonstrated that face stimuli elicit a greater negative brain potential in right posterior recording sites 170 msec after stimulus onset (N170) relative to nonface stimuli. Results from repetition priming paradigms have shown that repeated exposures of familiar faces elicit a larger negative brainwave (N250r) at inferior temporal sites compared to repetitions of unfamiliar faces. However, less is known about the time course and learning conditions under which the N250 face representation is acquired. In the familiarization phase of the Joe/no Joe task, subjects studied a target "Joe" face ("Jane" for female subjects) and, during the course of the experiment, identified a series of sequentially presented faces as either Joe or not Joe. The critical stimulus conditions included the subject's own face, a same-sex Joe (Jane) face and a same-sex "other" face. The main finding was that the subject's own face produced a focal negative deflection (N250) in posterior channels relative to nontarget faces. The task-relevant Joe target face was not differentiated from other nontarget faces in the first half of the experiment. However, in the second half, the Joe face produced an N250 response that was similar in magnitude to the own face. These findings suggest that the N250 indexes two types of face memories: a preexperimentally familiar face representation (i.e., the "own face") and a newly acquired face representation (i.e., the Joe/Jane face) that was formed during the course of the experiment.

Tanaka, J W, Curran, T, & Sheinberg, D L, 2005: The training and transfer of real-world perceptual expertise. Psychol. Sci., 16(2):145-151.

A hallmark of perceptual expertise is that experts classify objects at a more specific, subordinate level of abstraction than novices. To what extent does subordinate-level learning contribute to the transfer of perceptual expertise to novel exemplars and novel categories? In this study, participants learned to classify 10 varieties of wading birds and 10 varieties of owls at either the subordinate, species (e.g., ''great blue crown heron,'' ''eastern screech owl'') or the family (''wading bird,'' ''owl'') level of abstraction. During training, the amount of visual exposure was equated such that participants received an equal number of learning trials for wading birds and owls. Pre- and post-training performance was measured in a same/different discrimination task in which participants judged whether pairs of bird stimuli belonged to the same or different species. Participants trained in species-level discrimination demonstrated greater transfer to novel exemplars and novel species categories than participants trained in family-level discrimination. These findings suggest that perceptual categorization, not perceptual exposure per se, is important for the development and generalization of visual expertise.

Curran, T, Tanaka, J W, & Weiskopf, D M, 2002: An electrophysiological comparison of visual categorization and recognition memory. Cogn. Affect. Behav. Neurosci., 2(1):1-18.

Object categorization emphasizes the similarities that bind exemplars into categories, whereas recognition memory emphasizes the specific identification of previously encountered exemplars. Mathematical modeling has highlighted similarities in the computational requirements of these tasks, but neuropsychological research has suggested that categorization and recognition may depend on separate brain systems. Following training with families of novel visual shapes (blobs), event-related brain potentials (ERPs) were recorded during both categorization and recognition tasks. ERPs related to early visual processing (N1, 156-200 msec) were sensitive to category membership. Middle latency ERPs (FN400 effects, 300-500 msec) were sensitive to both category membership and old/new differences. Later ERPs (parietal effects, 400-800 msec) were primarily affected by old/new differences. Thus, there was a temporal transition so that earlier processes were more sensitive to categorical discrimination and later processes were more sensitive to recognition-related discrimination. Aspects of these results are consistent with both mathematical modeling and neuropsychological perspectives.

Tanaka, J W, & Curran, T, 2001: A neural basis for expert object recognition. Psychological Science, 12(1):43-47.

Although most adults are considered experts in the identification of faces, fewer people specialize in the recognition of other objects, such as birds and dogs. In this research, the neurophysiological processes associated with expert bird and dog recognition were investigated using event-related potentials. An enhanced early negative component (N170, 164 ms) was found when bird and dog experts categorized objects in their domain of expertise relative to when they categorized objects outside their domain of expertise. This finding indicates that objects from well-learned categories are neurologically different from objects from lesser-known categories at a relatively early stage of visual processing.

Tanaka, J W, Luu, P, Weisbrod, M, & Kiefer, M, 1999: Tracking the timecourse of object categorization using event-related potentials. NeuroReport, 10(4):829-835.

Object categorization processes were investigated by measuring event-related potentials while subjects categorized objects at the superordinate (e.g. animal), basic (e.g. dog) and subordinate (e.g. beagle) levels of abstraction. An enhanced negative deflection (N1) was found at posterior recording sites for subordinate level categorizations compared with basic level categorizations and was interpreted as a marker of increased visual analysis. In contrast, superordinate level categorizations produced a larger frontal negativity relative to basic level categorizations and were interpreted as an indicator of increased semantic processing. These results suggest a neurophysiological basis for the separate cognitive processes responsible for subordinate and superordinate object categorizations.

Tanaka, J W, & Taylor, M, 1991: Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23(3):457-482.

Classic research on conceptual hierarchies has shown that the interaction between the human perceiver and objects in the environment specifies one level of abstraction for categorizing objects, called the basic level, which plays a primary role in cognition. The question of whether the special psychological status of the basic level can be modified by experience was addressed in three experiments comparing the performance of subjects in expert and novice domains. The main findings were that in the domain of expertise (a) subordinate-level categories were as differentiated as the basic-level categories, (b) subordinate-level names were used as frequently as basic-level names for identifying objects, and (c) subordinate-level categorizations were as fast as basic-level categorizations. Taken together, these results demonstrate that individual differences in domain-specific knowledge affect the extent to which the basic level is central to categorization.

Miyakoshi, M, Nomura, M, & Ohira, H, 2007: An ERP study on self-relevant object recognition. Brain and Cognition, 63(2):182-189.

We performed an event-related potential study to investigate the self-relevance effect in object recognition. Three stimulus categories were prepared: SELF (participant's own objects), FAMILIAR (disposable and public objects, defined as objects with less-self-relevant familiarity), and UNFAMILIAR (others' objects). The participants' task was to watch the stimuli passively. Results showed that left-lateralized N250 activity differentiated SELF and FAMILIAR from UNFAMILIAR, but SELF and FAMILIAR were not differentiated. In the later time-course, SELF was dissociated from FAMILIAR, indicating the self-relevance effect in object recognition at this stage. This activity did not show consistent lateralization, in contrast to previous studies reporting right lateralization in self-relevant face and name recognition. We concluded that in object recognition, self-relevance was processed by higher-order cognitive functions later than 300 ms after stimulus onset.

Bukach, C M, Gauthier, I, & Tarr, M J, 2006: Beyond faces and modularity: the power of an expertise framework. Trends Cogn. Sci., 10(4):159-166.

Studies of perceptual expertise typically ask whether the mechanisms underlying face recognition are domain specific or domain general. This debate has so dominated the literature that it has masked the more general usefulness of the expertise framework for studying the phenomenon of category specialization. Here we argue that the value of an expertise framework is not solely dependent on its relevance to face recognition. Beyond offering an alternative to domain-specific accounts of face specialization in terms of interactions between experience, task demands, and neural biases, expertise studies reveal principles of perceptual learning that apply to many different domains and forms of expertise. As such the expertise framework provides a unique window onto the functional plasticity of the mind and brain.

Williams, L M, Palmer, D, Liddell, B J, Song, L, & Gordon, E, 2006: The 'when' and 'where' of perceiving signals of threat versus non-threat. NeuroImage, 31(1):458-467.

We tested the proposal that signals of potential threat are given precedence over positive and neutral signals, reflected in earlier and more pronounced changes in neural activity. The temporal sequence ('when') and source localization ('where') of event-related potentials (ERPs) elicited by fearful and happy facial expressions, compared to neutral control expressions, were examined for 219 healthy subjects. We scored ERPs over occipito-temporal sites (N80, 50-120 ms; P120, 80-180 ms; N170, 120-220 ms; P230, 180-290 ms; N250, 230-350 ms) and their polarity-reversed counterparts over medial sites (P80, 40-120 ms; N120, 80-150 ms; VPP, 120-220 ms; N200, 150-280 ms; P300, 280-450 ms). In addition to scoring peak amplitude and latency, the anatomical sources of activity were determined using low resolution brain electromagnetic tomography (LORETA). Fearful faces were distinguished by persistent increases in positivity, associated with a dynamical shift from temporo-frontal (first 120 ms) to more distributed cortical sources (120-220 ms) and back (220-450 ms). By contrast, expressions of happiness produced a discrete enhancement of negativity, later in the time course (230-350 ms) and localized to the fusiform region of the temporal cortex. In common, fear and happiness modulated the face-related N170, and produced generally greater right hemisphere activity. These findings support the proposal that fear signals are given precedence in the neural processing systems, such that processing of positive signals may be suppressed until vigilance for potential danger is completed. While fear may be processed via parallel pathways (one initiated prior to structural encoding), neural systems supporting positively valenced input may be more localized and rely on structural encoding.

Rossion, B, Kung, C-C, & Tarr, M J, 2004: Visual expertise with nonface objects leads to competition with the early perceptual processing of faces in the human occipitotemporal cortex. Proc. Nat. Acad. Sci., 101(40):14521-14526.

Human electrophysiological studies have found that the processing of faces and other objects differs reliably at 150 ms after stimulus onset, faces giving rise to a larger occipitotemporal field potential on the scalp, termed the N170. We hypothesize that visual expertise with nonface objects leads to the recruitment of early face-related categorization processes in the occipitotemporal cortex, as reflected by the N170. To test this hypothesis, the N170 in response to laterally presented faces was measured while subjects concurrently viewed centrally presented, novel, nonface objects (asymmetric "Greebles"). The task was simply to report the side of the screen on which each face was presented. Five subjects were tested during three event-related potential sessions interspersed throughout a training protocol during which they became experts with Greebles. After expertise training, the N170 in response to faces was substantially decreased (20% decrease in signal relative to that when subjects were novices) when concurrently processing a nonface object in the domain of expertise, but not when processing untrained objects of similar complexity. Thus, faces and nonface objects in a domain of expertise compete for early visual categorization processes in the occipitotemporal cortex.

Schweinberger, S R, Huddy, V, & Burton, A M, 2004: N250r: a face-selective brain response to stimulus repetitions. Neuroreport, 15(9):1501-1505.

We investigated event-related brain potentials elicited by repetitions of cars, ape faces, and upright and inverted human faces. A face-selective N250r response to repetitions emerged over right temporal regions, consistent with a source in the fusiform gyrus. N250r was largest for human faces, clear for ape faces, non-significant for inverted faces, and completely absent for cars. Our results suggest that face-selective neural activity starting at ~200 ms and peaking at ~250-300 ms is sensitive to repetition and relates to individual recognition.

Schweinberger, S R, Pickering, E C, Jentzsch, I, Burton, A M, & Kaufmann, J M, 2002: Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Cognitive Brain Research, 14(3):398-409.

We investigated immediate repetition effects in the recognition of famous faces by recording event-related brain potentials (ERPs) and reaction times (RTs). Participants recognized celebrities' faces that were preceded by either the same picture, a different picture of the same celebrity, or a different famous face. Face repetition caused two distinct ERP modulations. Repetitions elicited a strong modulation of an N250 component (200-300 ms) over inferior temporal regions. The N250 modulation showed a degree of image specificity in that it was still significant for repetitions across different pictures, though reduced in amplitude. ERPs to repeated faces were also more positive than those to unprimed faces at parietal sites from 400 to 600 ms, but these later effects were largely independent of whether the same or a different image of the celebrity had served as prime. Finally, no influence of repetition was observed for the N170 component. Dipole source modelling suggested that the N250 repetition effect (N250r) may originate from the fusiform gyrus. In contrast, source localisation of the N170 implicated a significantly more posterior location, corresponding to a lateral occipitotemporal source outside the fusiform gyrus.

Sato, W, Kochiyama, T, Yoshikawa, S, & Matsumura, M, 2001: Emotional expression boosts early visual processing of the face: ERP recording and its decomposition by independent component analysis. NeuroReport, 12:709-714.

To investigate the hypothesis that early visual processing of stimuli might be boosted by signals of emotionality, we analyzed event-related potentials (ERPs) of twelve right-handed normal subjects. Gray-scale still images of faces with emotional (fearful and happy) or neutral expressions were presented randomly while the subjects performed gender discrimination of the faces. The results demonstrated that the faces with emotion (both fear and happiness) elicited a larger negative peak at about 270 ms (N270) over the posterior temporal areas, covering a broad range of posterior visual areas. The result of independent component analysis (ICA) on the ERP data suggested that this posterior N270 had a synchronized positive activity at the frontal-midline electrode. These findings confirm that the emotional signal boosts early visual processing of the stimuli. This enhanced activity might be implemented by the amygdalar re-entrant projections.

Attention Capture and Target Detection: Left Occipito-temporal N250

Proverbio, A M, Wiedemann, F, Adorni, R, Rossi, V, Del Zotto, M, & Zani, A, 2007: Dissociating object familiarity from linguistic properties in mirror word reading. Behavioral and Brain Functions, 3(1):43.

Background: It is known that the orthographic properties of linguistic stimuli are processed within the left occipitotemporal cortex at about 150-200 ms. We recorded event-related potentials (ERPs) to words in standard or mirror orientation to investigate the role of visual word form in reading. Word inversion was performed to determine whether rotated words lose their linguistic properties.
Methods: About 1300 Italian words and legal pseudo-words were presented to 18 right-handed Italian students engaged in a letter detection task. EEG was recorded from 128 scalp sites.
Results: ERPs showed an early effect of word orientation at ~150 ms, with larger N1 amplitudes to rotated than to standard words. Low-resolution brain electromagnetic tomography (LORETA) revealed an increase in N1 to rotated words primarily in the right occipital lobe (BA 18), which may indicate an effect of stimulus familiarity. N1 was greater to target than to non-target letters at left lateral occipital sites, thus reflecting the first stage of orthographic processing. LORETA revealed a strong focus of activation for this effect in the left fusiform gyrus (BA 37), which is consistent with the so-called visual word form area (VWFA). Standard words (compared to pseudowords) elicited an enhancement of left occipito/temporal negativity at about 250-350 ms, followed by a larger anterior P3, a reduced frontal N400 and a huge late positivity. Lexical effects for rotated strings were delayed by about 100 ms at occipito/temporal sites, and were totally absent at later processing stages. This suggests the presence of implicit reading processes, which were pre-attentive and of perceptual nature for mirror strings.
Conclusion: The contrast between inverted and standard words did not lead to the identification of a purely linguistic brain region. This finding suggests some caveats in the interpretation of the inversion effect in subtractive paradigms.

Attention Capture and Target Detection: Centro-parietal N250

Kida, T, Nishihira, Y, Hatta, A, Wasaka, T, Nakata, H, Sakamoto, M, & Nakajima, T, 2003: Changes in the somatosensory N250 and P300 by the variation of reaction time. Eur. J. Appl. Physiol., 89(3-4):326-330.

We investigated the relationship between somatosensory event-related potentials (ERP) and the variation of reaction time (RT). For this purpose, we recorded the ERPs (N250 and P300) in fast- and slow-reaction trials during a somatosensory discrimination task. Strong, standard, and weak target electrical stimuli were randomly delivered to the left median nerve at the wrist with a random interstimulus interval (900-1,100 ms). All the subjects were instructed to respond by pressing a button with their right thumb as fast as possible whenever a target stimulus was presented. We divided all the trials into fast- and slow-RT trials and averaged the data. N250 latency tended to be delayed when the RT was slow, but not significantly. P300 latency was delayed significantly when the RT was slow, but to a much lesser extent than the RT delay, so we concluded that the change of RT was not fully determined by the processes reflected by the somatosensory N250 or P300. Furthermore, the larger and earlier P300 in the fast-RT trials implied that when larger amounts of attentional resources were allocated to a given task, the speed of stimulus evaluation somewhat increased and RT was shortened to a great extent. N250 amplitude did not significantly vary in the two RT clusters. In conclusion, the somatosensory N250 reflects active target detection, which is relatively independent of the modulation of the response speed, whereas the somatosensory P300 could change without manipulation of either the stimulus or the response processing demand.

Kekoni, J, Hämäläinen, H, Cloud, V M, Reinikainen, K, & Näätänen, R, 1996: Is the somatosensory N250 related to deviance discrimination or conscious target detection? Electroencephalog. Clin. Neurophysiol. / Evoked Potentials, 100(2):115-125.

Effects of attention to, and probability of sudden changes in, repetitive stimuli on somatosensory evoked potentials (SEP) were studied. Low- (30 Hz) and high-frequency (140 Hz) vibratory stimuli were delivered in random order to the middle finger of the left hand with different presentation probabilities in different blocks. Ignore conditions were also administered.
In the ignore conditions, the probability had no effect on SEPs. However, when the standard stimuli were omitted, the "deviants" elicited small N140 and P300 deflections not observed in response to deviants when standards were also present. In the attention conditions, deviant stimuli (targets) elicited large N250 and P300 deflections which increased in amplitude with a decreased target probability. However, when subjects counted infrequently presented "deviants" alone (standards omitted) the enhanced N140 and the P300 with shortened latency were elicited, but no N250 wave could be found. At the ipsilateral side, a distinct N200 deflection was seen which could be the N250 with a shorter latency because of an easier task (detection instead of discrimination). The results might be interpreted as suggesting that the somatosensory N250 is related to conscious detection of target stimuli.

Oddball Response: Centro-parietal N200

Holroyd, C, 2004: A Note on the Oddball N200 and the Feedback ERN. ed: Ullsperger, M, & Falkenstein, M, Errors, Conflicts, and the Brain: Current Opinions on Performance Monitoring, 211-218, Max-Planck-Institute of Cognitive and Brain Sciences.

The oddball N200 and the feedback error-related negativity (feedback ERN) are commonly regarded as two distinct components of the event-related brain potential (ERP). However, morphological similarities between the two ERP components suggest that they may in fact reflect the same phenomenon. This paper explores the ramifications of these two mutually exclusive possibilities. First, if the oddball N200 and the feedback ERN reflect different phenomena, then empirical methods should be developed to dissociate the two. Second, if the two components reflect the same phenomenon, then a unifying theory should be developed to account for them.

Towey, J, Rist, F, Hakerem, G, Ruchkin, D S, & Sutton, S, 1980: N250 latency and decision time. Bull. Psychonomic Soc., 15(6):365-368.

Eight subjects counted the rarer of two clicks under two levels of difficulty of discrimination. Event-related potentials showed a significant lengthening of N250 and P300 latency when the discrimination was more difficult. These findings confirm those of Ritter et al. (1979) despite the following procedural differences: (1) Stimuli differed in intensity rather than pitch, (2) the task involved silent counting rather than reaction time, and (3) statistical analyses were computed across subjects rather than within subjects. We conclude that the N250 latency shift reflects an increase in decision time as a consequence of greater difficulty. The current findings also support the Ritter et al. (1979) conclusion that the P300 latency increase is secondary to the N250 increase. Although P300 amplitude decreased with increased task difficulty, as predicted by the equivocation formulation of Ruchkin and Sutton (1979), this trend failed to reach the required .01 level of statistical significance.

Context Updating: Centro-parietal P300

Reinforcement Learning in Practice - Human Learning

Bogacz, R, McClure, S M, Li, J, Cohen, J D, & Montague, P R, 2007: Short-term memory traces for action bias in human reinforcement learning. Brain Research, 1153:111-121.

Recent experimental and theoretical work on reinforcement learning has shed light on the neural bases of learning from rewards and punishments. One fundamental problem in reinforcement learning is the credit assignment problem, or how to properly assign credit to actions that lead to reward or punishment following a delay. Temporal difference learning solves this problem, but its efficiency can be significantly improved by the addition of eligibility traces (ET). In essence, ETs function as decaying memories of previous choices that are used to scale synaptic weight changes. It has been shown in theoretical studies that ETs spanning a number of actions may improve the performance of reinforcement learning. However, it remains an open question whether including ETs that persist over sequences of actions allows reinforcement learning models to better fit empirical data regarding the behaviors of humans and other animals. Here, we report an experiment in which human subjects performed a sequential economic decision game in which the long-term optimal strategy differed from the strategy that leads to the greatest short-term return. We demonstrate that human subjects' performance in the task is significantly affected by the time between choices in a surprising and seemingly counterintuitive way. However, this behavior is naturally explained by a temporal difference learning model which includes ETs persisting across actions. Furthermore, we review recent findings that suggest that short term synaptic plasticity in dopamine neurons may provide a realistic biophysical mechanism for producing ETs that persist on a timescale consistent with behavioral observations.
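
The eligibility-trace mechanism discussed above can be illustrated with a tabular TD(lambda) sketch in Python: a vector of decaying traces lets a single prediction error update all recently visited states. The parameter values and the toy chain are illustrative assumptions, not the paper's model.

    # Minimal sketch of tabular TD(lambda): eligibility traces act as decaying
    # memories of recently visited states, so one reward-prediction error
    # updates a whole recent trajectory. Illustrative only.
    import numpy as np

    def td_lambda_episode(transitions, V, alpha=0.1, gamma=0.95, lam=0.8):
        """transitions: list of (state, reward, next_state or None) tuples."""
        e = np.zeros_like(V)                    # eligibility traces
        for s, r, s_next in transitions:
            target = r if s_next is None else r + gamma * V[s_next]
            delta = target - V[s]
            e[s] += 1.0                         # mark this state as eligible
            V += alpha * delta * e              # credit all recently visited states
            e *= gamma * lam                    # traces decay between steps
        return V

    # Example: a 3-state chain ending in a reward.
    V = np.zeros(3)
    for _ in range(200):
        V = td_lambda_episode([(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)], V)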

Reinforcement Learning in Theory - Machine Learning

Fox, C, Girdhar, N, & Gurney, K, 2008: A Causal Bayesian Network View of Reinforcement Learning. Proc. AAAI Int. FLAIRS.

Reinforcement Learning (RL) is a heuristic method for learning locally optimal policies in Markov Decision Processes (MDP). Its classical formulation (Sutton & Barto 1998) maintains point estimates of the expected values of states or state-action pairs. Bayesian RL (Dearden, Friedman, & Russell 1998) extends this to beliefs over values. However the concept of values sits uneasily with the original notion of Bayesian Networks (BNs), which were defined (Pearl 1988) as having explicitly causal semantics. In this paper we show how Bayesian RL can be cast in an explicitly Bayesian Network formalism, making use of backwards-in-time causality. We show how the heuristic used by RL can be seen as an instance of a more general BN inference heuristic, which cuts causal links in the network and replaces them with non-causal approximate hashing links for speed. This view brings RL into line with standard Bayesian AI concepts, and suggests similar hashing heuristics for other general inference tasks.

Engel, Y, Mannor, S, & Meir, R, 2008: Bayesian Reinforcement Learning with Gaussian Process Temporal Difference Methods. Technical Report, submitted, Technion.

Reinforcement Learning is a class of problems frequently encountered by both biological and artificial agents. An important algorithmic component of many Reinforcement Learning solution methods is the estimation of state or state-action values of a fixed policy controlling a Markov decision process (MDP), a task known as policy evaluation. We present a novel Bayesian approach to policy evaluation in general state and action spaces, which employs statistical generative models for value functions via Gaussian processes (GPs). The posterior distribution based on a GP-based statistical model provides us with a value-function estimate, as well as a measure of the variance of that estimate, opening the way to a range of possibilities not available up to now. We derive exact expressions for the posterior moments of the value GP, which admit both batch and recursive computations. An efficient sequential kernel sparsification method allows us to derive efficient online algorithms for learning good approximations of the posterior moments. By allowing our algorithms to evaluate state-action values we derive model-free algorithms based on Policy Iteration for improving policies, thus tackling the complete RL problem. A companion paper describes experiments conducted with the algorithms presented here.
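
A rough Python sketch of the batch form of this idea is given below, under the generative model r_t = V(s_t) - gamma V(s_{t+1}) + noise with a Gaussian-process prior on V. The kernel, noise level and toy trajectory are illustrative assumptions; the recursive, sparsified algorithms of the paper are not shown.

    # Rough sketch of batch Gaussian-process temporal-difference (GPTD) policy
    # evaluation: place a GP prior on V, relate it linearly to the observed
    # rewards, and read off the posterior mean at the visited states.
    import numpy as np

    def rbf_kernel(X, Y, length=1.0):
        d2 = (X[:, None] - Y[None, :]) ** 2
        return np.exp(-0.5 * d2 / length**2)

    def gptd_posterior_mean(states, rewards, gamma=0.9, noise=0.1):
        """states: 1-D array of the T+1 visited states; rewards: the T rewards."""
        T = len(rewards)
        K = rbf_kernel(states, states)                   # prior covariance of V
        H = np.zeros((T, T + 1))
        H[np.arange(T), np.arange(T)] = 1.0
        H[np.arange(T), np.arange(T) + 1] = -gamma       # r = H V + noise
        G = H @ K @ H.T + noise**2 * np.eye(T)
        return K @ H.T @ np.linalg.solve(G, rewards)     # posterior mean of V

    # Toy trajectory: reward arrives on the last transition.
    states = np.array([0.0, 1.0, 2.0, 3.0])
    rewards = np.array([0.0, 0.0, 1.0])
    print(gptd_posterior_mean(states, rewards))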

Bhatnagar, S, Sutton, R S, Ghavamzadeh, M, & Lee, M, 2008: Incremental Natural Actor-Critic Algorithms. Proc. 21st Ann. Conf. Neur. Information Processing Sys..

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradient in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces, and the use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the policy gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda et al. by using temporal difference learning in the actor and by incorporating natural gradients, and extend prior empirical studies of natural-gradient actor-critic methods by Peters et al. by providing the first convergence proofs and the first fully incremental algorithms.
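
For orientation, the sketch below shows a plain (vanilla-gradient) incremental actor-critic step with linear function approximation and a softmax actor, which is the structure these algorithms build on; the natural-gradient corrections, two-timescale step sizes and convergence machinery are omitted, and all names are illustrative.

    # Simplified sketch of one incremental actor-critic step: the critic does
    # TD(0) on value weights, the actor ascends the (ordinary, not natural)
    # policy gradient scaled by the TD error. Illustrative only.
    import numpy as np

    def actor_critic_step(phi_s, phi_s_next, a, r, w, theta,
                          alpha_w=0.1, alpha_theta=0.01, gamma=0.99):
        """phi_s, phi_s_next: feature vectors; a: action taken; w: critic
        weights; theta: actor weights, one row of parameters per action."""
        delta = r + gamma * phi_s_next @ w - phi_s @ w      # TD error (critic)
        w += alpha_w * delta * phi_s                         # critic update

        prefs = theta @ phi_s                                # softmax policy
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        grad_log = -np.outer(probs, phi_s)                   # d log pi / d theta
        grad_log[a] += phi_s
        theta += alpha_theta * delta * grad_log              # actor update
        return w, theta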

Arie, H, Ogata, T, Tani, J, & Sugano, S, 2007: Reinforcement learning of a continuous motor sequence with hidden states. Advanced Robotics, 21(10):1215-1229.

Reinforcement learning is the scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-explorations based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot because most algorithms are concerned with discrete state space and based on the assumption of complete observability of the state. Real-world environments often have partial observability; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining the reinforcement learning algorithm and a learning algorithm for a continuous time recurrent neural network (CTRNN). The CTRNN can learn spatiotemporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the information about the rotational speed of the pendulum, which is considered as a hidden state, is estimated and encoded on the activation of a context neuron.

Geramifard, A, Bowling, M, Zinkevich, M, & Sutton, R S, 2007: iLSTD: Eligibility Traces and Convergence Analysis. Adv. Neur. Info. Proc. Sys. 19 (NIPS'06).

We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n^2), where n is the number of parameters in the linear function approximator, while iLSTD is O(n). In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n = 10,000) to show substantial computational advantages over LSTD.

Geramifard, A, Bowling, M, & Sutton, R S, 2006: Incremental Least-Squares Temporal Difference Learning. Proc. 21st Nat'l Conf. Artificial Intelligence (AAAI'06), 356-361.

Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more computationally efficient than non-incremental least-squares TD methods such as LSTD (Bradtke & Barto 1996; Boyan 1999). In particular, we show that the per-time-step complexities of iLSTD and TD(0) are O(n), where n is the number of features, whereas that of LSTD is O(n^2). This difference can be decisive in modern applications of reinforcement learning where the use of a large number of features has proven to be an effective solution strategy. We present empirical comparisons, using the test problem introduced by Boyan (1999), in which iLSTD converges faster than TD(0) and almost as fast as LSTD.
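
The complexity contrast is easiest to see from the batch LSTD solution itself, sketched below in Python: the matrix A accumulates n-by-n outer products, which is exactly the O(n^2) per-step cost that iLSTD's incremental bookkeeping avoids. The sketch shows plain LSTD only, not iLSTD, and the regularization constant is an illustrative assumption.

    # Sketch of plain (batch) LSTD(0) for linear policy evaluation.
    import numpy as np

    def lstd(trajectory, n_features, gamma=0.95, reg=1e-3):
        """trajectory: iterable of (phi, r, phi_next), phi_next=None at the end."""
        A = reg * np.eye(n_features)
        b = np.zeros(n_features)
        for phi, r, phi_next in trajectory:
            nxt = np.zeros(n_features) if phi_next is None else phi_next
            A += np.outer(phi, phi - gamma * nxt)   # O(n^2) work per transition
            b += r * phi
        return np.linalg.solve(A, b)                # value-function weights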

Sutton, R S, Rafols, E J, & Koop, A, 2006: Temporal abstraction in temporal-difference networks. Adv. Neur. Info. Proc. Sys. 18 (NIPS'05).

Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment (Sutton & Tanner, 2005). These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction (Sutton, Precup & Singh, 1999). The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally-abstract TD networks.

Tanner, B, & Sutton, R S, 2005: TD(lambda) Networks: Temporal-Difference Networks with Eligibility Traces. Proc. 2005 Int'l Conf. Machine Learning, 889-896.

Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(lambda) algorithm is often used to do more general multi-step backups of future predictions. In our work, we introduce a generalization of the 1-step TD network specification that is based on the TD(lambda) learning algorithm, creating TD(lambda) networks. We present experimental results that show TD(lambda) networks can learn solutions in more complex environments than TD networks. We also show that in problems that can be solved by TD networks, TD(lambda) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm that shows that the computational cost of TD(lambda) networks is only slightly more than that of TD networks.

Rafols, E J, Ring, M B, Sutton, R S, & Tanner, B, 2005: Using Predictive Representations to Improve Generalization in Reinforcement Learning. Proc. 2005 Int'l Joint Conf. Artificial Intelligence, 835-840.

The predictive representations hypothesis holds that particularly good generalization will result from representing the state of the world in terms of predictions about possible future experience. This hypothesis has been a central motivation behind recent research in, for example, PSRs and TD networks. In this paper we present the first explicit investigation of this hypothesis. We show in a reinforcement-learning example (a grid-world navigation task) that a predictive representation in tabular form can learn much faster than both the tabular explicit-state representation and a tabular history-based method.

Sutton, R S, & Tanner, B, 2005: Temporal-Difference Networks. Adv. Neur. Info. Proc. Sys. 17 (NIPS'04), 1377-1384.

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that when actions are introduced, and the inter-prediction relationships made contingent on them, the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.

Littman, M L, Sutton, R S, & Singh, S, 2002: Predictive Representations of State. Adv. Neur. Info. Proc. Sys. 14 (NIPS'01), 1555-1561.

We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization of the predictive approach with the state representations used in POMDP and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.

Stone, P, & Sutton, R S, 2001: Scaling reinforcement learning toward RoboCup soccer. Proc. 18th Int'l Conf. Machine Learning.

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(lambda) with linear tile-coding function approximation and variable lambda to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball-holder and when to cover possible passing lanes. Our agents learned policies that significantly out-performed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
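
The sketch below shows the generic linear Sarsa(lambda) update with binary (tile-coded) features and replacing traces, i.e. the method class used here; the episodic SMDP treatment, the tile coder and the keepaway policies of the actual system are not reproduced, and the parameter values are illustrative.

    # Generic sketch of one linear Sarsa(lambda) step with binary features.
    import numpy as np

    def sarsa_lambda_step(active, active_next, r, w, z,
                          alpha=0.1, gamma=1.0, lam=0.9, terminal=False):
        """active, active_next: index arrays of active (tile-coded) features
        for the current and next state-action pair; w: weights; z: traces."""
        q = w[active].sum()
        q_next = 0.0 if terminal else w[active_next].sum()
        delta = r + gamma * q_next - q    # TD error for the chosen actions
        z *= gamma * lam                  # decay all traces
        z[active] = 1.0                   # replacing traces for active features
        w += alpha * delta * z            # update every eligible weight
        return w, z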

Sutton, R S, Precup, D, & Singh, S, 1999: Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 112:181-211.

Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options---closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. In particular, we show that these results can be obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
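
The SMDP backup that lets options be used interchangeably with primitive actions can be sketched in a few lines of Python: when an option runs for k steps and accumulates discounted reward R, the successor value is discounted by gamma**k. The dict-based Q-table and names below are illustrative assumptions, not the paper's notation.

    # Hedged sketch of the SMDP Q-learning update over options.
    def smdp_q_update(Q, s, o, R, k, s_next, option_set, alpha=0.1, gamma=0.95):
        """Q: dict mapping (state, option) -> value.
        R = sum_{i<k} gamma**i * r_i, the discounted reward accumulated while
        option o ran for k steps before terminating in s_next."""
        best_next = max(Q.get((s_next, o2), 0.0) for o2 in option_set)
        target = R + (gamma ** k) * best_next   # multi-step discounted backup
        q_old = Q.get((s, o), 0.0)
        Q[(s, o)] = q_old + alpha * (target - q_old)
        return Q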

Precup, D, Sutton, R S, & Singh, S P, 1997: Planning with closed-loop macro actions. AAAI Fall Symp. Model-Directed Autonomous Systems, 70-76.

Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Conventional model-based reinforcement learning uses primitive actions that last one time step and that can be modeled independently of the learning agent. These can be generalized to macro actions, multi-step actions specified by an arbitrary policy and a way of terminating. Macro actions generalize the classical notion of a macro operator in that they are closed loop, uncertain, and of variable duration. Macro actions are needed to represent common-sense higher-level actions such as going to lunch, grasping an object, or traveling to a distant city. This paper generalizes prior work on temporally abstract models (Sutton, 1995) and extends it from the prediction setting to include actions, control, and planning. We define a semantics of models of macro actions that guarantees the validity of planning using such models. This paper presents new results in the theory of planning with macro actions and illustrates its potential advantages in a gridworld task.

Sutton, R S, 1996: Generalization in reinforcement learning: Successful examples using sparse coarse coding. Adv. Neur. Info. Proc. Sys. 8 (NIPS'95), 1038-1044.

On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases, there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes (rollouts), as in classical Monte Carlo methods, and as in the TD(lambda) algorithm when lambda=1. However, in our experiments this always resulted in substantially poorer performance. We conclude that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general lambda.
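A rough sketch of the sparse coarse coding (CMAC-style tile coding) idea referred to above: a continuous input activates exactly one tile in each of several offset grid tilings, giving a sparse binary feature vector for a linear learner. The grid resolution, offsets, and bounds below are illustrative choices, not the paper's.

```python
import numpy as np

def tile_code(x, n_tilings=8, tiles_per_dim=10, low=None, high=None):
    """Return the index of the active tile in each of several offset tilings."""
    x = np.asarray(x, dtype=float)
    low = np.zeros_like(x) if low is None else np.asarray(low, dtype=float)
    high = np.ones_like(x) if high is None else np.asarray(high, dtype=float)
    scaled = (x - low) / (high - low) * tiles_per_dim      # position in tile units
    tiles_per_tiling = (tiles_per_dim + 1) ** len(x)
    active = []
    for t in range(n_tilings):
        offset = t / n_tilings                             # shift each tiling slightly
        coords = np.floor(scaled + offset).astype(int)
        index = 0
        for c in coords:                                   # row-major tile index
            index = index * (tiles_per_dim + 1) + int(c)
        active.append(t * tiles_per_tiling + index)
    return active

# A linear approximation is then just the sum of the weights of the active tiles:
# v(x) = w[tile_code(x)].sum()
```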

Sutton, R S, 1992: Gain adaptation beats least squares? Proc. 7th Yale Workshop on Adaptive and Learning Systems, 161-166.

I present computational results suggesting that gain-adaptation algorithms based in part on connectionist learning methods may improve over least squares and other classical parameter-estimation methods for stochastic time-varying linear systems. The new algorithms are evaluated with respect to classical methods along three dimensions: asymptotic error, computational complexity, and required prior knowledge about the system. The new algorithms are all of the same order of complexity as LMS methods, O(n), where n is the dimensionality of the system, whereas least-squares methods and the Kalman filter are O(n^2). The new methods also improve over the Kalman filter in that they do not require a complete statistical model of how the system varies over time. In a simple computational experiment, the new methods are shown to produce asymptotic error levels near that of the optimal Kalman filter and significantly below those of least-squares and LMS methods. The new methods may perform better even than the Kalman filter if there is any error in the filter's model of how the system varies over time.
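As an illustration of the family of gain-adaptation methods discussed here, the following sketches an IDBD-style LMS update with one adaptive gain per weight, which has the O(n) cost mentioned in the abstract. Variable names and the meta step size theta are assumptions, and the specific algorithms evaluated in the paper differ in detail.

```python
import numpy as np

def idbd_step(w, h, beta, x, y, theta=0.01):
    """One update of an IDBD-style per-weight gain adaptation rule (sketch).

    w: weights, h: auxiliary memory traces, beta: log step sizes (all the same
    shape as the input x); y is the scalar target; theta is the meta step size.
    """
    delta = y - np.dot(w, x)                  # prediction error
    beta = beta + theta * delta * x * h       # meta-gradient step on the log gains
    alpha = np.exp(beta)                      # per-weight learning rates
    w = w + alpha * delta * x                 # LMS-style update with adapted gains
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, h, beta
```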

Sutton, R S, & Barto, A G, 1990: Time-derivative models of Pavlovian reinforcement. ed: Gabriel, M, & Moore, J, Learning and Computational Neuroscience: Foundations of Adaptive Networks, 497-537, MIT Press.

This chapter presents a model of classical conditioning called the temporal difference (TD) model. The TD model was originally developed as a neuronlike unit for use in adaptive networks (Sutton and Barto 1987; Sutton 1984; Barto, Sutton and Anderson 1983). In this paper, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies. For an exposition of the TD model from an engineering point of view, see Chapter 13 of this volume. We focus on what we see as the primary theoretical contribution to animal learning theory of the TD and related models: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations.

Sutton, R S, 1988: Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44.

This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
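A minimal tabular TD(lambda) prediction sketch in the spirit of the temporal-difference methods described above; the episode format, integer state encoding, and parameter values are assumptions made for illustration.

```python
import numpy as np

def td_lambda_predict(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) prediction from a list of episodes.

    Each episode is a sequence of (state, reward, next_state, done) tuples;
    states are integer indices. Returns the learned value estimates.
    """
    V = np.zeros(n_states)
    for episode in episodes:
        z = np.zeros(n_states)                   # eligibility traces
        for s, r, s2, done in episode:
            target = r + (0.0 if done else gamma * V[s2])
            delta = target - V[s]                # temporal-difference error
            z[s] += 1.0                          # accumulating trace
            V += alpha * delta * z               # credit recently visited states
            z *= gamma * lam
    return V
```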

Sutton, R S, 1987: Implementation Details of the TD(lambda) Procedure for the Case of Vector Predictions and Backpropagation, Technical Note TN87-509, GTE Laboratories.

The outcomes z predicted by the TD(lambda) procedure can be either scalars or vectors. In the technical report, "Learning to predict by the methods of temporal differences," for simplicity all equations were given for the case in which z is a scalar. A second limitation of the equations given in the report is that they did not spell out fully the algorithmic procedure for doing backpropagation in the TD case. Since both of these extensions involve substantial additional notation, and yet add little to the main ideas of that report, it is probably appropriate to not include them there. Instead, we present these details here.

Sutton, R S, 1984: Temporal credit assignment in reinforcement learning, Ph.D. Thesis, University of Massachusetts.

This dissertation describes computational experiments comparing the performance of a range of reinforcement learning algorithms. The experiments are designed to focus on aspects of the credit-assignment problem having to do with determining when the behavior that deserves credit occurred. The issues of knowledge representation involved in developing new features or refining existing ones are not addressed.

The algorithms considered include some from learning automata theory, mathematical learning theory, early "cybernetic" approaches to learning, Samuel's checker-playing program, Michie and Chambers's "Boxes" system, and a number of new algorithms. The tasks were selected to involve, first in isolation and then in combination, the issues of misleading generalizations, delayed reinforcement, unbalanced reinforcement, and secondary reinforcement. The tasks range from simple, abstract "two-armed bandit" tasks to a physically realistic pole-balancing task.

The results indicate several areas where the algorithms presented here perform substantially better than those previously studied. An unbalanced distribution of reinforcement, misleading generalizations, and delayed reinforcement can greatly retard learning and in some cases even make it counterproductive. Performance can be substantially improved in the presence of these common problems through the use of mechanisms of reinforcement comparison and secondary reinforcement. We present a new algorithm similar to the "learning-by-generalization" algorithm used for altering the static evaluation function in Samuel's checker-playing program. Simulation experiments indicate that the new algorithm performs better than a version of Samuel's algorithm suitably modified for reinforcement learning tasks. Theoretical analysis in terms of an "ideal reinforcement signal" sheds light on the relationship between these two algorithms and other temporal credit-assignment algorithms.

Barto, A G, 1995: Adaptive critics and the basal ganglia. ed: Houk, J C, Davis, J, & Beiser, D, Models of Information Processing in the Basal Ganglia, 215-232, MIT Press.

One of the most active areas of research in artificial intelligence is the study of learning methods by which "embedded agents" can improve performance while acting in complex dynamic environments. An agent, or decision maker, is embedded in an environment when it receives information from, and acts on, that environment in an ongoing closed-loop interaction. An embedded agent has to make decisions under time pressure and uncertainty and has to learn without the help of an ever-present knowledgeable teacher. Although the novelty of this emphasis may be inconspicuous to a biologist, animals being the prototypical embedded agents, this emphasis is a significant departure from the more traditional focus in artificial intelligence on reasoning within circumscribed domains removed from the flow of real-world events. One consequence of the embedded agent view is the increasing interest in the learning paradigm called reinforcement learning (RL). Unlike the more widely studied supervised learning systems, which learn from a set of examples of correct input/output behavior, RL systems adjust their behavior with the goal of maximizing the frequency and/or magnitude of the reinforcing events they encounter over time.

Tesauro, G, 1992: Practical Issues in Temporal Difference Learning. Machine Learning, 8:257-277.

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(lambda) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(lambda) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.

Tesauro, G, 1995: Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58-68.

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.

Dearden, R, Friedman, N, & Russell, S, 1998: Bayesian Q-learning. Proc. 15th Nat'l Conf. Artificial Intelligence / 10th Conf. Innovative Applications of Artificial Intelligence, 761-768.

A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper, we adopt a Bayesian approach to maintaining this uncertain information. We extend Watkins' Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish the convergence properties of our algorithm and show experimentally that it can exhibit substantial improvements over other well-known model-free exploration strategies.

Watkins, C J C H, 1989: Models of Delayed Reinforcement Learning, PhD thesis, Psychology Department, Cambridge University.

In behavioural ecology, stochastic dynamic programming may be used as a general method for calculating animals' optimal behavioural policies. But how might the animals themselves learn optimal policies from their experience? The aim of this thesis is to give a systematic analysis of possible computational methods of learning efficient behaviour. First it is argued that it does follow from the optimality assumption that animals should learn optimal policies, even though they may not always follow them. Next it is argued that Markov decision processes are a general formal model of an animal's behavioural choices in its environment. The conventional methods of determining optimal policies by dynamic programming are then described. It is not plausible that animals carry out calculations of this type. However, there is a range of alternative methods of organising the dynamic programming calculation, in ways that are plausible computational models of animal learning. In particular, there is an incremental Monte-Carlo method that enables optimal values (or 'canonical costs') of actions to be learned directly, without any requirement for the animal to model its environment or to remember situations and actions for more than a short period of time. A proof is given that this learning method works. Learning methods of this type are also possible for hierarchical policies. Previously suggested learning methods are reviewed, and some even simpler learning methods are presented without proof. Demonstration implementations of some of the learning methods are described.

Watkins, C J C H, & Dayan, P, 1992: Technical Note: Q-learning. Machine Learning, 8(3-4):279-292.

Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
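For concreteness, here is a standard tabular Q-learning loop of the kind analyzed in the note; the environment interface and the constant step size are placeholder assumptions, and the convergence theorem itself requires appropriately decaying step sizes and persistent sampling of all state-action pairs.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; env.reset()/env.step(a) -> (s, r, done) is assumed."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if np.random.rand() < epsilon:            # explore
                a = np.random.randint(n_actions)
            else:                                     # exploit current estimate
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])     # off-policy one-step backup
            s = s2
    return Q
```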

Poupart, P, Vlassis, N, Hoey, J, & Regan, K, 2006: An analytic solution to discrete Bayesian reinforcement learning. Proc. 23rd Int'l Conf. Machine Learning, 697-704.

Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms fall short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.

Perkins, T J, & Barto, A G, 2003: Lyapunov design for safe reinforcement learning. J. Machine Learning Res., 3, 803-832.

Lyapunov design methods are used widely in control engineering to design controllers that achieve qualitative objectives, such as stabilizing a system or maintaining a system's state in a desired operating range. We propose a method for constructing safe, reliable reinforcement learning agents based on Lyapunov design principles. In our approach, an agent learns to control a system by switching among a number of given, base-level controllers. These controllers are designed using Lyapunov domain knowledge so that any switching policy is safe and enjoys basic performance guarantees. Our approach thus ensures qualitatively satisfactory agent behavior for virtually any reinforcement learning algorithm and at all times, including while the agent is learning and taking exploratory actions. We demonstrate the process of designing safe agents for four different control problems. In simulation experiments, we find that our theoretically motivated designs also enjoy a number of practical benefits, including reasonable performance initially and throughout learning, and accelerated learning.

Naveed, M H, & Cowling, P I, 2006: Using coevolution and gradient-based learning for the virus game. Proc. 2006 Int'l Conf. Game Research & Development, 283-287.

This paper presents a novel coevolutionary model which is used to create strong playing strategies for the Virus Game. We use two approaches to coevolve Artificial Neural Networks (ANN) which evaluate board positions of a two-player zero-sum game (the Virus Game). The first approach uses coevolution with an initial population of random ANN; the second approach is a novel coevolutionary model whose initial population of ANN is trained using gradient-based adaptive learning methods (Backpropagation, RPROP and iRPROP). The results of our coevolutionary experiments show that pre-training the population in coevolution is highly effective in creating stronger game-playing strategies than coevolution with a random population.

Wang, T, Lizotte, D, Bowling, M, & Schuurmans, D, 2005: Bayesian sparse sampling for on-line reward optimization. Proc. 22nd Int'l Conf. Machine learning, 956-963.

We present an efficient "sparse sampling" technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior, rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.

Brafman, R I, & Tennenholtz, M, 2003: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Machine Learning Res., 3, 213-231.

R-MAX is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-MAX, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent's observations. R-MAX improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the "optimism under uncertainty" bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz's LSG algorithm for learning in single controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the only algorithm for learning in repeated games, to date, which is provably efficient, considerably improving and simplifying previous algorithms by Banos and by Megiddo.
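A hedged sketch of the optimistic model that gives R-MAX its name: state-action pairs that have not been tried enough times are modeled as leading to a fictitious, maximally rewarding absorbing state, which drives exploration. The class interface and the known-ness threshold below are illustrative, and the planning step (e.g. value iteration on the returned model, acting greedily and replanning as pairs become known) is omitted.

```python
import numpy as np

class RMaxModel:
    """Optimistic empirical model in the spirit of R-MAX (sketch)."""

    def __init__(self, n_states, n_actions, r_max=1.0, known_threshold=10):
        self.nS, self.nA = n_states + 1, n_actions     # one extra fictitious state
        self.r_max, self.m = r_max, known_threshold
        self.counts = np.zeros((self.nS, self.nA))
        self.trans_counts = np.zeros((self.nS, self.nA, self.nS))
        self.reward_sums = np.zeros((self.nS, self.nA))

    def update(self, s, a, r, s2):
        """Record one observed transition."""
        self.counts[s, a] += 1
        self.trans_counts[s, a, s2] += 1
        self.reward_sums[s, a] += r

    def model(self):
        """Return (P, R): empirical where known, optimistic where not."""
        P = np.zeros((self.nS, self.nA, self.nS))
        R = np.full((self.nS, self.nA), self.r_max)    # optimistic initialization
        fict = self.nS - 1
        P[:, :, fict] = 1.0                            # unknown pairs go to the fictitious state
        known = self.counts >= self.m
        for s, a in zip(*np.nonzero(known)):
            P[s, a] = self.trans_counts[s, a] / self.counts[s, a]
            R[s, a] = self.reward_sums[s, a] / self.counts[s, a]
        return P, R
```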

Mannor, S, & Shimkin, N, 2004: A Geometric Approach to Multi-Criterion Reinforcement Learning. J. Machine Learning Res., 5, 325-360.

We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, actions that are observed but cannot be predicted beforehand. We capture this situation using a stochastic game model, where the learning agent is facing an adversary whose policy is arbitrary and unknown, and where the reward function is vector-valued. State recurrence conditions are imposed throughout. In our basic problem formulation, a desired target set is specified in the vector reward space, and the objective of the learning agent is to approach the target set, in the sense that the long-term average reward vector will belong to this set. We devise appropriate learning algorithms that essentially use multiple reinforcement learning algorithms for the standard scalar reward problem, which are combined using the geometric insight from the theory of approachability for vector-valued stochastic games. We then address the more general and optimization-related problem, where a nested class of possible target sets is prescribed, and the goal of the learning agent is to approach the smallest possible target set (which will generally depend on the unknown system parameters). A particular case which falls into this framework is that of stochastic games with average reward constraints, and further specialization provides a reinforcement learning algorithm for constrained Markov decision processes. Some basic examples are provided to illustrate these results.

Duff, M O, 2002: Optimal learning: computational procedures for bayes-adaptive Markov decision processes, Ph.D. Thesis, University of Massachusetts, Amherst.

This dissertation considers a particular aspect of sequential decision making under uncertainty in which, at each stage, a decision-making agent operating in an uncertain world takes an action that elicits a reinforcement signal and causes the state of the world to change. Optimal learning is a pattern of behavior that yields the highest expected total reward over the entire duration of an agent's interaction with its uncertain world. The problem of determining an optimal learning strategy is a sort of meta-problem, with optimality defined with respect to a distribution of environments that the agent is likely to encounter. Given this prior uncertainty over possible environments, the optimal-learning agent must collect and use information in an intelligent way, balancing greedy exploitation of certainty-equivalent world models with exploratory actions aimed at discerning the true state of nature.
My approach to approximating optimal learning strategies retains the full model of the sequential decision process that, in incorporating a Bayesian model for evolving uncertainty about unknown process parameters, takes the form of a Markov decision process defined over a set of "hyperstates" whose cardinality grows exponentially with the planning horizon.
I develop computational procedures that retain the full Bayesian formulation, but sidestep intractability by utilizing techniques from reinforcement learning theory (specifically, Monte-Carlo simulation and the adoption of parameterized function approximators). By pursuing an approach that is grounded in a complete Bayesian world model, I develop algorithms that produce policies that exhibit performance gains over simple heuristics. Moreover, in contrast to many heuristics, the justification or legitimacy of the policies follows directly from the fact that they are clearly motivated by a complete characterization of the underlying decision problem to be solved.
This dissertation's contributions include a reinforcement learning algorithm for estimating Gittins indices for multi-armed bandit problems, a Monte-Carlo gradient-based algorithm for approximating solutions to general problems of optimal learning, a gradient-based scheme for improving optimal learning policies instantiated as finite-state stochastic automata, and an investigation of diffusion processes as analytical models for evolving uncertainty.

Strens, M J A, 2000: A Bayesian Framework for Reinforcement Learning. Proc. 17th Int'l Conf. Machine Learning, 943-950.

The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states.
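The core of this approach can be sketched as posterior sampling: draw one MDP hypothesis from the posterior over models, solve it by dynamic programming, and follow the resulting greedy policy for the next trial. In the sketch below, the flat Dirichlet prior over transition rows, the use of empirical mean rewards, and the value-iteration helper are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def sample_mdp(trans_counts, reward_sums, prior=1.0):
    """Sample one MDP hypothesis from a simple posterior (sketch).

    trans_counts[s, a, s2] are observed transition counts; reward_sums[s, a]
    are summed observed rewards. Transitions get a flat Dirichlet prior;
    rewards are summarized by their empirical means (zero where unobserved).
    """
    nS, nA, _ = trans_counts.shape
    counts = trans_counts.sum(axis=2)
    P = np.zeros((nS, nA, nS))
    R = np.where(counts > 0, reward_sums / np.maximum(counts, 1), 0.0)
    for s in range(nS):
        for a in range(nA):
            P[s, a] = np.random.dirichlet(trans_counts[s, a] + prior)
    return P, R

def greedy_policy(P, R, gamma=0.95, iters=500):
    """Value iteration on the sampled model, returning the greedy policy."""
    nS, nA, _ = P.shape
    V = np.zeros(nS)
    Q = np.zeros((nS, nA))
    for _ in range(iters):
        Q = R + gamma * (P @ V)        # expected next value per (state, action)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)
```

Using a fresh sample for each trial yields the mix of exploratory and exploitative behavior described in the abstract.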

Choi, D, & Van Roy, B, 2006: A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems, 16(2):207-239.

The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins-Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.

Iwata, K, Ikeda, K, & Sakai, H, 2006: The asymptotic equipartition property in reinforcement learning and its relation to return maximization. Neural Networks, 19(1):62-75.

We discuss an important property called the asymptotic equipartition property on empirical sequences in reinforcement learning. This states that the typical set of empirical sequences has probability nearly one, that all elements in the typical set are nearly equi-probable, and that the number of elements in the typical set is an exponential function of the sum of conditional entropies if the number of time steps is sufficiently large. The sum is referred to as stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity depending on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of stochastic complexity, which is a qualitative guide in tuning the parameters of the action-selection strategy, and show a sufficient condition for return maximization in probability.

Reinforcement Learning in Spiking Neural Networks

Urbanczik, R, & Senn, W, 2008: Reinforcement learning in populations of spiking neurons. Nature Neurosci., in press.

Population coding is widely regarded as a key mechanism for achieving reliable behavioral responses in the face of neuronal variability. But in standard reinforcement learning a flip-side becomes apparent. Learning slows down with increasing population size since the global reinforcement becomes less and less related to the performance of any single neuron. We show that, in contrast, learning speeds up with increasing population size if feedback about the population response modulates synaptic plasticity in addition to global reinforcement. The two feedback signals (reinforcement and population-response signal) can be encoded by ambient neurotransmitter concentrations which vary slowly, yielding a fully online plasticity rule where the learning of a stimulus is interleaved with the processing of the subsequent one. The assumption of a single additional feedback mechanism therefore reconciles biological plausibility with efficient learning.

Xie, X, & Seung, H S, 2004: Learning in neural networks by reinforcement of irregular spiking. Phys. Rev. E, 69(4):041909.

Artificial neural networks are often trained by using the back propagation algorithm to compute the gradient of an objective function with respect to the synaptic strengths. For a biological neural network, such a gradient computation would be difficult to implement, because of the complex dynamics of intrinsic and synaptic conductances in neurons. Here we show that irregular spiking similar to that observed in biological neurons could be used as the basis for a learning rule that calculates a stochastic approximation to the gradient. The learning rule is derived based on a special class of model networks in which neurons fire spike trains with Poisson statistics. The learning is compatible with forms of synaptic dynamics such as short-term facilitation and depression. By correlating the fluctuations in irregular spiking with a reward signal, the learning rule performs stochastic gradient ascent on the expected reward. It is applied to two examples, learning the XOR computation and learning direction selectivity using depressing synapses. We also show in simulation that the learning rule is applicable to a network of noisy integrate-and-fire neurons.
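The flavour of such a rule can be sketched as a reward-modulated Hebbian update: correlate the deviation of actual spiking from its expected (Poisson) rate with the deviation of reward from a running baseline, and nudge the weights in that direction. This toy version omits the eligibility traces and conductance dynamics treated in the paper; all names and constants are illustrative.

```python
import numpy as np

def reward_modulated_update(W, x, rates, spikes, reward, baseline,
                            eta=0.01, dt=0.001):
    """One step of a reward-modulated Hebbian update (sketch).

    x: presynaptic activity, rates: instantaneous firing rates of the
    postsynaptic units, spikes: observed spike counts in the time bin dt.
    Correlating the spiking fluctuation with the reward deviation gives a
    stochastic estimate of the gradient of expected reward.
    """
    fluctuation = spikes - rates * dt                     # deviation from expected spiking
    W += eta * (reward - baseline) * np.outer(fluctuation, x)
    return W
```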

Zador, A M, & Pearlmutter, B A, 1996: VC dimension of an integrate-and-fire neuron model. Neural Computation, 8:611-624.

We find the VC dimension of a leaky integrate-and-fire neuron model. The VC dimension quantifies the ability of a function class to partition an input pattern space, and can be considered a measure of computational capacity. In this case, the function class is the class of integrate-and-fire models generated by varying the integration time constant and the threshold, the input space they partition is the space of continuous-time signals, and the binary partition is specified by whether or not the model reaches threshold and spikes at some specified time. We show that the VC dimension diverges only logarithmically with the input signal bandwidth N, where the signal bandwidth is determined by the noise inherent in the process of spike generation. For reasonable estimates of the signal bandwidth, the VC dimension turns out to be quite small (10). We also extend this approach to arbitrary passive dendritic trees. The main contributions of this work are (1) it offers a novel treatment of the computational capacity of this class of dynamic system; and (2) it provides a framework for analyzing the computational capabilities of the dynamical systems defined by networks of spiking neurons.

Fiete, I R, 2003: Learning and coding in biological neural networks, Ph.D. Thesis, Department of Physics, Harvard University.

How can large groups of neurons that locally modify their activities learn to collectively perform a desired task? Do studies of learning in small networks tell us anything about learning in the fantastically large collection of neurons that make up a vertebrate brain? What factors do neurons optimize by encoding sensory inputs or motor commands in the way they do? In this thesis I present a collection of four theoretical works: each of the projects was motivated by specific constraints and complexities of biological neural networks, as revealed by experimental studies; together, they aim to partially address some of the central questions of neuroscience posed above.
We first study the role of sparse neural activity, as seen in the coding of sequential commands in a premotor area responsible for birdsong. We show that the sparse coding of temporal sequences in the songbird brain can, in a network where the feedforward plastic weights must translate the sparse sequential code into a time-varying muscle code, facilitate learning by minimizing synaptic interference.
Next, we propose a biologically plausible synaptic plasticity rule that can perform goal-directed learning in recurrent networks of voltage-based spiking neurons that interact through conductances. Learning is based on the correlation of noisy local activity with a global reward signal; we prove that this rule performs stochastic gradient ascent on the reward. Thus, if the reward signal quantifies network performance on some desired task, the plasticity rule provably drives goal-directed learning in the network.
To assess the convergence properties of the learning rule, we compare it with a known example of learning in the brain. Song-learning in finches is a clear example of a learned behavior, with detailed available neurophysiological data. With our learning rule, we train an anatomically accurate model birdsong network that drives a sound source to mimic an actual zebra finch song. Simulation and theoretical results on the scalability of this rule show that learning with stochastic gradient ascent may be adequately fast to explain learning in the bird.
Finally, we address the more general issue of the scalability of stochastic gradient learning on quadratic cost surfaces in linear systems, as a function of system size and task characteristics, by deriving analytical expressions for the learning curves.

Reinforcement Learning in the Cerebellar Model Articulation Controller

Hu, Y & Fellman, R D, 1994: A hardware efficient implementation of a boxes reinforcement learning system. IEEE Int'l Conf. Neural Networks, 4:2297-2302.

This paper presents two modifications to the Boxes-ASE/ACE reinforcement learning algorithm to improve implementation efficiency and performance. A state history queue (SHQ) replaces the decay computations associated with each control state, decoupling the dependence of computational demand from the number of control states. A dynamic link table implements CMAC state association to decrease training time, yet minimize the number of control states. Simulations of the link table demonstrated its potential for minimizing control states for unoptimized state-space quantization. Simulations coupling the link table to CMAC state association show a 3-fold reduction in learning time. A hardware implementation of the pole-cart balancer shows the SHQ modification to reduce computation time 12-fold.

Mori, T, Nakamura, Y, Sato, M, & Ishii, S, 2004: Reinforcement Learning for a CPG-driven Biped Robot. Proc. 18th AAAI Conf. Artif. Intel., 623-630.

Animals' rhythmic movements such as locomotion are considered to be controlled by neural circuits called central pattern generators (CPGs). This article presents a reinforcement learning (RL) method for a CPG controller, which is inspired by the control mechanism of animals. Because the CPG controller is an instance of recurrent neural networks, a naive application of RL involves difficulties. In addition, since the state and action spaces of controlled systems are very large in real problems such as robot control, the learning of the value function is also difficult. In this study, we propose a learning scheme for a CPG controller called a CPG actor-critic model, whose learning algorithm is based on a policy gradient method. We apply our RL method to autonomous acquisition of biped locomotion by a biped robot simulator. Computer simulations show that our method is able to train a CPG controller such that the learning process is stable.

Functional Neuroanatomy in General

Toro, R, Fox, P T, & Paus, T, 2008: Functional Coactivation Map of the Human Brain. Cerebral Cortex, in press.

Understanding the interactions among different brain regions is fundamental to our understanding of brain function. Here we describe a complete map of functional connections in the human brain derived by an automatic meta-analysis of 825 neuroimaging articles, representing 3402 experiments. The likelihood of a functional connection between regions was estimated by studying the interdependence of their "activity," as reported in each experiment, across all experiments. We obtained a dense coactivation map that recovers some fundamental principles of the brain's functional connectivity, such as the symmetric interhemispheric connections, and important functional networks, such as the fronto-parietal attention network, the resting state network and the motor network.

Sporns, O, Tononi, G, & Kötter, R, 2005: The human connectome: a structural description of the human brain. PLoS Computational Biology, 1(4):e42.

The connection matrix of the human brain (the human "connectome") represents an indispensable foundation for basic and applied neurobiological research. However, the network of anatomical connections linking the neuronal elements of the human brain is still largely unknown. While some databases or collations of large-scale anatomical connection patterns exist for other mammalian species, there is currently no connection matrix of the human brain, nor is there a coordinated research effort to collect, archive, and disseminate this important information. We propose a research strategy to achieve this goal, and discuss its potential impact.

Passingham, R E, Stephan, K E, & Kötter, R, 2002: The anatomical basis of functional localization in the cortex. Nature Rev. Neurosci., 3:606-616.

The functions of a cortical area are determined by its extrinsic connections and intrinsic properties. Using the database CoCoMac, we show that each cortical area has a unique pattern of corticocortical connections - a 'connectional fingerprint'. We present examples of such fingerprints and use statistical analysis to show that no two areas share identical patterns. We suggest that the connectional fingerprint underlies the observed cell-firing differences between areas during different tasks. We refer to this pattern as a 'functional fingerprint' and present examples of such fingerprints. In addition to electrophysiological analysis, functional fingerprints can be determined by functional brain imaging. We argue that imaging provides a useful way to define such fingerprints because it is possible to compare activations across many cortical areas and across a wide range of tasks.

Bressler, S L, & Tognoli, E, 2006: Operational principles of neurocognitive networks. Int'l J. Psychophysiol., 60(2):139-148.

Large-scale neural networks are thought to be an essential substrate for the implementation of cognitive function by the brain. If so, then a thorough understanding of cognition is not possible without knowledge of how the large-scale neural networks of cognition (neurocognitive networks) operate. Of necessity, such understanding requires insight into structural, functional, and dynamical aspects of network operation, the intimate interweaving of which may be responsible for the intricacies of cognition.
Knowledge of anatomical structure is basic to understanding how neurocognitive networks operate. Phylogenetically and ontogenetically determined patterns of synaptic connectivity form a structural network of brain areas, allowing communication between widely distributed collections of areas. The function of neurocognitive networks depends on selective activation of anatomically linked cortical and subcortical areas in a wide variety of configurations. Large-scale functional networks provide the cooperative processing which gives expression to cognitive function. The dynamics of neurocognitive network function relates to the evolving patterns of interacting brain areas that express cognitive function in real time.
This article considers the proposition that a basic similarity of the structural, functional, and dynamical features of all neurocognitive networks in the brain causes them to function according to common operational principles. The formation of neural context through the coordinated mutual constraint of multiple interacting cortical areas is considered a guiding principle underlying all cognitive functions. Increasing knowledge of the operational principles of neurocognitive networks is likely to promote the advancement of cognitive theories, and to seed strategies for the enhancement of cognitive abilities.

Kringelbach, M L, & Rolls, E T, 2004: The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Progress in Neurobiology, 72:341-372.

The human orbitofrontal cortex is an important brain region for the processing of rewards and punishments, which is a prerequisite for the complex and flexible emotional and social behaviour which contributes to the evolutionary success of humans. Yet much remains to be discovered about the functions of this key brain region, and new evidence from functional neuroimaging and clinical neuropsychology is affording new insights into the different functions of the human orbitofrontal cortex. We review the neuroanatomical and neuropsychological literature on the human orbitofrontal cortex, and propose two distinct trends of neural activity based on a meta-analysis of neuroimaging studies. One is a mediolateral distinction, whereby medial orbitofrontal cortex activity is related to monitoring the reward value of many different reinforcers, whereas lateral orbitofrontal cortex activity is related to the evaluation of punishers which may lead to a change in ongoing behaviour. The second is a posterior-anterior distinction with more complex or abstract reinforcers (such as monetary gain and loss) represented more anteriorly in the orbitofrontal cortex than simpler reinforcers such as taste or pain. Finally, we propose new neuroimaging methods for obtaining further evidence on the localisation of function in the human orbitofrontal cortex.

Local Versus Global Processing

Sanders, L D, & Poeppel, D, 2007: Local and Global Auditory Processing: Behavioral and ERP Evidence. Neuropsychologia, 45(6):1172-1186.

Differential processing of local and global visual features is well established. Global precedence effects, differences in event-related potentials (ERPs) elicited when attention is focused on local versus global levels, and hemispheric specialization for local and global features all indicate that relative scale of detail is an important distinction in visual processing. Observing analogous differential processing of local and global auditory information would suggest that scale of detail is a general organizational principle of the brain. However, to date the research on auditory local and global processing has primarily focused on music perception or on the perceptual analysis of relatively higher and lower frequencies. The study described here suggests that temporal aspects of auditory stimuli better capture the local-global distinction. By combining short (40 ms) frequency modulated tones in series to create global auditory patterns (500 ms), we independently varied whether pitch increased or decreased over short time spans (local) and longer time spans (global). Accuracy and reaction time measures revealed better performance for global judgments and asymmetric interference that were modulated by amount of pitch change. ERPs recorded while participants listened to identical sounds and indicated the direction of pitch change at the local or global levels provided evidence for differential processing similar to that found in ERP studies employing hierarchical visual stimuli. ERP measures failed to provide evidence for lateralization of local and global auditory perception, but differences in distributions suggest preferential processing in more ventral and dorsal areas respectively.

ERN Applications for BCI

Ferrez, P W, & del R. Millan, J, 2008: Error-Related EEG Potentials Generated During Simulated Brain-Computer Interaction. IEEE Trans. Biomed. Eng., 55(3):923-929.

Brain-computer interfaces (BCIs) are prone to errors in the recognition of the subject's intent. An elegant approach to improve the accuracy of BCIs consists in a verification procedure directly based on the presence of error-related potentials (ErrP) in the electroencephalogram (EEG) recorded right after the occurrence of an error. Several studies show the presence of ErrP in typical choice reaction tasks. However, in the context of a BCI, the central question is: "Are ErrP also elicited when the error is made by the interface during the recognition of the subject's intent?" We have thus explored whether ErrP also follow a feedback indicating incorrect responses of the simulated BCI interface. Five healthy volunteer subjects participated in a new human-robot interaction experiment, which seems to confirm the previously reported presence of a new kind of ErrP. However, in order to exploit these ErrP, we need to detect them in each single trial using a short window following the feedback associated with the response of the BCI. We have achieved an average recognition rate of correct and erroneous single trials of 83.5% and 79.2%, respectively, using a classifier built with data recorded up to three months earlier.

Spatial Filtering of EEG

Blankertz, B, Tomioka, R, Lemm, S, Kawanabe, M, & Müller, K-R, 2008: Optimizing spatial filters for robust EEG single-trial analysis. IEEE Sig. Proc. Mag., 25 (1):41-56.

Due to volume conduction, multi-channel electroencephalogram (EEG) recordings give a rather blurred image of brain activity. Therefore spatial filters are extremely useful in single-trial analysis in order to improve the signal-to-noise ratio. There are powerful methods from machine learning and signal processing that permit the optimization of spatio-temporal filters for each subject in a data dependent fashion beyond the fixed filters based on the sensor geometry, e.g., Laplacians. Here we elucidate the theoretical background of the Common Spatial Pattern (CSP) algorithm, a popular method in Brain-Computer Interface (BCI) research. Apart from reviewing several variants of the basic algorithm, we reveal tricks of the trade for achieving a powerful CSP performance, briefly elaborate on theoretical aspects of CSP and demonstrate the application of CSP-type preprocessing in our studies of the Berlin Brain-Computer Interface project.
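A compact sketch of the basic CSP computation reviewed in this paper: estimate class-conditional spatial covariance matrices and solve a generalized eigenvalue problem, keeping the filters at both extremes of the spectrum. The trial layout, per-trial normalization, and number of filters below are assumptions, and the variants and "tricks of the trade" discussed in the paper are not reflected here.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=3):
    """Common Spatial Patterns via a generalized eigenvalue problem (sketch).

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples) for
    the two conditions. Returns spatial filters (as rows) whose projections
    maximize variance for one class while minimizing it for the other.
    """
    def mean_cov(trials):
        covs = []
        for X in trials:
            X = X - X.mean(axis=1, keepdims=True)     # remove channel means
            C = X @ X.T
            covs.append(C / np.trace(C))              # normalize per trial
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Solve Ca w = lambda (Ca + Cb) w; extreme eigenvalues give the CSP filters.
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    pick = np.concatenate([order[:n_filters], order[-n_filters:]])
    return vecs[:, pick].T                            # filters as rows

# Classifier features are typically log-variances of the filtered trial:
# feats = np.log(np.var(W @ X, axis=1))
```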

Blankertz, B, Kawanabe, M, Tomioka, R, Hohlefeld, F, Nikulin, V, & Müller, K-R, 2008: Invariant common spatial patterns: Alleviating nonstationarities in brain-computer interfacing. Adv. Neur. Info. Proc. Sys. (NIPS'07), 20.

Brain-Computer Interfaces can suffer from a large variance of the subject conditions within and across sessions. For example, vigilance fluctuations in the individual, variable task involvement, workload, etc. alter the characteristics of EEG signals and thus challenge a stable BCI operation. In the present work we aim to define features based on a variant of the common spatial patterns (CSP) algorithm that are constructed invariant with respect to such nonstationarities. We enforce invariance properties by adding terms to the denominator of a Rayleigh coefficient representation of CSP such as disturbance covariance matrices from fluctuations in visual processing. In this manner physiological prior knowledge can be used to shape the classification engine for BCI. As a proof of concept we present a BCI classifier that is robust to changes in the level of parietal alpha-activity. In other words, the EEG decoding still works when there are lapses in vigilance.

Functional Source Localization

Brett, M, Johnsrude, I S, & Owen, A M, 2002: The problem of functional localization in the human brain. Nature Rev. Neurosci., 3:243-249.

Functional imaging gives us increasingly detailed information about the location of brain activity. To use this information, we need a clear conception of the meaning of location data. Here, we review methods for reporting location in functional imaging and discuss the problems that arise from the great variability in brain anatomy between individuals. These problems cause uncertainty in localization, which limits the effective resolution of functional imaging, especially for brain areas involved in higher cognitive function.

Mazziotta, J C, Toga, A W, Evans, A, Fox, P, & Lancaster, J, 1995: A Probabilistic Atlas of the Human Brain: Theory and Rationale for Its Development. NeuroImage, 2(2):89-101.

As in geography, neuroanatomy requires accepted maps, terminology, coordinate systems and reference spaces in order to allow accurate and effective communication within the field and to allied disciplines. Unlike geographical atlases, anatomical atlases cannot assume a single, constant physical reality. Classic atlases of the human brain and other species have been derived from a single brain, or brains from a very small number of subjects, and have employed simple scale factors to stretch or constrict a given subject's brain to match the atlas. The result is a rigid and often inflexible system that disregards useful information about both morphometric and densitometric variability between subjects. This article reviews the theory and rationale for defining a probabilistic atlas derived from a large series of subjects, representative of the entire species, with retention of information about variability. Such a project must take on the problems inherent in dealing with biologically variable structure and function but, when successful, will provide a system that is realistic in its complexity, has defined accuracy and errors, and that, as a benefit, contributes new neurobiological information.

Evans, A C, Collins, D L, Mills, S R, Brown, E D, Kelly, R L, & Peters, T M, 1993: 3D statistical neuroanatomical models from 305 MRI volumes. Proc. IEEE-Nuclear Science Symp. & Med. Imaging Conf., 1813-1817.

Recently, there has been a rapid growth in the use of 3D multi-modal correlative imaging for studies of the human brain. Regional cerebral blood flow (CBF) changes indicate brain areas involved in stimulus processing. These focal changes are often too small (<10%) to be discerned from a single subject and the experiment is repeated in a series of individuals. To investigate the extent of residual variability the authors have collected over 300 MRI volumetric datasets from normal individuals and transformed these datasets into stereotaxic space using a 3D linear re-sampling algorithm. The authors then generated a series of statistical measures which express this population nonlinear variability in the form of parametric volumes, e.g. mean intensity, intensity variance. A model for anatomical variability, expressed as the width of a Gaussian blurring kernel applied to an ideal single subject, was developed and tested against the observed data.

ICA with Clustering

Contreras-Vidal, J L, & Kerick, S E, 2004: Independent component analysis of dynamic brain responses during visuomotor adaptation. NeuroImage, 21(3):936-945.

To investigate the spatial and temporal changes in electro-cortical brain activity and hand kinematics during the acquisition of an internal model of a novel screen-cursor transformation, we employed single-trial infomax independent component analysis (ICA), spectral estimation, and kinematics methods. Participants performed center-out drawing movements under normal and rotated visual feedback of pen movements displayed on a computer screen. Clustering of task-related and adaptation-related independent components identified a selective recruitment of brain activation/deactivation foci associated with the exposure to the distorted visual feedback, including networks associated with frontal-, central-, and lateral-posterior alpha rhythms, and frontal-central error-related negativity potential associated with transient theta and low beta rhythms locked to movement onset. Moreover, adaptation to the rotated reference frame was associated with a reduction in the imposed directional bias and decreases in movement path length and movement time by late-exposure trials, as well as after-effects after removal of the visual distortion. The underlying spatiotemporal pattern of activations is consistent with recruitment of frontal-parietal, sensory-motor, and anterior cingulate cortical areas during visuomotor adaptation.

Clustering Methods

Still, S, & Bialek, W, 2004: How Many Clusters? An Information-Theoretic Perspective. Neural Computation, 16(12):2483-2506.

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate for the description of a given system. Traditional approaches to this problem are based on either a framework in which clusters of a particular shape are assumed as a model of the system or on a two-step procedure in which a clustering criterion determines the optimal assignments for a given number of clusters and a separate criterion measures the goodness of the classification to determine the number of clusters. In a statistical mechanics approach, clustering can be seen as a trade-off between energy- and entropy-like terms, with lower temperature driving the proliferation of clusters to provide a more detailed description of the data. For finite data sets, we expect that there is a limit to the meaningful structure that can be resolved and therefore a minimum temperature beyond which we will capture sampling noise. This suggests that correcting the clustering criterion for the bias that arises due to sampling errors will allow us to find a clustering solution at a temperature that is optimal in the sense that we capture maximal meaningful structure, without having to define an external criterion for the goodness or stability of the clustering. We show that in a general information-theoretic framework, the finite size of a data set determines an optimal temperature, and we introduce a method for finding the maximal number of clusters that can be resolved from the data in the hard clustering limit.

Burman, P, 1989: A Comparative Study of Ordinary Cross-Validation, v-Fold Cross-Validation and the Repeated Learning-Testing Methods. Biometrika, 76(3):503-514.

Concepts of v-fold cross-validation and repeated learning-testing methods are introduced here. In many problems, these methods are computationally much less expensive than ordinary cross-validation and can be used in its place. A comparative study of these three methods is carried out in detail.
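For reference, here is a generic v-fold cross-validation loop of the kind compared in the paper; the fit/predict/loss callables and the random fold assignment are placeholder assumptions.

```python
import numpy as np

def v_fold_cv(X, y, fit, predict, loss, v=10, seed=0):
    """Estimate prediction error by v-fold cross-validation (sketch).

    fit(X, y) returns a model, predict(model, X) returns predictions, and
    loss(y_true, y_pred) returns a scalar; these are placeholder callables.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, v)           # v roughly equal validation folds
    errors = []
    for k in range(v):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(v) if j != k])
        model = fit(X[train], y[train])
        errors.append(loss(y[test], predict(model, X[test])))
    return float(np.mean(errors))            # average held-out loss
```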