Emergent Embodied Cognition

Bernie C. Till

Papers on the emergence of cognition in autonomous self-organizing embodied systems.

Contents

Functional Architecture of the Human Brain

IEEE Trans. Autonomous Mental Development

Volume 1 - 2009

Issue 1

Issue 2

Issue 3

Issue 4

Volume 2 - 2010

Issue 1

Issue 2

Issue 3

Issue 4

Volume 3 - 2011

Issue 1

Preprints in press as of March 2011


Functional Architecture of the Human Brain

Biswal, B B, Mennes, M, Zuo, X-N, Gohel, S, Kelly, C, Smith, S M, Beckmann, C F, Adelstein, J S, Buckner, R L, Colcombe, S, Dogonowski, A-M, Ernst, M, Fair, D, Hampson, M, Hoptman, M J, Hyde, J S, Kiviniemi, V J, Kötter, R, Li, S-J, Lin, C-P, Lowe, M J, Mackay, C, Madden, D J, Madsen, K H, Margulies, D S, Mayberg, H S, Mahon, K M, Monk, C S, Mostofsky, S H, Nagel, B J, Pekar, J J, Peltier, S J, Petersen, S E, Riedl, V, Rombouts, S A R B, Rypma, B, Schlaggar, B L, Schmidt, S, Seidler, R D, Siegle, G J, Sorg, C, Teng, G-J, Veijola, J, Villringer, A, Walter, M, Wang, L, Weng, X-C, Whitfield-Gabrieli, S W, Williamson, P, Windischberger, C, Zang, Y-F, Zhang, H-Y, Castellanos, F X, & Milham, M P, 2010: Toward discovery science of human brain function. Proc. Nat'l. Acad. Sci., 107(10):4734-4739.

Although it is being successfully implemented for exploration of the genome, discovery science has eluded the functional neuroimaging community. The core challenge remains the development of common paradigms for interrogating the myriad functional systems in the brain without the constraints of a priori hypotheses. Resting-state functional MRI (R-fMRI) constitutes a candidate approach capable of addressing this challenge. Imaging the brain during rest reveals large-amplitude spontaneous low-frequency (<0.1 Hz) fluctuations in the fMRI signal that are temporally correlated across functionally related areas. Referred to as functional connectivity, these correlations yield detailed maps of complex neural systems, collectively constituting an individual's "functional connectome." Reproducibility across datasets and individuals suggests the functional connectome has a common architecture, yet each individual's functional connectome exhibits unique features, with stable, meaningful interindividual differences in connectivity patterns and strengths. Comprehensive mapping of the functional connectome, and its subsequent exploitation to discern genetic influences and brain-behavior relationships, will require multicenter collaborative datasets. Here we initiate this endeavor by gathering R-fMRI data from 1,414 volunteers collected independently at 35 international centers. We demonstrate a universal architecture of positive and negative functional connections, as well as consistent loci of inter-individual variability. Age and sex emerged as significant determinants. These results demonstrate that independent R-fMRI datasets can be aggregated and shared. High-throughput R-fMRI can provide quantitative phenotypes for molecular genetic studies and biomarkers of developmental and pathological processes in the brain. To initiate discovery science of brain function, the 1000 Functional Connectomes Project dataset is freely accessible at www.nitrc.org/projects/fcon_1000/.

Frackowiak, R S J, 1998: The functional architecture of the brain. Daedalus, 127(2):105-130.

Introduction; The aims of noninvasive human brain mapping; Describing the brain's anatomy and function; The mapping of sensory signals onto the human cerebral cortex; Functional specialization in the occipital cortex; Beyond the extrastriate cortex; The problem of self-reporting; The localization of memory; Synthesis and conclusions.

Friston, K J, & Price, C J, 2001: Dynamic representations and generative models of brain function. Brain Research Bulletin, 54(3):275-285.

The main point made in this article is that the representational capacity and inherent function of any neuron, neuronal population or cortical area is dynamic and context-sensitive. This adaptive and contextual specialisation is mediated by functional integration or interactions among brain systems with a special emphasis on backwards or top-down connections. The critical notion is that neuronal responses, in any given cortical area, can represent different things at different times. Our argument is developed under the perspective of generative models of functional brain architectures, where higher-level systems provide a prediction of the inputs to lower-level regions. Conflict between the two is resolved by changes in the higher-level representations, driven by the resulting error in lower regions, until the mismatch is 'cancelled'. In this model the specialisation of any region is determined both by bottom-up driving inputs and by top-down predictions. Specialisation is therefore not an intrinsic property of any region but depends on both forward and backward connections with other areas. Because these other areas have access to the context in which the inputs are generated they are in a position to modulate the selectivity or specialisation of lower areas. The implications for 'classical' models (e.g., classical receptive fields in electrophysiology, classical specialisation in neuroimaging and connectionism in cognitive models) are severe and suggest these models provide incomplete accounts of real brain architectures. Generative models represent a far more plausible framework for understanding selective neurophysiological responses and how representations are constructed in the brain.

Friston, K J, 2002: Functional integration and inference in the brain. Prog. Neurobiol., 68(2):113-143.

Self-supervised models of how the brain represents and categorises the causes of its sensory input can be divided into two classes: those that minimise the mutual information (i.e. redundancy) among evoked responses and those that minimise the prediction error. Although these models have similar goals, the way they are attained, and the functional architectures employed, can be fundamentally different. This review describes the two classes of models and their implications for the functional anatomy of sensory cortical hierarchies in the brain. We then consider how empirical evidence can be used to disambiguate between architectures that are sufficient for perceptual learning and synthesis.
Most models of representational learning require prior assumptions about the distribution of sensory causes. Using the notion of empirical Bayes, we show that these assumptions are not necessary and that priors can be learned in a hierarchical context. Furthermore, we try to show that learning can be implemented in a biologically plausible way. The main point made in this review is that backward connections, mediating internal or generative models of how sensory inputs are caused, are essential if the process generating inputs cannot be inverted. Because these processes are dynamical in nature, sensory inputs correspond to a non-invertible nonlinear convolution of causes. This enforces an explicit parameterisation of generative models (i.e. backward connections) to enable approximate recognition and suggests that feedforward architectures, on their own, are not sufficient. Moreover, nonlinearities in generative models, that induce a dependence on backward connections, require these connections to be modulatory; so that estimated causes in higher cortical levels can interact to predict responses in lower levels. This is important in relation to functional asymmetries in forward and backward connections that have been demonstrated empirically.
To ascertain whether backward influences are expressed functionally requires measurements of functional integration among brain systems. This review summarises approaches to integration in terms of effective connectivity and proceeds to address the question posed by the theoretical considerations above. In short, it will be shown that functional neuroimaging can be used to test for interactions between bottom-up and top-down inputs to an area. The conclusion of these studies points toward the prevalence of top-down influences and the plausibility of generative models of sensory brain function.

Friston, K J, 2003: Learning and inference in the brain. Neural Networks, 16(9):1325-1352.

This article is about how the brain data mines its sensory inputs. There are several architectural principles of functional brain anatomy that have emerged from careful anatomic and physiologic studies over the past century. These principles are considered in the light of representational learning to see if they could have been predicted a priori on the basis of purely theoretical considerations. We first review the organisation of hierarchical sensory cortices, paying special attention to the distinction between forward and backward connections. We then review various approaches to representational learning as special cases of generative models, starting with supervised learning and ending with learning based upon empirical Bayes. The latter predicts many features, such as a hierarchical cortical system, prevalent top-down backward influences and functional asymmetries between forward and backward connections that are seen in the real brain.
The key points made in this article are: (i) hierarchical generative models enable the learning of empirical priors and eschew prior assumptions about the causes of sensory input that are inherent in non-hierarchical models. These assumptions are necessary for learning schemes based on information theory and efficient or sparse coding, but are not necessary in a hierarchical context. Critically, the anatomical infrastructure that may implement generative models in the brain is hierarchical. Furthermore, learning based on empirical Bayes can proceed in a biologically plausible way. (ii) The second point is that backward connections are essential if the processes generating inputs cannot be inverted, or the inversion cannot be parameterised. Because these processes involve many-to-one mappings, are non-linear and dynamic in nature, they are generally non-invertible. This enforces an explicit parameterisation of generative models (i.e. backward connections) to afford recognition and suggests that forward architectures, on their own, are not sufficient for perception. (iii) Finally, non-linearities in generative models, mediated by backward connections, require these connections to be modulatory, so that representations in higher cortical levels can interact to predict responses in lower levels. This is important in relation to functional asymmetries in forward and backward connections that have been demonstrated empirically.

Friston, K J, Kilner, J, & Harrison, L, 2006: A free energy principle for the brain. J. Physiol. - Paris, 100(1-3):70-87.

By formulating Helmholtz's ideas about perception, in terms of modern-day theories, one arrives at a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts: using constructs from statistical physics, the problems of inferring the causes of sensory input and learning the causal structure of their generation can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory input is caused. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of cortical organisation and responses.
In this paper, we show these perceptual processes are just one aspect of emergent behaviours of systems that conform to a free energy principle. The free energy considered here measures the difference between the probability distribution of environmental quantities that act on the system and an arbitrary distribution encoded by its configuration. The system can minimise free energy by changing its configuration to affect the way it samples the environment or change the distribution it encodes. These changes correspond to action and perception respectively and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment assumes that the system's state and structure encode an implicit and probabilistic model of the environment. We will look at the models entailed by the brain and how minimisation of its free energy can explain its dynamics and structure.
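
To make the quantity concrete: in generic variational notation (not necessarily the notation of this paper), the free energy of a system with sensory states s and an internal density q(\vartheta) over environmental causes \vartheta decomposes as

    F = \mathbb{E}_q[\ln q(\vartheta)] - \mathbb{E}_q[\ln p(s,\vartheta)]
      = -\ln p(s) + D_{\mathrm{KL}}\!\left[ q(\vartheta) \,\|\, p(\vartheta \mid s) \right].

Since the Kullback-Leibler term is non-negative, F upper-bounds the surprise -ln p(s): optimizing the encoded density q (perception) tightens the bound, while changing how the environment is sampled (action) lowers the surprise itself - the two complementary minimisations described above.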

Friston, K J, & Kiebel, S, 2009: Cortical circuits for perceptual inference. Neural Networks, 22(8):1093-1104.

This paper assumes that cortical circuits have evolved to enable inference about the causes of sensory input received by the brain. This provides a principled specification of what neural circuits have to achieve. Here, we attempt to address how the brain makes inferences by casting inference as an optimisation problem. We look at how the ensuing recognition dynamics could be supported by directed connections and message-passing among neuronal populations, given our knowledge of intrinsic and extrinsic neuronal connections. We assume that the brain models the world as a dynamic system, which imposes causal structure on the sensorium. Perception is equated with the optimisation or inversion of this internal model, to explain sensory input. Given a model of how sensory data are generated, we use a generic variational approach to model inversion to furnish equations that prescribe recognition; i.e., the dynamics of neuronal activity that represents the causes of sensory input. Here, we focus on a model whose hierarchical and dynamical structure enables simulated brains to recognise and predict sequences of sensory states. We first review these models and their inversion under a variational free-energy formulation. We then show that the brain has the necessary infrastructure to implement this inversion and present simulations using synthetic birds that generate and recognise birdsongs.

Chen, C C, Henson, R N, Stephan, K E, Kilner, J M, & Friston, K J, 2009: Forward and backward connections in the brain: A DCM study of functional asymmetries. NeuroImage, 45(2):453-462.

In this paper, we provide evidence for functional asymmetries in forward and backward connections that define hierarchical architectures in the brain. We exploit the fact that modulatory or nonlinear influences of one neuronal system on another (i.e., effective connectivity) entail coupling between different frequencies. Functional asymmetry in forward and backward connections was addressed by comparing dynamic causal models of MEG responses induced by visual processing of normal and scrambled faces. We compared models with and without nonlinear (between-frequency) coupling in both forward and backward connections. Bayesian model comparison indicated that the best model had nonlinear forward and backward connections. Using the best model we then quantified frequency-specific causal influences mediating observed spectral responses. We found a striking asymmetry between forward and backward connections, in which high (gamma) frequencies in higher cortical areas suppressed low (alpha) frequencies in lower areas. This suppression was significantly greater than the homologous coupling in the forward connections. Furthermore, exactly the same asymmetry was observed when we examined face-selective coupling (i.e., coupling under faces minus scrambled faces). These results highlight the importance of nonlinear coupling among brain regions and point to a functional asymmetry between forward and backward connections in the human brain that is consistent with anatomical and physiological evidence from animal studies. This asymmetry is also consistent with functional architectures implied by theories of perceptual inference in the brain, based on hierarchical generative models.

IEEE Trans. Autonomous Mental Development

Volume 1 - 2009

Issue 1

Asada, M, Hosoda, K, Kuniyoshi, Y, Ishiguro, H, Inui, T, Yoshikawa, Y, Ogino, M, & Yoshida, C, 2009: Cognitive developmental robotics: a survey. IEEE Trans. Auton. Ment. Devel., 1(1):12-34.

Cognitive developmental robotics (CDR) aims to provide new understanding of how humans' higher cognitive functions develop by means of a synthetic approach that developmentally constructs cognitive functions. The core idea of CDR is "physical embodiment," which enables information structuring through interactions with the environment, including other agents. The idea is shaped by a hypothesized development model of human cognitive functions from body representation to social behavior. Along with the model, studies of CDR and related works are introduced, and the model and future issues are discussed.

Lake, B M, Vallabha, G K, & McClelland, J L, 2009: Modeling unsupervised perceptual category learning. IEEE Trans. Auton. Ment. Devel., 1(1):35-43.

During the learning of speech sounds and other perceptual categories, category labels are not provided, the number of categories is unknown, and the stimuli are encountered sequentially. These constraints provide a challenge for models, but they have recently been addressed in the online mixture estimation model of unsupervised vowel category learning (Vallabha et al.). The model treats categories as Gaussian distributions, proposing both the number and the parameters of the categories. While the model has been shown to successfully learn vowel categories, it has not been evaluated as a model of the learning process. Here we account for several results: acquired distinctiveness between categories and acquired similarity within categories, a faster increase in discrimination for more acoustically dissimilar vowels, and gradual unsupervised learning of category structure in simple visual stimuli.
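
A minimal sketch of this class of model, assuming diagonal-Gaussian categories and illustrative rate and threshold constants (eta and tau are placeholders, not the published settings): a stimulus that no current category explains well spawns a new category; otherwise every category drifts toward the stimulus in proportion to its posterior responsibility.

    import numpy as np

    def ome_step(x, means, varis, weights, eta=0.05, tau=1e-4):
        # Likelihood of stimulus x under each diagonal-Gaussian category
        lik = weights * np.array([
            np.exp(-0.5 * np.sum((x - m) ** 2 / v)) / np.sqrt(np.prod(2 * np.pi * v))
            for m, v in zip(means, varis)
        ])
        if lik.sum() < tau:
            # Poorly explained stimulus: propose a new category centered on x
            means.append(x.copy())
            varis.append(np.ones_like(x))
            weights = np.append(weights, 1.0)
        else:
            r = lik / lik.sum()                  # posterior responsibilities
            for k in range(len(means)):
                means[k] = means[k] + eta * r[k] * (x - means[k])
                varis[k] = varis[k] + eta * r[k] * ((x - means[k]) ** 2 - varis[k])
            weights = (1.0 - eta) * weights + eta * r
        return means, varis, weights / weights.sum()

    # e.g. seed with one category and present 200 sequential stimuli
    means, varis, weights = [np.zeros(2)], [np.ones(2)], np.array([1.0])
    for x in np.random.randn(200, 2):
        means, varis, weights = ome_step(x, means, varis, weights)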

Nagai, Y, & Rohlfing, K J, 2009: Computational analysis of motionese: toward scaffolding robot action learning. IEEE Trans. Auton. Ment. Devel., 1(1):44-54.

A difficulty in robot action learning is that robots do not know where to attend when observing an action demonstration. Inspired by human parent-infant interaction, we suggest that parental action demonstration to infants, called motionese, can scaffold robot learning as well as infants'. Since infants' knowledge about the context is limited, as robots' is, parents are supposed to properly guide their attention by emphasizing the important aspects of the action. Our analysis employing a bottom-up attention model revealed that motionese has the effects of highlighting the initial and final states of the action, indicating significant state changes in it, and underlining the properties of objects used in the action. These effects were produced by suppression and addition of parents' body movement and by their frequent social signals to infants. Our findings are discussed toward designing robots that can take advantage of parental teaching.

Rolf, M, Hanheide, M, & Rohlfing, K J, 2009: Attention via synchrony: making use of multimodal cues in social learning. IEEE Trans. Auton. Ment. Devel., 1(1):55-67.

Infants learning about their environment are confronted with many stimuli of different modalities. Therefore, a crucial problem is how to discover which stimuli are related, for instance, in learning words. In making these multimodal "bindings," infants depend on social interaction with a caregiver to guide their attention towards relevant stimuli. The caregiver might, for example, visually highlight an object by shaking it while vocalizing the object's name. These cues are known to help structure the continuous stream of stimuli. To detect and exploit them, we propose a model of bottom-up attention by multimodal signal-level synchrony. We focus on the guidance of visual attention from audio-visual synchrony informed by recent adult-infant interaction studies. Consequently, we demonstrate that our model is receptive to parental cues during child-directed tutoring. The findings discussed in this paper are consistent with recent results from developmental psychology but for the first time are obtained employing an objective, computational model. The presence of "multimodal motherese" is verified directly on the audio-visual signal. Lastly, we hypothesize how our computational model facilitates tutoring interaction and discuss its application in interactive learning scenarios, enabling social robots to benefit from adult-like tutoring.

Weng, J, & Luciw, M, 2009: Dually optimal neuronal layers: lobe component analysis. IEEE Trans. Auton. Ment. Devel., 1(1):68-85.

Development imposes great challenges. Internal "cortical" representations must be autonomously generated from interactive experiences. The eventual quality of these developed representations is of course important. Additionally, learning must be as fast as possible to quickly derive better representation from limited experiences. Those who achieve both of these will have competitive advantages. We present a cortex-inspired theory called lobe component analysis (LCA) guided by the aforementioned dual criteria. A lobe component represents a high concentration of probability density of the neuronal input space. We explain how lobe components can achieve a dual-spatiotemporal ("best" and "fastest") optimality, through mathematical analysis, in which we describe how lobe component plasticity can be temporally scheduled to take into account the history of observations in the best possible way. This contrasts with using only the last observation in gradient-based adaptive learning algorithms. Since they are based on two cell-centered mechanisms - Hebbian learning and lateral inhibition - lobe components develop in-place, meaning every networked neuron is individually responsible for the learning of its signal-processing characteristics within its connected network environment. There is no need for a separate learning network. We argue that in-place learning algorithms will be crucial for real-world large-size developmental applications due to their simplicity, low computational complexity, and generality. Our experimental results show that the learning speed of the LCA algorithm is drastically faster than other Hebbian-based updating methods and independent component analysis algorithms, thanks to its dual optimality, and it does not need to use any second- or higher order statistics. We also introduce the new principle of fast learning from stable representation.
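
A minimal sketch of the style of update described, assuming inputs arrive one at a time and neuron firing ages start at 1 (the schedule constants are illustrative, not the paper's values): lateral inhibition is reduced to top-k winner-take-all, and each winner updates its synaptic vector as an amnesic average of its own past inputs, so the learning rate is set by the neuron's firing age rather than a global constant.

    import numpy as np

    def lca_update(x, V, ages, k=1, t1=20, t2=200, c=2.0, r=2000.0):
        # Response: normalized match between input x and each synaptic vector
        resp = V @ x / (np.linalg.norm(V, axis=1) * np.linalg.norm(x) + 1e-12)
        winners = np.argsort(resp)[-k:]          # lateral inhibition: only top-k fire
        for i in winners:
            n = ages[i]
            # Amnesic parameter mu(n): zero early (plain averaging), growing later
            # so recent observations gradually outweigh old ones
            if n < t1:
                mu = 0.0
            elif n < t2:
                mu = c * (n - t1) / (t2 - t1)
            else:
                mu = c + (n - t2) / r
            # Hebbian increment (pre x post), folded into an amnesic average
            V[i] = ((n - 1 - mu) / n) * V[i] + ((1 + mu) / n) * resp[i] * x
            ages[i] += 1
        return V, ages

    # e.g. 10 neurons developing in-place from a stream of 5-D inputs
    V, ages = np.random.rand(10, 5), np.ones(10)
    for _ in range(1000):
        V, ages = lca_update(np.random.rand(5), V, ages)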

Pitti, A, Mori, H, Kouzuma, S, & Kuniyoshi, Y, 2009: Contingency perception and agency measure in visuo-motor spiking neural networks. IEEE Trans. Auton. Ment. Devel., 1(1):86-97.

Agency is the sense that I am the cause or author of a movement. Babies develop this feeling early by perceiving the contingency between afferent (sensor) and efferent (motor) information. A comparator model is hypothesized to be associated with many brain regions to monitor and simulate the concordance between self-produced actions and their consequences. In this paper, we propose that the biological mechanism of spike timing-dependent plasticity, which synchronizes the neural dynamics almost everywhere in the central nervous system, constitutes the perfect algorithm to detect contingency in sensorimotor networks. The coherence or the dissonance in the sensorimotor information flow then imparts the agency level. In a head-neck-eyes robot, we replicate three developmental experiments illustrating how particular perceptual experiences can modulate the overall level of agency inside the system; i.e., (1) by adding a delay between proprioceptive and visual feedback information, (2) by facing a mirror, and (3) by facing a person. We show that the system learns to discriminate animated objects (self-image and other persons) from other types of stimuli. This suggests a basic stage of representing the self in relation to others arising from low-level sensorimotor processes. We then discuss the relevance of our findings to neurobiological evidence and observations from developmental psychology for developmental robots.

Issue 2

Song, M, Liu, Y, Zhou, Y, Wang, K, Yu, C, & Jiang, T, 2009: Default network and intelligence difference. IEEE Trans. Auton. Ment. Devel., 1(2):101-109.

In the last few years, many studies in cognitive and systems neuroscience have found that a consistent network of brain regions, referred to as the default network, shows high levels of activity when no explicit task is performed. Some scientists believe that this resting-state activity might reflect neural functions that consolidate the past, stabilize brain ensembles, and prepare us for the future. Here, we modeled the default network as an undirected weighted graph, and then used graph theory to investigate its topological properties in two groups of people with different intelligence levels. We found that, in both groups, the posterior cingulate cortex showed the greatest degree in comparison to the other brain regions in the default network, and that the medial temporal lobes and cerebellar tonsils were topologically separated from the other brain regions in the default network. More importantly, we found that the strength of some functional connectivities and the global efficiency of the default network were significantly different between the superior intelligence group and the average intelligence group, which indicates that the functional integration of the default network might be related to individual intelligence.
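
For readers unfamiliar with the two graph measures the result turns on, a minimal sketch (using networkx, with an illustrative correlation threshold rather than the paper's actual pipeline) of computing node degree and global efficiency from a region-by-region functional connectivity matrix:

    import networkx as nx
    import numpy as np

    def default_network_metrics(corr, labels, threshold=0.3):
        # Build an undirected graph, keeping sufficiently strong connections
        G = nx.Graph()
        G.add_nodes_from(labels)
        n = len(labels)
        for i in range(n):
            for j in range(i + 1, n):
                if corr[i, j] > threshold:
                    G.add_edge(labels[i], labels[j], weight=corr[i, j])
        degree = dict(G.degree())            # hubness; PCC should come out on top
        eff = nx.global_efficiency(G)        # mean inverse shortest-path length
        return degree, eff

    # e.g. a toy 4-region "default network"
    corr = np.array([[1.0, 0.6, 0.4, 0.1],
                     [0.6, 1.0, 0.5, 0.2],
                     [0.4, 0.5, 1.0, 0.0],
                     [0.1, 0.2, 0.0, 1.0]])
    print(default_network_metrics(corr, ["PCC", "mPFC", "IPL", "MTL"]))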

Dandurand, F, & Shultz, T R, 2009: Connectionist models of reinforcement, imitation, and instruction in learning to solve complex problems. IEEE Trans. Auton. Ment. Devel., 1(2):110-121.

We compared computational models and human performance on learning to solve a high-level, planning-intensive problem. Humans and models were subjected to three learning regimes: reinforcement, imitation, and instruction. We modeled learning by reinforcement (rewards) using SARSA, a softmax selection criterion and a neural network function approximator; learning by imitation using supervised learning in a neural network; and learning by instructions using a knowledge-based neural network. We had previously found that human participants who were told if their answers were correct or not (a reinforcement group) were less accurate than participants who watched demonstrations of successful solutions of the task (an imitation group) and participants who read instructions explaining how to solve the task. Furthermore, we had found that humans who learn by imitation and instructions performed more complex solution steps than those trained by reinforcement. Our models reproduced this pattern of results.
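
For concreteness, a minimal tabular sketch of the reinforcement condition as described: SARSA with softmax (Boltzmann) action selection. The table stands in for the neural-network function approximator the authors used, and all constants are illustrative.

    import numpy as np

    def softmax_policy(q_row, temperature=1.0):
        # Boltzmann action selection over the action values of one state
        prefs = np.asarray(q_row) / temperature
        prefs -= prefs.max()                     # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return np.random.choice(len(probs), p=probs)

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
        # On-policy backup toward the action actually selected next
        td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
        Q[s, a] += alpha * td_error
        return Q

    # e.g. 5 states, 2 actions: one interaction step
    Q = np.zeros((5, 2))
    a = softmax_policy(Q[0])
    Q = sarsa_update(Q, 0, a, 1.0, 1, softmax_policy(Q[1]))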

Stoytchev, A, 2009: Some basic principles of developmental robotics. IEEE Trans. Auton. Ment. Devel., 1(2):122-130.

This paper formulates five basic principles of developmental robotics. These principles are formulated based on some of the recurring themes in the developmental learning literature and in the author's own research. The five principles follow logically from the verification principle (postulated by Richard Sutton) which is assumed to be self-evident. This paper also gives an example of how these principles can be applied to the problem of autonomous tool use in robots.

Fuke, S, Ogino, M, & Asada, M, 2009: Acquisition of the head-centered peri-personal spatial representation found in VIP neuron. IEEE Trans. Auton. Ment. Devel., 1(2):131-140.

Both body and visuo-spatial representations are supposed to be gradually acquired during the developmental process, as described in cognitive and brain sciences. A typical example is face representation in neurons of the ventral intraparietal (VIP) area, whose function is not only to code for the location of visual stimuli in the head-centered reference frame, but also to connect visual sensation with tactile sensation. This paper presents a model that enables a robot to acquire such a representation. The proprioception of arm posture is utilized as reference data through "hand regard behavior," that is, the robot moves its hand in front of its face, and self-organizing map (SOM) and Hebbian learning methods are applied. Simulation results are shown, and the limitations of the current model and future issues are discussed.

Yu, C, Smith, L B, Shen, H, Pereira, A F, & Smith, T, 2009: Active information selection: visual attention through the hands. IEEE Trans. Auton. Ment. Devel., 1(2):141-151.

An important goal in studying both human intelligence and artificial intelligence is to understand how a natural or an artificial learning system deals with the uncertainty and ambiguity of the real world. For a natural intelligence system such as a human toddler, the relevant aspects in a learning environment are only those that make contact with the learner's sensory system. In real-world interactions, what the child perceives critically depends on his own actions as these actions bring information into and out of the learner's sensory field. The present analyses indicate how, in the case of a toddler playing with toys, these perception-action loops may simplify the learning environment by selecting relevant information and filtering irrelevant information. This paper reports new findings using a novel method that seeks to describe the visual learning environment from a young child's point of view and measures the visual information that a child perceives in real-time toy play with a parent. The main results are: 1) what the child perceives depends primarily on his own actions but also on his social partner's actions; 2) manual actions, in particular, play a critical role in creating visual experiences in which one object dominates; 3) this selecting and filtering of visual objects through the actions of the child provides more constrained and clean input that seems likely to facilitate cognitive learning processes. These findings have broad implications for how one studies and thinks about human and artificial learning systems.

Issue 3

Baranes, A, & Oudeyer, P-Y, 2009: R-IAC: Robust intrinsically motivated exploration and active learning. IEEE Trans. Auton. Ment. Devel., 1(3):155-169.

Intelligent adaptive curiosity (IAC) was initially introduced as a developmental mechanism allowing a robot to self-organize developmental trajectories of increasing complexity without preprogramming the particular developmental stages. In this paper, we argue that IAC and other intrinsically motivated learning heuristics could be viewed as active learning algorithms that are particularly suited for learning forward models in unprepared sensorimotor spaces with large unlearnable subspaces. Then, we introduce a novel formulation of IAC, called robust intelligent adaptive curiosity (R-IAC), and show that its performance as an intrinsically motivated active learning algorithm is far superior to IAC in a complex sensorimotor space where only a small subspace is neither unlearnable nor trivial. We also show results in which the learnt forward model is reused in a control scheme. Finally, accompanying open-source software containing these algorithms, as well as tools to reproduce all the experiments presented in this paper, is made publicly available.
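
The intrinsic reward driving IAC-style systems is learning progress: the recent decrease of prediction error within a region of sensorimotor space. A minimal sketch, with window size and exploration rate as illustrative assumptions: a region that is trivial (flat low error) or unlearnable (flat high error) yields no progress, so exploration concentrates where error is still falling.

    import numpy as np

    def learning_progress(errors, window=20):
        # Progress = mean error of the older window minus the recent window
        e = np.asarray(errors, dtype=float)
        if len(e) < 2 * window:
            return 0.0
        return max(0.0, e[-2 * window:-window].mean() - e[-window:].mean())

    def choose_region(error_histories, eps=0.1):
        # Mostly exploit the region with maximal progress; sometimes explore
        if np.random.rand() < eps:
            return np.random.randint(len(error_histories))
        return int(np.argmax([learning_progress(h) for h in error_histories]))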

Yong, C H, & Miikkulainen, R, 2009: Coevolution of role-based cooperation in multiagent systems. IEEE Trans. Auton. Ment. Devel., 1(3):170-186.

In tasks such as pursuit and evasion, multiple agents need to coordinate their behavior to achieve a common goal. An interesting question is, how can such behavior be best evolved? A powerful approach is to control the agents with neural networks, coevolve them in separate subpopulations, and test them together in the common task. In this paper, such a method, called multiagent enforced subpopulations (multiagent ESP), is proposed and demonstrated in a prey-capture task. First, the approach is shown to be more efficient than evolving a single central controller for all agents. Second, cooperation is found to be most efficient through stigmergy, i.e., through role-based responses to the environment, rather than communication between the agents. Together these results suggest that role-based cooperation is an effective strategy in certain multiagent tasks.

Lyon, C, Sato, Y, Saunders, J, & Nehaniv, C L, 2009: What is needed for a robot to acquire grammar? Some underlying primitive mechanisms for the synthesis of linguistic ability. IEEE Trans. Auton. Ment. Devel., 1(3):187-195.

A robot that can communicate with humans using natural language will have to acquire a grammatical framework. This paper analyses some crucial underlying mechanisms that are needed in the construction of such a framework. The work is inspired by language acquisition in infants, but it also draws on the emergence of language in evolutionary time and in ontogenic (developmental) time. It focuses on issues arising from the use of real language with all its evolutionary baggage, in contrast to an artificial communication system, and describes approaches to addressing these issues. We can deconstruct grammar to derive underlying primitive mechanisms, including serial processing, segmentation, categorization, compositionality, and forward planning. Implementing these mechanisms is a necessary preparatory step to reconstructing a working syntactic/semantic/pragmatic processor which can handle real language. An overview is given of our own initial experiments in which a robot acquires some basic linguistic capacity by interacting with a human.

Stevens, G T, & Zhang, J, 2009: A dynamic systems model of infant attachment. IEEE Trans. Auton. Ment. Devel., 1(3):196-207.

Attachment, or the emotional tie between an infant and its primary caregiver, has been modeled as a homeostatic process by Bowlby (Attachment and Loss, 1969; Separation: Anxiety and Anger, 1973; Loss: Sadness and Depression, 1980). Evidence from neurophysiology has grounded this mechanism of infant attachment in the dynamic interplay between an opioid-based proximity-seeking mechanism and an NE-based arousal system that are regulated by external stimuli (interaction with the primary caregiver and the environment). Here, we model this attachment mechanism and its dynamic regulation by a coupled system of ordinary differential equations. We simulated the characteristic patterns of infant behaviors in the strange situation procedure, a common instrument for assessing the quality of attachment outcomes ("types") for infants at about one year of age. We also manipulated the parameters of our model to account for neurochemical adaptation, and to allow caregiver style (such as responsiveness and other factors) and temperamental factors (such as reactivity and readiness in self-regulation) to be incorporated into the homeostatic regulation model of attachment dynamics. Principal component analysis revealed the characteristic regions in the parameter space that correspond to secure, anxious, and avoidant attachment typology. Implications of this kind of approach are discussed.
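
To illustrate the modeling style (not the paper's actual equations or parameters), a toy coupled-ODE homeostat in the same spirit: an opioid-like comfort state and an NE-like arousal state, driven by caregiver distance and integrated through a strange-situation-like schedule.

    import numpy as np

    def simulate_attachment(d, dt=0.01, a=1.0, b=0.8, c=1.2, g=0.9):
        # x: opioid-like proximity/comfort state; y: NE-like arousal state
        # d(t): distance to caregiver in [0, 1] (the external input)
        x, y = 1.0, 0.0
        xs, ys = [], []
        for dist in d:
            dx = -a * x + b * (1.0 - dist)       # comfort rises near the caregiver
            dy = -c * y + g * dist * (1.0 - x)   # arousal rises when far and uncomforted
            x += dt * dx
            y += dt * dy
            xs.append(x)
            ys.append(y)
        return np.array(xs), np.array(ys)

    # e.g. a strange-situation-like episode: together, separation, reunion
    d = np.concatenate([np.zeros(2000), np.ones(2000), np.zeros(2000)])
    comfort, arousal = simulate_attachment(d)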

Issue 4

Ishihara, H, Yoshikawa, Y, Miura, K, & Asada, M, 2009: How caregiver's anticipation shapes infant's vowel through mutual imitation. IEEE Trans. Auton. Ment. Devel., 1(4):217-225.

The mechanism of infant vowel development is a fundamental issue of human cognitive development that includes perceptual and behavioral development. This paper models the mechanism of imitation underlying caregiver-infant interaction by focusing on potential roles of the caregiver's imitation in guiding infant vowel development. The proposed imitation mechanism is constructed with two kinds of possible caregiver biases in mind. The first is what we call "sensorimotor magnets," by which a caregiver perceives and imitates infant vocalizations as more prototypical mother-tongue vowels. The second is what we call "automirroring bias," by which the heard vowel is perceived as much closer to the expected vowel because of the anticipation of being imitated. Computer simulation results of caregiver-infant interaction show that the sensorimotor magnets help form small clusters and the automirroring bias shapes these clusters into clearer vowels in association with the sensorimotor magnets.

Schillingmann, L, Wrede, B, & Rohlfing, K J, 2009: A computational model of acoustic packaging. IEEE Trans. Auton. Ment. Devel., 1(4):226-237.

In order to learn and interact with humans, robots need to understand actions and make use of language in social interactions. The use of language for the learning of actions has been emphasized by Hirsh-Pasek and Golinkoff (MIT Press, 1996), introducing the idea of acoustic packaging. Accordingly, it has been suggested that acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to attend to relevant parts and to find structure within them. In this article, we present a computational model of the multimodal interplay of action and language in tutoring situations. For our purpose, we understand events as temporal intervals, which have to be segmented in both the visual and the acoustic modality. Our acoustic packaging algorithm merges the segments from both modalities based on temporal overlap. First evaluation results show that acoustic packaging can provide a meaningful segmentation of action demonstration within tutoring behavior. We discuss our findings with regard to a meaningful action segmentation. Based on our future vision of acoustic packaging we point out a roadmap describing the further development of acoustic packaging and interactive scenarios it is employed in.
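
The core operation can be stated concretely: given temporal intervals segmented in each modality, merge them into packages wherever they overlap in time. A minimal sketch, with a greedy pairwise merge as an illustrative simplification of the algorithm described above:

    def acoustic_packages(speech_segments, motion_segments):
        # Segments are (start, end) pairs in seconds
        packages = []
        for (a0, a1) in speech_segments:
            bundle = [(a0, a1)]
            for (m0, m1) in motion_segments:
                if m0 < a1 and a0 < m1:          # temporal overlap test
                    bundle.append((m0, m1))
            if len(bundle) > 1:                  # speech that packaged some action
                starts, ends = zip(*bundle)
                packages.append((min(starts), max(ends)))
        return packages

    # e.g. one narration utterance spanning two stacking motions
    print(acoustic_packages([(0.0, 2.5)], [(0.4, 1.1), (1.3, 2.2), (4.0, 5.0)]))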

Solgi, M, & Weng, J, 2009: Developmental stereo: emergence of disparity preference in models of the visual cortex. IEEE Trans. Auton. Ment. Devel., 1(4):238-252.

How our brains develop disparity-tuned V1 and V2 cells and then integrate binocular disparity into 3-D perception of the visual world is still largely a mystery. Moreover, computational models that take into account the role of the 6-layer architecture of the laminar cortex and temporal aspects of visual stimuli are elusive for stereo. In this paper, we present cortex-inspired computational models that simulate the development of stereo receptive fields, and use the developed disparity-sensitive neurons to estimate binocular disparity. Not only do the results show that the use of top-down signals in the form of supervision or temporal context greatly improves the performance of the networks, but they also result in biologically compatible cortical maps - the representation of disparity selectivity is grouped, and changes gradually along the cortex. To our knowledge, this work is the first neuromorphic, end-to-end model of laminar cortex that integrates temporal context to develop internal representations, and generates accurate motor actions in the challenging problem of detecting disparity in binocular natural images. The networks reach a subpixel average error in regression, and a 0.90 success rate in classification, given limited resources.

Spratling, M W, 2009: Learning posture invariant spatial representations through temporal correlations. IEEE Trans. Auton. Ment. Devel., 1(4):253-263.

A hierarchical neural network model is used to learn, without supervision, sensory-sensory coordinate transformations like those believed to be encoded in the dorsal pathway of the cerebral cortex. The resulting representations of visual space are invariant to eye orientation, neck orientation, or posture in general. These posture invariant spatial representations are learned using the same mechanisms that have previously been proposed to operate in the cortical ventral pathway to learn object representations that are invariant to translation, scale, orientation, or viewpoint in general. This model thus suggests that the same mechanisms of learning and development operate across multiple cortical hierarchies.

Volume 2 - 2010

Issue 1

Oudeyer, P-Y, 2010: On the impact of robotics in behavioral and cognitive sciences: from insect navigation to human cognitive development. IEEE Trans. Auton. Ment. Devel., 2(1):2-16.

The interaction of robotics with the behavioral and cognitive sciences has always been tight. As often described in the literature, living systems have inspired the construction of many robots. Yet, in this article, we focus on the reverse phenomenon: building robots can have an important impact on the way we conceptualize behavior and cognition in animals and humans. This article presents a series of paradigmatic examples spanning the modelling of insect navigation, experimentation on the role of morphology in controlling locomotion, the development of foundational representations of the body and of the self/other distinction, the self-organization of language in robot societies, and the use of robots as therapeutic tools for children with developmental disorders. Through these examples, I review the ways robots can be used as operational models confronting specific theories with reality, as proofs of concept, as conceptual exploration tools generating new hypotheses, as experimental setups to uncover particular behavioral properties in animals or humans, or even as therapeutic tools. Finally, I discuss the fact that, in spite of its role in the formation of many fundamental theories in the behavioral and cognitive sciences, the use of robots is far from being accepted as a standard tool, and its contributions are often forgotten, leading to regular rediscoveries and slowing down cumulative progress. The article concludes by highlighting the high priority of further historical and epistemological work.

Gordon, S M, Kawamura, K, & Wilkes, D M, 2010: Neuromorphically inspired appraisal-based decision making in a cognitive robot. IEEE Trans. Auton. Ment. Devel., 2(1):17-39.

Real-time search techniques have been used extensively in the areas of task planning and decision making. In order to be effective, however, these techniques require task-specific domain knowledge in the form of heuristic or utility functions. These functions can either be embedded by the programmer, or learned by the system over time. Unfortunately, many of the reinforcement learning techniques that might be used to acquire this knowledge generally demand static feature vector representations defined a priori. Current neurobiological research offers key insights into how the cognitive processing of experience may be used to alleviate dependence on preprogrammed heuristic functions, as well as on static feature representations. Research also suggests that internal appraisals are influenced by such processing and that these appraisals integrate with the cognitive decision-making process, providing a range of useful and adaptive control signals that focus, inform, and mediate deliberation. This paper describes a neuromorphically inspired approach for cognitively processing experience in order to: 1) abstract state information; 2) learn utility functions over this state abstraction; and 3) learn to trade off between performance and deliberation time.

Sumioka, H, Yoshikawa, Y, & Asada, M, 2010: Reproducing interaction contingency toward open-ended development of social actions: case study on joint attention. IEEE Trans. Auton. Ment. Devel., 2(1):40-50.

How can human infants gradually socialize through interaction with their caregivers? This paper presents a learning mechanism that incrementally acquires social actions by finding and reproducing the contingency in interaction with a caregiver. A contingency measure based on transfer entropy is used to select the appropriate pairs of variables to be associated for acquiring social actions from the set of all possible pairs. Joint attention behavior is tested to examine the development of social actions caused by responding to changes in caregiver behavior due to reproducing the found contingency. The results of computer simulations of human-robot interaction indicate that a robot acquires a series of actions related to joint attention, such as gaze following and alternation, in an order that almost matches the infant development of joint attention found in developmental psychology. The difference in order between them is discussed based on an analysis of robot behavior, and future issues are then outlined.
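
Transfer entropy, the contingency measure used here, quantifies how much a source series x helps predict a target series y beyond y's own past: T(X->Y) = sum p(y', y, x) log[ p(y'|y, x) / p(y'|y) ]. A minimal plug-in estimator for discretized signals (histogram binning is an illustrative choice, and such estimators are biased for short series):

    import numpy as np
    from collections import Counter

    def transfer_entropy(x, y, bins=4):
        # Discretize both series into equal-width bins
        x = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
        y = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
        triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y next, y past, x past)
        pairs_yx = Counter(zip(y[:-1], x[:-1]))
        pairs_yy = Counter(zip(y[1:], y[:-1]))
        singles = Counter(y[:-1])
        n = len(y) - 1
        te = 0.0
        for (y1, y0, x0), count in triples.items():
            p_joint = count / n
            p_cond_full = count / pairs_yx[(y0, x0)]
            p_cond_self = pairs_yy[(y1, y0)] / singles[y0]
            te += p_joint * np.log2(p_cond_full / p_cond_self)
        return te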

Thivierge, J-P, 2010: Computational developmental neuroscience: capturing developmental trajectories from genes to cognition. IEEE Trans. Auton. Ment. Devel., 2(1):51-58.

Over the course of development, the central nervous system grows into a complex set of structures that ultimately controls our experiences and interactions with the world. To understand brain development, researchers must disentangle the contributions of genes, neural activity, synaptic plasticity, and intrinsic noise in guiding the growth of axons between brain regions. Here, we examine how computer simulations can shed light on neural development, making headway towards systems that self-organize into fully autonomous models of the brain. We argue that these simulations should focus on the "open-ended" nature of development, rather than a set of deterministic outcomes.

Issue 2

Singh, S, Lewis, R L, Barto, A G, & Sorg, J, 2010: Intrinsically motivated reinforcement learning: an evolutionary perspective. IEEE Trans. Auton. Ment. Devel., 2(2):70-82.

There is great interest in building intrinsic motivation into artificial systems using the reinforcement learning framework. Yet, what intrinsic motivation may mean computationally, and how it may differ from extrinsic motivation, remains a murky and controversial subject. In this paper, we adopt an evolutionary perspective and define a new optimal reward framework that captures the pressure to design good primary reward functions that lead to evolutionary success across environments. The results of two computational experiments show that optimal primary reward signals may yield both emergent intrinsic and extrinsic motivation. The evolutionary perspective and the associated optimal reward framework thus lead to the conclusion that there are no hard and fast features distinguishing intrinsic and extrinsic reward computationally. Rather, the directness of the relationship between rewarding behavior and evolutionary success varies along a continuum.

Niekum, S, Barto, A G, & Spector, L, 2010: Genetic programming for reward function search. IEEE Trans. Auton. Ment. Devel., 2(2):83-90.

Reward functions in reinforcement learning have largely been assumed given as part of the problem being solved by the agent. However, the psychological notion of intrinsic motivation has recently inspired inquiry into whether there exist alternate reward functions that enable an agent to learn a task more easily than the natural task-based reward function allows. This paper presents a genetic programming algorithm to search for alternate reward functions that improve agent learning performance. We present experiments that show the superiority of these reward functions, demonstrate the possible scalability of our method, and define three classes of problems where reward function search might be particularly useful: distributions of environments, nonstationary environments, and problems with short agent lifetimes.

Butko, N J, & Movellan, J R, 2010: Infomax control of eye movements. IEEE Trans. Auton. Ment. Devel., 2(2):91-107.

Recently, infomax methods of optimal control have begun to reshape how we think about active information gathering. We show how such methods can be used to formulate the problem of choosing where to look. We show how an optimal eye movement controller can be learned from subjective experiences of information gathering, and we explore in simulation the properties of the optimal controller. This controller outperforms other eye movement strategies proposed in the literature. The learned eye movement strategies are tailored to the specific visual system of the learner - we show that agents with different kinds of eyes should follow different eye movement strategies. Then we use these insights to build an autonomous computer program that follows this approach and learns to search for faces in images faster than current state-of-the-art techniques. The context of these results is search in static scenes, but the approach extends easily, and gives further efficiency gains, to dynamic tracking tasks. A limitation of infomax methods is that they require probabilistic models of uncertainty of the sensory system, the motor system, and the external world. In the final section of this paper, we propose future avenues of research by which autonomous physical agents may use developmental experience to subjectively characterize the uncertainties they face.
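
A myopic caricature of the infomax idea (the paper learns a full sequential policy rather than this one-step rule): choose the fixation that maximizes the expected reduction in entropy of a discrete belief over target location, under an assumed detector model. The detection and false-positive rates here are hypothetical.

    import numpy as np

    def pick_fixation(belief, detect_prob=0.9, false_pos=0.1):
        def entropy(p):
            p = p[p > 0]
            return -(p * np.log2(p)).sum()
        h0 = entropy(belief)
        n = len(belief)
        gains = np.zeros(n)
        for loc in range(n):                     # candidate fixation points
            for obs in (0, 1):                   # possible detector outputs
                on_t = detect_prob if obs else 1.0 - detect_prob
                off_t = false_pos if obs else 1.0 - false_pos
                lik = np.where(np.arange(n) == loc, on_t, off_t)
                post = lik * belief
                p_obs = post.sum()
                if p_obs > 0:                    # accumulate expected information gain
                    gains[loc] += p_obs * (h0 - entropy(post / p_obs))
        return int(np.argmax(gains))

    # e.g. a uniform prior over 8 candidate locations
    print(pick_fixation(np.full(8, 1 / 8)))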

Cakmak, M, Chao, C, & Thomaz, A L, 2010: Designing interactions for robot active learners. IEEE Trans. Auton. Ment. Devel., 2(2):108-118.

This paper addresses some of the problems that arise when applying active learning to the context of human-robot interaction (HRI). Active learning is an attractive strategy for robot learners because it has the potential to improve the accuracy and the speed of learning, but it can cause issues from an interaction perspective. Here we present three interaction modes that enable a robot to use active learning queries. The three modes differ in when they make queries: the first makes a query every turn, the second makes a query only under certain conditions, and the third makes a query only when explicitly requested by the teacher. We conduct an experiment in which 24 human subjects teach concepts to our upper-torso humanoid robot, Simon, in each interaction mode, and we compare these modes against a baseline mode using only passive supervised learning. We report results from both a learning and an interaction perspective. The data show that the three modes using active learning are preferable to the mode using passive supervised learning both in terms of performance and human subject preference, but each mode has advantages and disadvantages. Based on our results, we lay out several guidelines that can inform the design of future robotic systems that use active learning in an HRI setting.

Merrick, K E, 2010: A comparative study of value systems for self-motivated exploration and learning by robots. IEEE Trans. Auton. Ment. Devel., 2(2):119-131.

A range of different value systems have been proposed for self-motivated agents, including biologically and cognitively inspired approaches. Likewise, these value systems have been integrated with different behavioral systems including reflexive architectures, reward-based learning and supervised learning. However, there is little literature comparing the performance of different value systems for motivating exploration and learning by robots. This paper proposes a neural network architecture for integrating different value systems with reinforcement learning. It then presents an empirical evaluation and comparison of four value systems for motivating exploration by a Lego Mindstorms NXT robot. Results reveal the different exploratory properties of novelty-seeking motivation, interest, and competence-seeking motivation.

Vigorito, C M, & Barto, A G, 2010: Intrinsically motivated hierarchical skill learning in structured environments. IEEE Trans. Auton. Ment. Devel., 2(2):132-143.

We present a framework for intrinsically motivated developmental learning of abstract skill hierarchies by reinforcement learning agents in structured environments. Long-term learning of skill hierarchies can drastically improve an agent's efficiency in solving ensembles of related tasks in a complex domain. In structured domains composed of many features, understanding the causal relationships between actions and their effects on different features of the environment can greatly facilitate skill learning. Using Bayesian network structure-learning techniques and structured dynamic programming algorithms, we show that reinforcement learning agents can learn incrementally and autonomously both the causal structure of their environment and a hierarchy of skills that exploit this structure. Furthermore, we present a novel active learning scheme that employs intrinsic motivation to maximize the efficiency with which this structure is learned. As new structure is acquired using an agent's current set of skills, more complex skills are learned, which in turn allow the agent to discover more structure, and so on. This bootstrapping property makes our approach a developmental learning process that results in steadily increasing domain knowledge and behavioral complexity as an agent continues to explore its environment.

Issue 3

Zhang, Y, & Weng, J, 2010: Spatio-temporal multimodal developmental learning. IEEE Trans. Auton. Ment. Devel., 2(3):149-166.

It is elusive how the skull-enclosed brain enables spatio-temporal multimodal developmental learning. By multimodal, we mean that the system has at least two sensory modalities, e.g., visual and auditory in our experiments. By spatio-temporal, we mean that the behavior of the system depends not only on the spatial pattern in the current sensory inputs, but also on those of the recent past. Traditional machine learning requires humans to train every module using hand-transcribed data, to handcraft symbols among modules, and to hand-link modules internally. Such a system is limited by a static set of symbols and static module performance. A key characteristic of developmental learning is that the "brain" is "skull-closed" after birth - not directly manipulatable by the system designer - so that the system can continue to learn incrementally without the need for reprogramming. In this paper, we propose an architecture for multimodal developmental learning - parallel modality pathways all situated between a sensory end and the motor end. Motor signals are not only used as output behaviors, but also as part of the input to all the related pathways. For example, the proposed developmental learning does not use silence as cut points for speech processing or motion-static points as key frames for visual processing.

Cangelosi, A, Metta, G, Sagerer, G, Nolfi, S, Nehaniv, C, Fischer, K, Tani, J, Belpaeme, T, Sandini, G, Nori, F, Fadiga, L, Wrede, B, Rohlfing, K, Tuci, E, Dautenhahn, K, Saunders, J, & Zeschel, A, 2010: Integration of action and language knowledge: a roadmap for developmental robotics. IEEE Trans. Auton. Ment. Devel., 2(3):167-195.

This position paper proposes that the study of embodied cognitive agents, such as humanoid robots, can advance our understanding of the cognitive development of complex sensorimotor, linguistic, and social learning skills. This in turn will benefit the design of cognitive robots capable of learning to handle and manipulate objects and tools autonomously, to cooperate and communicate with other robots and humans, and to adapt their abilities to changing internal, environmental, and social conditions. Four key areas of research challenges are discussed, specifically for the issues related to the understanding of: 1) how agents learn and represent compositional actions; 2) how agents learn and represent compositional lexica; 3) the dynamics of social interaction and learning; and 4) how compositional action and language representations are integrated to bootstrap the cognitive system. The review of specific issues and progress in these areas is then translated into a practical roadmap based on a series of milestones. These milestones provide a possible set of cognitive robotics goals and test scenarios, thus acting as a research roadmap for future work on cognitive developmental robotics.

Miao, J, Qing, L, Zou, B, Duan, L, & Gao, W, 2010: Top-down gaze movement control in target search using population cell coding of visual context. IEEE Trans. Auton. Ment. Devel., 2(3):196-215.

Visual context plays an important role in humans' top-down gaze movement control for target searching. Exploring the mental development mechanism in terms of incremental visual context encoding by population cells is an interesting issue. This paper presents a biologically inspired computational model. The visual contextual cues were used in this model for top-down eye-motion control in searching for targets in images. We propose a population cell coding mechanism for visual context encoding and decoding. The model was implemented in a neural network system. A developmental learning mechanism was simulated in this system by dynamically generating new coding neurons to incrementally encode visual context during training. The encoded context was decoded with population neurons in a top-down mode. This allowed the model to direct the gaze motion to the centers of the targets. The model was developed to pursue low encoding quantity and high target-locating accuracy. Its performance has been evaluated by a set of experiments searching for different facial objects in a human face image set. Theoretical analysis and experimental results show that the proposed visual context encoding algorithm without weight updating is fast, efficient, and stable, and that the population-cell coding generally performs better than single-cell coding and k-nearest-neighbor (k-NN)-based coding.

Rolf, M, Steil, J J, & Gienger, M, 2010: Goal babbling permits direct learning of inverse kinematics. IEEE Trans. Auton. Ment. Devel., 2(3):216-229.

We present an approach to learning the inverse kinematics of redundant systems without prior or expert knowledge. The method allows for iterative bootstrapping and refinement of the inverse kinematics estimate. The essential novelty lies in a path-based sampling approach: we generate training data along paths, which result from executing the currently learned estimate along a desired path towards a goal. The information structure thereby induced enables efficient detection and resolution of inconsistent samples solely from directly observable data. We derive and illustrate the exploration and learning process with a low-dimensional kinematic example that provides direct insight into the bootstrapping process. We further show that the method scales to high-dimensional problems, such as the Honda humanoid robot or hyperredundant planar arms with up to 50 degrees of freedom.
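
A minimal sketch of the goal-babbling loop, under strong simplifying assumptions (a two-link planar arm, a single global linear inverse model refitted by least squares, and none of the paper's inconsistency detection):

```python
import numpy as np

def fk(q):
    """Forward kinematics of a planar 2-link arm (unit link lengths)."""
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

# Inverse estimate g(x) ~ W @ [x, 1]; start from a tiny random guess.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((2, 3))
samples_X, samples_Q = [], []

home_q = np.array([0.3, 0.5])
home_x = fk(home_q)
for episode in range(200):
    goal = home_x + rng.uniform(-0.5, 0.5, 2)    # nearby target
    for t in np.linspace(0.1, 1.0, 5):           # path toward the goal
        x_des = (1 - t) * home_x + t * goal
        q = W @ np.append(x_des, 1.0)            # use the current estimate
        q += 0.05 * rng.standard_normal(2)       # exploratory perturbation
        x_obs = fk(q)                            # execute, observe outcome
        samples_X.append(np.append(x_obs, 1.0))
        samples_Q.append(q)
    # Refit the inverse estimate on all (observed position -> command) pairs.
    A, B = np.asarray(samples_X), np.asarray(samples_Q)
    W = np.linalg.lstsq(A, B, rcond=None)[0].T

test = home_x + np.array([0.2, -0.1])
print("reach error:", np.linalg.norm(fk(W @ np.append(test, 1.0)) - test))
```

The key structural point survives the simplifications: training data are generated by executing the current estimate along paths to goals, so exploration stays concentrated where the estimate is actually used.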

Schmidhuber, J, 2010: Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Trans. Auton. Ment. Devel., 2(3):230-247.

The simple but general formal theory of fun, intrinsic motivation, and creativity (1990-2010) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, and humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown, but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical, but nonoptimal implementations (1991, 1995, and 1997-2002) are reviewed, as well as several recent variants by others (2005-2010). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
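
The core idea, rewarding the improvement of the predictor rather than its raw error, can be sketched in a few lines. The running-mean predictor below is a hypothetical stand-in for the theory's general compressor; the point of the demo is that unlearnable noise yields roughly zero total reward, while a learnable regularity yields positive reward.

```python
import numpy as np

def learning_progress_rewards(signal, lr=0.1, window=10):
    """Intrinsic reward as learning progress: the drop in the
    predictor's average error between successive windows, not the raw
    error itself. A running-mean predictor stands in for the theory's
    general predictor/compressor."""
    pred, errs, rewards = 0.0, [], []
    for x in signal:
        errs.append((x - pred) ** 2)
        pred += lr * (x - pred)                 # the predictor learns
        if len(errs) >= 2 * window:
            prev = np.mean(errs[-2 * window:-window])
            curr = np.mean(errs[-window:])
            rewards.append(prev - curr)         # progress = error drop
    return sum(rewards)

rng = np.random.default_rng(1)
print("noise  :", learning_progress_rewards(rng.standard_normal(200)))
print("pattern:", learning_progress_rewards(np.full(200, 2.0)))
```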

Luciw, M, & Weng, J, 2010: Top-down connections in self-organizing Hebbian networks: topographic class grouping. IEEE Trans. Auton. Ment. Devel., 2(3):248-261.

We investigate the effects of top-down input connections from a later layer to an earlier layer in a biologically inspired network. The incremental learning method combines optimal Hebbian learning for stable feature extraction, competitive lateral inhibition for sparse coding, and neighborhood-based self-organization for topographic map generation. The computational studies reported indicate that top-down connections encourage features that reduce uncertainty at the lower layer with respect to the features in the higher layer, enable relevant information to be uncovered at the lower layer so that irrelevant information can preferentially be discarded [a necessary property for autonomous mental development (AMD)], and cause topographic class grouping. Class groups have been observed in cortex, e.g., in the fusiform face area and parahippocampal place area. This paper presents the first computational account, as far as we know, explaining these three phenomena with a single biologically inspired network. Visual recognition experiments show that top-down-enabled networks reduce error rates for limited network sizes, show class grouping, and can refine lower-layer representation after new conceptual information is learned. These findings may shed light on how the brain self-organizes cortical areas, and may contribute to a computational understanding of how autonomous agents might build and maintain an organized internal representation over their lifetimes of experience.
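
A minimal sketch in the spirit of such a network (not the authors' exact equations) combines bottom-up and top-down pre-responses, top-k lateral inhibition, and a Hebbian update of the winning neurons:

```python
import numpy as np

def hebbian_step(x, z, Wb, Wt, k=2, lr=0.05):
    """One update of a layer that receives bottom-up input x and
    top-down input z. Wb: bottom-up weights, Wt: top-down weights,
    one row per neuron. Only the top-k winners fire and learn."""
    resp = Wb @ x + Wt @ z                 # combined pre-response
    winners = np.argsort(resp)[-k:]        # lateral inhibition: top-k fire
    y = np.zeros_like(resp)
    y[winners] = resp[winners]
    for i in winners:                      # Hebbian update of winners only
        Wb[i] += lr * y[i] * (x - Wb[i])
        Wt[i] += lr * y[i] * (z - Wt[i])
        Wb[i] /= np.linalg.norm(Wb[i]) + 1e-9   # keep weights bounded
        Wt[i] /= np.linalg.norm(Wt[i]) + 1e-9
    return y

rng = np.random.default_rng(0)
Wb = rng.random((10, 16))        # 10 neurons, 16-D sensory input
Wt = rng.random((10, 3))         # 3-D top-down (class/motor) signal
x = rng.random(16)
z = np.array([1.0, 0.0, 0.0])    # top-down class context
print(hebbian_step(x, z, Wb, Wt).nonzero()[0])  # indices of winning neurons
```

Because the top-down term enters the competition, neurons that share a top-down context tend to win together, which is one intuition for the class grouping the paper reports.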

Issue 4

Maier, W, & Steinbach, E, 2010: A probabilistic appearance representation and its application to surprise detection in cognitive robots. IEEE Trans. Auton. Ment. Devel., 2(4):267-281.

In this work, we present a novel probabilistic appearance representation and describe its application to surprise detection in the context of cognitive mobile robots. The luminance and chrominance of the environment are modeled by Gaussian distributions which are determined from the robot's observations using Bayesian inference. The parameters of the prior distributions over the mean and the precision of the Gaussian models are stored at a dense series of viewpoints along the robot's trajectory. Our probabilistic representation provides the expected appearance of the environment and enables the robot to reason about the uncertainty of the perceived luminance and chrominance. Hence, our representation provides a framework for the detection of surprising events, which facilitates attentional selection. In our experiments, we compare the proposed approach with surprise detection based on image differencing and show that our surprise measure is the superior detector for novelty estimation.
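
The per-pixel machinery can be sketched with a conjugate Normal-Gamma update and a KL-based surprise score. The Gaussian stand-in for the exact predictive distribution (which would be a Student-t) is an approximation for illustration, not the paper's derivation.

```python
import numpy as np

def update_normal_gamma(x, mu, kappa, alpha, beta):
    """Conjugate (Normal-Gamma) update of a Gaussian luminance model
    with unknown mean and precision, given one new observation x."""
    mu_n = (kappa * mu + x) / (kappa + 1)
    beta_n = beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))
    return mu_n, kappa + 1, alpha + 0.5, beta_n

def kl_gauss(m1, v1, m2, v2):
    """KL divergence between 1-D Gaussians N(m1, v1) || N(m2, v2)."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

def surprise(x, state):
    """Score x by how much it moves the model's predictive Gaussian,
    then return the updated state."""
    mu, kappa, alpha, beta = state
    new = update_normal_gamma(x, mu, kappa, alpha, beta)
    v_old, v_new = beta / alpha, new[3] / new[2]   # expected variances
    return kl_gauss(new[0], v_new, mu, v_old), new

state = (0.5, 1.0, 2.0, 0.1)          # prior over one pixel's luminance
for x in [0.52, 0.49, 0.51, 0.95]:    # the last observation is anomalous
    s, state = surprise(x, state)
    print(f"obs={x:.2f}  surprise={s:.4f}")
```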

Wyatt, J L, Aydemir, A, Brenner, M, Hanheide, M, Hawes, N, Jensfelt, P, Kristan, M, Kruijff, G M, Lison, P, Pronobis, A, Sjöö, K, Vrecko, A, Zender, H, Zillich, M, & Skocaj, D, 2010: Self-understanding and self-extension: a systems and representational approach. IEEE Trans. Auton. Ment. Devel., 2(4):282-303.

There are many different approaches to building a system that can engage in autonomous mental development. In this paper, we present an approach based on what we term self-understanding, by which we mean the explicit representation of and reasoning about what a system does and does not know, and how that knowledge changes under action. We present an architecture and a set of representations used in two robot systems that exhibit a limited degree of autonomous mental development, which we term self-extension. The contributions include: representations of gaps and uncertainty for specific kinds of knowledge, and a goal management and planning system for setting and achieving learning goals.

Hoffmann, M, Marques, H, Arieta, A, Sumioka, H, Lungarella, M, & Pfeifer, R, 2010: Body schema in robotics: a review. IEEE Trans. Auton. Ment. Devel., 2(4):304-324.

How is our body imprinted in our brain? This seemingly simple question is a subject of investigation in diverse disciplines: originally psychology and philosophy, complemented more recently by the neurosciences. Despite substantial efforts, the mysteries of body representations are far from uncovered. The most widely used notions, body image and body schema, are still waiting to be clearly defined. The mechanisms that underlie body representations are co-responsible for the admirable capabilities that humans and many other mammals display: combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth or failures, and using tools. These features are also desirable in robots. This paper surveys body representations in biology from a functional or computational perspective to set the ground for a review of the concept of body schema in robotics. First, we examine application-oriented research: how a robot can improve its capabilities by being able to automatically synthesize, extend, or adapt a model of its body. Second, we summarize the research area in which robots are used as tools to verify hypotheses about the mechanisms underlying biological body representations. We identify trends in these research areas and propose future research directions.

Morse, A F, de Greeff, J, Belpaeme, T, & Cangelosi, A, 2010: Epigenetic robotics architecture (ERA). IEEE Trans. Auton. Ment. Devel., 2(4):325-339.

In this paper, we discuss the requirements of cognitive architectures for epigenetic robotics, and highlight the wider role that they can play in the development of the cognitive sciences. We discuss the ambitious goals of ongoing development, scalability, concept use, and transparency, and introduce the epigenetic robotics architecture (ERA) as a framework guiding modeling efforts. A formal implementation is provided, demonstrated, and discussed in terms of meeting these goals. Extensions of the architecture are also introduced, and we show how the dynamics of the resulting models can transparently account for a wide range of psychological phenomena, without task-dependent tuning, thereby making progress in all of the goal areas we highlight.

Bellas, F, Duro, R J, Faina, A, & Souto, D, 2010: Multilevel Darwinist brain (MDB): artificial evolution in a cognitive architecture for real robots. IEEE Trans. Auton. Ment. Devel., 2(4):340-354.

The multilevel Darwinist brain (MDB) is a cognitive architecture that follows an evolutionary approach to provide autonomous robots with lifelong adaptation. It has been tested in real-robot online learning scenarios, obtaining successful results that reinforce the evolutionary principles that constitute the main original contribution of the MDB. This preliminary work has led to a series of improvements in the computational implementation of the architecture so as to achieve realistic operation in real time, which was the biggest problem of the approach due to the high computational cost induced by the evolutionary algorithms that make up the MDB core. The current implementation of the architecture is able to provide an autonomous robot with real-time learning capabilities and the capability to continuously adapt to changing circumstances in its world, both internal and external, with minimal intervention by the designer. This paper aims at providing an overview of the architecture and its operation, and at defining what is required on the path towards a real cognitive robot following a developmental strategy. The design, implementation, and basic operation of the MDB cognitive architecture are presented through some successful real-robot learning examples that illustrate the validity of this evolutionary approach.

Hülse, M, McBride, S, Law, J, & Lee, M, 2010: Integration of active vision and reaching from a developmental robotics perspective. IEEE Trans. Auton. Ment. Devel., 2(4):355-367.

Inspired by child development and brain research, we introduce a computational framework which integrates robotic active vision and reaching. Essential elements of this framework are sensorimotor mappings that link three different computational domains relating to visual data, gaze control, and reaching. The domain of gaze control is the central computational substrate that provides, first, a systematic visual search and, second, the transformation of visual data into coordinates for potential reach actions. In this respect, the representation of object locations emerges from the combination of sensorimotor mappings. The framework is tested in the form of two different architectures that perform visually guided reaching. Systematic experiments demonstrate how visual search influences reaching accuracy. The results of these experiments are discussed with respect to providing a reference architecture for developmental learning in humanoid robot systems.

Kraft, D, Detry, R, Pugeault, N, Baseski, E, Guerin, F, Piater, J H, & Krüger, N, 2010: Development of object and grasping knowledge by robot exploration. IEEE Trans. Auton. Ment. Devel., 2(4):368-383.

We describe a bootstrapping cognitive robot system that - mainly based on pure exploration - acquires rich object representations and associated object-specific grasp affordances. Such bootstrapping becomes possible by combining innate competences and behaviors by which the system gradually enriches its internal representations, and thereby develops an increasingly mature interpretation of the world and its ability to act within it. We compare the system's prior competences and developmental progress with human innate competences and developmental stages of infants.

Volume 3 - 2011

Issue 1

Shen, Q, Kose-Bagci, H, Saunders, J, & Dautenhahn, K, 2011: The impact of participants' beliefs on motor interference and motor coordination in human-humanoid interactions. IEEE Trans. Auton. Ment. Devel., 3(1):6-16.

This study compared the motor interference and motor coordination responses of human participants interacting with three different types of visual stimuli: a humanoid robot, a pendulum, and a virtual moving dot. Participants' responses indicated that their beliefs about the engagement of the robot affected the elicitation of motor interference effects. Together with research supporting the importance of other elements of robot appearance and behavior, such as bottom-up effects and biological motion profiles, we hypothesize that it may be the overall perception of the robot as a "social entity" (by "overall perception" we mean the human observer's overall perception of the robot in terms of appearance, motion, and the observer's beliefs), rather than any individual appearance or motion feature, that is critical to eliciting the interference effect in human-humanoid interaction. Moreover, the motor coordination responses indicated that participants tended to synchronize with agents with a better overall perception, which is generally in line with the above hypothesis. Based on all the results of this experimental study, the authors suggest that a humanoid robot with a good overall perception as a "social entity" may facilitate "engaging" interactions with a human.

Tikhanoff, V, Cangelosi, A, & Metta, G, 2011: Integration of speech and action in humanoid robots: iCub simulation experiments. IEEE Trans. Auton. Ment. Devel., 3(1):17-29.

Building intelligent systems with human-level competence is the ultimate grand challenge for science and technology in general, and especially for cognitive developmental robotics. This paper proposes a new approach to the design of cognitive skills in a robot able to interact with, and communicate about, the surrounding physical world and manipulate objects in an adaptive manner. The work is based on robotic simulation experiments showing that a humanoid robot (iCub platform) is able to acquire behavioral, cognitive, and linguistic skills through individual and social learning. The robot is able to learn to handle and manipulate objects autonomously, to understand basic instructions, and to adapt its abilities to changes in internal and environmental conditions.

Andry, P, Blanchard, A, & Gaussier, P, 2011: Using the rhythm of nonverbal human-robot interaction as a signal for learning. IEEE Trans. Auton. Ment. Devel., 3(1):30-42.

Human-robot interaction is a key issue in building robots for everyone. The difficulty people have in understanding how robots work and how they must be controlled will be one of the main limits to the broad adoption of robotics. In this paper, we study a new way of interacting with robots that requires neither understanding how the robots work nor giving them explicit instructions. This work is based on psychological data showing that synchronization and rhythm are very important features of pleasant interaction. We propose a biologically inspired architecture that uses rhythm detection to build an internal reward for learning. After showing the results of keyboard interactions, we present and discuss the results of real human-robot interactions (with Aibo and Nao robots). We show that our minimalist control architecture allows the discovery and learning of arbitrary sensorimotor association games with expert users. With nonexpert users, we show that rhythm information alone is not sufficient for learning all the associations, due to the different strategies used by the humans. Nevertheless, this last experiment shows that rhythm still allows the discovery of subsets of associations, making it a promising signal for tomorrow's social applications.
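
A toy version of a rhythm-based internal reward (a deliberate abstraction of the authors' neural architecture, with arbitrary parameter choices) can simply score the regularity of recent inter-event intervals:

```python
import numpy as np

def rhythm_reward(event_times, window=4):
    """Internal reward from interaction rhythm: high when the recent
    inter-event intervals are regular, low when they are erratic."""
    if len(event_times) < window + 1:
        return 0.0
    intervals = np.diff(event_times[-(window + 1):])
    cv = intervals.std() / (intervals.mean() + 1e-9)  # coeff. of variation
    return float(np.exp(-cv))      # 1.0 for perfect rhythm, near 0 for jitter

steady  = [0.0, 1.0, 2.0, 2.95, 4.0, 5.05]     # regular turn-taking
erratic = [0.0, 0.3, 2.1, 2.2, 4.9, 5.0]       # broken rhythm
print("steady :", rhythm_reward(steady))
print("erratic:", rhythm_reward(erratic))
```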

Chinellato, E, Antonelli, M, Grzyb, B J, & del Pobil, A P, 2011: Implicit sensorimotor mapping of the peripersonal space by gazing and reaching. IEEE Trans. Auton. Ment. Devel., 3(1):43-53.

Primates often perform coordinated eye and arm movements, contextually fixating and reaching towards nearby objects. This combination of looking and reaching to the same target is used by infants to establish an implicit visuomotor representation of the peripersonal space, useful for both oculomotor and arm motor control. In this work, taking inspiration from such behavior and from primate visuomotor mechanisms, a shared sensorimotor map of the environment, built on a radial basis function framework, is configured and trained by the coordinated control of eye and arm movements. Computational results confirm that the approach is especially suitable for the problem at hand, and for implementation on a real humanoid robot. Through exploratory gazing and reaching actions, either free or goal-based, the artificial agent learns to perform direct and inverse transformations between stereo vision, oculomotor, and joint-space representations. The integrated sensorimotor map, which allows the peripersonal space to be contextually represented through different vision and motor parameters, is never made explicit, but rather emerges through the interaction of the agent with the environment.
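
The gaze-to-arm direction of such a shared map can be sketched with a fixed-center radial basis function regressor trained on look-and-reach pairs. The ground-truth function below is a made-up stand-in for data a real robot would gather by coordinated exploration.

```python
import numpy as np

rng = np.random.default_rng(0)

def arm_posture_true(gaze):
    """Hypothetical ground truth linking gaze angles to arm joint
    angles; on a robot these pairs would come from coordinated
    look-and-reach exploration, not from a known function."""
    return np.column_stack([np.sin(gaze[:, 0]) + 0.5 * gaze[:, 1],
                            np.cos(gaze[:, 1]) - 0.3 * gaze[:, 0]])

def rbf_features(X, centers, width=0.5):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# "Explore": gaze in random directions, record the matching arm posture.
gaze = rng.uniform(-1, 1, (200, 2))
arm = arm_posture_true(gaze) + 0.01 * rng.standard_normal((200, 2))

centers = rng.uniform(-1, 1, (25, 2))          # fixed RBF centers
Phi = rbf_features(gaze, centers)
W = np.linalg.lstsq(Phi, arm, rcond=None)[0]   # linear readout

# Query: where should the arm go for a new gaze direction?
g = np.array([[0.2, -0.4]])
print("predicted arm posture:", rbf_features(g, centers) @ W)
```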

Goerick, C, 2011: Towards an understanding of hierarchical architectures. IEEE Trans. Auton. Ment. Devel., 3(1):54-63.

Cognitive systems research aims to understand how cognitive abilities can be created in artificial systems. One key issue is the architecture of the system: it organizes the interplay between the different system elements and thus determines the principal limits on the performance of the system. In this contribution, we focus on important properties of hierarchical cognitive systems. To this end, we first present a framework for modeling hierarchical systems. Based on this framework, we formulate and discuss some crucial issues that should be treated explicitly in the design of a system. On this basis, we analyze and compare several well-established cognitive architectures with respect to their internal structure.

Yorita, A, & Kubota, N, 2011: Cognitive development in partner robots for information support to elderly people. IEEE Trans. Auton. Ment. Devel., 3(1):64-73.

This paper discusses an utterance system based on the associative memory of partner robots, developed through interaction with people. Human interaction based on gestures is quite important to natural communication, and the meaning of gestures can be understood through intentional interactions with a human. We therefore propose a method for associative learning based on intentional interaction and conversation that can realize such natural communication. Steady-state genetic algorithms (SSGA) are applied in order to detect the human face and objects via image processing. Spiking neural networks are applied in order to memorize the spatio-temporal patterns of human hand motions and the various relationships among the perceptual information that is conveyed. The experimental results show that the proposed method can refine the relationships among this varied perceptual information and thereby support natural communication with a human. We also present methods for assisting memory and for assessing a human's state.

Zibner, S K U, Faubel, C, Iossifidis, I, & Schöner, G, 2011: Dynamic neural fields as building blocks of a cortex-inspired architecture for robotic scene representation. IEEE Trans. Auton. Ment. Devel., 3(1):74-91.

Based on the concepts of dynamic field theory (DFT), we present an architecture that autonomously generates scene representations by controlling gaze and attention, creating visual objects in the foreground, tracking objects, reading them into working memory, and taking into account their visibility. At the core of this architecture are three-dimensional dynamic neural fields (DNFs) that link feature to spatial information. These three-dimensional fields couple into lower dimensional fields, which provide the links to the sensory surface and to the motor systems. We discuss how DNFs can be used as building blocks for cognitive architectures, characterize the critical bifurcations in DNFs, as well as the possible coupling structures among DNFs. In a series of robotic experiments, we demonstrate how the DNF architecture provides the core functionalities of a scene representation.
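
The core dynamics of a DNF can be illustrated with a one-dimensional Amari field: local excitation plus global inhibition lets a localized input drive the field through the detection bifurcation into a self-sustained activation peak. The kernel and parameter values below are illustrative choices, not those of the paper.

```python
import numpy as np

def dnf_step(u, stim, dt=0.05, tau=1.0, h=-2.0):
    """One Euler step of a 1-D Amari field:
    tau * du/dt = -u + h + stimulus + kernel * f(u),
    with local excitation and global inhibition in the kernel."""
    n = u.size
    x = np.arange(n)
    d = np.minimum(np.abs(x[:, None] - x[None, :]),
                   n - np.abs(x[:, None] - x[None, :]))  # circular distance
    kernel = 2.0 * np.exp(-d ** 2 / 18.0) - 0.5          # local exc., global inh.
    f = 1.0 / (1.0 + np.exp(-u))                         # sigmoid output rate
    return u + dt * (-u + h + stim + kernel @ f) / tau

n = 60
u = np.full(n, -2.0)                                     # field at resting level
stim = 4.0 * np.exp(-(np.arange(n) - 20.0) ** 2 / 8.0)   # localized input
for _ in range(400):
    u = dnf_step(u, stim)
print("peak location:", int(u.argmax()), " peak value:", round(float(u.max()), 2))
```

If the stimulus is removed after the loop, the peak persists: that self-sustained state is what lets a field serve as working memory in the architecture.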

Begum, M, & Karray, F, 2011: Visual attention for robotic cognition: a survey. IEEE Trans. Auton. Ment. Devel., 3(1):92-105.

The goal of cognitive robotics research is to design robots with human-like cognition (albeit of reduced complexity) in perception, reasoning, action planning, and decision making. This venture has produced robots with a redundant number of sensors and actuators so that they can perceive the world and act upon it in a human-like fashion. A major challenge in dealing with these robots is managing the enormous amount of information continuously arriving through multiple sensors. Primates master this information-management skill through their custom-built attention mechanism, and mimicking the attention behavior of primates has therefore gained tremendous popularity in robotics research in recent years (Bar-Cohen, Biologically Inspired Intelligent Robots, 2003, and B. Webb, Biorobotics, 2003). The difficulty of managing redundant information is most severe in the case of the robots' visual perception: even a moderate-size image of a natural scene generally contains enough visual information to easily overload the online decision-making process of an autonomous robot. Modeling a primate-like visual attention mechanism for robots is therefore becoming more popular among robotics researchers. A visual attention model enables a robot to selectively (and autonomously) choose a "behaviorally relevant" segment of visual information for further processing while relatively excluding the others. This paper sheds light on the ongoing journey of robotics research toward a visual attention model that can serve as a component of cognition in modern-day robots.
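
As a minimal illustration of the selective mechanism such models implement (a generic center-surround sketch in the Itti-Koch tradition, not any specific model reviewed in the survey):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def saliency(img, center=3, surround=15):
    """Center-surround saliency map: a pixel is salient where its fine
    local average differs from its coarser surround average."""
    return np.abs(uniform_filter(img, center) - uniform_filter(img, surround))

img = np.zeros((64, 64))
img[30:34, 40:44] = 1.0                  # a small bright target on a dark field
sal = saliency(img)
iy, ix = np.unravel_index(sal.argmax(), sal.shape)
print("most salient location:", iy, ix)  # lands on the target patch
```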

Preprints in press as of March 2011

Cowell, R, & French, R, 2011: Noise and the emergence of rules in category learning: a connectionist model. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

We present a neural network model of category learning that addresses the question of how rules for category membership are acquired. The architecture of the model comprises a set of statistical learning synapses and a set of rule-learning synapses, whose weights, crucially, emerge from the statistical network. The network is implemented with a neurobiologically plausible Hebbian learning mechanism. The statistical weights form category representations on the basis of perceptual similarity, whereas the rule weights gradually extract rules from the information contained in the statistical weights. These rules are weightings of individual features; weights are stronger for features that convey more information about category membership. The most significant contribution of this model is that it relies on a novel mechanism involving feeding noise through the system to generate these rules. We demonstrate that the model predicts a cognitive advantage in classifying perceptually ambiguous stimuli over a system that relies only on perceptual similarity. In addition, we simulate reaction times from an experiment by Thibaut et al. (1998), in which both perceptual (i.e., statistical) and rule-based information are available for the classification of perceptual stimuli.

Veale, R, Schermerhorn, P, & Scheutz, M, 2011: Temporal, environmental, and social constraints of word-referent learning in young infants: a neurorobotic model of multimodal habituation. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

Infants are able to adaptively associate auditory stimuli with visual stimuli even in their first year of life, as demonstrated by multimodal habituation studies. Different from language acquisition during later developmental stages, this adaptive learning in young infants is temporary and still very much stimulus-driven. Hence, temporal aspects of environmental and social factors figure crucially in the formation of pre-lexical multimodal associations. Study of these associations can offer important clues regarding how semantics are bootstrapped in real-world embodied infants. In this paper, we present a neuroanatomically based embodied computational model of multimodal habituation to explore the temporal and social constraints on the learning observed in very young infants. In particular, the model is able to explain empirical results showing that auditory word stimuli must be presented synchronously with visual stimulus movement for the two to be associated.
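
The synchrony constraint the model explains can be abstracted into a toy synchrony-gated associator: a word-object link grows only when the word's onset falls inside a short temporal window around the object's motion onset. The window width and learning rate below are arbitrary illustrative values, not parameters of the paper's neural model.

```python
import numpy as np

def associate(word_onsets, motion_onsets, n_words, n_objects,
              window=0.5, lr=0.2):
    """Strengthen a word-object link only when the word onset is within
    `window` seconds of the object's motion onset (synchrony gate)."""
    W = np.zeros((n_words, n_objects))
    for w_id, w_t in word_onsets:
        for o_id, o_t in motion_onsets:
            if abs(w_t - o_t) <= window:       # temporal synchrony gate
                W[w_id, o_id] += lr * (1 - W[w_id, o_id])
    return W

# Word 0 is spoken while object 0 moves; word 1 is presented asynchronously.
words   = [(0, 1.0), (0, 3.1), (1, 6.0)]
motions = [(0, 1.2), (0, 3.0), (1, 8.5)]
print(associate(words, motions, n_words=2, n_objects=2))
# strong 0-0 link; no 1-1 link despite co-occurrence in the same session
```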

Schulz, R, Wyeth, G, & Wiles, J, 2011: Are we there yet? Grounding temporal concepts in shared journeys. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

An understanding of time and temporal concepts is critical for interacting with the world and with other agents in the world. What does a robot need to know to refer to the temporal aspects of events? Could a robot gain a grounded understanding of "a long journey," or "soon"? Cognitive maps constructed by individual agents from their own journey experiences have been used for grounding spatial concepts in robot languages. In this paper, we test whether a similar methodology can be applied to learning temporal concepts and an associated lexicon, to answer the question "How long did it take to complete a journey?" Using evolutionary language games for specific and generic journeys, successful communication was established for concepts based on representations of time, distance, and amount of change. The studies demonstrate that a lexicon for journey duration can be grounded using a variety of concepts. Spatial and temporal terms are not identical, but the studies show that both can be learned using similar language evolution methods, and that time, distance, and change can serve as proxies for each other under noisy conditions. Effective concepts and names for duration provide a first step towards a grounded lexicon for temporal interval logic.

Hart, S, & Grupen, R, 2011: Learning generalizable control programs. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

In this paper we present a framework for guiding autonomous learning in robot systems. The paradigm we introduce allows a robot to acquire new skills according to an intrinsic motivation function that finds behavioral affordances. Affordances in the sense of Gibson [6] describe the latent possibilities for action in the environment and provide a direct means of organizing functional knowledge in embodied systems. We begin by showing how a robot can assemble closed-loop action primitives from its sensory and motor resources and then show how these primitives can be sequenced into multi-objective policies. We then show how these policies can be assembled hierarchically to support incremental and cumulative learning. The main contribution of this paper is to demonstrate how the proposed intrinsic motivator for affordance discovery can cause a robot both to acquire such hierarchical policies using reinforcement learning and to generalize these policies to new contexts. As the framework is described, its effectiveness and applicability are demonstrated through a longitudinal learning experiment on a bimanual robot.

Meyer, M, Hard, B, Brand, R, McGarvey, M, & Baldwin, D, 2011: Acoustic packaging: maternal speech and action synchrony. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

The current study addressed the degree to which maternal speech and action are synchronous in interactions with infants. English-speaking mothers demonstrated the function of two toys, stacking rings and nesting cups, to younger infants (6-9.5 mo.) and older infants (9.5-13 mo.). Action and speech units were identified, and speech units were coded as being ongoing action descriptions or non-action descriptions (examples of nonaction descriptions include attention-getting utterances such as 'Look!' or statements of action completion such as 'Yay, we did it!'). Descriptions of ongoing actions were found to be more synchronous with the actions themselves in comparison to other types of utterances, suggesting that 1) mothers align speech and action to provide synchronous "acoustic packaging" during action demonstrations and 2) mothers selectively pair utterances directly related to actions with the action units themselves rather than simply aligning speech in general with actions. Our results complement past studies of acoustic packaging in two ways. First, we provide a quantitative temporal measure of the degree to which speech and action onsets and offsets are aligned. Second, we offer a semantically-based analysis of the phenomenon, which we argue may be meaningful to infants known to process global semantic messages in infant-directed speech. In support of this possibility, we determined that adults were capable of classifying low-pass filtered action- and non-action-describing utterances at rates above chance.

Castellini, C, Tommasi, T, Noceti, N, Odone, F, & Caputo, B, 2011: Using object affordances to improve object recognition. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

The problem of object recognition has not yet been solved in its general form. The most successful approach to it so far relies on object models obtained by training a statistical method on visual features extracted from camera images. The images must necessarily come from huge visual datasets in order to circumvent all problems related to changing illumination, point of view, etc. We hereby propose to also consider, in an object model, a simple model of how a human being would grasp that object (its affordance). This knowledge is represented as a function mapping the visual features of an object to the kinematic features of a hand grasping it. The function is obtained in practice via regression on a human grasping database. After describing the database (which is publicly available) and the proposed method, we experimentally evaluate it, showing that a standard object classifier working on both sets of features (visual and motor) has a significantly better recognition rate than that of a visual-only classifier.
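
The claimed benefit of concatenating visual and motor (grasp) features is easy to reproduce on synthetic data with even a crude classifier. The data generator below is a made-up stand-in for the grasping database, and the nearest-centroid classifier stands in for the paper's statistical method.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n, noise=1.0):
    """Two object classes whose visual features overlap heavily but
    whose grasp (motor) features separate more clearly."""
    y = rng.integers(0, 2, n)
    visual = rng.standard_normal((n, 5)) * noise + y[:, None] * 0.5
    motor = rng.standard_normal((n, 3)) * noise + y[:, None] * 2.0
    return visual, motor, y

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

Vtr, Mtr, ytr = make_data(400)
Vte, Mte, yte = make_data(200)
print("visual only :", nearest_centroid_acc(Vtr, ytr, Vte, yte))
print("visual+motor:", nearest_centroid_acc(np.hstack([Vtr, Mtr]), ytr,
                                            np.hstack([Vte, Mte]), yte))
```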

Malfaz, M, Castro-Gonzalez, A, Barber, R, & Salichs, M A, 2011: A biologically inspired architecture for an autonomous and social robot. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

Lately, much effort has been put into the construction of robots able to live among humans. This has favored the development of personal or social robots, which are expected to behave in a natural way. This implies that these robots must meet certain requirements: for example, being able to decide their own actions (autonomy), to make deliberative plans (reasoning), or to exhibit emotional behavior in order to facilitate human-robot interaction. In this paper, the authors present a bio-inspired control architecture for an autonomous and social robot that aims to provide some of these features. In developing this new architecture, the authors used as a base a prior hybrid control architecture (AD) that is also biologically inspired. In the latter, however, the task to be accomplished at each moment is determined by a fixed sequence processed by the Main Sequencer, which coordinates the previously programmed sequence of skills that must be executed. In the new architecture, the Main Sequencer is replaced by a decision-making system based on drives, motivations, emotions, and self-learning, which decides the proper action at every moment according to the robot's state. Consequently, the robot's autonomy improves, since the added decision-making system determines the goal and hence the skills to be executed. A basic version of this new architecture has been implemented on a real robotic platform, and some experiments are shown at the end of the paper.

Tuci, E, Ferrauto, T, Zeschel, A, Massera, G, & Nolfi, S, 2011: An experiment on behaviour generalisation and the emergence of linguistic compositionality in evolving robots. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

Populations of simulated agents controlled by dynamical neural networks are trained by artificial evolution to access linguistic instructions and to execute them by indicating, touching, or moving specific target objects. During training, the agents experience only a subset of all object/action pairs. During post-evaluation, some of the successful agents proved able to access and execute linguistic instructions not experienced during training. This owes to the development of a semantic space, grounded in the sensorimotor capabilities of the agent and organised in a systematised way so as to facilitate linguistic compositionality and behavioural generalisation. Compositionality seems to be underpinned by the agents' capability to access and execute the instructions by temporally decomposing their linguistic and behavioural aspects into their constituent parts (i.e., finding the target object and executing the required action). The comparison between two experimental conditions, in one of which the agents are required to ignore rather than indicate objects, shows that the composition of the behavioural set significantly influences the development of compositional semantic structures.

Uno, R, Marocco, D, Nolfi, S, & Ikegami, T, 2011: Emergence of proto-sentences in artificial communicating systems. IEEE Trans. Auton. Ment. Devel., in press Mar 2011.

This paper investigates the relationship between embodied interaction and symbolic communication. We report on an experiment in which simulated autonomous robotic agents, whose control systems were evolved through an artificial evolutionary process, use abstract communication signals to coordinate their behavior in a context-independent way. This use of signals captures some fundamental aspects of sentences in natural languages, which are discussed using the concept of joint attention in relation to the grammatical structure of sentences.