Keywords: control engineering computing;learning (artificial intelligence);multi-agent systems;networked control systems;nonlinear control systems;multiple control tasks;model-based ETC designs;event-triggered control methods;high-performance control;usual time-triggered methods;mathematical model;controller;deep reinforcement learning algorithms;DRL approach;Mathematical model;Reinforcement learning;Heuristic algorithms;Aerospace electronics;Sensors;Task analysis;Numerical models
One of the challenges of this century is to understand the neural mechanisms behind cognitive control and learning. Recent investigations propose biologically plausible synaptic mechanisms for self-organizing controllers, in the spirit of Hebbian learning. In particular, differential extrinsic plasticity (DEP) has been shown to enable embodied agents to self-organize their individual sensorimotor development and to generate highly coordinated behaviors during their interaction with the environment. These behaviors are attractors of a dynamical system. In this paper, we use the DEP rule to generate attractors and combine it with a “repelling potential” that allows the system to actively and systematically explore all its attractor behaviors. With a view to a self-determined exploration of goal-free behaviors, our framework enables switching between different motion patterns in an autonomous and sequential fashion. Our algorithm recovers all the attractor behaviors in a toy system, and it is also effective in two simulated environments: a spherical robot discovers all its major rolling modes, and a hexapod robot learns to locomote in 50 different ways in 30 min.
We present an approach to identify concise equations from data using a shallow neural network. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.
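To make the idea of an equation-learning layer with division concrete, here is a minimal, hypothetical sketch in plain Python. It assumes (the abstract does not specify this) a hidden layer of elementary units (identity, sin, cos, and a product unit) applied to an affine map of the inputs, followed by an output layer that forms a numerator and a denominator combined by a thresholded division. The function names, the unit set, and the threshold `theta` are illustrative assumptions, and the weights are set by hand rather than learned.

```python
import math

def affine(weights, bias, x):
    """Compute W x + b with plain Python lists."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def hidden_units(z):
    """Elementary units: identity, sin, cos on z[0..2]; product of z[3] and z[4]."""
    return [z[0], math.sin(z[1]), math.cos(z[2]), z[3] * z[4]]

def thresholded_division(a, b, theta=1e-4):
    """Return a / b if the denominator exceeds theta, otherwise 0 (avoids the pole)."""
    return a / b if b > theta else 0.0

def forward(x, w1, b1, w2, b2, theta=1e-4):
    """One hidden layer of elementary units, then a single division output.

    w2 has two rows: the first yields the numerator, the second the denominator.
    """
    h = hidden_units(affine(w1, b1, x))
    numerator, denominator = affine(w2, b2, h)
    return thresholded_division(numerator, denominator, theta)

# Hand-set (not learned) weights encoding f(x1, x2) = sin(x1) / x2.
W1 = [[0, 1],   # z0 = x2 -> identity unit
      [1, 0],   # z1 = x1 -> sin unit
      [0, 0],   # z2 = 0  -> cos unit (constant 1)
      [1, 0],   # z3 = x1 -> product unit, first factor
      [0, 1]]   # z4 = x2 -> product unit, second factor
B1 = [0, 0, 0, 0, 0]
W2 = [[0, 1, 0, 0],   # numerator   = sin(x1)
      [1, 0, 0, 0]]   # denominator = x2
B2 = [0, 0]

print(forward([0.5, 2.0], W1, B1, W2, B2))
```

Because every unit is an elementary function, a sparse set of weights can be read off directly as a closed-form expression, which is what makes extrapolation beyond the training range plausible. How the original method selects the threshold, regularizes the denominator, and performs model selection is not covered by this sketch.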
Author summary Neurons in the retina transform patterns of incoming light into sequences of neural spikes. We recorded from ∼100 neurons in the rat retina while it was stimulated with a complex movie. Using machine learning regression methods, we fit decoders to reconstruct the movie shown from the retinal output. We demonstrated that the retinal code can be read out with low error only if decoders make use of correlations between successive spikes emitted by individual neurons. These correlations can be used to ignore spontaneous spiking that would otherwise cause even the best linear decoders to “hallucinate” nonexistent stimuli. This work represents the first high-resolution, single-trial, full-movie reconstruction and suggests a new paradigm for separating spontaneous from stimulus-driven neural activity.
Keywords: Control systems;Muscles;Robot kinematics;Robot sensing systems;Springs;Tendons
distinguished oral paper award
Grounding autonomous behavior in the nervous system is a fundamental challenge for neuroscience. In particular, self-organized behavioral development raises more questions than it answers. Are there special functional units for curiosity, motivation, and creativity? This paper argues that these features can be grounded in synaptic plasticity itself, without requiring any higher-level constructs. We propose differential extrinsic plasticity (DEP) as a new synaptic rule for self-learning systems and apply it to a number of complex robotic systems as a test case. Without specifying any purpose or goal, seemingly purposeful and adaptive rhythmic behavior develops, displaying a certain level of sensorimotor intelligence. These surprising results require no system-specific modifications of the DEP rule. Rather, they arise from the underlying mechanism of spontaneous symmetry breaking, which is due to the tight brain-body-environment coupling. The new synaptic rule is biologically plausible and would be an interesting target for neurobiological investigation. We also argue that this neuronal mechanism may have been a catalyst in natural evolution.
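The abstract does not state the DEP rule itself. As a rough illustration of the general idea behind a differential Hebbian update (correlating temporal derivatives of signals rather than the signals themselves), here is a minimal sketch. Everything specific in it is an assumption, not the published rule: an identity inverse model (motor velocity taken to equal sensor velocity), finite-difference derivatives, a one-step delay, and explicit Euler integration of `tau * dC/dt = xdot(t) xdot(t-1)^T - C`. The function names are hypothetical.

```python
def outer(u, v):
    """Outer product u v^T as a list of lists."""
    return [[ui * vj for vj in v] for ui in u]

def diff(curr, prev):
    """Finite-difference velocity between two consecutive sensor vectors."""
    return [c - p for c, p in zip(curr, prev)]

def dep_like_step(C, x_now, x_prev, x_prev2, tau=10.0, dt=1.0):
    """One Euler step relaxing C toward the outer product of the current and
    one-step-delayed sensor velocities (a differential Hebbian correlation)."""
    xdot_now = diff(x_now, x_prev)
    xdot_delayed = diff(x_prev, x_prev2)
    target = outer(xdot_now, xdot_delayed)
    rate = dt / tau
    return [[c + rate * (t - c) for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(C, target)]

# Usage: a sensor trajectory with constant velocity [0.1, -0.2].
C = [[0.0, 0.0], [0.0, 0.0]]
xs = [[0.1 * t, -0.2 * t] for t in range(502)]
for t in range(2, 502):
    C = dep_like_step(C, xs[t], xs[t - 1], xs[t - 2])
print(C)
```

With a constant velocity the correlation target never changes, so C relaxes exponentially (time constant `tau`) toward the outer product of that velocity with itself. In a closed sensorimotor loop the target would instead depend on the behavior that C itself produces, which is where the self-organization described above comes from.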
best paper award
Keywords: Self-exploration; intrinsic motivation; robot control; information theory; dynamical systems; learning
Self-organizing processes are crucial for the development of living beings. Practical applications in robots may benefit from the self-organization of behavior, e.g. to increase fault tolerance and enhance flexibility, provided that external goals can also be achieved. We present results on the guidance of self-organizing control by visual target stimuli and show a remarkable robustness to sensorimotor disruptions. In a proof-of-concept study, an autonomous wheeled robot learns an object-finding and ball-pushing task from scratch within a few minutes in continuous domains. The robustness is demonstrated by the rapid recovery of performance after severe changes of the sensor configuration.
Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predicting information (TiPI), which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality, which prevents learning systems from scaling well.
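The abstract does not give the TiPI update rules. For orientation only, here is a minimal sketch of the quantity they build on, in the simplest case the abstract explicitly goes beyond: a scalar, stationary, linear Gaussian process x_{t+1} = a x_t + ξ with noise variance σ² and |a| < 1 (an assumption for illustration). The stationary variance is σ² / (1 - a²), so the one-step predictive information is I(x_t; x_{t+1}) = 0.5 ln(variance / σ²) = -0.5 ln(1 - a²) nats. The function names are hypothetical.

```python
import math

def stationary_variance(a, noise_var):
    """Stationary variance of x_{t+1} = a x_t + xi, xi ~ N(0, noise_var), |a| < 1."""
    if abs(a) >= 1.0:
        raise ValueError("process is not stationary for |a| >= 1")
    return noise_var / (1.0 - a * a)

def one_step_pi(a, noise_var=1.0):
    """I(x_t; x_{t+1}) in nats: 0.5 * ln(total variance / noise variance)."""
    return 0.5 * math.log(stationary_variance(a, noise_var) / noise_var)

for a in (0.0, 0.5, 0.9, 0.99):
    print(f"a = {a:4}: PI = {one_step_pi(a):.3f} nats")
```

In this toy case the PI does not depend on the noise level and grows without bound as |a| approaches 1, so maximizing it pushes the loop toward the edge of stability, where small fluctuations produce large, yet still predictable, excursions. For a multivariate linear Gaussian process the analogous expression is 0.5 ln(det Σ / det D), with Σ the stationary covariance and D the noise covariance.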
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented, and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a large speed-up be achieved, and then at the cost of reduced asymptotic performance.
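Assuming a scalar sensor stream and joint Gaussianity of consecutive samples (neither is stated in the abstract), the one-step PI can be estimated from the lag-1 correlation ρ as -0.5 ln(1 - ρ²), and the reward-plus-PI objective examined above is then a weighted sum. A minimal sketch; the helper names and the default `weight` are hypothetical, and the estimator is a simple plug-in choice rather than the one used in the experiments.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def one_step_pi_gaussian(stream):
    """Estimate I(s_t; s_{t+1}) in nats for a scalar stream, assuming joint Gaussianity."""
    rho = pearson(stream[:-1], stream[1:])
    return -0.5 * math.log(1.0 - rho * rho)

def combined_objective(episode_reward, stream, weight=0.1):
    """Linear combination of an external episode reward and the one-step PI."""
    return episode_reward + weight * one_step_pi_gaussian(stream)

# Usage: an AR(1) sensor stream with coefficient 0.8 and unit Gaussian noise.
rng = random.Random(0)
s = [0.0]
for _ in range(20000):
    s.append(0.8 * s[-1] + rng.gauss(0.0, 1.0))
print(one_step_pi_gaussian(s))
print(combined_objective(1.0, s))
```

For a long stream like this one, the estimate should lie close to the analytic value -0.5 ln(1 - 0.8²) ≈ 0.51 nats, while an uncorrelated stream gives a value near zero. The sketch only shows the form of the combined objective. Per the abstract, this combination is not generally recommended in an episodic policy gradient setting.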
Autonomous robots may become our closest companions in the near future. While the technology for physically building such machines is already available today, a problem lies in the generation of the behavior for such complex machines. Nature proposes a solution: young children and higher animals learn to master their complex brain-body systems by playing. Can this be an option for robots? How can a machine be playful? The book provides answers by developing a general principle, homeokinesis (the dynamical symbiosis between brain, body, and environment), which is shown to drive robots to self-determined, individual development in a playful and obviously embodiment-related way: a dog-like robot starts playing with a barrier, eventually jumping or climbing over it; a snakebot develops coiling and jumping modes; humanoids develop climbing behaviors when fallen into a pit, or engage in wrestling-like scenarios when encountering an opponent. The book also develops guided self-organization, a new method that helps to make the playful machines fit for fulfilling tasks in the real world.
Autonomous robots can generate exploratory behavior by self-organization of the sensorimotor loop. We show that the behavioral manifold that is covered in this way can be modified in a goal-dependent way without reducing the self-induced activity of the robot. We present three strategies for guided self-organization, namely by using external rewards, a problem-specific error function or assumptions about the symmetries of the desired behavior. The strategies are analyzed for two different robots in a physically realistic simulation.
This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which makes as few assumptions and imposes as few restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, noncommunicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more highly coordinated behavior of the physically connected robots than maximizing it per robot. Another focus of this article is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform shorter chains with more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.
best paper award
We study an adaptive controller that adjusts its internal parameters by self-organization of its interaction with the environment. We show that the parameter changes that occur in this low-level learning process can themselves provide a source of information to a higher-level context-sensitive learning mechanism. In this way the context is interpreted in terms of the concurrent low-level learning mechanism. The dual learning architecture is studied in realistic simulations of a foraging robot and of a humanoid hand that manipulates an object. Both systems are driven by the same low-level scheme, but use the second-order information in different ways. While the low-level adaptation continues to follow a set of rigid learning rules, the second-order learning modulates the elementary behaviors and affects the distribution of the sensory inputs via the environment.
Ideally, sensory information forms the only source of information to a robot. We consider an algorithm for the self-organization of a controller. At short timescales the controller is merely reactive but the parameter dynamics and the acquisition of knowledge by an internal model lead to seemingly purposeful behavior on longer timescales. As a paradigmatic example, we study the simulation of an underactuated snake-like robot. By interacting with the real physical system formed by the robotic hardware and the environment, the controller achieves a sensitive and body-specific actuation of the robot.
Robotic agents can self-organize their interaction with the environment by an adaptive homeokinetic controller that simultaneously maximizes sensitivity of the behavior and predictability of sensory inputs. Based on previous work with single robots, we study the interaction of two homeokinetic agents. We show that this paradigm also produces quasi-social interactions among artificial agents. The results suggest that homeokinetic learning generates social behavior only in the context of an actual encounter with the interaction partner, while this does not happen for an identical stimulus pattern that is only replayed. This is in agreement with earlier experiments with human subjects.
Homeokinetic learning provides a route to the self-organization of elementary behaviors in autonomous robots by establishing low-level sensorimotor loops. The strength and duration of the internal parameter changes caused by the homeokinetic adaptation provide a natural evaluation of external states, which can be used to incorporate information from additional sensory inputs and to extend the function of the low-level behavior to more general situations. We illustrate the approach with two examples, a mobile robot and a human-like hand, which are driven by the same low-level scheme but use the second-order information in different ways to achieve either risk avoidance and unconstrained movement or constrained movement. While the low-level adaptation follows a set of rigid learning rules, the second-order learning exerts a modulatory effect on the elementary behaviors and on the distribution of their inputs.
The paper presents a method to guide the self-organised development of behaviours of autonomous robots. In earlier publications we demonstrated how to use the homeokinesis principle and dynamical systems theory to obtain self-organised, playful but goal-free behaviour. Now we extend this framework by reinforcement signals. We validate the mechanisms with two experiments with a spherical robot. The first experiment aims at fast motion, where the robot reaches on average about twice the speed of a robot without reinforcement. In the second experiment spinning motion is rewarded, and we demonstrate that the robot successfully develops pirouettes and curved motion, which only rarely occur among the natural behaviours of the robot.
Self-organization and the phenomenon of emergence play an essential role in living systems and form a challenge to artificial life systems. This is not only because systems become more lifelike, but also since self-organization may help in reducing the design efforts in creating complex behavior systems. The present paper exemplifies a general approach to the self-organization of behavior which has been developed and tested in various examples in recent years. We apply this approach to a spherical robot driven by shifting internal masses. The complex physics of this robotic object is completely unknown to the controller. Nevertheless, after a short time the robot develops systematic rolling movements covering large distances with high velocity. In a hilly landscape it is capable of manoeuvring out of the basins, and in landscapes with a fixed rotational geometry the robot more or less adapts its movements to this geometry: the controller, so to speak, develops a kind of feeling for its environment, although there are no sensors for measuring the position or the velocity of the robot. We argue that this behavior is a result of the spontaneous symmetry breaking effects which are responsible for the emergence of behavior in our approach.
Self-organization and the phenomenon of emergence play an essential role in living systems and form a challenge to artificial life systems. This is not only because systems become more lifelike, but also since self-organization may help in reducing the design efforts in creating complex behavior systems. The present paper studies self-exploration based on a general approach to the self-organization of behavior, which has been developed and tested in various examples in recent years. This is a step towards autonomous early robot development. We consider agents under the close sensorimotor coupling paradigm with a certain cognitive ability realized by an internal forward model. Starting from tabula rasa initial conditions we overcome the bootstrapping problem and show emerging self-exploration. Apart from that, we analyze the effect of limited actions, which lead to deprivation of the world model. We show that our paradigm explicitly avoids this by producing purposive actions in a natural way. Examples are given using a simulated simple wheeled robot and a spherical robot driven by shifting internal masses.
Dynamical systems offer intriguing possibilities as a substrate for the generation of behavior because of their rich behavioral complexity. However, this complexity, together with the largely covert relation between the parameters and the behavior of the agent, is also the main hindrance in the goal-oriented design of a behavior system. This paper presents a general approach to the self-regulation of dynamical systems so that the design problem is circumvented. We consider the controller (a neural network) as the mediator for changes in the sensor values over time and define a dynamics for the parameters of the controller by maximizing the dynamical complexity of the sensorimotor loop under the condition that the consequences of the actions taken are still predictable. This very general principle is given a concrete mathematical formulation and is implemented in an extremely robust and versatile algorithm for the parameter dynamics of the controller. We consider two different applications, a mechanical device called the rocking stamper and the ODE simulations of a "snake" with five degrees of freedom. In these and many other examples studied we observed various behavior modes of high dynamical complexity.
Keywords: autonomous robots, self-organization, homeostasis, homeokinesis, dynamical systems, learning
Despite the tremendous progress in robotic hardware and in both sensorial and computing efficiencies, the performance of contemporary autonomous robots is still far below that of simple animals. This has triggered an intensive search for alternative approaches to the control of robots. The present paper exemplifies a general approach to the self-organization of behavior which has been developed and tested in various examples in recent years. We apply this approach to an underactuated snake-like artifact with a complex physical behavior which is not known to the controller. Due to the weak forces available, the controller, so to speak, has to develop a kind of feeling for the body, which is seen to emerge from our approach in a natural way, with meandering and rotational collective modes being observed in computer simulation experiments.