Research Topic: Reinforcement Learning in Humanoid Robotics

Robot learning is the acquisition of knowledge about the robot's changing environment as well as about the robot's own internal abilities. Learning can be seen as a systematic change of behaviour driven by evaluative information obtained from observing changes in the environment. Learning and adaptation are important paradigms in current robotics research. Especially when a system cannot be completely modelled, data-driven methods complement model-based approaches.

We have integrated learning methods into various robotics applications. From feed-forward methods to cognitive models, such approaches have proven their usability in our lab. Part of our work on learning in robotics is theoretical, with the aim of finding new learning algorithms, driven by the high-dimensional real-time applications we work on.

Classical humanoid robotics still relies heavily on teleoperation or on fixed, a priori determined, behaviour-based control, with very little autonomous ability to react to the environment. A key missing element is the ability to create control systems that can deal with a large movement repertoire, variable speeds, constraints, and, most importantly, uncertainty in the real-world environment in a fast, reactive manner. The acquisition and improvement of motor skills and control policies through trial and error is of essential importance if robots are ever to leave precisely pre-structured environments.

For physical agents, such as humanoid robots acting in the real world, it is much more difficult to gain experience through learning. Robot learning in realistic environments requires novel algorithms that can identify important events in the stream of sensory inputs and temporarily memorize them in adaptive, dynamic internal states until those memories can help to compute proper control actions. While supervised statistical learning techniques have significant applications in model and imitation learning, they do not suffice for all biped learning problems, particularly when no expert teacher or idealized desired behaviour is available. Since no exact teaching information is available, this is a typical reinforcement learning problem, and the failure signal serves as the reinforcement signal. Reinforcement learning offers humanoid robotics one of the most general frameworks for moving towards true autonomy and versatility. Humanoid robotics is nevertheless a very challenging domain for reinforcement learning, since robots cannot fully perceive the underlying state of their environment and training time is usually quite limited.
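The idea that the failure signal alone serves as reinforcement can be sketched as a minimal episode loop. This is an illustrative example, not code from this research: the `step_fn` environment, the state encoding, and the reward convention (reward −1 on failure, 0 otherwise) are all assumptions chosen for clarity.

```python
def run_episode(policy, step_fn, max_steps=200):
    """Run one episode of a hypothetical walking task.

    The only evaluative feedback is a failure signal (e.g. the biped
    falling), which is converted into a negative reinforcement. All
    other steps receive zero reward, so the reward is sparse and delayed.
    """
    state = 0  # hypothetical initial state
    transitions = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, failed = step_fn(state, action)
        reward = -1.0 if failed else 0.0  # failure signal as reinforcement
        transitions.append((state, action, reward, next_state))
        if failed:
            break
        state = next_state
    return transitions
```

The learner never sees a teacher's "correct" action; it only accumulates transitions whose rewards encode success or failure, which is precisely what distinguishes this setting from supervised imitation learning.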

This research considers optimal solutions for the application of reinforcement learning in humanoid robotics. The importance of a hybrid approach is emphasized. The hybrid aspect concerns the combination of model-based and model-free approaches, as well as the combination of different computational-intelligence paradigms.

The general goal in the synthesis of reinforcement learning control algorithms is the development of methods that scale to the dimensionality of humanoid robots and can generate actions for bipeds with many degrees of freedom. In this research, we will show in particular that the control of walking of active and passive dynamic walkers can be efficiently solved using reinforcement learning.

Dynamic bipedal walking is difficult to learn for several reasons: the combinatorial explosion involved in optimizing performance in every possible configuration of the robot; uncertainties of the robot dynamics that can only be validated experimentally; the dynamic discontinuities caused by collisions with the ground; and the problem of delayed reward, since torques applied at one time may affect performance many steps into the future. Hence, for a physical robot it is essential to learn from few trials in order to leave some time for exploitation. It is therefore necessary to speed up learning using methods such as hierarchical learning, subtask decomposition, and imitation, which will be presented.
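The delayed-reward problem can be made concrete with discounted returns: a failure signal at the end of an episode propagates credit (here, blame) back to earlier torques through the discounted sum. This is a standard textbook computation, shown here as a sketch; the reward sequence and discount factor are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
    for every time step, working backwards through the episode.

    With a sparse failure signal (e.g. -1 only on falling), earlier
    actions still receive a discounted share of the blame, which is how
    a learner connects torques applied now to a fall many steps later.
    """
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    return returns
```

For example, with rewards `[0, 0, 0, -1]` and `gamma=0.5`, the first step already carries a return of −0.125, even though its immediate reward was zero.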

Actor-critic Architecture

Various straightforward and hybrid intelligent control algorithms based on RL for active and passive biped locomotion are presented. The proposed reinforcement learning algorithms are based on two different learning structures: the actor-critic architecture and Q-learning. The RL algorithms can use numerical or fuzzy evaluative feedback as external reinforcement. The learning elements consist of various types of neural networks, fuzzy logic nets, or neuro-fuzzy networks, with a focus on fast convergence and a small number of learning trials. The controller structure involves two feedback loops: a model-based dynamic controller and a fuzzy reinforcement learning feedback loop around the Zero-Moment Point. The reinforcement learning architecture uses fuzzy evaluative feedback as the external reinforcement. This research consists of finding the optimal structure of the soft-computing paradigms, enhancing learning convergence, and hierarchical task decomposition.
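The core actor-critic mechanism can be sketched in tabular form: the critic estimates state values, and its temporal-difference (TD) error serves as internal reinforcement for both the critic and the actor. This is a generic textbook sketch, not the neuro-fuzzy controller described above; the tabular representation, softmax policy, and learning rates are illustrative assumptions.

```python
import math

def softmax_policy(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def actor_critic_step(V, H, s, a, r, s_next, done,
                      alpha_v=0.1, alpha_h=0.1, gamma=0.99):
    """One actor-critic update for a single observed transition.

    V : list of state values (the critic)
    H : per-state action preferences (the actor)
    The critic's TD error `delta` drives both updates: it corrects the
    value estimate and shifts preference towards (or away from) the
    action that was taken.
    """
    target = r + (0.0 if done else gamma * V[s_next])
    delta = target - V[s]          # TD error (internal reinforcement)
    V[s] += alpha_v * delta        # critic update
    probs = softmax_policy(H[s])
    for b in range(len(H[s])):     # actor update (policy-gradient style)
        grad = (1.0 if b == a else 0.0) - probs[b]
        H[s][b] += alpha_h * delta * grad
    return delta
```

In the hybrid scheme above, the external reinforcement `r` would come from the (fuzzy) evaluation of the walking performance, e.g. deviations of the Zero-Moment Point, rather than from a hand-coded numeric reward.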


Fig. 1. Hybrid Control Algorithm for Humanoid Robots Based on Reinforcement Structure.


Fig. 2. Reinforcement during process of walking.