Learning Multi-Agent Policies from Observations
A recent paper from the DCIST team introduces a framework for learning to perform multi-robot missions by observing an expert system executing the same mission. The expert system is a team of robots equipped with a library of controllers, each designed to solve a specific task. At each time step, the expert system's policy selects the controller needed to successfully execute the mission, based on the states of the robots and the environment. The objective of the learning framework is to enable an untrained team of robots (i.e., the imitator system), equipped with the same library of controllers but not the expert policy, to learn to execute the mission with performance comparable to that of the expert system. From unannotated and noisy observations of the expert system, a multi-hypothesis filtering technique estimates the sequence of individual controllers executed by the expert policy. The history of estimated controllers and environmental states then provides supervision to train a neural network policy for the imitator system, as sketched below. When evaluated on a perimeter protection scenario, experimental results suggest that the learned policy endows the imitator system with performance comparable to that of the expert system.
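To make the first stage concrete, here is a minimal sketch of how controller estimation could work, assuming a discrete Bayes filter over controller hypotheses with a sticky transition prior and a Gaussian error model on predicted transitions. The function names, the observation model, and the `stay_prob` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bayes_filter_controllers(observations, controllers, stay_prob=0.9):
    """Estimate the most likely controller index at each time step.

    observations : list of (state, next_state) pairs observed from the expert
    controllers  : list of callables mapping state -> predicted next state
    stay_prob    : prior probability the expert keeps the same controller
    """
    K = len(controllers)
    belief = np.full(K, 1.0 / K)            # uniform initial belief
    # Sticky transition prior: controllers tend to persist between steps.
    trans = np.full((K, K), (1.0 - stay_prob) / (K - 1))
    np.fill_diagonal(trans, stay_prob)

    estimates = []
    for state, next_state in observations:
        belief = trans.T @ belief           # predict step
        # Likelihood: how well each controller's prediction matches the
        # observed transition (Gaussian error model, an assumption here).
        errors = np.array([np.linalg.norm(c(state) - next_state)
                           for c in controllers])
        belief *= np.exp(-0.5 * errors ** 2)  # update step
        belief /= belief.sum()
        estimates.append(int(np.argmax(belief)))
    return estimates
```

Maintaining a full belief rather than a point estimate is what makes the filtering multi-hypothesis: ambiguous observations keep several controllers plausible until later evidence disambiguates them.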
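Given the estimated controller labels, the second stage reduces to supervised classification: map the joint robot and environment state to an index into the controller library. The sketch below, in PyTorch, shows one plausible setup; the network architecture, training loop, and hyperparameters are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ControllerSelectionPolicy(nn.Module):
    """Maps a state vector to logits over the controller library."""

    def __init__(self, state_dim, num_controllers, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_controllers),
        )

    def forward(self, state):
        return self.net(state)

def train_policy(policy, states, labels, epochs=100, lr=1e-3):
    """states: (N, state_dim) float tensor; labels: (N,) estimated controller ids."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(states), labels)
        loss.backward()
        opt.step()
    return policy
```

At deployment, the imitator system evaluates the trained policy on its current state and executes the controller with the highest logit, so it never needs access to the expert policy itself.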
Source: P. Pierpaoli, H. Ravichandar, N. Waytowich, A. Li, D. Asher, M. Egerstedt, "Inferring and Learning Multi-Robot Policies from Observations," International Conference on Intelligent Robots and Systems (IROS), 2020 (under review)
Points of Contact: Pietro Pierpaoli; Harish Ravichandar {pietro.pierpaoli, harish.ravichandar}@gatech.edu