Coding for Distributed Multi-Agent Reinforcement Learning

A recent paper by members of the DCIST alliance develops a multi-agent reinforcement learning (MARL) algorithm that uses coding theory to mitigate straggler effects in distributed training. Stragglers are delayed, non-responsive, or compromised compute nodes, which occur commonly in distributed learning systems due to communication bottlenecks and adversarial conditions. Coding techniques have previously been used to speed up distributed computation tasks, such as matrix multiplication and inverse problems, in the presence of stragglers. The proposed coded distributed learning framework can be applied with any policy gradient method to train policies for MARL problems in the presence of stragglers. The authors develop a coded distributed version of multi-agent deep deterministic policy gradient (MADDPG), a state-of-the-art MARL algorithm. To gain a comprehensive understanding of the benefits of coding in distributed MARL, they investigate several coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes. All of these methods were evaluated in simulation on several multi-robot problems, including cooperative navigation, predator-prey, physical deception, and keep-away tasks. The approach achieves the same training accuracy while significantly speeding up the training of policy gradient algorithms in the presence of stragglers.
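To illustrate the core idea of straggler mitigation with an MDS-style code, here is a minimal toy sketch (not the paper's implementation, and far simpler than MADDPG training): k gradient chunks are encoded into n coded chunks with a generator matrix whose every k rows are invertible, so the aggregator can recover the full gradient from any k worker responses and simply ignore up to n - k stragglers. The generator matrix, chunk sizes, and decoder below are illustrative assumptions.

```python
# Toy sketch of MDS-coded gradient aggregation (hypothetical example,
# not the paper's code). With k = 2 data chunks and n = 3 workers,
# any single straggler can be tolerated.

def encode(chunks, G):
    """Each coded chunk is a linear combination of the data chunks (one row of G)."""
    return [[sum(g * c[i] for g, c in zip(row, chunks))
             for i in range(len(chunks[0]))]
            for row in G]

def decode_2x2(rows, vals):
    """Invert a 2x2 subsystem of the code to recover both data chunks."""
    (a, b), (c, d) = rows
    det = a * d - b * c
    return ([(d * v1 - b * v2) / det for v1, v2 in zip(*vals)],
            [(-c * v1 + a * v2) / det for v1, v2 in zip(*vals)])

# Two gradient chunks (k = 2), e.g. halves of a flattened gradient vector.
g1, g2 = [1.0, 2.0], [3.0, 4.0]
# Generator matrix of a simple (n = 3, k = 2) MDS code: any 2 rows are invertible.
G = [[1, 0], [0, 1], [1, 1]]
coded = encode([g1, g2], G)  # one coded chunk per worker

# Suppose worker 0 straggles: decode from workers 1 and 2 alone.
r1, r2 = decode_2x2([G[1], G[2]], [coded[1], coded[2]])
assert r1 == g1 and r2 == g2  # full gradient recovered without worker 0
```

Replication corresponds to a generator matrix that repeats identity rows, while random sparse and LDPC codes replace the dense combinations with sparse ones to reduce per-worker computation.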

Capability: T3C1D: Optimal control & reinforcement learning with information theoretic objectives

Points of Contact: Nikolay Atanasov (PI), Baoqian Wang, and Junfei Xie



Citation: B. Wang, J. Xie, and N. Atanasov, “Coding for Distributed Multi-Agent Reinforcement Learning,” IEEE International Conference on Robotics and Automation (ICRA), 2021.

Intermittent Interactions on Multi-Agent Systems: Diffusion of Information and Consensus Control

Recent works by members of the DCIST alliance investigate methods for consensus and information-broadcast tasks in networks of mobile robots subject to intermittent communication. This work aims to relax the restriction of an always-connected network, letting agents interact intermittently while accounting for the uncertainty associated with those interactions. To this end, the time-varying communication topology is modeled as an inhomogeneous stochastic process with finitely many states and time-varying transition matrices belonging to a convex set. This modeling allows deriving new conditions for consensus under stochastic interactions between robots. In the proposed methodology, the results follow from transforming the consensus problem into a stability problem for stochastic jump systems and then applying Lyapunov theory; the resulting conditions are formulated as convex optimization problems. In addition, the broadcasting of information in such a networked system is investigated. To estimate the statistics of the broadcast, the same stochastic process associated with the changing topologies is used, but in this scenario it governs switching in an augmented state space that tracks the transmission of information through the network. The results are presented as sufficient conditions for convergence, together with lemmas that estimate the expected time for information to travel between any two nodes in the network.
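The key phenomenon can be seen in a minimal toy simulation (a hypothetical sketch, not the paper's method): no single topology is connected, but their union is, and consensus is still reached under random switching. The agent count, edge sets, step size, and uniform switching signal below are illustrative assumptions standing in for the paper's Markov-modulated link process.

```python
import random

# Toy consensus under a randomly switching topology (hypothetical example).
# Neither topology alone is connected over agents {0, 1, 2},
# but their union {(0,1), (1,2)} forms a connected graph.
TOPOLOGIES = [[(0, 1)], [(1, 2)]]

def consensus_step(x, edges, eps=0.4):
    """One discrete-time consensus update over the currently active edges."""
    dx = [0.0] * len(x)
    for i, j in edges:
        dx[i] += eps * (x[j] - x[i])
        dx[j] += eps * (x[i] - x[j])
    return [xi + di for xi, di in zip(x, dx)]

random.seed(0)
x = [0.0, 5.0, 10.0]          # initial agent states
for _ in range(200):
    # Switching signal: one topology drawn at random each step,
    # a stand-in for the stochastic jump process in the paper.
    x = consensus_step(x, random.choice(TOPOLOGIES))

spread = max(x) - min(x)
assert spread < 1e-3          # states have (approximately) reached consensus
```

Each update is a stochastic matrix acting on the states, so the analysis of such systems reduces to the stability of a product of random matrices, which is what the jump-system/Lyapunov formulation makes tractable.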

Left: Time evolution for six agents interacting over a stochastic switching topology. Blue diamonds represent initial conditions and black solid lines the system’s trajectories. Middle and Right: Time evolution of the time-varying topology (top left), control signals (bottom left), and states (top and bottom right). In the time-varying topology plot (top left), topology 1 means all agents are disconnected, while the union of topologies 2 and 3 forms a connected graph.

Capability: T3C2D Synthesis of Time-Varying Communication Networks with Information Propagation Guarantees

Points of Contact: M. Ani Hsieh (PI), Xi Yu, Li Shen, and Thales C. Silva


Li Shen, Xi Yu, and M. Ani Hsieh, “Topology Control of a Periodic Time-varying Communication Network with Stochastic Temporal Links,” submitted to the 2022 American Control Conference, under review.

T. C. Silva and M. A. Hsieh, “Intermittent Interactions on Multi-Agent Systems: Diffusion of Information and Consensus Control,” in preparation.