Learning to swarm with knowledge-based neural ordinary differential equations
A recent paper by members of the DCIST alliance uses knowledge-based neural ordinary differential equations (KNODE), a deep learning method, to develop a data-driven approach for extracting single-robot controllers from observations of a swarm’s trajectory. The goal is to reproduce global swarm behavior using the extracted controller. Unlike previous work on imitation learning, this method does not require action data for training. The proposed method can incorporate existing knowledge of the single-robot dynamics, and accounts for information decentralization, time delays, and obstacle avoidance in a general model for controlling each individual robot in a swarm. The decentralized information structure and homogeneity assumption further make training scalable: the training time grows linearly with the swarm size. The method was applied to two different flocking swarms, in 2D and 3D respectively, and successfully reproduced global swarm behavior using the learned controllers. In addition to the learning method, the paper proposes a novel application of proper orthogonal decomposition (POD) for evaluating the performance of a learned controller. Extensive analysis of the hyperparameters provides further insight into the properties and characteristics of the proposed method.
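The core idea of combining known dynamics with a learned term can be illustrated with a minimal sketch. Everything below is a hypothetical toy model, not the paper's implementation: a 1D damped double integrator stands in for the known single-robot dynamics, and an untrained two-layer MLP stands in for the learned controller term that KNODE-style training would fit to observed trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_known(x):
    """Known single-robot dynamics (hypothetical): a damped double
    integrator with state x = [position, velocity]."""
    pos, vel = x
    return np.array([vel, -0.1 * vel])

# Small random MLP standing in for the learned term; KNODE-style
# training would fit these weights to the observed swarm trajectory.
W1 = rng.normal(scale=0.1, size=(8, 2)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(2, 8)); b2 = np.zeros(2)

def f_learned(x):
    """Neural correction term (untrained in this sketch)."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def hybrid_ode(x):
    """Hybrid model: known physics plus a learned residual."""
    return f_known(x) + f_learned(x)

def rollout(x0, dt=0.01, steps=200):
    """Forward-Euler integration of the hybrid ODE."""
    traj = [x0]
    x = x0
    for _ in range(steps):
        x = x + dt * hybrid_ode(x)
        traj.append(x)
    return np.stack(traj)

traj = rollout(np.array([1.0, 0.0]))

# Training would minimize a loss such as the MSE between the predicted
# rollout and the observed trajectory, with respect to the MLP weights.
mse = lambda pred, obs: float(np.mean((pred - obs) ** 2))
```

Because no action data is needed, the loss is computed purely on predicted versus observed state trajectories, which is what distinguishes this setup from standard imitation learning.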

Capability: T3C4C – Adaptive Swarm Behaviors for Uncertainty Mitigation (Hsieh)
Points of Contact: M. Ani Hsieh (PI) and Tom Z. Jiahao
Video: https://drive.google.com/file/d/1QV4kE8K0nYcoLWHTAZ9BNsSI0b4dUax_/view?usp=sharing
Paper: https://arxiv.org/pdf/2109.04927.pdf
Citation: T. Z. Jiahao, L. Pan, and M. A. Hsieh, “Learning to Swarm with Knowledge-Based Neural Ordinary Differential Equations,” arXiv preprint, December 2021.

A recent paper by members of the DCIST alliance tackles the problem of autonomously mapping unknown environments using information-theoretic metrics and signed distance field maps. Signed distance fields are discrete representations of environmental occupancy in which each cell stores the distance to the nearest obstacle surface, with negative distances indicating that the cell lies within an obstacle. This representation has many benefits over the more traditional occupancy grid map, including trivial collision checking and easy extraction of mesh representations of obstacle surfaces. The researchers use a truncated signed distance field, which only keeps track of cells near obstacle surfaces, and model each cell as a Gaussian random variable whose expected distance and variance are updated incrementally using a realistic RGB-D sensor noise model. Modeling cells as Gaussian random variables enables closed-form computation of the Shannon mutual information between a Gaussian sensor measurement and the Gaussian cells it intersects, which in turn allows efficient evaluation of expected information when planning and evaluating possible future trajectories. With these tools, a robot can efficiently evaluate a large number of trajectories before choosing the best next step to increase its information about the environment. The researchers show the resulting active exploration algorithm running on several simulated 2D environments of varying complexity. The figure shows a snapshot of the robot exploring the most complex of the three environments, and these simulations can be viewed in more detail in the video linked below.
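The closed-form Gaussian computation can be sketched in the scalar case. This is an illustrative simplification, not the paper's code: a cell distance d ~ N(mu, var) observed through a noisy range measurement z = d + n, with n ~ N(0, noise_var), has mutual information I(d; z) = 0.5 ln(1 + var/noise_var), and the cell can be fused incrementally with a Kalman-style update. The `expected_info` helper is a hypothetical stand-in for scoring a candidate trajectory.

```python
import math

def gaussian_cell_mi(cell_var, sensor_noise_var):
    """Closed-form Shannon mutual information (in nats) between a
    Gaussian cell distance d ~ N(mu, cell_var) and a Gaussian range
    measurement z = d + n with n ~ N(0, sensor_noise_var)."""
    return 0.5 * math.log(1.0 + cell_var / sensor_noise_var)

def kalman_cell_update(mu, var, z, noise_var):
    """Incrementally fuse a new range measurement into a cell's
    Gaussian distance estimate; variance always shrinks."""
    k = var / (var + noise_var)
    return mu + k * (z - mu), (1.0 - k) * var

def expected_info(cells, noise_var):
    """Hypothetical trajectory score: sum of per-cell mutual
    information over the (mu, var) cells a sensor ray would hit."""
    return sum(gaussian_cell_mi(v, noise_var) for _, v in cells)

# Uncertain cells contribute more information than well-mapped ones,
# so high-scoring trajectories steer the robot toward the unknown.
score = expected_info([(0.0, 1.0), (0.0, 0.01)], noise_var=0.1)
```

Note how the mutual information depends only on the variances, not the means, which is why a large batch of candidate trajectories can be scored cheaply before any measurement is taken.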
A recent paper from the DCIST team introduces a framework for learning to perform multi-robot missions by observing an expert system executing the same missions.
A recent paper by members of the DCIST alliance uses reinforcement learning to train policies in simulation that transfer remarkably well to multiple different physical quadrotors. Quadrotor stabilizing controllers often require careful, model-specific tuning for safe operation. The policies developed here are low-level, i.e., they map the rotorcraft’s state directly to the motor outputs. The trained control policies are highly robust to external disturbances and can withstand harsh initial conditions such as throws. The work shows how different training methodologies (changes to the cost function, modeling of noise, use of domain randomization) affect flight performance. This is the first work to demonstrate that a simple neural network can learn a robust, stabilizing, low-level quadrotor controller, without the use of a stabilizing PD controller, that generalizes to multiple quadrotors.
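The shape of such a low-level policy, and the role of domain randomization, can be sketched as follows. All dimensions, layer sizes, and parameter ranges below are illustrative assumptions, not the paper's values: an untrained MLP maps the full state to four normalized motor commands, and plant parameters are resampled each episode so a policy trained this way would not overfit one airframe.

```python
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM = 18   # assumed: position error (3), velocity (3),
                 # rotation matrix (9), angular rate (3)
ACTION_DIM = 4   # one normalized thrust command per motor

# Two-layer tanh MLP; an RL algorithm would fit these weights in
# simulation (untrained random weights here).
W1 = rng.normal(scale=0.1, size=(64, STATE_DIM)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(ACTION_DIM, 64)); b2 = np.zeros(ACTION_DIM)

def policy(state):
    """Map the full state directly to motor outputs in [0, 1],
    with no inner stabilizing PD loop."""
    h = np.tanh(W1 @ state + b1)
    return 0.5 * (np.tanh(W2 @ h + b2) + 1.0)

def randomized_params():
    """Domain randomization (hypothetical ranges): resample plant
    parameters each episode so the policy transfers across frames."""
    return {
        "mass": rng.uniform(0.025, 0.9),      # kg
        "motor_lag": rng.uniform(0.05, 0.2),  # s, motor time constant
        "obs_noise_std": rng.uniform(0.0, 0.02),
    }

action = policy(np.zeros(STATE_DIM))
```

Squashing the output into [0, 1] keeps motor commands physically valid even for the harsh states (e.g., a mid-air throw) from which the trained policies are reported to recover.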