Ignorance is bliss - the role of noise and heterogeneity in training and deployment of Single Agent Policies for the Multi-Agent Persistent Surveillance Problem


Reinforcement Learning for Multi-Agent systems is hard! Can we instead get away with training just a single agent and then deploying that policy on many agents? Maybe, but there are catches. The aim of this talk is to explore a ‘toy’ problem of multi-agent persistent surveillance, where the basic objective is for agents to continuously monitor areas of a map to maximise a surveillance ‘score’. To see how a single-agent policy, trained in isolation, performs when deployed in a multi-agent scenario, we compare a number of single-agent policies (Reinforcement Learning (DDPG), Neuro-Evolution, and a gradient descent heuristic) against more traditional pre-defined boustrophedon-style ‘ploughing patterns’. We will observe the ‘homogeneous-policy convergence problem’, where identical policies cause multiple agents to get stuck together, and look at how noise and uncertainty can alleviate the issue. Finally, we look at the decentralised case, where agents have only partial knowledge of the world but are able to communicate. Comparing different methods of state consensus indicates that, for homogeneous deterministic policies, communication can be detrimental to performance.
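
To make the ‘homogeneous-policy convergence problem’ concrete, here is a minimal sketch in Python. It is not the environment or any of the policies from the talk: the 5x5 grid, the per-cell `staleness` counter standing in for the surveillance ‘score’, the greedy rule and the `noise` parameter are all assumptions made for illustration. Two agents share one deterministic policy and one view of the map, so from the same starting cell they always choose the same move. Giving each agent a small chance of acting randomly lets them separate.

```python
import random

SIZE = 5  # hypothetical 5x5 grid map (assumption, not from the talk)
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # fixed order gives deterministic tie-breaking


def neighbours(pos):
    """Return the in-bounds cells reachable in one move, in a fixed order."""
    x, y = pos
    return [(x + dx, y + dy) for dx, dy in MOVES
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]


def greedy_policy(pos, staleness, rng, noise):
    """Move to the stalest neighbouring cell; with probability `noise`, move at random."""
    options = neighbours(pos)
    if rng.random() < noise:
        return rng.choice(options)
    # max() returns the first maximal element, so ties are broken deterministically.
    return max(options, key=lambda cell: staleness[cell])


def colocated_fraction(noise, steps=200, seed=0):
    """Fraction of time steps on which two identical agents occupy the same cell."""
    rng = random.Random(seed)
    staleness = {(x, y): 0 for x in range(SIZE) for y in range(SIZE)}
    agents = [(0, 0), (0, 0)]  # both agents start in the same cell
    together = 0
    for _ in range(steps):
        # Every agent runs the same policy on the same shared state.
        agents = [greedy_policy(p, staleness, rng, noise) for p in agents]
        for cell in staleness:
            staleness[cell] += 1  # unvisited cells grow staler
        for p in agents:
            staleness[p] = 0      # visited cells are refreshed
        together += agents[0] == agents[1]
    return together / steps


if __name__ == "__main__":
    print("deterministic:", colocated_fraction(noise=0.0))
    print("with noise:   ", colocated_fraction(noise=0.3))
```

With `noise=0.0` the two agents never separate, so the second agent adds no coverage. With noise they spend part of the run in different cells. How much this helps a real surveillance score depends on the map, the policy and the noise level, none of which this sketch models.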

Oct 19, 2019 3:00 PM — 4:00 PM
Thomas E. Kent
Senior Research Associate in Applied Mathematics & Computer Science

I am a post-doctoral research associate in Bristol University’s Computer Science Department, currently working on the T-B Phase project (Thales Bristol Partnership in Hybrid Autonomous Systems Engineering). I am interested in exploring how AI, Machine Learning and Decision Making techniques can be used in key Multi-Agent Systems use cases such as Search and Rescue and Persistent Surveillance.