Reinforcement learning for multi-agent systems is hard! Can we instead get away with training a single agent and then deploying copies of it on many agents? Maybe, but there are catches. The aim of this talk is to explore a ‘toy’ problem of multi-agent persistent surveillance, in which the objective is for agents to continuously monitor areas of a map so as to maximise a surveillance ‘score’. To see how well a single-agent policy, trained in isolation, performs when deployed in a multi-agent scenario, we compare a number of single-agent policies (Reinforcement Learning with DDPG, Neuro-Evolution, and a gradient-descent heuristic) against more traditional pre-defined boustrophedon-style ‘ploughing patterns’. We will observe the ‘homogeneous-policy convergence problem’, in which identical policies cause multiple agents to get stuck together, and look at how noise and uncertainty can alleviate the issue. Finally, we look at the decentralised case, where agents have only partial knowledge of the world but are able to communicate. Experiments with different methods of state consensus indicate that, for homogeneous deterministic policies, communication can actually be detrimental to performance.
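To make the ‘homogeneous-policy convergence problem’ concrete, here is a minimal sketch (not the talk's actual experimental setup): agents on a grid follow an identical deterministic greedy heuristic, ‘move toward the stalest cell’, standing in for a shared trained policy. Starting from the same state, identical deterministic agents take identical actions forever, so a team of four performs exactly like one agent; the `noise` parameter, an illustrative stand-in for exploration noise, perturbs each agent's scoring of cells independently so the copies can separate. All names and parameters here are assumptions of the sketch.

```python
import random

def greedy_target(staleness, rng=None, noise=0.0):
    """Pick the cell with the highest staleness.

    `noise` adds an independent uniform perturbation per cell, a stand-in
    for the exploration noise discussed in the talk (an assumption of this
    sketch, not the exact mechanism used in the experiments).
    """
    def score(cell):
        s = staleness[cell]
        return s + rng.uniform(0.0, noise) if rng and noise else s
    return max(staleness, key=score)

def step_towards(pos, target):
    """Move one grid cell towards the target (8-connected moves)."""
    (x, y), (tx, ty) = pos, target
    return (x + (tx > x) - (tx < x), y + (ty > y) - (ty < y))

def simulate(n_agents, size=8, steps=100, noise=0.0, seed=0):
    """Run the greedy 'chase the stalest cell' policy for all agents.

    Returns the per-step agent positions and the mean staleness over the
    run (lower means better surveillance).
    """
    rngs = [random.Random(seed + i) for i in range(n_agents)]
    staleness = {(x, y): 0 for x in range(size) for y in range(size)}
    agents = [(size // 2, size // 2)] * n_agents   # identical starts
    history, total = [], 0.0
    for _ in range(steps):
        agents = [step_towards(p, greedy_target(staleness, rngs[i], noise))
                  for i, p in enumerate(agents)]
        history.append(list(agents))
        for c in staleness:
            staleness[c] += 1           # everything gets one step staler
        for p in agents:
            staleness[p] = 0            # visiting a cell resets it
        total += sum(staleness.values()) / len(staleness)
    return history, total / steps
```

With `noise=0.0` the four copies stay coincident at every step and their surveillance score equals that of a lone agent, which is the convergence problem in miniature; independent per-agent noise is what lets the copies break symmetry.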