Abstract:
For multi-agent cooperation tasks in partially observable scenarios, traditional reinforcement learning algorithms struggle to achieve satisfactory performance, whereas multi-agent reinforcement learning (MARL) algorithms with communication can effectively improve collaboration. We design a MARL algorithm based on a multi-head attention mechanism: agents aggregate messages using self-attention and multi-head attention to achieve efficient communication. We adopt the centralized training with decentralized execution (CTDE) framework to evaluate and improve policies from a global perspective, thereby improving the quality of decision-making. Experimental results on Traffic-Junction show that our method effectively improves multi-agent cooperation and achieves better performance than existing algorithms.
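The message-aggregation step the abstract describes can be illustrated with a minimal sketch: each agent attends over all agents' messages with scaled dot-product self-attention, split across multiple heads. This is a generic multi-head attention implementation in NumPy, not the paper's actual architecture; the random projection matrices stand in for learned weights, and the dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(messages, num_heads=2, seed=0):
    """Aggregate agent messages with multi-head self-attention.

    messages: (n_agents, d_model) array, d_model divisible by num_heads.
    Returns an (n_agents, d_model) array of aggregated messages.
    """
    n, d = messages.shape
    d_head = d // num_heads
    # Random projections stand in for learned Q/K/V weight matrices.
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = messages @ Wq, messages @ Wk, messages @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        # Scaled dot-product attention within this head's slice.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, s])
    # Concatenate the per-head outputs back to d_model dimensions.
    return np.concatenate(heads, axis=1)

msgs = np.ones((3, 4))  # 3 agents, 4-dimensional messages
agg = multi_head_attention(msgs)
print(agg.shape)  # (3, 4)
```

Each agent's output row is a weighted combination of all agents' value vectors, so every agent can condition its policy on information beyond its own partial observation, which is the communication benefit the abstract claims.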