Your first multi-agent RL (MARL) project will teach you a hard truth: everything you know about single-agent training breaks the moment a second learner enters the room. Non-stationarity sets in. Rewards stop meaning what you think they mean. And the fixes that worked for a single policy quietly make things worse.
This is the book for engineers and researchers who already know single-agent RL and are ready for what comes next - written by a practitioner who's built coordinating robot fleets, adversarial trading agents, and cooperating LLM agent teams, and who still remembers exactly where it went wrong the first time.
Inside, you'll learn:
- Why non-stationarity is the real enemy of MARL - and how to design around it
- How to formulate state, observation, action, and reward before you write training code (the highest-leverage decision in any MARL project)
- Cooperative methods: value decomposition (VDN, QMIX), credit assignment, and learned communication
- Competitive methods: self-play, opponent modeling, exploitability, and why average return lies to you
- Scaling to dozens or hundreds of agents without training collapsing
- Graph neural networks, mean-field methods, and attention-based communication architectures
- Real deployment: sim-to-real transfer, robotics, swarms, and multi-agent LLM systems
- Where the field is still unsolved - continual learning, human-AI teams, and multi-agent alignment
Written in first person, with real mistakes included, not just the theory that made it into the papers. Every chapter builds a working intuition, then shows you exactly how it fails in practice - so you find out in the book, not three weeks into a training run.
If you've trained a MARL system, watched it behave strangely, and wanted to know why - this book is for you.