The Autonomy Review

LLM Agents Fail at Consensus, and Anthropic Won't Bend for the Pentagon

LLM Agents Cannot Reliably Reach Consensus — Even in Benign Settings

Frédéric Berdoz, Leonardo Rugli, and Roger Wattenhofer at ETH Zurich tested LLM-based agents on a Byzantine consensus game over scalar values. The result: valid agreement is unreliable even when every agent is cooperative, and it degrades as group size grows. Introducing even a small fraction of Byzantine agents reduces success further. The dominant failure mode is not subtle value corruption but loss of liveness: timeouts and stalled convergence. (arXiv:2603.01213)
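To make the failure mode concrete, here is a toy simulation of a scalar consensus game. This is an illustrative sketch under my own assumptions, not the paper's actual protocol or experimental setup: honest agents repeatedly average the values they hear, with Gaussian noise standing in for LLM imprecision, while Byzantine agents report arbitrary values. A "liveness failure" is simply exhausting the round budget without agreement.

```python
import random

def simulate(n_agents, n_byzantine=0, tol=0.05, max_rounds=30,
             noise=0.1, seed=0):
    """Toy scalar-consensus game (illustrative; not the paper's setup).

    Each round, every honest agent replaces its value with the mean of
    all reported values plus Gaussian noise; Byzantine agents report
    arbitrary values. Returns ("agree", round) once honest values fall
    within `tol` of each other, or ("timeout", max_rounds) on a
    liveness failure.
    """
    rng = random.Random(seed)
    honest = [rng.uniform(0, 1) for _ in range(n_agents - n_byzantine)]
    for r in range(1, max_rounds + 1):
        # Byzantine agents inject arbitrary reports each round.
        reports = honest + [rng.uniform(0, 1) for _ in range(n_byzantine)]
        mean = sum(reports) / len(reports)
        # Noise models the imprecision of an LLM restating a number.
        honest = [mean + rng.gauss(0, noise) for _ in honest]
        if max(honest) - min(honest) <= tol:
            return "agree", r
    return "timeout", max_rounds
```

Even in this crude model, raising the noise or adding Byzantine reporters pushes runs from quick agreement into timeouts, which mirrors the liveness-loss pattern the paper describes.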

This is a sober finding for anyone building multi-agent orchestration. If LLM agents cannot converge on a single number in a controlled setting, the implicit assumption behind many multi-agent architectures (that agents can coordinate reliably through natural language alone) needs harder scrutiny. The paper suggests that protocol-level structure, not just better prompting, may be required for reliable multi-agent agreement.

Multi-agent coordination layers need formal protocols, not ad hoc message passing. Treat consensus as an engineering problem, not an emergent property.