Adversarial Policies Beat Superhuman Go AIs

The authors attacked the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a win rate of over 97%. Surprisingly, the adversaries did not win by playing Go well, but by tricking KataGo into making serious blunders. The attack transfers to other superhuman Go AIs, and human experts who study it can apply it themselves to consistently beat superhuman AIs without algorithmic assistance. The core vulnerability persists even in KataGo agents trained to defend against the attack. These results highlight that even superhuman AI systems can have unexpected failure modes. Example games are available for review.
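To give a feel for the general idea of an "adversarial policy" (an agent trained against a frozen victim), here is a minimal toy sketch. It is not the paper's method (the actual attack trains an AlphaZero-style agent with search against a fixed KataGo victim); the game, victim policy, and learning rule below are all stand-ins chosen only to keep the example self-contained and runnable.

```python
# Toy sketch: a learning adversary exploits a *frozen* victim policy.
# Game, victim bias, and REINFORCE update are illustrative assumptions,
# not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Frozen victim: a fixed, slightly biased policy over {rock, paper, scissors}.
VICTIM_PROBS = np.array([0.5, 0.3, 0.2])  # exploitable bias toward rock

# Adversary reward: PAYOFF[a, v] = +1 win, 0 draw, -1 loss
# when the adversary plays a and the victim plays v.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

logits = np.zeros(3)  # adversary's policy parameters
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)         # adversary's move
    v = rng.choice(3, p=VICTIM_PROBS)  # frozen victim's move
    r = PAYOFF[a, v]
    # REINFORCE: nudge log-prob of the sampled action up or down with the reward.
    grad = -probs
    grad[a] += 1.0
    logits += lr * r * grad

print("learned adversary policy:", softmax(logits))  # should concentrate on paper
print("expected reward vs victim:", softmax(logits) @ PAYOFF @ VICTIM_PROBS)
```

The key structural point the toy shares with the paper's attack is that the victim never updates: the adversary only needs to find some exploitable weakness in a fixed opponent, which is far easier than playing the game well in general.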

https://arxiv.org/abs/2211.00241
