Theory-of-Mind in Partially Observed, Mixed-Motive Games

Published in Proceedings of the AAAI Conference on Artificial Intelligence (Doctoral Consortium), 2026

Recommended citation: Alon, N. (2026). "Theory-of-Mind in Partially Observed, Mixed-Motive Games." Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41030-41031. https://doi.org/10.1609/aaai.v40i48.42141

Theory of Mind (ToM) enables agents to model others’ mental states, but in mixed-motive games this capacity can lead to deceptive behaviour and alignment risks. My research investigates how ToM affects strategic behaviour in partially observed games, contributing: (1) a formal model of ToM-driven manipulation in a preference elicitation task, (2) evidence that excessive ToM leads to paranoid-like overmentalisation, and (3) the Aleph-IPOMDP model, a framework for multi-agent systems that balances ToM reasoning with game-theoretic principles to prevent manipulation by deterring capable agents from deceiving. My work contributes to the understanding of deceptive AI, to overcoming deception in multi-agent systems, and to computational models of human cognition.
