Aleph-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection

Published in Journal of Artificial Intelligence Research (JAIR), 2026

Recommended citation: Alon, N., Barnby, J. M., Sarkadi, S., Schulz, L., & Rosenschein, J. S. (2026). "ℵ-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection." Journal of Artificial Intelligence Research, 85. https://doi.org/10.1613/jair.1.19204

This paper addresses the vulnerability of agents with limited Theory of Mind (ToM) depth to manipulation by more sophisticated agents. To mitigate this vulnerability, we propose the ℵ-IPOMDP framework, which augments model-based reinforcement learning agents with anomaly detection and out-of-belief policies. Together, these components allow agents to recognize deceptive behavior, even without fully understanding it, and to adopt defensive strategies in response. We demonstrate the framework’s effectiveness in both mixed-motive and zero-sum games, where it leads to more equitable outcomes and reduced exploitation. The study’s implications span AI safety, cybersecurity, cognitive science, and psychiatry.
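To make the idea of pairing anomaly detection with an out-of-belief policy concrete, here is a minimal illustrative sketch. It is not the paper's algorithm: it assumes the agent holds a small set of candidate opponent models, each a fixed distribution over opponent actions, and it flags an anomaly when even the best-fitting model explains the observed actions poorly. The function names, the model names, the likelihood-based test, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, Sequence

# Hypothetical opponent models: each maps an action to its predicted probability.
OpponentModels = Dict[str, Dict[str, float]]


def best_avg_log_likelihood(history: Sequence[str], models: OpponentModels) -> float:
    """Return the highest mean log-likelihood of the history across all models."""
    best = -math.inf
    for action_probs in models.values():
        # Floor probabilities so unseen actions do not produce log(0).
        total = sum(math.log(max(action_probs.get(a, 0.0), 1e-12)) for a in history)
        best = max(best, total / len(history))
    return best


def choose_policy(history: Sequence[str], models: OpponentModels, threshold: float) -> str:
    """Pick 'in_belief' while some model explains the opponent, else 'out_of_belief'.

    Assumption: an anomaly is declared when the best mean log-likelihood
    falls below `threshold`. The paper's actual detection rule may differ.
    """
    if not history:
        return "in_belief"  # nothing observed yet, so nothing to flag
    if best_avg_log_likelihood(history, models) < threshold:
        return "out_of_belief"  # no model fits: switch to a defensive strategy
    return "in_belief"


# Illustrative models and threshold (all values are made up for this sketch).
models: OpponentModels = {
    "cooperative": {"share": 0.9, "withhold": 0.1},
    "selfish": {"share": 0.2, "withhold": 0.8},
}
consistent = ["share"] * 10             # well explained by the cooperative model
erratic = ["share", "withhold"] * 5     # poorly explained by either model

print(choose_policy(consistent, models, threshold=-0.5))
print(choose_policy(erratic, models, threshold=-0.5))
```

The point of the sketch is that the agent does not need a model of the deceiver to react to it: it only needs to notice that none of its existing models account for what it observes, and then fall back to a policy that does not depend on those models.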