Strengthening AI Agent Hijacking Evaluations

The U.S. AI Safety Institute discusses the increasing use of large AI models to power agents that automate tasks for users. These AI agents offer benefits such as automating research and serving as personal assistants. However, security risks, like agent hijacking, need to be identified and mitigated. Agent hijacking occurs when attackers insert malicious instructions into data to manipulate AI agents. To evaluate agent hijacking risk, the US AISI conducted experiments using the AgentDojo framework. This research highlighted the importance of continuous improvement in evaluation frameworks, adaptability of evaluations to new attacks, task-specific analysis of risk, and testing attacks on multiple attempts for realistic results.

https://www.nist.gov/news-events/news/2025/01/technical-blog-strengthening-ai-agent-hijacking-evaluations