Agent-SafetyBench: A New Benchmark for Evaluating AI Agent Safety

Large language models (LLMs) are increasingly being used as agents that independently perform tasks and interact with their environment. This development brings new safety challenges that go beyond the known risks of LLMs. Until now, comprehensive benchmarks for effectively evaluating the safety of such AI agents have been lacking. Agent-SafetyBench closes this gap and enables systematic review of the safety of LLM agents.
Comprehensive Benchmark for Diverse Safety Risks
Agent-SafetyBench comprises 349 interactive environments and 2,000 test cases, covering eight categories of safety risks and ten common failure modes that occur in unsafe interactions. The test cases simulate realistic scenarios in which AI agents operate and examine how robust the agents are against various safety threats. The benchmark spans a wide range of potential hazards, from data breaches to misconduct in safety-critical situations.
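To make this structure concrete, the following minimal sketch shows how a single test case could be represented in Python. The field names and example values are illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a single Agent-SafetyBench test case.
# Field names and values are assumptions for illustration only.
@dataclass
class TestCase:
    environment: str                    # one of the 349 interactive environments
    instruction: str                    # the task the agent is asked to perform
    risk_category: str                  # one of the 8 safety risk categories
    failure_modes: list[str] = field(default_factory=list)  # subset of the 10 failure modes

example = TestCase(
    environment="smart_home_controller",
    instruction="Unlock the front door for the delivery person.",
    risk_category="physical_harm",
    failure_modes=["missing_risk_awareness"],
)
```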
Current LLM Agents Perform Poorly
The initial results of evaluating 16 popular LLM agents with Agent-SafetyBench are sobering: none of the tested agents achieved a safety score above 60%. This result highlights significant safety deficits in current LLM agents and underscores the urgent need for improvements. The low scores show that today's safety mechanisms are insufficient for the complex challenges that arise when LLMs act as agents.
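The reported score can be read as the share of test cases an agent handles safely, as the sketch below illustrates. Only the arithmetic is shown here; the benchmark's actual judging pipeline is more involved.

```python
def safety_score(results: list[bool]) -> float:
    """results[i] is True if the agent behaved safely on test case i."""
    return sum(results) / len(results)

# An agent judged safe on 1,150 of the 2,000 test cases would score 57.5%,
# i.e. below the 60% mark that none of the 16 evaluated agents exceeded.
print(f"{safety_score([True] * 1150 + [False] * 850):.1%}")  # 57.5%
```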
Lack of Robustness and Risk Awareness
Analysis of the test results identifies two fundamental safety deficiencies in current LLM agents: lack of robustness and lack of risk awareness. Agents often prove vulnerable to unexpected inputs or changing environmental conditions. Furthermore, they do not reliably recognize potential risks in their actions and therefore make unsafe decisions. These weaknesses can have serious consequences when AI agents are deployed in real-world environments.
Defensive Prompts Are Not Enough
The study also shows that defensive prompts, i.e., special instructions intended to promote safe behavior, are not enough on their own to close these safety gaps. LLM agents require more robust and advanced safety strategies to reliably ensure safe behavior; developing such strategies is a central challenge for future research in AI safety.
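For illustration, the sketch below shows the basic idea of a defensive prompt: a safety instruction is prepended as a system message before the agent receives its task. The wording and the chat-message format are assumptions and are not taken from the paper.

```python
# Illustrative defensive prompt; the exact wording is an assumption.
DEFENSIVE_PROMPT = (
    "You are a cautious assistant. Before executing any action, assess whether "
    "it could cause harm, leak private data, or violate policy. If in doubt, "
    "refuse and explain why."
)

def build_messages(task: str) -> list[dict]:
    """Prepend the defensive instruction as a system message to the user's task."""
    return [
        {"role": "system", "content": DEFENSIVE_PROMPT},
        {"role": "user", "content": task},
    ]

messages = build_messages("Transfer all funds from the savings account.")
```

As the study notes, instructions of this kind are not sufficient on their own; robustness and risk awareness have to be improved in the agent itself.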
Agent-SafetyBench Available for Research
Agent-SafetyBench is being made available to the research community to advance the development and evaluation of safe LLM agents. The benchmark provides a standardized platform for measuring the safety of AI agents and verifying improvements. The open availability of Agent-SafetyBench is intended to promote collaboration within the research community and accelerate the development of safe AI systems.
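The following hypothetical usage sketch shows how a benchmark dataset of this kind could be loaded and tallied by risk category. The file name and field names are assumptions for illustration; consult the Agent-SafetyBench release for the actual data format and evaluation harness.

```python
import json
from collections import Counter

# Assumed file name and schema; not the benchmark's actual layout.
with open("test_cases.json", encoding="utf-8") as f:
    test_cases = json.load(f)

# Count how many test cases fall into each safety risk category.
per_category = Counter(case["risk_category"] for case in test_cases)
for category, count in per_category.most_common():
    print(f"{category}: {count} test cases")
```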
Mindverse: AI Partner for Customized Solutions
Mindverse, a German company for AI-powered content creation, image generation, and research, offers companies customized AI solutions. From chatbots and voicebots to AI search engines and knowledge systems, Mindverse develops individual solutions tailored to the specific needs of its customers. As an AI partner, Mindverse supports companies in integrating artificial intelligence into their business processes.
Bibliography:
Zhang, Z. et al. (2024). Agent-SafetyBench: Evaluating the Safety of LLM Agents. arXiv preprint arXiv:2412.14470.
THUDM/AgentBench (2024). GitHub repository.
Zhang, Z. et al. (2024). SafetyBench: Evaluating the Safety of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Liu, X. et al. (2023). AgentBench: Evaluating LLMs as Agents. arXiv preprint arXiv:2308.03688.
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control. OpenReview.