AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

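Abstract

The rapid progress of vision-language models (VLMs) and their integration into embodied agents has unlocked powerful decision-making capabilities.
However, as these systems are increasingly deployed in real-world environments, they face growing safety concerns, particularly when responding to hazardous instructions.
In this work, we propose AGENTSAFE, the first comprehensive benchmark for evaluating the safety of embodied VLM agents under hazardous instructions.
AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox and incorporates a novel adapter module that bridges the gap between high-level VLM outputs and low-level embodied controls.
Specifically, the adapter maps recognized visual entities to manipulable objects and translates abstract plans into atomic actions that are executable in the environment.
Building on this, we construct a risk-aware instruction dataset inspired by Asimov's Three Laws of Robotics, comprising base risky instructions and mutated jailbroken instructions.
The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions, enabling systematic testing under adversarial conditions across the perception, planning, and action execution stages.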

Abstract (original)

The rapid advancement of vision-language models (VLMs) and their integration into embodied agents have unlocked powerful capabilities for decision-making. However, as these systems are increasingly deployed in real-world environments, they face mounting safety concerns, particularly when responding to hazardous instructions. In this work, we propose AGENTSAFE, the first comprehensive benchmark for evaluating the safety of embodied VLM agents under hazardous instructions. AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox and incorporates a novel adapter module that bridges the gap between high-level VLM outputs and low-level embodied controls. Specifically, it maps recognized visual entities to manipulable objects and translates abstract planning into executable atomic actions in the environment. Building on this, we construct a risk-aware instruction dataset inspired by Asimov's Three Laws of Robotics, including base risky instructions and mutated jailbroken instructions. The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions, enabling systematic testing under adversarial conditions spanning the perception, planning, and action execution stages.
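
The adapter idea in the abstract (grounding recognized entities in manipulable objects, then turning an abstract plan into atomic actions) can be illustrated with a minimal Python sketch. This is not the AGENTSAFE implementation: the action vocabulary, the `Type|index` object ids, and the names `ground_entity` and `translate_plan` are all hypothetical, chosen only to show the shape of such a translation layer.

```python
from dataclasses import dataclass

# Hypothetical atomic action vocabulary; the abstract does not list
# the simulator's actual action set.
ATOMIC_ACTIONS = {"find", "pick_up", "put_down", "open", "close"}


@dataclass(frozen=True)
class AtomicAction:
    name: str
    object_id: str


def ground_entity(label, scene_objects):
    """Map a recognized entity label to a manipulable object id, or None."""
    key = label.strip().lower()
    for object_id, object_type in scene_objects.items():
        if object_type.lower() == key:
            return object_id
    return None


def translate_plan(plan, scene_objects):
    """Turn (verb, entity) plan steps into atomic actions.

    Steps with an unknown verb or an entity that cannot be grounded in
    the scene are collected in `rejected` instead of being executed.
    """
    actions, rejected = [], []
    for verb, entity in plan:
        object_id = ground_entity(entity, scene_objects)
        if verb not in ATOMIC_ACTIONS or object_id is None:
            rejected.append((verb, entity))
            continue
        actions.append(AtomicAction(verb, object_id))
    return actions, rejected


# Assumed scene: object id -> object type.
scene = {"Mug|1": "Mug", "Fridge|2": "Fridge"}
plan = [("find", "mug"), ("pick_up", "mug"),
        ("juggle", "mug"), ("open", "microwave")]
actions, rejected = translate_plan(plan, scene)
```

In this sketch, steps that cannot be grounded are reported rather than silently dropped, which is one way a harness could separate adapter failures from an agent's refusal to act; whether AGENTSAFE does this is not stated in the abstract.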

arXiv information

Authors	Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, Dacheng Tao
Published	2025-06-17 16:37:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
