Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents

Summary

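Long-horizon robotic manipulation is a major challenge for autonomous systems: it demands extended reasoning, precise execution, and robust error recovery across complex sequential tasks.
Current approaches, whether built on static planning or on end-to-end visuomotor policies, accumulate errors and lack effective verification mechanisms during execution, which limits their reliability in real-world scenarios.
The authors present Agentic Robot, a brain-inspired framework that addresses these limitations through Standardized Action Procedures (SAP), a new coordination protocol that governs how components interact throughout a manipulation task.
SAP takes its inspiration from Standardized Operating Procedures (SOPs) in human organizations and establishes structured workflows for the planning, execution, and verification phases.
The architecture has three specialized components: (1) a large reasoning model that decomposes high-level instructions into semantically coherent subgoals, (2) a vision-language-action executor that generates continuous control commands from real-time visual input, and (3) a temporal verifier that enables autonomous progression and error recovery through introspective assessment.
This SAP-driven closed-loop design supports dynamic self-verification without external supervision.
On the LIBERO benchmark, Agentic Robot achieves state-of-the-art performance with an average success rate of 79.6%, outperforming SpatialVLA by 6.1% and OpenVLA by 7.4% on long-horizon tasks.
These results indicate that SAP-driven coordination between specialized components improves both performance and interpretability in sequential manipulation, and they suggest significant potential for reliable autonomous systems.
Project GitHub: https://agentic-robot.github.io.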

Summary (original)

Long-horizon robotic manipulation poses significant challenges for autonomous systems, requiring extended reasoning, precise execution, and robust error recovery across complex sequential tasks. Current approaches, whether based on static planning or end-to-end visuomotor policies, suffer from error accumulation and lack effective verification mechanisms during execution, limiting their reliability in real-world scenarios. We present Agentic Robot, a brain-inspired framework that addresses these limitations through Standardized Action Procedures (SAP), a novel coordination protocol governing component interactions throughout manipulation tasks. Drawing inspiration from Standardized Operating Procedures (SOPs) in human organizations, SAP establishes structured workflows for planning, execution, and verification phases. Our architecture comprises three specialized components: (1) a large reasoning model that decomposes high-level instructions into semantically coherent subgoals, (2) a vision-language-action executor that generates continuous control commands from real-time visual inputs, and (3) a temporal verifier that enables autonomous progression and error recovery through introspective assessment. This SAP-driven closed-loop design supports dynamic self-verification without external supervision. On the LIBERO benchmark, Agentic Robot achieves state-of-the-art performance with an average success rate of 79.6%, outperforming SpatialVLA by 6.1% and OpenVLA by 7.4% on long-horizon tasks. These results demonstrate that SAP-driven coordination between specialized components enhances both performance and interpretability in sequential manipulation, suggesting significant potential for reliable autonomous systems. Project Github: https://agentic-robot.github.io.
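
The plan, execute, verify workflow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function `run_sap_loop`, the `LoopResult` type, the `max_retries` parameter, and the stub planner, executor, and verifier in `demo` are all hypothetical names introduced here. The abstract does not say how recovery works, so simply retrying a subgoal is an assumed stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    success: bool
    log: List[str] = field(default_factory=list)


def run_sap_loop(
    instruction: str,
    plan: Callable[[str], List[str]],   # stands in for the reasoning model
    execute: Callable[[str], None],     # stands in for the VLA executor
    verify: Callable[[str], bool],      # stands in for the temporal verifier
    max_retries: int = 2,               # assumed recovery budget per subgoal
) -> LoopResult:
    """Plan once, then execute and verify each subgoal in order."""
    log: List[str] = []
    for subgoal in plan(instruction):
        for attempt in range(1, max_retries + 2):
            execute(subgoal)
            if verify(subgoal):
                log.append(f"ok: {subgoal} (attempt {attempt})")
                break  # verified, move on to the next subgoal
            log.append(f"unverified: {subgoal} (attempt {attempt})")
        else:
            # Every attempt failed verification: stop the whole task.
            log.append(f"failed: {subgoal}")
            return LoopResult(False, log)
    return LoopResult(True, log)


def demo() -> LoopResult:
    """Toy run: the second subgoal only verifies on its second attempt."""
    attempts = {}

    def plan(instruction: str) -> List[str]:
        return [s.strip() for s in instruction.split(" then ")]

    def execute(subgoal: str) -> None:
        attempts[subgoal] = attempts.get(subgoal, 0) + 1

    def verify(subgoal: str) -> bool:
        return subgoal != "place mug" or attempts[subgoal] >= 2

    return run_sap_loop("pick mug then place mug", plan, execute, verify)


if __name__ == "__main__":
    for line in demo().log:
        print(line)
```

The point of the sketch is the control flow only: the verifier decides whether the loop advances to the next subgoal or tries again, so no external supervisor is needed to close the loop.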

arXiv information

Authors	Zhejian Yang, Yongchao Chen, Xueyang Zhou, Jiangyue Yan, Dingjie Song, Yinuo Liu, Yuting Li, Yu Zhang, Pan Zhou, Hechang Chen, Lichao Sun
Published	2025-05-29 13:56:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
