Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

Abstract

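Action chunking, the practice of predicting and executing a sequence of actions without intermediate replanning, is increasingly used in robot learning from human demonstrations.
Its effect on the learned policy, however, remains inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance.
In this paper, we first analyze how action chunking affects the divergence between a learner and a demonstrator.
We find that action chunking lets the learner capture the temporal dependencies in demonstrations more accurately, but at the cost of reduced reactivity to unexpected states.
To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation.
At each timestep, BID samples multiple candidate predictions and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions;
(ii) forward contrast, which seeks samples with high likelihood for future plans.
By coupling decisions within and across action chunks, BID promotes both long-term consistency and short-term reactivity.
Experimental results show that our method improves the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
Code and videos are available at https://bid-robot.github.io.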
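
The selection step described in the abstract (sample several candidate chunks, then pick one using backward coherence and forward contrast) can be illustrated with a small NumPy sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: the function names, the exponentially decaying weights, the squared-distance measure, and the use of a "positive" and a "negative" reference set to stand in for likely and unlikely future plans. It is not the authors' implementation; see the project page for that.

```python
import numpy as np

def backward_loss(candidates, prev_chunk, decay=0.5):
    """Assumed backward-coherence score (lower is better).

    candidates: (N, H, D) array of N sampled action chunks.
    prev_chunk: (H, D) chunk chosen at the previous timestep.
    """
    # prev_chunk[1:] covers the same timesteps as candidates[:, :-1].
    overlap = prev_chunk.shape[0] - 1
    diff = candidates[:, :overlap] - prev_chunk[1:]
    # Hypothetical choice: near-term disagreement counts more than far-term.
    weights = decay ** np.arange(overlap)
    return (weights * (diff ** 2).sum(axis=-1)).sum(axis=-1)

def forward_loss(candidates, positives, negatives):
    """Assumed forward-contrast score (lower is better).

    positives / negatives: (M, H, D) reference chunks standing in for
    likely and unlikely future plans (hypothetical inputs).
    """
    def mean_dist(a, b):
        d = a[:, None] - b[None]  # (N, M, H, D) pairwise differences
        return (d ** 2).sum(axis=(-1, -2)).mean(axis=1)
    # Close to the positive set and far from the negative set is preferred.
    return mean_dist(candidates, positives) - mean_dist(candidates, negatives)

def select_chunk(candidates, prev_chunk, positives, negatives, decay=0.5):
    """Return the index of the candidate with the lowest combined score."""
    total = backward_loss(candidates, prev_chunk, decay)
    total = total + forward_loss(candidates, positives, negatives)
    return int(np.argmin(total))

# Toy example: three constant chunks of horizon 4 with 2-D actions.
prev = np.zeros((4, 2))
cands = np.stack([np.zeros((4, 2)), np.ones((4, 2)), 5 * np.ones((4, 2))])
pos = np.zeros((2, 4, 2))
neg = 5 * np.ones((2, 4, 2))
best = select_chunk(cands, prev, pos, neg)
```

In this toy setup the first candidate matches both the previous chunk and the positive set, so it is the one selected. In closed-loop use, only the first action of the selected chunk would be executed before resampling at the next timestep.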

Abstract (original)

Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its effects on the learned policy remain inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the divergence between a learner and a demonstrator. We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. At each timestep, BID samples multiple candidate predictions and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions; (ii) forward contrast, which seeks samples of high likelihood for future plans. By coupling decisions within and across action chunks, BID promotes both long-term consistency and short-term reactivity. Experimental results show that our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks. Code and videos are available at https://bid-robot.github.io.

arXiv information

Authors	Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Maximilian Du, Chelsea Finn
Published	2025-04-25 17:27:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
