TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

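Abstract (translation)

Synthesizing diverse, physically plausible human-scene interactions (HSI) is crucial for both computer animation and embodied AI.
Despite encouraging progress, current methods focus mainly on developing separate controllers, each specialized for one interaction task.
This greatly limits their ability to handle the wide range of challenging HSI tasks that require combining multiple skills, such as sitting down while carrying an object.
To address this problem, we present TokenHSI, a single unified transformer-based policy capable of multi-skill unification and flexible adaptation.
The key insight is to model humanoid proprioception as a separate shared token and to combine it with distinct task tokens through a masking mechanism.
Such a unified policy enables effective knowledge sharing across skills and thereby facilitates multi-task training.
In addition, the policy architecture supports variable-length inputs, so learned skills can be adapted flexibly to new scenarios.
By training additional task tokenizers, we can not only modify the geometry of interaction targets but also coordinate multiple skills to address complex tasks.
Experiments show that our approach can significantly improve versatility, adaptability, and extensibility across a variety of HSI tasks.
Website: https://liangpan99.github.io/TokenHSI/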

Abstract (original)

Synthesizing diverse and physically plausible Human-Scene Interactions (HSI) is pivotal for both computer animation and embodied AI. Despite encouraging progress, current methods mainly focus on developing separate controllers, each specialized for a specific interaction task. This significantly hinders the ability to tackle a wide variety of challenging HSI tasks that require the integration of multiple skills, e.g., sitting down while carrying an object. To address this issue, we present TokenHSI, a single, unified transformer-based policy capable of multi-skill unification and flexible adaptation. The key insight is to model the humanoid proprioception as a separate shared token and combine it with distinct task tokens via a masking mechanism. Such a unified policy enables effective knowledge sharing across skills, thereby facilitating the multi-task training. Moreover, our policy architecture supports variable length inputs, enabling flexible adaptation of learned skills to new scenarios. By training additional task tokenizers, we can not only modify the geometries of interaction targets but also coordinate multiple skills to address complex tasks. The experiments demonstrate that our approach can significantly improve versatility, adaptability, and extensibility in various HSI tasks. Website: https://liangpan99.github.io/TokenHSI/
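
The masking idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single attention head, omits all learned projections and the transformer layers, and uses made-up 4-dimensional token values. The helper names (`masked_attention`, `fuse`) and the "climb" task are hypothetical. It only shows how a shared proprioception token can attend over a variable number of task tokens while masked-out tokens contribute nothing.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(query, tokens, active):
    """One query attends over tokens; inactive tokens receive zero weight."""
    scale = math.sqrt(len(query))
    scores = [dot(query, tok) / scale if on else float("-inf")
              for tok, on in zip(tokens, active)]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(len(query))]

def fuse(proprio, task_tokens, active_tasks):
    """Combine the shared proprioception token with the active task tokens."""
    names = list(task_tokens)
    tokens = [proprio] + [task_tokens[n] for n in names]
    # The proprioception token is always visible; task tokens are masked per task.
    active = [True] + [n in active_tasks for n in names]
    return masked_attention(proprio, tokens, active)

# Illustrative values only (assumed, not taken from the paper).
proprio = [1.0, 0.0, 0.5, -0.5]
task_tokens = {
    "sit":   [0.2, 0.9, -0.1, 0.0],
    "carry": [-0.3, 0.4, 0.8, 0.1],
}
sit_only = fuse(proprio, task_tokens, {"sit"})
sit_and_carry = fuse(proprio, task_tokens, {"sit", "carry"})

# A token for a new (hypothetical) task lengthens the input, but the output
# for existing tasks is unchanged while the new token stays masked out.
task_tokens["climb"] = [5.0, -5.0, 5.0, -5.0]
sit_only_extended = fuse(proprio, task_tokens, {"sit"})
```

In this sketch, activating both "sit" and "carry" changes the fused output, whereas adding the masked "climb" token does not. That mirrors, in a very reduced form, how variable-length inputs could let new task tokens be added without disturbing skills that are already learned.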

arXiv information

Authors	Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang
Published	2025-03-25 17:57:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
