TAAT: Think and Act from Arbitrary Texts in Text2Motion

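Abstract

Text to Motion aims to generate human motion from text.
Existing settings assume that the text contains action labels, which limits flexibility in real-world scenarios.
This paper extends the task under the more realistic assumption that the input text is arbitrary.
Specifically, in our setting arbitrary texts comprise both the existing action texts, which are composed of action labels, and newly introduced scene texts, which carry no explicit action labels.
To address this practical problem, we extend the action texts of the HUMANML3D dataset with additional scene texts, creating a new dataset, HUMANML3D++.
We also propose a simple framework that uses a Large Language Model (LLM) to extract action representations from arbitrary texts and then generates motions.
Furthermore, we enhance the existing evaluation methods to address their shortcomings.
Extensive experiments under different application scenarios validate the effectiveness of the proposed framework on both the existing and the proposed datasets.
The results show that Text to Motion in this realistic setting is very challenging, which encourages new research in this practical direction.
The dataset and code will be released.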

Abstract (original)

Text to Motion aims to generate human motions from texts. Existing settings assume that texts include action labels, which limits flexibility in practical scenarios. This paper extends this task with a more realistic assumption that the texts are arbitrary. Specifically, in our setting, arbitrary texts include existing action texts composed of action labels and introduce scene texts without explicit action labels. To address this practical issue, we extend the action texts in the HUMANML3D dataset by incorporating additional scene texts, thereby creating a new dataset, HUMANML3D++. Concurrently, we propose a simple framework that extracts action representations from arbitrary texts using a Large Language Model (LLM) and subsequently generates motions. Furthermore, we enhance the existing evaluation methodologies to address their inadequacies. Extensive experiments are conducted under different application scenarios to validate the effectiveness of the proposed framework on existing and proposed datasets. The results indicate that Text to Motion in this realistic setting is very challenging, fostering new research in this practical direction. Our dataset and code will be released.

arXiv information

Authors	Runqi Wang, Caoyuan Ma, Guopeng Li, Zheng Wang
Published	2024-08-27 13:36:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
