Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Zero Shot Action Generation

Summary

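We introduce Action-GPT, a plug-and-play framework for incorporating large language models (LLMs) into text-based action generation models.
Action phrases in current motion capture datasets contain only minimal, to-the-point information.
By carefully crafting prompts for LLMs, we generate richer, fine-grained descriptions of each action.
We show that using these detailed descriptions in place of the original action phrases leads to better alignment between the text and motion spaces.
Our experiments show qualitative and quantitative improvements in the quality of the synthesized motions produced by recent text-to-motion models.
Code, pretrained models, and sample videos will be made available at https://actiongpt.github.io.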

Summary (original)

We introduce Action-GPT, a plug and play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models. Code, pretrained models and sample videos will be made available at https://actiongpt.github.io

arXiv information

Authors	Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
Published	2022-11-28 17:57:48+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
