AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

Summary

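Title: AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

Summary:

– Generating realistic, contextually relevant co-speech gestures is a challenging yet increasingly important task in building multimodal artificial agents.
– Prior methods focused on learning a direct correspondence between co-speech gesture representations and the produced motions, which yielded gestures that seemed natural but often proved unconvincing in human assessment.
– This work pre-trains partial gesture sequences using a generative adversarial network with a quantization pipeline; the resulting codebook vectors serve as both input and output and form the basis for generating and reconstructing gestures.
– Learning a mapping to a latent space representation, rather than mapping directly to a vector representation, facilitates highly realistic and expressive gestures while avoiding artifacts in the generation process.
– The approach is evaluated against established co-speech gesture generation methods and existing datasets of human behavior, and an ablation study is performed; the results show that it outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing.
– The data pipeline and the generation framework are made publicly available.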

Summary (original)

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.
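To make the idea of codebook vectors concrete, the following is a minimal sketch of nearest-neighbor vector quantization, the general technique behind quantization pipelines of this kind. It is not the authors' implementation: the function name `quantize`, the codebook size (8), the latent dimension (4), and the random data are all assumptions chosen for illustration.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector to its nearest codebook entry (Euclidean distance).

    Hypothetical helper for illustration; not taken from the AQ-GT code.
    """
    # Pairwise squared distances, shape (num_latents, num_codes).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for every latent vector.
    indices = np.argmin(dists, axis=1)
    # Return the discrete indices and the corresponding codebook vectors.
    return indices, codebook[indices]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # assumed: 8 codes of dimension 4
latents = rng.normal(size=(5, 4))   # assumed: 5 encoded partial gesture sequences
indices, quantized = quantize(latents, codebook)
```

Here `indices` is the discrete representation a downstream model could predict, and `quantized` holds the codebook vectors a decoder would turn back into motion. In a trained system the codebook is learned jointly with an encoder and decoder rather than sampled at random.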

arXiv information

Authors	Hendric Voß, Stefan Kopp
Published	2023-05-02 07:59:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
