GesGPT: Speech Gesture Synthesis With Text Parsing from GPT

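Summary

Gesture synthesis has attracted considerable attention as an important research area. It focuses on generating contextually appropriate, natural gestures that correspond to speech or text input.
Although deep learning-based approaches have made remarkable progress, they often overlook the rich semantic information contained in the text, which leads to less expressive and less meaningful gestures.
GesGPT is a new approach to gesture generation that exploits the semantic analysis capabilities of large language models (LLMs) such as GPT.
Building on the strength of LLMs in text analysis, the authors design prompts that extract gesture-related information from text input.
The method develops prompt principles that turn gesture generation into an intention classification problem solved with GPT. It then uses a curated gesture library and an integration module to produce semantically rich co-speech gestures.
Experimental results show that GesGPT effectively generates contextually appropriate and expressive gestures, offering a new perspective on semantic co-speech gesture generation.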

Summary (original)

Gesture synthesis has gained significant attention as a critical research area, focusing on producing contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. We propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of Large Language Models (LLMs), such as GPT. By capitalizing on the strengths of LLMs for text analysis, we design prompts to extract gesture-related information from textual input. Our method entails developing prompt principles that transform gesture generation into an intention classification problem based on GPT, and utilizing a curated gesture library and integration module to produce semantically rich co-speech gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures, offering a new perspective on semantic co-speech gesture generation.
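The abstract outlines a pipeline that runs from text to a GPT prompt, then to an intention label, then to a gesture clip from a curated library. Below is a minimal sketch of that flow, written under loudly stated assumptions. The intention labels, the `GESTURE_LIBRARY` contents, the prompt wording, and the `fake_llm` stub that stands in for a real GPT call are all hypothetical and do not come from the paper. The "integration module" is reduced here to a fallback clip for phrases with no semantic gesture.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL intention labels; the paper's actual taxonomy is not given in the abstract.
INTENTS = ("greeting", "pointing_self", "pointing_other", "negation", "emphasis", "none")

# HYPOTHETICAL curated gesture library: intention label -> gesture clip identifier.
GESTURE_LIBRARY = {
    "greeting": "wave_right_hand",
    "pointing_self": "hand_to_chest",
    "pointing_other": "point_forward",
    "negation": "head_shake_palm_out",
    "emphasis": "beat_both_hands",
}


@dataclass
class GestureEvent:
    phrase: str  # text span the gesture accompanies
    clip: str    # gesture clip chosen from the library


def build_prompt(sentence: str) -> str:
    """Turn gesture planning into an intention classification request (illustrative wording)."""
    labels = ", ".join(INTENTS)
    return (
        "Classify the communicative intention of each phrase in the sentence "
        f"into one of: {labels}.\n"
        "Answer one line per phrase as: phrase | label\n"
        f"Sentence: {sentence}"
    )


def parse_response(response: str) -> list[tuple[str, str]]:
    """Keep only well-formed 'phrase | label' lines whose label is a known intention."""
    pairs = []
    for line in response.strip().splitlines():
        if "|" not in line:
            continue
        phrase, label = (part.strip() for part in line.split("|", 1))
        if label in INTENTS:
            pairs.append((phrase, label))
    return pairs


def plan_gestures(sentence: str, llm: Callable[[str], str],
                  default_clip: str = "idle_beat") -> list[GestureEvent]:
    """Map each classified phrase to a library clip, falling back to a default clip."""
    pairs = parse_response(llm(build_prompt(sentence)))
    return [GestureEvent(phrase, GESTURE_LIBRARY.get(label, default_clip))
            for phrase, label in pairs]


def fake_llm(prompt: str) -> str:
    # HYPOTHETICAL stub standing in for a GPT call; returns a canned classification.
    return ("Hello everyone | greeting\n"
            "I | pointing_self\n"
            "really disagree | negation\n"
            "with this | none")


events = plan_gestures("Hello everyone, I really disagree with this.", fake_llm)
for event in events:
    print(f"{event.phrase!r} -> {event.clip}")
```

Passing the model as a plain `Callable[[str], str]` keeps the sketch independent of any particular LLM client. A real GPT call could replace `fake_llm` without changing the parsing or library lookup.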

arXiv information

Authors	Nan Gao, Zeyu Zhao, Zhi Zeng, Shuwu Zhang, Dongdong Weng
Published	2023-03-23 03:30:30+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
