AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation

Summary

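Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tone and facial expressions.
However, progress toward multimodal computational methods for sarcasm detection is hampered by data scarcity.
To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation).
The approach uses the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy.
In the first phase, varied text samples are generated through back translation via several secondary languages.
In the second phase, a FastSpeech 2-based speech synthesis system is refined and tailored specifically to sarcasm so that sarcastic intonation is retained.
Together with a cloud-based Text-to-Speech (TTS) service, this fine-tuned FastSpeech 2 system produces the audio corresponding to the augmented text.
We also investigate several attention mechanisms for merging text and audio data and find self-attention to be the most efficient for bimodal integration.
Our experiments show that combining augmentation with attention achieves an F1-score of 81.0% on the text-audio modalities, surpassing even models that use three modalities from the MUStARD dataset.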
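
The back-translation phase can be illustrated with a minimal sketch. It assumes a hypothetical `translate(text, source_lang, target_lang)` callable standing in for whatever machine translation system is used; the summary does not say which system or which pivot languages the authors chose, so `toy_translate` and the language codes below are placeholders for illustration only.

```python
from typing import Callable, Iterable, List

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, pivots: Iterable[str], translate: Translator,
                   source_lang: str = "en") -> List[str]:
    """Return distinct paraphrases of `text`, one round trip per pivot language."""
    variants: List[str] = []
    seen = {text.strip().lower()}
    for pivot in pivots:
        forward = translate(text, source_lang, pivot)
        restored = translate(forward, pivot, source_lang)
        key = restored.strip().lower()
        if key not in seen:  # skip round trips that reproduce an earlier text
            seen.add(key)
            variants.append(restored)
    return variants


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT system: a tiny lookup table, identity otherwise."""
    table = {
        ("en", "de", "Oh great, another meeting."): "Na toll, noch ein Meeting.",
        ("de", "en", "Na toll, noch ein Meeting."): "Oh wonderful, yet another meeting.",
    }
    return table.get((src, tgt, text), text)


augmented = back_translate("Oh great, another meeting.", ["de", "fr"], toy_translate)
print(augmented)
```

In this toy run only the German round trip yields a new wording; the French round trip returns the input unchanged and is filtered out. A real pipeline would also need to check that the paraphrase keeps the sarcastic reading, which this sketch does not attempt.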

Summary (original)

Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tones and facial expressions. The progression towards multimodal computational methods in sarcasm detection, however, faces challenges due to the scarcity of data. To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation). This approach utilizes the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy. The first phase involves generating varied text samples through Back Translation from several secondary languages. The second phase involves the refinement of a FastSpeech 2-based speech synthesis system, tailored specifically for sarcasm to retain sarcastic intonations. Alongside a cloud-based Text-to-Speech (TTS) service, this Fine-tuned FastSpeech 2 system produces corresponding audio for the text augmentations. We also investigate various attention mechanisms for effectively merging text and audio data, finding self-attention to be the most efficient for bimodal integration. Our experiments reveal that this combined augmentation and attention approach achieves a significant F1-score of 81.0% in text-audio modalities, surpassing even models that use three modalities from the MUStARD dataset.
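
As a rough illustration of self-attention over two modalities, the sketch below concatenates text and audio feature vectors into one sequence and applies single-head scaled dot-product self-attention. It is not the AMuSeD architecture, which the abstract does not specify: the identity query/key/value projections, the shared feature dimension, and the toy feature values are all assumptions made to keep the example small.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens: List[Vector]) -> List[Vector]:
    """Row i holds how much token i attends to every token in the sequence."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in tokens])
        for query in tokens
    ]


def self_attention_fuse(text: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Concatenate both modalities into one sequence and apply self-attention."""
    tokens = text + audio  # joint sequence: text can attend to audio and vice versa
    weights = attention_weights(tokens)
    dim = len(tokens[0])
    # Each output token is the attention-weighted average of all input tokens.
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]


text_feats = [[1.0, 0.0], [0.0, 1.0]]  # toy text embeddings (assumed values)
audio_feats = [[1.0, 1.0]]             # toy audio embedding (assumed values)
fused = self_attention_fuse(text_feats, audio_feats)
print(len(fused), "fused tokens of dimension", len(fused[0]))
```

A trained model would add learned projection matrices for queries, keys, and values, and would typically pool the fused sequence before a classification layer.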

arXiv information

Authors	Xiyuan Gao, Shubhi Bansal, Kushaan Gowda, Zhu Li, Shekhar Nayak, Nagendra Kumar, Matt Coler
Published	2024-12-13 12:42:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
