Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models

Abstract

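Sarcasm detection, an important research direction in Natural Language Processing (NLP), has attracted widespread attention.
Traditional sarcasm detection has typically focused on single-modal approaches (e.g., text), but because sarcasm is implicit and subtle, such methods often fail to yield satisfactory results.
In recent years, researchers have shifted the focus of sarcasm detection to multi-modal approaches.
However, effectively leveraging multi-modal information to accurately identify sarcastic content remains a challenge that warrants further exploration.
Leveraging the powerful integrated processing capabilities of Multi-Modal Large Language Models (MLLMs) across diverse information sources, the authors propose an innovative multi-modal framework, Commander-GPT.
Inspired by military strategy, the framework first decomposes the sarcasm detection task into six distinct sub-tasks.
A central commander (decision-maker) then assigns the best-suited large language model to each specific sub-task.
Finally, the detection results from each model are aggregated to identify sarcasm.
Extensive experiments were conducted on MMSD and MMSD 2.0 using four multi-modal large language models and six prompting strategies.
The experiments show that the approach achieves state-of-the-art performance, with a 19.3% improvement in F1 score, without requiring fine-tuning or ground-truth rationales.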

Abstract (original)

Sarcasm detection, as a crucial research direction in the field of Natural Language Processing (NLP), has attracted widespread attention. Traditional sarcasm detection tasks have typically focused on single-modal approaches (e.g., text), but due to the implicit and subtle nature of sarcasm, such methods often fail to yield satisfactory results. In recent years, researchers have shifted the focus of sarcasm detection to multi-modal approaches. However, effectively leveraging multi-modal information to accurately identify sarcastic content remains a challenge that warrants further exploration. Leveraging the powerful integrated processing capabilities of Multi-Modal Large Language Models (MLLMs) for various information sources, we propose an innovative multi-modal Commander-GPT framework. Inspired by military strategy, we first decompose the sarcasm detection task into six distinct sub-tasks. A central commander (decision-maker) then assigns the best-suited large language model to address each specific sub-task. Ultimately, the detection results from each model are aggregated to identify sarcasm. We conducted extensive experiments on MMSD and MMSD 2.0, utilizing four multi-modal large language models and six prompting strategies. Our experiments demonstrate that our approach achieves state-of-the-art performance, with a 19.3% improvement in F1 score, without necessitating fine-tuning or ground-truth rationales.
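The decompose, assign, and aggregate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not name the six sub-tasks, say how the commander scores models, or say how results are aggregated, so the sub-task names (`subtask_1` to `subtask_6`), the suitability scores, the stub workers `model_a` and `model_b`, and the majority-vote aggregation are all hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical placeholders: the abstract does not name the six sub-tasks.
SUBTASKS = [f"subtask_{i}" for i in range(1, 7)]

# A worker takes (sub-task name, sample) and returns a label.
# Real MLLM calls are replaced by stubs in this sketch.
Worker = Callable[[str, dict], str]


def assign_workers(suitability: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Commander step: for each sub-task, pick the model with the highest score."""
    return {task: max(scores, key=scores.get) for task, scores in suitability.items()}


def detect(sample: dict, assignment: Dict[str, str], workers: Dict[str, Worker]) -> str:
    """Run each sub-task on its assigned worker and aggregate the labels.

    Majority voting is an assumption made for this sketch; ties fall back
    to "not_sarcastic".
    """
    votes = Counter(workers[model](task, sample) for task, model in assignment.items())
    return "sarcastic" if votes["sarcastic"] > votes["not_sarcastic"] else "not_sarcastic"


# Stub workers standing in for multi-modal models (purely illustrative).
def model_a(subtask: str, sample: dict) -> str:
    return "sarcastic" if "great" in sample["text"].lower() else "not_sarcastic"


def model_b(subtask: str, sample: dict) -> str:
    return "not_sarcastic"


WORKERS = {"model_a": model_a, "model_b": model_b}

# Made-up suitability scores: model_a is preferred for the first four
# sub-tasks, model_b for the last two.
SUITABILITY = {task: {"model_a": 0.8, "model_b": 0.6} for task in SUBTASKS[:4]}
SUITABILITY.update({task: {"model_a": 0.3, "model_b": 0.7} for task in SUBTASKS[4:]})

assignment = assign_workers(SUITABILITY)
label = detect({"text": "Great, another Monday.", "image": None}, assignment, WORKERS)
print(assignment)
print(label)
```

With these made-up scores, four sub-tasks go to `model_a` and two to `model_b`, so the example sentence receives four "sarcastic" votes out of six. In a real system each worker would wrap a prompted MLLM call, and the aggregation rule could be anything from voting to a final judgment by the commander model.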

arXiv information

Authors	Yazhou Zhang, Chunwang Zou, Bo Wang, Jing Qin
Published	2025-03-25 04:33:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
