Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

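Summary

Discovering the semantics of multimodal utterances is essential for understanding human language and improving human-machine interaction.
Existing methods make limited use of nonverbal information when identifying complex semantics in unsupervised settings.
This paper introduces UMC, a novel unsupervised multimodal clustering method, which the authors present as a pioneering contribution to this field.
UMC constructs augmentation views of multimodal data and uses them for pre-training, which yields well-initialized representations for the subsequent clustering.
It dynamically selects high-quality samples to guide representation learning, where quality is measured by the density of each sample's nearest neighbors.
It also automatically determines the optimal value of the top-$K$ parameter in each cluster to refine sample selection.
Finally, both high- and low-quality samples are used to learn representations conducive to effective clustering.
The authors build baselines on benchmark multimodal intent and dialogue act datasets.
UMC improves clustering metrics by 2-6\% over state-of-the-art methods, which the authors describe as the first successful endeavor in this domain.
The complete code and data are available at https://github.com/thuiar/UMC.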

Summary (original)

Discovering the semantics of multimodal utterances is essential for understanding human language and enhancing human-machine interactions. Existing methods manifest limitations in leveraging nonverbal information for discerning complex semantics in unsupervised scenarios. This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used to perform pre-training to establish well-initialized representations for subsequent clustering. An innovative strategy is proposed to dynamically select high-quality samples as guidance for representation learning, gauged by the density of each sample’s nearest neighbors. Besides, it is equipped to automatically determine the optimal value for the top-$K$ parameter in each cluster to refine sample selection. Finally, both high- and low-quality samples are used to learn representations conducive to effective clustering. We build baselines on benchmark multimodal intent and dialogue act datasets. UMC shows remarkable improvements of 2-6\% scores in clustering metrics over state-of-the-art methods, marking the first successful endeavor in this domain. The complete code and data are available at https://github.com/thuiar/UMC.
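The density-based sample selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `knn_density` and `select_high_quality`, the inverse-mean-distance density measure, and the fixed `k` and `keep_ratio` parameters are all assumptions made for illustration. In particular, the abstract says UMC determines the top-$K$ value automatically per cluster, whereas this sketch uses a fixed keep ratio.

```python
import numpy as np


def knn_density(points, k):
    """Assumed density measure: inverse of the mean distance to the k nearest neighbors."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)      # a point is not its own neighbor
    k = min(k, len(points) - 1)          # cannot have more neighbors than other points
    nearest = np.sort(dists, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def select_high_quality(features, labels, k=3, keep_ratio=0.5):
    """Return a boolean mask marking the densest samples within each cluster.

    keep_ratio is a fixed, hypothetical stand-in for the per-cluster
    top-K value that the paper determines automatically.
    """
    selected = np.zeros(len(features), dtype=bool)
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        if len(idx) < 2:                 # a singleton cluster has no neighbors to rank by
            selected[idx] = True
            continue
        density = knn_density(features[idx], k)
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        selected[idx[np.argsort(-density)[:n_keep]]] = True
    return selected
```

Given a feature matrix and cluster assignments (for example from k-means), `select_high_quality(features, labels, k=2, keep_ratio=0.75)` returns a mask in which isolated points far from the core of their cluster are left unselected. The selected samples would then serve as the high-quality guidance set, and the rest as the low-quality set.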

arXiv information

Authors	Hanlei Zhang, Hua Xu, Fei Long, Xin Wang, Kai Gao
Published	2024-05-21 13:24:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
