M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

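Summary

This paper introduces M&M, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA).
M&M integrates audiovisual cues through a dual-pathway architecture with dedicated streams for audio and video inputs.
Its key innovation is a cross-modality multi-head attention mechanism that fuses the different modalities for synchronized multitasking.
Another notable feature is the model's three specialized branches, each tailored to a specific cognitive load label, which enable fine-grained, task-specific analysis.
Although M&M shows modest performance compared with AVCAffe's single-task baseline, it demonstrates a promising framework for integrated multimodal processing.
This work paves the way for future enhancements of multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.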

Summary (original)

This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs. A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking. Another notable feature is the model’s three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. While it shows modest performance compared to the AVCAffe’s single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
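The abstract describes the architecture only at a high level. As a rough illustration, the pure-Python sketch below shows one way cross-modality multi-head attention could fuse an audio stream and a video stream and feed three task-specific branches. Everything in it is an assumption, not the authors' implementation: the names `CrossModalAttention` and `MMSketch`, the feature size, head count, and class count, mean pooling plus concatenation as the fusion step, and the random untrained weights. The audio and video encoders of the dual-pathway architecture are omitted; the inputs stand in for their already-encoded per-frame features.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Small random weights (untrained placeholders, not learned parameters)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def mean_pool(seq):
    """Average a sequence of feature vectors over time."""
    return [sum(col) / len(seq) for col in zip(*seq)]

class CrossModalAttention:
    """Multi-head attention: queries come from one modality, keys/values from the other."""

    def __init__(self, d_model, num_heads, rng):
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = rand_matrix(d_model, d_model, rng)
        self.w_k = rand_matrix(d_model, d_model, rng)
        self.w_v = rand_matrix(d_model, d_model, rng)

    def __call__(self, query_seq, kv_seq):
        q = matmul(query_seq, self.w_q)
        k = matmul(kv_seq, self.w_k)
        v = matmul(kv_seq, self.w_v)
        out = [[0.0] * len(q[0]) for _ in q]
        for h in range(self.num_heads):
            lo, hi = h * self.d_head, (h + 1) * self.d_head  # columns owned by head h
            for i, q_row in enumerate(q):
                # Scaled dot-product scores of this query frame against every key frame.
                scores = [sum(q_row[c] * k_row[c] for c in range(lo, hi)) / math.sqrt(self.d_head)
                          for k_row in k]
                weights = softmax(scores)
                for c in range(lo, hi):
                    out[i][c] = sum(w * v_row[c] for w, v_row in zip(weights, v))
        return out  # concatenated heads; output projection omitted for brevity

class MMSketch:
    """Hypothetical fusion model: bidirectional cross-attention plus one branch per label."""

    def __init__(self, d_model=8, num_heads=2, num_classes=3, num_tasks=3, seed=0):
        rng = random.Random(seed)
        self.audio_to_video = CrossModalAttention(d_model, num_heads, rng)  # audio queries video
        self.video_to_audio = CrossModalAttention(d_model, num_heads, rng)  # video queries audio
        # One linear classifier per (hypothetical) cognitive load label.
        self.branches = [rand_matrix(2 * d_model, num_classes, rng) for _ in range(num_tasks)]

    def forward(self, audio_feats, video_feats):
        a = mean_pool(self.audio_to_video(audio_feats, video_feats))
        v = mean_pool(self.video_to_audio(video_feats, audio_feats))
        fused = [a + v]  # list concatenation: a 1 x (2 * d_model) row
        return [softmax(matmul(fused, w)[0]) for w in self.branches]

rng = random.Random(42)
audio = rand_matrix(5, 8, rng)  # 5 audio frames, 8-dim placeholder features
video = rand_matrix(4, 8, rng)  # 4 video frames, 8-dim placeholder features
probs = MMSketch().forward(audio, video)
```

Each branch returns a probability distribution over its own (hypothetical) classes. In a trained multitask model of this kind, each branch would typically correspond to one cognitive load label, with the branch losses optimized jointly over the shared fused representation.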

arXiv information

Authors	Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro
Published	2024-03-14 14:49:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
