FabuLight-ASD: Unveiling Speech Activity via Body Language

Summary

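Active speaker detection (ASD) in multimodal environments is important for a range of applications, from video conferencing to human-robot interaction.
This paper presents FabuLight-ASD, an ASD model that combines face, audio, and body pose information to improve detection accuracy and robustness.
The model builds on the existing Light-ASD framework by incorporating human pose data represented as skeleton graphs, which keeps the computational overhead to a minimum.
The authors demonstrate FabuLight-ASD's effectiveness in real-world scenarios using the Wilder Active Speaker Detection (WASD) dataset, which is known for its reliable face and body bounding box annotations.
FabuLight-ASD reaches an overall mean average precision (mAP) of 94.3%, ahead of Light-ASD, whose overall mAP across the various challenging scenarios is 93.7%.
Adding body pose information is especially beneficial: mAP improves notably in scenarios involving speech impairment, face occlusion, and background noise from human voices.
An efficiency analysis shows only a modest increase in parameter count (27.3%) and in multiply-accumulate operations (up to 2.4%), which underlines the model's efficiency and feasibility.
These findings confirm that integrating body pose data improves ASD performance.
The code and model weights for FabuLight-ASD are available at https://github.com/knowledgetechnologyuhh/FabuLight-ASD.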

Abstract (original)

Active speaker detection (ASD) in multimodal environments is crucial for various applications, from video conferencing to human-robot interaction. This paper introduces FabuLight-ASD, an advanced ASD model that integrates facial, audio, and body pose information to enhance detection accuracy and robustness. Our model builds upon the existing Light-ASD framework by incorporating human pose data, represented through skeleton graphs, which minimises computational overhead. Using the Wilder Active Speaker Detection (WASD) dataset, renowned for reliable face and body bounding box annotations, we demonstrate FabuLight-ASD’s effectiveness in real-world scenarios. Achieving an overall mean average precision (mAP) of 94.3%, FabuLight-ASD outperforms Light-ASD, which has an overall mAP of 93.7% across various challenging scenarios. The incorporation of body pose information shows a particularly advantageous impact, with notable improvements in mAP observed in scenarios with speech impairment, face occlusion, and human voice background noise. Furthermore, efficiency analysis indicates only a modest increase in parameter count (27.3%) and multiply-accumulate operations (up to 2.4%), underscoring the model’s efficiency and feasibility. These findings validate the efficacy of FabuLight-ASD in enhancing ASD performance through the integration of body pose data. FabuLight-ASD’s code and model weights are available at https://github.com/knowledgetechnologyuhh/FabuLight-ASD.
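
The abstract says that body pose is represented through skeleton graphs but gives no further detail. The following is a minimal sketch of what such a representation could look like, not the authors' implementation: the five-joint layout in `JOINTS` and `BONES`, the symmetric normalisation, and the helper names are all illustrative assumptions. Joints are graph nodes, bones are edges, and each joint's coordinates are mixed with those of its neighbours through a normalised adjacency matrix.

```python
import math

# Hypothetical upper-body skeleton. The joint set actually used by
# FabuLight-ASD is not stated in the abstract; this layout is an assumption.
JOINTS = ["head", "neck", "left_shoulder", "right_shoulder", "torso"]
BONES = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("neck", "torso"),
]


def build_adjacency(joints, bones):
    """Return a symmetric adjacency matrix with self-loops (list of lists)."""
    index = {name: i for i, name in enumerate(joints)}
    n = len(joints)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in bones:
        adj[index[a]][index[b]] = 1.0
        adj[index[b]][index[a]] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalisation: entry (i, j) is divided by sqrt(deg_i * deg_j)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def aggregate(pose, adj_norm):
    """Mix each joint's (x, y) coordinates with those of its graph neighbours."""
    n = len(pose)
    return [
        tuple(sum(adj_norm[i][j] * pose[j][k] for j in range(n)) for k in range(2))
        for i in range(n)
    ]


if __name__ == "__main__":
    adj_norm = normalize_adjacency(build_adjacency(JOINTS, BONES))
    # One frame of made-up 2D joint positions, in the order given by JOINTS.
    pose = [(0.5, 0.1), (0.5, 0.3), (0.3, 0.35), (0.7, 0.35), (0.5, 0.6)]
    for name, coords in zip(JOINTS, aggregate(pose, adj_norm)):
        print(name, coords)
```

The graph here has one node per joint and a handful of edges, so an aggregation step like this is small next to a convolution over face crops, which is at least consistent with the low overhead the abstract reports.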

arXiv information

Authors	Hugo Carneiro, Stefan Wermter
Published	2024-12-09 17:55:00+00:00

Sources and services used

arxiv.jp, Google
