AVSegFormer: Audio-Visual Segmentation with Transformer

Summary

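Combining audio and vision has long been a topic of interest in the multimodal community.
Recently, a new audio-visual segmentation (AVS) task was introduced, which aims to locate and segment the sounding objects in a given video.
This task is the first to require audio-driven, pixel-level scene understanding, which poses significant challenges.
This paper proposes AVSegFormer, a new framework for the AVS task that leverages the transformer architecture.
Specifically, audio queries and learnable queries are introduced into the transformer decoder, enabling the network to selectively attend to the visual features of interest.
In addition, an audio-visual mixer dynamically adjusts visual features by amplifying relevant spatial channels and suppressing irrelevant ones.
An intermediate mask loss is also devised to strengthen supervision of the decoder, encouraging the network to produce more accurate intermediate predictions.
Extensive experiments show that AVSegFormer achieves state-of-the-art results on the AVS benchmark.
The code is available at https://github.com/vvvb-github/AVSegFormer.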
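
The audio-visual mixer is described only at a high level here, so the following is a minimal sketch of the general idea (audio-conditioned channel reweighting), not the authors' implementation. The function name audio_gated_channels, the linear projection parameters weight and bias, and the 2x gate scaling are all assumptions made for illustration; the released code should be consulted for the actual design.

```python
import numpy as np

def audio_gated_channels(visual, audio, weight, bias):
    """Rescale visual channels with an audio-derived gate (illustrative only).

    visual: (C, H, W) visual feature map
    audio:  (D,) audio embedding
    weight: (C, D) and bias: (C,) are hypothetical projection parameters
    """
    logits = weight @ audio + bias            # one score per channel, shape (C,)
    gate = 1.0 / (1.0 + np.exp(-logits))      # sigmoid, values in (0, 1)
    # Assumption: scale by 2 * gate so that gate > 0.5 amplifies a channel
    # and gate < 0.5 suppresses it; gate == 0.5 leaves it unchanged.
    return visual * (2.0 * gate)[:, None, None]

# Usage: channel 0 is judged relevant to the audio, channel 1 is not.
visual = np.ones((2, 2, 2))
audio = np.array([1.0])
weight = np.array([[10.0], [-10.0]])
bias = np.zeros(2)
mixed = audio_gated_channels(visual, audio, weight, bias)
```

In this toy setup the first channel is scaled up and the second is pushed toward zero, which mirrors the "amplify relevant, suppress irrelevant" behavior the summary describes.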

Abstract (original)

The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture. Specifically, we introduce audio queries and learnable queries into the transformer decoder, enabling the network to selectively attend to interested visual features. Besides, we present an audio-visual mixer, which can dynamically adjust visual features by amplifying relevant and suppressing irrelevant spatial channels. Additionally, we devise an intermediate mask loss to enhance the supervision of the decoder, encouraging the network to produce more accurate intermediate predictions. Extensive experiments demonstrate that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.
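
To make the idea of an intermediate mask loss concrete, here is a small sketch of auxiliary supervision on intermediate decoder predictions. It is an assumption-laden illustration rather than the paper's formulation: the choice of Dice loss, the averaging over intermediate masks, and the aux_weight parameter are all hypothetical.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a soft mask and a binary target of the same shape."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def total_mask_loss(final_pred, intermediate_preds, target, aux_weight=0.1):
    """Final-mask loss plus a weighted average loss over intermediate masks.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    main = dice_loss(final_pred, target)
    if not intermediate_preds:
        return main
    aux = sum(dice_loss(p, target) for p in intermediate_preds) / len(intermediate_preds)
    return main + aux_weight * aux

# Usage: a perfect final mask with one completely wrong intermediate mask.
target = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = total_mask_loss(target, [1.0 - target], target)
```

Because the intermediate term is added to the main loss, a decoder that only gets the final mask right is still penalized for poor intermediate predictions.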

arXiv information

Authors	Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu
Published	2023-07-03 16:37:10+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
