AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

Abstract

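This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge.
The task is to learn a mapping from audio samples to their corresponding action labels.
To this end, we propose AudioInceptionNeXt, a simple yet effective single-stream CNN-based architecture that operates on the time-frequency log-mel-spectrogram of the audio samples.
Motivated by the design of InceptionNeXt, the AudioInceptionNeXt block uses parallel multi-scale depthwise separable convolutional kernels, which enable the model to learn time and frequency information more effectively.
The large-scale separable kernels capture long-duration activities and global frequency semantics, while the small-scale separable kernels capture short-duration activities and local frequency details.
Our approach achieved 55.43% top-1 accuracy on the challenge test set, ranking 1st on the public leaderboard.
The code is available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.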
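
The idea of parallel multi-scale separable kernels can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the authors' code: the kernel sizes (3 and 11), the fixed averaging weights standing in for learned weights, the single-channel input, the averaging of branch outputs, and the helper names (`conv1d_same`, `conv_along_time`, `conv_along_freq`, `multi_scale_block`, `separable_params`). Each branch filters a time-by-frequency array first along time (a k x 1 kernel) and then along frequency (a 1 x k kernel).

```python
# Illustrative sketch only: kernel sizes, averaging weights, and the way the
# branches are combined are assumptions, not the authors' implementation.

def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; expects an odd kernel length
    so that the output has the same length as the input."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def conv_along_time(spec, kernel):
    """Apply a k x 1 kernel: filter each frequency bin across time frames."""
    n_time, n_freq = len(spec), len(spec[0])
    columns = [
        conv1d_same([spec[t][f] for t in range(n_time)], kernel)
        for f in range(n_freq)
    ]
    return [[columns[f][t] for f in range(n_freq)] for t in range(n_time)]


def conv_along_freq(spec, kernel):
    """Apply a 1 x k kernel: filter each time frame across frequency bins."""
    return [conv1d_same(row, kernel) for row in spec]


def multi_scale_block(spec, kernel_sizes=(3, 11)):
    """Run one separable branch (time, then frequency) per kernel size and
    average the branch outputs element-wise."""
    n_time, n_freq = len(spec), len(spec[0])
    branches = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # averaging weights stand in for learned ones
        branches.append(conv_along_freq(conv_along_time(spec, kernel), kernel))
    return [
        [sum(b[t][f] for b in branches) / len(branches) for f in range(n_freq)]
        for t in range(n_time)
    ]


def separable_params(k):
    """Per-channel weight counts: (k x 1 plus 1 x k pair, full k x k kernel)."""
    return 2 * k, k * k


# Toy "spectrogram": 16 time frames x 8 frequency bins with a single impulse.
spec = [[0.0] * 8 for _ in range(16)]
spec[8][4] = 1.0
out = multi_scale_block(spec)
```

In this toy input, the small-kernel branch responds only in the immediate neighbourhood of the impulse, while the large-kernel branch spreads the response across many time frames and all frequency bins, which mirrors the short-duration versus long-duration distinction drawn above. Splitting a kernel of size k into a k x 1 and a 1 x k pair also needs 2k weights per channel instead of k squared.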

Abstract (original)

This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-spectrogram of the audio samples. Motivated by the design of the InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional kernels in the AudioInceptionNeXt block, which enable the model to learn the time and frequency information more effectively. The large-scale separable kernels capture the long duration of activities and the global frequency semantic information, while the small-scale separable kernels capture the short duration of activities and local details of frequency information. Our approach achieved 55.43% of top-1 accuracy on the challenge test set, ranked as 1st on the public leaderboard. Codes are available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.

arXiv information

Authors	Kin Wai Lau, Yasar Abbas Ur Rehman, Yuyang Xie, Lan Ma
Published	2023-07-14 10:39:05+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
