FlexiAST: Flexibility is What AST Needs

Summary

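This work aims to give Audio Spectrogram Transformers (ASTs) patch-size flexibility.
Recent ASTs have shown superior performance on a variety of audio-based tasks.
However, the performance of a standard AST degrades drastically when it is evaluated with a patch size different from the one used during training.
As a result, AST models are typically retrained whenever the patch size changes.
To overcome this limitation, the paper proposes FlexiAST, a training procedure that makes standard AST models flexible without architectural changes, so that they can operate with various patch sizes at inference.
The training approach simply combines random patch-size selection with resizing of the patch and positional embedding weights.
Experiments show that FlexiAST performs similarly to standard AST models while retaining its ability to be evaluated at various patch sizes, across different datasets for audio classification tasks.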
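
The two ingredients named in the summary, random patch-size selection and resizing of the patch and positional embedding weights, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: plain corner-aligned bilinear interpolation as the resizing method (the authors may use a different one), a single-channel patch kernel, one scalar positional value per grid cell, and hypothetical names, candidate patch sizes, and spectrogram shape.

```python
import random


def bilinear_resize(grid, new_h, new_w):
    """Resize a 2D list of floats to (new_h, new_w) using
    corner-aligned bilinear interpolation."""
    old_h, old_w = len(grid), len(grid[0])
    out = []
    for i in range(new_h):
        # Map output row i to a fractional source row.
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            # Map output column j to a fractional source column.
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def grid_shape(spec_h, spec_w, patch):
    """Number of non-overlapping patches along frequency and time."""
    return spec_h // patch, spec_w // patch


def flexible_step(rng, patch_sizes, base_kernel, base_pos, spec_shape):
    """Pick a random patch size, then resize the base patch kernel and the
    base positional grid to match it (hypothetical helper)."""
    patch = rng.choice(patch_sizes)
    kernel = bilinear_resize(base_kernel, patch, patch)
    gh, gw = grid_shape(spec_shape[0], spec_shape[1], patch)
    pos = bilinear_resize(base_pos, gh, gw)
    return patch, kernel, pos


# Illustrative setup (all values are assumptions, not from the paper).
rng = random.Random(0)
base_patch = 16
spec_shape = (128, 1024)  # e.g. 128 mel bins x 1024 time frames
base_kernel = [[float(r * base_patch + c) for c in range(base_patch)]
               for r in range(base_patch)]
base_gh, base_gw = grid_shape(spec_shape[0], spec_shape[1], base_patch)
base_pos = [[float(r + c) for c in range(base_gw)] for r in range(base_gh)]

patch, kernel, pos = flexible_step(
    rng, [8, 16, 32], base_kernel, base_pos, spec_shape
)
```

In a real model the same resize would be applied to every channel of the patch-embedding weights and every dimension of the positional embeddings at each training step, so a single set of underlying weights serves all patch sizes.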

Abstract (original)

The objective of this work is to give patch-size flexibility to Audio Spectrogram Transformers (AST). Recent advancements in ASTs have shown superior performance in various audio-based tasks. However, the performance of standard ASTs degrades drastically when evaluated using different patch sizes from that used during training. As a result, AST models are typically re-trained to accommodate changes in patch sizes. To overcome this limitation, this paper proposes a training procedure to provide flexibility to standard AST models without architectural changes, allowing them to work with various patch sizes at the inference stage – FlexiAST. This proposed training approach simply utilizes random patch size selection and resizing of patch and positional embedding weights. Our experiments show that FlexiAST gives similar performance to standard AST models while maintaining its evaluation ability at various patch sizes on different datasets for audio classification tasks.

arXiv information

Authors	Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak
Published	2023-07-18 14:30:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
