Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

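Summary

We propose audio difference captioning (ADC), a new extension of audio captioning that describes the semantic differences between a pair of similar but slightly different audio clips.
ADC addresses a limitation of conventional audio captioning, which sometimes generates similar captions for similar audio clips and so fails to describe how their content differs.
We also propose a cross-attention-concentrated transformer encoder, which extracts differences by comparing the two audio clips, and similarity-discrepancy disentanglement, which emphasizes the difference in the latent space.
To evaluate the proposed methods, we built the AudioDiffCaps dataset, which consists of pairs of similar but slightly different audio clips with human-annotated descriptions of their differences.
Experiments on the AudioDiffCaps dataset showed that the proposed methods solve the ADC task effectively, and visualizing the attention weights in the transformer encoder showed that they improve the attention weights used to extract the difference.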
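
The summary does not say how the encoder concentrates attention on cross-clip comparisons. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is an attention mask that blocks attention between tokens of the same clip, so every token can only look at the other clip. The function names `cross_only_mask` and `masked_attention`, the single-head formulation, and the toy feature vectors are all assumptions made for this example.

```python
import math


def cross_only_mask(n_a, n_b):
    """Return mask[i][j] == True when token i may attend to token j.

    Tokens 0..n_a-1 belong to clip A and the rest to clip B. Only pairs
    drawn from different clips are allowed (an assumed masking scheme).
    """
    if n_a < 1 or n_b < 1:
        raise ValueError("both clips need at least one token")
    n = n_a + n_b
    return [[(i < n_a) != (j < n_a) for j in range(n)] for i in range(n)]


def masked_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Disallowed positions get -inf so their softmax weight is 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy example: two tokens from clip A, one token from clip B.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 1.0]]
tokens = clip_a + clip_b
mask = cross_only_mask(len(clip_a), len(clip_b))
outputs, weights = masked_attention(tokens, tokens, tokens, mask)
```

In this toy case each clip A token can only attend to the single clip B token, and the clip B token splits its attention evenly over the two clip A tokens. A real encoder would use learned projections and multiple heads.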

Summary (original)

We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips. The ADC solves the problem that conventional audio captioning sometimes generates similar captions for similar audio clips, failing to describe the difference in content. We also propose a cross-attention-concentrated transformer encoder to extract differences by comparing a pair of audio clips and a similarity-discrepancy disentanglement to emphasize the difference in the latent space. To evaluate the proposed methods, we built an AudioDiffCaps dataset consisting of pairs of similar but slightly different audio clips with human-annotated descriptions of their differences. The experiment with the AudioDiffCaps dataset showed that the proposed methods solve the ADC task effectively and improve the attention weights to extract the difference by visualizing them in the transformer encoder.

arXiv information

Authors	Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino
Published	2023-08-23 05:13:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
