Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation

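Summary

This work aims to generate captions for soccer videos using deep learning.
To that end, the paper introduces a dataset, a model, and a three-level evaluation.
The dataset consists of 22,000 caption-clip pairs and three visual features (images, optical flow, inpainting) for about 500 hours of SoccerNet videos.
The model has three parts: a transformer learns language, ConvNets learn vision, and a fusion of linguistic and visual features generates the captions.
The paper proposes evaluating generated captions at three levels: syntax (commonly used metrics such as BLEU score and CIDEr), meaning (the quality of the descriptions as judged by a domain expert), and corpus (the diversity of the generated captions).
The paper shows that semantics-related losses, which prioritize selected words, improve the diversity of generated captions (from 0.07 to 0.18).
Together, the semantics-related losses and the use of additional visual features (optical flow, inpainting) improved the normalized captioning score by 28%.
Web page for this work: https://sites.google.com/view/soccercaptioning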
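
The corpus-level evaluation mentioned in the summary measures how varied the generated captions are. The summary does not define the diversity score, so the sketch below assumes one simple possibility: the fraction of distinct captions among all generated captions. The function name `caption_diversity` and the normalization (lowercasing, collapsing whitespace) are illustrative assumptions, not the paper's definition.

```python
def caption_diversity(captions):
    """Fraction of distinct captions among all generated captions (0.0 to 1.0).

    Assumption: captions that differ only in letter case or whitespace
    count as the same caption. The paper's own metric may differ.
    """
    if not captions:
        return 0.0
    # Normalize case and whitespace before comparing captions.
    normalized = [" ".join(c.lower().split()) for c in captions]
    return len(set(normalized)) / len(normalized)


generated = [
    "Player takes a corner kick",
    "player takes a corner kick",
    "Goalkeeper saves the shot",
    "Player takes a  corner kick",
]
score = caption_diversity(generated)  # 2 distinct captions out of 4
print(score)
```

Under this assumed definition, a model that repeats a handful of stock sentences scores close to 0, while a model that produces a different caption for every clip scores 1.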

Summary (original)

This work aims at generating captions for soccer videos using deep learning. In this context, this paper introduces a dataset, a model, and a triple-level evaluation. The dataset consists of 22k caption-clip pairs and three visual features (images, optical flow, inpainting) for ~500 hours of SoccerNet videos. The model is divided into three parts: a transformer learns language, ConvNets learn vision, and a fusion of linguistic and visual features generates captions. The paper suggests evaluating generated captions at three levels: syntax (commonly used evaluation metrics such as BLEU score and CIDEr), meaning (the quality of descriptions for a domain expert), and corpus (the diversity of generated captions). The paper shows that the diversity of generated captions improved (from 0.07 to 0.18) with semantics-related losses that prioritize selected words. Semantics-related losses and the use of more visual features (optical flow, inpainting) improved the normalized captioning score by 28%. The web page of this work: https://sites.google.com/view/soccercaptioning
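
The abstract describes semantics-related losses that prioritize selected words but gives no formula. As a rough illustration, the sketch below assumes the simplest form such a loss could take: a token-level cross-entropy in which selected words carry a larger weight. The function name `weighted_caption_loss`, the weight values, and the example vocabulary are all hypothetical; the paper's actual losses may be defined differently.

```python
import math


def weighted_caption_loss(step_log_probs, target_tokens, word_weights,
                          default_weight=1.0):
    """Weighted average negative log-likelihood of a target caption.

    step_log_probs: one dict per position, mapping token -> log-probability
                    assigned by the model at that position.
    target_tokens:  the reference caption as a list of tokens.
    word_weights:   token -> weight for the prioritized words (assumed values);
                    every other token gets default_weight.
    """
    total = 0.0
    weight_sum = 0.0
    for log_probs, token in zip(step_log_probs, target_tokens):
        weight = word_weights.get(token, default_weight)
        total += -weight * log_probs[token]
        weight_sum += weight
    return total / weight_sum


# Toy example: the model is less sure about the semantically important word.
step_log_probs = [
    {"a": math.log(0.5), "goal": math.log(0.5)},
    {"a": math.log(0.75), "goal": math.log(0.25)},
]
target = ["a", "goal"]

plain = weighted_caption_loss(step_log_probs, target, {})
prioritized = weighted_caption_loss(step_log_probs, target, {"goal": 3.0})
print(plain, prioritized)
```

With the weight on "goal" raised to 3, the error on that word dominates the loss, so training would push the model harder toward getting the prioritized word right.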

arXiv information

Authors	Ahmad Hammoudeh, Bastien Vanderplaetse, Stéphane Dupont
Published	2022-11-30 12:26:31+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
