Dual-Stream Transformer for Generic Event Boundary Captioning

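Summary

This paper describes the champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition.
GEBC requires the captioning model to understand instantaneous status changes around a given video boundary, which makes it much more challenging than conventional video captioning tasks.
The paper proposes a Dual-Stream Transformer that improves both video content encoding and caption generation:
(1) Three pre-trained models are used to extract video features at different granularities.
In addition, the boundary types are used as hints to help the model generate captions.
(2) A model called the Dual-Stream Transformer is specifically designed to learn discriminative representations for boundary captioning.
(3) To generate content-relevant, human-like captions, a word-level ensemble strategy is designed to improve description quality.
Promising results on the GEBC test split demonstrate the effectiveness of the proposed model.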

Summary (original)

This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition. GEBC requires the captioning model to comprehend instantaneous status changes around the given video boundary, which makes it much more challenging than conventional video captioning tasks. In this paper, a Dual-Stream Transformer with improvements on both video content encoding and caption generation is proposed: (1) We utilize three pre-trained models to extract video features at different granularities. Moreover, we exploit the types of boundary as hints to help the model generate captions. (2) We design a model, termed the Dual-Stream Transformer, to learn discriminative representations for boundary captioning. (3) Towards generating content-relevant and human-like captions, we improve the description quality by designing a word-level ensemble strategy. The promising results on the GEBC test split demonstrate the efficacy of our proposed model.
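
The abstract does not spell out how the word-level ensemble in (3) works. One common reading, shown here purely as an illustrative sketch and not as the authors' method, is that several captioning models share one growing caption prefix, their next-word probabilities are averaged at every step, and the highest-scoring word is emitted. The `NextWordModel` interface, the function `ensemble_decode`, the `<eos>` token, and the toy models are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a "model" maps the words generated so far to a
# probability distribution over the next word.
NextWordModel = Callable[[Sequence[str]], Dict[str, float]]


def ensemble_decode(
    models: Sequence[NextWordModel], max_len: int = 20, eos: str = "<eos>"
) -> List[str]:
    """Greedy decoding that averages next-word probabilities across models.

    Assumption: equal-weight averaging; the paper may combine models differently.
    """
    if not models:
        raise ValueError("at least one model is required")
    prefix: List[str] = []
    for _ in range(max_len):
        averaged: Dict[str, float] = {}
        for model in models:
            for word, prob in model(prefix).items():
                averaged[word] = averaged.get(word, 0.0) + prob / len(models)
        best = max(averaged, key=averaged.get)
        if best == eos:  # stop once the ensemble prefers end-of-sentence
            break
        prefix.append(best)
    return prefix


# Toy stand-ins for trained captioners (made-up probabilities).
def toy_a(prefix: Sequence[str]) -> Dict[str, float]:
    table = [{"man": 0.6, "dog": 0.4}, {"jumps": 0.9, "<eos>": 0.1}]
    return table[len(prefix)] if len(prefix) < len(table) else {"<eos>": 1.0}


def toy_b(prefix: Sequence[str]) -> Dict[str, float]:
    table = [
        {"man": 0.3, "dog": 0.7},
        {"jumps": 0.2, "runs": 0.3, "<eos>": 0.5},
    ]
    return table[len(prefix)] if len(prefix) < len(table) else {"<eos>": 1.0}


caption = ensemble_decode([toy_a, toy_b])
print(" ".join(caption))
```

Because the averaging happens per word rather than per finished caption, a model that is confident about one word can outvote the others at that step only. In the toy run, `toy_b` decides the first word and `toy_a` decides the second.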

arXiv information

Authors	Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen
Published	2023-03-21 11:32:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
