ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

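Summary

In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment Queries Challenge at ECCV 2022. The goal of the task is to retrieve and localize every instance of the possible activities in egocentric videos.
The Ego4D dataset is challenging for temporal action localization because the videos are very long and each one contains multiple action instances with fine-grained action classes.
To address these problems, we use a multi-scale transformer to classify the action categories and predict the boundaries of each instance.
In addition, to better capture long-term temporal dependencies in long videos, we propose a segment-level recurrence mechanism.
Compared with feeding all video features directly into the transformer encoder, the proposed segment-level recurrence mechanism alleviates optimization difficulties and achieves better performance.
The final submission achieved a Recall@1, tIoU=0.5 score of 37.24 and an average mAP of 17.67, placing 3rd on the leaderboard.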
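
The abstract does not spell out how the segment-level recurrence works, so the following is only a minimal sketch of the general idea, assuming a Transformer-XL-style cache: a long feature sequence is split into fixed-length segments, and each segment attends over its own frames plus the cached output of the previous segment. The names `attend`, `encode_with_recurrence`, and `segment_len` are hypothetical, and the single unparameterized attention step stands in for a real transformer encoder layer.

```python
import math


def attend(queries, context):
    """Single-head dot-product attention over plain lists of vectors.

    Keys and values are both `context`; no learned projections are used,
    which is a simplification for illustration only.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, context)) / total
            for i in range(dim)
        ])
    return outputs


def encode_with_recurrence(features, segment_len):
    """Encode a long feature sequence one segment at a time.

    Assumption: the memory is the previous segment's output, reused as
    extra context for the next segment (as in Transformer-XL). The
    report may use a different design.
    """
    memory = []
    encoded = []
    for start in range(0, len(features), segment_len):
        segment = features[start:start + segment_len]
        context = memory + segment  # cached states extend the visible context
        hidden = attend(segment, context)
        encoded.extend(hidden)
        memory = hidden  # in a trained model this would be held constant (no gradient)
    return encoded


if __name__ == "__main__":
    video_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
    for vector in encode_with_recurrence(video_features, segment_len=2):
        print(vector)
```

Each attention call in this sketch sees at most two segments' worth of frames, which is one plausible reason such a scheme is easier to optimize than attending over an entire long video at once.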

Abstract (original)

In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment Queries Challenge in ECCV 2022. In this task, the goal is to retrieve and localize all instances of possible activities in egocentric videos. Ego4D dataset is challenging for the temporal action localization task as the temporal duration of the videos is quite long and each video contains multiple action instances with fine-grained action classes. To address these problems, we utilize a multi-scale transformer to classify different action categories and predict the boundary of each instance. Moreover, in order to better capture the long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism. Compared with directly feeding all video features to the transformer encoder, the proposed segment-level recurrence mechanism alleviates the optimization difficulties and achieves better performance. The final submission achieved Recall@1,tIoU=0.5 score of 37.24, average mAP score of 17.67 and took 3-rd place on the leaderboard.
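
The reported Recall@1 at tIoU=0.5 relies on temporal intersection-over-union between a predicted segment and a ground-truth segment. As a rough illustration, assuming segments are `(start, end)` pairs in seconds and that each ground-truth instance is paired with exactly one top-ranked prediction, the metric can be sketched as below. The function names are hypothetical and the official challenge evaluation may pair predictions and ground truth differently.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_predictions, ground_truths, threshold=0.5):
    """Fraction of ground-truth instances whose top-1 prediction reaches the tIoU threshold.

    Simplifying assumption: top1_predictions[i] is the highest-scoring
    prediction for ground_truths[i].
    """
    hits = sum(
        1
        for pred, gt in zip(top1_predictions, ground_truths)
        if temporal_iou(pred, gt) >= threshold
    )
    return hits / len(ground_truths)


if __name__ == "__main__":
    predictions = [(0.0, 10.0), (0.0, 10.0)]
    targets = [(0.0, 12.0), (20.0, 30.0)]
    print(recall_at_1(predictions, targets))
```

A prediction counts as a hit only when at least half of the combined extent of the two segments overlaps, so a prediction that merely touches the ground-truth interval does not count.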

arXiv information

Authors	Jiayi Shao, Xiaohan Wang, Yi Yang
Published	2022-11-17 14:28:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google