Anticipating Next Active Objects for Egocentric Videos

Summary

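This paper addresses the problem of anticipating, for a given egocentric video clip, the future location of the next active object (the object with which contact is likely to occur) before any action takes place.
The problem is considerably hard, because the goal is to estimate the position of such objects when the observed clip and the action segment are separated by a so-called "time to contact" (TTC) segment.
Many methods have been proposed to anticipate a person's actions from previous hand movements and interactions with the surroundings.
However, no prior work has investigated the next object likely to be interacted with, or its future location with respect to the first-person motion and field-of-view drift during the TTC window.
We define this as the task of Anticipating the Next ACTive Object (ANACTO).
To this end, we propose a transformer-based self-attention framework that identifies and localizes the next active object in an egocentric clip.
We benchmark our method on three datasets: EpicKitchens-100, EGTEA+, and Ego4D.
We also provide annotations for the first two datasets.
Our approach performs best compared with relevant baseline methods.
We also conduct ablation studies to understand the effectiveness of the proposed and baseline methods under varying conditions.
Code and ANACTO task annotations will be made available upon paper acceptance.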

Summary (original)

This paper addresses the problem of anticipating the next-active-object location in the future, for a given egocentric video clip where the contact might happen, before any action takes place. The problem is considerably hard, as we aim at estimating the position of such objects in a scenario where the observed clip and the action segment are separated by the so-called “time to contact” (TTC) segment. Many methods have been proposed to anticipate the action of a person based on previous hand movements and interactions with the surroundings. However, there have been no attempts to investigate the next possible interactable object, and its future location with respect to the first-person’s motion and the field-of-view drift during the TTC window. We define this as the task of Anticipating the Next ACTive Object (ANACTO). To this end, we propose a transformer-based self-attention framework to identify and locate the next-active-object in an egocentric clip. We benchmark our method on three datasets: EpicKitchens-100, EGTEA+ and Ego4D. We also provide annotations for the first two datasets. Our approach performs best compared to relevant baseline methods. We also conduct ablation studies to understand the effectiveness of the proposed and baseline methods on varying conditions. Code and ANACTO task annotations will be made available upon paper acceptance.
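The temporal layout described above (an observed clip, then a TTC gap, then the action segment) can be illustrated with a minimal sketch. This is not the authors' code: the names `ClipWindows` and `split_clip`, the use of seconds as the unit, and the clamping at the start of the video are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipWindows:
    """Timestamps (in seconds, an assumed unit) for one anticipation sample."""
    observed_start: float
    observed_end: float   # the TTC gap begins here
    action_start: float   # the TTC gap ends here; contact may happen from this point


def split_clip(action_start: float, ttc: float, observed_len: float) -> ClipWindows:
    """Place the observed clip so that it ends `ttc` seconds before the action.

    Hypothetical helper: the model would only see frames in
    [observed_start, observed_end], never frames inside the TTC gap.
    """
    if ttc < 0 or observed_len <= 0:
        raise ValueError("ttc must be >= 0 and observed_len must be > 0")
    observed_end = action_start - ttc
    # Clamp to the start of the video (an assumption about early actions).
    observed_start = max(0.0, observed_end - observed_len)
    if observed_end <= observed_start:
        raise ValueError("no observable frames before the TTC gap")
    return ClipWindows(observed_start, observed_end, action_start)


if __name__ == "__main__":
    # Action starts at 10 s, 1 s time to contact, 2 s of observed video.
    print(split_clip(action_start=10.0, ttc=1.0, observed_len=2.0))
```

Under these assumptions, a larger `ttc` moves the observed clip further from the action, which is what makes predicting the object's future location harder.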

arXiv information

Authors	Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue
Published	2023-10-31 15:42:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
