ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

Summary

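Temporal action segmentation and long-term action anticipation are two common vision tasks for analyzing actions in videos over time.
Despite their clear relevance and potential complementarity, the two problems have been studied as separate tasks.
This work tackles both problems, action segmentation and action anticipation, jointly with a unified diffusion model called ActFusion.
The key idea behind the unification is to train the model to handle both the visible and the invisible parts of a sequence in an integrated manner.
The visible part is used for temporal segmentation, and the invisible part is used for future anticipation.
To this end, a new anticipative masking strategy is introduced during training: a late part of the video frames is masked as invisible, and learnable tokens replace those frames so that the model learns to predict the invisible future.
Experimental results show bi-directional benefits between action segmentation and anticipation.
ActFusion achieves state-of-the-art performance across the standard benchmarks 50 Salads, Breakfast, and GTEA, outperforming task-specific models on both tasks with a single unified model trained through joint learning.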
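
The anticipative masking described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `anticipative_mask`, the `observed_ratio` parameter, and the use of a fixed list as a stand-in for a learnable token are all assumptions made for illustration. The sketch only shows the input-side operation, in which the late frames are replaced by a shared token and marked as invisible.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one per-frame feature vector (hypothetical representation)


def anticipative_mask(
    frames: Sequence[Frame],
    mask_token: Frame,
    observed_ratio: float,
) -> Tuple[List[Frame], List[bool]]:
    """Replace the late part of a frame sequence with a shared mask token.

    Returns the masked sequence and a per-frame visibility flag
    (True = visible, False = masked as the invisible future).
    """
    if not 0.0 < observed_ratio <= 1.0:
        raise ValueError("observed_ratio must be in (0, 1]")

    # Number of leading frames that stay visible (at least one).
    num_visible = max(1, int(len(frames) * observed_ratio))

    # Visible prefix is copied unchanged.
    masked = [list(f) for f in frames[:num_visible]]

    # Assumption: in a trained model this token would be a learnable
    # parameter; here it is a fixed vector supplied by the caller.
    masked += [list(mask_token) for _ in frames[num_visible:]]

    visible = [i < num_visible for i in range(len(frames))]
    return masked, visible


if __name__ == "__main__":
    # Eight dummy frames with 2-dimensional features.
    video = [[float(i), float(i)] for i in range(8)]
    masked_video, visible_flags = anticipative_mask(video, [-1.0, -1.0], 0.5)
    print(masked_video)
    print(visible_flags)
```

Under this reading, a model would produce segmentation labels for frames flagged as visible and anticipation labels for frames flagged as invisible, which is how a single network could serve both tasks.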

Summary (original)

Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite apparent relevance and potential complementarity, these two problems have been investigated as separate and distinct tasks. In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion. The key idea to unification is to train the model to effectively handle both visible and invisible parts of the sequence in an integrated manner; the visible part is for temporal segmentation, and the invisible part is for future anticipation. To this end, we introduce a new anticipative masking strategy during training in which a late part of the video frames is masked as invisible, and learnable tokens replace these frames to learn to predict the invisible future. Experimental results demonstrate the bi-directional benefits between action segmentation and anticipation. ActFusion achieves the state-of-the-art performance across the standard benchmarks of 50 Salads, Breakfast, and GTEA, outperforming task-specific models in both of the two tasks with a single unified model through joint learning.

arXiv information

Authors	Dayoung Gong, Suha Kwak, Minsu Cho
Published	2024-12-05 17:12:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
