Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

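Summary

Generating dynamic scene graphs from video is challenging because of the temporal dynamics of the scene and the temporal fluctuations inherent in predictions.
We hypothesize that capturing long-term temporal dependencies is the key to generating dynamic scene graphs effectively.
We propose learning long-term dependencies in a video by using transformers to capture object-level consistency and inter-object relationship dynamics over long-term object-level tracklets.
Experimental results show that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome.
Our ablation studies validate the effectiveness of each component of the proposed approach.
The source code is available at https://github.com/Shengyu-Feng/DSG-DETR.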

Summary (original)

Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We hypothesize that capturing long-term temporal dependencies is the key to effective generation of dynamic scene graphs. We propose to learn the long-term dependencies in a video by capturing the object-level consistency and inter-object relationship dynamics over object-level long-term tracklets using transformers. Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. Our ablation studies validate the effectiveness of each component of the proposed approach. The source code is available at https://github.com/Shengyu-Feng/DSG-DETR.

arXiv information

Authors	Shengyu Feng, Subarna Tripathi, Hesham Mostafa, Marcel Nassar, Somdeb Majumdar
Published	2022-10-19 16:58:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
