Learning from Streaming Video with Orthogonal Gradients

Summary

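We address the challenge of learning representations, in a self-supervised manner, from a continuous stream of video as input.
This differs from the standard approach to video learning, in which videos are chopped into clips and shuffled during training to create non-redundant batches that satisfy the independent and identically distributed (IID) sample assumption expected by conventional training paradigms.
When video is available only as a continuous stream of input, the IID assumption is clearly broken, and performance degrades.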
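The difference between shuffled and sequential batching described above can be illustrated with a small sketch. Integer frame indices stand in for decoded frames, and the helper names `chop_into_clips` and `make_batches` are hypothetical, not taken from the paper's code.

```python
import random
from typing import List, Sequence

Clip = List[int]


def chop_into_clips(frames: Sequence[int], clip_len: int) -> List[Clip]:
    """Split a frame stream into consecutive, non-overlapping clips.

    Trailing frames that do not fill a whole clip are dropped.
    """
    return [list(frames[i:i + clip_len])
            for i in range(0, len(frames) - clip_len + 1, clip_len)]


def make_batches(clips: List[Clip], batch_size: int,
                 shuffle: bool, seed: int = 0) -> List[List[Clip]]:
    """Group clips into batches, either in stream order or shuffled.

    Incomplete final batches are dropped.
    """
    order = list(range(len(clips)))
    if shuffle:
        # Only possible when the whole video is available up front.
        random.Random(seed).shuffle(order)
    return [[clips[j] for j in order[k:k + batch_size]]
            for k in range(0, len(order) - batch_size + 1, batch_size)]


# Usage: a 32-frame "video" chopped into 8 clips of 4 frames each.
clips = chop_into_clips(list(range(32)), clip_len=4)
sequential = make_batches(clips, batch_size=2, shuffle=False)
shuffled = make_batches(clips, batch_size=2, shuffle=True, seed=0)
```

With `shuffle=False`, as in the streaming setting, every batch holds adjacent clips of the same video, so both the samples within a batch and consecutive gradient steps are highly correlated. Shuffling spreads each batch across the whole video, which approximates IID sampling but requires access to the full video in advance.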
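We demonstrate this drop in performance when moving from shuffled to sequential learning on three tasks: the one-video representation learning method DoRA, standard VideoMAE on multi-video datasets, and future video prediction.
To address this drop, we propose a geometric modification to standard optimizers that decorrelates batches by using orthogonal gradients during training.
The proposed modification can be applied to any optimizer; we demonstrate it with Stochastic Gradient Descent (SGD) and AdamW.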
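As a rough illustration of decorrelating updates with orthogonal gradients, the sketch below removes from each new gradient g its component along a reference direction m, computing g_orth = g - (<g, m> / <m, m>) m, and then applies an ordinary SGD-with-momentum update. This is an assumption for illustration only: using the momentum buffer as the reference direction, the hypothetical `OrthogonalSGD` class, and the flat-list parameter representation are not taken from the paper, whose exact formulation (including how it is combined with AdamW) may differ.

```python
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(grad: Vector, ref: Vector, eps: float = 1e-12) -> Vector:
    """Return grad minus its projection onto ref: g - (<g, r> / <r, r>) * r."""
    denom = dot(ref, ref)
    if denom < eps:
        # No meaningful reference direction yet (e.g. the first step).
        return list(grad)
    coeff = dot(grad, ref) / denom
    return [g - coeff * r for g, r in zip(grad, ref)]


class OrthogonalSGD:
    """Hypothetical sketch: SGD with momentum, where each incoming gradient
    is first made orthogonal to the current momentum buffer."""

    def __init__(self, lr: float = 0.1, beta: float = 0.9) -> None:
        self.lr = lr
        self.beta = beta
        self.momentum: Optional[Vector] = None

    def step(self, params: Vector, grad: Vector) -> Vector:
        if self.momentum is None:
            self.momentum = [0.0] * len(grad)
        # Drop the part of the new gradient that repeats the accumulated direction.
        g_orth = orthogonalize(grad, self.momentum)
        # Heavy-ball momentum update, fed with the orthogonalized gradient.
        self.momentum = [self.beta * m + g for m, g in zip(self.momentum, g_orth)]
        return [p - self.lr * m for p, m in zip(params, self.momentum)]


# Usage: a redundant second gradient adds nothing new to the momentum.
opt = OrthogonalSGD(lr=0.1, beta=0.9)
params = [0.0, 0.0]
params = opt.step(params, [1.0, 0.0])  # first step: no reference, plain update
params = opt.step(params, [1.0, 0.0])  # same direction: projected to zero
params = opt.step(params, [0.0, 1.0])  # orthogonal direction: passes through
```

Feeding the same gradient twice shows the intended effect: the second, fully redundant gradient is projected to zero, so the parameters move only through the decaying momentum instead of being pushed again in the same direction, while a gradient orthogonal to the momentum passes through unchanged.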
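The proposed orthogonal optimizer mitigates the drop in representation learning performance for models trained on streaming video, as evaluated on downstream tasks.
Across all three scenarios (DoRA, VideoMAE, and future prediction), the orthogonal optimizer outperforms the strong AdamW baseline.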

Summary (original)

We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner. This differs from the standard approaches to video learning where videos are chopped and shuffled during training in order to create a non-redundant batch that satisfies the independently and identically distributed (IID) sample assumption expected by conventional training paradigms. When videos are only available as a continuous stream of input, the IID assumption is evidently broken, leading to poor performance. We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks: the one-video representation learning method DoRA, standard VideoMAE on multi-video datasets, and the task of future video prediction. To address this drop, we propose a geometric modification to standard optimizers, to decorrelate batches by utilising orthogonal gradients during training. The proposed modification can be applied to any optimizer — we demonstrate it with Stochastic Gradient Descent (SGD) and AdamW. Our proposed orthogonal optimizer allows models trained from streaming videos to alleviate the drop in representation learning performance, as evaluated on downstream tasks. On three scenarios (DoRA, VideoMAE, future prediction), we show our orthogonal optimizer outperforms the strong AdamW in all three scenarios.

arXiv information

Authors	Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Pătrăucean, João Carreira, Dima Damen, Andrew Zisserman
Published	2025-04-02 17:59:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
