Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval

Abstract

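Recent advances in interactive video generation have shown promising results, but existing approaches make limited use of historical context and therefore struggle to keep scenes consistent from memory in long video generation. This work proposes Context-as-Memory, which uses historical context as memory for video generation. It rests on two simple designs: (1) context is stored in frame format with no additional post-processing; (2) conditioning is done by concatenating the context and the frames to be predicted along the frame dimension at the input, so no external control module is needed. Because incorporating all historical context would incur enormous computational overhead, the authors also propose a Memory Retrieval module that selects the truly relevant context frames by determining the FOV (Field of View) overlap between camera poses, which greatly reduces the number of candidate frames without substantial information loss. Experiments show that Context-as-Memory achieves better memory capability than state-of-the-art methods in interactive long video generation and generalizes effectively to open-domain scenarios not seen during training. Project page: https://context-as-memory.github.io/.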

Abstract (original)

Recent advances in interactive video generation have shown promising results, yet existing approaches struggle with scene-consistent memory capabilities in long video generation due to limited use of historical context. In this work, we propose Context-as-Memory, which utilizes historical context as memory for video generation. It includes two simple yet effective designs: (1) storing context in frame format without additional post-processing; (2) conditioning by concatenating context and frames to be predicted along the frame dimension at the input, requiring no external control modules. Furthermore, considering the enormous computational overhead of incorporating all historical context, we propose the Memory Retrieval module to select truly relevant context frames by determining FOV (Field of View) overlap between camera poses, which significantly reduces the number of candidate frames without substantial information loss. Experiments demonstrate that Context-as-Memory achieves superior memory capabilities in interactive long video generation compared to SOTAs, even generalizing effectively to open-domain scenarios not seen during training. The link of our project page is https://context-as-memory.github.io/.
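The two mechanisms named in the abstract, FOV-overlap retrieval and conditioning by concatenation along the frame dimension, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how overlap is computed, so the code below assumes a simplified top-down 2D camera (position plus yaw), a hypothetical 90-degree FOV with a 10-unit viewing range, and a sampling-based overlap score. All function names (`visible`, `sample_wedge`, `fov_overlap`, `retrieve_context`, `build_input`) are made up for illustration.

```python
import numpy as np

EPS = 1e-9  # tolerance so samples on the wedge boundary count as visible


def visible(points, position, yaw, fov_deg, max_range):
    """Boolean mask: which 2D points lie inside a camera's view wedge."""
    offsets = points - np.asarray(position, dtype=float)
    dist = np.linalg.norm(offsets, axis=1)
    bearing = np.arctan2(offsets[:, 1], offsets[:, 0])
    # Wrap the angular difference into [-pi, pi) before comparing.
    delta = (bearing - yaw + np.pi) % (2 * np.pi) - np.pi
    half = np.radians(fov_deg) / 2
    return (dist > 0) & (dist <= max_range + EPS) & (np.abs(delta) <= half + EPS)


def sample_wedge(position, yaw, fov_deg, max_range, n_angles=16, n_radii=8):
    """Sample points on a regular polar grid inside a camera's view wedge."""
    half = np.radians(fov_deg) / 2
    angles = yaw + np.linspace(-half, half, n_angles)
    radii = np.linspace(max_range / n_radii, max_range, n_radii)
    a, r = np.meshgrid(angles, radii)
    pts = np.stack([r * np.cos(a), r * np.sin(a)], axis=-1).reshape(-1, 2)
    return pts + np.asarray(position, dtype=float)


def fov_overlap(target_pose, context_pose, fov_deg=90.0, max_range=10.0):
    """Fraction of the target wedge samples that the context camera also sees.

    A pose is a (position, yaw) pair; this scoring rule is an assumption.
    """
    pts = sample_wedge(*target_pose, fov_deg, max_range)
    return float(visible(pts, *context_pose, fov_deg, max_range).mean())


def retrieve_context(target_pose, context_poses, k=4):
    """Indices of up to k stored frames with the largest non-zero overlap."""
    scores = np.array([fov_overlap(target_pose, p) for p in context_poses])
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:k] if scores[i] > 0]


def build_input(context_frames, target_frames):
    """Concatenate context and to-be-predicted frames along the frame axis."""
    return np.concatenate([context_frames, target_frames], axis=0)


# Usage: five stored frames of shape (H, W, C) = (8, 8, 3) with their poses.
history = np.zeros((5, 8, 8, 3))
poses = [((0, 0), 0.0), ((0, 0), np.pi), ((1, 0), 0.1), ((50, 50), 0.0), ((0, 1), -0.2)]
target_pose = ((0.5, 0.0), 0.0)

selected = retrieve_context(target_pose, poses, k=2)
frames_to_predict = np.zeros((4, 8, 8, 3))
model_input = build_input(history[selected], frames_to_predict)
```

In this toy setup, the frame taken while facing the opposite direction and the frame taken far away score zero overlap and are never retrieved, which is the kind of pruning the abstract attributes to the Memory Retrieval module. A real system would work with full 3D camera poses and latent video frames rather than raw arrays.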

arXiv information

Authors	Jiwen Yu, Jianhong Bai, Yiran Qin, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu
Published	2025-06-03 17:59:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
