Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

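Summary

Recent advances in video generation have spurred the development of video editing techniques, which can be classified into inversion-based and end-to-end methods.
However, current video editing methods still face several challenges.
Inversion-based methods, although training-free and flexible, are time-consuming during inference, struggle with fine-grained editing instructions, and produce artifacts and jitter.
End-to-end methods, on the other hand, rely on edited video pairs for training; they offer faster inference but often yield poor editing results because high-quality training video pairs are scarce.
In this paper, we introduce Señorita-2M, a high-quality video editing dataset, to close this gap for end-to-end methods.
Señorita-2M consists of approximately 2 million video editing pairs.
It is built with four high-quality, specialized video editing models, each designed and trained by the team to achieve state-of-the-art editing results.
We also propose a filtering pipeline to eliminate poorly edited video pairs.
Furthermore, we explore common video editing architectures to identify the most effective structure built on current pre-trained generative models.
Extensive experiments show that the dataset helps produce remarkably high-quality video editing results.
More details are available at https://senorita.github.io.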

Summary (original)

Recent advancements in video generation have spurred the development of video editing techniques, which can be divided into inversion-based and end-to-end methods. However, current video editing methods still suffer from several challenges. Inversion-based methods, though training-free and flexible, are time-consuming during inference, struggle with fine-grained editing instructions, and produce artifacts and jitter. On the other hand, end-to-end methods, which rely on edited video pairs for training, offer faster inference speeds but often produce poor editing results due to a lack of high-quality training video pairs. In this paper, to close the gap in end-to-end methods, we introduce Señorita-2M, a high-quality video editing dataset. Señorita-2M consists of approximately 2 million video editing pairs. It is built by crafting four high-quality, specialized video editing models, each designed and trained by our team to achieve state-of-the-art editing results. We also propose a filtering pipeline to eliminate poorly edited video pairs. Furthermore, we explore common video editing architectures to identify the most effective structure based on current pre-trained generative models. Extensive experiments show that our dataset can help to yield remarkably high-quality video editing results. More details are available at https://senorita.github.io.

arXiv information

Authors	Bojia Zi, Penghui Ruan, Marco Chen, Xianbiao Qi, Shaozhe Hao, Shihao Zhao, Youze Huang, Bin Liang, Rong Xiao, Kam-Fai Wong
Published	2025-02-10 17:58:22+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
