VideoPoet: A Large Language Model for Zero-Shot Video Generation

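Summary

We present VideoPoet, a language model that can synthesize high-quality video with matching audio from a wide variety of conditioning signals.
VideoPoet uses a decoder-only transformer architecture that processes multimodal inputs, including images, video, text, and audio.
Its training protocol follows that of large language models (LLMs) and consists of two stages: pretraining and task-specific adaptation.
During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework.
The pretrained LLM serves as a foundation that can be adapted to a range of video generation tasks.
We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, highlighting in particular VideoPoet's ability to generate high-fidelity motion.
Project page: http://sites.research.google/videopoet/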

Abstract (original)

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs — including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model’s state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet’s ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/

arXiv information

Authors	Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang
Published	2023-12-21 18:46:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
