Text-Animator: Controllable Visual Text Video Generation

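Summary

Video generation is a challenging yet pivotal task in various industries such as gaming, e-commerce, and advertising.
One significant unresolved aspect of text-to-video (T2V) generation is the effective visualization of text within generated videos.
Despite the progress in T2V generation, current methods still cannot effectively visualize text directly in videos, because they mainly focus on summarizing semantic scene information, understanding, and depicting actions.
Recent advances in image-level visual text generation are promising, but transferring these techniques to the video domain runs into problems, notably in preserving textual fidelity and motion coherence.
In this paper, we propose an innovative approach called Text-Animator for visual text video generation.
Text-Animator includes a text embedding injection module that precisely depicts the structure of visual text in generated videos.
In addition, we develop a camera control module and a text refinement module that improve the stability of the generated visual text by controlling the camera movement and the motion of the visualized text.
Quantitative and qualitative experimental results show that our approach outperforms state-of-the-art video generation methods in the accuracy of the generated visual text.
The project page is at https://laulampaul.github.io/text-animator.html.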

Summary (original)

Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within T2V is the effective visualization of text within generated videos. Despite the progress achieved in Text-to-Video (T2V) generation, current methods still cannot effectively visualize texts in videos directly, as they mainly focus on summarizing semantic scene information, understanding, and depicting actions. While recent advances in image-level visual text generation show promise, transitioning these techniques into the video domain faces problems, notably in preserving textual fidelity and motion coherence. In this paper, we propose an innovative approach termed Text-Animator for visual text video generation. Text-Animator contains a text embedding injection module to precisely depict the structures of visual text in generated videos. Besides, we develop a camera control module and a text refinement module to improve the stability of generated visual text by controlling the camera movement as well as the motion of visualized text. Quantitative and qualitative experimental results demonstrate the superiority of our approach to the accuracy of generated visual text over state-of-the-art video generation methods. The project page can be found at https://laulampaul.github.io/text-animator.html.
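This page reports the accuracy of the generated visual text but does not define the metric. The following is a hedged illustration only. It scores a video by comparing the OCR output of each frame against the target string, using a normalized edit similarity and an exact-match rate. Averaging over frames also reflects whether the text stays stable across the clip.

Several parts of this sketch are assumptions and not the paper's evaluation protocol:

- the OCR step that produces `frame_ocr`;
- the function names `levenshtein`, `normalized_edit_similarity`, and `video_text_scores`;
- the choice of metric itself.

```python
from typing import Dict, Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(pred: str, target: str) -> float:
    """1 - normalized edit distance, in [0, 1]; 1.0 means an exact match."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def video_text_scores(frame_ocr: Sequence[str], target: str) -> Dict[str, float]:
    """Average per-frame text scores.

    frame_ocr is assumed (hypothetically) to hold one OCR string per frame.
    """
    if not frame_ocr:
        raise ValueError("need at least one frame")
    sims = [normalized_edit_similarity(p, target) for p in frame_ocr]
    exact = [p == target for p in frame_ocr]
    return {
        "mean_similarity": sum(sims) / len(sims),
        "exact_match_rate": sum(exact) / len(exact),
    }


if __name__ == "__main__":
    # Three hypothetical frames; the last one misreads "L" as "1".
    print(video_text_scores(["SALE", "SALE", "SA1E"], "SALE"))
```

For the three hypothetical frames, the per-frame similarities are 1.0, 1.0, and 0.75. This gives a mean similarity of about 0.917 and an exact-match rate of 2/3. A single flickering frame therefore lowers both scores, which is the kind of temporal instability the abstract says the refinement modules target.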

arXiv information

Authors	Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian
Published	2024-06-25 17:59:41+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
