Word-Level Fine-Grained Story Visualization

Summary

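Story visualization aims to generate a sequence of images that narrates each sentence of a multi-sentence story while maintaining global consistency across dynamic scenes and characters.
Current works still struggle with the quality and consistency of their output images, and they rely on additional semantic information or auxiliary captioning networks.
To address these challenges, we first introduce a new sentence representation that incorporates word information from all story sentences to mitigate the inconsistency problem.
We then propose a new discriminator with fusion features and further extend spatial attention to improve image quality and story consistency.
Extensive experiments on different datasets and a human evaluation demonstrate the superior performance of our approach compared to state-of-the-art methods, while using neither segmentation masks nor auxiliary captioning networks.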

Summary (original)

Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story with a global consistency across dynamic scenes and characters. Current works still struggle with output images’ quality and consistency, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. Extensive experiments on different datasets and human evaluation demonstrate the superior performance of our approach, compared to state-of-the-art methods, neither using segmentation masks nor auxiliary captioning networks.
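The abstract does not give the formulation of the new sentence representation. As a rough illustration only, the following sketch assumes a generic scaled dot-product attention in which one sentence embedding attends over the word embeddings of every sentence in the story, and the attended word context is concatenated to the sentence embedding. The function name `story_aware_sentence_repr`, the concatenation step and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def story_aware_sentence_repr(sentence: Vector, story_words: List[List[Vector]]) -> Vector:
    """Hypothetical sketch: enrich one sentence embedding with word-level
    context pooled from *all* sentences of the story.

    sentence    -- embedding of the current sentence (dimension d)
    story_words -- story_words[i][j] is the embedding of word j in sentence i
    Returns a vector of dimension 2*d: [sentence ; attended word context].
    """
    # Flatten the words of every sentence so attention can look across the whole story.
    words = [w for sent in story_words for w in sent]
    # Scaled dot-product scores between the sentence and each story word.
    scale = math.sqrt(len(sentence))
    weights = softmax([dot(sentence, w) / scale for w in words])
    # Attention-weighted sum of the word embeddings (one value per dimension).
    context = [sum(a * w[k] for a, w in zip(weights, words)) for k in range(len(sentence))]
    # Concatenate the original sentence embedding with the story-level word context.
    return sentence + context


# Toy usage: a two-sentence story with 2-dimensional word embeddings.
story = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
rep = story_aware_sentence_repr([1.0, 0.0], story)
```

In this sketch the word from the second sentence ([1.0, 1.0]) receives as much attention weight as the matching word in the first sentence, which is the kind of cross-sentence word information the abstract refers to; how the paper actually combines this context with the sentence embedding is not stated in the abstract.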

arXiv information

Authors	Bowen Li, Thomas Lukasiewicz
Published	2022-08-25 07:34:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
