StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Summary

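To use LLMs for visual synthesis, conventional methods convert raster image information into discrete grid tokens through specialized visual modules, which hampers the model's ability to capture the true semantic representation of a visual scene.
This paper argues that vector graphics, an alternative image representation, can effectively overcome this limitation by allowing a more natural and semantically coherent segmentation of image information.
The authors therefore introduce StrokeNUWA, a pioneering work that explores "stroke tokens", a better visual representation built on vector graphics. Stroke tokens are inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed.
Equipped with stroke tokens, StrokeNUWA significantly outperforms conventional LLM-based and optimization-based methods across various metrics on the vector graphic generation task.
In addition, StrokeNUWA achieves an SVG code compression ratio of 6.9% and up to a 94x inference speedup over prior methods.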
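
To make the idea of a stroke-level sequence concrete, here is a toy sketch. It is not the StrokeNUWA tokenizer, which this page does not describe. The helper names `split_strokes` and `length_ratio` are hypothetical, and treating one SVG path command as one "stroke" is an assumption made only for illustration. How the paper measures its 6.9% figure is not stated here, so the ratio below is just a stand-in showing why coarser units give shorter sequences.

```python
from __future__ import annotations

import re

# Assumption for illustration: one SVG path command (M, L, C, Z, ...)
# together with its numeric arguments counts as one "stroke".
_SEGMENT = re.compile(r"[MmLlHhVvCcSsQqTtAaZz][^MmLlHhVvCcSsQqTtAaZz]*")


def split_strokes(path_d: str) -> list[str]:
    """Split the `d` attribute of an SVG path into per-command segments."""
    return [seg.strip() for seg in _SEGMENT.findall(path_d)]


def length_ratio(path_d: str) -> float:
    """Stroke-level sequence length divided by character-level length."""
    strokes = split_strokes(path_d)
    chars = [c for c in path_d if not c.isspace()]
    return len(strokes) / len(chars)


path = "M 10 10 L 50 10 C 60 10 70 20 70 30 Z"
print(split_strokes(path))
print(f"{length_ratio(path):.3f}")
```

The fewer units a model has to emit per graphic, the fewer decoding steps it needs, which is the intuition behind pairing a low compression ratio with faster inference.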

Summary (original)

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model’s ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation "stroke tokens" on vector graphics, which is inherently visual semantics rich, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x speedup in inference over the speed of prior methods with an exceptional SVG code compression ratio of 6.9%.

arXiv information

Authors	Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan
Published	2024-01-30 15:20:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
