VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Abstract

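In the realm of vision models, the primary mode of representation is rasterizing the visual world into pixels.
However, this is not always the best or the only way to represent visual content, especially for designers and artists who depict the world using geometric primitives such as polygons.
Vector graphics (VG), by contrast, provide a textual representation of visual content, which can be more concise and powerful for content such as cartoons and sketches.
Recent studies have shown promising results in processing vector graphics with capable large language models (LLMs).
However, such works focus only on qualitative results, on understanding, or on a specific type of vector graphics.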
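We propose VGBench, a comprehensive benchmark for LLMs on handling vector graphics across diverse aspects: (a) both visual understanding and generation, (b) evaluation of various vector graphics formats, (c) diverse question types, (d) a wide range of prompting techniques, and (e) multiple LLMs.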
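Evaluating on the 4279 understanding samples and 5845 generation samples we collected, we find that LLMs show strong capability in both aspects but perform less well on the low-level format (SVG).
Both the data and the evaluation pipeline will be open-sourced at https://vgbench.github.io.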

Abstract (original)

In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the world using geometry primitives such as polygons. Vector graphics (VG), on the other hand, offer a textual representation of visual content, which can be more concise and powerful for content like cartoons or sketches. Recent studies have shown promising results on processing vector graphics with capable Large Language Models (LLMs). However, such works focus solely on qualitative results, understanding, or a specific type of vector graphics. We propose VGBench, a comprehensive benchmark for LLMs on handling vector graphics through diverse aspects, including (a) both visual understanding and generation, (b) evaluation of various vector graphics formats, (c) diverse question types, (d) wide range of prompting techniques, (e) under multiple LLMs. Evaluating on our collected 4279 understanding and 5845 generation samples, we find that LLMs show strong capability on both aspects while exhibiting less desirable performance on low-level formats (SVG). Both data and evaluation pipeline will be open-sourced at https://vgbench.github.io.
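
To illustrate what "a textual representation of visual content" means, here is a minimal Python sketch. It uses a small hand-written SVG that is hypothetical and is not a VGBench sample. The sketch parses the SVG with the standard library and lists its geometric primitives. It then compares the text length with the size of an uncompressed RGB raster of the same canvas. That comparison is a rough illustration of the conciseness point only, not a measurement from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG written for illustration (not a VGBench sample):
# the entire image is plain text built from geometric primitives.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="yellow"/>
  <polygon points="30,40 40,40 35,30" fill="black"/>
  <path d="M 30 65 Q 50 80 70 65" stroke="black" fill="none"/>
</svg>"""

SVG_NS = "{http://www.w3.org/2000/svg}"


def list_primitives(svg_text: str) -> list[str]:
    """Return the tag names of the top-level drawing elements in an SVG."""
    root = ET.fromstring(svg_text)
    # ElementTree reports namespaced tags as '{uri}name'; drop the '{uri}' part.
    return [child.tag.removeprefix(SVG_NS) for child in root]


def raster_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size in bytes of an uncompressed 8-bit-per-channel raster."""
    return width * height * channels


if __name__ == "__main__":
    print(list_primitives(SVG_SOURCE))  # ['circle', 'polygon', 'path']
    print(len(SVG_SOURCE), "characters of SVG text")
    print(raster_bytes(100, 100), "bytes as an uncompressed RGB raster")  # 30000
```

Because the image is ordinary text, an LLM can read it or emit it directly. This is the property that VGBench probes for both understanding and generation.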

arXiv information

Authors	Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee
Published	2024-07-15 17:59:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
