Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

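Summary

Recent advances in large language models (LLMs) have shown great potential for automating visualization authoring through simple natural-language utterances.
However, instructing LLMs in natural language offers limited precision and expressiveness for conveying visualization intent, which leads to misinterpretation and time-consuming iterations.
To address these limitations, we conduct an empirical study of how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and of the conditions under which LLMs misinterpret user intent.
Informed by the findings, we introduce visual prompts as an input modality complementary to text prompts, which helps clarify user intent and improves LLMs' interpretation abilities.
To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which lets users easily create visualizations with multimodal prompts, including text, sketches, and direct manipulation of existing visualizations.
Through two case studies and a controlled user study, we demonstrate that VisPilot offers a more intuitive way to create visualizations than text-only prompting, without affecting overall task efficiency.
We further analyze the impact of text and visual prompts across different visualization tasks.
Our findings highlight the importance of multimodal prompting for improving the usability of LLMs in visualization authoring.
We discuss design implications for future visualization systems and offer insights into how multimodal prompts can strengthen human-AI collaboration in creative visualization tasks.
All materials are available at https://osf.io/2qrak.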

Summary (original)

Recent advances in large language models (LLMs) have shown great potential in automating the process of visualization authoring through simple natural language utterances. However, instructing LLMs using natural language is limited in precision and expressiveness for conveying visualization intent, leading to misinterpretation and time-consuming iterations. To address these limitations, we conduct an empirical study to understand how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and the conditions making LLMs misinterpret user intent. Informed by the findings, we introduce visual prompts as a complementary input modality to text prompts, which help clarify user intent and improve LLMs’ interpretation abilities. To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which enables users to easily create visualizations using multimodal prompts, including text, sketches, and direct manipulations on existing visualizations. Through two case studies and a controlled user study, we demonstrate that VisPilot provides a more intuitive way to create visualizations without affecting the overall task efficiency compared to text-only prompting approaches. Furthermore, we analyze the impact of text and visual prompts in different visualization tasks. Our findings highlight the importance of multimodal prompting in improving the usability of LLMs for visualization authoring. We discuss design implications for future visualization systems and provide insights into how multimodal prompts can enhance human-AI collaboration in creative visualization tasks. All materials are available at https://OSF.IO/2QRAK.
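The abstract does not say how VisPilot hands visual prompts to the LLM. As a minimal sketch under stated assumptions, the Python below shows one way a text prompt could be paired with visual context (a selection of marks or a direct manipulation on an existing chart) so that a vague phrase such as "these bars" becomes resolvable. All class and function names (`Selection`, `Manipulation`, `MultimodalPrompt`, `to_llm_message`) are hypothetical, the chart is assumed to be a Vega-Lite-like dictionary, and none of this is the paper's actual implementation.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Selection:
    """Hypothetical visual prompt: chart marks the user circled or clicked."""
    key_field: str      # data field that identifies each selected mark
    values: list[str]   # values of key_field for the selected marks

@dataclass
class Manipulation:
    """Hypothetical visual prompt: a direct edit on an element of the current chart."""
    target: str   # e.g. "x-axis" or "legend"
    action: str   # e.g. "drag" or "resize"
    detail: dict = field(default_factory=dict)  # action-specific parameters

@dataclass
class MultimodalPrompt:
    text: str
    selections: list[Selection] = field(default_factory=list)
    manipulations: list[Manipulation] = field(default_factory=list)

def to_llm_message(prompt: MultimodalPrompt, chart_spec: dict) -> str:
    """Flatten the text prompt plus its visual context into one text message.

    The visual context pins down referents (e.g. "these bars") that the
    text alone leaves ambiguous.
    """
    context = {
        "current_chart": chart_spec,
        "selected_marks": [
            {"field": s.key_field, "values": s.values} for s in prompt.selections
        ],
        "manipulations": [
            {"target": m.target, "action": m.action, **m.detail}
            for m in prompt.manipulations
        ],
    }
    return (
        f"User instruction: {prompt.text}\n"
        f"Visual context (JSON): {json.dumps(context, sort_keys=True)}"
    )

# Usage: "these bars" can only be resolved through the attached selection.
spec = {"mark": "bar", "encoding": {"x": {"field": "city"}, "y": {"field": "sales"}}}
prompt = MultimodalPrompt(
    text="Highlight these bars in red",
    selections=[Selection(key_field="city", values=["Paris", "Tokyo"])],
)
msg = to_llm_message(prompt, spec)
print(msg)
```

Serializing the visual context as JSON alongside the instruction is just one simple design choice for a text-only model interface; a system could instead pass sketches as images to a vision-capable model.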

arXiv information

Authors	Zhen Wen, Luoxuan Weng, Yinghao Tang, Runjin Zhang, Yuxin Liu, Bo Pan, Minfeng Zhu, Wei Chen
Published	2025-04-18 14:00:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
