Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

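Summary

Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas.
However, existing research on UI/UX automation often requires high-fidelity inputs such as Figma designs or detailed screenshots, which limits accessibility and hinders efficient design iteration.
To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art vision-language models (VLMs) on automatically converting rudimentary sketches into webpage prototypes.
Beyond end-to-end benchmarking, Sketch2Code supports interactive agent evaluation that mimics real-world design workflows. A VLM-based agent communicates with a simulated user and iteratively refines its generation, either by passively receiving feedback instructions or by proactively asking clarification questions.
We comprehensively analyze ten commercial and open-source models and show that Sketch2Code is challenging for existing VLMs.
Even the most capable models struggle to interpret sketches accurately and to formulate effective questions that lead to steady improvement.
Nevertheless, a user study with UI/UX experts reveals a strong preference for proactive question-asking over passive feedback reception, highlighting the need to develop more effective paradigms for multi-turn conversational agents.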

Summary (original)

Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas. However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. Beyond end-to-end benchmarking, Sketch2Code supports interactive agent evaluation that mimics real-world design workflows, where a VLM-based agent iteratively refines its generations by communicating with a simulated user, either passively receiving feedback instructions or proactively asking clarification questions. We comprehensively analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs; even the most capable models struggle to accurately interpret sketches and formulate effective questions that lead to steady improvement. Nevertheless, a user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception, highlighting the need to develop more effective paradigms for multi-turn conversational agents.
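The two interaction modes in the abstract can be illustrated with a deliberately simplified toy. It makes several assumptions. A webpage is reduced to a set of section names. The simulated user knows the reference design. The agent's initial reading of the sketch and its list of clarification candidates are given up front. The VLM, the sketch image and the HTML generation are left out entirely. All names here (`SimulatedUser`, `refine`, `REFERENCE`) are hypothetical and do not come from the benchmark's actual harness.

```python
from typing import List, Set

# Hypothetical reference design the simulated user "has in mind".
REFERENCE = ["header", "hero", "features", "footer"]


class SimulatedUser:
    """Toy simulated user; assumes a page is just a set of section names."""

    def __init__(self, reference: List[str]):
        self.reference = set(reference)

    def feedback(self, page: Set[str]) -> str:
        # Passive mode: the user volunteers one correction per turn.
        missing = sorted(self.reference - page)
        extra = sorted(page - self.reference)
        if missing:
            return "add " + missing[0]
        if extra:
            return "remove " + extra[0]
        return "done"

    def answer(self, section: str) -> bool:
        # Proactive mode: the user answers a yes/no clarification question.
        return section in self.reference


def refine(sketch_guess: Set[str], candidates: List[str], user: SimulatedUser,
           mode: str, max_turns: int) -> Set[str]:
    """Iteratively refine a page over at most max_turns turns (hypothetical helper)."""
    if mode not in ("passive", "proactive"):
        raise ValueError("mode must be 'passive' or 'proactive'")
    page = set(sketch_guess)  # initial generation from the sketch (copied, not mutated)
    asked: Set[str] = set()
    for _ in range(max_turns):
        if mode == "passive":
            instruction = user.feedback(page)
            if instruction == "done":
                break
            action, section = instruction.split(" ", 1)
            if action == "add":
                page.add(section)
            else:
                page.discard(section)
        else:
            # The agent decides what to ask: the first candidate not yet asked about.
            pending = [c for c in candidates if c not in asked]
            if not pending:
                break
            section = pending[0]
            asked.add(section)
            if user.answer(section):
                page.add(section)
            else:
                page.discard(section)
    return page


user = SimulatedUser(REFERENCE)
initial = {"header", "sidebar", "footer"}  # hypothetical misreading of the sketch
print(sorted(refine(initial, [], user, "passive", 3)))
print(sorted(refine(initial, ["sidebar", "hero", "features"], user, "proactive", 3)))
```

Both modes converge here only because the toy user and agent behave perfectly. The sketch is meant to show who drives each turn. In passive mode the user chooses the correction. In proactive mode the agent chooses the question, so the quality of its questions determines progress. The abstract reports that real VLMs struggle with this, which the toy does not model.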

arXiv information

Authors	Ryan Li, Yanzhe Zhang, Diyi Yang
Published	2024-10-21 17:39:49+00:00
arXiv page	arxiv_id (pdf)

Provider, services used

arxiv.jp, Google
