Probing and Inducing Combinational Creativity in Vision-Language Models

Summary

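The ability to combine existing concepts into novel ideas is a fundamental hallmark of human intelligence.
Recent advances in vision-language models (VLMs) such as GPT-4V and DALLE-3 have sparked debate about whether their outputs reflect combinational creativity as defined by M. A. Boden (1998).
Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of concept blending.
We propose the Identification-Explanation-Implication (IEI) framework, which decomposes the creative process into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications.
To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework.
Through extensive experiments, we show that in comprehension tasks the best VLMs surpass average human performance but do not yet reach expert-level understanding.
In generation tasks, incorporating the IEI framework into the generation pipeline substantially improves the creative quality of VLM outputs.
Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.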

Summary (original)

The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) like GPT-4V and DALLE-3 have sparked debate about whether their outputs reflect combinational creativity–defined by M. A. Boden (1998) as synthesizing novel ideas through combining existing concepts–or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs from the lens of concept blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that in comprehension tasks, best VLMs have surpassed average human performance while falling short of expert-level understanding; in generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLMs’ outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.
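The three IEI levels can be pictured as the fields of one annotation record per mashup. The following is a minimal, hypothetical Python sketch: the class name `IEIAnnotation`, its field names, the completeness check, and the `to_prompt` serialization are assumptions for illustration only, not the CreativeMashup schema or the authors' generation pipeline, and the example mashup is invented.

```python
from dataclasses import dataclass


@dataclass
class IEIAnnotation:
    """Hypothetical record for one visual mashup, one field per IEI level."""

    # Identification: the input spaces (source concepts) being blended
    input_spaces: list[str]
    # Explanation: attributes the input spaces share, which motivate the blend
    shared_attributes: list[str]
    # Implication: the novel meaning that emerges from the blend
    implication: str

    def is_complete(self) -> bool:
        # Assumed rule: a blend needs at least two input spaces,
        # at least one shared attribute, and a non-empty implication
        return (
            len(self.input_spaces) >= 2
            and len(self.shared_attributes) >= 1
            and bool(self.implication.strip())
        )

    def to_prompt(self) -> str:
        # Hypothetical: flatten the three levels into one text instruction,
        # one possible way to feed an IEI decomposition to an image generator
        return (
            f"Blend {' and '.join(self.input_spaces)}, "
            f"linked by {', '.join(self.shared_attributes)}, "
            f"to convey: {self.implication}"
        )


# Invented example, not taken from the dataset
ann = IEIAnnotation(
    input_spaces=["coffee cup", "hourglass"],
    shared_attributes=["contents drain over time"],
    implication="a morning coffee marks the passing of time",
)
print(ann.is_complete())
print(ann.to_prompt())
```

Keeping the levels as separate fields lets an incomplete blend (for example, one with no shared attribute) be detected before generation, rather than passing a bare pair of concepts to the model.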

arXiv information

Authors	Yongqian Peng, Yuxi Ma, Mengmeng Wang, Yuxuan Wang, Yizhou Wang, Chi Zhang, Yixin Zhu, Zilong Zheng
Published	2025-04-29 14:51:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
