Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Summary

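The rapid progress of generative art has democratized the creation of visually pleasing imagery.
However, achieving genuine artistic impact, the kind that resonates with viewers on a deeper, more meaningful level, requires a sophisticated aesthetic sensibility.
This sensibility involves a multi-faceted reasoning process that extends beyond mere visual appeal, which current computational models often overlook.
This paper pioneers an approach to capturing this complex process by investigating how the reasoning capabilities of multimodal LLMs (MLLMs) can be effectively elicited for aesthetic judgment.
Our analysis reveals a critical challenge: MLLMs tend to hallucinate during aesthetic reasoning, producing subjective opinions and unsubstantiated artistic interpretations.
We further show that these limitations can be overcome by adopting an evidence-based, objective reasoning process, as demonstrated by our proposed baseline, ArtCoT.
MLLMs prompted with this principle produce multi-faceted, in-depth aesthetic reasoning that aligns significantly better with human judgment.
These findings have direct applications in areas such as AI art tutoring and reward models for generative art.
Ultimately, our work paves the way for AI systems that can truly understand, appreciate, and generate artworks aligned with sensible human aesthetic standards.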

Summary (original)

The rapid progress of generative art has democratized the creation of visually pleasing imagery. However, achieving genuine artistic impact – the kind that resonates with viewers on a deeper, more meaningful level – requires a sophisticated aesthetic sensibility. This sensibility involves a multi-faceted reasoning process extending beyond mere visual appeal, which is often overlooked by current computational models. This paper pioneers an approach to capture this complex process by investigating how the reasoning capabilities of Multimodal LLMs (MLLMs) can be effectively elicited for aesthetic judgment. Our analysis reveals a critical challenge: MLLMs exhibit a tendency towards hallucinations during aesthetic reasoning, characterized by subjective opinions and unsubstantiated artistic interpretations. We further demonstrate that these limitations can be overcome by employing an evidence-based, objective reasoning process, as substantiated by our proposed baseline, ArtCoT. MLLMs prompted by this principle produce multi-faceted and in-depth aesthetic reasoning that aligns significantly better with human judgment. These findings have direct applications in areas such as AI art tutoring and as reward models for generative art. Ultimately, our work paves the way for AI systems that can truly understand, appreciate, and generate artworks that align with the sensible human aesthetic standard.
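As a rough illustration of the two ideas in the abstract, evidence-grounded prompting and measuring alignment with human judgment, here is a minimal Python sketch. The prompt wording, the aspect list, and the names `build_evidence_prompt` and `agreement_rate` are hypothetical and not taken from the paper; the sketch does not reproduce ArtCoT and calls no model. It only assembles a prompt that asks for visible evidence before a verdict, and computes the fraction of image pairs where a model's pairwise preference matches a human one.

```python
from typing import Sequence

# Hypothetical aspects used to ground the reasoning; NOT the ArtCoT aspect list.
ASPECTS = ("composition", "color and lighting", "subject and content")


def build_evidence_prompt(aspects: Sequence[str] = ASPECTS) -> str:
    """Assemble a hypothetical zero-shot prompt that asks the model to cite
    visible evidence for each aspect before choosing between two images."""
    steps = [
        f"{i}. {aspect.capitalize()}: describe only what is visible in "
        f"Image A and Image B, then state which is stronger and why."
        for i, aspect in enumerate(aspects, start=1)
    ]
    return "\n".join(
        [
            "Compare Image A and Image B aesthetically.",
            "Base every claim on concrete visual evidence; avoid unsupported interpretation.",
            *steps,
            "Finally, answer with a single letter: A or B.",
        ]
    )


def agreement_rate(model_choices: Sequence[str], human_choices: Sequence[str]) -> float:
    """Fraction of image pairs where the model's preferred image matches the human one."""
    if len(model_choices) != len(human_choices):
        raise ValueError("choice lists must have the same length")
    if not model_choices:
        raise ValueError("need at least one pair")
    matches = sum(m == h for m, h in zip(model_choices, human_choices))
    return matches / len(model_choices)


if __name__ == "__main__":
    print(build_evidence_prompt())
    # Toy, made-up labels purely to illustrate the metric (3 of 4 pairs agree).
    print(agreement_rate(["A", "B", "A", "A"], ["A", "B", "B", "A"]))
```

Pairwise agreement is used here only because it is a simple, easy-to-read way to compare model preferences with human ones; the paper's actual evaluation protocol may differ.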

arXiv information

Authors	Ruixiang Jiang, Changwen Chen
Published	2025-04-17 17:14:09+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
