Fast-Slow Thinking for Large Vision-Language Model Reasoning

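Summary

Recent advances in large vision-language models (LVLMs) have revealed an overthinking phenomenon, in which models generate verbose reasoning for every task regardless of the question.
To address this issue, we present FAST, a novel Fast-Slow Thinking framework that dynamically adapts reasoning depth to the characteristics of each question.
Through empirical analysis, we establish the feasibility of fast-slow thinking in LVLMs by investigating how response length and data distribution affect performance.
We develop FAST-GRPO with three components: model-based metrics for question characterization, an adaptive thinking reward mechanism, and difficulty-aware KL regularization.
Experiments across seven reasoning benchmarks show that FAST achieves state-of-the-art accuracy, with over 10% relative improvement over the base model, while reducing token usage by 32.7-67.3% compared with previous slow-thinking approaches, effectively balancing reasoning length and accuracy.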

Abstract (original)

Recent advances in large vision-language models (LVLMs) have revealed an \textit{overthinking} phenomenon, where models generate verbose reasoning across all tasks regardless of questions. To address this issue, we present \textbf{FAST}, a novel \textbf{Fa}st-\textbf{S}low \textbf{T}hinking framework that dynamically adapts reasoning depth based on question characteristics. Through empirical analysis, we establish the feasibility of fast-slow thinking in LVLMs by investigating how response length and data distribution affect performance. We develop FAST-GRPO with three components: model-based metrics for question characterization, an adaptive thinking reward mechanism, and difficulty-aware KL regularization. Experiments across seven reasoning benchmarks demonstrate that FAST achieves state-of-the-art accuracy with over 10\% relative improvement compared to the base model, while reducing token usage by 32.7-67.3\% compared to previous slow-thinking approaches, effectively balancing reasoning length and accuracy.
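
The abstract names an adaptive thinking reward but does not give its formula. As a purely illustrative sketch, and not the paper's definition, one way to picture such a reward is an accuracy term plus a brevity bonus that shrinks as the question gets harder. Everything here is an assumption made for illustration: the function name `adaptive_thinking_reward`, the `difficulty` score in [0, 1], the `max_tokens` budget, and the `length_weight` coefficient.

```python
def adaptive_thinking_reward(
    correct: bool,
    num_tokens: int,
    difficulty: float,
    max_tokens: int = 1024,
    length_weight: float = 0.5,
) -> float:
    """Hypothetical reward: accuracy plus a difficulty-scaled brevity bonus.

    This is NOT the FAST-GRPO formula; it only illustrates the idea of
    rewarding short answers on easy questions while leaving hard ones alone.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")

    accuracy = 1.0 if correct else 0.0

    # Fraction of the token budget left unused, clipped at 0 for over-long responses.
    brevity = max(0.0, 1.0 - num_tokens / max_tokens)

    # Easy questions (difficulty near 0) receive the full brevity bonus,
    # hard questions (difficulty near 1) receive almost none.
    # The bonus is paid only for correct answers, so brevity never outweighs correctness.
    bonus = length_weight * (1.0 - difficulty) * brevity * accuracy

    return accuracy + bonus


if __name__ == "__main__":
    # A short correct answer to an easy question earns a bonus on top of accuracy.
    print(adaptive_thinking_reward(True, 256, difficulty=0.0))
    # The same answer to a maximally hard question earns accuracy only.
    print(adaptive_thinking_reward(True, 256, difficulty=1.0))
```

Under these assumptions, a policy trained against such a signal would have an incentive to answer easy questions briefly while remaining free to reason at length on hard ones, which is the trade-off the abstract describes.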

arXiv information

Authors	Wenyi Xiao, Leilei Gan, Weilong Dai, Wanggui He, Ziwei Huang, Haoyuan Li, Fangxun Shu, Zhelun Yu, Peng Zhang, Hao Jiang, Fei Wu
Published	2025-04-25 16:11:23+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
