ProxyThinker: Test-Time Guidance through Small Visual Reasoners

Summary

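Recent advances in reinforcement learning with verifiable rewards have pushed the boundaries of visual reasoning in large vision-language models (LVLMs).
However, training LVLMs with reinforcement fine-tuning (RFT) is computationally expensive, which poses a significant challenge to scaling model size.
This work proposes ProxyThinker, an inference-time technique that lets large models inherit visual reasoning capabilities from small, slow-thinking visual reasoners without any training.
By subtracting the output distributions of base models from those of RFT reasoners, ProxyThinker modifies the decoding dynamics and elicits slow-thinking reasoning, as shown by the emergence of sophisticated behaviors such as self-verification and self-correction.
ProxyThinker consistently improves performance on challenging visual benchmarks covering spatial, mathematical, and multi-disciplinary reasoning, enabling untuned base models to compete with their full-scale RFT counterparts.
Furthermore, the implementation efficiently coordinates multiple language models using parallelism techniques and achieves up to 38$\times$ faster inference than previous decoding-time methods, paving the way for practical deployment of ProxyThinker.
Code is available at https://github.com/MrZilinXiao/ProxyThinker.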

Abstract (original)

Recent advancements in reinforcement learning with verifiable rewards have pushed the boundaries of the visual reasoning capabilities in large vision-language models (LVLMs). However, training LVLMs with reinforcement fine-tuning (RFT) is computationally expensive, posing a significant challenge to scaling model size. In this work, we propose ProxyThinker, an inference-time technique that enables large models to inherit the visual reasoning capabilities from small, slow-thinking visual reasoners without any training. By subtracting the output distributions of base models from those of RFT reasoners, ProxyThinker modifies the decoding dynamics and successfully elicits the slow-thinking reasoning demonstrated by the emerged sophisticated behaviors such as self-verification and self-correction. ProxyThinker consistently boosts performance on challenging visual benchmarks on spatial, mathematical, and multi-disciplinary reasoning, enabling untuned base models to compete with the performance of their full-scale RFT counterparts. Furthermore, our implementation efficiently coordinates multiple language models with parallelism techniques and achieves up to 38 $\times$ faster inference compared to previous decoding-time methods, paving the way for the practical deployment of ProxyThinker. Code is available at https://github.com/MrZilinXiao/ProxyThinker.
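
The subtraction of base-model outputs from RFT-reasoner outputs can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the difference is taken in logit space and added to the large base model's logits, and the names `guided_logits`, `greedy_next_token`, and the scaling factor `alpha` are hypothetical, not taken from the ProxyThinker code.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(large_base, small_expert, small_base, alpha=1.0):
    """Shift the large model's logits by the (RFT expert - base) difference.

    Assumption: the guidance is applied in logit space, with a hypothetical
    scaling factor `alpha`; the paper's exact formulation may differ.
    """
    return [
        lb + alpha * (se - sb)
        for lb, se, sb in zip(large_base, small_expert, small_base)
    ]


def greedy_next_token(large_base, small_expert, small_base, alpha=1.0):
    """Return the index of the highest-scoring token after guidance."""
    scores = guided_logits(large_base, small_expert, small_base, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy next-token logits over a 3-token vocabulary (made-up numbers).
large_base = [2.0, 1.0, 0.5]    # large model, no RFT
small_expert = [0.5, 2.5, 0.0]  # small model after RFT
small_base = [1.5, 0.5, 0.0]    # the same small model before RFT

print(guided_logits(large_base, small_expert, small_base))
print(greedy_next_token(large_base, small_expert, small_base))
```

In this toy case the large base model alone would pick token 0, while the guided scores favor token 1, the token whose score the small model raised during RFT. All three models have to share a vocabulary for the element-wise arithmetic to make sense.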

arXiv information

Authors	Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez
Published	2025-05-30 17:59:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
