SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

Summary

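Leveraging multimodal large models for image segmentation has become a prominent research direction.
However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are costly and time-consuming to produce.
Recent advances suggest that reinforcement learning (RL) can endow large models with reasoning capabilities without requiring such reasoning-annotated data.
In this paper, we propose SAM-R1, a novel framework that enables multimodal large models to perform fine-grained reasoning in image understanding tasks.
Our approach is the first to incorporate fine-grained segmentation settings during the training of multimodal reasoning models.
By integrating task-specific, fine-grained rewards with a tailored optimization objective, we further strengthen the alignment between the model's reasoning and its segmentation.
We also leverage the Segment Anything Model (SAM) as a strong and flexible reward provider to guide the learning process.
With only 3k training samples, SAM-R1 achieves strong performance across multiple benchmarks, demonstrating the effectiveness of reinforcement learning in equipping multimodal models with segmentation-oriented reasoning capabilities.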

Summary (original)

Leveraging multimodal large models for image segmentation has become a prominent research direction. However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are costly and time-consuming to produce. Recent advances suggest that reinforcement learning (RL) can endow large models with reasoning capabilities without requiring such reasoning-annotated data. In this paper, we propose SAM-R1, a novel framework that enables multimodal large models to perform fine-grained reasoning in image understanding tasks. Our approach is the first to incorporate fine-grained segmentation settings during the training of multimodal reasoning models. By integrating task-specific, fine-grained rewards with a tailored optimization objective, we further enhance the model’s reasoning and segmentation alignment. We also leverage the Segment Anything Model (SAM) as a strong and flexible reward provider to guide the learning process. With only 3k training samples, SAM-R1 achieves strong performance across multiple benchmarks, demonstrating the effectiveness of reinforcement learning in equipping multimodal models with segmentation-oriented reasoning capabilities.
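The abstract says SAM serves as the reward provider but does not spell out the reward itself. The sketch below assumes one plausible reading: the policy model emits a bounding-box prompt, a segmenter turns that box into a mask, and the reward is the IoU between that mask and the ground-truth mask. `box_to_mask_stub` is a hypothetical stand-in for SAM (it simply fills the box), and every name here is illustrative, not the paper's code.

```python
from typing import Callable, List, Tuple

Mask = List[List[bool]]               # mask[y][x] is True where the object is
Box = Tuple[int, int, int, int]       # (x0, y0, x1, y1), with x1 and y1 exclusive


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += int(p and g)
            union += int(p or g)
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def box_to_mask_stub(box: Box, height: int, width: int) -> Mask:
    """Hypothetical stand-in for SAM: fills the prompt box.

    A real SAM call would return a refined object mask for the box prompt.
    """
    x0, y0, x1, y1 = box
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


def segmentation_reward(
    box: Box,
    gt_mask: Mask,
    segmenter: Callable[[Box, int, int], Mask] = box_to_mask_stub,
) -> float:
    """Assumed reward: IoU between the segmenter's mask for `box` and the ground truth."""
    height, width = len(gt_mask), len(gt_mask[0])
    pred_mask = segmenter(box, height, width)
    return mask_iou(pred_mask, gt_mask)


if __name__ == "__main__":
    # 4x4 image whose object occupies the top-left 2x2 block.
    gt = [[x < 2 and y < 2 for x in range(4)] for y in range(4)]
    print(segmentation_reward((0, 0, 2, 2), gt))  # exact box
    print(segmentation_reward((0, 0, 4, 2), gt))  # box twice as wide as needed
```

Because IoU is continuous in [0, 1], a prompt that is partly right (the second call above yields 0.5) still earns partial credit. That graded signal is the kind of feedback an RL objective can exploit, in contrast to a binary correct/incorrect reward.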

arXiv information

Authors	Jiaqi Huang, Zunnan Xu, Jun Zhou, Ting Liu, Yicheng Xiao, Mingwen Ou, Bowen Ji, Xiu Li, Kehong Yuan
Published	2025-05-28 17:08:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
