Unhackable Temporal Rewarding for Scalable Video MLLMs

Summary

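In pursuing better video-processing MLLMs, we ran into a puzzling paradox: an "anti-scaling law," under which more data and larger models make performance worse.
This study identifies the culprit: "temporal hacking," a phenomenon in which the model takes a shortcut by fixating on particular frames and misses the narrative of the video as a whole.
We systematically establish a comprehensive theory of temporal hacking, define it from a reinforcement learning perspective, introduce the Temporal Perplexity (TPL) score to assess this misalignment, and propose the Unhackable Temporal Rewarding (UTR) framework to mitigate temporal hacking.
Both theoretically and empirically, TPL proves to be a reliable indicator of temporal modeling quality and correlates strongly with frame activation patterns.
Extensive experiments show that UTR not only counters temporal hacking but also significantly improves video comprehension.
This work not only advances video-AI systems but also highlights the importance of aligning proxy rewards with true objectives in MLLM development.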

Summary (original)

In the pursuit of superior video-processing MLLMs, we have encountered a perplexing paradox: the ‘anti-scaling law’, where more data and larger models lead to worse performance. This study unmasks the culprit: ‘temporal hacking’, a phenomenon where models shortcut by fixating on select frames, missing the full video narrative. In this work, we systematically establish a comprehensive theory of temporal hacking, defining it from a reinforcement learning perspective, introducing the Temporal Perplexity (TPL) score to assess this misalignment, and proposing the Unhackable Temporal Rewarding (UTR) framework to mitigate the temporal hacking. Both theoretically and empirically, TPL proves to be a reliable indicator of temporal modeling quality, correlating strongly with frame activation patterns. Extensive experiments reveal that UTR not only counters temporal hacking but significantly elevates video comprehension capabilities. This work not only advances video-AI systems but also illuminates the critical importance of aligning proxy rewards with true objectives in MLLM development.

arXiv information

Authors	En Yu, Kangheng Lin, Liang Zhao, Yana Wei, Zining Zhu, Haoran Wei, Jianjian Sun, Zheng Ge, Xiangyu Zhang, Jingyu Wang, Wenbing Tao
Published	2025-02-17 17:55:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
