SlotPi: Physics-informed Object-centric Reasoning Models

Summary

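Understanding and reasoning, through visual observation, about dynamics governed by physical laws, in the way humans do in the real world, poses significant challenges.
Object-centric dynamic simulation methods that emulate human behavior have made notable progress, but they overlook two critical aspects. 1) The integration of physical knowledge into models.
Humans gain physical insight by observing the world and apply this knowledge to reason accurately about a variety of dynamic scenarios.
2) The validation of model adaptability across diverse scenarios.
Real-world dynamics, especially those involving fluids and objects, require models that not only capture object interactions but also simulate the characteristics of fluid flow.
To address these gaps, we introduce SlotPi, a slot-based, physics-informed, object-centric reasoning model.
SlotPi integrates a physics module based on Hamiltonian principles with a spatio-temporal prediction module for dynamic forecasting.
Our experiments highlight the model's strengths in tasks such as prediction and visual question answering (VQA) on benchmark and fluid datasets.
In addition, we created a real-world dataset covering object interactions, fluid dynamics, and fluid-object interactions, and validated the model's capabilities on it.
The model's robust performance across all datasets underscores its strong adaptability and lays a foundation for developing more advanced world models.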
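
The summary says only that SlotPi's physics module is "based on Hamiltonian principles", without describing it. As a rough illustration of that idea, and not the authors' method, the sketch below assumes each slot's latent vector is split into canonical coordinates (q, p) and advanced with a leapfrog (Störmer-Verlet) integrator of Hamilton's equations. The Hamiltonian here is a fixed, hand-written harmonic one; in a learned model it would presumably be a neural network, and slot-slot interactions would enter H, which this sketch omits. `SlotState`, `hamiltonian`, `leapfrog_step`, `rollout` and `total_energy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlotState:
    """Hypothetical per-slot latent state, split into canonical coordinates."""
    q: List[float]  # generalized positions
    p: List[float]  # conjugate momenta


def hamiltonian(s: SlotState, m: float = 1.0, k: float = 1.0) -> float:
    """Illustrative fixed Hamiltonian H(q, p) = |p|^2 / (2m) + k |q|^2 / 2."""
    kinetic = sum(pi * pi for pi in s.p) / (2.0 * m)
    potential = 0.5 * k * sum(qi * qi for qi in s.q)
    return kinetic + potential


def grad_potential(q: List[float], k: float) -> List[float]:
    """dH/dq for the harmonic potential above."""
    return [k * qi for qi in q]


def leapfrog_step(s: SlotState, dt: float, m: float = 1.0, k: float = 1.0) -> SlotState:
    """One kick-drift-kick leapfrog step of Hamilton's equations."""
    # Half kick: dp/dt = -dH/dq
    p_half = [pi - 0.5 * dt * gi for pi, gi in zip(s.p, grad_potential(s.q, k))]
    # Drift: dq/dt = dH/dp = p / m
    q_new = [qi + dt * pi / m for qi, pi in zip(s.q, p_half)]
    # Second half kick, using the gradient at the updated positions
    p_new = [pi - 0.5 * dt * gi for pi, gi in zip(p_half, grad_potential(q_new, k))]
    return SlotState(q=q_new, p=p_new)


def rollout(slots: List[SlotState], steps: int, dt: float = 0.01) -> List[List[SlotState]]:
    """Advance every slot independently; slot-slot interaction terms are omitted."""
    trajectory = [slots]
    for _ in range(steps):
        slots = [leapfrog_step(s, dt) for s in slots]
        trajectory.append(slots)
    return trajectory


def total_energy(slots: List[SlotState]) -> float:
    """Sum of the per-slot Hamiltonians (no interaction energy in this sketch)."""
    return sum(hamiltonian(s) for s in slots)


# Usage: two hypothetical slots rolled out for 1000 steps (t = 10).
slots0 = [SlotState(q=[1.0, 0.0], p=[0.0, 0.5]), SlotState(q=[-0.5, 0.2], p=[0.3, 0.0])]
traj = rollout(slots0, steps=1000)
drift = abs(total_energy(traj[-1]) - total_energy(traj[0])) / total_energy(traj[0])
print(f"relative energy drift after 1000 steps: {drift:.2e}")
```

Leapfrog is chosen for the sketch because it is symplectic: total energy stays within a small bound over long rollouts instead of drifting, which is the kind of physical consistency a Hamiltonian prior is meant to provide. Which integrator, if any, SlotPi itself uses is not stated in the summary.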

Summary (original)

Understanding and reasoning about dynamics governed by physical laws through visual observation, akin to human capabilities in the real world, poses significant challenges. Currently, object-centric dynamic simulation methods, which emulate human behavior, have achieved notable progress but overlook two critical aspects: 1) the integration of physical knowledge into models. Humans gain physical insights by observing the world and apply this knowledge to accurately reason about various dynamic scenarios; 2) the validation of model adaptability across diverse scenarios. Real-world dynamics, especially those involving fluids and objects, demand models that not only capture object interactions but also simulate fluid flow characteristics. To address these gaps, we introduce SlotPi, a slot-based physics-informed object-centric reasoning model. SlotPi integrates a physical module based on Hamiltonian principles with a spatio-temporal prediction module for dynamic forecasting. Our experiments highlight the model’s strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets. Furthermore, we have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model’s capabilities. The model’s robust performance across all datasets underscores its strong adaptability, laying a foundation for developing more advanced world models.

arXiv information

Authors	Jian Li, Wan Han, Ning Lin, Yu-Liang Zhan, Ruizhi Chengze, Haining Wang, Yi Zhang, Hongsheng Liu, Zidong Wang, Fan Yu, Hao Sun
Published	2025-06-12 14:53:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
