Category archive: cs.CV

Neural Inverse Rendering from Propagating Light

Posted: June 6, 2025 | Author: jarxiv

Summary: Physically based neural inverse rendering from multi-viewpoint videos of propagating light …

Category: cs.CV

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Posted: June 6, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs), pre-trained with visual capabilities …

Category: cs.CV

FreeTimeGS: Free Gaussians at Anytime and Anywhere for Dynamic Scene Reconstruction

Posted: June 6, 2025 | Author: jarxiv

Summary: This paper addresses the challenge of reconstructing dynamic 3D scenes with complex motions …

Category: cs.CV

Contrastive Flow Matching

Posted: June 6, 2025 | Author: jarxiv

Summary: Unconditional flow matching trains diffusion models so that the flows between sample pairs are unique …

Category: cs.CV

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Posted: June 6, 2025 | Author: jarxiv

Summary: Mathematical reasoning in real-world video settings is fundamentally different from that in static images or text …

Category: cs.CV

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Posted: June 6, 2025 | Author: jarxiv

Summary: Existing unified models, in vision-language understanding and text-to-image generation, …

Categories: cs.AI, cs.CL, cs.CV

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers

Posted: June 6, 2025 | Author: jarxiv

Summary: Fine-grained and efficient controllability of video diffusion transformers, amid a growing desire for applicability …

Category: cs.CV

SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

Posted: June 5, 2025 | Author: jarxiv

Summary: Reconstruction of articulated objects common in everyday environments, for augmented/virtual reality and robotics …

Categories: cs.CV, cs.GR, cs.LG, cs.MM, cs.RO

Zero-Shot Temporal Interaction Localization for Egocentric Videos

Posted: June 5, 2025 | Author: jarxiv

Summary: Locating human-object interaction (HOI) actions within a video …

Categories: cs.CV, cs.RO

Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning

Posted: June 5, 2025 | Author: jarxiv

Summary: In this paper, an autoregressive model and a diffusion model for learning visuomotor policies …

Categories: cs.CV, cs.RO
