Archive of posts by "jarxiv"

VideoMolmo: Spatio-Temporal Grounding Meets Pointing

Posted: June 6, 2025 | Author: jarxiv

Summary: Spatio-temporal localization, from biological research to autonomous navigation and interactive …

Category: cs.CV

Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh

Posted: June 6, 2025 | Author: jarxiv

Summary: An indoor space represented as a textured mesh and corresponding multi-view panoramic images …

Category: cs.CV

Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

Posted: June 6, 2025 | Author: jarxiv

Summary: For embodied AI and digital content creation, realistic 3D indoor scene …

Categories: cs.AI, cs.CV

Refer to Anything with Vision-Language Prompts

Posted: June 6, 2025 | Author: jarxiv

Summary: Recent image segmentation models segment images into high-quality masks of visual entities …

Categories: cs.AI, cs.CV

ContentV: Efficient Training of Video Generation Models with Limited Compute

Posted: June 6, 2025 | Author: jarxiv

Summary: To mitigate escalating computational costs, recent advances in video generation increasingly …

Category: cs.CV

Neural Inverse Rendering from Propagating Light

Posted: June 6, 2025 | Author: jarxiv

Summary: Physically based neural inverse rendering from multi-viewpoint videos of propagating light …

Category: cs.CV

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Posted: June 6, 2025 | Author: jarxiv

Summary: Multimodal large language models (MLLMs) … equipped with visual capabilities, pre-trained …

Category: cs.CV

FreeTimeGS: Free Gaussians at Anytime and Anywhere for Dynamic Scene Reconstruction

Posted: June 6, 2025 | Author: jarxiv

Summary: This paper addresses the challenge of reconstructing dynamic 3D scenes with complex motions …

Category: cs.CV

Contrastive Flow Matching

Posted: June 6, 2025 | Author: jarxiv

Summary: Unconditional flow matching trains diffusion models … that the flows between sample pairs are unique …

Category: cs.CV

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Posted: June 6, 2025 | Author: jarxiv

Summary: Mathematical reasoning in real-world video settings is fundamentally different from that in static images or text …

Category: cs.CV
