"cs.CV" Category Archive

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

Posted on April 4, 2025 by jarxiv

Abstract: Robot vision, with the advances in multimodal fusion techniques and vision-language models (VLMs) …

Categories: cs.CV, cs.RO

ArtFormer: Controllable Generation of Diverse 3D Articulated Objects

Posted on April 4, 2025 by jarxiv

Abstract: In this paper, for the modeling and conditional generation of 3D articulated objects, a new …

Categories: cs.AI, cs.CV, cs.RO

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Posted on April 4, 2025 by jarxiv

Abstract: Reasoning about motion and space is, as required by multiple real-world applications, a fundamental …

Categories: cs.AI, cs.CV, cs.GR, cs.RO

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Posted on April 4, 2025 by jarxiv

Abstract: Recent image-based human animation methods, realistic body and facial motion …

Categories: cs.AI, cs.CV

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Posted on April 4, 2025 by jarxiv

Abstract: ILLUME+ leverages dual visual tokenization and a diffusion decoder, and deep semantic …

Categories: cs.CV

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Posted on April 4, 2025 by jarxiv

Abstract: Recovering 3D scenes from sparse views, because it is an inherently ill-posed problem, is …

Categories: cs.CV

GSR4B: Biomass Map Super-Resolution with Sentinel-1/2 Guidance

Posted on April 4, 2025 by jarxiv

Abstract: Accurate above-ground biomass (AGB) mapping at large scale and high spatiotemporal resolution …

Categories: cs.CV

Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks

Posted on April 4, 2025 by jarxiv

Abstract: Unsupervised domain adaptation (UDA) frameworks, on clean data, for 3D point cloud …

Categories: cs.CV

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Posted on April 4, 2025 by jarxiv

Abstract: Bird's-eye-view (BEV) perception provides a unified representation for fusing multi-view images, and …

Categories: cs.CV

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

Posted on April 3, 2025 by jarxiv

Abstract: Multiple spatial control inputs of various modalities such as segmentation, depth, and edge …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
