"cs.CV" Category Archive

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

Posted: February 21, 2025 | Author: jarxiv

Abstract: Reasoning about text-rich images, such as charts and documents, …

Categories: cs.CL, cs.CV

VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

Posted: February 21, 2025 | Author: jarxiv

Abstract: We propose $\textbf{VidStyleODE}$. Generative adversarial …

Categories: cs.CV

Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

Posted: February 21, 2025 | Author: jarxiv

Abstract: Multimodal Retrieval-Augmented Generation (MRAG), by integrating external knowledge, …

Categories: cs.AI, cs.CV

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Posted: February 21, 2025 | Author: jarxiv

Abstract: Understanding historical and cultural artifacts requires human expertise and advanced …

Categories: cs.CV, cs.LG

Generalizable Humanoid Manipulation with 3D Diffusion Policies

Posted: February 20, 2025 | Author: jarxiv

Abstract: Humanoid robots capable of autonomous operation in diverse environments have long …

Categories: cs.CV, cs.LG, cs.RO

BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

Posted: February 20, 2025 | Author: jarxiv

Abstract: In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks …

Categories: cs.CV, cs.RO

Improving Collision-Free Success Rate For Object Goal Visual Navigation Via Two-Stage Training With Collision Prediction

Posted: February 20, 2025 | Author: jarxiv

Abstract: Object goal visual navigation, using egocentric visual observations, …

Categories: cs.CV, cs.RO

Towards Fusing Point Cloud and Visual Representations for Imitation Learning

Posted: February 20, 2025 | Author: jarxiv

Abstract: For manipulation learning, rich sensory information such as point clouds and RGB images …

Categories: cs.CV, cs.RO

Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment

Posted: February 20, 2025 | Author: jarxiv

Abstract: Deep neural network models trained in the closed-set setting …

Categories: cs.CV, cs.RO

MonoForce: Learnable Image-conditioned Physics Engine

Posted: February 20, 2025 | Author: jarxiv

Abstract: Predicting robot trajectories on rough off-road terrain from onboard camera images …

Categories: cs.CV, cs.RO