Category Archives: cs.CV

AV-Flow: Transforming Text to Audio-Visual Human-like Interactions

Posted on February 19, 2025 by jarxiv

Summary: Animating photorealistic 4D talking avatars given only text input …

Categories: cs.CV

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Posted on February 19, 2025 by jarxiv

Summary: Spatial intelligence is a critical component of embodied AI, enabling robots …

Categories: cs.AI, cs.CV, cs.RO

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Posted on February 19, 2025 by jarxiv

Summary: Existing end-to-end autonomous driving (AD) algorithms typically follow the imitation learning (IL …

Categories: cs.CV, cs.RO

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

Posted on February 19, 2025 by jarxiv

Summary: With the emergence of large vision-language models (VLMs), integrating visual modalities …

Categories: cs.CV, cs.LG

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

Posted on February 19, 2025 by jarxiv

Summary: Recent multimodal large language models (MLLMs) have achieved remarkable performance …

Categories: cs.CV

Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023

Posted on February 19, 2025 by jarxiv

Summary: Since the SciCap dataset was launched in 2021, the research community has …

Categories: cs.AI, cs.CL, cs.CV

MagicArticulate: Make Your 3D Models Articulation-Ready

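Posted on February 19, 2025 by jarxiv

Summary: With the explosive growth of 3D content creation, there is increasing demand for automatically converting static 3D models into versions that support realistic …

Categories: cs.CV, cs.GR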

Novel computational workflows for natural and biomedical image processing based on hypercomplex algebras

Posted on February 18, 2025 by jarxiv

Summary: Hypercomplex image processing, which encompasses algebraic and geometric principles within a unified …

Categories: cs.CV, cs.LG

BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

Posted on February 18, 2025 by jarxiv

Summary: In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks …

Categories: cs.CV, cs.RO

Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation

Posted on February 18, 2025 by jarxiv

Summary: Teleoperation is important for enabling supervisors to remotely control agricultural robots …

Categories: cs.CV, cs.RO
