"cs.CV" Category Archive

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Posted: October 31, 2024 | Author: jarxiv

Abstract: Autoregression in large language models (LLMs) casts every language task as next-tok …

Categories: cs.CV

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Posted: October 31, 2024 | Author: jarxiv

Abstract: Pre-training visual representations has improved the efficiency of robot learning. Large …

Categories: cs.AI, cs.CV, cs.RO

EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data

Posted: October 30, 2024 | Author: jarxiv

Abstract: Because event cameras offer high temporal resolution and high dynamic range, …

Categories: cs.CV, cs.RO, eess.IV

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

Posted: October 30, 2024 | Author: jarxiv

Abstract: Large-scale real-world driving datasets … data-driven motion for autonomous driving …

Categories: cs.CV, cs.RO

DOFS: A Real-world 3D Deformable Object Dataset with Full Spatial Information for Dynamics Model Learning

Posted: October 30, 2024 | Author: jarxiv

Abstract: In this work, we propose DOFS. This … a new low-cost data collection p …

Categories: cs.CV, cs.RO

ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting

Posted: October 30, 2024 | Author: jarxiv

Abstract: We … an autonomous high-fidelity reconstruction system leveraging Gaussian splatting …

Categories: cs.CV, cs.RO

SMART: Scalable Multi-agent Real-time Generation via Next-token Prediction

Posted: October 30, 2024 | Author: jarxiv

Abstract: Data-driven motion generation tasks for autonomous driving … limited dataset size and …

Categories: cs.CV, cs.RO

Non-rigid Relative Placement through 3D Dense Diffusion

Posted: October 30, 2024 | Author: jarxiv

Abstract: The task of "relative placement" is to pre … the placement of one object relative to another

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Are VLMs Really Blind

Posted: October 30, 2024 | Author: jarxiv

Abstract: Vision-language models, in optical character recognition (OCR), visual question answering (VQA …

Categories: cs.CL, cs.CV

No ‘Zero-Shot’ Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Posted: October 30, 2024 | Author: jarxiv

Abstract: Web-crawled pretraining datasets, for classification/retrieval C …

Categories: cs.CL, cs.CV, cs.LG
