"cs.CV" Category Archive

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Posted: February 7, 2025 | Author: jarxiv

Abstract: This paper ... a multimodal … that simultaneously covers visual, audio, and text inputs … Continue reading →

Categories: cs.AI, cs.CV | Comments Off

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Posted: February 7, 2025 | Author: jarxiv

Abstract: With recent advances in large language models, particularly following GPT-4o, more modalities … Continue reading →

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS, eess.IV | Comments Off

SMART: Advancing Scalable Map Priors for Driving Topology Reasoning

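Posted: February 7, 2025 | Author: jarxiv

Abstract: Topology reasoning ... comprehensively understanding the connectivity and relationships between lanes and traffic elements … Continue reading →

Categories: cs.CV, cs.RO | Comments Off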

Intelligent Sensing-to-Action for Robust Autonomy at the Edge: Opportunities and Challenges

Posted: February 6, 2025 | Author: jarxiv

Abstract: Autonomous edge computing in robotics, smart cities, and autonomous vehicles … Continue reading →

Categories: cs.CV, cs.LG, cs.RO | Comments Off

SD++: Enhancing Standard Definition Maps by Incorporating Road Knowledge using LLMs

Posted: February 6, 2025 | Author: jarxiv

Abstract: High-definition maps (HD maps) capture lane centerlines and road elements … Continue reading →

Categories: cs.CV, cs.RO | Comments Off

RoboGrasp: A Universal Grasping Policy for Robust Robotic Control

Posted: February 6, 2025 | Author: jarxiv

Abstract: Imitation learning and world models ... great promise for advancing generalizable robot learning … Continue reading →

Categories: cs.CV, cs.RO | Comments Off

Edge Attention Module for Object Classification

Posted: February 6, 2025 | Author: jarxiv

Abstract: In this study, for object classification tasks, a novel "edge attention-based convolutional … Continue reading →

Categories: cs.CV, cs.LG | Comments Off

Tell2Reg: Establishing spatial correspondence between images by the same language prompts

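Posted: February 6, 2025 | Author: jarxiv

Abstract: Spatial correspondence can be represented by pairs of segmented regions, and an image registration network … Continue reading →

Categories: 00B25, cs.AI, cs.CV, eess.IV, I.2.7 | Comments Off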

3D Face Reconstruction From Radar Images

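Posted: February 6, 2025 | Author: jarxiv

Abstract: 3D reconstruction of faces has attracted wide attention in computer vision and, for example, in animation … Continue reading →

Categories: cs.CV, cs.LG | Comments Off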

Assessing Open-world Forgetting in Generative Image Model Customization

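Posted: February 6, 2025 | Author: jarxiv

Abstract: Recent advances in diffusion models have significantly improved image generation capabilities. However, … Continue reading →

Categories: cs.CV, cs.GR, cs.LG | Comments Off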
