Category archive: cs.CV

Retrospective Learning from Interactions

Posted: October 18, 2024 | Author: jarxiv

Abstract: Multi-turn interactions between large language models (LLMs) and users naturally …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Posted: October 18, 2024 | Author: jarxiv

Abstract: As the capabilities of multimodal large language models (MLLMs) continue to improve, …

Categories: cs.AI, cs.CL, cs.CV, cs.CY

$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Posted: October 18, 2024 | Author: jarxiv

Abstract: Despite the significant progress of multimodal large language models (MLLMs), …

Categories: cs.CV

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

Posted: October 18, 2024 | Author: jarxiv

Abstract: 3D visual grounding is crucial for robots, with natural language and 3D …

Categories: cs.CV, cs.RO

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Posted: October 18, 2024 | Author: jarxiv

Abstract: Recent advances in multimodal foundation models have brought significant progress in vision-language understanding …

Categories: cs.CV

DepthSplat: Connecting Gaussian Splatting and Depth

Posted: October 18, 2024 | Author: jarxiv

Abstract: Gaussian splatting and single/multi-view depth estimation are typically studied in isolation …

Categories: cs.CV

UniDrive: Towards Universal Driving Perception Across Camera Configurations

Posted: October 18, 2024 | Author: jarxiv

Abstract: Vision-centric autonomous driving uses economical sensors, with strong performance …

Categories: cs.CV

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Posted: October 18, 2024 | Author: jarxiv

Abstract: Scaling up autoregressive models in vision … large language models …

Categories: cs.CV, cs.LG

Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2

Posted: October 18, 2024 | Author: jarxiv

Abstract: Anatomical landmarks, in medical imaging for navigation and anomaly detection, …

Categories: cs.AI, cs.CV, cs.LG

Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair

Posted: October 17, 2024 | Author: jarxiv

Abstract: If not repaired efficiently, surface cracks in infrastructure can lead to significant deterioration and costly maintenance …

Categories: cs.CV, cs.RO, cs.SY, eess.SY
