"cs.CV" Category Archive

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

Posted: March 5, 2025 | Author: jarxiv

Abstract: 3D affordance detection, with its wide range of applications across various robotic tasks, …

Categories: cs.CV, cs.RO

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy

Posted: March 4, 2025 | Author: jarxiv

Abstract: Generalizing language-conditioned robotic manipulation policies to new tasks …

Categories: cs.CV, cs.RO

VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers

Posted: March 4, 2025 | Author: jarxiv

Abstract: In autonomous driving, dynamic environments and corner cases, with respect to the robustness of the ego vehicle's decision-making, …

Categories: cs.CV, cs.RO

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

Posted: March 4, 2025 | Author: jarxiv

Abstract: In this work, to enable multi-fingered robot hands to manipulate diverse objects in diverse poses, …

Categories: cs.CV, cs.LG, cs.RO

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

Posted: March 4, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs) have demonstrated impressive capabilities. …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Posted: March 4, 2025 | Author: jarxiv

Abstract: Artificial intelligence (AI), with its great potential in healthcare, particularly in disease diagnosis and treatment planning, …

Categories: cs.CL, cs.CV, cs.LG

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Posted: March 4, 2025 | Author: jarxiv

Abstract: Thanks to recent advances in timestep-distilled diffusion models, rivaling non-distilled multi-step models, …

Categories: cs.CV

AdvLogo: Adversarial Patch Attack against Object Detectors based on Diffusion Models

Posted: March 4, 2025 | Author: jarxiv

Abstract: With the rapid development of deep learning, object detectors have demonstrated remarkable performance …

Categories: cs.CV, cs.LG

S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation

Posted: March 4, 2025 | Author: jarxiv

Abstract: Autonomous driving simulation systems augment self-driving data, and complex, rare traffic …

Categories: cs.CV

PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings

Posted: March 4, 2025 | Author: jarxiv

Abstract: The history of art has seen major shifts in how works are created, and understanding the creative process …

Categories: cs.AI, cs.CV, cs.LG
