Category archive: cs.CV

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

Posted on: May 29, 2025 | Author: jarxiv

Summary: Recent advances in world models have revolutionized dynamic environment simulation, … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

Zero-Shot 3D Visual Grounding from Vision-Language Models

Posted on: May 29, 2025 | Author: jarxiv

Summary: 3D Visual Grounding (3DVG) uses natural-language descriptions … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Posted on: May 29, 2025 | Author: jarxiv

Summary: Improving multimodal large language models (MLLMs) in the post-training stage … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG | Comments closed

Fostering Video Reasoning via Next-Event Prediction

Posted on: May 29, 2025 | Author: jarxiv

Summary: Next-token prediction serves as the foundational learning task that enables reasoning in LLMs … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Universal Domain Adaptation for Semantic Segmentation

Posted on: May 29, 2025 | Author: jarxiv

Summary: Unsupervised domain adaptation for semantic segmentation (UDA-SS) … Continue reading →

Categories: cs.CV | Comments closed

SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels

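Posted on: May 29, 2025 | Author: jarxiv

Summary: 3D occupancy prediction, owing to its strong geometric perception and object recognition capabilities, … autonomous driving … Continue reading →

Categories: cs.CV | Comments closed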

Single Domain Generalization for Alzheimer’s Detection from 3D MRIs with Pseudo-Morphological Augmentations and Contrastive Learning

Posted on: May 29, 2025 | Author: jarxiv

Summary: Alzheimer's disease detection from MRIs, thanks to modern deep learning models, … Continue reading →

Categories: cs.CV | Comments closed

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models

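Posted on: May 29, 2025 | Author: jarxiv

Summary: With the emergence of multimodal large language models (MLRMs), reinforcement learning and chain-of-thought ( … Continue reading →

Categories: cs.CV | Comments closed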

A Closer Look at Multimodal Representation Collapse

Posted on: May 29, 2025 | Author: jarxiv

Summary: We aim to develop a fundamental understanding of modality collapse. … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments closed

Understanding Adversarial Training with Energy-based Models

Posted on: May 29, 2025 | Author: jarxiv

Summary: Using the energy-based model (EBM) framework, classifiers' adversarial … Continue reading →

Categories: cs.CV, cs.LG | Comments closed

