Category archive: cs.CV

Captured by Captions: On Memorization and its Mitigation in CLIP Models

Posted on May 20, 2025 by jarxiv

Abstract: Multimodal models such as CLIP, on tasks such as image retrieval and zero-shot classification, …

Categories: cs.AI, cs.CV, cs.LG

JetFormer: An Autoregressive Generative Model of Raw Images and Text

Posted on May 20, 2025 by jarxiv

Abstract: Removing modeling constraints and unifying architectures across domains …

Categories: cs.AI, cs.CV, cs.LG

FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning

Posted on May 20, 2025 by jarxiv

Abstract: Despite rapid progress in large vision-language models (LVLMs), existing …

Categories: cs.CV

Joint Depth and Reflectivity Estimation using Single-Photon LiDAR

Posted on May 20, 2025 by jarxiv

Abstract: Single-photon light detection and ranging (SP-LiDAR) is, for long-range, high-precision 3D vision tasks, a leading …

Categories: cs.CV

Anomaly Anything: Promptable Unseen Visual Anomaly Generation

Posted on May 20, 2025 by jarxiv

Abstract: Visual anomaly detection (AD) poses significant challenges because anomalous data samples are scarce …

Categories: cs.CV

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning

Posted on May 20, 2025 by jarxiv

Abstract: In this work, by explicitly modeling prior information about problem difficulty, multimodal …

Categories: cs.CV

DB3D-L: Depth-aware BEV Feature Transformation for Accurate 3D Lane Detection

Posted on May 20, 2025 by jarxiv

Abstract: 3D lane detection plays a crucial role in autonomous driving. Recent advances …

Categories: cs.CV

Quantifying Context Bias in Domain Adaptation for Object Detection

Posted on May 20, 2025 by jarxiv

Abstract: Domain adaptation for object detection (DAOD) takes a trained model from a source …

Categories: cs.AI, cs.CV, cs.RO

Event-Driven Dynamic Scene Depth Completion

Posted on May 20, 2025 by jarxiv

Abstract: For depth completion in dynamic scenes, input modalities such as RGB images and LiDAR measurements …

Categories: cs.CV

Computer Vision Models Show Human-Like Sensitivity to Geometric and Topological Concepts

Posted on May 20, 2025 by jarxiv

Abstract: With the rapid improvement of machine learning (ML) models, cognitive scientists, regarding alignment with human thinking, …

Categories: cs.CV
