"cs.CV" Category Archive

PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model

Posted: March 25, 2025 | Author: jarxiv

Abstract: Existing multilingual benchmarks for large vision-language models (LVLMs), language … Continue reading →

Categories: cs.CL, cs.CV

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models

Posted: March 25, 2025 | Author: jarxiv

Abstract: Despite the significant success of large vision-language models (LVLMs), these … Continue reading →

Categories: cs.CL, cs.CV

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Posted: March 25, 2025 | Author: jarxiv

Abstract: Recently, directly perceiving graphical user interfaces (GUIs) and the corresponding … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

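Posted: March 25, 2025 | Author: jarxiv

Abstract: Formula recognition is a significant challenge owing to the complex structure and varied notation of mathematical expressions … Continue reading →

Categories: cs.CL, cs.CV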

Believing is Seeing: Unobserved Object Detection using Generative Models

Posted: March 25, 2025 | Author: jarxiv

Abstract: Can objects that are not visible in the image, but are close to the camera, be detected? This … Continue reading →

Categories: cs.AI, cs.CV, cs.RO

Any6D: Model-free 6D Pose Estimation of Novel Objects

Posted: March 25, 2025 | Author: jarxiv

Abstract: We introduce Any6D, a model-free framework for 6D object pose estimation … Continue reading →

Categories: cs.AI, cs.CV, cs.RO

CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Posted: March 25, 2025 | Author: jarxiv

Abstract: Tool tracking in surgical videos, for skill assessment, safety zone estimation, human collaboration … Continue reading →

Categories: cs.AI, cs.CV

Understanding Model Calibration — A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

Posted: March 25, 2025 | Author: jarxiv

Abstract: To be considered reliable, confidence in each decision should closely reflect the true outcome … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, stat.ME, stat.ML

RankCLIP: Ranking-Consistent Language-Image Pretraining

Posted: March 25, 2025 | Author: jarxiv

Abstract: Self-supervised contrastive learning models such as CLIP, on many downstream tasks, vision-language models … Continue reading →

Categories: cs.AI, cs.CV, cs.LG

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos

Posted: March 25, 2025 | Author: jarxiv

Abstract: Egocentric open-surgery videos, surgical procedures in the operating room and human … Continue reading →

Categories: cs.AI, cs.CV, cs.LG

