"cs.CV" Category Archive

Spline-based Transformers

Posted on April 4, 2025 by jarxiv

Abstract: We introduce Spline-based Transformers. Spline-based … Continue reading →

Categories: cs.CV, cs.LG

Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence

Posted on April 4, 2025 by jarxiv

Abstract: Large vision-language models offer a new paradigm for AI-driven image understanding … Continue reading →

Categories: cs.AI, cs.CV

F-ViTA: Foundation Model Guided Visible to Thermal Translation

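Posted on April 4, 2025 by jarxiv

Abstract: Infrared imaging is essential for situational awareness, particularly in low-light and nighttime conditions. However, infrared imaging … Continue reading →

Categories: cs.CV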

BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

Posted on April 4, 2025 by jarxiv

Abstract: We present the evaluation methodology, datasets, and results of the BOP Challenge 2024. B … Continue reading →

Categories: cs.CV

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers

Posted on April 4, 2025 by jarxiv

Abstract: Arabic handwritten text recognition (HTR), with its diverse writing styles and the Arabic script's inherent … Continue reading →

Categories: cs.CL, cs.CV, cs.LG

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

Posted on April 4, 2025 by jarxiv

Abstract: To learn compact shape representations, many 3D generative models rely on variational autoen … Continue reading →

Categories: cs.CV

GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings

Posted on April 4, 2025 by jarxiv

Abstract: Symmetry, where certain features remain invariant under geometric transformations, in convolutional neural net … Continue reading →

Categories: cs.AI, cs.CV, eess.IV, eess.SP

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Posted on April 4, 2025 by jarxiv

Abstract: Sparse autoencoders (SAEs) have recently, in large language models (LLMs), … Continue reading →

Categories: cs.AI, cs.CV, cs.LG

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Posted on April 4, 2025 by jarxiv

Abstract: Advances in computer-aided screening (CAS) systems for X-ray baggage inspection … Continue reading →

Categories: cs.CV, eess.IV

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Posted on April 4, 2025 by jarxiv

Abstract: Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem … Continue reading →

Categories: cs.AI, cs.CV, cs.LG
