Category archive: cs.AI

Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models

Posted: May 6, 2024 | Author: jarxiv

Summary: In the fields of robotics and computer vision, understanding complex environments and interacting …

Categories: cs.AI, cs.CV, cs.RO

From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks

Posted: May 6, 2024 | Author: jarxiv

Summary: This paper … recent approaches for explaining concepts in neural networks …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.NE

Forensic License Plate Recognition with Compression-Informed Transformers

Posted: May 6, 2024 | Author: jarxiv

Summary: Forensic license plate recognition (FLPR), in legal contexts such as criminal investigations, …

Categories: cs.AI, cs.CV

Visual Enumeration is Challenging for Large-scale Generative AI

Posted: May 6, 2024 | Author: jarxiv

Summary: Such skills, in many animal species and in infants before language development and formal schooling, …

Categories: cs.AI, cs.CV, cs.NE

Zero-shot generalization across architectures for visual classification

Posted: May 6, 2024 | Author: jarxiv

Summary: Generalization to unseen data is a key challenge for deep networks, but its …

Categories: cs.AI, cs.CV, cs.LG, I.2.6

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Posted: May 6, 2024 | Author: jarxiv

Summary: We present a new way of using Transformers to make image classification interpretable …

Categories: cs.AI, cs.CV

Improving Interpretation Faithfulness for Vision Transformers

Posted: May 6, 2024 | Author: jarxiv

Summary: Across a variety of vision tasks, Vision Transformers (ViTs) … state-of-the-art …

Categories: cs.AI, cs.CV, cs.LG

What matters when building vision-language models?

Posted: May 6, 2024 | Author: jarxiv

Summary: The growing interest in vision-language models (VLMs) … large language models and vision transformers …

Categories: cs.AI, cs.CV

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Posted: May 6, 2024 | Author: jarxiv

Summary: Vibe-Eval, a new … for evaluating multimodal chat models …

Categories: cs.AI, cs.CL, cs.CV

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Posted: May 6, 2024 | Author: jarxiv

Summary: On many mathematical reasoning benchmarks, large language models (LLMs) have remarkable …

Categories: cs.AI, cs.CL, cs.LG
