"cs.CV" Category Archive

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Posted: March 26, 2025 | Author: jarxiv

Abstract: Trained on diverse robot datasets, recent vision-language-action …

Categories: cs.CV, cs.RO

OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations

Posted: March 26, 2025 | Author: jarxiv

Abstract: In 3D scene understanding, open-vocabulary … that enable interaction through natural language …

Categories: cs.CV, cs.RO

Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer’s disease MRI dataset using explainable deep learning

Posted: March 26, 2025 | Author: jarxiv

Abstract: Background. When classifying Alzheimer's disease (AD), deep neural networks …

Categories: cs.CV, cs.LG, eess.IV

BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts

Posted: March 26, 2025 | Author: jarxiv

Abstract: Segmentation is a fundamental task in computer vision, with flexibility …

Categories: cs.CV, cs.LG

LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

Posted: March 26, 2025 | Author: jarxiv

Abstract: Using vision-and-language models (VLMs), open-vocabulary semantic …

Categories: cs.CV, cs.LG

MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification

Posted: March 26, 2025 | Author: jarxiv

Abstract: Large vision-language models (LVLMs), in visual question answering, image captioning, …

Categories: cs.CL, cs.CV, cs.LG

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Posted: March 26, 2025 | Author: jarxiv

Abstract: Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning …

Categories: cs.CL, cs.CV

Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models

Posted: March 26, 2025 | Author: jarxiv

Abstract: Recently, Vision-Language Models (VLMs), in image captioning …

Categories: cs.CL, cs.CV

Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation

Posted: March 26, 2025 | Author: jarxiv

Abstract: Large vision-language models (VLMs), without task-specific training, …

Categories: cs.AI, cs.CV

OpenSDI: Spotting Diffusion-Generated Images in the Open World

Posted: March 26, 2025 | Author: jarxiv

Abstract: This paper identifies OpenSDI, which in open-world settings …

Categories: cs.AI, cs.CV

