Category Archives: cs.AI

RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration

Posted on May 26, 2025 by jarxiv

Summary: The use of latent diffusion models (LDMs) such as Stable Diffusion for all-in-one image …

Categories: cs.AI, cs.CV

FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation

Posted on May 26, 2025 by jarxiv

Summary: Widely adopted for adapting vision-language models (VLMs) to downstream tasks …

Categories: cs.AI, cs.CV

Multi-Faceted Multimodal Monosemanticity

Posted on May 26, 2025 by jarxiv

Summary: Humans experience the world through multiple modalities, such as vision, language, and speech, …

Categories: cs.AI, cs.CV

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Posted on May 26, 2025 by jarxiv

Summary: Long-form video understanding, given its extensive temporal-spatial complexity and such extended con …

Categories: cs.AI, cs.CL, cs.CV

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Posted on May 26, 2025 by jarxiv

Summary: Thanks to recent progress in large vision-language models (LVLMs), report generation and visual …

Categories: cs.AI, cs.CV

A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers

Posted on May 26, 2025 by jarxiv

Summary: Artificial intelligence (AI) has shown remarkable potential in medical imaging diagnosis …

Categories: cs.AI, cs.CV

MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression

Posted on May 26, 2025 by jarxiv

Summary: Large vision-language models (LVLMs), particularly in medical visual question answering (MedVQA …

Categories: cs.AI, cs.CV

VideoGameBench: Can Vision-Language Models complete popular video games?

Posted on May 26, 2025 by jarxiv

Summary: Vision-language models (VLMs), on coding and math … that are challenging for humans, …

Categories: cs.AI, cs.CL, cs.CV

WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

Posted on May 26, 2025 by jarxiv

Summary: WonderPlay … action-conditioned dynamic … from a single image …

Categories: cs.AI, cs.CV, cs.GR

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Posted on May 26, 2025 by jarxiv

Summary: Reinforcement learning (RL) … effective … for enhancing the reasoning of vision-language models (VLMs) …

Categories: cs.AI, cs.CV
