"cs.CV" Category Archive

Multibranch Generative Models for Multichannel Imaging with an Application to PET/CT Synergistic Reconstruction

Posted: February 4, 2025 | Author: jarxiv

Abstract: This paper uses multibranch generative models to learn synergistic reconstruction of medical images …

Categories: cs.CV, eess.IV, physics.med-ph

Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment

Posted: February 4, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs) such as CLIP have demonstrated exceptional generalization ability, and …

Categories: cs.CV, cs.LG

JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts

Posted: February 4, 2025 | Author: jarxiv

Abstract: Video action detection (VAD) …

Categories: cs.CV

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

Posted: February 4, 2025 | Author: jarxiv

Abstract: In image editing, three core challenges remain: controllability, background preservation, and efficiency …

Categories: cs.AI, cs.CV

Robust Hyperbolic Learning with Curvature-Aware Optimization

Posted: February 4, 2025 | Author: jarxiv

Abstract: Hyperbolic deep learning, owing to the unique properties afforded by its alternative embedding space, …

Categories: cs.CV

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Posted: February 4, 2025 | Author: jarxiv

Abstract: With advances in large language models (LLMs), controllers for invoking external tools …

Categories: cs.AI, cs.CV

Reflective Gaussian Splatting

Posted: February 4, 2025 | Author: jarxiv

Abstract: Thanks to the improved performance of methods based on NeRF and 3DGS, novel view synthesis has advanced considerably …

Categories: cs.CV

GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers

Posted: February 4, 2025 | Author: jarxiv

Abstract: For safety-critical applications, understanding deep models …

Categories: cs.CV

Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications

Posted: February 4, 2025 | Author: jarxiv

Abstract: This technical report introduces Prithvi-EO-2.0. Pri …

Categories: cs.CV

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Posted: February 4, 2025 | Author: jarxiv

Abstract: Digital agents, across web pages, software applications, …

Categories: cs.AI, cs.CV

