Archive of posts by "jarxiv"

SpikeGen: Generative Framework for Visual Spike Stream Processing

Posted on May 26, 2025 by jarxiv

Summary: Neuromorphic vision systems such as spike cameras, under dynamic conditions, clear textures …

Categories: cs.CV

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

Posted on May 26, 2025 by jarxiv

Summary: Vision transformers are ever larger, more accurate, and more expensive to compute. The number of tokens …

Categories: cs.CV

BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching

Posted on May 26, 2025 by jarxiv

Summary: Existing echocardiography segmentation methods, shape variation, partial observation, and 2 …

Categories: cs.CV

FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation

Posted on May 26, 2025 by jarxiv

Summary: Widely adopted for adapting vision-language models (VLMs) to downstream tasks …

Categories: cs.AI, cs.CV

Multi-Faceted Multimodal Monosemanticity

Posted on May 26, 2025 by jarxiv

Summary: Humans experience the world through multiple modalities such as vision, language, and speech …

Categories: cs.AI, cs.CV

A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer

Posted on May 26, 2025 by jarxiv

Summary: Background: For extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI), accurate M …

Categories: cs.CV, eess.IV

Semantic Correspondence: Unified Benchmarking and a Strong Baseline

Posted on May 26, 2025 by jarxiv

Summary: Establishing semantic correspondence: keypoints across different images with the same seman …

Categories: cs.CV

Forensics Adapter: Unleashing CLIP for Generalizable Face Forgery Detection

Posted on May 26, 2025 by jarxiv

Summary: CLIP, into an effective and generalizable face forgery detector …

Categories: cs.CR, cs.CV, cs.LG

DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation

Posted on May 26, 2025 by jarxiv

Summary: Controllable video generation (CVG) is advancing rapidly, but multiple actors …

Categories: cs.CV

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

Posted on May 26, 2025 by jarxiv

Summary: Long-form video understanding, with its extensive spatiotemporal complexity and such extended cont …

Categories: cs.AI, cs.CL, cs.CV
