"cs.CV" Category Archive

AMA-SAM: Adversarial Multi-Domain Alignment of Segment Anything Model for High-Fidelity Histology Nuclei Segmentation

Posted: March 28, 2025 | Author: jarxiv

Abstract: Accurate segmentation of cell nuclei in histopathology images is, for numerous biomedical …

Categories: cs.AI, cs.CV

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Posted: March 28, 2025 | Author: jarxiv

Abstract: Recent advances in deep thinking models have, on mathematical and coding tasks, …

Categories: cs.CL, cs.CV

MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX

Posted: March 28, 2025 | Author: jarxiv

Abstract: Frontier models are either language-only or primarily focus on vision and language modalities …

Categories: cs.AI, cs.CV, cs.SD

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

Posted: March 28, 2025 | Author: jarxiv

Abstract: With advances in large vision-language models, accurate and precise image captions …

Categories: cs.CL, cs.CV, cs.DB

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Posted: March 28, 2025 | Author: jarxiv

Abstract: Recent advances in diffusion techniques have pushed image and video generation to unprecedented levels of quality …

Categories: cs.AI, cs.CV

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Posted: March 28, 2025 | Author: jarxiv

Abstract: SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5 …

Categories: cs.CV

Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance

Posted: March 28, 2025 | Author: jarxiv

Abstract: Evaluating text-to-image synthesis, due to the misalignment between established metrics and human preferences, …

Categories: cs.CV

OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation

Posted: March 28, 2025 | Author: jarxiv

Abstract: Occlusion is one of the challenging problems in estimating 3D hand pose. This problem …

Categories: cs.CV, cs.HC

Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography

Posted: March 28, 2025 | Author: jarxiv

Abstract: Contrastive Language-Image Pre-training (CLIP) shows strong potential in medical image analysis …

Categories: cs.AI, cs.CV, cs.LG

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Posted: March 28, 2025 | Author: jarxiv

Abstract: Temporal awareness, reasoning dynamically based on the timestamp at which a question is raised …

Categories: cs.AI, cs.CV
