Monthly Archive: May 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Posted: May 28, 2025 | Author: jarxiv

Abstract: Large multi-modality models (LMMs) have made significant progress in visual understanding and generation …

Categories: cs.CV

YOLO-SPCI: Enhancing Remote Sensing Object Detection via Selective-Perspective-Class Integration

Posted: May 28, 2025 | Author: jarxiv

Abstract: Object detection in remote sensing imagery, with extreme scale variation, dense …

Categories: cs.CV

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Posted: May 28, 2025 | Author: jarxiv

Abstract: Modern single-image super-resolution (SISR) models, at the scales they are trained on …

Categories: cs.AI, cs.CV, cs.LG

OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics

Posted: May 28, 2025 | Author: jarxiv

Abstract: Charts in scientific, business, and communication contexts …

Categories: cs.AI, cs.CL, cs.CV

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Posted: May 28, 2025 | Author: jarxiv

Abstract: Recent advances in CoT reasoning and RL post-training … the video reasoning capabilities of MLLMs …

Categories: cs.CV

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Posted: May 28, 2025 | Author: jarxiv

Abstract: Ultra-high-resolution (UHR) remote sensing (RS) imagery: valuable data for Earth observation …

Categories: cs.CV

Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility

Posted: May 28, 2025 | Author: jarxiv

Abstract: This work … a novel text-to-vector graphics generation approach …

Categories: cs.CV

ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding

Posted: May 28, 2025 | Author: jarxiv

Abstract: State space models (SSMs) such as PointMamba, with linear complexity …

Categories: cs.CV

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Posted: May 28, 2025 | Author: jarxiv

Abstract: Leveraging powerful representation learning capabilities, deep multi-view clustering methods have, in recent years, …

Categories: cs.CV

Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

Posted: May 28, 2025 | Author: jarxiv

Abstract: In the era of increasingly realistic generative AI, mitigating fraud and disinformation …

Categories: cs.AI, cs.CV, cs.CY
