Monthly Archives: January 2025

DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests

Posted on January 9, 2025 by jarxiv

Summary: Large Vision-Language Models (LVLMs) …

Categories: cs.AI, cs.CV

GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation

Posted on January 9, 2025 by jarxiv

Summary: Vision Transformers (ViTs), by capturing long-range correlations, …

Categories: cs.AI, cs.CV, cs.LG

Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations

Posted on January 9, 2025 by jarxiv

Summary: Interpreting charts is important for visual data analysis, but accurately … information from charts …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

RadGPT: Constructing 3D Image-Text Tumor Datasets

Posted on January 9, 2025 by jarxiv

Summary: More than 85 million CT scans are performed each year in the United States, and …

Categories: cs.CV, eess.IV

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Posted on January 9, 2025 by jarxiv

Summary: We study the problem of reconstructing 3D objects from single images. Recent work …

Categories: cs.CV, cs.GR

Re-ranking the Context for Multimodal Retrieval Augmented Generation

Posted on January 9, 2025 by jarxiv

Summary: Retrieval-augmented generation (RAG), by incorporating external knowledge, … large language models (LL …

Categories: cs.CV, cs.IR, cs.IT, cs.LG, math.IT

Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation

Posted on January 9, 2025 by jarxiv

Summary: Designed to excel at specialized domain tasks, zero-shot …

Categories: cs.CV

Grokking at the Edge of Numerical Stability

Posted on January 9, 2025 by jarxiv

Summary: Grokking (the sudden generalization that occurs after prolonged overfitting) … deep learning …

Categories: cs.AI, cs.CV, cs.LG, stat.ML

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning

Posted on January 9, 2025 by jarxiv

Summary: Text-to-video generation has made remarkable progress through diffusion models …

Categories: cs.CV

EditAR: Unified Conditional Generation with Autoregressive Models

Posted on January 9, 2025 by jarxiv

Summary: Recent progress in controllable image generation and editing is largely due to diffusion-based methods …

Categories: cs.CV
