Author Archives: jarxiv

Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling

Posted on January 9, 2025 by jarxiv

Abstract: Given an image of an isolated garment in a canonical product view and a separate image of a person …

Category: cs.CV

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Posted on January 9, 2025 by jarxiv

Abstract: With recent advances in multimodal models, visual perception, reasoning ability, and vision-language understanding …

Category: cs.CV

DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests

Posted on January 9, 2025 by jarxiv

Abstract: Large Vision-Language Models (LVLMs) …

Categories: cs.AI, cs.CV

GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation

Posted on January 9, 2025 by jarxiv

Abstract: Vision Transformers (ViTs), by capturing long-range correlations, …

Categories: cs.AI, cs.CV, cs.LG

Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations

Posted on January 9, 2025 by jarxiv

Abstract: Chart interpretation is crucial for visual data analysis, but accurately deriving information from charts …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

RadGPT: Constructing 3D Image-Text Tumor Datasets

Posted on January 9, 2025 by jarxiv

Abstract: More than 85 million CT scans are performed in the United States each year, …

Categories: cs.CV, eess.IV

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Posted on January 9, 2025 by jarxiv

Abstract: We study the problem of single-image 3D object reconstruction. Recent work …

Categories: cs.CV, cs.GR

Re-ranking the Context for Multimodal Retrieval Augmented Generation

Posted on January 9, 2025 by jarxiv

Abstract: Retrieval-augmented generation (RAG) incorporates external knowledge so that large language models (LL …

Categories: cs.CV, cs.IR, cs.IT, cs.LG, math.IT

Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation

Posted on January 9, 2025 by jarxiv

Abstract: Designed to excel at specialized domain tasks, a zero-shot …

Category: cs.CV

Grokking at the Edge of Numerical Stability

Posted on January 9, 2025 by jarxiv

Abstract: Grokking (the sudden generalization that occurs after prolonged overfitting) is, in deep learning, …

Categories: cs.AI, cs.CV, cs.LG, stat.ML
