Archive of posts by "jarxiv"

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Posted: June 10, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs), in a property shared with their language-only counterparts, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

Posted: June 10, 2025 | Author: jarxiv

Abstract: Generative large multimodal models (LMMs) such as LLaVA and Qwen-VL …

Categories: cs.AI, cs.CL, cs.CV

Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations

Posted: June 10, 2025 | Author: jarxiv

Abstract: Reasoning segmentation (RS), given an implicit text query, … objects …

Categories: cs.AI, cs.CV

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Posted: June 10, 2025 | Author: jarxiv

Abstract: To understand procedural activities, how action steps … the scene …

Categories: cs.CV

CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?

Posted: June 10, 2025 | Author: jarxiv

Abstract: Across diverse problem domains, multimodal large language models (MLLMs) have excellent …

Categories: cs.AI, cs.CL, cs.CV, I.2.10

Creating a Historical Migration Dataset from Finnish Church Records, 1800-1920

Posted: June 10, 2025 | Author: jarxiv

Abstract: In this article, digitized church migration records are used to … from 1800 to 19 …

Categories: cs.CV, I.4.6, J.5

Reinforcing Multimodal Understanding and Generation with Dual Self-rewards

Posted: June 10, 2025 | Author: jarxiv

Abstract: Building on large language models (LLMs), recent large multimodal models …

Categories: cs.AI, cs.CL, cs.CV

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design

Posted: June 10, 2025 | Author: jarxiv

Abstract: Creating slides manually is labor-intensive and requires expert prior knowledge. Existing …

Categories: cs.AI, cs.CV

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

Posted: June 10, 2025 | Author: jarxiv

Abstract: On a variety of multimodal tasks, multimodal large language models (MLLMs) …

Categories: cs.CV

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Posted: June 10, 2025 | Author: jarxiv

Abstract: Current multimodal large language models (MLLMs), at test time, … computational demands …

Categories: cs.CV
