Monthly Archives: June 2024

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

Posted on: June 28, 2024 | Author: jarxiv

Summary: Pre-training of large-scale video-language models (VLMs), for a variety of … Continue reading →

Categories: cs.CL, cs.CV

Shortcut Learning in Medical Image Segmentation

Posted on: June 28, 2024 | Author: jarxiv

Summary: Shortcut learning is when a machine learning model, beyond the training set, … Continue reading →

Categories: cs.CV, cs.LG, eess.IV

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Posted on: June 28, 2024 | Author: jarxiv

Summary: Graphical user interfaces (GUIs), in digital … Continue reading →

Categories: cs.CL, cs.CV

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

Posted on: June 28, 2024 | Author: jarxiv

Summary: Commonly seen as web screenshots, posters, and the like, multipanel … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Posted on: June 28, 2024 | Author: jarxiv

Summary: The rapid … of multimodal large language models (MLLMs) such as GPT-4V … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Human Modelling and Pose Estimation Overview

Posted on: June 28, 2024 | Author: jarxiv

Summary: Human modelling and pose estimation, across computer vision, computer … Continue reading →

Categories: cs.CV, I.4.8

Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

Posted on: June 28, 2024 | Author: jarxiv

Summary: Continual learning, while minimizing the performance drop on previous tasks, new … Continue reading →

Categories: cs.CV

Compositional Image Decomposition with Diffusion Models

Posted on: June 28, 2024 | Author: jarxiv

Summary: Given an image of a natural scene, … it into objects, lighting, shadows, foreground, and other … Continue reading →

Categories: cs.CV, cs.LG

PNeRV: A Polynomial Neural Representation for Videos

Posted on: June 28, 2024 | Author: jarxiv

Summary: On video data, Implicit Neural Representat … Continue reading →

Categories: cs.CV

Mapping Land Naturalness from Sentinel-2 using Deep Contextual and Geographical Priors

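Posted on: June 28, 2024 | Author: jarxiv

Summary: In recent decades, the causes and consequences of climate change have accelerated, affecting the planet on an unprecedented scale … Continue reading →

Categories: cs.CV, cs.LG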

