Monthly Archives: May 2025

A Deep Learning-Driven Inhalation Injury Grading Assistant Using Bronchoscopy Images

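Posted on May 16, 2025 by jarxiv

Abstract: For inhalation injury, the Abbreviated Injury Score (AIS) is subjective, and the duration of mechanical ventilation and patient … Continue reading →

Categories: cs.CV, cs.LG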

A portable diagnosis model for Keratoconus using a smartphone

Posted on May 16, 2025 by jarxiv

Abstract: Keratoconus (KC) is a corneal disorder that causes blurred and distorted vision. … Continue reading →

Categories: cs.CV, cs.LG, eess.IV

MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models

Posted on May 16, 2025 by jarxiv

Abstract: In speculative decoding, a lightweight draft model proposes multiple tokens that the target model verifies simultaneously … Continue reading →

Categories: cs.CL, cs.CV, cs.LG

Enhancing Multi-Image Question Answering via Submodular Subset Selection

Posted on May 16, 2025 by jarxiv

Abstract: Large multimodal models (LMMs), on vision-language tasks involving a single image, … Continue reading →

Categories: cs.CV, cs.LG

Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis

Posted on May 16, 2025 by jarxiv

Abstract: With recent advances, multimodal large language models for understanding multi-image information … Continue reading →

Categories: cs.CV

UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Posted on May 16, 2025 by jarxiv

Abstract: Imitation is a fundamental human learning mechanism, in which individuals observe and imitate experts … Continue reading →

Categories: cs.CV, cs.RO

Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data

Posted on May 16, 2025 by jarxiv

Abstract: With the development of photorealistic diffusion models, training partially or fully on synthetic data … Continue reading →

Categories: cs.AI, cs.CV

Style Customization of Text-to-Vector Generation with Image Diffusion Priors

Posted on May 16, 2025 by jarxiv

Abstract: Scalable Vector Graphics (SVG), with their resolution independence and well-organized … Continue reading →

Categories: cs.CV, cs.GR

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Posted on May 16, 2025 by jarxiv

Abstract: Widely used for training large multimodal models, natural-language image … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

End-to-End Vision Tokenizer Tuning

Posted on May 16, 2025 by jarxiv

Abstract: In existing vision tokenization, visual tokens for various tasks, such as image generation and … Continue reading →

Categories: cs.CV
