Category archive: cs.CV

Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models

Posted on October 23, 2024 by jarxiv

Abstract: Recently, multimodal large language models (MLLMs), with their outstanding cross … Continue reading →

Categories: cs.CV | Comments closed

Automated Spinal MRI Labelling from Reports Using a Large Language Model

Posted on October 23, 2024 by jarxiv

Abstract: We use a large language model for the extraction of labels from radiology reports … Continue reading →

Categories: cs.CL, cs.CV, eess.IV | Comments closed

Frontiers in Intelligent Colonoscopy

Posted on October 23, 2024 by jarxiv

Abstract: Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer … Continue reading →

Categories: cs.CV, eess.IV | Comments closed

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Posted on October 23, 2024 by jarxiv

Abstract: We propose the Large View Synthesis Model (LVSM), which … Continue reading →

Categories: cs.CV, cs.GR, cs.LG | Comments closed

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Posted on October 23, 2024 by jarxiv

Abstract: Contrastive loss is a powerful approach to representation learning, and as the batch size grows, … Continue reading →

Categories: cs.CV | Comments closed

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Posted on October 23, 2024 by jarxiv

Abstract: In large vision-language models (LVLMs), … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Posted on October 23, 2024 by jarxiv

Abstract: Research on large multimodal models (LMMs) in languages other than English … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes

Posted on October 23, 2024 by jarxiv

Abstract: We … 3D Gaussian Splatting (3DGS) and physically based render … Continue reading →

Categories: cs.CV | Comments closed

Altogether: Image Captioning via Re-aligning Alt-text

Posted on October 23, 2024 by jarxiv

Abstract: In this paper, the creation of synthetic data for improving the quality of image captions … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Posted on October 23, 2024 by jarxiv

Abstract: Multimodal large language models (MLLMs), across a wide range of domains, vision-language … Continue reading →

Categories: cs.CV | Comments closed
