"cs.CV" Category Archive

ControlAR: Controllable Image Generation with Autoregressive Models

Posted on October 4, 2024 by jarxiv

Abstract: Autoregressive (AR) models have reformulated image generation as next-token prediction, with remarkable …

Category: cs.CV

LLaVA-Critic: Learning to Evaluate Multimodal Models

Posted on October 4, 2024 by jarxiv

Abstract: We introduce LLaVA-Critic. Across a wide range of multi …

Category: cs.CL, cs.CV

Video Instruction Tuning With Synthetic Data

Posted on October 4, 2024 by jarxiv

Abstract: In the development of video large multimodal models (LMMs), large amounts of high-quality … from the web …

Category: cs.CL, cs.CV

AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer’s Disease

Posted on October 4, 2024 by jarxiv

Abstract: Alzheimer's disease (AD) is a progressive neurodegenerative disorder, and in an aging society …

Category: cs.CV, cs.LG, eess.IV

Autoregressive Pre-Training on Pixels and Texts

Posted on October 4, 2024 by jarxiv

Abstract: The integration of visual and textual information shows a promising direction for advancing language models …

Category: cs.CL, cs.CV

DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects

Posted on October 4, 2024 by jarxiv

Abstract: For real-world applications, object navigation in unknown environments …

Category: cs.CL, cs.CV, cs.RO

Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation

Posted on October 4, 2024 by jarxiv

Abstract: In recent years, foundation models, and pre-training and adaptation that transfers large models to downstream tasks …

Category: cs.CV

Contrastive Localized Language-Image Pre-Training

Posted on October 4, 2024 by jarxiv

Abstract: Contrastive Language-Image Pre-training (CLIP) facilitates a variety of applications …

Category: cs.CV, cs.LG

NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation

Posted on October 4, 2024 by jarxiv

Abstract: Video depth estimation aims to infer temporally consistent depth. …

Category: cs.CV

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Posted on October 4, 2024 by jarxiv

Abstract: Generating minute-level long videos is desirable but challenging. Autoregressive large language …

Category: cs.CV

