"cs.CV" Category Archive

KOSMOS-2.5: A Multimodal Literate Model

Posted: August 22, 2024 | Author: jarxiv

Abstract: The automatic reading of text-centric images, toward the realization of artificial general intelligence (AGI) …

Categories: cs.CL, cs.CV

A Survey for Foundation Models in Autonomous Driving

Posted: August 22, 2024 | Author: jarxiv

Abstract: With the advent of foundation models, a revolution in the fields of natural language processing and computer vision …

Categories: cs.CV, cs.LG, cs.RO

Exploiting Diffusion Prior for Out-of-Distribution Detection

Posted: August 22, 2024 | Author: jarxiv

Abstract: Out-of-distribution (OOD) detection, especially in security-critical domains, robust …

Categories: cs.AI, cs.CV

A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

Posted: August 22, 2024 | Author: jarxiv

Abstract: In image fusion tasks, images from different sources have different characteristics. …

Categories: cs.CV

Timeline and Boundary Guided Diffusion Network for Video Shadow Detection

Posted: August 22, 2024 | Author: jarxiv

Abstract: Video shadow detection (VSD), using frame sequences, shadow …

Categories: cs.AI, cs.CV

NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

Posted: August 22, 2024 | Author: jarxiv

Abstract: Domain-generalized nuclei segmentation, knowledge learned from source domains …

Categories: cs.CV, eess.IV

DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework

Posted: August 22, 2024 | Author: jarxiv

Abstract: Current video generation models excel at creating short, realistic clips, but …

Categories: cs.AI, cs.CL, cs.CV, cs.SE, TsingHua University

EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

Posted: August 22, 2024 | Author: jarxiv

Abstract: In the field of multimodal research, many studies make use of substantial image-text pairs …

Categories: cs.CV

Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models

Posted: August 22, 2024 | Author: jarxiv

Abstract: Traditional visual storytelling is complex, and specialized knowledge and substantial resources …

Categories: cs.CV

ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation

Posted: August 22, 2024 | Author: jarxiv

Abstract: Learning from demonstrations, especially with teleoperation systems and recently collected …

Categories: cs.CV, cs.LG, cs.RO
