"cs.CV" Category Archive

PaliGemma: A versatile 3B VLM for transfer

Posted on: October 11, 2024 | Author: jarxiv

Summary: As for PaliGemma, the SigLIP-So400m vision encoder and … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG | Comments are closed

Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs

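Posted on: October 11, 2024 | Author: jarxiv

Summary: In this paper, where visual information conflicts with the model's internal commonsense knowledge, multimodal large … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed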

OpenDAS: Open-Vocabulary Domain Adaptation for Segmentation

Posted on: October 11, 2024 | Author: jarxiv

Summary: Recently, vision-language models (VLMs), for predefined object classes, … Continue reading →

Categories: cs.CV | Comments are closed

Progressive Autoregressive Video Diffusion Models

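Posted on: October 11, 2024 | Author: jarxiv

Summary: For current frontier video diffusion models, remarkable results in generating high-quality videos … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed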

RayEmb: Arbitrary Landmark Detection in X-Ray Images Using Ray Embedding Subspace

Posted on: October 11, 2024 | Author: jarxiv

Summary: For preoperatively acquired CT scans and X-ray images, intraoperative 2D-3D regist … Continue reading →

Categories: cs.CV | Comments are closed

DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

Posted on: October 11, 2024 | Author: jarxiv

Summary: Diffusion models have become the dominant approach to visual generation. They … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Posted on: October 11, 2024 | Author: jarxiv

Summary: As for Agent S, the graphical user interface (GUI) … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

Visual Scratchpads: Enabling Global Reasoning in Vision

Posted on: October 11, 2024 | Author: jarxiv

Summary: Modern vision models, where local features provide important information about the target, … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Posted on: October 11, 2024 | Author: jarxiv

Summary: Requiring no paired composite-scene images during training, an effective zero-shot … Continue reading →

Categories: cs.CV | Comments are closed

On the Evaluation of Generative Robotic Simulations

Posted on: October 11, 2024 | Author: jarxiv

Summary: Because acquiring extensive real-world data is difficult, robotic simulation … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed

