Category Archives: cs.CV

Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models

Posted on October 11, 2024 by jarxiv

Abstract: As diffusion models are increasingly adopted for text-to-image generation, their reliability … Continue reading →

Categories: cs.CV, cs.LG, stat.ML | Comments are closed

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

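Posted on October 11, 2024 by jarxiv

Abstract: With the rapid progress of large language models (LLMs), extending their capabilities to multimodal … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed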

Interactive4D: Interactive 4D LiDAR Segmentation

Posted on October 11, 2024 by jarxiv

Abstract: For future LiDAR datasets, interactive segmentation … Continue reading →

Categories: cs.CV | Comments are closed

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

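Posted on October 11, 2024 by jarxiv

Abstract: Discrete diffusion models have found success in tasks such as image generation and masked language modeling … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed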

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Posted on October 11, 2024 by jarxiv

Abstract: This paper emphasizes the importance of 3D spatial awareness in embodied AI … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

Posted on October 11, 2024 by jarxiv

Abstract: In current large multimodal models (LMMs), the model's language components … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

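Posted on October 11, 2024 by jarxiv

Abstract: Single point supervised oriented object detection has attracted attention and made initial progress within the community … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed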

LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts

Posted on October 11, 2024 by jarxiv

Abstract: Large-scale vision-language pre-trained (VLP) models (e.g., CLIP) … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

Reliable Probabilistic Human Trajectory Prediction for Autonomous Applications

Posted on October 11, 2024 by jarxiv

Abstract: In autonomous systems such as vehicles and robots, for safe human-machine interaction … Continue reading →

Categories: cs.CV | Comments are closed

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Posted on October 11, 2024 by jarxiv

Abstract: Known as vision-language navigation (VLN), based on language instructions and visual information … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed
