Monthly Archive: May 2024

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

Posted on: May 27, 2024 | Author: jarxiv

Summary: Large language models (LLMs), across many computer vision … including image classification …

Categories: cs.CV

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Posted on: May 27, 2024 | Author: jarxiv

Summary: With advances in 3D Gaussian Splatting, 3D reconstruction and generation have been greatly …

Categories: cs.CV, cs.GR, cs.LG

SMART: Scalable Multi-agent Real-time Simulation via Next-token Prediction

Posted on: May 27, 2024 | Author: jarxiv

Summary: Data-driven motion generation tasks for autonomous driving, with limited dataset sizes and …

Categories: cs.CV, cs.RO

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

Posted on: May 27, 2024 | Author: jarxiv

Summary: Large vision-language models (LVLMs) as practical applications …

Categories: cs.AI, cs.CL, cs.CV

Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models

Posted on: May 27, 2024 | Author: jarxiv

Summary: Text-to-image diffusion models, unprecedented for flexible and realistic image synthesis …

Categories: cs.AI, cs.CV

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models

Posted on: May 27, 2024 | Author: jarxiv

Summary: To bridge the gap between the vision and language modalities, multimodal large language …

Categories: cs.AI, cs.CV

Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models

Posted on: May 27, 2024 | Author: jarxiv

Summary: Conventional demographic inference methods have mainly operated under supervision with accurately labeled data …

Categories: cs.CV, cs.LG

UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes

Posted on: May 27, 2024 | Author: jarxiv

Summary: Unsupervised 3D object detection methods, without requiring manual labels for training, …

Categories: 62H35, 68T05, 68T10, 68U10, cs.CV, I.2.10

Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion

Posted on: May 27, 2024 | Author: jarxiv

Summary: High-quality scene reconstruction and novel … based on Gaussian Splatting (3DGS) …

Categories: cs.CV

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Posted on: May 27, 2024 | Author: jarxiv

Summary: ControlNet … such as depth maps, scribbles/sketches, and human poses …

Categories: cs.AI, cs.CV, cs.LG
