Monthly Archives: June 2024

Composing Object Relations and Attributes for Image-Text Matching

Posted on June 18, 2024 by jarxiv

Summary: We study the visual semantic embedding problem for image-text matching. Existing …

Categories: cs.CV

On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning

Posted on June 18, 2024 by jarxiv

Summary: Recent advances in language and vision assistants have demonstrated impressive capabilities, but …

Categories: cs.CL, cs.CV

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Posted on June 18, 2024 by jarxiv

Summary: A Blender-based procedural generator of photorealistic indoor scenes …

Categories: cs.CV

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Posted on June 18, 2024 by jarxiv

Summary: Large language models (LLMs) based on decoder-only transformers, CL …

Categories: cs.CV

Unveiling Encoder-Free Vision-Language Models

Posted on June 18, 2024 by jarxiv

Summary: Existing vision-language models (VLMs) mostly rely on vision encoders …

Categories: cs.CV, cs.MM

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Posted on June 18, 2024 by jarxiv

Summary: Generating natural and meaningful responses to communicate with multimodal human inputs …

Categories: cs.AI, cs.CV, cs.LG

RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

Posted on June 18, 2024 by jarxiv

Summary: In this work, on large-scale, high-resolution datasets, high-parameter 3D Gaussian …

Categories: cs.CV, cs.GR

OoDIS: Anomaly Instance Segmentation Benchmark

Posted on June 18, 2024 by jarxiv

Summary: For autonomous vehicles to drive safely, they must accurately understand their surroundings. …

Categories: cs.CV

Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Posted on June 18, 2024 by jarxiv

Summary: In the realm of image quantization, exemplified by VQGAN, this process uses a predefined …

Categories: cs.CV

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Posted on June 18, 2024 by jarxiv

Summary: Direct Preference Optimization (DPO) is effective for aligning large language models (LLMs) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG
