Monthly Archives: April 2024

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

Posted on April 3, 2024 by jarxiv

Abstract: By coupling large language models (LLMs) to vision encoders, large vision …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Posted on April 3, 2024 by jarxiv

Abstract: Creating high-quality segmentation masks for images is a computer …

Categories: cs.CV

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Posted on April 3, 2024 by jarxiv

Abstract: With recent advances in vision-language models (VLMs), the vision community …

Categories: cs.CV

ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery

Posted on April 3, 2024 by jarxiv

Abstract: In this study, using high-resolution optical remote sensing satellite imagery, ships are effectively …

Categories: cs.CV, eess.IV

Iterated Learning Improves Compositionality in Large Vision-Language Models

Posted on April 3, 2024 by jarxiv

Abstract: A fundamental characteristic common to both human vision and natural language is their compositional nature. …

Categories: cs.CV

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models

Posted on April 3, 2024 by jarxiv

Abstract: Recent advances in 3D generation are mainly … pre-trained on internet-scale image data …

Categories: cs.CV

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Posted on April 3, 2024 by jarxiv

Abstract: Recently, in the modeling of animatable head avatars, various volumetric …

Categories: cs.CV

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration

Posted on April 3, 2024 by jarxiv

Abstract: In all-in-one image restoration, using task-specific, non-generic models for each degradation …

Categories: cs.CV

Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

Posted on April 3, 2024 by jarxiv

Abstract: Scale ambiguity in the dimensions of a 3D scene … the volume density of neural radiance fields …

Categories: cs.CV

Segment Any 3D Object with Language

Posted on April 3, 2024 by jarxiv

Abstract: In this paper, using free-form language instructions, Open-Vocabulary …

Categories: cs.AI, cs.CV
