Monthly Archives: April 2024

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Posted on April 9, 2024 by jarxiv

Summary: Recent advances in 3D-aware generative adversarial networks (GANs), from nearly frontal …

Categories: cs.CV

Retrieval-Augmented Open-Vocabulary Object Detection

Posted on April 9, 2024 by jarxiv

Summary: Open-vocabulary object detection (OVD) … the pre-trained categ …

Categories: cs.CV

Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery

Posted on April 9, 2024 by jarxiv

Summary: Satellite imagery is crucial for tasks such as environmental monitoring and urban planning. …

Categories: cs.CV, cs.LG, eess.IV

Learning 3D-Aware GANs from Unposed Images with Template Feature Field

Posted on April 9, 2024 by jarxiv

Summary: Collecting accurate camera poses for training images … 3D-aware generative adversar …

Categories: cs.CV

Energy-Calibrated VAE with Test Time Free Lunch

Posted on April 9, 2024 by jarxiv

Summary: In this paper, to enhance variational autoencoders (VAEs), a conditional e …

Categories: cs.CV

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Posted on April 9, 2024 by jarxiv

Summary: Effective editing of personal content … for individuals to express their creativity and … visual stor …

Categories: cs.AI, cs.CV

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Posted on April 9, 2024 by jarxiv

Summary: Recent advances in multimodal large language models (MLLMs) have been noteworthy …

Categories: cs.CL, cs.CV, cs.HC

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Posted on April 9, 2024 by jarxiv

Summary: With the success of large language models (LLMs), … vision models into LLMs …

Categories: cs.CV

Finding Visual Task Vectors

Posted on April 9, 2024 by jarxiv

Summary: Visual prompting, without any additional training, … through in-context examp …

Categories: cs.CV

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Posted on April 9, 2024 by jarxiv

Summary: In this study, … a pivotal shift toward prioritizing the Chinese language in LLM development …

Categories: cs.AI, cs.CL
