Monthly Archive: April 2025

SplatMesh: Interactive 3D Segmentation and Editing Using Mesh-Based Gaussian Splatting

Posted on April 15, 2025 by jarxiv

Abstract: A key challenge in fine-grained, 3D-based interactive editing is that, under a given memory constraint, …

Categories: cs.CV, cs.GR

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Posted on April 15, 2025 by jarxiv

Abstract: In generating diverse shapes, large-scale pre-trained image-to-3D generative models have shown remarkable …

Categories: cs.CV

MIEB: Massive Image Embedding Benchmark

Posted on April 15, 2025 by jarxiv

Abstract: Image representations are often evaluated through disjointed, task-specific protocols …

Categories: cs.CL, cs.CV

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Posted on April 15, 2025 by jarxiv

Abstract: Featuring a native multimodal pre-training paradigm, …

Categories: cs.CV

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Posted on April 15, 2025 by jarxiv

Abstract: This paper tackles a fundamental question: "latent diffusion models and the variational …

Categories: cs.CV, cs.LG

Decoupled Diffusion Sparks Adaptive Scene Generation

Posted on April 15, 2025 by jarxiv

Abstract: For the cost of collecting diverse data for autonomous driving, controllable scene generation substantially …

Categories: cs.CV

DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting

Posted on April 15, 2025 by jarxiv

Abstract: From monocular videos, faithful, animatable human …

Categories: cs.CV

FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

Posted on April 15, 2025 by jarxiv

Abstract: Recent open-vocabulary semantic segmentation (OVSS) …

Categories: cs.CV, cs.LG

RINGO: Real-time Navigation with a Guiding Trajectory for Aerial Manipulators in Unknown Environments

Posted on April 15, 2025 by jarxiv

Abstract: Motion planning for aerial manipulators in constrained environments is typically limited to known environments …

Categories: cs.RO

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Posted on April 15, 2025 by jarxiv

Abstract: Using multimodal LLMs (MLLMs), we present a system that, for temporal changes, …

Categories: cs.AI, cs.CV, cs.CY
