Monthly Archives: February 2025

DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models

Posted on: February 21, 2025 | Author: jarxiv

Abstract: In this paper, DC (Decouple)-ControlNet …

Categories: cs.CV

ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting

Posted on: February 21, 2025 | Author: jarxiv

Abstract: With AR, VR, and modern smartphones equipped with powerful cameras, human-computer …

Categories: cs.AI, cs.CL, cs.CV

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

Posted on: February 21, 2025 | Author: jarxiv

Abstract: Recent advances in 3D Large Language Models (3DLLMs), for general-purpose agents in the 3D real world, …

Categories: cs.AI, cs.CL, cs.CV

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Posted on: February 21, 2025 | Author: jarxiv

Abstract: New multilingual vision-language encoders built on the success of the original SigLIP …

Categories: cs.AI, cs.CV

Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing

Posted on: February 21, 2025 | Author: jarxiv

Abstract: According to recent work, using large-scale trained 2D models, pre-trained …

Categories: cs.CV

RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird’s Eye View Segmentation

Posted on: February 21, 2025 | Author: jarxiv

Abstract: Bird’s Eye View (BEV) semantic maps …

Categories: cs.CV

Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

Posted on: February 21, 2025 | Author: jarxiv

Abstract: In this paper, with respect to the limitations of current humanoid robot control frameworks, …

Categories: cs.CV, cs.RO

A Survey on Text-Driven 360-Degree Panorama Generation

Posted on: February 21, 2025 | Author: jarxiv

Abstract: The advent of text-driven 360-degree panorama generation, directly from text descriptions, 36 …

Categories: cs.AI, cs.CV

AVD2: Accident Video Diffusion for Accident Video Description

Posted on: February 21, 2025 | Author: jarxiv

Abstract: Traffic accidents present complex challenges for autonomous driving, and often accurate system interpretation and response …

Categories: cs.CV

FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Posted on: February 21, 2025 | Author: jarxiv

Abstract: Foundation models are becoming increasingly effective in the medical domain, and to downstream tasks, easily …

Categories: cs.AI, cs.CV, eess.IV
