"cs.CV" Category Archive

Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions

Posted: April 7, 2025 | Author: jarxiv

Summary: Because a homogenized, canonical body shape is easy to learn, existing text …

Categories: cs.CV

Bonsai: Interpretable Tree-Adaptive Grounded Reasoning

Posted: April 7, 2025 | Author: jarxiv

Summary: To develop general-purpose collaborative agents, (1) the ability to adapt to new domains …

Categories: 68T37, 68T50, cs.AI, cs.CL, cs.CV, I.2.7

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Posted: April 7, 2025 | Author: jarxiv

Summary: Existing MLLM benchmarks, in evaluating Unified MLLMs (U-MLLMs), …

Categories: cs.CV

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Posted: April 7, 2025 | Author: jarxiv

Summary: Talking head synthesis, for virtual avatars and human-computer interaction …

Categories: cs.CV

Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis

Posted: April 7, 2025 | Author: jarxiv

Summary: Nonverbal communication, with semantically rich gestures that help convey the meaning of speech …

Categories: cs.CV

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Posted: April 7, 2025 | Author: jarxiv

Summary: Reinforcement learning (RL) has recently shown strong potential to improve the reasoning capabilities of large language models …

Categories: cs.CL, cs.CV, cs.LG

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Posted: April 4, 2025 | Author: jarxiv

Summary: Visual narrative generation takes a textual narrative and illustrates the content of the text …

Categories: cs.AI, cs.CL, cs.CV

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

Posted: April 4, 2025 | Author: jarxiv

Summary: Talking head synthesis, for virtual avatars and human-computer interaction …

Categories: cs.CV

MAD: Makeup All-in-One with Cross-Domain Diffusion Model

Posted: April 4, 2025 | Author: jarxiv

Summary: Existing makeup techniques design multiple models to handle different inputs …

Categories: cs.CV

Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement

Posted: April 4, 2025 | Author: jarxiv

Summary: With sub-angstrom resolution of atomic arrangements, scanning transmission electron microscopy (STEM) …

Categories: cs.CV
