Category archive: cs.CV

HoloPart: Generative 3D Part Amodal Segmentation

Posted: April 11, 2025 | Author: jarxiv

Abstract: 3D part amodal segmentation - a 3D shape into complete and … Continue reading →

Categories: cs.CV | Comments closed

GenEAva: Generating Cartoon Avatars with Fine-Grained Facial Expressions from Realistic Diffusion-based Faces

Posted: April 11, 2025 | Author: jarxiv

Abstract: Cartoon avatars, in social media, online tutoring, games, and … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

Taming Data and Transformers for Scalable Audio Generation

Posted: April 11, 2025 | Author: jarxiv

Abstract: The scalability of ambient sound generators, data scarcity, … Continue reading →

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS | Comments closed

InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians

Posted: April 11, 2025 | Author: jarxiv

Abstract: With growing interest from the community in digital avatars, … Continue reading →

Categories: cs.CV | Comments closed

Scaling Laws for Native Multimodal Models

Posted: April 11, 2025 | Author: jarxiv

Abstract: Building general-purpose models that can effectively perceive the world through multimodal signals has long … Continue reading →

Categories: cs.CV | Comments closed

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

Posted: April 11, 2025 | Author: jarxiv

Abstract: Inspired by the success of DeepSeek-R1, training for perception policy learning … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

Posted: April 11, 2025 | Author: jarxiv

Abstract: In this paper, for object pose estimation, a generalizable RGB-based … Continue reading →

Categories: cs.CV | Comments closed

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Posted: April 11, 2025 | Author: jarxiv

Abstract: Thanks to advances in chain-of-thought (CoT) reasoning, large language models (LLMs) and large … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

MM-IFEngine: Towards Multimodal Instruction Following

Posted: April 11, 2025 | Author: jarxiv

Abstract: Instruction-following (IF) ability: how well multimodal large language models (MLLMs) … Continue reading →

Categories: cs.CV | Comments closed

Detect Anything 3D in the Wild

Posted: April 11, 2025 | Author: jarxiv

Abstract: Despite the success of deep learning in dense 3D object detection, existing … Continue reading →

Categories: cs.CV | Comments closed
