Monthly Archive: May 2024

Don’t drop your samples! Coherence-aware training benefits Conditional diffusion

Posted: May 31, 2024 | Author: jarxiv

Summary: For conditional diffusion models, class labels, segmentation masks, text …

Categories: cs.CV, cs.LG

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

Posted: May 31, 2024 | Author: jarxiv

Summary: Despite the remarkable progress of diffusion-based video editing models in altering video attributes …

Categories: 68T10, 68T45, cs.CV

GECO: Generative Image-to-3D within a SECOnd

Posted: May 31, 2024 | Author: jarxiv

Summary: 3D generation has made remarkable progress in recent years. Score distillation methods and other existing …

Categories: cs.CV

4DHands: Reconstructing Interactive Hands in 4D with Transformers

Posted: May 31, 2024 | Author: jarxiv

Summary: In this paper, interacting hand meshes and their relative motion, from monocular input …

Categories: cs.AI, cs.CV, cs.GR

SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos

Posted: May 31, 2024 | Author: jarxiv

Summary: Accurate tool tracking is essential for successful computer-assisted interventions. …

Categories: cs.CV

VividDream: Generating 3D Scene with Ambient Dynamics

Posted: May 31, 2024 | Author: jarxiv

Summary: From a single input image or text prompt, with ambient dynamics …

Categories: cs.CV, cs.GR

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Posted: May 31, 2024 | Author: jarxiv

Summary: In this work, 3D holistic body motions are generated directly from textual lyrics input …

Categories: cs.CV, cs.SD, eess.AS

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

Posted: May 31, 2024 | Author: jarxiv

Summary: Understanding the evolution of 3D scenes is important for effective autonomous driving. Conventional …

Categories: cs.AI, cs.CV

Visual Perception by Large Language Model’s Weights

Posted: May 31, 2024 | Author: jarxiv

Summary: In existing multimodal large language models (MLLMs), visual features … large language …

Categories: cs.CV

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Posted: May 31, 2024 | Author: jarxiv

Summary: In this study, leveraging the powerful capabilities of large language models (LLMs), multi …

Categories: cs.CV
