"cs.CV" Category Archive

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Posted: December 18, 2024 | Author: jarxiv

Summary: In this paper, photorealistic view synthesis from vehicle sensor data … Continue reading →

Category: cs.CV | Comments are closed

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

Posted: December 18, 2024 | Author: jarxiv

Summary: Although neural 3D reconstruction has advanced substantially, typically, carefully initialized … Continue reading →

Category: cs.CV | Comments are closed

MotionBridge: Dynamic Video Inbetweening with Flexible Controls

Posted: December 18, 2024 | Author: jarxiv

Summary: By generating plausible and smooth transitions between two image frames … Continue reading →

Category: cs.CV | Comments are closed

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

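Posted: December 18, 2024 | Author: jarxiv

Summary: 3D semantic occupancy prediction provides comprehensive semantic awareness of the surrounding environment … Continue reading →

Category: cs.CV | Comments are closed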

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

Posted: December 18, 2024 | Author: jarxiv

Summary: Text-to-image diffusion models excel at generating photorealistic images … Continue reading →

Category: cs.CV | Comments are closed

Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Posted: December 18, 2024 | Author: jarxiv

Summary: Internet-browsing agents in the digital world, and in the physical world, household … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

Causal Diffusion Transformers for Generative Modeling

Posted: December 18, 2024 | Author: jarxiv

Summary: We introduce Causal Diffusion as the autoregressive (AR) counterpart of diffusion models. It … Continue reading →

Category: cs.CV | Comments are closed

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

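Posted: December 18, 2024 | Author: jarxiv

Summary: Traditional reinforcement learning-based robotic control methods are often task-specific, and diverse … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.RO | Comments are closed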

From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach

Posted: December 18, 2024 | Author: jarxiv

Summary: In this paper, reconstructing 3D parametric models from 2D CAD drawings … Continue reading →

Category: cs.CV | Comments are closed

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Posted: December 17, 2024 | Author: jarxiv

Summary: Constructing compact and informative 3D scene representations, especially over long periods … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed
