Monthly Archives: May 2024

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

Posted on May 28, 2024 by jarxiv

Abstract: World models can predict the outcomes of various actions, which for autonomous driving …

Categories: cs.AI, cs.CV

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

Posted on May 28, 2024 by jarxiv

Abstract: For training-free personalization of diffusion models, a new …

Categories: cs.CV, cs.LG, stat.ML

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Posted on May 28, 2024 by jarxiv

Abstract: From a single image under arbitrary viewpoints, high-quality, spatio-temporally coherent human video …

Categories: cs.CV

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Posted on May 28, 2024 by jarxiv

Abstract: Research on video generation has recently made tremendous progress, and from text prompts or images …

Categories: cs.CV, cs.GR

A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

Posted on May 28, 2024 by jarxiv

Abstract: Because $Q$-learning algorithms are data-efficient, real-world applications …

Categories: cs.CV, cs.LG, cs.RO

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

Posted on May 28, 2024 by jarxiv

Abstract: Robot manipulation policies, when faced with novel tasks or object instances, …

Categories: cs.CV

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Posted on May 28, 2024 by jarxiv

Abstract: For safety-critical applications such as autonomous driving and robot-assisted surgery, machine learning …

Categories: cs.AI, cs.CV, cs.LG

MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

Posted on May 28, 2024 by jarxiv

Abstract: Reconstructing and … novel views of dynamic scenes from monocular videos captured casually in the wild …

Categories: cs.CV, cs.GR

Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection

Posted on May 28, 2024 by jarxiv

Abstract: 3D object detection aims to recover the 3D information of the objects of interest, and …

Categories: cs.AI, cs.CV, cs.LG

Privacy-Aware Visual Language Models

Posted on May 28, 2024 by jarxiv

Abstract: In this paper, Visual Language Models (VLMs), with respect to privacy, …

Categories: cs.CL, cs.CV
