Monthly Archive: April 2024

V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions

Posted: April 1, 2024 | Author: jarxiv

Abstract: Current LiDAR-based Vehicle-to-Everything …

Categories: cs.CV

You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs

Posted: April 1, 2024 | Author: jarxiv

Abstract: YOSO is designed for rapid, scalable, and high-fidelity one-step image synthesis …

Categories: cs.CV

Rethinking Multi-view Representation Learning via Distilled Disentangling

Posted: April 1, 2024 | Author: jarxiv

Abstract: The aim of multi-view representation learning, given diverse data sources, is view consistency and view …

Categories: cs.CV, cs.MM

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

Posted: April 1, 2024 | Author: jarxiv

Abstract: General-purpose large vision-language models (VLMs) are developing rapidly, but …

Categories: cs.CV

Self-learning Canonical Space for Multi-view 3D Human Pose Estimation

Posted: April 1, 2024 | Author: jarxiv

Abstract: Multi-view 3D human pose estimation is naturally superior to single-view estimation …

Categories: cs.CV

DragVideo: Interactive Drag-style Video Editing

Posted: April 1, 2024 | Author: jarxiv

Abstract: Video generation models have shown an excellent ability to generate photorealistic videos …

Categories: cs.CV, cs.GR

MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

Posted: April 1, 2024 | Author: jarxiv

Abstract: Multi-target multi-camera tracking, using video streams from multiple cameras, …

Categories: cs.CV

3DInAction: Understanding Human Actions in 3D Point Clouds

Posted: April 1, 2024 | Author: jarxiv

Abstract: We propose a new method for 3D point cloud action recognition. RGB …

Categories: cs.CV

U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation

Posted: April 1, 2024 | Author: jarxiv

Abstract: Concept personalization methods enable large-scale text-to-image …

Categories: cs.CV

Long-Tailed Anomaly Detection with Learnable Class Names

Posted: April 1, 2024 | Author: jarxiv

Abstract: Anomaly detection (AD) identifies defective images and their defects (if any) …

Categories: cs.CV
