Category Archives: cs.CV

Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration

Posted on December 31, 2024 by jarxiv

Abstract: Blind face restoration recovers high-quality face images from various unidentified sources of degradation …

Categories: 68U10, cs.CV, cs.MM, I.4.3

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Posted on December 31, 2024 by jarxiv

Abstract: Diffusion models have emerged as a powerful framework for generative modeling, across various …

Categories: cs.CV

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Posted on December 31, 2024 by jarxiv

Abstract: We align visual generation models (both image and video generation) with human preferences …

Categories: cs.CV

Varformer: Adapting VAR’s Generative Prior for Image Restoration

Posted on December 31, 2024 by jarxiv

Abstract: For generative models trained on extensive high-quality datasets, the structure of clean images …

Categories: cs.CV

Edicho: Consistent Image Editing in the Wild

Posted on December 31, 2024 by jarxiv

Abstract: As a demonstrated need, consistent editing across real-world images …

Categories: cs.CV

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Posted on December 31, 2024 by jarxiv

Abstract: Built on an egocentric vision-language model, a real-time embodied …

Categories: cs.CV

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

Posted on December 31, 2024 by jarxiv

Abstract: In this work, at both the object level and the scene level, in seconds, from text …

Categories: cs.CV

What Makes for a Good Stereoscopic Image?

Posted on December 31, 2024 by jarxiv

Abstract: With the rapid advancement of virtual reality (VR) headsets, immersive and comfortable 3D …

Categories: cs.CV

A Large-Scale Study on Video Action Dataset Condensation

Posted on December 31, 2024 by jarxiv

Abstract: Dataset condensation has made significant progress in the image domain. Unlike images, videos …

Categories: cs.CV

Action-Agnostic Point-Level Supervision for Temporal Action Detection

Posted on December 31, 2024 by jarxiv

Abstract: Achieving accurate detection of action instances with lightly annotated datasets …

Categories: cs.AI, cs.CV, cs.LG
