Monthly Archive: June 2023

‘Let’s not Quote out of Context’: Unified Vision-Language Pretraining for Context Assisted Image Captioning

Posted on June 2, 2023 by jarxiv

Abstract: In enterprise content such as marketing materials, well-formed context-aware image … Continue reading →

Categories: cs.CL, cs.CV

Train Offline, Test Online: A Real Robot Learning Benchmark

Posted on June 2, 2023 by jarxiv

Abstract: Three challenges limit progress in robot learning research. Robots are expensive … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

Posted on June 2, 2023 by jarxiv

Abstract: Creating a vivid video from an event or scenario in our imagination is truly … Continue reading →

Categories: cs.CV

Differential Diffusion: Giving Each Pixel Its Strength

Posted on June 2, 2023 by jarxiv

Abstract: Text-based image editing has advanced significantly in recent years. With the rise of diffusion models … Continue reading →

Categories: cs.AI, cs.CV, cs.GR, cs.LG, I.3.3

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

Posted on June 2, 2023 by jarxiv

Abstract: The ObjectFolder Benchmark, through sight, sound, and touch, … Continue reading →

Categories: cs.AI, cs.CV, cs.GR, cs.HC, cs.RO

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

Posted on June 2, 2023 by jarxiv

Abstract: Text-conditional diffusion models generate high-fidelity images with diverse content … Continue reading →

Categories: cs.CV

BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image

Posted on June 2, 2023 by jarxiv

Abstract: Understanding and modeling a 3D scene from a single image is a practical problem … Continue reading →

Categories: cs.CV

The Hidden Language of Diffusion Models

Posted on June 2, 2023 by jarxiv

Abstract: For text-to-image diffusion models, textual concepts (such as "doctor" and "love") … Continue reading →

Categories: cs.CV

GRES: Generalized Referring Expression Segmentation

Posted on June 2, 2023 by jarxiv

Abstract: Referring Expression Segmentation (RES), for objects described by a given language expression, … Continue reading →

Categories: cs.CV

ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation

Posted on June 2, 2023 by jarxiv

Abstract: Recently, personalized text-to-image generation using diffusion models has been … Continue reading →

Categories: cs.AI, cs.CV

