Monthly Archives: May 2023

Self-Chained Image-Language Model for Video Localization and Question Answering

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Self-chained image … for video localization and question answering

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Open-vocabulary object … with Vision Transformers

Categories: cs.AI, cs.CL, cs.CV

Virtual Occlusions Through Implicit Depth

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Virtual occlusions through implicit depth. Summary: Augmented reality ( …

Categories: cs.CV

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Exploiting diffusion prior knowledge for real-world image super-resolution …

Categories: cs.CV

An Inverse Scaling Law for CLIP Training

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: An inverse scaling law for CLIP training. Summary: …

Categories: cs.CV

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Musketeer (All for One, and One for All): task explanation …

Categories: cs.CV

Simple Token-Level Confidence Improves Caption Correctness

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Simple token-level confidence improves caption correctness …

Categories: cs.CV

SparseGNV: Generating Novel Views of Indoor Scenes with Sparse Input Views

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: SparseGNV: Generating novel views of indoor scenes with sparse input views …

Categories: cs.CV

Decentralization and Acceleration Enables Large-Scale Bundle Adjustment

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Decentralization and Acceleratio …

Categories: cs.CV, cs.RO, math.OC

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

Posted on: May 12, 2023 | Author: jarxiv

Summary. Title: Memory … with Cascaded Group Attention

Categories: cs.CV
