Monthly Archive: August 2022

Retrieval-Augmented Transformer for Image Captioning

Posted: August 23, 2022 | Author: jarxiv

Abstract: Image captioning models, by providing natural-language descriptions of input images, … Read more →

Categories: cs.AI, cs.CL, cs.CV, cs.MM

Revising Image-Text Retrieval via Multi-Modal Entailment

Posted: August 23, 2022 | Author: jarxiv

Abstract: A strong image-text retrieval model depends on high-quality labeled data. … Read more →

Categories: cs.AI, cs.CL, cs.CV

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Posted: August 23, 2022 | Author: jarxiv

Abstract: Matching-based methods, especially those based on space-time memory, in semi-supervised video … Read more →

Categories: cs.CV

Rethinking Knowledge Distillation via Cross-Entropy

Posted: August 23, 2022 | Author: jarxiv

Abstract: Knowledge Distillation (KD) has been extensively developed … Read more →

Categories: cs.CV

Unsupervised Prompt Learning for Vision-Language Models

Posted: August 23, 2022 | Author: jarxiv

Abstract: Contrastive vision-language models such as CLIP have made great progress in transfer learning … Read more →

Categories: cs.CV

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

Posted: August 23, 2022 | Author: jarxiv

Abstract: Learning accurate depth is essential for multi-view 3D object detection … Read more →

Categories: cs.CV

Meta-Causal Feature Learning for Out-of-Distribution Generalization

Posted: August 23, 2022 | Author: jarxiv

Abstract: Causal inference, for out-of-distribution (OOD) generalization aimed at extracting invariant features, … Read more →

Categories: cs.CV

TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types

Posted: August 23, 2022 | Author: jarxiv

Abstract: The world's largest unified public dataset of eye images captured with head-mounted devices … Read more →

Categories: cs.CV, eess.IV

Prompt-Matched Semantic Segmentation

Posted: August 23, 2022 | Author: jarxiv

Abstract: The objective of this work involves pre-trained foundation models and image semantic … Read more →

Categories: cs.CV

Non-generative Generalized Zero-shot Learning via Task-correlated Disentanglement and Controllable Samples Synthesis

Posted: August 23, 2022 | Author: jarxiv

Abstract: Currently, the synthesis of pseudo samples, for solving the generalized zero-shot learning (GZSL) problem, … Read more →

Categories: cs.CV

