Monthly archive: March 2024

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

Posted on March 8, 2024 by jarxiv

Summary: By restricting each token's attention span to its nearest neighboring tokens, neighborhood attention …

Categories: cs.AI, cs.CV, cs.LG
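The excerpt above states the core idea of neighborhood attention: each token attends only to its nearest neighbors rather than to every token in the sequence. The following is a minimal pure-Python sketch of the 1D case, written under our own assumptions. The function name `neighborhood_attention_1d`, the odd `window` parameter, and the border handling are illustrative choices, not the paper's GPU kernels. At the borders, the window is shifted inward so every query still sees exactly `window` keys.

```python
import math


def neighborhood_attention_1d(q, k, v, window):
    """Hypothetical 1D neighborhood attention sketch (pure Python, no batching or heads).

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    window:  odd number of neighbors each query attends to (1 <= window <= n).
    """
    n, d = len(q), len(q[0])
    if window % 2 == 0 or not 1 <= window <= n:
        raise ValueError("window must be odd and between 1 and len(q)")
    half = window // 2
    out = []
    for i in range(n):
        # Assumption: near the sequence borders the window is shifted inward,
        # so every query still attends to exactly `window` keys.
        start = min(max(i - half, 0), n - window)
        neighbors = range(start, start + window)
        # Scaled dot-product scores against the neighbors only: O(window * d)
        # work per query instead of O(n * d) for full self attention.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in neighbors]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of the neighbors' value vectors.
        out.append([sum(w / total * v[j][c] for w, j in zip(weights, neighbors))
                    for c in range(len(v[0]))])
    return out
```

Setting `window` equal to the sequence length recovers ordinary full self attention, which makes a convenient sanity check. The title indicates that the paper is concerned with reducing this cost at the threadblock level on GPUs, which a scalar loop like this one does not attempt to model.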

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Posted on March 8, 2024 by jarxiv

Summary: In this paper, … a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution …

Categories: cs.CV

Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data

Posted on March 8, 2024 by jarxiv

Summary: Visually understanding and segmenting materials and their states is … to understanding the physical world …

Categories: cs.CV

AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Posted on March 8, 2024 by jarxiv

Summary: Facial action units (AUs) are … of affective computing …

Categories: cs.AI, cs.CV

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Posted on March 8, 2024 by jarxiv

Summary: Multi-object tracking (MOT) is an important area of computer vision …

Categories: cs.CV

ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

Posted on March 8, 2024 by jarxiv

Summary: The large-scale multimodal training of recent vision-based models and their gen…

Categories: cs.AI, cs.CV

Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices

Posted on March 8, 2024 by jarxiv

Summary: Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals …

Categories: 68T05, cs.AI, cs.CV, cs.LG, I.2.10

Masked Capsule Autoencoders

Posted on March 8, 2024 by jarxiv

Summary: We … the first capsule network to use pretraining in a self-supervised manner …

Categories: cs.CV

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Posted on March 8, 2024 by jarxiv

Summary: Large-scale web-crawled datasets … vision-language models such as CLIP …

Categories: cs.AI, cs.CV, cs.LG

How Far Are We from Intelligent Visual Deductive Reasoning?

Posted on March 8, 2024 by jarxiv

Summary: Vision-language models (VLMs) such as GPT-4V have recently … on diverse vision-language tasks …

Categories: cs.AI, cs.CL, cs.CV
