Monthly Archives: July 2024

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

Posted on July 18, 2024 by jarxiv

Abstract: Controllable Image Captioning (CIC) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection

Posted on July 18, 2024 by jarxiv

Abstract: Deep-learning-based semiconductor defect inspection has attracted growing attention in recent years, nanoscale …

Categories: cs.AI, cs.CV, cs.LG

NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model

Posted on July 18, 2024 by jarxiv

Abstract: Modeling the physical contact between a hand and an object corrects inaccurate hand poses, 3D hand-and-object …

Categories: cs.CV

RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models

Posted on July 18, 2024 by jarxiv

Abstract: Large multi-modal models (LMMs), on a variety of vision-language tasks, …

Categories: cs.AI, cs.CV

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

Posted on July 18, 2024 by jarxiv

Abstract: Open-vocabulary 3D object detection (OV-3DDet) …

Categories: cs.AI, cs.CV

EchoSight: Advancing Visual-Language Models with Wiki Knowledge

Posted on July 18, 2024 by jarxiv

Abstract: In knowledge-based visual question answering (KVQA) tasks, extensive background knowledge …

Categories: cs.CV

CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

Posted on July 18, 2024 by jarxiv

Abstract: Vision Transformers (ViT), for computer vision, …

Categories: cs.AI, cs.AR, cs.CV

GroundUp: Rapid Sketch-Based 3D City Massing

Posted on July 18, 2024 by jarxiv

Abstract: For 3D city massing of urban areas, we … the first sketch-based ideation …

Categories: cs.CV, cs.HC

DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding

Posted on July 18, 2024 by jarxiv

Abstract: Text-to-3D synthesis, with pretrained text-to-image …

Categories: cs.CV

LookupViT: Compressing visual information to a limited number of tokens

Posted on July 18, 2024 by jarxiv

Abstract: Vision Transformers (ViT), in numerous industry-grade vision …

Categories: cs.AI, cs.CV, cs.LG
