"cs.AI" Category Archive

WordVIS: A Color Worth A Thousand Words

Posted: December 16, 2024 | Author: jarxiv

Abstract: Document classification is considered an important component of automated document processing systems …

Categories: cs.AI, cs.CV

SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models

Posted: December 16, 2024 | Author: jarxiv

Abstract: Given an input video of a person and a new garment, the objective of this paper is spatiotemporally consistent …

Categories: cs.AI, cs.CV

Multi-Head Encoding for Extreme Label Classification

Posted: December 16, 2024 | Author: jarxiv

Abstract: The number of categories of real-world instances is typically huge, and each instance …

Categories: cs.AI, cs.CV, cs.LG

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Posted: December 16, 2024 | Author: jarxiv

Abstract: From monocular videos captured with commodity devices such as smartphones, we …

Categories: cs.AI, cs.CV, cs.GR

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Posted: December 16, 2024 | Author: jarxiv

Abstract: Here, an advanced series of large Mixture-of-Experts (MoE) vision-language models …

Categories: cs.AI, cs.CL, cs.CV

BrushEdit: All-In-One Image Inpainting and Editing

Posted: December 16, 2024 | Author: jarxiv

Abstract: Image editing, with diffusion models using both inversion-based and instruction-based methods, …

Categories: cs.AI, cs.CV

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Posted: December 16, 2024 | Author: jarxiv

Abstract: Web pages, software applications, operating systems …

Categories: cs.AI, cs.CV

A dual contrastive framework

Posted: December 16, 2024 | Author: jarxiv

Abstract: In current multimodal tasks, models typically … tasks such as region captioning …

Categories: cs.AI, cs.CV

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Posted: December 16, 2024 | Author: jarxiv

Abstract: Video perception capabilities have been rapidly integrated into Large Multimodal Models (LMMs) …

Categories: cs.AI, cs.CV

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving

Posted: December 16, 2024 | Author: jarxiv

Abstract: Vision-based autonomous driving, owing to its satisfactory performance and low cost, … great potential …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
