Category Archives: cs.AI

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Posted on November 26, 2024 by jarxiv

Summary: Spatial understanding is a crucial capability that lets robots make well-grounded decisions based on their environment …

Categories: cs.AI, cs.CL, cs.CV, cs.RO

A Review of Mechanistic Models of Event Comprehension

Posted on November 26, 2024 by jarxiv

Summary: In this review, the evolution from theories of discourse comprehension to modern frameworks of event cognition …

Categories: cs.AI, cs.CL, cs.CV

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

Posted on November 26, 2024 by jarxiv

Summary: Multimodal encoders such as CLIP … zero-shot image classification and cro…

Categories: cs.AI, cs.CV, cs.IR, cs.LG

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

Posted on November 26, 2024 by jarxiv

Summary: The newly proposed Generalized Referring Expres…

Categories: cs.AI, cs.CV

Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval

Posted on November 26, 2024 by jarxiv

Summary: The goal of text-to-image person retrieval (TIPR) is, for a given text …

Categories: cs.AI, cs.CL, cs.CV

Imperceptible Adversarial Examples in the Physical World

Posted on November 26, 2024 by jarxiv

Summary: Digital-domain … against deep learning-based computer vision models …

Categories: cs.AI, cs.CV

Word4Per: Zero-shot Composed Person Retrieval

Posted on November 26, 2024 by jarxiv

Summary: Searching for a specific person offers significant social benefits and security value, and in many …

Categories: cs.AI, cs.CV, cs.IR

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

Posted on November 26, 2024 by jarxiv

Summary: In this work, images at resolutions of up to 2,560×2,560 …

Categories: cs.AI, cs.CV

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Posted on November 26, 2024 by jarxiv

Summary: Storytelling video generation (SVG), given an input text script, …

Categories: cs.AI, cs.CL, cs.CV

OminiControl: Minimal and Universal Control for Diffusion Transformer

Posted on November 26, 2024 by jarxiv

Summary: In this paper, image conditions … a pre-trained Diffusion Transformer (DiT) mo…

Categories: cs.AI, cs.CV, cs.LG
