Category archive: cs.CV

Spatio-Temporal Context Prompting for Zero-Shot Action Detection

Posted: August 29, 2024 | Author: jarxiv

Abstract: Spatio-temporal action detection requires localizing individual actions within a video and classifying …

Categories: cs.AI, cs.CV

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Posted: August 29, 2024 | Author: jarxiv

Abstract: The ability to accurately interpret complex visual information, multimodal large language models (M …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Posted: August 29, 2024 | Author: jarxiv

Abstract: Text-rich document understanding (TDU) refers to … containing substantial text content …

Categories: cs.CV

Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance

Posted: August 29, 2024 | Author: jarxiv

Abstract: Most existing multi-modal salient object detection (SOD) methods … the model …

Categories: cs.CV

Urdu Digital Text Word Optical Character Recognition Using Permuted Auto Regressive Sequence Modeling

Posted: August 29, 2024 | Author: jarxiv

Abstract: In this research paper, a novel … developed specifically for digital Urdu text …

Categories: cs.AI, cs.CV

VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities

Posted: August 29, 2024 | Author: jarxiv

Abstract: Organizing various non-symbolic data (such as images and videos) into symbols, multi-mo …

Categories: 68T30, cs.AI, cs.CL, cs.CV, I.2.4

A Neurosymbolic Approach to Adaptive Feature Extraction in SLAM

Posted: August 28, 2024 | Author: jarxiv

Abstract: For autonomous robots, autonomous vehicles, and humans wearing mixed-reality headsets, dynamically changing …

Categories: cs.CV, cs.RO, cs.SC

Depth Restoration of Hand-Held Transparent Objects for Human-to-Robot Handover

Posted: August 28, 2024 | Author: jarxiv

Abstract: Transparent objects are common in daily life, but their unique optical properties … accurate …

Categories: cs.CV, cs.RO

From Rule-Based Models to Deep Learning Transformers Architectures for Natural Language Processing and Sign Language Translation Systems: Survey, Taxonomy and Performance Evaluation

Posted: August 28, 2024 | Author: jarxiv

Abstract: Because the deaf and hard-of-hearing population is growing worldwide and the shortage of certified sign language interpreters persists …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, I.2, I.2.7, I.4, I.4.9

VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities

Posted: August 28, 2024 | Author: jarxiv

Abstract: Organizing various non-symbolic data (such as images and videos) into symbols, multi-mo …

Categories: 68T30, cs.AI, cs.CL, cs.CV, I.2.4

