Category Archives: cs.CV

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

Posted on February 24, 2025 by jarxiv

Abstract: Recent advances in audio-visual speech recognition (AVSR) have led to unprecedented achievements in this field …

Categories: cs.CL, cs.CV

MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing

Posted on February 24, 2025 by jarxiv

Abstract: Multimodal language models (MLMs), through specific adapters, … vision en…

Categories: (Primary), 6804, cs.CV, I.2.10

Long Video Understanding with Learnable Retrieval in Video-Language Models

Posted on February 24, 2025 by jarxiv

Abstract: The remarkable natural language understanding, reasoning, and generation capabilities of large language models (LLMs) …

Categories: cs.CV

A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations

Posted on February 24, 2025 by jarxiv

Abstract: Artificial intelligence (AI) research on breast cancer magnetic resonance imaging (MRI) … limited expert-labeled …

Categories: cs.AI, cs.CV, cs.DB

The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting

Posted on February 24, 2025 by jarxiv

Abstract: Vision-Language Models (VLMs) … inconsistent with the input image …

Categories: cs.CV

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Posted on February 24, 2025 by jarxiv

Abstract: Existing top-performing autonomous driving systems typically … reliable scene understanding …

Categories: cs.CV

Chitrarth: Bridging Vision and Language for a Billion People

Posted on February 24, 2025 by jarxiv

Abstract: Recent multimodal foundation models are primarily … English or high-resource E…

Categories: cs.AI, cs.CL, cs.CV

LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models

Posted on February 24, 2025 by jarxiv

Abstract: Large multimodal models (LMMs) … remarkable performance in video understanding tasks …

Categories: cs.CV

Enhancing Vehicle Make and Model Recognition with 3D Attention Modules

Posted on February 24, 2025 by jarxiv

Abstract: Vehicle make and model recognition (VMMR) is an important … of intelligent transportation systems …

Categories: cs.AI, cs.CV

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

Posted on February 24, 2025 by jarxiv

Abstract: Video moment retrieval (VMR) … corresponding to a text query in an untrimmed video …

Categories: cs.CV
