Archive of posts by "jarxiv"

Towards Robust Probabilistic Modeling on SO(3) via Rotation Laplace Distribution

Posted on February 24, 2025 by jarxiv

Abstract: Estimating 3DoF rotation from a single RGB image is important yet challenging …

Categories: cs.AI, cs.CV

Weakly Supervised Video Scene Graph Generation via Natural Language Supervision

Posted on February 24, 2025 by jarxiv

Abstract: In existing research on video scene graph generation (VidSGG), fully supervised methods …

Categories: cs.CV

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

Posted on February 24, 2025 by jarxiv

Abstract: Recent advances in audio-visual speech recognition (AVSR) have brought unprecedented results in this field …

Categories: cs.CL, cs.CV

MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing

Posted on February 24, 2025 by jarxiv

Abstract: In multimodal language models (MLMs), through specific adapters, vision …

Categories: (Primary), 6804, cs.CV, I.2.10

Long Video Understanding with Learnable Retrieval in Video-Language Models

Posted on February 24, 2025 by jarxiv

Abstract: The remarkable natural language understanding, reasoning, and generation capabilities of large language models (LLMs) …

Categories: cs.CV

A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations

Posted on February 24, 2025 by jarxiv

Abstract: In artificial intelligence (AI) research on breast cancer magnetic resonance imaging (MRI), limited expert-labeled …

Categories: cs.AI, cs.CV, cs.DB

The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting

Posted on February 24, 2025 by jarxiv

Abstract: In Vision-Language Models (VLMs), inconsistency with the input image …

Categories: cs.CV

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

Posted on February 24, 2025 by jarxiv

Abstract: For existing top-performing autonomous driving systems, reliable scene understanding typically …

Categories: cs.CV

Chitrarth: Bridging Vision and Language for a Billion People

Posted on February 24, 2025 by jarxiv

Abstract: For recent multimodal foundation models, primarily English or high-resource …

Categories: cs.AI, cs.CL, cs.CV

LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models

Posted on February 24, 2025 by jarxiv

Abstract: Large multimodal models (LMMs), on video understanding tasks, remarkable …

Categories: cs.CV
