Category Archives: cs.MM

Evaluating Image Review Ability of Vision Language Models

Posted: February 20, 2024 | Author: jarxiv

Abstract: Large vision language models (LVLMs), with a single model, image and text in … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.MM | Comments closed

Unified Hallucination Detection for Multimodal Large Language Models

Posted: February 19, 2024 | Author: jarxiv

Abstract: Despite significant advances in multimodal tasks, multimodal large language mo … Continue reading →

Categories: cs.AI, cs.CL, cs.IR, cs.LG, cs.MM | Comments closed

Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond

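Posted: February 19, 2024 | Author: jarxiv

Abstract: With recent advances in generative language models, memorizing knowledge from documents and recalling that knowledge to … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.IR, cs.MM | Comments closed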

UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction

Posted: February 16, 2024 | Author: jarxiv

Abstract: Multi-camera 3D perception has emerged as a prominent research area in autonomous driving, … Continue reading →

Categories: cs.CV, cs.MM, cs.RO | Comments closed

Lester: rotoscope animation through video object segmentation and tracking

Posted: February 16, 2024 | Author: jarxiv

Abstract: In this article, retro-style 2D animation from video, automatically … Continue reading →

Categories: cs.AI, cs.CV, cs.GR, cs.MM | Comments closed

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

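Posted: February 16, 2024 | Author: jarxiv

Abstract: In perception, multiple kinds of sensory information are integrated, and visual information from 2D views, 3D … Continue reading →

Categories: cs.AI, cs.CV, cs.MM | Comments closed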

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

Posted: February 15, 2024 | Author: jarxiv

Abstract: We, an interpretable model for music audio classification based on prototype learning, P … Continue reading →

Categories: cs.AI, cs.MM, cs.SD, eess.AS | Comments closed

LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning

Posted: February 15, 2024 | Author: jarxiv

Abstract: In recent years, for live video streaming, adaptive bitrate (AB … Continue reading →

Categories: cs.AI, cs.MM | Comments closed

Customizable Perturbation Synthesis for Robust SLAM Benchmarking

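Posted: February 14, 2024 | Author: jarxiv

Abstract: Robustness, in unstructured environments, particularly simultaneous localization and mapping (SLAM … Continue reading →

Categories: cs.AI, cs.CV, cs.MM, cs.RO | Comments closed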

Test-Time Backdoor Attacks on Multimodal Large Language Models

Posted: February 14, 2024 | Author: jarxiv

Abstract: Backdoor attacks are typically carried out by poisoning training data, … Continue reading →

Categories: cs.CL, cs.CR, cs.CV, cs.LG, cs.MM | Comments closed

