"cs.MM" Category Archive

Source-Free Domain Adaptation for RGB-D Semantic Segmentation with Vision Transformers

Posted on December 7, 2023 by jarxiv

Summary: As depth sensors become more widely available, combining color information with depth data … Continue reading →

Categories: cs.CV, cs.MM

OneLLM: One Framework to Align All Modalities with Language

Posted on December 7, 2023 by jarxiv

Summary: Multimodal large language models (MLLMs), owing to their strong multimodal … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Posted on December 6, 2023 by jarxiv

Summary: Video dubbing: translating the original speech of films and TV programs into speech in the target language … Continue reading →

Categories: cs.AI, cs.CL, cs.LG, cs.MM, eess.AS

Rethinking Radiology Report Generation via Causal Reasoning and Counterfactual Augmentation

Posted on December 6, 2023 by jarxiv

Summary: Radiology Report Generation (RRG) … Continue reading →

Categories: cs.CL, cs.CV, cs.MM

RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

Posted on December 6, 2023 by jarxiv

Summary: Pre-trained vision-language models that use extensive image-text paired data … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.MM

Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts

Posted on December 6, 2023 by jarxiv

Summary: To detect unmanned aerial vehicles (UAVs) in real time, computer vision … Continue reading →

Categories: cs.CV, cs.MM, eess.IV

Rethinking Event-based Human Pose Estimation with 3D Event Representations

Posted on December 4, 2023 by jarxiv

Summary: Human pose estimation is a fundamental and appealing task in computer vision … Continue reading →

Categories: cs.CV, cs.MM, cs.RO, eess.IV

RTQ: Rethinking Video-language Understanding Based on Image-text Model

Posted on December 4, 2023 by jarxiv

Summary: Recent advances in video-language understanding have been built on the foundation of image-text models … Continue reading →

Categories: cs.CL, cs.CV, cs.MM

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Posted on December 1, 2023 by jarxiv

Summary: Recently, owing to the strong text-writing capabilities of large language models (LLMs), … Continue reading →

Categories: cs.CL, cs.MM

Language Models as Black-Box Optimizers for Vision-Language Models

Posted on December 1, 2023 by jarxiv

Summary: Vision-language models pre-trained on web-scale datasets … Continue reading →

Categories: cs.CL, cs.CV, cs.LG, cs.MM
