"cs.MM" Category Archive

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

Posted: January 25, 2024 | Author: jarxiv

Summary: Common approaches to speech emotion recognition (SER) include audio information and text …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

Modularized Zero-shot VQA with Pre-trained Models

Posted: January 25, 2024 | Author: jarxiv

Summary: Large-scale pre-trained models (PTMs) show excellent zero-shot capabilities …

Categories: cs.CV, cs.MM

M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

Posted: January 25, 2024 | Author: jarxiv

Summary: With advances in spatial transcriptomics (ST), based on histopathology images …

Categories: cs.CV, cs.MM

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Posted: January 25, 2024 | Author: jarxiv

Summary: Multi-modal information retrieval (MMIR) is a rapidly evolving field, and advanced …

Categories: cs.CL, cs.CV, cs.IR, cs.MM

Benchmarking Large Multimodal Models against Common Corruptions

Posted: January 23, 2024 | Author: jarxiv

Summary: This technical report specifically examines the self-consistency of outputs when subjected to common corruptions …

Categories: cs.CL, cs.CR, cs.CV, cs.LG, cs.MM

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

Posted: January 23, 2024 | Author: jarxiv

Summary: Following the success of large language models (LLMs), the Flamingo model and its …

Categories: cs.CV, cs.MM

M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

Posted: January 22, 2024 | Author: jarxiv

Summary: With advances in spatial transcriptomics (ST), based on histopathology images …

Categories: cs.CV, cs.MM

On the Audio Hallucinations in Large Audio-Video Language Models

Posted: January 19, 2024 | Author: jarxiv

Summary: Large audio-video language models generate descriptions for both video and audio …

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

Posted: January 19, 2024 | Author: jarxiv

Summary: Language models (LMs) excel at a variety of 1D text-related tasks …

Categories: cs.CL, cs.MM

Vlogger: Make Your Dream A Vlog

Posted: January 18, 2024 | Author: jarxiv

Summary: In this work, minute-level video blogs (i.e., vlogs) of user descriptions …

Categories: cs.AI, cs.CV, cs.LG, cs.MM
