Category Archive: cs.MM

Differentiating Emigration from Return Migration of Scholars Using Name-Based Nationality Detection Models

Posted: May 12, 2025 | Author: jarxiv

Abstract: In most web and digital trace data, owing to privacy concerns, …

Categories: cs.CL, cs.DL, cs.MM

Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study

Posted: May 12, 2025 | Author: jarxiv

Abstract: Despite growing interest in automated hate speech detection, existing …

Categories: cs.CL, cs.CY, cs.MM

TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

Posted: May 9, 2025 | Author: jarxiv

Abstract: Multimodal sentiment analysis (MSA), which leverages the language, visual, and acoustic modalities, …

Categories: cs.CL, cs.MM

Does CLIP perceive art the same way we do?

Posted: May 9, 2025 | Author: jarxiv

Abstract: CLIP, which can connect images and text through joint embeddings, is a powerful multimodal …

Categories: (Primary), 68T45, 68U10, cs.CV, cs.MM, I.2.10

Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform

Posted: May 8, 2025 | Author: jarxiv

Abstract: Automatic music transcription (AMT) analyzes audio recordings of music, and the notes being played …

Categories: cs.AI, cs.LG, cs.MM, cs.SD, eess.AS

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

Posted: May 8, 2025 | Author: jarxiv

Abstract: We introduce Audio-SDS. Audio-SDS, with text conditioning, …

Categories: 68T07, cs.AI, cs.LG, cs.MM, cs.SD, eess.AS, H.5.1

Question-Answering Dense Video Events

Posted: May 8, 2025 | Author: jarxiv

Abstract: This paper presents question answering on dense video events. This …

Categories: cs.AI, cs.CV, cs.MM

‘I Can See Forever!’: Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments

Posted: May 8, 2025 | Author: jarxiv

Abstract: The visually impaired population, especially those with severe visual impairment, is currently large, and for them daily activities …

Categories: cs.AI, cs.CV, cs.HC, cs.MM

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Posted: May 8, 2025 | Author: jarxiv

Abstract: In this work, we systematically study music generation conditioned solely on video. …

Categories: cs.CV, cs.LG, cs.MM, cs.SD

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning

Posted: May 8, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs), in text, vision, audio …

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS
