Category archive: cs.MM

Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

Posted: January 7, 2025 | Author: jarxiv

Abstract: Sarcasm Explanation in Dialogue (SED) is a new yet challenging task …

Categories: cs.CL, cs.MM

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Posted: January 7, 2025 | Author: jarxiv

Abstract: In this paper, for multimodal understanding and generation, a vision-audio-language omni-perception …

Categories: cs.CL, cs.CV, cs.LG, cs.MM, eess.AS

VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

Posted: January 7, 2025 | Author: jarxiv

Abstract: Online courses have significantly lowered the barrier to accessing education, yet …

Categories: cs.CV, cs.MM

Reviewing Intelligent Cinematography: AI research for camera-based video production

Posted: January 7, 2025 | Author: jarxiv

Abstract: This paper, regarding real camera content acquisition for entertainment purposes …

Categories: cs.CV, cs.MM

Towards Expressive Video Dubbing with Multiscale Multimodal Context Interaction

Posted: January 3, 2025 | Author: jarxiv

Abstract: Automatic Video Dubbing (AVD), from a script, in line with lip movements and facial emotions …

Categories: cs.CL, cs.MM, cs.SD, eess.AS

ChemDFM-X: Towards Large Multimodal Model for Chemistry

Posted: January 3, 2025 | Author: jarxiv

Abstract: With the rapid development of AI tools, for research in the natural sciences, including chemistry, unprecedented …

Categories: cs.CL, cs.LG, cs.MM

Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls

Posted: January 3, 2025 | Author: jarxiv

Abstract: Sound designers and Foley artists typically … of interest in a video …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Inclusion 2024 Global Multimedia Deepfake Detection: Towards Multi-dimensional Facial Forgery Detection

Posted: December 31, 2024 | Author: jarxiv

Abstract: This paper, on the Global … held in conjunction with Inclusion 2024 …

Categories: cs.CV, cs.MM

Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline

Posted: December 31, 2024 | Author: jarxiv

Abstract: Thanks to recent advances in deep learning, particularly mapping images and text into a shared embedding space, …

Categories: cs.CV, cs.IR, cs.MM

Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration

Posted: December 31, 2024 | Author: jarxiv

Abstract: Blind face restoration recovers high-quality face images from various unknown sources of degradation …

Categories: 68U10, cs.CV, cs.MM, I.4.3

