"cs.MM" Category Archive

Image Conductor: Precision Control for Interactive Video Synthesis

Posted on June 24, 2024 by jarxiv

Summary: In filmmaking and animation production, it is often the case that camera transitions and …

Categories: cs.AI, cs.CV, cs.MM

Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

Posted on June 21, 2024 by jarxiv

Summary: Explainable AI for the Arts (XAIxArts …

Categories: cs.AI, cs.HC, cs.MM, cs.SD, eess.AS

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Posted on June 21, 2024 by jarxiv

Summary: With the emergence of large vision-language models (LVLMs), …

Categories: cs.CV, cs.MM

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Posted on June 19, 2024 by jarxiv

Summary: Video editing, in areas ranging from entertainment and education to professional …

Categories: cs.AI, cs.CV, cs.MM

Unveiling Encoder-Free Vision-Language Models

Posted on June 18, 2024 by jarxiv

Summary: Existing vision-language models (VLMs) mostly rely on vision encoders …

Categories: cs.CV, cs.MM

SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation

Posted on June 17, 2024 by jarxiv

Summary: Because polyps are an indicator of early cancer, assessing the occurrence of polyps and their removal …

Categories: cs.AI, cs.CV, cs.MM

CinePile: A Long Video Question Answering Dataset and Benchmark

Posted on June 17, 2024 by jarxiv

Summary: Current datasets for understanding long-form video, from a video, one …

Categories: cs.CV, cs.LG, cs.MM

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

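Posted on June 14, 2024 by jarxiv

Summary: In recent years, artificial intelligence techniques in education have attracted growing attention, but effective musical instrument instruction …

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS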

Explore the Limits of Omni-modal Pretraining at Scale

Posted on June 14, 2024 by jarxiv

Summary: Able to understand any modality and learn universal representations, an omni-modal …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

Posted on June 13, 2024 by jarxiv

Summary: Regarding the performance and reliability of speech emotion recognition (SER), text data is generally …

Categories: cs.CL, cs.MM, cs.SD, eess.AS
