Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks

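Summary

Generating high-quality piano audio from video requires precise synchronization between visual cues and musical output, with accurate semantic and temporal alignment. Existing evaluation datasets, however, do not fully capture the synchronization that piano music generation requires.
A comprehensive benchmark is essential for two main reasons: (1) existing metrics do not reflect the complexity of the interaction between video and piano music, and (2) a dedicated benchmark dataset can provide insights that accelerate progress in high-quality piano music generation.
To address these challenges, we introduce the CoP Benchmark Dataset, a fully open-source multimodal benchmark designed specifically for video-guided piano music generation.
The proposed Chain-of-Perform (CoP) benchmark offers three features: (1) detailed multimodal annotations that enable precise semantic and temporal alignment between video content and piano audio through step-by-step Chain-of-Perform guidance;
(2) a versatile evaluation framework for rigorously assessing both general-purpose and specialized video-to-piano generation tasks; and
(3) full open-sourcing of the dataset, annotations, and evaluation protocols.
The dataset is publicly available at https://github.com/acappemin/Video-to-Audio-and-Piano, together with a continuously updated leaderboard to promote ongoing research in this domain.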

Summary (original)

Generating high-quality piano audio from video requires precise synchronization between visual cues and musical output, ensuring accurate semantic and temporal alignment. However, existing evaluation datasets do not fully capture the intricate synchronization required for piano music generation. A comprehensive benchmark is essential for two primary reasons: (1) existing metrics fail to reflect the complexity of video-to-piano music interactions, and (2) a dedicated benchmark dataset can provide valuable insights to accelerate progress in high-quality piano music generation. To address these challenges, we introduce the CoP Benchmark Dataset, a fully open-sourced, multimodal benchmark designed specifically for video-guided piano music generation. The proposed Chain-of-Perform (CoP) benchmark offers several compelling features: (1) detailed multimodal annotations, enabling precise semantic and temporal alignment between video content and piano audio via step-by-step Chain-of-Perform guidance; (2) a versatile evaluation framework for rigorous assessment of both general-purpose and specialized video-to-piano generation tasks; and (3) full open-sourcing of the dataset, annotations, and evaluation protocols. The dataset is publicly available at https://github.com/acappemin/Video-to-Audio-and-Piano, with a continuously updated leaderboard to promote ongoing research in this domain.

arXiv information

Authors	Chang Liu, Haomin Zhang, Shiyu Xia, Zihao Chen, Chaofan Ding, Xin Yue, Huizhe Chen, Xinhan Di
Published	2025-05-26 14:24:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
