Visual Acoustic Fields

Abstract

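Objects make different sounds when struck, and people can intuitively guess how an object will sound from its appearance and material properties.
Inspired by this intuition, we propose Visual Acoustic Fields, a framework that uses 3D Gaussian Splatting (3DGS) to bridge hitting sounds and visual signals within 3D space.
Our approach has two key modules: sound generation and sound localization.
The sound generation module uses a conditional diffusion model that takes multiscale features rendered from a feature-augmented 3DGS and generates realistic hitting sounds.
The sound localization module, in turn, queries the 3D scene represented by the feature-augmented 3DGS to localize the hit position from a given sound.
To support this framework, we introduce a new pipeline for collecting scene-level visual-sound sample pairs that aligns the captured images, the impact locations, and the corresponding sounds.
To the best of our knowledge, this is the first dataset to connect visual and acoustic signals in a 3D context.
Extensive experiments on the dataset show that Visual Acoustic Fields is effective at generating plausible impact sounds and at accurately localizing impact sources.
The project page is at https://yuelei0428.github.io/projects/Visual-Acoustic-Fields/.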

Abstract (original)

Objects produce different sounds when hit, and humans can intuitively infer how an object might sound based on its appearance and material properties. Inspired by this intuition, we propose Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). Our approach features two key modules: sound generation and sound localization. The sound generation module leverages a conditional diffusion model, which takes multiscale features rendered from a feature-augmented 3DGS to generate realistic hitting sounds. Meanwhile, the sound localization module enables querying the 3D scene, represented by the feature-augmented 3DGS, to localize hitting positions based on the sound sources. To support this framework, we introduce a novel pipeline for collecting scene-level visual-sound sample pairs, achieving alignment between captured images, impact locations, and corresponding sounds. To the best of our knowledge, this is the first dataset to connect visual and acoustic signals in a 3D context. Extensive experiments on our dataset demonstrate the effectiveness of Visual Acoustic Fields in generating plausible impact sounds and accurately localizing impact sources. Our project page is at https://yuelei0428.github.io/projects/Visual-Acoustic-Fields/.
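The localization idea in the abstract, querying a feature-augmented 3DGS with a sound, can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that each Gaussian carries a feature vector in the same embedding space as an encoded sound, and that the hit position is taken to be the Gaussian whose feature has the highest cosine similarity to that sound embedding. The names `cosine_similarity`, `localize_hit`, `gaussian_positions`, `gaussian_features`, and `sound_embedding` are hypothetical, and the authors' actual method may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_a * norm_b + 1e-12)


def localize_hit(gaussian_positions, gaussian_features, sound_embedding):
    """Return (position, index, scores) for the best-matching Gaussian.

    Hypothetical sketch: assumes per-Gaussian features and the sound
    embedding share one embedding space (an assumption, not the paper's
    stated design).
    """
    scores = [cosine_similarity(f, sound_embedding) for f in gaussian_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return gaussian_positions[best], best, scores


# Toy scene: three Gaussians with 2-D features (made-up values).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
features = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
query = (0.1, 0.9)  # stand-in for an encoded hitting sound

hit_position, hit_index, scores = localize_hit(positions, features, query)
print(hit_index, hit_position)
```

In this toy scene the query is closest in direction to the second feature vector, so the second Gaussian is selected. A real system would replace the made-up vectors with learned features and would likely aggregate similarity over many Gaussians rather than taking a single argmax.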

arXiv information

Authors	Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang
Published	2025-04-01 03:16:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
