1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On

Abstract

Virtual Try-On (VTON) has become a crucial tool in e-commerce, enabling the realistic simulation of garments on individuals while preserving their original appearance and pose. Early VTON methods relied on a single generative network, but they struggled to preserve fine-grained garment details because of limitations in feature extraction and fusion. To address these issues, recent approaches have adopted a dual-network paradigm, incorporating a complementary ‘ReferenceNet’ to enhance garment feature extraction and fusion. While effective, this dual-network approach introduces significant computational overhead, limiting its scalability for high-resolution and long-duration image/video VTON applications. In this paper, we challenge the dual-network paradigm by proposing a novel single-network VTON method that overcomes the limitations of existing techniques. Our method, namely MNVTON, introduces a Modality-specific Normalization strategy that separately processes text, image and video inputs, enabling them to share the same attention layers in a VTON network. Extensive experimental results demonstrate the effectiveness of our approach, showing that it consistently achieves higher-quality, more detailed results for both image and video VTON tasks. Our results suggest that the single-network paradigm can rival the performance of dual-network approaches, offering a more efficient alternative for high-quality, scalable VTON applications.
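
The abstract describes Modality-specific Normalization only at a high level: each modality is normalized separately, and all of them then pass through the same attention layers. The sketch below illustrates that general idea in NumPy and is not the authors' implementation. The class name, the choice of layer normalization, the token concatenation, and the single-head attention are all assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each token over its feature dimension, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gain + bias


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


class SharedAttentionWithModalityNorm:
    """Hypothetical block: one norm per modality, one attention for all tokens."""

    def __init__(self, dim, modalities=("text", "image", "video"), seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Assumption: the only modality-specific parameters are the
        # (gain, bias) pairs of the normalization layers.
        self.norm = {m: (np.ones(dim), np.zeros(dim)) for m in modalities}
        # A single set of projection weights is shared by every modality.
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))

    def __call__(self, tokens):
        """tokens: dict mapping modality name -> array of shape (n_tokens, dim)."""
        # Normalize each modality with its own parameters.
        normed = [layer_norm(x, *self.norm[m]) for m, x in tokens.items()]
        # Concatenate along the token axis so one attention sees all tokens.
        seq = np.concatenate(normed, axis=0)
        q, k, v = seq @ self.w_q, seq @ self.w_k, seq @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)
        return attn @ v


# Usage: text and image tokens with very different statistics.
rng = np.random.default_rng(1)
block = SharedAttentionWithModalityNorm(dim=8)
tokens = {
    "text": rng.normal(0.0, 1.0, size=(3, 8)),
    "image": rng.normal(5.0, 2.0, size=(4, 8)),
}
out = block(tokens)  # one output row per input token, across both modalities
```

In this sketch the per-modality normalization brings inputs with different statistics onto a comparable scale before they reach the shared projections, which is one plausible reading of why a single set of attention layers can serve several modalities.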

arXiv information

Authors	Shuliang Ning, Yipeng Qin, Xiaoguang Han
Published	2025-01-09 16:49:04+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
