Multi-Style Facial Sketch Synthesis through Masked Generative Modeling

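Summary

Facial sketch synthesis (FSS) models, which generate sketch portraits from given facial photographs, have profound implications across multiple domains, including cross-modal face recognition, entertainment, art, and media.
However, producing high-quality sketches remains difficult, mainly because of challenges and flaws tied to three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) deficiencies in how existing models process input information.
To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts an image into corresponding sketches in multiple styles and requires no additional inputs (such as 3D geometry).
In this study, we overcome the data scarcity problem by incorporating semi-supervised learning into the training process.
In addition, we use a feature extraction module and style embeddings to steer the generative transformer during the iterative prediction of masked image tokens, producing continuously stylized output that accurately preserves facial features in the sketch.
Extensive experiments show that our method consistently outperforms previous algorithms across multiple benchmarks by a clear margin.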

Abstract (original)

The facial sketch synthesis (FSS) model, capable of generating sketch portraits from given facial photographs, holds profound implications across multiple domains, encompassing cross-modal face recognition, entertainment, art, media, among others. However, the production of high-quality sketches remains a formidable task, primarily due to the challenges and flaws associated with three key factors: (1) the scarcity of artist-drawn data, (2) the constraints imposed by limited style types, and (3) the deficiencies of processing input information in existing models. To address these difficulties, we propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches, obviating the necessity for any supplementary inputs (e.g., 3D geometry). In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process. Additionally, we employ a feature extraction module and style embeddings to proficiently steer the generative transformer during the iterative prediction of masked image tokens, thus achieving a continuous stylized output that retains facial features accurately in sketches. The extensive experiments demonstrate that our method consistently outperforms previous algorithms across multiple benchmarks, exhibiting a discernible disparity.
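
The abstract mentions iterative prediction of masked image tokens but does not spell out the decoding loop. The following is only a minimal sketch of one common way such decoding is done (confidence-based unmasking with a cosine schedule), not the authors' implementation. All names here (`MASK`, `toy_predictor`, `iterative_decode`) are hypothetical, the schedule is an assumption, and the predictor is a toy stand-in for a transformer that would be conditioned on photo features and a style embedding.

```python
import math
import random

MASK = -1  # hypothetical id marking a still-masked token position


def toy_predictor(tokens, style, vocab_size, rng):
    """Toy stand-in for a generative transformer (assumption, not the paper's model).

    Returns one (token_id, confidence) proposal per position. The integer
    `style` merely shifts the proposed ids so the example stays runnable.
    """
    return [((i + style) % vocab_size, rng.random()) for i in range(len(tokens))]


def iterative_decode(num_tokens, style, vocab_size=16, steps=4, seed=0):
    """Fill a fully masked token sequence over a fixed number of steps."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        proposals = toy_predictor(tokens, style, vocab_size, rng)
        # Assumed cosine schedule: how many tokens stay masked after this step.
        keep_masked = math.floor(num_tokens * math.cos(math.pi / 2 * step / steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident proposals among the still-masked positions.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        n_commit = max(0, len(masked) - keep_masked)
        for i in masked[:n_commit]:
            tokens[i] = proposals[i][0]
    return tokens


sketch_tokens = iterative_decode(num_tokens=8, style=3)
```

In this toy setup every position is committed by the final step, because the schedule reaches zero masked tokens when `step == steps`. In a real system the resulting token ids would be passed to an image decoder to render the sketch.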

arXiv information

Authors	Bowen Sun,Guo Lu,Shibao Zheng
Published	2024-08-22 13:45:04+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
