Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch

Summary

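As sketch research has collectively matured over time, its adaptation for mass commercialisation is on the immediate horizon.
Despite the already mature body of research on photos, no work has addressed efficient inference designed specifically for sketch data.
In this paper, we first show that existing state-of-the-art efficient lightweight models designed for photos do not work on sketches.
We then propose two sketch-specific components that work in a plug-and-play manner on any photo-efficient network and adapt it to sketch data.
We chose fine-grained sketch-based image retrieval (FG-SBIR) as the demonstrator, since it is the most recognised sketch problem with immediate commercial value.
Technically, we first propose a cross-modal knowledge distillation network that transfers existing photo-efficient networks to be compatible with sketches, reducing the number of FLOPs and model parameters by 97.96% and 84.89% respectively.
We then exploit the abstract nature of sketches to introduce an RL-based canvas selector that adjusts dynamically to the abstraction level, cutting the number of FLOPs by a further two thirds.
The end result is an overall 99.37% reduction in FLOPs (from 40.18G to 0.254G) compared with the full network, while accuracy is retained (33.03% vs 32.77%). This yields an efficient network for sparse sketch data that needs even fewer FLOPs than the best photo counterpart.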
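The headline percentages can be checked against the quoted FLOP counts. The short Python sketch below does that arithmetic using only the figures from the abstract. It assumes, which the abstract does not state outright, that the 97.96% distillation figure is measured against the same 40.18G baseline and that the two reductions compose multiplicatively; the variable and function names are illustrative.

```python
# Figures quoted in the abstract.
FULL_GFLOPS = 40.18     # full network
FINAL_GFLOPS = 0.254    # after distillation and canvas selection
KD_REDUCTION = 0.9796   # FLOPs reduction attributed to distillation


def reduction(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return 1.0 - after / before


overall = reduction(FULL_GFLOPS, FINAL_GFLOPS)

# Assumption: the distillation figure applies to the 40.18G baseline.
after_kd = FULL_GFLOPS * (1.0 - KD_REDUCTION)

# Extra cut the canvas selector would need to reach the final figure.
selector_cut = reduction(after_kd, FINAL_GFLOPS)

print(f"overall reduction:    {overall:.2%}")
print(f"after distillation:   {after_kd:.3f} GFLOPs")
print(f"implied selector cut: {selector_cut:.1%}")
```

Under these assumptions the overall figure reproduces the quoted 99.37%, and the implied selector cut comes out at roughly 69%, which is consistent with the abstract's rounded "two thirds".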

Abstract (original)

As sketch research has collectively matured over time, its adaptation for at-mass commercialisation emerges on the immediate horizon. Despite an already mature research endeavour for photos, there is no research on efficient inference specifically designed for sketch data. In this paper, we first demonstrate that existing state-of-the-art efficient light-weight models designed for photos do not work on sketches. We then propose two sketch-specific components which work in a plug-n-play manner on any photo efficient network to adapt them to work on sketch data. We specifically chose fine-grained sketch-based image retrieval (FG-SBIR) as a demonstrator as the most recognised sketch problem with immediate commercial value. Technically speaking, we first propose a cross-modal knowledge distillation network to transfer existing photo efficient networks to be compatible with sketch, which brings down the number of FLOPs and model parameters by 97.96% and 84.89% respectively. We then exploit the abstract trait of sketch to introduce an RL-based canvas selector that dynamically adjusts to the abstraction level, which further cuts down the number of FLOPs by two thirds. The end result is an overall reduction of 99.37% of FLOPs (from 40.18G to 0.254G) when compared with a full network, while retaining the accuracy (33.03% vs 32.77%), finally making an efficient network for the sparse sketch data that exhibits even fewer FLOPs than the best photo counterpart.
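To see why choosing a canvas per sketch can save compute, note that the cost of a convolution grows with the square of the input side length. The Python sketch below illustrates only that scaling. It is not the paper's method: the abstract does not describe the selector's action space, so the canvas sizes, the thresholds, the use of an "ink fraction" as a proxy for abstraction level, and the hand-written rule in `pick_canvas` (standing in for the learned RL policy) are all hypothetical.

```python
def conv_macs(side: int, c_in: int, c_out: int, k: int = 3) -> int:
    """Multiply-accumulates of one stride-1, same-padded k x k convolution
    applied to a side x side input."""
    return side * side * c_in * c_out * k * k


def pick_canvas(ink_fraction: float) -> int:
    """Hypothetical stand-in for the learned selector:
    sparser (more abstract) sketches get a smaller canvas."""
    if ink_fraction < 0.02:
        return 128
    if ink_fraction < 0.05:
        return 192
    return 256


full = conv_macs(256, 64, 64)  # cost at the largest (illustrative) canvas

for frac in (0.01, 0.03, 0.08):
    side = pick_canvas(frac)
    ratio = conv_macs(side, 64, 64) / full
    print(f"ink fraction {frac:.2f} -> canvas {side}px, "
          f"{ratio:.2%} of full-canvas cost")
```

Halving the canvas side quarters the cost of this layer, so a selector that sends sparse sketches to smaller canvases can plausibly reach savings of the order the abstract reports.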

arXiv information

Authors	Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song
Published	2025-05-29 17:59:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
