Compositional Scene Representation Learning via Reconstruction: A Survey

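Abstract

Visual scene representation learning is an important research problem in computer vision.
Learning representations better suited to visual scenes can improve the performance of artificial intelligence systems on vision tasks.
Complex visual scenes are composed of relatively simple visual concepts and exhibit combinatorial explosion.
Compared with directly representing the entire visual scene, extracting compositional scene representations copes better with the diverse combinations of backgrounds and objects.
Because compositional scene representations abstract the concept of objects, visual scene analysis and understanding based on them could be easier and more interpretable.
Moreover, learning via reconstruction can greatly reduce the need for annotated training data.
Reconstruction-based compositional scene representation learning is therefore of considerable research significance.
This survey first outlines current progress on the topic, including its development history and a categorization of existing methods from the perspectives of visual scene modeling and scene representation inference.
It then provides benchmarks of representative methods that address the most extensively studied problem setting and form the foundation for other methods, together with an open-source toolbox for reproducing the benchmark experiments.
Finally, it discusses future directions for this research topic.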

Abstract (original)

Visual scene representation learning is an important research problem in the field of computer vision. The performance of artificial intelligence systems on vision tasks could be improved if more suitable representations are learned for visual scenes. Complex visual scenes are composed of relatively simple visual concepts, and have the property of combinatorial explosion. Compared with directly representing the entire visual scene, extracting compositional scene representations can better cope with the diverse combinations of background and objects. Because compositional scene representations abstract the concept of objects, performing visual scene analysis and understanding based on these representations could be easier and more interpretable. Moreover, learning via reconstruction can greatly reduce the need for training data annotations. Therefore, reconstruction-based compositional scene representation learning has important research significance. In this survey, we first outline the current progress on this research topic, including development history and categorizations of existing methods from the perspectives of modeling of visual scenes and inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the future directions of this research topic.

arXiv information

Authors	Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue
Published	2022-06-03 04:02:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
