Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images

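Summary

We introduce Y-MAP-Net, a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images.
From a single network evaluation, Y-MAP-Net simultaneously predicts depth, surface normals, human pose, and semantic segmentation, and generates multi-label captions.
To achieve this, we adopt a multi-teacher, single-student training paradigm in which task-specific foundation models supervise the network's training, so that their capabilities are distilled into a lightweight architecture suitable for real-time applications.
Y-MAP-Net exhibits strong generalization, simplicity, and computational efficiency, making it ideal for robotics and other practical scenarios.
To support future research, the code will be released publicly.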
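
The multi-teacher, single-student idea can be illustrated with a minimal sketch. The abstract does not state the loss functions or task weights actually used, so everything below is an assumption for illustration: the mean squared error, the task names, the weights, and the function names `l2_loss` and `multi_teacher_loss` are all hypothetical. Each teacher provides a target for one task, and the student's outputs are scored against all of them in a single weighted sum.

```python
from typing import Dict, List

Vector = List[float]


def l2_loss(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equally sized vectors (placeholder loss)."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_teacher_loss(
    student_out: Dict[str, Vector],
    teacher_out: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses between one student and several teachers.

    Each key of teacher_out names a task (hypothetical names such as "depth"),
    and its value is the target produced by that task's teacher model.
    """
    return sum(
        weights[task] * l2_loss(student_out[task], teacher_out[task])
        for task in teacher_out
    )


# Toy values standing in for flattened per-pixel predictions.
student = {"depth": [1.0, 2.0], "normals": [0.0, 0.0]}
teachers = {"depth": [1.0, 4.0], "normals": [1.0, 1.0]}
task_weights = {"depth": 1.0, "normals": 0.5}

total = multi_teacher_loss(student, teachers, task_weights)
print(total)  # 1.0 * 2.0 + 0.5 * 1.0 = 2.5
```

In a real training loop the student outputs would come from one forward pass of the shared network and the teacher outputs from the frozen task-specific models; a single scalar like `total` is what would be minimized.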

Summary (original)

We present Y-MAP-Net, a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images. Y-MAP-Net simultaneously predicts depth, surface normals, human pose, semantic segmentation and generates multi-label captions, all from a single network evaluation. To achieve this, we adopt a multi-teacher, single-student training paradigm, where task-specific foundation models supervise the network’s learning, enabling it to distill their capabilities into a lightweight architecture suitable for real-time applications. Y-MAP-Net exhibits strong generalization, simplicity and computational efficiency, making it ideal for robotics and other practical scenarios. To support future research, we will release our code publicly.

arXiv information

Authors	Ammar Qammaz, Nikolaos Vasilikopoulos, Iason Oikonomidis, Antonis A. Argyros
Published	2024-11-15 16:33:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
