Head and eye egocentric gesture recognition for human-robot interaction using eyewear cameras

Summary

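Non-verbal communication plays a particularly important role in a wide range of human-robot interaction (HRI) scenarios.
This work therefore addresses the problem of human gesture recognition.
In particular, it focuses on head and eye gestures and adopts an egocentric (first-person) perspective using eyewear cameras.
We argue that this egocentric view may offer a number of conceptual and technical advantages over scene-centric or robot-centric perspectives.
A motion-based recognition approach that operates at two temporal granularities is proposed.
Locally, frame-to-frame homographies are estimated with a convolutional neural network (CNN).
The output of this CNN is fed to a long short-term memory (LSTM) network, which captures the longer-term temporal visual relationships relevant to characterizing gestures.
Regarding the configuration of the network architecture, one particularly interesting finding is that using the output of an internal layer of the homography CNN yields a higher recognition rate than using the homography matrix itself.
Although this work focuses on action recognition and no robot or user study has been conducted yet, the system is designed to meet real-time constraints.
The encouraging results suggest that the proposed egocentric perspective is viable, and this proof-of-concept work offers novel and useful contributions to the exciting area of HRI.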
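
To make the "frame-to-frame homography" idea concrete: a homography is a 3x3 matrix that maps points in one frame to their positions in the next. The following is a minimal plain-Python sketch, not the paper's CNN or LSTM. The helper names, the image center, and the toy sequence of pure-translation matrices are all hypothetical and chosen only for illustration. It shows how a sequence of per-frame-pair homographies yields a motion signal over time, which is the kind of sequence a recurrent model could then consume.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def apply_homography(H: Matrix3, point: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the third coordinate to return to ordinary 2D coordinates.
    return (xh / w, yh / w)


def center_displacements(homographies: Sequence[Matrix3], center: Point) -> List[Point]:
    """Apparent displacement of the image center for each consecutive frame pair."""
    displacements = []
    for H in homographies:
        u, v = apply_homography(H, center)
        displacements.append((u - center[0], v - center[1]))
    return displacements


def translation(dx: float, dy: float) -> List[List[float]]:
    """Hypothetical helper: a homography that is a pure image translation."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


# Toy sequence (an assumption, not real data): the image content shifts
# one way for two frame pairs, then back, as a vertical head movement might cause.
sequence = [
    translation(0.0, 4.0),
    translation(0.0, 4.0),
    translation(0.0, -4.0),
    translation(0.0, -4.0),
]
motion = center_displacements(sequence, center=(320.0, 240.0))
print(motion)  # [(0.0, 4.0), (0.0, 4.0), (0.0, -4.0), (0.0, -4.0)]
```

In this toy case the per-pair displacements change sign halfway through, so the gesture is visible only when the whole sequence is considered, which is why a second, longer temporal granularity is useful.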

Summary (original)

Non-verbal communication plays a particularly important role in a wide range of scenarios in Human-Robot Interaction (HRI). Accordingly, this work addresses the problem of human gesture recognition. In particular, we focus on head and eye gestures, and adopt an egocentric (first-person) perspective using eyewear cameras. We argue that this egocentric view may offer a number of conceptual and technical benefits over scene- or robot-centric perspectives. A motion-based recognition approach is proposed, which operates at two temporal granularities. Locally, frame-to-frame homographies are estimated with a convolutional neural network (CNN). The output of this CNN is input to a long short-term memory (LSTM) to capture longer-term temporal visual relationships, which are relevant to characterize gestures. Regarding the configuration of the network architecture, one particularly interesting finding is that using the output of an internal layer of the homography CNN increases the recognition rate with respect to using the homography matrix itself. While this work focuses on action recognition, and no robot or user study has been conducted yet, the system has been designed to meet real-time constraints. The encouraging results suggest that the proposed egocentric perspective is viable, and this proof-of-concept work provides novel and useful contributions to the exciting area of HRI.

arXiv information

Authors	Javier Marina-Miranda, V. Javier Traver
Published	2022-06-10 17:29:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
