GestLLM: Advanced Hand Gesture Interpretation via Large Language Models for Human-Robot Interaction

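Summary

This paper introduces GestLLM, an advanced human-robot interaction system that enables intuitive robot control through hand gestures.
Unlike conventional systems that rely on a limited set of predefined gestures, GestLLM uses large language models together with feature extraction via MediaPipe to interpret a diverse range of gestures.
This integration addresses key limitations of existing systems, such as restricted gesture flexibility and the inability to recognize complex or unconventional gestures commonly used in human communication.
By combining state-of-the-art feature extraction with language model capabilities, GestLLM achieves performance comparable to leading vision-language models while supporting gestures that are underrepresented in traditional datasets.
For example, it handles gestures from popular culture, such as the "Vulcan salute" from Star Trek, without additional pretraining or prompt engineering. This flexibility makes robot control more natural and inclusive, and makes interaction more intuitive and user-friendly.
GestLLM is a significant step forward in gesture-based interaction, enabling robots to understand and respond effectively to a wide variety of hand gestures.
The paper outlines the system's design, implementation, and evaluation, and demonstrates its potential applications in advanced human-robot collaboration, assistive robotics, and interactive entertainment.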
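
The summary does not say how the MediaPipe features are handed to the language model. As a rough illustration only, the sketch below assumes one plausible route: the 21 hand landmarks that MediaPipe Hands reports (normalized x, y, z per point) are serialized into a text prompt. The function name `landmarks_to_prompt`, the prompt wording, and the rounding precision are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Names of the 21 points in MediaPipe's hand landmark model, in index order.
LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

Landmark = Tuple[float, float, float]  # normalized (x, y, z)


def landmarks_to_prompt(landmarks: List[Landmark], precision: int = 2) -> str:
    """Serialize 21 hand landmarks into a text prompt (hypothetical format).

    This is an assumed encoding for illustration; the paper's actual
    feature-to-text representation is not described in the abstract.
    """
    if len(landmarks) != len(LANDMARK_NAMES):
        raise ValueError(
            f"expected {len(LANDMARK_NAMES)} landmarks, got {len(landmarks)}"
        )
    lines = [
        f"{name}: x={x:.{precision}f}, y={y:.{precision}f}, z={z:.{precision}f}"
        for name, (x, y, z) in zip(LANDMARK_NAMES, landmarks)
    ]
    header = "Hand landmarks (normalized image coordinates):"
    question = "Which hand gesture do these landmarks most likely represent?"
    return "\n".join([header, *lines, question])


if __name__ == "__main__":
    # Dummy landmarks standing in for a real MediaPipe detection result.
    dummy = [(i / 100, 0.5, 0.0) for i in range(21)]
    print(landmarks_to_prompt(dummy))
```

The resulting string could then be sent to whichever language model is in use. Keeping the serialization as a separate pure function makes it easy to try other encodings, such as joint angles instead of raw coordinates.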

Summary (original)

This paper introduces GestLLM, an advanced system for human-robot interaction that enables intuitive robot control through hand gestures. Unlike conventional systems, which rely on a limited set of predefined gestures, GestLLM leverages large language models and feature extraction via MediaPipe to interpret a diverse range of gestures. This integration addresses key limitations in existing systems, such as restricted gesture flexibility and the inability to recognize complex or unconventional gestures commonly used in human communication. By combining state-of-the-art feature extraction and language model capabilities, GestLLM achieves performance comparable to leading vision-language models while supporting gestures underrepresented in traditional datasets. For example, this includes gestures from popular culture, such as the “Vulcan salute” from Star Trek, without any additional pretraining, prompt engineering, etc. This flexibility enhances the naturalness and inclusivity of robot control, making interactions more intuitive and user-friendly. GestLLM provides a significant step forward in gesture-based interaction, enabling robots to understand and respond to a wide variety of hand gestures effectively. This paper outlines its design, implementation, and evaluation, demonstrating its potential applications in advanced human-robot collaboration, assistive robotics, and interactive entertainment.

arXiv information

Authors	Oleg Kobzarev, Artem Lykov, Dzmitry Tsetserukou
Published	2025-01-13 13:01:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
