Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

Summary

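LLMs excel at processing the text of human conversations, but they struggle with the nuances of spoken instructions in scenarios such as social navigation, where ambiguity and uncertainty can erode trust in robots and other AI systems.
This shortcoming can be addressed by moving beyond text and also attending to the paralinguistic features of the spoken responses.
These features are the aspects of spoken communication that carry meaning and nuance through how something is said rather than through the literal wording (lexical content).
We present Beyond Text, an approach that improves LLM decision-making by combining the audio transcription with a subset of these features, chosen for their focus on affect and their relevance to human-robot conversation. The approach achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5, respectively), and it is also more robust to token manipulation adversarial attacks.
This robustness shows in a decrease ratio in winning rate that is 22.44% lower than that of the text-only language model.
Beyond Text marks an advance in social robot navigation and in human-robot interaction more broadly, seamlessly integrating text-based guidance with human-audio-informed language models.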
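
The summary describes feeding an LLM both the transcript and a subset of vocal features. A minimal sketch of that idea follows, assuming the features have already been extracted from the audio. The `VocalCues` fields, the `describe_cues` and `build_prompt` helpers, and the prompt wording are all hypothetical illustrations, not the authors' implementation or feature set.

```python
from dataclasses import dataclass


@dataclass
class VocalCues:
    """Hypothetical paralinguistic features for one spoken instruction."""
    mean_pitch_hz: float    # average fundamental frequency of the voice
    loudness_db: float      # average loudness of the clip
    speech_rate_wps: float  # speaking rate in words per second
    pause_ratio: float      # fraction of the clip that is silence (0 to 1)


def describe_cues(cues: VocalCues) -> str:
    """Render the numeric cues as one short line of text an LLM can read."""
    return (
        f"pitch={cues.mean_pitch_hz:.0f}Hz, "
        f"loudness={cues.loudness_db:.0f}dB, "
        f"rate={cues.speech_rate_wps:.1f} words/s, "
        f"pauses={cues.pause_ratio:.0%}"
    )


def build_prompt(transcript: str, cues: VocalCues) -> str:
    """Combine what was said with how it was said in a single prompt."""
    return (
        "A person gave a robot a spoken navigation instruction.\n"
        f"Transcript: {transcript}\n"
        f"Vocal cues: {describe_cues(cues)}\n"
        "Considering both the words and the vocal cues, how confident is "
        "the speaker, and should the robot follow the instruction?"
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    cues = VocalCues(mean_pitch_hz=180.0, loudness_db=62.0,
                     speech_rate_wps=2.5, pause_ratio=0.25)
    print(build_prompt("Um, I think the exit is to the left?", cues))
```

The resulting string can be sent to any text-only LLM, so in this sketch the vocal information reaches the model through the prompt rather than through a change to the model itself.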

Abstract (original)

While LLMs excel in processing text in human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text: an approach that improves LLM decision-making by integrating audio transcription along with a subsection of these features, which focus on affect and are more relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5 respectively), but also enhances robustness against token manipulation adversarial attacks, highlighted by a 22.44% lower decrease ratio in winning rate than the text-only language model. Beyond Text marks an advancement in social robot navigation and broader Human-Robot interactions, seamlessly integrating text-based guidance with human-audio-informed language models.

arXiv information

Authors	Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera
Published	2024-11-11 04:03:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
