Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality

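Summary

Selecting occluded objects is a difficult problem in virtual reality, and it becomes harder still when multiple objects are involved.
With the emergence of new artificial intelligence techniques, we explore whether large language models can assist multi-object selection tasks in virtual reality through a multimodal speech and raycast interaction technique.
We validate the findings in a comparative user study (n=24) in which participants selected target objects in a virtual reality scene at different levels of scene perplexity.
Performance metrics and user experience metrics are compared against a mini-map-based occluded object selection technique that serves as the baseline.
The results show that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects.
Contrary to the common belief about speech interfaces, AssistVR outperformed the baseline even when the target objects were difficult to refer to verbally.
This work demonstrates the viability and interaction potential of an intelligent multimodal interactive system powered by large language models.
Based on these results, we discuss the implications for the design of future intelligent multimodal interactive systems in immersive environments.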

Abstract (original)

Selection of occluded objects is a challenging problem in virtual reality, even more so if multiple objects are involved. With the advent of new artificial intelligence technologies, we explore the possibility of leveraging large language models to assist multi-object selection tasks in virtual reality via a multimodal speech and raycast interaction technique. We validate the findings in a comparative user study (n=24), where participants selected target objects in a virtual reality scene with different levels of scene perplexity. The performance metrics and user experience metrics are compared against a mini-map based occluded object selection technique that serves as the baseline. Results indicate that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects. Contrary to the common belief for speech interfaces, AssistVR was able to outperform the baseline even when the target objects were difficult to reference verbally. This work demonstrates the viability and interaction potential of an intelligent multimodal interactive system powered by large language models. Based on the results, we discuss the implications for design of future intelligent multimodal interactive systems in immersive environments.

arXiv information

Authors	Junlong Chen, Jens Grubert, Per Ola Kristensson
Published	2024-10-28 14:56:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
