Reasoning LLMs for User-Aware Multimodal Conversational Agents

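Summary

Personalization in social robotics is essential for effective human-robot interaction, but systems often face the cold start problem, in which a user's initial preferences and characteristics are unavailable.
This paper proposes USER-LLM R1, a new framework for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initiation.
The approach integrates chain-of-thought (CoT) reasoning models, which iteratively infer user preferences, with vision-language models (VLMs), which initialize user profiles from multimodal inputs, enabling personalized interaction from the first encounter.
Using a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within its inherent CoT process, keeping responses contextually relevant and adaptive.
Evaluations on the ElderlyTech-VQA Bench show significant improvements over state-of-the-art baselines in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores, and ablation studies underscore the impact of reasoning model size on performance.
Human evaluations further validate the framework's effectiveness, particularly for elderly users, for whom tailored responses increase engagement and trust.
Ethical considerations, including privacy preservation and bias mitigation, are rigorously discussed and addressed to support responsible deployment.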

Abstract (original)

Personalization in social robotics is critical for fostering effective human-robot interactions, yet systems often face the cold start problem, where initial user preferences or characteristics are unavailable. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initiation. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models (VLMs) to initialize user profiles from multimodal inputs, enabling personalized interactions from the first encounter. Leveraging a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within an inherent CoT process, ensuring contextually relevant and adaptive responses. Evaluations on the ElderlyTech-VQA Bench demonstrate significant improvements in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores over state-of-the-art baselines, with ablation studies underscoring the impact of reasoning model size on performance. Human evaluations further validate the framework’s efficacy, particularly for elderly users, where tailored responses enhance engagement and trust. Ethical considerations, including privacy preservation and bias mitigation, are rigorously discussed and addressed to ensure responsible deployment.
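The reported gains are in ROUGE-1, ROUGE-2, and ROUGE-L F1. As a reminder of what those scores measure, here is a minimal sketch of the standard definitions: n-gram overlap for ROUGE-N and longest common subsequence for ROUGE-L, each combined into an F1 from precision and recall. The function names are illustrative, and the lowercased whitespace tokenization is an assumption. The abstract does not say which ROUGE implementation or preprocessing the authors used, so this will not necessarily reproduce their numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 from precision (overlap / candidate size) and recall (overlap / reference size)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 with clipped n-gram matches (assumed tokenization: lowercase + split)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips repeated n-grams
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n_f1(cand, ref, 1), rouge_n_f1(cand, ref, 2), rouge_l_f1(cand, ref))
```

Published ROUGE toolkits typically add options such as stemming and different sentence splitting, which is one reason scores from different implementations are not directly comparable.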

arXiv information

Authors	Hamed Rahimi, Jeanne Cattoni, Meriem Beghili, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani
Published	2025-04-02 13:00:17+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
