Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning

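Abstract

Expressive robot behavior is essential for robots to gain wide acceptance in social environments.
Recent advances in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors.
However, determining the optimal behavior for interacting with different users across varied scenarios remains a challenge.
Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which is high-resolution but sample-inefficient.
This paper introduces a novel approach that combines priors generated by pre-trained LLMs with the precision of preference learning.
Our method, called Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely match human expectations.
Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency.
We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches.
Website with videos: https://lgpl-gaits.github.io/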

Abstract (original)

Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advancements in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches. Website with videos: https://lgpl-gaits.github.io/
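The abstract does not specify the preference model or how the LLM proposes behaviors, so the following is only a minimal sketch of the general idea: a small set of candidate behaviors (here a hard-coded stand-in for LLM proposals) is ranked by fitting a Bradley-Terry preference model to a handful of pairwise comparisons. The two-dimensional behavior features, the linear reward, the simulated rater with its hidden weights, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, x):
    """Linear reward r(x) = w . x (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_bradley_terry(comparisons, dim, lr=0.5, steps=200, l2=0.01):
    """Fit reward weights from (preferred, rejected) pairs.

    Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    Plain gradient ascent on the L2-regularized log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [-l2 * wi for wi in w]
        for win, lose in comparisons:
            p = sigmoid(reward(w, win) - reward(w, lose))
            for i in range(dim):
                grad[i] += (1.0 - p) * (win[i] - lose[i])
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


def simulated_user(a, b, hidden_w=(1.0, 1.0)):
    """Hypothetical stand-in for a human rater: returns (preferred, rejected)."""
    if reward(hidden_w, a) >= reward(hidden_w, b):
        return (a, b)
    return (b, a)


# Hypothetical stand-in for LLM-proposed candidates. Each tuple holds two
# made-up, normalized behavior parameters; a real system would obtain these
# by prompting an LLM with the desired behavior description.
candidates = [
    (0.1, 0.2),
    (0.3, 0.3),
    (0.5, 0.6),
    (0.9, 0.8),
    (0.6, 0.4),
]

# Ask four pairwise queries over consecutive candidates.
comparisons = [
    simulated_user(candidates[i], candidates[i + 1])
    for i in range(len(candidates) - 1)
]

w = fit_bradley_terry(comparisons, dim=2)
best = max(candidates, key=lambda x: reward(w, x))
print("learned weights:", w)
print("preferred candidate:", best)
```

Because the candidates come from a prior rather than from uniform sampling, the preference queries are spent only on distinguishing plausible behaviors. That is the intuition behind the sample-efficiency claim, though the actual LGPL pipeline may differ from this sketch in every detail.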

arXiv information

Authors	Jaden Clark, Joey Hejna, Dorsa Sadigh
Published	2025-02-06 02:07:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
