Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity

Summary

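Policy optimization seeks the best solution to a control problem according to an objective or fitness function, and it is a fundamental field of engineering and research with applications in robotics.
Traditional optimization methods such as reinforcement learning and evolutionary algorithms struggle with deceptive fitness landscapes, where following immediate improvements leads to suboptimal solutions.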
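A deceptive landscape can be made concrete with a minimal Python sketch on a hypothetical 1-D toy problem. The `FITNESS` table and the `hill_climb` helper are illustrative assumptions, not taken from the paper. A simple greedy hill climber stands in for any method that only follows immediate improvements; it is not itself a reinforcement learning or evolutionary algorithm. Started at `x=0`, it stops on the local peak and never crosses the zero-fitness valley that separates it from the global optimum.

```python
# Hypothetical toy landscape: index = solution x, value = fitness f(x).
# A local peak sits at x=2 (fitness 2) and the global peak at x=10 (fitness 4),
# separated by a zero-fitness valley on x=4..6.
FITNESS = [0, 1, 2, 1, 0, 0, 0, 1, 2, 3, 4]


def hill_climb(start: int) -> int:
    """Greedy local search: move to the best neighbour only if it strictly improves fitness."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(FITNESS)]
        best = max(neighbours, key=lambda n: FITNESS[n])
        if FITNESS[best] <= FITNESS[x]:
            return x  # no strictly better neighbour: stuck in a (possibly local) optimum
        x = best


if __name__ == "__main__":
    end = hill_climb(0)
    print(f"greedy search stops at x={end} with fitness {FITNESS[end]}")
    print(f"global optimum is x={FITNESS.index(max(FITNESS))} with fitness {max(FITNESS)}")
```

Starting instead at `x=6`, on the far side of the valley, the same climber reaches the global peak. The outcome therefore depends on where the search begins rather than on where the true optimum lies.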
Quality-Diversity (QD) algorithms offer a promising approach by maintaining diverse intermediate solutions as stepping stones for escaping local optima.
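The stepping-stone idea can be illustrated with a MAP-Elites-style grid archive, one common QD algorithm; the paper's own baselines may differ. The sketch runs on a hypothetical continuous deceptive landscape. The hand-crafted feature `descriptor` (which unit interval of [0, 10] a solution lies in), the `fitness` function, and all parameters are assumptions chosen for illustration. Every empty cell accepts a solution regardless of its fitness, so zero-fitness solutions in the valley are kept. They then act as stepping stones toward the global peak at x = 10.

```python
import random


def fitness(x: float) -> float:
    """Hypothetical deceptive landscape: local peak f(2)=2, zero-fitness valley on [4, 6], global peak f(10)=4."""
    return max(0.0, 2.0 - abs(x - 2.0), x - 6.0)


def descriptor(x: float) -> int:
    """Hand-crafted feature: index of the unit interval of [0, 10] containing x (10 cells)."""
    return min(int(x), 9)


def map_elites(iterations: int = 2000, sigma: float = 1.0, seed: int = 0) -> dict:
    rng = random.Random(seed)
    archive = {}  # cell index -> best solution (elite) found for that cell

    def try_insert(x: float) -> None:
        cell = descriptor(x)
        # An empty cell accepts any solution, even a zero-fitness one: this keeps the stepping stones.
        if cell not in archive or fitness(x) > fitness(archive[cell]):
            archive[cell] = x

    try_insert(0.0)  # start at the foot of the local peak
    for _ in range(iterations):
        parent = rng.choice(list(archive.values()))  # uniform selection over elites
        child = min(max(parent + rng.gauss(0.0, sigma), 0.0), 10.0)  # Gaussian mutation, clipped to [0, 10]
        try_insert(child)
    return archive


if __name__ == "__main__":
    archive = map_elites()
    best = max(archive.values(), key=fitness)
    print(f"cells filled: {len(archive)}/10, best x={best:.2f}, fitness={fitness(best):.2f}")
```

The feature used here depends entirely on knowing that position along the line matters. That kind of domain knowledge is exactly what hand-crafted QD features require.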
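However, QD algorithms require domain expertise to define hand-crafted features, which limits their applicability in domains where it is unclear how to characterize solution diversity.
In this paper, we show that unsupervised QD algorithms (specifically the AURORA framework, which learns features from sensory data) efficiently solve deceptive optimization problems without domain expertise.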
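AURORA-style methods replace hand-crafted features with a low-dimensional encoding learned from raw sensory data, for instance with an autoencoder. The minimal Python sketch below uses linear PCA, computed by power iteration, as a deliberately simple stand-in for such an encoder. `fit_linear_encoder`, `encode`, and the toy observations are hypothetical and are not the encoder or data used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def fit_linear_encoder(data: Sequence[Sequence[float]], iters: int = 100) -> Tuple[List[float], List[float]]:
    """Fit a 1-D PCA encoder by power iteration on the covariance matrix (stand-in for a learned encoder)."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)] for a in range(dim)]
    v = [1.0 + 0.1 * j for j in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along v: stop early
            break
        v = [x / norm for x in w]  # converges to the dominant eigenvector (first principal component)
    return mean, v


def encode(x: Sequence[float], mean: Sequence[float], direction: Sequence[float]) -> float:
    """Project a raw observation onto the learned direction, giving a 1-D learned feature."""
    return sum((xi - mi) * di for xi, mi, di in zip(x, mean, direction))


if __name__ == "__main__":
    # Hypothetical 2-D "sensory" observations that vary along the diagonal only.
    observations = [(float(t), float(t)) for t in range(5)]
    mean, direction = fit_linear_encoder(observations)
    print([round(encode(o, mean, direction), 3) for o in observations])
```

In an AURORA-like loop, the encoder would be refitted periodically on the sensory data of archive members. `encode` would then supply the feature vectors used to compare solutions, with no hand-crafted descriptor involved.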
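By enhancing AURORA with contrastive learning and periodic extinction events, we propose AURORA-XCon, which outperforms all traditional optimization baselines and matches the best QD baseline with domain-specific hand-crafted features, in some cases even improving on it by up to 34%.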
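AURORA-style methods are usually described as storing solutions in an unstructured archive. A candidate enters only if its (learned) feature vector is far enough from its nearest stored neighbour. The Python sketch below shows such an archive together with a simple extinction operator. Several details are assumptions for illustration: the class name, the fixed distance `threshold`, the replace-if-fitter rule, and the uniform random choice of survivors in `extinction`. The paper's exact extinction schedule and its contrastive feature learning are not reproduced here. Feature vectors are passed in directly instead of coming from a trained encoder.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[Tuple[float, ...], float]  # (feature vector, fitness)


class UnstructuredArchive:
    """Hypothetical novelty-threshold archive with a periodic extinction operator."""

    def __init__(self, threshold: float, seed: int = 0) -> None:
        self.threshold = threshold  # minimum feature-space distance between stored solutions
        self.entries: List[Entry] = []
        self.rng = random.Random(seed)

    def _nearest(self, feat: Sequence[float]) -> Tuple[Optional[int], float]:
        """Return the index of, and Euclidean distance to, the closest stored entry."""
        best_i: Optional[int] = None
        best_d = math.inf
        for i, (f, _) in enumerate(self.entries):
            d = math.dist(f, feat)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def try_add(self, feat: Sequence[float], fit: float) -> bool:
        """Add a novel candidate, or let it replace its nearest neighbour if it is fitter."""
        i, d = self._nearest(feat)
        if d >= self.threshold:  # far from everything stored (always true for an empty archive)
            self.entries.append((tuple(feat), fit))
            return True
        if fit > self.entries[i][1]:  # too close, but better than the neighbour it crowds
            self.entries[i] = (tuple(feat), fit)
            return True
        return False

    def extinction(self, survival_rate: float) -> None:
        """Keep a uniformly random subset of the archive (assumed extinction rule)."""
        if not self.entries:
            return
        k = max(1, round(len(self.entries) * survival_rate))
        self.entries = self.rng.sample(self.entries, k)


if __name__ == "__main__":
    archive = UnstructuredArchive(threshold=1.0)
    for x in range(12):
        archive.try_add((3.0 * x, 0.0), fit=1.0)  # 12 well-separated candidates
    archive.extinction(survival_rate=0.25)
    print(len(archive.entries))  # prints 3
```

In this sketch, an extinction event frees space in a saturated archive. Later candidates that land in the emptied regions can then be admitted again instead of being rejected as too close to an existing entry.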
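This work establishes a novel application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and expanding their potential to domains where defining feature spaces is challenging.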

Summary (original)

Policy optimization seeks the best solution to a control problem according to an objective or fitness function, serving as a fundamental field of engineering and research with applications in robotics. Traditional optimization methods like reinforcement learning and evolutionary algorithms struggle with deceptive fitness landscapes, where following immediate improvements leads to suboptimal solutions. Quality-diversity (QD) algorithms offer a promising approach by maintaining diverse intermediate solutions as stepping stones for escaping local optima. However, QD algorithms require domain expertise to define hand-crafted features, limiting their applicability in domains where it is unclear how to characterize solution diversity. In this paper, we show that unsupervised QD algorithms (specifically the AURORA framework, which learns features from sensory data) efficiently solve deceptive optimization problems without domain expertise. By enhancing AURORA with contrastive learning and periodic extinction events, we propose AURORA-XCon, which outperforms all traditional optimization baselines and matches the best QD baseline with domain-specific hand-crafted features, in some cases even improving on it by up to 34%. This work establishes a novel application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and expanding their potential to domains where defining feature spaces poses challenges.

arXiv information

Authors	Lisa Coiffard, Paul Templier, Antoine Cully
Published	2025-04-02 17:18:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
