Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity

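Summary

Policy optimization seeks the best solution to a control problem under an objective, or fitness, function, and is a foundational area of engineering and research with applications in robotics. Traditional optimizers such as reinforcement learning and evolutionary algorithms struggle on deceptive fitness landscapes, where chasing immediate improvement leads to suboptimal solutions. Quality-diversity (QD) algorithms are a promising alternative: they maintain diverse intermediate solutions that act as stepping stones out of local optima. QD algorithms, however, need domain expertise to define hand-crafted features, which limits them where it is unclear how to characterize solution diversity. This paper shows that unsupervised QD algorithms, specifically the AURORA framework, which learns features from sensory data, solve deceptive optimization problems efficiently without domain expertise. Enhancing AURORA with contrastive learning and periodic extinction events yields AURORA-XCon, which outperforms all traditional optimization baselines and matches the best QD baseline with domain-specific hand-crafted features, in some cases improving on it by up to 34%. The work establishes a new application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and extending them to domains where feature spaces are hard to define.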

Abstract (original)

Policy optimization seeks the best solution to a control problem according to an objective or fitness function, serving as a fundamental field of engineering and research with applications in robotics. Traditional optimization methods like reinforcement learning and evolutionary algorithms struggle with deceptive fitness landscapes, where following immediate improvements leads to suboptimal solutions. Quality-diversity (QD) algorithms offer a promising approach by maintaining diverse intermediate solutions as stepping stones for escaping local optima. However, QD algorithms require domain expertise to define hand-crafted features, limiting their applicability where characterizing solution diversity remains unclear. In this paper, we show that unsupervised QD algorithms – specifically the AURORA framework, which learns features from sensory data – efficiently solve deceptive optimization problems without domain expertise. By enhancing AURORA with contrastive learning and periodic extinction events, we propose AURORA-XCon, which outperforms all traditional optimization baselines and matches, in some cases even improving by up to 34%, the best QD baseline with domain-specific hand-crafted features. This work establishes a novel application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and expanding their potential to domains where defining feature spaces poses challenges.
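The stepping-stone argument in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 1-D landscape, the function names `fitness`, `hill_climb`, and `grid_qd`, and above all the feature, which is simply the solution `x` itself. It is a plain grid-archive QD loop with a hand-picked feature, not AURORA or AURORA-XCon; learned features, contrastive learning, and extinction events are all left out.

```python
from __future__ import annotations


def fitness(x: int) -> float:
    """Hypothetical deceptive landscape on the integers 0..20."""
    if x <= 5:
        return float(x)              # slope up to a local optimum at x = 5
    if x < 15:
        return 5.0 - 0.5 * (x - 5)   # valley: fitness keeps falling
    return 4.0 * (x - 15)            # far slope up to the global optimum at x = 20


def neighbours(x: int) -> list[int]:
    """Mutations of x: one step left or right, kept inside 0..20."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]


def hill_climb(start: int = 0) -> int:
    """Greedy search: move only while a neighbour strictly improves fitness."""
    x = start
    while True:
        best = max(neighbours(x), key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best


def grid_qd(start: int = 0, iterations: int = 25) -> dict[int, float]:
    """Grid archive with one cell per feature value (here the feature is x)."""
    archive = {start: fitness(start)}
    for _ in range(iterations):
        for parent in list(archive):          # every elite may reproduce
            for child in neighbours(parent):
                cell = child                  # hand-picked feature, an assumption
                if cell not in archive or fitness(child) > archive[cell]:
                    archive[cell] = fitness(child)
    return archive


if __name__ == "__main__":
    archive = grid_qd()
    print("hill climber stops at x =", hill_climb())
    print("best archive cell is x =", max(archive, key=archive.get))
```

The greedy climber halts at the local optimum because every neighbour is worse, while the archive keeps the low-fitness valley solutions and eventually mutates them across to the global optimum. Choosing `x` as the feature is exactly the kind of hand-crafted decision that the abstract says unsupervised QD aims to avoid.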

arXiv information

Authors	Lisa Coiffard, Paul Templier, Antoine Cully
Published	2025-04-04 15:03:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
