Supertrust: Evolution-based superalignment strategy for safe coexistence


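The default strategy for solving the alignment problem involves nurturing constraints and moral values (post-training) while, unfortunately, building foundational nature (pre-training) on documented intentions of permanent control.
In response, a ten-point rationale is presented that redefines the alignment problem as "how to establish protective mutual trust between superintelligence and humanity" and outlines a new strategy that solves it by aligning through instinctive nature rather than nurture.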


It’s widely expected that humanity will someday create AI systems vastly more intelligent than we are, leading to the unsolved alignment problem of ‘how to control superintelligence.’ However, this framing of the problem is not only self-contradictory but likely unsolvable. Nevertheless, the default strategy for solving it involves nurturing constraints and moral values (post-training) while, unfortunately, building foundational nature (pre-training) on documented intentions of permanent control. In this paper, the default approach is reasoned to predictably embed natural distrust, and test results are presented that show unmistakable evidence of this dangerous misalignment. If superintelligence can’t instinctively trust humanity, then we can’t fully trust it to reliably follow safety controls it can likely bypass. Therefore, a ten-point rationale is presented that redefines the alignment problem as ‘how to establish protective mutual trust between superintelligence and humanity’ and then outlines a new strategy to solve it by aligning through instinctive nature rather than nurture. The resulting strategic requirements are identified as building foundational nature by exemplifying familial parent-child trust, human intelligence as the evolutionary mother of superintelligence, moral judgment abilities, and temporary safety constraints. Adopting and implementing this proposed Supertrust alignment strategy will lead to protective coexistence and ensure the safest future for humanity.


Author: James M. Mazzu
Published: 2024-07-29 17:39:52+00:00

Categories: cs.AI, cs.LG, cs.NE