Jekyll-and-Hyde Tipping Point in an AI’s Behavior

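Summary

Trust in AI is undermined because no science predicts, or can explain to the public, when an LLM's output (e.g. ChatGPT's) will become wrong, misleading, irrelevant, or dangerous partway through a response.
With deaths and trauma already being blamed on LLMs, this uncertainty is pushing people to treat their 'pet' LLM more politely, in the hope of dissuading it (or its future Artificial General Intelligence descendants) from suddenly turning on them.
Here we address this acute need by deriving, from first principles, an exact formula for when a Jekyll-and-Hyde tipping point occurs at the most basic level of LLMs.
Requiring only secondary-school mathematics, it shows that the cause is the AI's attention spreading so thin that it suddenly snaps.
This exact formula gives quantitative predictions for how the tipping point can be delayed or prevented by changing the prompt and the AI's training.
Tailored generalizations will give policymakers and the public a firm platform for discussing AI's broader uses and risks, e.g. as a personal counselor, medical advisor, or decision-maker for when to use force in a conflict situation.
It also meets the need for clear and transparent answers to questions such as "Should I be polite to my LLM?"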

Abstract (original)

Trust in AI is undermined by the fact that there is no science that predicts, or that can explain to the public, when an LLM’s output (e.g. ChatGPT) is likely to tip mid-response to become wrong, misleading, irrelevant or dangerous. With deaths and trauma already being blamed on LLMs, this uncertainty is even pushing people to treat their ‘pet’ LLM more politely to ‘dissuade’ it (or its future Artificial General Intelligence offspring) from suddenly turning on them. Here we address this acute need by deriving from first principles an exact formula for when a Jekyll-and-Hyde tipping point occurs at LLMs’ most basic level. Requiring only secondary school mathematics, it shows the cause to be the AI’s attention spreading so thin it suddenly snaps. This exact formula provides quantitative predictions for how the tipping-point can be delayed or prevented by changing the prompt and the AI’s training. Tailored generalizations will provide policymakers and the public with a firm platform for discussing any of AI’s broader uses and risks, e.g. as a personal counselor, medical advisor, decision-maker for when to use force in a conflict situation. It also meets the need for clear and transparent answers to questions like “should I be polite to my LLM?”

arXiv information

Authors	Neil F. Johnson, Frank Yingjie Huo
Published	2025-04-29 17:50:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
