We Can’t Understand AI Using our Existing Vocabulary

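Summary

This position paper argues that, to understand AI, we cannot rely on our existing vocabulary of human words.
Instead, we should strive to develop neologisms: new words that stand for precise human concepts we want to teach machines, or for machine concepts we need to learn.
We start from the premise that humans and machines have differing concepts.
This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and to communicate human concepts to machines.
We believe that creating a shared human-machine language by developing neologisms could solve this communication problem.
Successful neologisms achieve a useful level of abstraction: not too detailed, so they are reusable in many contexts, and not too high-level, so they convey precise information.
As a proof of concept, we demonstrate how a "length neologism" enables control over LLM response length, while a "diversity neologism" allows sampling of more variable responses.
Taken together, we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities to better control and understand machines.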

Summary (original)

This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they’re reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a ‘length neologism’ enables controlling LLM response length, while a ‘diversity neologism’ allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.

arXiv information

Authors	John Hewitt, Robert Geirhos, Been Kim
Published	2025-02-11 14:34:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
