Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

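Summary

We design a suite of minimal algorithmic tasks that loosely abstract open-ended real-world tasks.
This lets us cleanly and controllably quantify the creative limits of present-day language models.
Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or (b) constructs new patterns (as in designing math problems or new proteins).
On these tasks, we argue empirically and conceptually that next-token learning is myopic and memorizes excessively.
By comparison, multi-token approaches, namely teacherless training and diffusion models, excel at producing diverse and original output.
Second, on our tasks, we find that to elicit randomness from the Transformer without hurting coherence, it is better to inject noise at the input layer (via a method we dub hash-conditioning) than to defer to temperature sampling from the output layer.
Our work thus offers a principled, minimal test-bed for analyzing open-ended creative skills, and new arguments for going beyond next-token learning and softmax-based sampling.
We make part of the code available at https://github.com/chenwu98/algorithmic-creativity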

Summary (original)

We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (like in wordplay, drawing analogies, or research) or (b) constructs new patterns (like in designing math problems or new proteins). In these tasks, we empirically and conceptually argue how next-token learning is myopic and memorizes excessively; comparatively, multi-token approaches, namely teacherless training and diffusion models, excel in producing diverse and original output. Secondly, in our tasks, we find that to elicit randomness from the Transformer without hurting coherence, it is better to inject noise right at the input layer (via a method we dub hash-conditioning) rather than defer to temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and softmax-based sampling. We make part of the code available under https://github.com/chenwu98/algorithmic-creativity
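The abstract contrasts two places where randomness can enter generation: at the output layer (temperature sampling over the softmax) and at the input layer (hash-conditioning). The sketch below is only a rough illustration of that contrast, not the authors' implementation. The temperature-scaled softmax is the standard formulation; `random_prefix` is a hypothetical stand-in for input-side noise, and its name, the prefix length, and the hex alphabet are all assumptions, since this page does not describe how hash-conditioning is actually carried out.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Output-side randomness: draw one token index from the tempered softmax."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def random_prefix(rng, length=8, alphabet="0123456789abcdef"):
    """Input-side randomness (hypothetical): a random string to prepend to a prompt.

    This is an assumed illustration of injecting noise at the input;
    the paper's hash-conditioning procedure may differ.
    """
    return "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 3.0]
    # Lower temperature concentrates mass on the largest logit.
    print(softmax_with_temperature(logits, 0.5))
    print(softmax_with_temperature(logits, 2.0))
    print(sample_token(logits, 1.0, rng))
    # A random prefix that a model could be conditioned on.
    print(random_prefix(rng) + " " + "<prompt>")
```

In the output-side case the model's forward pass is deterministic and all variation comes from the final draw; in the input-side case the variation is fixed before the forward pass, so decoding itself could in principle be greedy.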

arXiv information

Authors	Vaishnavh Nagarajan, Chen Henry Wu, Charles Ding, Aditi Raghunathan
Published	2025-04-21 17:47:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
