LLMs as Method Actors: A Model for Prompt Engineering and Architecture

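Summary

We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture.
Under this mental model, LLMs should be thought of as actors, prompts as scripts and cues, and LLM responses as performances.
We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning.
Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla approach and a "Chain of Thoughts" approach.
A vanilla approach solves 27% of the Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41%, whereas our strongest "Method Actor" approach solves 86%.
We also test o1-preview, OpenAI's newest model designed specifically for complex reasoning tasks.
When asked to solve a puzzle all at once, o1-preview solves 79% of the Connections puzzles in our dataset. When allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles.
Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.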

Abstract (original)

We introduce ‘Method Actors’ as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a ‘Method Actors’ approach can significantly improve LLM performance over both a vanilla and ‘Chain of Thoughts’ approach. A vanilla approach solves 27% of Connections puzzles in our dataset and a ‘Chain of Thoughts’ approach solves 41% of puzzles, whereas our strongest ‘Method Actor’ approach solves 86% of puzzles. We also test OpenAI’s newest model designed specifically for complex reasoning tasks, o1-preview. When asked to solve a puzzle all at once, o1-preview solves 79% of Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a ‘Method Actor’ prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.
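The abstract mentions letting a model "build puzzle solutions one guess at a time over multiple API calls". The sketch below is only an illustration of what such a loop might look like, not the paper's implementation. The function `propose_group` is a hypothetical stand-in for a single LLM API call, `make_scripted_proposer` is a toy replacement used so the example runs offline, and the four-mistake limit is an assumption taken from the rules of the New York Times game rather than from the paper's scoring.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Group = FrozenSet[str]
# Hypothetical interface: one call = one model request that returns one guess.
Proposer = Callable[[List[str], Set[Group]], Iterable[str]]


def solve_one_guess_at_a_time(
    words: Iterable[str],
    solution: Iterable[Group],
    propose_group: Proposer,
    max_mistakes: int = 4,  # assumption: mirrors the NYT game's mistake limit
) -> Tuple[List[Group], int]:
    """Ask for one 4-word group per call until solved or out of mistakes."""
    solution_set = set(solution)
    remaining = set(words)
    rejected: Set[Group] = set()
    found: List[Group] = []
    mistakes = 0

    while remaining and mistakes < max_mistakes:
        guess = frozenset(propose_group(sorted(remaining), rejected))
        # A malformed guess (wrong size or unknown words) counts as a mistake.
        if len(guess) != 4 or not guess <= remaining:
            mistakes += 1
            continue
        if guess in solution_set:
            found.append(guess)
            remaining -= guess  # solved words leave the board
        else:
            rejected.add(guess)  # fed back so the proposer can avoid repeats
            mistakes += 1

    return found, mistakes


def make_scripted_proposer(guesses: List[Tuple[str, ...]]) -> Proposer:
    """Toy proposer that replays a fixed list of guesses (no model involved)."""
    script = iter(guesses)

    def propose(remaining: List[str], rejected: Set[Group]) -> Iterable[str]:
        return next(script)

    return propose


if __name__ == "__main__":
    groups = [frozenset(f"{p}{i}" for i in range(1, 5)) for p in "abcd"]
    all_words = [w for g in groups for w in sorted(g)]
    proposer = make_scripted_proposer(
        [("a1", "a2", "a3", "b1")] + [tuple(sorted(g)) for g in groups]
    )
    found, mistakes = solve_one_guess_at_a_time(all_words, groups, proposer)
    print(len(found), "groups found with", mistakes, "mistake(s)")
```

In a real experiment, `propose_group` would wrap a prompt and a model call, and the set of rejected guesses would be included in the prompt so that the model does not repeat itself.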

arXiv information

Author	Colin Doyle
Published	2024-11-08 18:45:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
