Tryage: Real-time, Intelligent Routing of User Prompts to Large Language Models

Summary

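The transformer architecture and the self-attention mechanism have led to an explosion of language models trained for specific downstream tasks and data domains.
With more than 200,000 models in the Hugging Face ecosystem, users struggle to select and optimize models that fit multifaceted workflows and data domains while also addressing compute, security, and recency concerns.
Machine learning frameworks are urgently needed that relieve users of the burden of model selection and customization and make the power of the large, emerging model library available to end users.
Here we propose Tryage, a context-aware routing system that uses a language model router to select the best expert model from a model library based on an analysis of each individual input prompt.
Inspired by the thalamic router in the brain, Tryage uses a perceptive router to predict downstream model performance on a prompt, then makes a routing decision with an objective function that combines these performance predictions with user goals and constraints supplied through flags (e.g., model size, model recency).
Tryage lets users explore a Pareto front and automatically trade off task accuracy against secondary goals such as minimizing model size, recency, security, verbosity, and readability.
Across heterogeneous datasets that include code, text, clinical data, and patents, Tryage outperforms Gorilla and GPT-3.5 Turbo at dynamic model selection, identifying the optimal model with 50.9% accuracy, compared with 23.6% for GPT-3.5 Turbo and 10.8% for Gorilla.
Conceptually, Tryage shows how routing models can be used to program and control the behavior of multi-model LLM systems so that the expanding, evolving language model ecosystem is used as efficiently as possible.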
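
The routing decision described above (predicted performance combined with a user-supplied constraint) can be sketched as a weighted objective. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` class, the model names, the loss and size values, and the `size_weight` flag are all hypothetical, and a fixed lookup table stands in for the learned router that would predict per-prompt performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model identifier
    predicted_loss: float  # assumed router prediction for this prompt (lower is better)
    size_b: float          # hypothetical parameter count, in billions


def route(candidates, size_weight=0.0):
    """Return the candidate minimizing predicted loss plus a weighted size penalty."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return min(candidates, key=lambda c: c.predicted_loss + size_weight * c.size_b)


# Hypothetical model library with made-up predictions for a single prompt.
library = [
    Candidate("small-general", predicted_loss=2.9, size_b=0.1),
    Candidate("code-expert", predicted_loss=2.1, size_b=1.3),
    Candidate("large-general", predicted_loss=2.0, size_b=7.0),
]

accuracy_first = route(library)               # no size penalty
size_aware = route(library, size_weight=0.1)  # penalize large models
print(accuracy_first.name, size_aware.name)
```

With no penalty the lowest predicted loss wins; raising `size_weight` shifts the choice toward smaller models. Other constraints named in the summary, such as recency, could be added as further weighted terms in the same way.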

Summary (original)

The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive router to predict downstream model performance on prompts and then makes a routing decision using an objective function that integrates performance predictions with user goals and constraints that are incorporated through flags (e.g., model size, model recency). Tryage allows users to explore a Pareto front and automatically trade off between task accuracy and secondary goals including minimization of model size, recency, security, verbosity, and readability. Across heterogeneous data sets that include code, text, clinical data, and patents, the Tryage framework surpasses Gorilla and GPT-3.5 Turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT-3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how routing models can be applied to program and control the behavior of multi-model LLM systems to maximize efficient use of the expanding and evolving language model ecosystem.
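
The Pareto front mentioned in the abstract can be illustrated with two objectives. The sketch below is only an assumption-laden example, not the paper's procedure: the function `pareto_front`, the model names, and the (predicted loss, size) values are hypothetical, and only two objectives are considered, both to be minimized.

```python
def pareto_front(models):
    """Return names of models that no other model beats on both loss and size."""
    front = []
    for name, loss, size in models:
        # A model is dominated if another is no worse on both objectives
        # and strictly better on at least one.
        dominated = any(
            other_loss <= loss and other_size <= size
            and (other_loss < loss or other_size < size)
            for _, other_loss, other_size in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (name, predicted loss, size in billions of parameters) triples.
models = [
    ("small-general", 2.9, 0.1),
    ("code-expert", 2.1, 1.3),
    ("mid-general", 2.5, 3.0),   # worse than code-expert on both objectives
    ("large-general", 2.0, 7.0),
]

front = pareto_front(models)
print(front)
```

Every model on the front represents a different trade-off between predicted accuracy and size; a weight on the secondary goal then selects one point along it.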

arXiv information

Authors	Surya Narayanan Hari, Matt Thomson
Published	2023-08-22 17:48:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
