Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles

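Summary

Long-context processing with large language models (LLMs) remains challenging because of implementation complexity, training efficiency, and data sparsity.
To address this, a new paradigm called Online Long-context Processing (OLP) is proposed for handling documents of unlimited length, a situation that typically arises when receiving and organizing information from streaming media such as automated news reporting, live e-commerce, and viral short videos.
In addition, with the number of available LLMs growing explosively, choosing the most suitable one while balancing strong performance, affordable pricing, and short response latency is often a dilemma.
With this in mind, Role Reinforcement Learning (Role-RL) is also developed to automatically deploy different LLMs in their respective roles within the OLP pipeline according to their actual performance.
Extensive experiments on the OLP-MINI dataset show that OLP with the Role-RL framework achieves an average recall rate of 93.2% on the OLP benchmark while reducing LLM cost by 79.4%.
The code and dataset are publicly available at https://anonymous.4open.science/r/Role-RL.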

Abstract (original)

Large language models (LLMs) with long-context processing are still challenging because of their implementation complexity, training efficiency and data sparsity. To address this issue, a new paradigm named Online Long-context Processing (OLP) is proposed when we process a document of unlimited length, which typically occurs in the information reception and organization of diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. Moreover, a dilemma was often encountered when we tried to select the most suitable LLM from a large number of LLMs amidst explosive growth aiming for outstanding performance, affordable prices, and short response delays. In view of this, we also develop Role Reinforcement Learning (Role-RL) to automatically deploy different LLMs in their respective roles within the OLP pipeline according to their actual performance. Extensive experiments are conducted on our OLP-MINI dataset and it is found that OLP with Role-RL framework achieves OLP benchmark with an average recall rate of 93.2% and the LLM cost saved by 79.4%. The code and dataset are publicly available at: https://anonymous.4open.science/r/Role-RL.

arXiv information

Authors	Lewei He, Tianyu Shi, Pengran Huang, Bingzhi Chen, Qianglong Chen, Jiahui Pan
Published	2024-09-26 16:22:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
