Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention

Summary

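Many-shot in-context learning (ICL) has recently shown promise as an alternative to finetuning, with the major advantage that the same model can be served for multiple tasks.
However, it shifts the computational burden from training time to inference time, which makes deploying many-shot ICL hard to justify in practice.
This cost grows further when a custom demonstration set is retrieved for each inference example.
We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning.
By combining carefully designed block-sparse attention with retrieval of cached groups of demonstrations, we achieve per-example latency comparable to finetuning while maintaining, on average, more than 95% of the best method's accuracy across strong ICL and finetuning baselines.
We hope this will further enable the deployment of many-shot ICL at scale.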
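
To make the idea of block-sparse attention over cached demonstration groups more concrete, here is a minimal sketch of one possible attention mask. It is not the authors' implementation: the abstract does not specify the sparsity pattern, so the layout below (groups placed back to back, each group attending only to itself, the query attending to the retrieved groups) and the names `block_sparse_mask`, `group_sizes`, `retrieved`, and `density` are all assumptions made for illustration.

```python
from typing import List


def block_sparse_mask(group_sizes: List[int], query_len: int,
                      retrieved: List[int]) -> List[List[bool]]:
    """Build a boolean attention mask (True = row token may attend to column token).

    Assumed layout (hypothetical, not taken from the paper): demonstration
    groups are laid out back to back, followed by the query tokens.
    """
    # Start offset of every demonstration group in the flat token sequence.
    starts = []
    offset = 0
    for size in group_sizes:
        starts.append(offset)
        offset += size
    total = offset + query_len
    mask = [[False] * total for _ in range(total)]

    # Each group attends causally within itself only, so its key/value
    # states do not depend on other groups and could be computed once
    # and cached.
    for start, size in zip(starts, group_sizes):
        for i in range(start, start + size):
            for j in range(start, i + 1):
                mask[i][j] = True

    # Query tokens attend to every token of the retrieved groups and
    # causally to the query itself.
    for i in range(offset, total):
        for g in retrieved:
            for j in range(starts[g], starts[g] + group_sizes[g]):
                mask[i][j] = True
        for j in range(offset, i + 1):
            mask[i][j] = True
    return mask


def density(mask: List[List[bool]]) -> float:
    """Fraction of allowed (row, column) pairs in a square mask."""
    return sum(cell for row in mask for cell in row) / (len(mask) ** 2)


if __name__ == "__main__":
    # Three cached groups of two tokens each; the query retrieves groups 0 and 2.
    m = block_sparse_mask(group_sizes=[2, 2, 2], query_len=1, retrieved=[0, 2])
    for row in m:
        print("".join("x" if cell else "." for cell in row))
    print(f"density = {density(m):.3f}")
```

Under these assumptions, changing `retrieved` only changes the query rows of the mask; the per-group blocks stay the same, which is what would let cached groups be reused across inference examples.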

Abstract (original)

Many-shot in-context learning has recently shown promise as an alternative to finetuning, with the major advantage that the same model can be served for multiple tasks. However, this shifts the computational burden from training-time to inference-time, making deployment of many-shot ICL challenging to justify in-practice. This cost is further increased if a custom demonstration set is retrieved for each inference example. We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning. By combining carefully designed block-sparse attention and retrieval of cached groups of demonstrations, we achieve comparable per-example latency to finetuning while maintaining on average >95% of the best method’s accuracy across strong ICL and finetuning baselines. We hope that this will further enable the deployment of many-shot ICL at scale.

arXiv information

Authors	Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
Published	2025-03-11 17:30:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
