Revisiting Cascaded Ensembles for Efficient Inference

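Summary

A common approach to making machine learning inference more efficient is to use example-specific adaptive schemes that route or select a model for each example at inference time.
In this work, we study a simple scheme for adaptive inference.
We build a cascade of ensembles (CoE) that starts with resource-efficient models and grows to larger, more expressive models, with ensemble agreement serving as the data-dependent routing criterion.
This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers: for example, efficient models can be served at the edge, with larger models in the cloud invoked only when necessary.
When parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and that it provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines.
These savings translate into a more than 3x reduction in total monetary cost when inference is run on a heterogeneous cluster of GPUs.
Finally, for edge inference scenarios in which part of the cascade resides at the edge rather than in the cloud, CoE can reduce communication cost and inference latency by 14x without sacrificing accuracy.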
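
The routing idea can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes agreement is measured as the fraction of ensemble members voting for the majority label, that an example exits at the first tier whose agreement reaches a hypothetical `threshold`, and that costs simply add up across the tiers visited. The names `agreement`, `cascade_predict`, `Tier`, and the toy models and costs are all invented for this example and say nothing about the paper's reported numbers.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical types: a "model" is any callable mapping an input to a class label.
Model = Callable[[float], int]
# A tier is (ensemble members, assumed cost of running that tier on one example).
Tier = Tuple[Sequence[Model], float]


def agreement(votes: Sequence[int]) -> Tuple[int, float]:
    """Return the majority label and the fraction of members that voted for it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def cascade_predict(x: float, tiers: Sequence[Tier],
                    threshold: float = 1.0) -> Tuple[int, float]:
    """Visit tiers from cheapest to most expensive.

    Exit at the first tier whose agreement reaches `threshold`; the last
    tier always answers. Returns (predicted label, total cost paid).
    """
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    cost = 0.0
    label = -1
    for index, (models, tier_cost) in enumerate(tiers):
        votes = [model(x) for model in models]
        cost += tier_cost  # assumption: costs accumulate across visited tiers
        label, score = agreement(votes)
        if score >= threshold or index == len(tiers) - 1:
            break
    return label, cost


# Toy tiers: three cheap threshold classifiers that disagree near 0.5,
# followed by one expensive model. Costs are made-up units.
small = [lambda x: int(x > 0.4), lambda x: int(x > 0.5), lambda x: int(x > 0.6)]
large = [lambda x: int(x > 0.5)]
tiers = [(small, 1.0), (large, 10.0)]

inputs = [0.1, 0.45, 0.9, 0.95]
results = [cascade_predict(x, tiers) for x in inputs]
average_cost = sum(cost for _, cost in results) / len(results)
```

In this toy setup only the input 0.45 splits the cheap ensemble and is deferred to the expensive tier, so most examples pay only the cheap tier's cost. Lowering `threshold` would defer fewer examples at the risk of accepting less reliable cheap predictions.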

Summary (original)

A common approach to making machine learning inference more efficient is to use example-specific adaptive schemes, which route or select models for each example at inference time. In this work we study a simple scheme for adaptive inference. We build a cascade of ensembles (CoE), beginning with resource-efficient models and growing to larger, more expressive models, where ensemble agreement serves as a data-dependent routing criterion. This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers: for instance, serving efficient models at the edge and invoking larger models in the cloud only when necessary. In cases where parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines. These savings translate to a more than 3x reduction in total monetary cost when performing inference using a heterogeneous cluster of GPUs. Finally, for edge inference scenarios where portions of the cascade reside at the edge vs. in the cloud, CoE can provide a 14x reduction in communication cost and inference latency without sacrificing accuracy.

arXiv information

Authors	Steven Kolawole, Don Dennis, Ameet Talwalkar, Virginia Smith
Published	2024-07-02 15:14:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
