Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

Abstract

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant must update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach to the Transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.
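The reuse idea can be illustrated with a toy sketch. This is not the authors' algorithm: it only shows, under simplifying assumptions, how snapping hidden vectors to a small codebook lets a layer skip work when a small input change leaves the discrete code unchanged. The names `quantize`, `IncrementalLayer`, and `expensive` are hypothetical, and the codebook is hand-picked rather than learned.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def quantize(x: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook vector nearest to x (squared L2)."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


class IncrementalLayer:
    """Toy per-position layer that recomputes only where the code changed."""

    def __init__(self, codebook: Sequence[Vector]) -> None:
        self.codebook = list(codebook)
        self.codes: List[int] = []      # cached discrete code per position
        self.outputs: List[float] = []  # cached output per position
        self.recomputed = 0             # counts calls to the costly step

    def expensive(self, code: int) -> float:
        # Stand-in for costly downstream computation (hypothetical).
        self.recomputed += 1
        return sum(v * v for v in self.codebook[code])

    def update(self, hidden: Sequence[Sequence[float]]) -> List[float]:
        new_codes = [quantize(h, self.codebook) for h in hidden]
        if len(new_codes) != len(self.codes):
            # Toy simplification: a length change triggers a full recompute.
            self.codes = new_codes
            self.outputs = [self.expensive(c) for c in new_codes]
            return list(self.outputs)
        for i, c in enumerate(new_codes):
            if c != self.codes[i]:  # only changed codes cost anything
                self.codes[i] = c
                self.outputs[i] = self.expensive(c)
        return list(self.outputs)


if __name__ == "__main__":
    layer = IncrementalLayer([(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
    layer.update([(0.1, 0.0), (0.9, 1.1), (0.0, 0.8)])    # first pass: all positions
    layer.update([(0.05, 0.02), (0.9, 1.1), (0.0, 0.8)])  # small noise: codes unchanged
    print("expensive calls so far:", layer.recomputed)
```

In this sketch the nearest-code lookup still runs at every position; only the stand-in downstream step is skipped. Insertions and deletions, which the toy handles by recomputing everything, are exactly the case a real incremental algorithm has to treat more carefully.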

arXiv information

Authors	Or Sharir, Anima Anandkumar
Published	2023-07-27 16:30:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
