BoTTA: Benchmarking on-device Test Time Adaptation

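Summary

The performance of deep learning models depends heavily on the test samples seen at runtime, and shifts away from the training data distribution can significantly reduce accuracy.
Test-time adaptation (TTA) addresses this by adapting the model during inference, without requiring labeled test data or access to the original training set.
Prior research has examined TTA from several perspectives, including algorithmic complexity, data and class distribution shifts, model architectures, and offline versus continuous learning, but the constraints specific to mobile and edge devices remain underexplored.
We propose BoTTA, a benchmark designed to evaluate TTA methods under the practical constraints of mobile and edge devices.
Our evaluation targets four key challenges caused by limited resources and usage conditions: (i) limited test samples, (ii) limited exposure to categories, (iii) diverse distribution shifts, and (iv) overlapping shifts within a sample.
We assess state-of-the-art TTA methods under these scenarios using benchmark datasets and report system-level metrics on a real testbed.
Furthermore, unlike prior work, we align with on-device requirements by advocating periodic adaptation instead of continuous inference-time adaptation.
Experiments reveal key insights: many recent TTA algorithms struggle with small datasets, fail to generalize to unseen categories, and depend on the diversity and complexity of distribution shifts.
BoTTA also reports device-specific resource use.
For example, SHOT improves accuracy by $2.25\times$ with $512$ adaptation samples, but uses $1.08\times$ the peak memory of the base model on a Raspberry Pi.
BoTTA offers actionable guidance for TTA in real-world, resource-constrained deployments.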
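
The contrast between periodic and continuous adaptation can be illustrated with a small sketch. This is not BoTTA's implementation or any specific TTA method: the class `PeriodicAdapter`, its `adapt_every` parameter, and the use of input normalization statistics as the only adaptable state are all hypothetical choices made for illustration. The idea shown is that unlabeled test samples are buffered during inference and the model is updated only once a full batch has been collected, rather than after every sample.

```python
from dataclasses import dataclass, field
from statistics import fmean, pstdev
from typing import List


@dataclass
class PeriodicAdapter:
    """Toy stand-in for a model whose only adaptable state is input normalization."""

    mean: float = 0.0        # statistics assumed to come from training data
    std: float = 1.0
    adapt_every: int = 512   # hypothetical adaptation period, in samples
    buffer: List[float] = field(default_factory=list)
    adaptations: int = 0

    def predict(self, x: float) -> float:
        # Inference uses the current statistics and never updates them.
        return (x - self.mean) / self.std

    def observe(self, x: float) -> float:
        # Predict first, store the unlabeled sample, adapt only when the buffer is full.
        y = self.predict(x)
        self.buffer.append(x)
        if len(self.buffer) >= self.adapt_every:
            self._adapt()
        return y

    def _adapt(self) -> None:
        # Re-estimate normalization statistics from the buffered test samples.
        self.mean = fmean(self.buffer)
        self.std = pstdev(self.buffer) or 1.0  # fall back to 1.0 if all samples are equal
        self.buffer.clear()
        self.adaptations += 1
```

Setting `adapt_every=1` in this sketch would reduce it to per-sample (continuous) adaptation, which makes the trade-off visible: a larger period means fewer update steps on the device, at the cost of holding a buffer of samples in memory.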

Abstract (original)

The performance of deep learning models depends heavily on test samples at runtime, and shifts from the training data distribution can significantly reduce accuracy. Test-time adaptation (TTA) addresses this by adapting models during inference without requiring labeled test data or access to the original training set. While research has explored TTA from various perspectives like algorithmic complexity, data and class distribution shifts, model architectures, and offline versus continuous learning, constraints specific to mobile and edge devices remain underexplored. We propose BoTTA, a benchmark designed to evaluate TTA methods under practical constraints on mobile and edge devices. Our evaluation targets four key challenges caused by limited resources and usage conditions: (i) limited test samples, (ii) limited exposure to categories, (iii) diverse distribution shifts, and (iv) overlapping shifts within a sample. We assess state-of-the-art TTA methods under these scenarios using benchmark datasets and report system-level metrics on a real testbed. Furthermore, unlike prior work, we align with on-device requirements by advocating periodic adaptation instead of continuous inference-time adaptation. Experiments reveal key insights: many recent TTA algorithms struggle with small datasets, fail to generalize to unseen categories, and depend on the diversity and complexity of distribution shifts. BoTTA also reports device-specific resource use. For example, while SHOT improves accuracy by $2.25\times$ with $512$ adaptation samples, it uses $1.08\times$ peak memory on Raspberry Pi versus the base model. BoTTA offers actionable guidance for TTA in real-world, resource-constrained deployments.

arXiv information

Authors	Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
Published	2025-04-16 13:16:19+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
