TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices

Abstract

Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Knowledge distilled from the supervised attention-based VQA model is used to train the memory-aware, compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor. When deployed on the tiny drone, the TinyVQA model achieved a latency of 56 ms and consumed 693 mW of power, showcasing its suitability for resource-constrained embedded systems.
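
The abstract names two compression steps, knowledge distillation and low bit-width quantization, without giving their exact form. The sketch below illustrates one common version of each, as an aid to reading rather than a reproduction of the paper's method: a Hinton-style distillation loss (soft-target KL divergence plus hard-label cross-entropy) and uniform symmetric weight quantization. The function names, the temperature, the weighting factor `alpha`, and the bit width are all assumptions chosen for illustration; none of them come from the abstract.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are illustrative defaults (assumptions),
    not values reported for TinyVQA.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened answer distributions
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Cross-entropy of the student against the ground-truth answer index
    ce = -log_softmax(student_logits)[label]
    # temperature**2 keeps the soft-term gradients on a comparable scale
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


def quantize_symmetric(weights, bits=8):
    """Uniform symmetric quantization to signed integer codes.

    Returns (codes, scale) such that weight is approximately code * scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]
```

In this sketch, a student whose logits match the teacher's incurs no soft-target penalty, and lowering `bits` trades reconstruction error for a smaller memory footprint, which is the general trade-off that low bit-width quantization exploits on devices such as the GAP8.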

arXiv information

Authors	Hasib-Al Rashid, Argho Sarkar, Aryya Gangopadhyay, Maryam Rahnemoonfar, Tinoosh Mohsenin
Published	2024-04-04 16:38:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
