Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

Summary

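Title: Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

Summary:

– Images taken by users with vision impairments are often noisy, which leads image captioning models to make incorrect and even unsafe predictions.
– The paper proposes a quality-agnostic framework that improves the performance and robustness of image captioning models for visually impaired people.
– It approaches the problem from three angles: data, model, and evaluation.
– First, it shows how data augmentation techniques that generate synthetic noise can address data sparsity in this domain.
– Second, it improves robustness by extending a state-of-the-art model to a dual network architecture, trained on the augmented data with several consistency losses.
– Finally, it evaluates prediction reliability using confidence calibration on images of varying difficulty and noise levels, showing that the models perform more reliably in safety-critical situations.
– The improved model is part of an assisted living application developed in partnership with the Royal National Institute of Blind People.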
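The data and model points above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which noise types or which consistency losses the paper uses, so Gaussian pixel noise and a symmetric KL divergence are assumptions chosen for illustration, and the function names `add_gaussian_noise` and `symmetric_kl_consistency` are hypothetical. The idea shown is that a clean image and its synthetically degraded copy pass through two branches, and a consistency term penalizes disagreement between their predicted word distributions.

```python
import math
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image given as rows of floats in [0, 1].

    Gaussian noise is an assumed stand-in for the synthetic noise in the paper.
    Values are clamped back into [0, 1] after the noise is added.
    """
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl_consistency(clean_logits, noisy_logits):
    """Assumed consistency loss: average of KL in both directions.

    clean_logits: scores from the branch that saw the original image.
    noisy_logits: scores from the branch that saw the augmented image.
    """
    p = softmax(clean_logits)
    q = softmax(noisy_logits)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

In training, a term like this would typically be added to the usual captioning loss with a weighting factor, so that the model is rewarded both for correct captions and for giving the same answer regardless of image quality. The weighting and the exact combination are not given in the summary.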

Summary (original)

Automated image captioning has the potential to be a useful tool for people with vision impairments. Images taken by this user group are often noisy, which leads to incorrect and even unsafe model predictions. In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people. We address this problem from three angles: data, model, and evaluation. First, we show how data augmentation techniques for generating synthetic noise can address data sparsity in this domain. Second, we enhance the robustness of the model by expanding a state-of-the-art model to a dual network architecture, using the augmented data and leveraging different consistency losses. Our results demonstrate increased performance, e.g. an absolute improvement of 2.15 on CIDEr, compared to state-of-the-art image captioning networks, as well as increased robustness to noise with up to 3 points improvement on CIDEr in more noisy settings. Finally, we evaluate the prediction reliability using confidence calibration on images with different difficulty/noise levels, showing that our models perform more reliably in safety-critical situations. The improved model is part of an assisted living application, which we develop in partnership with the Royal National Institute of Blind People.
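The abstract does not say which calibration measure is used. One common way to quantify confidence calibration is the expected calibration error (ECE), sketched below as an assumption rather than as the paper's metric. Predictions are grouped into confidence bins, and the gap between average confidence and observed accuracy in each bin is averaged, weighted by bin size. What counts as a "correct" prediction for a caption (for example, a per-token match) is also an assumption left to the caller.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Compute ECE for predictions with confidences in [0, 1].

    confidences: model confidence for each prediction.
    correct: booleans, True where the prediction was right.
    n_bins: number of equal-width bins over [0, 1].
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all predictions.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means stated confidence tracks actual accuracy more closely. Computing it separately for subsets of images at different noise or difficulty levels would give a per-level reliability picture of the kind the abstract describes.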

arXiv information

Authors	Lu Yu, Malvina Nikandrou, Jiali Jin, Verena Rieser
Published	2023-05-01 07:35:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
