Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory

Summary

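Human-object interaction (HOI) detection aims to localize humans and objects and to infer the relationships between them.
Training a supervised model for this task from scratch is arguably difficult for two reasons: performance drops on rare classes, and handling the long-tailed distribution of HOIs in complex, realistic scenes demands considerable computation and time.
This observation motivates us to design an HOI detector that can be trained even on long-tailed labeled data and can leverage existing knowledge from pre-trained models.
Inspired by the strong generalization ability of large vision-language models (VLMs) on classification and retrieval tasks, we propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes.
In the first mode, it can be tuned in a training-free paradigm, without learning any new parameters.
The second mode incorporates an instance-aware adapter mechanism that further and efficiently improves performance when updating a lightweight set of parameters is affordable.
Our proposed method achieves results competitive with the state of the art on the HICO-DET and V-COCO datasets while requiring much less training time.
Code is available at https://github.com/ltttpku/ADA-CM.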
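
The abstract does not say how the concept-guided memory is built or queried, so the following is only a minimal sketch of what a generic training-free, memory-based classifier can look like. It assumes a cache-style design: features of labeled examples are stored as keys, their one-hot labels as values, and a query is classified by similarity-weighted voting. All names (`build_memory`, `memory_logits`, `predict`, `sharpness`) and the toy 2-D features are hypothetical and are not taken from the ADA-CM code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_memory(features, labels, num_classes):
    """Store normalized features as keys and one-hot labels as values.

    No parameters are learned here: the memory is just the labeled data.
    """
    keys = [l2_normalize(f) for f in features]
    values = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    return keys, values


def memory_logits(query, keys, values, sharpness=5.0):
    """Aggregate label votes, weighted by cosine similarity to each key.

    `sharpness` is an assumed hyperparameter controlling how quickly the
    weight decays as similarity drops.
    """
    q = l2_normalize(query)
    logits = [0.0] * len(values[0])
    for key, value in zip(keys, values):
        similarity = sum(a * b for a, b in zip(q, key))
        weight = math.exp(-sharpness * (1.0 - similarity))
        for c, v in enumerate(value):
            logits[c] += weight * v
    return logits


def predict(query, keys, values):
    """Return the index of the class with the highest aggregated vote."""
    logits = memory_logits(query, keys, values)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy usage: three stored examples, two interaction classes.
keys, values = build_memory(
    features=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    labels=[0, 0, 1],
    num_classes=2,
)
print(predict([0.8, 0.2], keys, values))
print(predict([0.1, 0.9], keys, values))
```

In a real system the features would presumably come from a pre-trained vision-language encoder rather than hand-written vectors; that choice, like the weighting function, is an assumption of this sketch.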

Abstract (original)

Human Object Interaction (HOI) detection aims to localize and infer the relationships between a human and an object. Arguably, training supervised models for this task from scratch presents challenges due to the performance drop over rare classes and the high computational cost and time required to handle long-tailed distributions of HOIs in complex HOI scenes in realistic settings. This observation motivates us to design an HOI detector that can be trained even with long-tailed labeled data and can leverage existing knowledge from pre-trained models. Inspired by the powerful generalization ability of the large Vision-Language Models (VLM) on classification and retrieval tasks, we propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM). ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm. Its second mode incorporates an instance-aware adapter mechanism that can further efficiently boost performance if updating a lightweight set of parameters can be afforded. Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time. Code can be found at https://github.com/ltttpku/ADA-CM.

arXiv information

Authors	Ting Lei, Fabian Caba, Qingchao Chen, Hailin Jin, Yuxin Peng, Yang Liu
Published	2023-09-07 13:10:06+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
