Initial Findings on Sensor based Open Vocabulary Activity Recognition via Text Embedding Inversion

Summary

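Conventional human activity recognition (HAR) relies on classifiers trained to predict discrete activity classes, which inherently limits recognition to activities explicitly present in the training set.
When such a classifier encounters an unseen activity, it invariably fails, assigning that activity zero probability.
We propose Open Vocabulary HAR (OV-HAR), a framework that overcomes this limitation by first converting each activity into natural language and breaking it down into a sequence of elementary motions.
This descriptive text is then encoded into a fixed-size embedding.
The model is trained to regress this embedding, which is subsequently decoded back into natural language by a pre-trained embedding inversion model.
Unlike other works that rely on auto-regressive large language models (LLMs) at their core, OV-HAR achieves open vocabulary recognition without the computational overhead of such models.
The generated text can be mapped to a single activity class using LLM prompt engineering.
We evaluate the approach on several modalities, including vision (pose), IMU, and pressure sensors, and demonstrate robust generalization across unseen activities and modalities, offering a fundamentally different paradigm from contemporary classifiers.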

Abstract (original)

Conventional human activity recognition (HAR) relies on classifiers trained to predict discrete activity classes, inherently limiting recognition to activities explicitly present in the training set. Such classifiers would invariably fail, putting zero likelihood, when encountering unseen activities. We propose Open Vocabulary HAR (OV-HAR), a framework that overcomes this limitation by first converting each activity into natural language and breaking it into a sequence of elementary motions. This descriptive text is then encoded into a fixed-size embedding. The model is trained to regress this embedding, which is subsequently decoded back into natural language using a pre-trained embedding inversion model. Unlike other works that rely on auto-regressive large language models (LLMs) at their core, OV-HAR achieves open vocabulary recognition without the computational overhead of such models. The generated text can be transformed into a single activity class using LLM prompt engineering. We have evaluated our approach on different modalities, including vision (pose), IMU, and pressure sensors, demonstrating robust generalization across unseen activities and modalities, offering a fundamentally different paradigm from contemporary classifiers.
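
The pipeline outlined in the abstract (describe each activity as text, encode the text as a fixed-size embedding, regress that embedding from sensor data, decode the prediction back to text) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the descriptions and sensor features are invented, `embed` is a bag-of-words stand-in for a real text encoder, the regressor is a plain linear map trained with SGD, and `decode_nearest` replaces the pre-trained embedding inversion model with a nearest-neighbour lookup over known descriptions. A real inversion model generates free text, which is what makes the approach open vocabulary; the lookup below does not.

```python
import math

# Hypothetical toy data: three activities described as elementary motions,
# each paired with an invented 3-dimensional "sensor feature" vector.
DESCRIPTIONS = [
    "raise both arms then lower them",
    "step forward with alternating legs",
    "bend both knees and sit down",
]
FEATURES = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]


def build_vocab(texts):
    """Map every distinct word to an index (stand-in for a real text encoder)."""
    words = sorted({w for t in texts for w in t.split()})
    return {w: i for i, w in enumerate(words)}


def embed(text, vocab):
    """Toy fixed-size text embedding: L2-normalised bag of words."""
    vec = [0.0] * len(vocab)
    for w in text.split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(weights, x):
    """Linear map from sensor features to the embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_regressor(features, targets, lr=0.5, epochs=200):
    """Fit the linear map with per-sample SGD on squared error."""
    dim_out, dim_in = len(targets[0]), len(features[0])
    weights = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, t in zip(features, targets):
            pred = predict(weights, x)
            for i in range(dim_out):
                err = pred[i] - t[i]
                for j in range(dim_in):
                    weights[i][j] -= lr * err * x[j]
    return weights


def decode_nearest(embedding, texts, vocab):
    """Stand-in for embedding inversion: pick the closest known description."""
    return max(texts, key=lambda t: cosine(embedding, embed(t, vocab)))


vocab = build_vocab(DESCRIPTIONS)
targets = [embed(t, vocab) for t in DESCRIPTIONS]
weights = train_regressor(FEATURES, targets)
print(decode_nearest(predict(weights, FEATURES[1]), DESCRIPTIONS, vocab))
```

The point of the sketch is the training target: the model outputs a continuous embedding rather than class scores, so the set of recognisable activities is bounded by what the decoder can express, not by a fixed label list.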

arXiv information

Authors	Lala Shakti Swarup Ray, Bo Zhou, Sungho Suh, Paul Lukowicz
Published	2025-01-13 15:24:10+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
