PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition

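Summary

Human action recognition (HAR) has achieved impressive results with deep learning models, but the models' decision-making remains opaque because of their black-box nature.
Ensuring interpretability is important, especially for real-world applications that require transparency and accountability.
Existing video XAI methods rely mainly on feature attribution or static textual concepts, and both struggle to capture the motion dynamics and temporal dependencies that are essential for understanding actions.
To address these challenges, the authors propose Pose Concept Bottleneck for Explainable Action Recognition (PCBEAR), a new concept bottleneck framework that introduces human pose sequences as motion-aware, structured concepts for video action recognition.
Unlike methods based on pixel-level features or static textual descriptions, PCBEAR uses human skeleton poses, which focus solely on body movement and provide robust, interpretable explanations of motion dynamics.
Two types of pose-based concepts are defined: static pose concepts, for spatial configurations in individual frames, and dynamic pose concepts, for motion patterns across multiple frames.
To construct these concepts, PCBEAR applies clustering to video pose sequences, so that meaningful concepts are discovered automatically without manual annotation.
PCBEAR is validated on KTH, Penn-Action, and HAA500, where it achieves high classification performance while offering interpretable, motion-driven explanations.
The method provides both strong predictive performance and human-understandable insight into the model's reasoning process, and it enables test-time interventions for debugging and improving model behavior.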
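The concept-discovery step described above (clustering pose sequences into static and dynamic concepts) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, and the function names, window length, joint count, cluster counts, and the random stand-in "poses" are all hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (N, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def static_descriptors(poses):
    """poses: (T, J, 2) keypoints -> (T, J*2), one vector per frame."""
    return poses.reshape(poses.shape[0], -1)

def dynamic_descriptors(poses, window=4):
    """Stack `window` consecutive frames -> (T - window + 1, window*J*2)."""
    flat = static_descriptors(poses)
    return np.stack([flat[t:t + window].ravel()
                     for t in range(len(flat) - window + 1)])

# Hypothetical stand-in data: 5 videos, 30 frames, 13 joints, (x, y) each.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(30, 13, 2)) for _ in range(5)]

static_pts = np.concatenate([static_descriptors(v) for v in videos])
dynamic_pts = np.concatenate([dynamic_descriptors(v) for v in videos])

# Each centroid plays the role of one discovered pose concept.
static_concepts, static_ids = kmeans(static_pts, k=4)
dynamic_concepts, dynamic_ids = kmeans(dynamic_pts, k=3)
```

In this sketch a static concept is a cluster centre over single-frame poses, while a dynamic concept is a cluster centre over short windows of consecutive poses, so it also encodes how the pose changes over time.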

Summary (original)

Human action recognition (HAR) has achieved impressive results with deep learning models, but their decision-making process remains opaque due to their black-box nature. Ensuring interpretability is crucial, especially for real-world applications requiring transparency and accountability. Existing video XAI methods primarily rely on feature attribution or static textual concepts, both of which struggle to capture motion dynamics and temporal dependencies essential for action understanding. To address these challenges, we propose Pose Concept Bottleneck for Explainable Action Recognition (PCBEAR), a novel concept bottleneck framework that introduces human pose sequences as motion-aware, structured concepts for video action recognition. Unlike methods based on pixel-level features or static textual descriptions, PCBEAR leverages human skeleton poses, which focus solely on body movements, providing robust and interpretable explanations of motion dynamics. We define two types of pose-based concepts: static pose concepts for spatial configurations at individual frames, and dynamic pose concepts for motion patterns across multiple frames. To construct these concepts, PCBEAR applies clustering to video pose sequences, allowing for automatic discovery of meaningful concepts without manual annotation. We validate PCBEAR on KTH, Penn-Action, and HAA500, showing that it achieves high classification performance while offering interpretable, motion-driven explanations. Our method provides both strong predictive performance and human-understandable insights into the model’s reasoning process, enabling test-time interventions for debugging and improving model behavior.
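The test-time intervention mentioned at the end of the abstract can be pictured with a toy concept bottleneck. This is a hedged sketch only: it assumes a linear classifier on top of concept activations, which the abstract does not specify, and the weights, scores, and function names are invented for illustration.

```python
import numpy as np

def predict(concept_scores, weights, bias):
    """Linear classifier over concept activations; returns the class index."""
    logits = concept_scores @ weights + bias
    return int(np.argmax(logits))

def intervene(concept_scores, concept_index, new_value):
    """Return a copy of the activations with one concept overridden."""
    edited = concept_scores.copy()
    edited[concept_index] = new_value
    return edited

# Hypothetical toy setup: 3 pose concepts, 2 action classes.
weights = np.array([[2.0, 0.0],   # concept 0 supports class 0
                    [0.0, 2.0],   # concept 1 supports class 1
                    [0.5, 0.5]])  # concept 2 is uninformative
bias = np.zeros(2)

scores = np.array([0.9, 0.2, 0.5])       # concept 0 fires strongly
before = predict(scores, weights, bias)  # logits [2.05, 0.65] -> class 0

# A user judges concept 0 to be a false activation and switches it off.
corrected = intervene(scores, concept_index=0, new_value=0.0)
after = predict(corrected, weights, bias)  # logits [0.25, 0.65] -> class 1
```

Because the prediction depends only on the concept activations, editing a single activation changes the output in a way that can be traced directly to that concept, which is what makes this kind of debugging possible.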

arXiv information

Authors	Jongseo Lee, Wooil Lee, Gyeong-Moon Park, Seong Tae Kim, Jinwoo Choi
Published	2025-04-17 17:50:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
