Telling Stories for Common Sense Zero-Shot Action Recognition

Abstract

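Video understanding has long been held back by its reliance on large labeled datasets, which has motivated research into zero-shot learning.
Recent advances in language modeling create opportunities to advance zero-shot video analysis, but building an effective semantic space that relates action classes remains difficult.
We address this problem by introducing Stories, a new dataset of rich textual descriptions for diverse action classes, extracted from WikiHow articles.
For each class, we extract a multi-sentence narrative detailing the necessary steps, scenes, objects, and verbs that characterize the action.
This contextual data makes it possible to model nuanced relationships between actions, paving the way for zero-shot transfer.
We also propose an approach that uses Stories to improve feature generation for training zero-shot classification.
Without any fine-tuning on the target datasets, our method achieves a new state of the art on multiple benchmarks, improving top-1 accuracy by up to 6.1%.
We believe Stories provides a valuable resource for driving progress in zero-shot action recognition.
The textual narratives build connections between seen and unseen classes, overcoming the labeled-data bottleneck that has long impeded progress in this area.
The data is available at: https://github.com/kini5gowda/Stories .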
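
To make the idea of a semantic space over class descriptions concrete, here is a minimal sketch of zero-shot classification by nearest description. It is not the authors' method: the abstract does not specify the text encoder or the feature-generation model, so this uses a toy bag-of-words embedding and cosine similarity, and the two stories, the query, and the function names (`embed`, `cosine`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query, class_stories):
    """Return the class whose story is closest to the query description."""
    q = embed(query)
    return max(class_stories, key=lambda name: cosine(q, embed(class_stories[name])))


# Hypothetical multi-sentence class descriptions, written for illustration only.
STORIES = {
    "making tea": "boil water in a kettle then pour water over the tea bag in a cup",
    "riding a bike": "sit on the saddle hold the handlebars and push the pedals to move",
}

# Hypothetical description of a clip from a class with no labeled training videos.
query = "a person holds the handlebars and pushes the pedals"
predicted = classify(query, STORIES)
print(predicted)
```

A one-word class name shares no tokens with the query here, whereas the longer description overlaps on objects such as "handlebars" and "pedals". A real system would replace the toy embedding with a learned language model.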

Abstract (original)

Video understanding has long suffered from reliance on large labeled datasets, motivating research into zero-shot learning. Recent progress in language modeling presents opportunities to advance zero-shot video analysis, but constructing an effective semantic space relating action classes remains challenging. We address this by introducing a novel dataset, Stories, which contains rich textual descriptions for diverse action classes extracted from WikiHow articles. For each class, we extract multi-sentence narratives detailing the necessary steps, scenes, objects, and verbs that characterize the action. This contextual data enables modeling of nuanced relationships between actions, paving the way for zero-shot transfer. We also propose an approach that harnesses Stories to improve feature generation for training zero-shot classification. Without any target dataset fine-tuning, our method achieves new state-of-the-art on multiple benchmarks, improving top-1 accuracy by up to 6.1%. We believe Stories provides a valuable resource that can catalyze progress in zero-shot action recognition. The textual narratives forge connections between seen and unseen classes, overcoming the bottleneck of labeled data that has long impeded advancements in this exciting domain. The data can be found here: https://github.com/kini5gowda/Stories .

arXiv information

Authors	Shreyank N Gowda, Laura Sevilla-Lara
Published	2024-10-23 16:25:17+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
