Action-Agnostic Point-Level Supervision for Temporal Action Detection

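Summary

We propose action-agnostic point-level (AAPL) supervision for temporal action detection, aiming at accurate detection of action instances from a lightly annotated dataset.
In the proposed scheme, a small portion of video frames is sampled in an unsupervised manner and presented to human annotators, who label each frame with an action category.
Unlike point-level supervision, which requires annotators to search for every action instance in an untrimmed video, AAPL supervision selects the frames to annotate without human intervention.
We also propose a detection model and a learning method that make effective use of AAPL labels.
Extensive experiments on a variety of datasets (THUMOS ’14, FineAction, GTEA, BEOID, and ActivityNet 1.3) demonstrate that the proposed approach is competitive with or outperforms prior methods for video-level and point-level supervision in terms of the trade-off between annotation cost and detection performance.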

Abstract (original)

We propose action-agnostic point-level (AAPL) supervision for temporal action detection to achieve accurate action instance detection with a lightly annotated dataset. In the proposed scheme, a small portion of video frames is sampled in an unsupervised manner and presented to human annotators, who then label the frames with action categories. Unlike point-level supervision, which requires annotators to search for every action instance in an untrimmed video, frames to annotate are selected without human intervention in AAPL supervision. We also propose a detection model and learning method to effectively utilize the AAPL labels. Extensive experiments on the variety of datasets (THUMOS ’14, FineAction, GTEA, BEOID, and ActivityNet 1.3) demonstrate that the proposed approach is competitive with or outperforms prior methods for video-level and point-level supervision in terms of the trade-off between the annotation cost and detection performance.
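
The abstract says that frames are sampled in an unsupervised manner but does not name a sampling strategy. As a rough illustration only, the Python sketch below shows two label-free ways to choose frames for annotation: regular-interval sampling and seeded random sampling. The function names, parameters, and both strategies are assumptions made for this example and are not taken from the paper.

```python
import random
from typing import List


def sample_regular(num_frames: int, interval: int, offset: int = 0) -> List[int]:
    """Hypothetical helper: pick every `interval`-th frame index, starting at `offset`.

    No action labels are consulted, so the choice is action-agnostic.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(offset, num_frames, interval))


def sample_random(num_frames: int, num_samples: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: pick `num_samples` distinct frame indices uniformly at random.

    A fixed seed keeps the selection reproducible across runs.
    """
    rng = random.Random(seed)
    k = min(num_samples, num_frames)  # cannot sample more frames than exist
    return sorted(rng.sample(range(num_frames), k))


if __name__ == "__main__":
    # A 10-frame toy video, sampled every 3 frames.
    print(sample_regular(10, 3))
    # Five random frames from a 100-frame toy video.
    print(sample_random(100, 5))
```

In either case the selected indices would be handed to annotators, who assign an action category to each frame. Nobody has to scan the whole video for action instances, which is where the abstract locates the annotation-cost saving.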

arXiv information

Authors	Shuhei M. Yoshida, Takashi Shibata, Makoto Terao, Takayuki Okatani, Masashi Sugiyama
Published	2024-12-30 18:59:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
