Sign Language Video Retrieval with Free-Form Textual Queries

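Summary

Systems that can efficiently search collections of sign language videos have drawn attention as a useful application of sign language technology.
However, searching videos with queries that go beyond individual keywords has received little attention in the literature.
To address this gap, this work introduces the task of sign language retrieval with free-form textual queries: given a written query (such as a sentence) and a large collection of sign language videos, the goal is to find the signing video in the collection that best matches the query.
The authors propose to tackle the task by learning cross-modal embeddings on How2Sign, a recently introduced large-scale dataset of American Sign Language (ASL).
They find that a key bottleneck in system performance is the quality of the sign video embedding, which suffers from a scarcity of labeled training data.
They therefore propose SPOT-ALIGN, a framework that interleaves iterative rounds of sign spotting and feature alignment to expand the scope and scale of the available training data.
They validate the effectiveness of SPOT-ALIGN for learning a robust sign video embedding through improvements in both sign recognition and the proposed video retrieval task.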

Abstract (original)

Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology. However, the problem of searching videos beyond individual keywords has received limited attention in the literature. To address this gap, in this work we introduce the task of sign language retrieval with free-form textual queries: given a written query (e.g., a sentence) and a large collection of sign language videos, the objective is to find the signing video in the collection that best matches the written query. We propose to tackle this task by learning cross-modal embeddings on the recently introduced large-scale How2Sign dataset of American Sign Language (ASL). We identify that a key bottleneck in the performance of the system is the quality of the sign video embedding which suffers from a scarcity of labeled training data. We, therefore, propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data. We validate the effectiveness of SPOT-ALIGN for learning a robust sign video embedding through improvements in both sign recognition and the proposed video retrieval task.
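The retrieval step described in the abstract can be illustrated with a small sketch. It assumes that a text encoder and a sign video encoder have already mapped the query and each video into a shared embedding space, and it ranks videos by cosine similarity to the query. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions; the abstract does not specify the similarity measure or the encoders, and this is not the authors' implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_videos(query_embedding, video_embeddings):
    """Return video indices ordered from best to worst match.

    Assumption: both inputs already live in the same embedding space,
    and cosine similarity is used as the matching score.
    """
    q = l2_normalize(query_embedding)
    scored = []
    for idx, video in enumerate(video_embeddings):
        v = l2_normalize(video)
        similarity = sum(a * b for a, b in zip(q, v))
        scored.append((similarity, idx))
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]

# Hypothetical 3-dimensional embeddings, not taken from the paper.
toy_videos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [0.9, 0.1, 0.0]
ranking = rank_videos(toy_query, toy_videos)
print(ranking)
```

In a real system the embeddings would come from trained encoders and the collection would typically be normalized once and searched in batch, but the ranking logic would follow the same pattern.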

arXiv information

Authors	Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol
Published	2022-09-15 10:13:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
