Sign Language Translation with Iterative Prototype

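Summary

This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT).
IP-SLT adopts a recurrent structure and strengthens the semantic representation (the prototype) of the input sign language video through iterative refinement.
The idea mimics how people read: a sentence can be digested repeatedly until it is understood accurately.
Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement.
The initialization module generates the initial prototype from the visual features extracted by the feature extraction module.
The iterative refinement module then uses a cross-attention mechanism to polish the previous prototype by aggregating it with the original video features.
Through repeated refinement, the prototype eventually converges to a more stable and accurate state, leading to a fluent and appropriate translation.
In addition, to exploit the sequential dependence between prototypes, we further propose an iterative distillation loss that compresses the knowledge of the final iteration into the earlier ones.
Because the autoregressive decoding process runs only once at inference, IP-SLT can improve various SLT systems with acceptable overhead.
Extensive experiments on public benchmarks demonstrate the effectiveness of IP-SLT.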
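
The initialization and cross-attention refinement steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-pooling initialization, the residual update, the absence of learned projections, and the names `init_prototype`, `cross_attend`, and `refine` are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def init_prototype(video_feats, num_slots):
    # Hypothetical initialization: every prototype slot starts as the
    # mean of the frame features (the paper's module is learned).
    dim = len(video_feats[0])
    mean = [sum(f[d] for f in video_feats) / len(video_feats) for d in range(dim)]
    return [list(mean) for _ in range(num_slots)]

def cross_attend(prototype, video_feats):
    # Each prototype vector is a query; the original video features
    # serve as both keys and values (no learned projections here).
    scale = math.sqrt(len(video_feats[0]))
    refined = []
    for q in prototype:
        weights = softmax([dot(q, k) / scale for k in video_feats])
        context = [sum(w * v[d] for w, v in zip(weights, video_feats))
                   for d in range(len(q))]
        # Assumed residual update: previous prototype plus attended context.
        refined.append([qi + ci for qi, ci in zip(q, context)])
    return refined

def refine(video_feats, num_slots=2, num_iters=3):
    # Returns the prototype after initialization and after every iteration.
    prototypes = [init_prototype(video_feats, num_slots)]
    for _ in range(num_iters):
        prototypes.append(cross_attend(prototypes[-1], video_feats))
    return prototypes

if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy frame features
    protos = refine(feats)
    print(len(protos), "prototypes; final:", protos[-1])
```

In this sketch only the last element of the returned list would be handed to the autoregressive decoder, which matches the summary's point that decoding runs once at inference no matter how many refinement iterations are used.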

Abstract (original)

This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). Our IP-SLT adopts a recurrent structure and enhances the semantic representation (prototype) of the input sign language video via an iterative refinement manner. Our idea mimics the behavior of human reading, where a sentence can be digested repeatedly, till reaching accurate understanding. Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement. The initialization module generates the initial prototype based on the visual feature extracted by the feature extraction module. Then, the iterative refinement module leverages the cross-attention mechanism to polish the previous prototype by aggregating it with the original video feature. Through repeated refinement, the prototype finally converges to a more stable and accurate state, leading to a fluent and appropriate translation. In addition, to leverage the sequential dependence of prototypes, we further propose an iterative distillation loss to compress the knowledge of the final iteration into previous ones. As the autoregressive decoding process is executed only once in inference, our IP-SLT is ready to improve various SLT systems with acceptable overhead. Extensive experiments are conducted on public benchmarks to demonstrate the effectiveness of the IP-SLT.
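
The abstract does not give the form of the iterative distillation loss, so the following is only one plausible reading, sketched under explicit assumptions: the final-iteration prototype is treated as a fixed teacher, every earlier prototype is a student, and the mismatch is measured with mean squared error. The paper may well use a different distance or distill at a different level (for example on decoder outputs); the function names are hypothetical.

```python
def mse(a, b):
    # Mean squared error between two prototypes (lists of vectors).
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def iterative_distillation_loss(prototypes):
    # prototypes: one prototype per iteration, ordered first to last.
    # Assumption: the last prototype is the teacher and is not updated
    # by this loss; all earlier prototypes are pulled toward it.
    teacher = prototypes[-1]
    students = prototypes[:-1]
    if not students:
        return 0.0
    return sum(mse(s, teacher) for s in students) / len(students)

if __name__ == "__main__":
    toy = [[[0.0, 0.0]], [[1.0, 1.0]], [[2.0, 2.0]]]
    print(iterative_distillation_loss(toy))
```

In a training setup of this kind, such a term would typically be added to the translation loss with a weighting factor, which is also an assumption here rather than something stated in the abstract.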

arXiv information

Authors	Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li
Published	2023-08-23 15:27:50+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
