Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

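Summary

Previous work on unsupervised parsing has been limited to written text.
This paper presents the first study of unsupervised constituency parsing of speech, given only unlabeled spoken sentences and unpaired text data.
The goal is to recover the hierarchical syntactic structure of a spoken sentence as a constituency parse tree in which each node is a span of audio corresponding to a constituent.
Two approaches are compared: (1) cascading an unsupervised automatic speech recognition (ASR) model with an unsupervised parser to obtain parse trees over ASR transcripts, and (2) training an unsupervised parser directly on continuous word-level speech representations.
The direct approach first splits each utterance into a sequence of word-level segments, then aggregates the self-supervised speech representations within each segment to obtain segment embeddings.
The authors find that training a parser separately on the unpaired text and applying it directly to ASR transcripts at inference time gives better unsupervised parsing results.
Their results also suggest that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
Finally, they show that the direct approach may learn head directionality correctly for both head-initial and head-final languages without any explicit inductive bias.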

Abstract (original)

Past work on unsupervised parsing is constrained to written form. In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. The goal is to determine the spoken sentences’ hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent. We compare two approaches: (1) cascading an unsupervised automatic speech recognition (ASR) model and an unsupervised parser to obtain parse trees on ASR transcripts, and (2) direct training an unsupervised parser on continuous word-level speech representations. This is done by first splitting utterances into sequences of word-level segments, and aggregating self-supervised speech representations within segments to obtain segment embeddings. We find that separately training a parser on the unpaired text and directly applying it on ASR transcripts for inference produces better results for unsupervised parsing. Additionally, our results suggest that accurate segmentation alone may be sufficient to parse spoken sentences accurately. Finally, we show the direct approach may learn head-directionality correctly for both head-initial and head-final languages without any explicit inductive bias.
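The segment-embedding step of the direct approach can be illustrated with a minimal sketch. The abstract says only that self-supervised representations are "aggregated" within each word-level segment; mean pooling is assumed here as one plausible aggregation, and the function name `pool_segments`, the frame-index boundary format, and the toy feature values are all hypothetical rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def pool_segments(
    frames: Sequence[Sequence[float]],
    boundaries: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Mean-pool frame-level vectors inside each [start, end) segment.

    Mean pooling is an assumption; the source only says "aggregating".
    """
    embeddings: List[List[float]] = []
    for start, end in boundaries:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"invalid segment ({start}, {end})")
        segment = frames[start:end]
        dim = len(segment[0])
        # Average each feature dimension over the frames in the segment.
        embeddings.append(
            [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]
        )
    return embeddings


# Toy input: 5 frames of 2-dimensional features, split into two "words".
frames = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 4.0], [4.0, 4.0]]
boundaries = [(0, 2), (2, 5)]  # hypothetical word-level segmentation
segment_embeddings = pool_segments(frames, boundaries)
print(segment_embeddings)  # [[2.0, 1.0], [2.0, 4.0]]
```

Each utterance then becomes a short sequence of fixed-size vectors, one per hypothesized word, which a parser can consume in place of word embeddings.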

arXiv information

Authors	Yuan Tseng, Cheng-I Lai, Hung-yi Lee
Published	2023-03-15 17:57:22+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
