Diversity-Aware Coherence Loss for Improving Neural Topic Models

Summary

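The standard approach to neural topic modeling uses a variational autoencoder (VAE) framework that jointly minimizes the reconstruction loss and the KL divergence between the estimated posterior and the prior.
Because neural topic models are trained by reconstructing individual input documents, they do not explicitly capture coherence among topic words at the corpus level.
This work proposes a novel diversity-aware coherence loss that encourages the model to learn corpus-level coherence scores while maintaining high diversity across topics.
Experimental results on multiple datasets show that the method significantly improves the performance of neural topic models without requiring pretraining or additional parameters.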

Summary (original)

The standard approach for neural topic modeling uses a variational autoencoder (VAE) framework that jointly minimizes the KL divergence between the estimated posterior and prior, in addition to the reconstruction loss. Since neural topic models are trained by recreating individual input documents, they do not explicitly capture the coherence between topic words on the corpus level. In this work, we propose a novel diversity-aware coherence loss that encourages the model to learn corpus-level coherence scores while maintaining a high diversity between topics. Experimental results on multiple datasets show that our method significantly improves the performance of neural topic models without requiring any pretraining or additional parameters.
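
To make the baseline objective in the abstract concrete, here is a minimal sketch of a VAE-style topic-model loss: a bag-of-words reconstruction term plus a KL term. It assumes a diagonal Gaussian posterior and a standard normal prior, which the abstract does not specify, and it uses the posterior mean instead of sampling so the result is deterministic. All function and parameter names are hypothetical. The paper's diversity-aware coherence loss is not included, since the abstract does not give its form.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(bow, word_probs, eps=1e-10):
    """Negative log-likelihood of bag-of-words counts under predicted word probabilities."""
    return -sum(c * math.log(p + eps) for c, p in zip(bow, word_probs))


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)). Assumed prior, not from the source."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_topic_loss(bow, mu, log_var, topic_word_logits):
    """Reconstruction loss plus KL term for one document (hypothetical helper)."""
    theta = softmax(mu)  # document-topic proportions from the posterior mean
    beta = [softmax(row) for row in topic_word_logits]  # one word distribution per topic
    word_probs = [
        sum(theta[k] * beta[k][v] for k in range(len(theta)))
        for v in range(len(bow))
    ]
    return reconstruction_loss(bow, word_probs) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Two topics, two vocabulary words, one document containing each word once.
    loss = vae_topic_loss(
        bow=[1, 1],
        mu=[0.0, 0.0],
        log_var=[0.0, 0.0],
        topic_word_logits=[[0.0, 0.0], [0.0, 0.0]],
    )
    print(loss)
```

Both terms are computed per document, which illustrates the abstract's point: nothing in this objective looks at how topic words co-occur across the corpus as a whole.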

arXiv information

Authors	Raymond Li, Felipe González-Pizarro, Linzi Xing, Gabriel Murray, Giuseppe Carenini
Published	2023-05-26 09:59:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
