Improving Contextualized Topic Models with Negative Sampling

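Summary

Topic modeling has become a leading method for exploring large document collections.
Recent approaches to topic modeling use large contextualized language models and variational autoencoders.
This paper proposes a negative sampling mechanism for contextualized topic models to improve the quality of the generated topics.
Specifically, during model training, the generated document-topic vector is perturbed, and a triplet loss encourages the document reconstructed from the correct document-topic vector to be similar to the input document and dissimilar to the document reconstructed from the perturbed vector.
Experiments with varying topic counts on three publicly available benchmark datasets show that, in most cases, our approach yields higher topic coherence than the baselines.
Our model also achieves very high topic diversity.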
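
The training signal described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the vector is perturbed, so the `perturb` function here (zero the k largest topic weights, then renormalize) is an assumption, as are the toy topic-word matrix `beta`, the softmax reconstruction, the L2 distance, and the margin value. All names are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perturb(theta, k=1):
    """Assumed perturbation: zero the k largest topic weights, renormalize."""
    top = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]
    masked = [0.0 if i in top else t for i, t in enumerate(theta)]
    total = sum(masked)
    if total == 0.0:
        # Every topic was masked; fall back to a uniform vector.
        return [1.0 / len(theta)] * len(theta)
    return [t / total for t in masked]


def reconstruct(theta, beta):
    """Reconstruct a word distribution from a document-topic vector."""
    vocab_size = len(beta[0])
    scores = [
        sum(theta[t] * beta[t][w] for t in range(len(theta)))
        for w in range(vocab_size)
    ]
    return softmax(scores)


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull positive close, push negative away."""
    return max(l2(anchor, positive) - l2(anchor, negative) + margin, 0.0)


# Toy setup: 2 topics over a 4-word vocabulary (hypothetical values).
beta = [[4.0, 4.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 4.0]]
x_bow = [0.5, 0.5, 0.0, 0.0]   # normalized bag-of-words of the input document
theta = [0.9, 0.1]             # document-topic vector from the encoder

theta_neg = perturb(theta, k=1)        # perturbed (negative) topic vector
x_pos = reconstruct(theta, beta)       # reconstruction from the correct vector
x_neg = reconstruct(theta_neg, beta)   # reconstruction from the perturbed vector

loss = triplet_loss(x_bow, x_pos, x_neg, margin=1.0)
print("triplet loss:", loss)
```

In a real model this term would presumably be added to the usual variational autoencoder objective and minimized by gradient descent; the sketch only shows how the anchor (input document), positive (correct reconstruction), and negative (perturbed reconstruction) fit together.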

Abstract (original)

Topic modeling has emerged as a dominant method for exploring large document collections. Recent approaches to topic modeling use large contextualized language models and variational autoencoders. In this paper, we propose a negative sampling mechanism for a contextualized topic model to improve the quality of the generated topics. In particular, during model training, we perturb the generated document-topic vector and use a triplet loss to encourage the document reconstructed from the correct document-topic vector to be similar to the input document and dissimilar to the document reconstructed from the perturbed vector. Experiments for different topic counts on three publicly available benchmark datasets show that in most cases, our approach leads to an increase in topic coherence over that of the baselines. Our model also achieves very high topic diversity.

arXiv information

Authors	Suman Adhya, Avishek Lahiri, Debarshi Kumar Sanyal, Partha Pratim Das
Published	2023-03-27 07:28:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
