S2vNTM: Semi-supervised vMF Neural Topic Modeling

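Summary

Language-model-based methods are powerful techniques for text classification.
However, these models have several shortcomings:
(1) It is difficult to integrate human knowledge such as keywords.
(2) Training the models requires substantial resources.
(3) Pretraining relies on large text corpora.
To overcome these problems, this paper proposes Semi-Supervised vMF Neural Topic Modeling (S2vNTM).
S2vNTM takes a few seed keywords as input for topics.
S2vNTM leverages the patterns of keywords to identify potential topics and to optimize the quality of each topic's keyword set.
Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy, even when only a limited number of keywords are provided.
S2vNTM is also at least twice as fast as the baselines.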

Summary (original)

Language model based methods are powerful techniques for text classification. However, the models have several shortcomings. (1) It is difficult to integrate human knowledge such as keywords. (2) It needs a lot of resources to train the models. (3) It relied on large text data to pretrain. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM leverages the pattern of keywords to identify potential topics, as well as optimize the quality of topics’ keywords sets. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy with limited keywords provided. S2vNTM is at least twice as fast as baselines.
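To make the idea of seed keywords and a von Mises-Fisher (vMF) score more concrete, here is a minimal toy sketch. It is not the S2vNTM model: there is no neural network and no training. It only assumes that a document and each topic's seed keywords can be mapped to unit vectors, and scores them with the unnormalized vMF log-density, which is the concentration `kappa` times the dot product with the topic direction. The function names, the example keywords, and the value of `kappa` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unit_vector(counts, vocab):
    """Project token counts over a fixed vocabulary onto the unit sphere."""
    vec = [float(counts.get(word, 0)) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    # A document with no vocabulary words stays at the zero vector.
    return [v / norm for v in vec] if norm > 0 else vec


def vmf_score(x, mu, kappa=10.0):
    """Unnormalized vMF log-density: kappa * <mu, x> for unit vectors."""
    return kappa * sum(a * b for a, b in zip(mu, x))


def classify(doc, seed_keywords, kappa=10.0):
    """Pick the topic whose seed-keyword direction best matches the document.

    Toy illustration only (assumption): the vocabulary is just the union
    of all seed keywords, and every topic shares the same kappa, so the
    vMF normalizing constant is identical across topics and can be dropped.
    """
    vocab = sorted({w for kws in seed_keywords.values() for w in kws})
    x = unit_vector(Counter(doc.lower().split()), vocab)
    scores = {
        topic: vmf_score(x, unit_vector(Counter(kws), vocab), kappa)
        for topic, kws in seed_keywords.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    seeds = {
        "sports": ["game", "team", "score"],
        "finance": ["stock", "market", "price"],
    }
    label, scores = classify("the team won the game", seeds)
    print(label, scores)
```

Working on unit vectors means only the direction of the word-count vector matters, not the document length, which is the usual motivation for a vMF-style score in text settings.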

arXiv information

Authors	Weijie Xu, Jay Desai, Srinivasan Sengamedu, Xiaoyu Jiang, Francis Iannacci
Published	2024-02-08 11:09:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
