Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs

Summary

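A common use of NLP is to help people make sense of large document collections, and the tools used for this are shifting from traditional topic models to large language models (LLMs).
However, the effectiveness of LLMs for understanding large corpora in real-world applications remains under-explored.
This study measures the knowledge users gain on two datasets when exploring with unsupervised LLM-based approaches, supervised LLM-based approaches, or traditional topic models.
LLM-based methods generate more human-readable topics and show higher average win probabilities than traditional models for data exploration, but on domain-specific datasets they produce overly generic topics from which users learn little about the documents.
Adding human supervision to the LLM generation process improves data exploration by mitigating hallucination and over-genericity, at the cost of greater human effort.
In contrast, traditional models such as Latent Dirichlet Allocation (LDA) remain effective for exploration but are less user-friendly.
The results show that, without human help, LLMs struggle to describe the haystack of large corpora, particularly for domain-specific data, and that they face scaling and hallucination limitations due to context-length constraints.
The dataset is available at https://huggingface.co/datasets/zli12321/Bills.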

Summary (original)

A common use of NLP is to facilitate the understanding of large document collections, with a shift from using traditional topic models to Large Language Models. Yet the effectiveness of using LLM for large corpus understanding in real-world applications remains under-explored. This study measures the knowledge users acquire with unsupervised, supervised LLM-based exploratory approaches or traditional topic models on two datasets. While LLM-based methods generate more human-readable topics and show higher average win probabilities than traditional models for data exploration, they produce overly generic topics for domain-specific datasets that do not easily allow users to learn much about the documents. Adding human supervision to the LLM generation process improves data exploration by mitigating hallucination and over-genericity but requires greater human effort. In contrast, traditional models like Latent Dirichlet Allocation (LDA) remain effective for exploration but are less user-friendly. We show that LLMs struggle to describe the haystack of large corpora without human help, particularly domain-specific data, and face scaling and hallucination limitations due to context length constraints. Dataset available at https://huggingface.co/datasets/zli12321/Bills.

arXiv information

Authors	Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber
Published	2025-02-20 17:19:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
