GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary

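Summary

Reverse dictionary (RD) is the task of retrieving the most relevant word, or set of words, from a textual description or dictionary definition.
Effective RD methods can be applied in accessibility, translation, and writing support systems.
In NLP research, RD is also used to benchmark text encoders at different granularities, since it often requires word, definition, and sentence embeddings.
This paper proposes a simple approach to RD that uses LLMs in combination with embedding models.
Despite its simplicity, the approach outperforms supervised baselines on well-studied RD datasets while also overfitting less.
We also run a number of experiments on different dictionaries and analyze how style, register, and target audience affect the quality of RD systems.
We conclude that, on average, untuned embeddings alone fall far below an LLM-only baseline (although they are competitive on highly technical dictionaries), but are crucial for boosting performance in combined methods.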

Abstract (original)

Reverse Dictionary (RD) is the task of obtaining the most relevant word or set of words given a textual description or dictionary definition. Effective RD methods have applications in accessibility, translation or writing support systems. Moreover, in NLP research we find RD to be used to benchmark text encoders at various granularities, as it often requires word, definition and sentence embeddings. In this paper, we propose a simple approach to RD that leverages LLMs in combination with embedding models. Despite its simplicity, this approach outperforms supervised baselines in well studied RD datasets, while also showing less over-fitting. We also conduct a number of experiments on different dictionaries and analyze how different styles, registers and target audiences impact the quality of RD systems. We conclude that, on average, untuned embeddings alone fare way below an LLM-only baseline (although they are competitive in highly technical dictionaries), but are crucial for boosting performance in combined methods.
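The abstract does not spell out the pipeline, but the title names four steps: generate, embed, average, and rank. The following is a minimal sketch of how such steps might fit together, not the authors' implementation. It assumes that an LLM generates candidate words for a definition, that the candidates are embedded and averaged into a single query vector, and that vocabulary words are ranked by cosine similarity to that vector. The names `gear_rank`, `toy_generate`, `toy_embed`, and `TOY_EMBEDDINGS` are hypothetical, and the toy vectors are invented purely for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gear_rank(
    definition: str,
    generate: Callable[[str], List[str]],
    embed: Callable[[str], Vector],
    vocabulary: Sequence[str],
    top_k: int = 3,
) -> List[str]:
    """Assumed pipeline: generate candidates, embed, average, rank."""
    candidates = generate(definition)            # Generate (stand-in for an LLM call)
    if not candidates:
        return []
    query = average([embed(w) for w in candidates])  # Embed + Average
    scored = [(cosine(query, embed(w)), w) for w in vocabulary]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # Rank
    return [word for _, word in scored[:top_k]]


# Hypothetical toy data standing in for a real embedding model.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0],
    "puppy": [0.9, 0.1, 0.0],
    "hound": [0.8, 0.0, 0.2],
    "car": [0.0, 1.0, 0.0],
    "truck": [0.0, 0.9, 0.1],
}


def toy_generate(definition: str) -> List[str]:
    # Stand-in for an LLM; a real system would prompt a model here.
    return ["puppy", "hound"]


def toy_embed(word: str) -> Vector:
    return TOY_EMBEDDINGS[word]


ranked = gear_rank(
    "a domesticated canine kept as a pet",
    toy_generate,
    toy_embed,
    list(TOY_EMBEDDINGS),
    top_k=3,
)
print(ranked)
```

In this toy setup the averaged query vector sits close to the canine-related words, so those rank above the vehicle words. Note that a vocabulary word the generator never produced ("dog") can still rank highly, which is one plausible reason to combine generation with embeddings rather than use either alone.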

arXiv information

Authors	Fatemah Almeman, Luis Espinosa-Anke
Published	2024-12-09 16:54:54+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
