ANALOGICAL — A New Benchmark for Analogy of Long Text for Large Language Models

Summary

Title: ANALOGICAL: A New Benchmark for Analogy of Long Text for Large Language Models

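Summary:
– Over the past decade, analogies, in the form of word-level analogies, have played an important role as an intrinsic measure for evaluating the quality of word embedding methods such as word2vec.
– Modern large language models (LLMs), however, are evaluated mainly on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and only a few studies have examined whether LLMs can draw analogies between long texts.
– The paper presents ANALOGICAL, a new benchmark for intrinsically evaluating LLMs along a taxonomy of long-text analogies with six levels of complexity: (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor.
– Using thirteen datasets and three different distance measures, the authors evaluate the ability of eight LLMs to identify analogical pairs in the semantic vector space (for example, "I can speak two languages" and "I am bilingual" should lie close together, whereas "I like chocolate" and "I do not like chocolate" should be orthogonal).
– The evaluation finds that LLMs have increasing difficulty identifying analogies as one moves up the analogy taxonomy.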
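
The distance-based check described in the summary can be illustrated with a minimal sketch. The summary does not name the three distance measures, so cosine and Euclidean distance are used here purely as assumed examples. The four-dimensional vectors are hypothetical toy values chosen by hand, not embeddings produced by any of the eight LLMs.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings (assumption: hand-picked for illustration only).
toy_embeddings = {
    "I can speak two languages": [0.9, 0.1, 0.0, 0.0],
    "I am bilingual": [0.8, 0.2, 0.0, 0.0],
    "I like chocolate": [0.0, 0.0, 1.0, 0.0],
    "I do not like chocolate": [0.0, 0.0, 0.0, 1.0],
}

# One analogical pair (expected to be close) and one negation pair
# (expected to be orthogonal), following the examples in the summary.
pairs = [
    ("I can speak two languages", "I am bilingual"),
    ("I like chocolate", "I do not like chocolate"),
]

for left, right in pairs:
    u, v = toy_embeddings[left], toy_embeddings[right]
    print(left, "|", right)
    print("  cosine distance:   ", round(cosine_distance(u, v), 3))
    print("  euclidean distance:", round(euclidean_distance(u, v), 3))
```

With real models, the toy dictionary would be replaced by sentence embeddings from the model under evaluation. A cosine distance near 0 indicates vectors pointing in nearly the same direction, and a cosine distance of 1 corresponds to orthogonal vectors.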

Abstract (original)

Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure of evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity — (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs in identifying analogical pairs in the semantic vector space (e.g., ‘I can speak two languages’ should be closer to ‘I am bilingual’ while ‘I like chocolate’ and ‘I do not like chocolate’ should be orthogonal). Our evaluation finds that it is increasingly challenging for LLMs to identify analogies when going up the analogy taxonomy.

arXiv information

Authors	Thilini Wijesiriwardene, Ruwan Wickramarachchi, Bimal G. Gajera, Shreeyash Mukul Gowaikar, Chandan Gupta, Aman Chadha, Aishwarya Naresh Reganti, Amit Sheth, Amitava Das
Published	2023-05-08 21:12:20+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
