Identifying Semantic Induction Heads to Understand In-Context Learning

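Summary

Large language models (LLMs) perform remarkably well, but the lack of transparency in their inference logic raises concerns about their trustworthiness.
To understand LLMs better, we analyze the operations of attention heads in detail, with the aim of better understanding in-context learning in LLMs.
Specifically, we investigate whether attention heads encode two types of relationships between tokens in natural language: syntactic dependencies parsed from sentences and relations within knowledge graphs.
We find that certain attention heads, when attending to head tokens, recall the corresponding tail tokens and increase the output logits of those tail tokens.
More importantly, the formulation of such semantic induction heads correlates closely with the emergence of in-context learning ability in language models.
Studying semantic attention heads advances our understanding of the intricate operations of attention heads in transformers and offers new insights into in-context learning in LLMs.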
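
The pattern described above, in which a head attends to a head token and raises the output logit of the matching tail token, can be illustrated with a toy calculation. This is a minimal sketch, not the paper's measurement procedure: it assumes a single head whose output is mapped straight to the vocabulary (no layer norm, no later layers), and every name in it (`head_logit_contribution`, `tail_logit_boost`, the matrices `W_V`, `W_O`, `W_U`) is hypothetical and chosen only for illustration.

```python
import numpy as np


def head_logit_contribution(attn_row, residual, W_V, W_O, W_U):
    """Logits that one attention head adds at a single query position.

    Assumption: only the direct path from the head output to the
    unembedding is considered (layer norm and later layers are ignored).
    """
    values = residual @ W_V      # (seq, d_head): value vector per source token
    mixed = attn_row @ values    # (d_head,): attention-weighted sum of values
    head_out = mixed @ W_O       # (d_model,): head output written to the residual
    return head_out @ W_U        # (vocab,): contribution to each output logit


def tail_logit_boost(logits, tail_id):
    """How far the tail token's logit is raised above the vocabulary mean."""
    return float(logits[tail_id] - logits.mean())


# Toy setup: 4 tokens, one-hot embeddings, identity unembedding.
d = 4
embeddings = np.eye(d)
W_U = np.eye(d)
W_V = np.eye(d)
# Hypothetical "relation" stored in the head: token i maps to token i + 1.
W_O = np.roll(np.eye(d), 1, axis=1)

# Context holds tokens 0, 2, 3; the head attends only to token 0 (the head token).
residual = embeddings[[0, 2, 3]]
attn_row = np.array([1.0, 0.0, 0.0])

logits = head_logit_contribution(attn_row, residual, W_V, W_O, W_U)
boost = tail_logit_boost(logits, tail_id=1)  # token 1 plays the tail token
```

In this toy setup the head's contribution is concentrated on token 1, so `boost` is positive for the tail token and negative for every other token. In a real model the same quantity would be computed from trained weights and averaged over many head and tail pairs.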

Summary (original)

Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads encode two types of relationships between tokens present in natural languages: the syntactic dependency parsed from sentences and the relation within knowledge graphs. We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens. More crucially, the formulation of such semantic induction heads has a close correlation with the emergence of the in-context learning ability of language models. The study of semantic attention heads advances our understanding of the intricate operations of attention heads in transformers, and further provides new insights into the in-context learning of LLMs.

arXiv information

Authors	Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Xipeng Qiu, Dahua Lin
Published	2024-02-20 14:43:39+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
