Hidden Entity Detection from GitHub Leveraging Large Language Models

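Summary

Named entity recognition is an important task when building knowledge bases from unstructured data sources.
While entity detection methods rely mainly on extensive training data, large language models (LLMs) have paved the way for approaches based on zero-shot learning (ZSL) or few-shot learning (FSL) by exploiting the capabilities the models acquired during pretraining.
In particular, in highly specialized scenarios where large-scale training data is unavailable, ZSL/FSL opens up new opportunities.
Following this recent trend, this paper investigates the potential of using LLMs in such scenarios to automatically detect datasets and software in the textual content of GitHub repositories.
Whereas existing methods focused only on named entities, this study aims to broaden the scope by incorporating resources such as repositories and online hubs, where entities are also represented by URLs.
The study examines various FSL prompt-learning approaches for strengthening the ability of LLMs to identify dataset and software mentions in repository texts.
Through an analysis of LLM effectiveness and learning strategies, the paper offers insights into the potential of advanced language models for automated entity detection.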

Summary (original)

Named entity recognition is an important task when constructing knowledge bases from unstructured data sources. Whereas entity detection methods mostly rely on extensive training data, Large Language Models (LLMs) have paved the way towards approaches that rely on zero-shot learning (ZSL) or few-shot learning (FSL) by taking advantage of the capabilities LLMs acquired during pretraining. Specifically, in very specialized scenarios where large-scale training data is not available, ZSL / FSL opens new opportunities. This paper follows this recent trend and investigates the potential of leveraging Large Language Models (LLMs) in such scenarios to automatically detect datasets and software within textual content from GitHub repositories. While existing methods focused solely on named entities, this study aims to broaden the scope by incorporating resources such as repositories and online hubs where entities are also represented by URLs. The study explores different FSL prompt learning approaches to enhance the LLMs’ ability to identify dataset and software mentions within repository texts. Through analyses of LLM effectiveness and learning strategies, this paper offers insights into the potential of advanced language models for automated entity detection.
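The abstract does not give the prompts the authors used. As a rough illustration of few-shot (FSL) prompting for this task, the Python sketch below builds a prompt from a handful of labeled demonstrations, asks the model for JSON covering both names and URLs, then parses the reply and drops any mention that does not appear verbatim in the input. The demonstrations, the instruction wording, the JSON output format and the helper names (`build_prompt`, `parse_response`, `keep_grounded`) are all assumptions made for illustration, not the paper's method, and the actual LLM call is replaced by a hard-coded hypothetical reply.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Demo:
    """One labeled demonstration shown to the model (hypothetical example data)."""
    text: str
    entities: list  # each item: {"mention": str, "type": "dataset" | "software"}


# Hypothetical demonstrations; note that a URL can itself be the mention.
DEMOS = [
    Demo(
        text="We train on CIFAR-10 using PyTorch.",
        entities=[
            {"mention": "CIFAR-10", "type": "dataset"},
            {"mention": "PyTorch", "type": "software"},
        ],
    ),
    Demo(
        text="The corpus is hosted at https://github.com/example-org/example-data and read with pandas.",
        entities=[
            {"mention": "https://github.com/example-org/example-data", "type": "dataset"},
            {"mention": "pandas", "type": "software"},
        ],
    ),
]

# Assumed instruction wording and output schema.
INSTRUCTION = (
    "Extract every dataset and software mention from the repository text. "
    "A mention may be a name or a URL. Answer with a JSON list of objects "
    'with keys "mention" and "type", where type is "dataset" or "software".'
)

ALLOWED_TYPES = {"dataset", "software"}


def build_prompt(text: str, demos: list = DEMOS) -> str:
    """Few-shot prompt: instruction, then the demonstrations, then the query text."""
    parts = [INSTRUCTION, ""]
    for demo in demos:
        parts.append(f"Text: {demo.text}")
        parts.append(f"Entities: {json.dumps(demo.entities)}")
        parts.append("")
    parts.append(f"Text: {text}")
    parts.append("Entities:")
    return "\n".join(parts)


def parse_response(raw: str) -> list:
    """Take the span from the first '[' to the last ']' and keep well-formed entries of an allowed type."""
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        {"mention": item["mention"], "type": item["type"]}
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("mention"), str)
        and item.get("type") in ALLOWED_TYPES
    ]


def keep_grounded(entities: list, text: str) -> list:
    """Drop mentions that do not occur verbatim in the input text (a simple guard against invented names)."""
    return [e for e in entities if e["mention"] in text]


if __name__ == "__main__":
    query = "Results on ImageNet; training code is at https://github.com/example-org/trainer"
    prompt = build_prompt(query)
    # reply = call_llm(prompt)  # hypothetical: any chat/completion API would go here
    reply = '[{"mention": "ImageNet", "type": "dataset"}, {"mention": "TensorFlow", "type": "software"}]'
    print(keep_grounded(parse_response(reply), query))  # [{'mention': 'ImageNet', 'type': 'dataset'}]
```

The grounding filter is one cheap design choice for generative extraction: because an LLM can return names that are not in the text, keeping only verbatim matches trades some recall (for example, normalized or paraphrased mentions) for precision.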

arXiv information

Authors	Lu Gan, Martin Blum, Danilo Dessi, Brigitte Mathiak, Ralf Schenkel, Stefan Dietze
Published	2025-01-08 12:18:11+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
