TechGPT-2.0: A large language model project to solve the task of knowledge graph construction

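Summary

Large language models have shown robust performance across a wide range of natural language processing tasks.
This report introduces TechGPT-2.0, a project designed to strengthen large language models on knowledge graph construction tasks, in particular named entity recognition (NER) and relation triple extraction (RTE).
It also serves as an LLM available for research within the Chinese open-source model community.
We release the weights of two 7B large language models, along with QLoRA weights specialized for long-text processing. Notably, TechGPT-2.0 was trained on Huawei's Ascend servers.
It inherits all the capabilities of TechGPT-1.0 and shows robust text processing, particularly in the medical and legal domains.
We also add new capabilities that let the model process text from domains such as geographic regions, transportation, organizations, literary works, biology, natural sciences, astronomical objects, and architecture.
These enhancements also improve the model's handling of hallucinations, unanswerable questions, and long texts.
The report gives a comprehensive, detailed account of the full fine-tuning process on Huawei's Ascend servers, including our experience with Ascend server debugging, instruction fine-tuning data processing, and model training.
Our code is available at https://github.com/neukg/TechGPT-2.0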

Abstract (original)

Large language models have exhibited robust performance across diverse natural language processing tasks. This report introduces TechGPT-2.0, a project designed to enhance the capabilities of large language models specifically in knowledge graph construction tasks, including named entity recognition (NER) and relationship triple extraction (RTE) tasks in NLP applications. Additionally, it serves as an LLM accessible for research within the Chinese open-source model community. We offer two 7B large language model weights and a QLoRA weight specialized for processing lengthy texts. Notably, TechGPT-2.0 is trained on Huawei’s Ascend server. Inheriting all functionalities from TechGPT-1.0, it exhibits robust text processing capabilities, particularly in the domains of medicine and law. Furthermore, we introduce new capabilities to the model, enabling it to process texts in various domains such as geographical areas, transportation, organizations, literary works, biology, natural sciences, astronomical objects, and architecture. These enhancements also fortified the model’s adeptness in handling hallucinations, unanswerable queries, and lengthy texts. This report provides a comprehensive and detailed introduction to the full fine-tuning process on Huawei’s Ascend servers, encompassing experiences in Ascend server debugging, instruction fine-tuning data processing, and model training. Our code is available at https://github.com/neukg/TechGPT-2.0

arXiv information

Authors	Jiaqi Wang, Yuying Chang, Zhong Li, Ning An, Qi Ma, Lei Hei, Haibo Luo, Yifei Lu, Feiliang Ren
Published	2024-01-09 11:52:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
