Topical: Learning Repository Embeddings from Source Code using Attention

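Summary

This paper describes Topical, a new deep neural network for repository-level embeddings.
Topical uses an attention mechanism to outperform existing methods, which rely on natural-language documentation or naive aggregation techniques.
The mechanism generates repository-level representations from source code, full dependency graphs, and script-level textual data.
Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines on tasks such as repository auto-tagging, which highlights the effectiveness of attention over traditional aggregation methods.
Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation.
To support further research, the accompanying tools, code, and training dataset are available at https://github.com/jpmorganchase/topical.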
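
To make the contrast between naive aggregation and attention-based aggregation concrete, here is a minimal sketch in plain Python. It is not the Topical architecture, which this summary does not specify. It assumes each script has already been encoded as a fixed-size vector and that a single query vector (learned in a real model, hard-coded here) scores the scripts. The names `mean_pool`, `attention_pool`, `scripts`, and `query` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(script_embeddings: Sequence[Vector]) -> Vector:
    """Naive aggregation: every script contributes equally."""
    n = len(script_embeddings)
    dim = len(script_embeddings[0])
    return [sum(vec[i] for vec in script_embeddings) / n for i in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    script_embeddings: Sequence[Vector], query: Vector
) -> Tuple[Vector, List[float]]:
    """Attention aggregation: weight each script by its scaled
    dot-product score against a query vector, then sum."""
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, vec)) / math.sqrt(dim)
        for vec in script_embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, script_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Hypothetical toy data: three 2-dimensional script embeddings.
scripts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumption: in a trained model this query would be a learned parameter.
query = [2.0, 0.0]

repo_mean = mean_pool(scripts)
repo_attn, weights = attention_pool(scripts, query)

print("mean-pooled repository vector:     ", repo_mean)
print("attention weights per script:      ", weights)
print("attention-pooled repository vector:", repo_attn)
```

With mean pooling, all three scripts count equally. With attention pooling, the scripts that align with the query receive larger weights, so the repository vector leans toward them. That is the general idea behind preferring attention over simple averaging.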

Summary (original)

This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical’s utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism’s efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. For further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.

arXiv information

Authors	Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran
Published	2023-08-21 12:21:01+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
