GraphFM: A Scalable Framework for Multi-Graph Pretraining

Summary

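Graph neural networks are usually trained on one dataset at a time, which often calls for highly specialized models and extensive hyperparameter tuning.
This dataset-specific practice arises because each graph dataset tends to have its own node features and its own connectivity structure, which makes a general-purpose model hard to build.
To address these challenges, the authors introduce a scalable multi-graph, multi-task pretraining approach tailored to node classification across diverse graph datasets from different domains.
Their method, Graph Foundation Model (GraphFM), uses a Perceiver-based encoder with learned latent tokens to compress domain-specific features into a common latent space.
This improves the model's ability to generalize across different graphs and makes it possible to scale across diverse data.
They demonstrate the approach by training a model on 152 different graph datasets comprising over 7.4 million nodes and 189 million edges, establishing the first set of scaling laws for multi-graph pretraining on datasets spanning many domains (e.g., molecules, citation and product graphs).
The results show that pretraining on a diverse mix of real and synthetic graphs improves the model's adaptability and stability while remaining competitive with state-of-the-art specialist models.
The work illustrates that multi-graph pretraining can significantly reduce the burden imposed by the current graph training paradigm, unlocking new capabilities for graph neural networks through a single generalist model that performs competitively across a wide range of datasets and tasks.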

Abstract (original)

Graph neural networks are typically trained on individual datasets, often requiring highly specialized models and extensive hyperparameter tuning. This dataset-specific approach arises because each graph dataset often has unique node features and diverse connectivity structures, making it difficult to build a generalist model. To address these challenges, we introduce a scalable multi-graph multi-task pretraining approach specifically tailored for node classification tasks across diverse graph datasets from different domains. Our method, Graph Foundation Model (GraphFM), leverages a Perceiver-based encoder that employs learned latent tokens to compress domain-specific features into a common latent space. This approach enhances the model’s ability to generalize across different graphs and allows for scaling across diverse data. We demonstrate the efficacy of our approach by training a model on 152 different graph datasets comprising over 7.4 million nodes and 189 million edges, establishing the first set of scaling laws for multi-graph pretraining on datasets spanning many domains (e.g., molecules, citation and product graphs). Our results show that pretraining on a diverse array of real and synthetic graphs improves the model’s adaptability and stability, while performing competitively with state-of-the-art specialist models. This work illustrates that multi-graph pretraining can significantly reduce the burden imposed by the current graph training paradigm, unlocking new capabilities for the field of graph neural networks by creating a single generalist model that performs competitively across a wide range of datasets and tasks.
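
The abstract does not give architectural details, so the following is only a minimal sketch of the general Perceiver-style idea it names: a fixed set of latent tokens cross-attends over a graph's node features, so graphs with different sizes and feature widths all map to a latent array of the same shape. Everything here is an assumption made for illustration: the function `compress_to_latents`, the per-dataset input projection `w_in`, the dimensions, the two example graphs, and the use of random weights in place of learned ones. It is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress_to_latents(node_feats, w_in, latents, w_q, w_k, w_v):
    """Cross-attend K latent tokens over N node features (hypothetical helper).

    node_feats: (N, f) raw features of one graph, f differs per dataset
    w_in:       (f, d) dataset-specific input projection (an assumption)
    latents:    (K, d) latent tokens shared across all graphs
    Returns the (K, d) compressed representation and the (K, N) attention map.
    """
    tokens = node_feats @ w_in                  # (N, d) shared width
    q = latents @ w_q                           # (K, d) queries from latents
    k = tokens @ w_k                            # (N, d) keys from nodes
    v = tokens @ w_v                            # (N, d) values from nodes
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (K, N) scaled dot products
    attn = softmax(scores, axis=-1)             # each latent's weights over nodes
    return attn @ v, attn


rng = np.random.default_rng(0)
d, num_latents = 16, 8

# Random stand-ins for parameters that would be learned during pretraining.
latents = rng.normal(size=(num_latents, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Two made-up graphs with different node counts and feature widths.
graphs = {
    "citation": rng.normal(size=(50, 100)),
    "molecule": rng.normal(size=(12, 9)),
}

outputs = {}
for name, x in graphs.items():
    w_in = rng.normal(size=(x.shape[1], d)) / np.sqrt(x.shape[1])
    outputs[name] = compress_to_latents(x, w_in, latents, w_q, w_k, w_v)
    print(name, x.shape, "->", outputs[name][0].shape)
```

Both graphs end up as an array of shape `(num_latents, d)` regardless of their input size, which is the property that lets a single downstream model consume datasets with unrelated feature spaces. Graph connectivity is ignored in this sketch; how GraphFM incorporates structure is not stated in the abstract.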

arXiv information

Authors	Divyansha Lachi, Mehdi Azabou, Vinam Arora, Eva Dyer
Published	2024-07-16 16:51:43+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
