KG-MDL: Mining Graph Patterns in Knowledge Graphs with the MDL Principle

Summary

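More and more data is now available as knowledge graphs (KGs).
This data model supports advanced reasoning and querying, but KGs remain difficult to mine because of their size and complexity.
Graph mining approaches can be used to extract patterns from KGs.
However, this raises two major issues.
First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion).
Second, real-world KGs tend to differ from the graphs usually handled in graph mining: they are multigraphs, their vertex degrees tend to follow a power law, and the way they model knowledge can produce spurious patterns.
Recently, a graph mining approach named GraphMDL+ was proposed to tackle the pattern explosion problem using the Minimum Description Length (MDL) principle.
However, GraphMDL+, like other graph mining approaches, is not suited to KGs without adaptation.
In this paper, we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized, descriptive set of graph patterns in a parameter-free, anytime manner.
We report on experiments on medium-sized KGs and show that our approach generates sets of patterns that are small enough for humans to interpret and that describe the KG.
We show that the extracted patterns highlight relevant characteristics of the data: both the schema used to create the data and the concrete facts the data contains.
We also discuss the issues specific to mining graph patterns on knowledge graphs, as opposed to other types of graph data.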

Summary (original)

Nowadays, increasingly more data are available as knowledge graphs (KGs). While this data model supports advanced reasoning and querying, they remain difficult to mine due to their size and complexity. Graph mining approaches can be used to extract patterns from KGs. However this presents two main issues. First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion). Second, real-life KGs tend to differ from the graphs usually treated in graph mining: they are multigraphs, their vertex degrees tend to follow a power-law, and the way in which they model knowledge can produce spurious patterns. Recently, a graph mining approach named GraphMDL+ has been proposed to tackle the problem of pattern explosion, using the Minimum Description Length (MDL) principle. However, GraphMDL+, like other graph mining approaches, is not suited for KGs without adaptations. In this paper we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized and descriptive set of graph patterns, and so in a parameter-less and anytime way. We report on experiments on medium-sized KGs showing that our approach generates sets of patterns that are both small enough to be interpreted by humans and descriptive of the KG. We show that the extracted patterns highlight relevant characteristics of the data: both of the schema used to create the data, and of the concrete facts it contains. We also discuss the issues related to mining graph patterns on knowledge graphs, as opposed to other types of graph data.

arXiv information

Authors	Francesco Bariatti, Peggy Cellier, Sébastien Ferré
Published	2023-09-22 14:52:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
