Graph foundation models for relational data
Relational databases make up the bulk of enterprise data and power many prediction services across Google as well as other services people use every day, like content recommendation or traffic prediction. Most non-trivial applications employ multiple tables (some elaborate applications at Google maintain hundreds), and extracting actionable value from such networks of tables is far from straightforward. Traditional tabular machine learning (ML) methods (like decision trees) often struggle to fully leverage the connectivity structure of these relational schemas.

On the other hand, recent advances in ML offer a suite of tools for building graph neural networks (GNNs) tailored to graph-structured data, where industry-relevant tasks can be framed as node classification (or regression), link prediction, or graph-level prediction. However, most GNNs are tied to the particular graph they were trained on and cannot generalize to novel graphs with new nodes, edge types, features, and node labels. For example, a model trained on a large 100M-node citation graph benchmark can't be re-used on your own graph (e.g., transactions between users and products), since the feature and label spaces are vastly different; you would have to re-train the model from scratch on your own data. While some initial attempts have demonstrated the viability of the concept on specific link prediction and node classification tasks, there has yet to be a generalist model that can learn meaningful representations across relational data and tackle all node-, link-, and graph-level prediction tasks.
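To see concretely why a trained GNN is tied to its graph's feature space, here is a minimal message-passing sketch in plain NumPy. All names, dimensions, and the random "trained" weights are illustrative, not from any real model: a projection matrix learned for 64-dimensional node features simply cannot be applied to a new graph whose nodes carry 128-dimensional features.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One message-passing step: mean-aggregate neighbor features, then project.

    adj:    (n, n) adjacency matrix with self-loops
    feats:  (n, d_in) node feature matrix
    weight: (d_in, d_out) learned projection (fixed at training time)
    """
    deg = adj.sum(axis=1, keepdims=True)   # node degrees (>= 1 due to self-loops)
    agg = (adj @ feats) / deg              # mean aggregation over neighbors
    return np.maximum(agg @ weight, 0.0)   # linear projection + ReLU

rng = np.random.default_rng(0)

# Hypothetical weights "trained" on a graph whose nodes have 64-dim features.
weight = rng.normal(size=(64, 32))

# The original graph: 5 nodes, 64-dim features -- the layer applies cleanly.
adj = np.eye(5) + (rng.random((5, 5)) < 0.3)
out = gcn_layer(adj, rng.normal(size=(5, 64)), weight)
print(out.shape)  # (5, 32)

# A new graph with 128-dim features: the fixed weight matrix no longer fits,
# so the whole model must be re-trained from scratch for this feature space.
try:
    gcn_layer(np.eye(4), rng.normal(size=(4, 128)), weight)
except ValueError:
    print("feature-space mismatch: model cannot transfer")
```

A graph foundation model has to remove exactly this dependence on a fixed feature (and label) space, which is what the rest of the post is about.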

Today, we explore the possibility of designing a single model that can excel on interconnected relational tables and at the same time generalize to any arbitrary set of tables, features, and tasks without additional training. We are excited to share our recent progress on developing such graph foundation models (GFM) that push the frontiers of graph learning and tabular ML well beyond standard baselines.
