Unsupervised Graph Neural Networks: Training Strategies and Evaluation Principles

  • Will Leeney

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

Abstract

Attributed graphs are a fundamental data representation that captures both structural connectivity and feature information. The task of community detection is to find collections of similar nodes within a graph. Identifying the latent communities in a network with associated features has many practical applications, from misinformation detection in social networks to genomic feature discovery in genes. Graph Neural Networks (GNNs) can be trained to detect communities by learning an inductive model that represents a network in lower dimensions. Unfortunately, we find that current comparisons of performance can be misleading because experimental protocols, including hyperparameter optimisation and model selection, are ambiguous. In this thesis, we make three key contributions to provide a foundation for better ways to evaluate and select models in this field. (1) We propose a framework for more reliable evaluations. The proposed framework is used to demonstrate the importance of hyperparameter optimisation for experimental comparisons of models. However, we find that performance rankings are subject to randomness that is not currently adequately quantified. Therefore, we define metrics for empirically quantifying the sensitivity of performance comparisons to randomness. We show that the randomness coefficient based on the Wasserstein distance provides the most accurate assessment of randomness for generalisation outside the scope of an evaluation. (2) In the real world, community detection is typically performed for applications where the ground truth is unavailable. However, typical pipelines for GNN optimisation use labels to aid model selection, rendering these methods unusable. We show that it is possible to train GNNs with modularity to bypass supervised optimisation in model selection or hyperparameter tuning. Modularity, which is an unsupervised metric, is shown to predict the ground-truth performance of GNNs on a range of attributed graph benchmark datasets. (3) To demonstrate that the framework can be extended to a new scenario for community detection, we show that it can be easily used when multiple clients are collaborating on learning. This situation, known as federated learning, in which multiple clients collaborate on (unsupervised) learning with data remaining on device, is becoming more pervasive. Graph-structured data adds difficulty to federated learning because connectivity information between clients is lost. Federated weight aggregation is non-trivial in unsupervised graph learning, as it is unknown how much the supervisory signal affects the ability of the federation to collaborate in the graph setting. Therefore, we use the proposed framework to reliably quantify the extent to which a federated solution is able to learn and collaborate effectively beyond that of non-federated models. Overall, this thesis establishes a guiding framework for training and evaluation procedures of unsupervised GNNs to ensure meaningful comparisons in single- and multiple-client setups.
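The abstract names modularity as the unsupervised metric used for model selection but does not define it. As a rough illustration only, the sketch below computes the standard Newman modularity of a hard community assignment on an undirected, unweighted graph. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for this example; they are not taken from the thesis, and the thesis may use a different formulation (for instance, a differentiable relaxation for training).

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of a hard partition.

    edges  : list of (u, v) pairs, undirected, unweighted, no self-loops
             (illustrative input format, assumed for this sketch)
    labels : dict mapping each node to its community id

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # L_c: edges with both endpoints in c
    degree = defaultdict(int)  # d_c: summed degree of nodes in c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single_group = {node: "a" for node in range(6)}

q_triangles = modularity(edges, by_triangle)
q_single = modularity(edges, single_group)
```

In this toy case the partition that follows the two triangles scores higher than placing every node in one community, which is the kind of label-free signal that could be used to compare candidate models or hyperparameter settings when ground-truth communities are unavailable.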
Date of Award: 1 Oct 2024
Original language: English
Awarding Institution
  • University of Bristol
Supervisors: Ryan McConville & Weiru Liu

Keywords

  • GNNs
  • Clustering
  • Community Detection
  • Machine Learning
  • AI
  • artificial intelligence
  • Graphs
