AbstractGraphs occurring in the real world usually exhibit a high level of order and organisation: higher concentration of edges within the same group of vertices, and lower concentration among different groups. A common way to analyse these graphs is to partition the vertex set of a graph into clusters according to some connectivity measure. Graph clustering has been widely applied to many fields of computer science, from machine learning to bioinformatics and social network analysis.
The focus of this thesis is to design and analyse algorithms for partitioning graphs presenting a strong cluster-structure, which we call well-clustered. We first study the spectral properties of the Laplacian matrix of such graphs, and prove a structure theorem that relates the eigenvectors corresponding to the smallest eigenvalues of the Laplacian matrix of a graph to the structure of its clusters.
We then harness this theorem to analyse Spectral Clustering, arguably the most popular graph clustering algorithm. We give for the first time approximation guarantees on the number of misclassified vertices by Spectral Clustering when applied to well-clustered graphs.
Since Spectral Clustering needs to compute as many eigenvectors of the Laplacian matrix as the number of clusters in the graph, its performance deteriorates as this number grows. We present an algorithm that overcomes this issue without compromising its accuracy. This algorithm runs in time nearly linear in the number of the edges and independently of the number of clusters in the input graph.
Finally, we tackle the problem of partitioning a graph whose description is distributed among many sites. We present a distributed algorithm that works in a few synchronous rounds, requires limited communication complexity, and achieves the same guarantees of Spectral Clustering as long as the clusters are balanced in size.
|Date of Award
|25 Sept 2018
|He Sun (Supervisor) & Raphael Clifford (Supervisor)