Hierarchical clustering with dot products recovers hidden tree structure

Annie Gray, Alexander Modell, Patrick Rubin-Delanchy, Nick Whiteley

Research output: Contribution to conferenceConference Paperpeer-review

Abstract

In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model. The key technical innovations are to understand how hierarchical information in this model translates into tree geometry which can be recovered from data, and to characterise the benefits of simultaneously growing sample size and data dimension. We demonstrate superior tree recovery performance with real data over existing approaches such as UPGMA, Ward's method, and HDBSCAN.

Original languageEnglish
Publication statusPublished - 21 Sept 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 202316 Dec 2023

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period10/12/2316/12/23

Bibliographical note

Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.

Fingerprint

Dive into the research topics of 'Hierarchical clustering with dot products recovers hidden tree structure'. Together they form a unique fingerprint.

Cite this