Comparison of dimensionality reduction techniques for the visualisation of chemical space in organometallic catalysis

Mario Villares, Carla M Saunders*, Natalie Fey*

*Corresponding author for this work

Research output: Contribution to journal; Article (Academic Journal); peer-review

Abstract

We have used a Ligand Knowledge Base for bidentate P,P-donor ligands of potential interest to homogeneous catalysis to compare three dimensionality reduction techniques, namely Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE). While our previous work on Ligand Knowledge Bases has focused on PCA, here we compare this approach with more recently published approaches and assess the information retention, visualization, clustering and interpretability which can be achieved with each. We find that the potential advantages of t-SNE are not realized with a database of the current size (275 entries), and that there is a degree of complementarity between PCA and UMAP. The statistics underlying PCA rely on linear relationships, making interpretation of the resulting plots comparatively straightforward. Since much of chemistry relies on linear structure-property relationships and low-dimensional visualization, the explainability and information retention achieved are attractive. UMAP proved more challenging to interpret, but achieved clear clustering which was often chemically meaningful, and it would be a useful approach for ensuring that distinct subsets of compounds are sampled in a machine-learning context. This analysis also highlighted that the tunability of catalysis achieved through ligand exchange maps well onto some areas of chemical space where closely related ligands cluster, while other ligands represent outliers; these arise from different combinations of steric and electronic effects which chemists will find intuitive.
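As a rough illustration of why PCA is comparatively easy to interpret, the sketch below computes principal components as linear combinations of standardised descriptors and reports the fraction of variance each component retains. It is not the authors' workflow: the data are random numbers, the descriptor count of 10 is an arbitrary assumption (only the 275 entries come from the abstract), and the function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Returns the scores (projected coordinates), the explained-variance
    ratio of every component, and the loadings of the retained components.
    """
    # Standardise each descriptor column to zero mean and unit variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # The right singular vectors of the standardised data are the
    # principal axes; squared singular values are proportional to variance.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    loadings = Vt[:n_components]
    # Each score is a linear combination of the original descriptors.
    scores = Xs @ loadings.T
    return scores, explained, loadings


# Synthetic stand-in for a descriptor table: 275 rows (database size from
# the abstract) by 10 columns (assumed, not taken from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(275, 10))
# Make two columns strongly correlated so one component dominates.
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]

scores, explained, loadings = pca_project(X)
```

Because each row of `loadings` lists the weight of every original descriptor in a component, an axis of the resulting plot can be read directly in terms of those descriptors. Non-linear methods such as UMAP and t-SNE do not provide an equivalent set of weights.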
Original language: English
Article number: 100055
Number of pages: 10
Journal: Artificial Intelligence Chemistry
Volume: 2
Issue number: 1
Early online date: 17 Feb 2024
Publication status: Published, 1 Jun 2024

Research Groups and Themes

  • Physical & Theoretical
  • Inorganic & Materials

Keywords

  • Computational chemistry
  • Organometallic catalysis
  • Data science
  • Dimensionality reduction
  • Transition metal complexes
