Abstract
In this thesis, we model the world through objects and their relations, representing objects as embeddings, that is, vectors in a vector space. This representation allows us to express both properties of the objects and relations between them algebraically. We investigate the mathematical relations, specifically additive ones, between pairs of vectors that represent entities known to share a particular relationship, such as the male and female versions of nouns. To this end, we introduce two methods: (1) Correlation-based Compositionality Detection, which measures the correlation between known attributes of objects and their embeddings, and (2) Additive Compositionality Detection, which decomposes embeddings into an additive combination of individual vectors representing specific attributes.
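As a rough illustration of what an additive decomposition of this kind can look like, the sketch below fits one vector per attribute by ordinary least squares, so that each embedding is approximated by the sum of the vectors of its attributes. It is a minimal sketch under stated assumptions, not the procedure used in the thesis: the data are synthetic, the names `fit_additive_decomposition` and `reconstruction_r2` are hypothetical, and the shuffled-attribute baseline is only one simple way to check that the fit is not due to chance.

```python
import numpy as np


def fit_additive_decomposition(attributes, embeddings):
    """Least-squares solution W of attributes @ W ≈ embeddings.

    attributes: (n_objects, n_attributes) binary indicator matrix.
    embeddings: (n_objects, dim) embedding matrix.
    Row k of W is the vector assigned to attribute k.
    """
    W, *_ = np.linalg.lstsq(attributes, embeddings, rcond=None)
    return W


def reconstruction_r2(attributes, embeddings, W):
    """Share of embedding variance explained by the additive model."""
    residual = embeddings - attributes @ W
    centred = embeddings - embeddings.mean(axis=0)
    return 1.0 - (residual ** 2).sum() / (centred ** 2).sum()


# Synthetic data (an assumption for illustration): embeddings are built
# as the sum of per-attribute vectors plus a small amount of noise.
rng = np.random.default_rng(0)
n_objects, n_attributes, dim = 200, 4, 16
attributes = rng.integers(0, 2, size=(n_objects, n_attributes)).astype(float)
true_vectors = rng.normal(size=(n_attributes, dim))
embeddings = attributes @ true_vectors + 0.1 * rng.normal(size=(n_objects, dim))

W = fit_additive_decomposition(attributes, embeddings)
r2_real = reconstruction_r2(attributes, embeddings, W)

# Baseline: break the object-attribute pairing by shuffling the rows.
shuffled = rng.permutation(attributes, axis=0)
W_shuffled = fit_additive_decomposition(shuffled, embeddings)
r2_shuffled = reconstruction_r2(shuffled, embeddings, W_shuffled)

print(f"R^2 with true attributes:     {r2_real:.3f}")
print(f"R^2 with shuffled attributes: {r2_shuffled:.3f}")
```

If the embeddings are additively compositional in the given attributes, the fit with the true attribute assignment should explain far more variance than the fit with shuffled assignments. Repeating the shuffle many times would give an empirical null distribution for the score.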
We find that word embeddings can be interpreted as partly composed of semantic and morphological information, and that sentence embeddings can be partly interpreted as the sum of individual word embeddings. Similarly, graph embeddings in recommender systems reflect the sum of a user’s demographic attributes. Our methods offer improvements over previous approaches for decomposing embeddings by (1) being more general, as they can be applied across multiple embedding types, (2) providing quantitative insights into the decomposition process, and (3) offering a statistically robust metric for evaluating the additive compositional structure of embeddings.
Since these properties are not explicitly learned during training, there is a risk that sensitive personal information could be inferred from user behaviour, potentially leading to bias. Our compositionality detection method introduces new quantitative approaches for detecting sensitive information at both the group and individual levels. To mitigate this issue, we developed EXTRACT: a suite of explainable and transparent methods designed to control bias in knowledge graph embeddings by assessing and reducing the implicit encoding of protected information.
We introduce a novel application of our methods to the field of music, modelling it through n-grams of melodies and their relationships. By analysing the properties encoded within the resulting musical n-gram embeddings, we are able to identify both the genre and the composer in a compositional way. To address potential bias in our analysis, we remove the genre information, which is treated as unwanted information in our setting.
Finally, we leverage the rich semantics of embeddings to uncover deeper, latent relations within the social network of medical influencers, specifically using embeddings to infer relations between users who share similar identities or engage with the same topics.
Date of Award | 4 Feb 2025 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Nello Cristianini (Supervisor), Edwin D. Simpson (Supervisor) & Martha Lewis (Supervisor) |