SICA: subjectively interesting component analysis

Bo Kang*, Jefrey Lijffijt, Raúl Santos-Rodríguez, Tijl de Bie

*Corresponding author for this work

Research output: Contribution to journal; Article (Academic Journal); peer-review


Abstract

The information in high-dimensional datasets is often too complex for human users to perceive directly. Hence, it may be helpful to use dimensionality reduction methods to construct lower-dimensional representations that can be visualized. The natural question is how to construct the most informative low-dimensional representation. We study this question from an information-theoretic perspective and introduce a new method for linear dimensionality reduction. The resulting model, which quantifies informativeness, also allows us to flexibly account for prior knowledge a user may have about the data. This enables us to provide representations that are subjectively interesting. We call the method Subjectively Interesting Component Analysis (SICA) and expect it to be mainly useful for iterative data mining. SICA is based on a model of a user’s belief state about the data. This belief state is used to search for surprising views. The initial state is chosen by the user (it may be empty up to the data format) and is updated automatically as the analysis progresses. We study several types of prior beliefs: if a user only knows the scale of the data, SICA yields the same cost function as Principal Component Analysis (PCA), while if a user expects the data to have outliers, we obtain a variant that we term t-PCA. Finally, scientifically more interesting variants are obtained when a user has more complicated beliefs, such as knowledge about similarities between data points. The experiments suggest that SICA enables users to find subjectively more interesting representations.
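
The PCA special case mentioned in the abstract can be illustrated with a minimal sketch. This is not the SICA method itself, whose objective is defined in the paper. It only shows the variance-maximizing linear projection that, according to the abstract, SICA coincides with when the user knows only the scale of the data. The function names, the power-iteration solver, and the synthetic data are illustrative assumptions.

```python
import math
import random


def center(rows):
    """Subtract the column means so the data has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def projected_variance(rows, w):
    """Variance of centered data projected onto the unit vector w."""
    return sum(sum(x * wj for x, wj in zip(r, w)) ** 2 for r in rows) / len(rows)


def top_component(rows, iterations=200, seed=0):
    """Power iteration on the scatter matrix X^T X (rows must be centered)."""
    rng = random.Random(seed)
    d = len(rows[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Compute X^T (X w) without forming the d x d matrix explicitly.
        scores = [sum(x * wj for x, wj in zip(r, w)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


# Synthetic data (hypothetical): wide spread on the first axis, narrow on the second.
rng = random.Random(42)
data = center([[rng.gauss(0, 3.0), rng.gauss(0, 0.5)] for _ in range(500)])
w = top_component(data)
print("direction:", w)
print("projected variance:", projected_variance(data, w))
```

On this synthetic data the recovered direction should align closely with the first axis, since that is where most of the variance lies. A user whose prior beliefs already account for that spread would, in the terms of the abstract, find such a view less surprising, which is the situation the other SICA variants are meant to address.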

Original language: English
Number of pages: 39
Journal: Data Mining and Knowledge Discovery
Early online date: 8 Mar 2018
Publication status: E-pub ahead of print - 8 Mar 2018

Keywords

  • Dimensionality reduction
  • Exploratory data mining
  • FORSIED
  • Information theory
  • Subjective interestingness
