Distributed, MapReduce-Based Nearest Neighbor and E-Ball Kernel k-Means

Ioannis Pitas, Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis

Research output: Chapter in Book/Report/Conference proceedingConference Contribution (Conference Proceeding)

213 Downloads (Pure)

Abstract

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the
so-called k-Means. It is very popular, however, it is also unable to handle cases in which the clusters are not linearly separable. Kernel k-Means is a state of the art clustering algorithm, which employs the kernel trick, in order to perform clustering on a higher dimensionality space, thus overcoming the limitations of
classic k-Means regarding the non linear separability of the input data. Kernel k-Means typically computes the kernel matrix, which contains the results of the kernel function for every possible sample combination. This matrix can be viewed as the weight matrix of a full graph, where the samples are the vertices and the
edges are weighed according to the similarity between the samples they connect, according to the kernel function. In this context, it is possible to work on the Nearest Neighbor graph, where each sample is only connected to some of its closest samples, or only using information from samples that are sufficiently close to each other, referred to as -ball. Doing so reduces the size of the kernel
matrix and can provide improved clustering results. In this paper, we present a MapReduce based distributed implementation of Nearest Neighbor and -ball Kernel k-Means.
Original languageEnglish
Title of host publication2015 IEEE Symposium Series on Computational Intelligence (SSCI 2015)
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages509-515
Number of pages7
ISBN (Electronic)9781479975617
ISBN (Print)9781479975600
DOIs
Publication statusPublished - Jan 2016
Event2015 IEEE Symposium Series on Computational Intelligence - Cape Town, South Africa
Duration: 7 Dec 201510 Dec 2015

Conference

Conference2015 IEEE Symposium Series on Computational Intelligence
Abbreviated titleSSCI 2015
CountrySouth Africa
CityCape Town
Period7/12/1510/12/15

Fingerprint

Dive into the research topics of 'Distributed, MapReduce-Based Nearest Neighbor and E-Ball Kernel k-Means'. Together they form a unique fingerprint.

Cite this