Cluster structure evaluation of dyadic k-means for mining large image archives
13 March 2003
In many data mining and knowledge discovery in databases applications, clustering methods are used for data reduction. When the amount of data grows, as in image information mining, where gigabytes of data have to be processed, many existing clustering algorithms cannot be applied because of their high computational complexity. To overcome this limitation, we developed an efficient clustering algorithm called dyadic k-means, a modified and enhanced version of traditional k-means. Whereas k-means has a computational complexity of O(nk) for n samples and k clusters, dyadic k-means has a complexity of O(n log k). Our algorithm is particularly efficient for grouping very large data sets into a large number of clusters. In this article we present statistically based methods for the objective evaluation of the clusters obtained by dyadic k-means. The main focus is on how well the clusters describe the distribution of data points in a multi-dimensional feature space and how much information can be obtained from the clusters. We consider both how the samples fill the feature space and how well the clusters produced by dyadic k-means characterize this configuration. We use the well-established scatter matrices to measure the compactness and separability of the clustered groups in the feature space. The probability of error, another indicator of how well the clusters characterize the samples in the feature space, is also calculated for each point. This probability expresses the relationship of each point to its cluster and can therefore be regarded as a measure of cluster reliability. We test the evaluation methods on both a synthetic and a real-world data set.
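The scatter-matrix evaluation mentioned in the abstract can be illustrated with a short sketch. The abstract does not state which criterion the authors derive from the scatter matrices, so the trace ratio below is an assumption chosen for illustration, and the function names `scatter_matrices` and `separability` are hypothetical. The code computes the standard within-cluster scatter S_w (compactness) and between-cluster scatter S_b (separation) for any labelled clustering, not specifically for dyadic k-means.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Return (S_w, S_b) for samples X of shape (n, d) and cluster labels of shape (n,)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)

    S_w = np.zeros((d, d))  # within-cluster scatter: spread around each cluster mean
    S_b = np.zeros((d, d))  # between-cluster scatter: spread of cluster means
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b


def separability(X, labels):
    """Assumed illustrative criterion: trace(S_b) / trace(S_w); larger means
    more compact, better separated clusters. Undefined if every cluster has
    zero spread (trace(S_w) == 0)."""
    S_w, S_b = scatter_matrices(X, labels)
    return np.trace(S_b) / np.trace(S_w)


if __name__ == "__main__":
    # Two small, well-separated groups in a 2-D feature space (toy data).
    X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    labels = np.array([0, 0, 1, 1])
    print(separability(X, labels))
```

A useful sanity check on any such implementation is that S_w + S_b equals the total scatter of the data around its overall mean, whatever the cluster assignment.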
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Herbert Daschiel and Mihai P. Datcu "Cluster structure evaluation of dyadic k-means for mining large image archives", Proc. SPIE 4885, Image and Signal Processing for Remote Sensing VIII, (13 March 2003);

