Clustering analysis is one of the well-developed branches of data mining. Its goal is to find hidden
structures in a multidimensional space called the feature space or pattern space. Each datum in this
space is usually a vector whose elements are specifically selected features, typically chosen because
they are effective for the problem at hand. Clustering methods generally fall into two categories:
agglomerative and divisive. Agglomerative clustering is a bottom-up process that starts by treating
each datum as a singleton cluster, whereas divisive clustering is a top-down process that starts by
treating the entire data set as one cluster. According to the literature collected, divisive
clustering currently dominates both applications and research. Although several well-known divisive
clustering methods have been designed and refined, the clustering problem is still far from solved.
The k-means algorithm, the classic method of this kind, requires key parameters to be fixed in
advance, such as the number of clusters and the initial positions of the cluster prototypes, and
these choices may be unreasonable in some situations. Beyond this initialization problem, k-means may
converge to a local optimum, assigns points to clusters rigidly, and is poorly suited to non-Gaussian
distributions. Obtaining a good, natural clustering result therefore depends on one's understanding
of the concept of clustering itself: confusion about, or misunderstanding of, its definition leads to
unsatisfactory results, so the definition deserves deep and careful consideration.

This paper examines the nature of clustering, offers a way of understanding it, discusses a
methodology for designing clustering algorithms, and presents a new method, called relation chain
based clustering, for 2D patterns. The method shows that arbitrary distribution shape and density are
not the essential factors in clustering; in other words, clusters that are usually characterized by
particular shape- or density-specific expressions should instead be treated under one uniform
mathematical description, which this paper calls the "relation chain". A relation chain expresses the
relation between each pair of spatial points and evaluates the strength of the connection between
them. The algorithm first assigns an initial neighborhood evaluation radius to the points, then
increases the radius while assessing each clustering result by the inner-cluster variance of every
cluster, adjusts the radius accordingly, and finally outputs the clustering result. Experiments
conducted with the proposed method show that the hidden data structure is well explored.
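The relation chain procedure described above can be illustrated with a minimal Python sketch. The text does not define the pairwise relation or the rule for "adjusting the radius properly" precisely, so both are assumptions here. Two points are taken to be directly related when their Euclidean distance is at most the current radius, and a relation chain is a path of such direct relations, so clusters are the connected components of the neighborhood graph (the same partition a single-linkage cut at that radius gives). The stopping rule is hypothetical: the radius keeps growing until some cluster's inner-cluster variance would exceed a user-chosen bound `max_var`. The names `relation_chain_clusters`, `inner_cluster_variance`, `sweep_radius`, `r0`, `step`, `r_max`, and `max_var` are illustrative, not taken from the paper.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def relation_chain_clusters(points: list[Point], radius: float) -> list[int]:
    """Cluster labels under the ASSUMED relation: distance <= radius, chained transitively."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:  # directly related pair
            parent[find(i)] = find(j)                  # join their chains

    roots: dict[int, int] = {}  # map each root to a consecutive label 0, 1, 2, ...
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


def inner_cluster_variance(points: list[Point], labels: list[int], k: int) -> float:
    """Mean squared distance of cluster k's members to their centroid."""
    members = [p for p, lab in zip(points, labels) if lab == k]
    cx = sum(x for x, _ in members) / len(members)
    cy = sum(y for _, y in members) / len(members)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members) / len(members)


def sweep_radius(points: list[Point], r0: float, step: float, r_max: float, max_var: float):
    """HYPOTHETICAL rule: grow the radius from r0 and keep the last clustering whose
    clusters all have variance <= max_var (the starting radius is accepted as is)."""
    best_r, best = r0, relation_chain_clusters(points, r0)
    i = 1
    while r0 + i * step <= r_max:
        r = r0 + i * step  # computed from i to avoid float drift from repeated +=
        labels = relation_chain_clusters(points, r)
        if max(inner_cluster_variance(points, labels, k) for k in set(labels)) > max_var:
            break  # a chain now links dissimilar groups: stop growing
        best_r, best = r, labels
        i += 1
    return best_r, best


blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
radius, labels = sweep_radius(blobs, r0=0.5, step=0.5, r_max=20.0, max_var=5.0)
print(radius, labels)
```

On these two made-up groups of three points, the sweep grows the radius from 0.5 up to 13.0 and returns the two groups as separate clusters; at 13.5 a chain would link them into one high-variance cluster, so the sweep stops. Each radius step recomputes all pairwise distances, so this sketch only suits small 2D examples.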
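The k-means limitations listed above, dependence on the initial prototypes and convergence to a local optimum, can be reproduced with a plain Lloyd-style k-means written from scratch (not a library API). The six 2D points and both sets of initial prototypes are made up for illustration. With k = 3, prototypes spread across the three pairs of points reach the natural partition, while two prototypes placed inside the first pair leave the other four points in one cluster, a fixed point of the iteration with a much larger sum of squared errors (SSE).

```python
import math

Point = tuple[float, float]


def assign(points: list[Point], centers: list[Point]) -> list[int]:
    """Label each point with the index of its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k])) for p in points]


def kmeans(points: list[Point], prototypes: list[Point], max_iter: int = 100):
    """Plain Lloyd iteration started from the given initial prototypes."""
    centers = [tuple(c) for c in prototypes]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # move the prototype to the mean of its members
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:  # an empty cluster keeps its old prototype
                new_centers.append(old)
        if new_centers == centers:  # fixed point reached
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse


pts = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
good = kmeans(pts, [(0, 0), (10, 0), (20, 0)])  # one prototype per pair
bad = kmeans(pts, [(0, 0), (0, 1), (15, 0)])     # two prototypes inside the first pair
print(good[0], good[2])
print(bad[0], bad[2])
```

Both runs converge, but only the first reaches the three-pair partition (SSE 1.5, versus about 101 for the second), which is the initialization sensitivity and local-optimum behavior the text attributes to k-means.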