Paper
18 April 2006 Processing heterogeneous XML data from multi-source
Tong Wang, Da-Xin Liu, Wei Sun, Xuanzuo Lin
Abstract
The heterogeneity of XML data has recently become a new challenge. This paper proposes a novel clustering strategy that regroups heterogeneous XML sources, because searching a smaller space of mutually similar documents reduces cost. The strategy consists of four steps. First, path features are extracted and mapped into a High-Dimension Vector Space (HDVS). Second, during data preprocessing, two algorithms are applied to reduce redundancy in the XML sources. Third, the heterogeneous documents are clustered. Finally, Multivalued Dependency (MVD) is introduced, since MVD can be redefined according to the range of XML constraints. The paper also proposes a novel algorithm, based on rough sets, for discovering minimal MVDs in the presence of incomplete (non-integrity) data. It addresses the problem that incomplete XML data interferes with finding the MVDs of XML, so that patterns can be extracted from each cluster.
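The abstract does not specify how path features are defined or how they are mapped into the vector space, so the following is only a minimal sketch under stated assumptions: each document is represented by its set of root-to-leaf tag paths, each distinct path becomes one dimension of a 0/1 vector, and cosine similarity stands in for whatever similarity measure the paper's clustering step uses. The function names (`root_to_leaf_paths`, `to_vectors`, `cosine`) and the sample documents are hypothetical and do not come from the paper.

```python
import math
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            # A leaf element closes one complete path.
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def to_vectors(docs):
    """Map each document to a 0/1 vector over the union of all observed paths.

    Assumption: one dimension per distinct path, with no weighting.
    """
    path_sets = [root_to_leaf_paths(d) for d in docs]
    vocabulary = sorted(set().union(*path_sets))
    vectors = [[1 if p in s else 0 for p in vocabulary] for s in path_sets]
    return vocabulary, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sample documents from three sources.
docs = [
    "<book><title>A</title><author>B</author></book>",
    "<book><title>C</title><author>D</author><year>2006</year></book>",
    "<paper><heading>E</heading></paper>",
]
vocabulary, vectors = to_vectors(docs)
```

In this sketch the two `book` documents share two of their paths and score a high cosine similarity, while the `paper` document shares none and scores zero, which illustrates why grouping by path structure can shrink the space that a later search has to cover.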
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tong Wang, Da-Xin Liu, Wei Sun, and Xuanzuo Lin "Processing heterogeneous XML data from multi-source", Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420S (18 April 2006); https://doi.org/10.1117/12.666467
KEYWORDS
Databases
Vector spaces
Algorithms
Data integration
Evolutionary algorithms
Genetic algorithms
Head
