6 April 2000
Using closed itemsets in association rule mining with taxonomies
Petteri Sevon
Abstract
Taxonomies, or item hierarchies, are often useful in association rule mining. The most time-consuming subtask in rule mining is the discovery of frequent itemsets. A standard algorithm for frequent itemset discovery, such as Apriori, would, however, produce many redundant itemsets when taxonomies are used, because any two itemsets that share their most specific items always match the same set of transactions. An efficient algorithm must therefore choose a canonical representative for each equivalence class of itemsets. Srikant and Agrawal solved the redundancy problem in their Cumulate and Stratify algorithms by not allowing an itemset to contain both an item and its ancestor. In this paper I present a new algorithm, Closed Sets, for finding frequent itemsets in the presence of taxonomies. The algorithm requires itemsets to be closed under the taxonomies: if an item is a member of a closed itemset, then so are all its ancestors. The algorithm makes fewer passes over the database than Stratify, and it prunes the search space optimally. This is not the case with Cumulate, and not even with Stratify, if there are items in the taxonomy with multiple parents. Furthermore, only modest modifications to Apriori are needed.
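The closure idea in the abstract can be illustrated with a minimal sketch. This is not the paper's Closed Sets algorithm, whose details the abstract does not give. It only shows what it means for an itemset to be closed under a taxonomy, and why itemsets that share their most specific items match the same transactions. The toy taxonomy `PARENTS`, the item names, and the helper functions `ancestors`, `close`, `most_specific`, and `support` are all hypothetical and chosen for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy taxonomy: item -> set of direct parents.
# "ski_pants" has two parents, so the taxonomy is a DAG rather than a tree.
PARENTS: Dict[str, Set[str]] = {
    "jacket": {"outerwear"},
    "ski_pants": {"outerwear", "sportswear"},
    "outerwear": {"clothes"},
    "sportswear": {"clothes"},
}


def ancestors(item: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `item`, following all parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(item, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def close(itemset: Iterable[str], parents: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Close an itemset under the taxonomy: add all ancestors of every item."""
    closed = set(itemset)
    for item in list(closed):
        closed |= ancestors(item, parents)
    return frozenset(closed)


def most_specific(itemset: Iterable[str], parents: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Keep only the items that are not an ancestor of another item in the set."""
    items = set(itemset)
    covered: Set[str] = set()
    for item in items:
        covered |= ancestors(item, parents)
    return frozenset(items - covered)


def support(itemset: Iterable[str], transactions: List[Set[str]],
            parents: Dict[str, Set[str]]) -> int:
    """Count transactions whose taxonomy closure contains the closed itemset."""
    target = close(itemset, parents)
    return sum(1 for t in transactions if target <= close(t, parents))


# Hypothetical transactions containing leaf items only.
TRANSACTIONS: List[Set[str]] = [{"jacket"}, {"ski_pants"}, {"jacket", "ski_pants"}]
```

In this sketch, `{"jacket"}` and `{"jacket", "clothes"}` have the same most specific items and the same closure, so they belong to one equivalence class and `support` returns the same count for both. The closed itemset serves as the canonical representative of that class.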
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Petteri Sevon "Using closed itemsets in association rule mining with taxonomies", Proc. SPIE 4057, Data Mining and Knowledge Discovery: Theory, Tools, and Technology II, (6 April 2000); https://doi.org/10.1117/12.381729
KEYWORDS
Taxonomy

Databases

Mining

Knowledge discovery

Statistical analysis

Binary data

Chlorine
