Experimental Datasets

Genetic-Fuzzy Mining with Taxonomy

Chun-Hao Chen

Department of Computer Science and Information Engineering
Tamkang University, Taipei, 251, Taiwan, R.O.C.
chchen@mail.tku.edu.tw

Tzung-Pei Hong

Department of Computer Science and Information Engineering
National University of Kaohsiung, Kaohsiung, 811, Taiwan, R.O.C.
Department of Computer Science and Engineering
National Sun Yat-sen University, Kaohsiung, 804, Taiwan, R.O.C.
tphong@nuk.edu.tw (corresponding author)

Yeong-Chyi Lee

Department of Information Management
Cheng Shiu University, Kaohsiung, Taiwan, R.O.C.
yeongchyi@csu.edu.tw

Abstract

Data mining is most commonly used to induce association rules from transaction data. Since transactions in real-world applications usually consist of quantitative values, many fuzzy association-rule mining approaches have been proposed for single or multiple concept levels. However, the predefined membership functions can critically influence the final mining results. In this paper, we propose a multiple-level genetic-fuzzy mining algorithm that mines both membership functions and fuzzy association rules over multiple concept levels. It first encodes the membership functions of each item class (category) into a chromosome according to the given taxonomy. The fitness of each individual is then evaluated from the total number of large 1-itemsets of the items at all concept levels and from the suitability of the membership functions in the chromosome. After the GA terminates, a more suitable set of membership functions is obtained, and a better set of multiple-level fuzzy association rules can then be expected. Experimental results on simulated data show the effectiveness of the algorithm.

Keywords: data mining, genetic algorithm, multiple concept levels, membership function, fuzzy association rule.
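The abstract describes the chromosome and fitness only at a high level. The following Python sketch shows one way the pieces could fit together; every concrete choice in it is an assumption, not the authors' implementation. Membership functions are assumed to be symmetric triangles stored as (center, half_span) pairs per item class; a generalized item's quantity is assumed to be the sum of its children's quantities; fitness is taken as the number of large 1-itemsets summed over all concept levels divided by a total suitability; and `suitability` is a simplified, hypothetical overlap penalty. All function and item names (`lift`, `large_1_itemsets`, `fitness`, `milk_a`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (assumption, not from the paper): one chromosome maps
# each item class to its triangular membership functions, each stored as a
# (center, half_span) pair with half_span > 0.
MF = Tuple[float, float]
Chromosome = Dict[str, List[MF]]
Transaction = Dict[str, float]  # item name -> purchased quantity


def triangular(x: float, center: float, half_span: float) -> float:
    """Membership degree of quantity x in a symmetric triangular fuzzy set."""
    return max(0.0, 1.0 - abs(x - center) / half_span)


def lift(transactions: List[Transaction], parent: Dict[str, str]) -> List[Transaction]:
    """Move every item one level up; children's quantities are summed (assumed rule)."""
    lifted = []
    for t in transactions:
        up: Transaction = {}
        for item, qty in t.items():
            up[parent[item]] = up.get(parent[item], 0.0) + qty
        lifted.append(up)
    return lifted


def large_1_itemsets(transactions: List[Transaction], item_class: Dict[str, str],
                     chrom: Chromosome, min_support: float) -> int:
    """Count (item, fuzzy region) pairs whose fuzzy support reaches min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    count = 0
    for item in items:
        for center, span in chrom[item_class[item]]:
            support = sum(triangular(t[item], center, span)
                          for t in transactions if item in t) / n
            if support >= min_support:
                count += 1
    return count


def suitability(mfs: List[MF]) -> float:
    """Hypothetical penalty (>= 1) that grows when neighbouring triangles overlap too much."""
    mfs = sorted(mfs)
    penalty = 1.0
    for (c1, s1), (c2, s2) in zip(mfs, mfs[1:]):
        overlap = max(0.0, (c1 + s1) - (c2 - s2))
        penalty += max(0.0, overlap / min(s1, s2) - 1.0)
    return penalty


def fitness(levels: List[List[Transaction]], item_class: Dict[str, str],
            chrom: Chromosome, min_support: float) -> float:
    """Large 1-itemsets summed over all concept levels, divided by total suitability."""
    total_large = sum(large_1_itemsets(tx, item_class, chrom, min_support) for tx in levels)
    total_suit = sum(suitability(mfs) for mfs in chrom.values())
    return total_large / total_suit


# Tiny two-level example with hypothetical item names.
parent = {"milk_a": "milk", "milk_b": "milk", "bread_a": "bread"}
item_class = {"milk_a": "milk", "milk_b": "milk", "milk": "milk",
              "bread_a": "bread", "bread": "bread"}
chrom = {"milk": [(1.0, 2.0), (3.0, 2.0)], "bread": [(1.0, 2.0), (3.0, 2.0)]}
leaves = [{"milk_a": 1, "bread_a": 3}, {"milk_b": 2}, {"milk_a": 3, "milk_b": 1}]
print(fitness([leaves, lift(leaves, parent)], item_class, chrom, min_support=0.2))  # → 3.5
```

Dividing by suitability rewards chromosomes that produce many large 1-itemsets without heavily overlapping fuzzy regions. In this toy chromosome, neighbouring triangles overlap by exactly one half-span, which the sketch does not penalize, so each class contributes a suitability of 1 and the 7 large 1-itemsets (4 at the leaf level, 3 one level up) give a fitness of 3.5.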

Description of the Experimental Datasets:

The taxonomy had three levels: 64 purchased items (terminal nodes) on level 3, 16 generalized items on level 2, and four generalized items on level 1. Each non-terminal node had four branches, and only terminal nodes could appear in transactions. To generate each transaction, the number of purchased items was first chosen randomly, and then the purchased items and their quantities were generated. An item could not appear twice in the same transaction.

The proposed algorithm was run on datasets with different numbers of transactions. In total, five datasets with 10k, 30k, 50k, 70k, and 90k transactions were used to evaluate the proposed approach. The experiments were first conducted on the dataset with 10k transactions.
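The dataset description can be sketched as a small generator. This Python sketch rebuilds the stated taxonomy shape (four level-1 nodes, four branches per non-terminal node, 64 terminals) and draws each transaction by first choosing how many items it holds, then sampling distinct terminal items and their quantities. The node naming scheme, the bounds `max_items` and `max_qty`, the uniform distributions, and the seed are all assumptions because the text does not specify them, so the output will not reproduce the authors' files; `build_taxonomy` and `generate_transactions` are hypothetical names.

```python
import random
from typing import Dict, List

BRANCHES = 4  # from the text: every non-terminal node has four branches


def build_taxonomy(levels: int = 3, branches: int = BRANCHES) -> List[List[str]]:
    """Node names per level; a name such as '2-1-4' encodes its path (hypothetical scheme)."""
    tree = [[str(i) for i in range(1, branches + 1)]]
    for _ in range(levels - 1):
        tree.append([f"{p}-{c}" for p in tree[-1] for c in range(1, branches + 1)])
    return tree


def generate_transactions(n: int, terminals: List[str], max_items: int = 10,
                          max_qty: int = 10, seed: int = 0) -> List[Dict[str, int]]:
    """Random transactions over terminal items only; max_items and max_qty are assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k = rng.randint(1, max_items)         # first: how many items this transaction holds
        items = rng.sample(terminals, k)      # then: k distinct items (no item appears twice)
        data.append({item: rng.randint(1, max_qty) for item in items})  # and their quantities
    return data


taxonomy = build_taxonomy()
print([len(level) for level in taxonomy])  # → [4, 16, 64]
first = generate_transactions(10_000, taxonomy[-1], seed=1)
```

The other four dataset sizes would come from the same call with `n` set to 30_000, 50_000, 70_000, and 90_000. Sampling without replacement via `random.Random.sample` is what enforces the rule that an item cannot appear twice in one transaction.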
