Genetic-Fuzzy Mining with Taxonomy
Chun-Hao Chen
Department of
Computer Science and
chchen@mail.tku.edu.tw
Tzung-Pei Hong
Department of Computer Science and Information Engineering
National University of Kaohsiung, Kaohsiung, 811, Taiwan, R.O.C.
Department of Computer Science and Engineering
National Sun Yat-sen University, Kaohsiung, 804, Taiwan, R.O.C.
tphong@nuk.edu.tw (corresponding author)
Yeong-Chyi Lee
Department of Information Management
Cheng Shiu University,
yeongchyi@csu.edu.tw
Abstract
Data mining is most commonly used to induce association rules from
transaction data. Since transactions in real-world applications usually
consist of quantitative values, many fuzzy association-rule mining approaches
have been proposed for single or multiple concept levels. However, the given
membership functions may have a critical influence on the final mining
results. In this paper, we propose a multiple-level genetic-fuzzy mining
algorithm that derives both membership functions and fuzzy association rules
across multiple concept levels. It first encodes the membership functions of each item
class (category) into a chromosome according to the given taxonomy. The fitness
value of each individual is then evaluated from two factors: the number of
large 1-itemsets, summed over the items at the different concept levels, and
the suitability of the membership functions in the chromosome. After the GA
process terminates, a better set of multiple-level fuzzy association rules can
be expected with a more suitable set of membership functions. Experimental
results on a simulated dataset also show the effectiveness of the algorithm.
Keywords: data mining, genetic algorithm,
multiple-concept levels, membership function, fuzzy association rule.
Description of the Experimental Datasets:
There were 64 purchased items (terminal
nodes) on level 3, 16 generalized items on level 2, and four generalized items
on level 1. Each non-terminal node had four branches, and only the terminal
nodes could appear in transactions. To generate each transaction, the number
of purchased items was first randomly chosen, and the purchased items and
their quantities were then generated. An item could not be generated twice in
a transaction. The proposed algorithm was run on a total of five datasets of
different sizes (10k, 30k, 50k, 70k, and 90k transactions). The experiments
were first made on the dataset with 10k transactions.
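
The generation procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the generator used in the experiments. The function names (`ancestors`, `roll_up`, `generate_transactions`), the integer numbering of items, the bounds `max_items` and `max_qty`, and the use of uniform random draws are all assumptions, since the text does not specify them. Summing quantities when an item is generalized to a higher level is likewise an assumed convention.

```python
import random

BRANCHING = 4   # each non-terminal node has four branches
LEVELS = 3      # level 1 is the most general, level 3 holds terminal items


def ancestors(item):
    """Return the (level-1, level-2) node ids of a terminal item in 0..63.

    Assumed numbering: terminal items 4*j .. 4*j+3 are the children of
    level-2 node j, and level-2 nodes 4*i .. 4*i+3 are the children of
    level-1 node i.
    """
    level2 = item // BRANCHING      # one of 16 generalized items
    level1 = level2 // BRANCHING    # one of 4 generalized items
    return level1, level2


def roll_up(transaction, level):
    """Generalize a transaction to the given level (1, 2, or 3).

    Quantities of items sharing an ancestor are summed (an assumption).
    """
    divisor = BRANCHING ** (LEVELS - level)
    totals = {}
    for item, qty in transaction.items():
        node = item // divisor
        totals[node] = totals.get(node, 0) + qty
    return totals


def generate_transactions(n, max_items=10, max_qty=10, seed=0):
    """Generate n transactions as {terminal_item: quantity} dictionaries.

    max_items and max_qty are hypothetical bounds chosen for illustration.
    """
    rng = random.Random(seed)
    n_terminal = BRANCHING ** LEVELS            # 64 purchased items
    data = []
    for _ in range(n):
        k = rng.randint(1, max_items)           # number of items, chosen first
        items = rng.sample(range(n_terminal), k)  # sampling without repeats
        data.append({item: rng.randint(1, max_qty) for item in items})
    return data
```

Calling `generate_transactions(10_000)` would give a dataset of the smallest size used here, and `roll_up(t, 2)` or `roll_up(t, 1)` maps a transaction `t` onto the 16 or four generalized items. Sampling without replacement enforces the rule that an item cannot appear twice in a transaction.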