Rule extraction - the key to accurate and comprehensible data mining models
2004 (English). Licentiate thesis, monograph (Other academic)
The primary goal of predictive modeling is to achieve high accuracy when the model is applied to novel data. For certain problems this requires the use of complex techniques, such as neural networks, resulting in opaque models that are hard or impossible to interpret. For some domains this is unacceptable, since the model needs to be comprehensible. To achieve comprehensibility, accuracy is often sacrificed by using simpler models; this is termed the accuracy vs. comprehensibility tradeoff. In this thesis the tradeoff is studied in the context of data mining and decision support. The suggested solution is to transform high-accuracy opaque models into comprehensible models by applying rule extraction. This approach is contrasted with standard methods that generate transparent models directly from the data set. Using a number of case studies, it is shown that the application of rule extraction generally results in higher accuracy and comprehensibility.
Although several rule extraction algorithms exist and there are well-established evaluation criteria (i.e., accuracy, comprehensibility, fidelity, scalability and generality), no existing algorithm meets all criteria. To counter this, a novel algorithm for rule extraction, named G-REX (Genetic Rule EXtraction), is suggested. G-REX uses an extraction strategy based on genetic programming, where the fitness function directly measures the quality of the extracted model in terms of accuracy, fidelity and comprehensibility, thus making it possible to explicitly control the accuracy vs. comprehensibility tradeoff. To evaluate G-REX, experience is drawn from several case studies where G-REX has been used to extract rules from different opaque representations, e.g., neural networks, ensembles and boosted decision trees. The case studies fall into two categories: a data mining problem in the marketing domain, which is studied extensively, and several well-known benchmark problems. The results show that G-REX, with its high flexibility regarding the choice of representation language and inherent ability to handle the accuracy vs. comprehensibility tradeoff, meets the proposed criteria well.
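The idea of a fitness function that trades accuracy and fidelity against comprehensibility can be illustrated with a minimal sketch. This is not the fitness function defined in the thesis: the function name `rule_fitness`, the weights, and the use of rule size as a stand-in for comprehensibility are all assumptions made for illustration only. Fidelity is taken here as the fraction of instances where the candidate rule agrees with the opaque model, and accuracy as the fraction where it agrees with the true labels.

```python
from typing import Callable, Sequence

# Hypothetical type: a candidate rule maps one instance to a class label.
Rule = Callable[[Sequence[float]], int]


def rule_fitness(
    rule: Rule,
    rule_size: int,
    instances: Sequence[Sequence[float]],
    opaque_predictions: Sequence[int],
    true_labels: Sequence[int],
    w_fidelity: float = 1.0,   # assumed weight, not from the thesis
    w_accuracy: float = 0.5,   # assumed weight, not from the thesis
    w_size: float = 0.01,      # assumed penalty per rule element
) -> float:
    """Score a candidate rule; higher is better.

    Rewards agreement with the opaque model (fidelity) and with the
    true labels (accuracy), and penalizes large rules as a rough
    proxy for lost comprehensibility.
    """
    n = len(instances)
    outputs = [rule(x) for x in instances]
    fidelity = sum(o == p for o, p in zip(outputs, opaque_predictions)) / n
    accuracy = sum(o == y for o, y in zip(outputs, true_labels)) / n
    return w_fidelity * fidelity + w_accuracy * accuracy - w_size * rule_size


# Toy usage: a one-condition rule scored against made-up predictions.
instances = [[0.2], [0.7], [0.9], [0.4]]
opaque_predictions = [0, 1, 1, 1]
true_labels = [0, 1, 0, 0]


def simple_rule(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


score = rule_fitness(simple_rule, 3, instances, opaque_predictions, true_labels)
```

In a sketch like this, raising `w_size` favors shorter rules at the expense of fidelity and accuracy, which is one way the tradeoff could be controlled explicitly inside a genetic programming search.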
Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2004. 140 p.
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1095
Identifiers
URN: urn:nbn:se:liu:diva-42644
Local ID: 67459
ISBN: 91-7373-960-X
OAI: oai:DiVA.org:liu-42644
DiVA: diva2:263501
2004-06-11, Sal G100, Högskolan i Skövde, Skövde, 13:00 (Swedish)