Rule extraction - the key to accurate and comprehensible data mining models
Linköping University, Department of Computer and Information Science, MDALAB - Human Computer Interfaces. Linköping University, The Institute of Technology.
2004 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

The primary goal of predictive modeling is to achieve high accuracy when the model is applied to novel data. For certain problems this requires complex techniques, such as neural networks, which produce opaque models that are hard or impossible to interpret. In some domains this is unacceptable, since the model needs to be comprehensible. To achieve comprehensibility, accuracy is often sacrificed by using simpler models; this is termed the accuracy vs. comprehensibility tradeoff. In this thesis the tradeoff is studied in the context of data mining and decision support. The suggested solution is to transform high-accuracy opaque models into comprehensible models by applying rule extraction. This approach is contrasted with standard methods that generate transparent models directly from the data set. Using a number of case studies, it is shown that the application of rule extraction generally results in higher accuracy and comprehensibility.

Although several rule extraction algorithms exist and there are well-established evaluation criteria (i.e. accuracy, comprehensibility, fidelity, scalability and generality), no existing algorithm meets all criteria. To counter this, a novel algorithm for rule extraction, named G-REX (Genetic Rule EXtraction), is suggested. G-REX uses an extraction strategy based on genetic programming, where the fitness function directly measures the quality of the extracted model in terms of accuracy, fidelity and comprehensibility, thus making it possible to explicitly control the accuracy vs. comprehensibility tradeoff. To evaluate G-REX, experience is drawn from several case studies where G-REX has been used to extract rules from different opaque representations, e.g. neural networks, ensembles and boosted decision trees. The case studies fall into two categories: an extensively studied data mining problem in the marketing domain, and several well-known benchmark problems. The results show that G-REX, with its high flexibility regarding the choice of representation language and inherent ability to handle the accuracy vs. comprehensibility tradeoff, meets the proposed criteria well.
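
The abstract does not give the actual form of the G-REX fitness function, so the following is only a minimal sketch of the general idea: reward a candidate rule for agreeing with the opaque model (fidelity) and penalize it for its size (a stand-in for comprehensibility). The tree representation, the names `Leaf`, `Split`, `predict`, `size` and `fitness`, and the `size_penalty` weight are all assumptions made for illustration, and the evolutionary search itself is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Leaf:
    """Terminal node: predicts a fixed class label."""
    label: int


@dataclass(frozen=True)
class Split:
    """Internal node: tests x[feature] < threshold."""
    feature: int
    threshold: float
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Split]


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Follow the tests from the root down to a leaf."""
    while isinstance(tree, Split):
        tree = tree.if_true if x[tree.feature] < tree.threshold else tree.if_false
    return tree.label


def size(tree: Tree) -> int:
    """Number of nodes, used here as a crude proxy for (in)comprehensibility."""
    if isinstance(tree, Leaf):
        return 1
    return 1 + size(tree.if_true) + size(tree.if_false)


def fitness(tree: Tree, inputs, opaque_predictions, size_penalty: float = 0.01) -> float:
    """Hypothetical fitness: fidelity to the opaque model minus a size penalty."""
    agree = sum(predict(tree, x) == y for x, y in zip(inputs, opaque_predictions))
    fidelity = agree / len(inputs)
    return fidelity - size_penalty * size(tree)


# Toy data: the "opaque model" labels inputs above 0.5 as class 1.
inputs = [[0.2], [0.4], [0.6], [0.8]]
opaque_predictions = [0, 0, 1, 1]

one_split = Split(0, 0.5, Leaf(0), Leaf(1))  # mimics the opaque model exactly
constant = Leaf(0)                           # smallest possible rule

print(fitness(one_split, inputs, opaque_predictions))
print(fitness(constant, inputs, opaque_predictions))
```

Raising `size_penalty` in this sketch favors shorter rules at the cost of fidelity, which is one simple way the tradeoff could be controlled explicitly. Replacing `opaque_predictions` with the true labels would turn the first term into accuracy instead of fidelity.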

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2004, p. 140
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1095
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-42644
Libris ID: 9518889
Local ID: LiU-TEK-LIC-2004:24
ISBN: 917373960X (print)
OAI: oai:DiVA.org:liu-42644
DiVA, id: diva2:263501
Presentation
2004-06-11, Sal G100, Högskolan i Skövde, Skövde, 13:00 (Swedish)
Available from: 2009-10-10. Created: 2009-10-10. Last updated: 2023-02-24. Bibliographically approved.

By author/editor
Johansson, Ulf