Tree Ensembles with Rule Structured Horseshoe Regularization
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences.
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Arts and Sciences.
2018 (English). In: Annals of Applied Statistics, ISSN 1932-6157, E-ISSN 1941-7330, Vol. 12, no. 4, p. 2379-2408. Article in journal (Refereed). Published.
Abstract [en]

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach of Friedman and Popescu [Ann. Appl. Stat. 2 (2008) 916-954], in which rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors, as many of them contribute only noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are satisfied by only a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in RuleFit with an additional set of trees from Random Forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and Random Forest on 16 datasets. The model and its interpretation are demonstrated on the well-known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.
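The abstract's core construction is the RuleFit idea the paper builds on: each root-to-leaf path in a decision tree becomes a binary rule predictor, and the resulting 0/1 design matrix is fed into a regularized linear model. The sketch below illustrates only that rule-to-feature step in plain Python; the names (`Condition`, `Rule`, `rule_features`) are illustrative and not the API of the paper's R package, and the horseshoe prior and Gibbs sampler are not shown.

```python
# Hedged sketch: a tree path as a conjunction of split conditions,
# and a rule set turned into a binary feature matrix (the RuleFit idea).
# All names here are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    feature: int       # index of the input feature the split tests
    threshold: float   # split threshold
    is_leq: bool       # True means "x[feature] <= threshold", False means ">"

@dataclass(frozen=True)
class Rule:
    conditions: tuple  # one Condition per split on the root-to-leaf path

    def __call__(self, x):
        # The rule fires only if every split condition on the path holds.
        return all(
            (x[c.feature] <= c.threshold) == c.is_leq
            for c in self.conditions
        )

def rule_features(rules, X):
    # One 0/1 column per rule: the design matrix for the regularized
    # regression. Rules with many conditions (deep paths) or few firing
    # observations are the ones the paper's prior shrinks more a priori.
    return [[1 if rule(x) else 0 for rule in rules] for x in X]

# Two toy rules, as might be extracted from a depth-2 tree:
r1 = Rule((Condition(0, 5.0, True),))                           # x0 <= 5
r2 = Rule((Condition(0, 5.0, False), Condition(1, 2.0, True)))  # x0 > 5 and x1 <= 2

X = [[3.0, 1.0], [7.0, 1.0], [7.0, 4.0]]
print(rule_features([r1, r2], X))  # [[1, 0], [0, 1], [0, 0]]
```

In the full method, these binary columns (plus linear terms) enter a Bayesian regression where the horseshoe prior's extra hierarchical layer can key off exactly the quantities visible here: the length of `conditions` and the column support of each rule.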

Place, publisher, year, edition, pages
Institute of Mathematical Statistics, 2018. Vol. 12, no. 4, p. 2379-2408
Keywords [en]
Nonlinear regression; classification; decision trees; Bayesian; prediction; MCMC; interpretation
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-153157
DOI: 10.1214/18-AOAS1157
ISI: 000450015900015
OAI: oai:DiVA.org:liu-153157
DiVA, id: diva2:1267333
Available from: 2018-12-01. Created: 2018-12-01. Last updated: 2018-12-01.

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Nalenz, Malte; Villani, Mattias
By organisation
Department of Computer and Information Science; Faculty of Arts and Sciences; The Division of Statistics and Machine Learning
In the same journal
Annals of Applied Statistics
Probability Theory and Statistics
