Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
Linköping University, Department of Mathematics, Mathematical Statistics. Linköping University, The Institute of Technology.
Linköping University, Department of Mathematics, Mathematical Statistics. Linköping University, The Institute of Technology.
2006 (English). In: Journal of Machine Learning Research, ISSN 1532-4435, Vol. 7, p. 2449-2480. Article in journal (Refereed), Published
Abstract [en]

In many pattern recognition/classification problems the true class conditional model and class probabilities are approximated, either to reduce complexity or for reasons of statistical estimation. The approximated classifier is expected to have worse performance, here measured by the probability of correct classification. We present an analysis that is valid in general, together with easily computable formulas for estimating the degradation in probability of correct classification when compared to the optimal classifier. An example of an approximation is the Naïve Bayes classifier. We show that the performance of Naïve Bayes depends on the degree of functional dependence between the features and labels. We also provide a sufficient condition for zero loss of performance.
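
The quantity discussed in the abstract, the loss in probability of correct classification, can be illustrated with a small sketch. It is not taken from the article and does not implement its bounds; it only computes the loss exactly for a hypothetical toy distribution (two binary features, a binary label that is a noisy XOR of the features), comparing the optimal classifier with the plug-in classifier built from the Naïve Bayes factorization. All names (`noisy_xor`, `naive_bayes_joint`, `prob_correct`) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

CLASSES = (0, 1)
FEATURES = tuple(product((0, 1), repeat=2))  # all (x1, x2) pairs


def naive_bayes_joint(joint):
    """Return the factorized approximation P(c) * P(x1|c) * P(x2|c)."""
    approx = {}
    for c in CLASSES:
        p_c = sum(joint[c, x] for x in FEATURES)  # assumed to be nonzero
        for x in FEATURES:
            value = p_c
            for i, xi in enumerate(x):
                # P(c, X_i = xi) divided by P(c) gives P(X_i = xi | c).
                p_c_xi = sum(joint[c, y] for y in FEATURES if y[i] == xi)
                value *= p_c_xi / p_c
            approx[c, x] = value
    return approx


def prob_correct(joint, scores):
    """Probability of correct classification, under the true joint,
    of the classifier that picks argmax over c of scores[c, x]."""
    total = Fraction(0)
    for x in FEATURES:
        # Ties are broken in favour of the first class in CLASSES.
        chosen = max(CLASSES, key=lambda c: scores[c, x])
        total += joint[chosen, x]
    return total


# Hypothetical toy distribution: features are uniform and independent,
# and the label equals x1 XOR x2 with probability 9/10.
noisy_xor = {
    (c, x): Fraction(1, 4) * (Fraction(9, 10) if c == (x[0] ^ x[1]) else Fraction(1, 10))
    for c in CLASSES
    for x in FEATURES
}

optimal = prob_correct(noisy_xor, noisy_xor)
approximate = prob_correct(noisy_xor, naive_bayes_joint(noisy_xor))
loss = optimal - approximate

print(optimal, approximate, loss)  # 9/10 1/2 2/5
```

In this example the label depends on the features only through their interaction, so the factorized model carries no information about the label and the plug-in classifier does no better than always guessing one class. Exact rational arithmetic (`Fraction`) is used so that ties are resolved deterministically rather than by floating-point noise.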

Place, publisher, year, edition, pages
2006. Vol. 7, 2449-2480 p.
Keyword [en]
Bayesian networks, naïve Bayes, plug-in classifier, Kolmogorov distance of variation, variational learning
National Category
Mathematics
Identifiers
URN: urn:nbn:se:liu:diva-13104
OAI: oai:DiVA.org:liu-13104
DiVA: diva2:17842
Available from: 2008-03-31 Created: 2008-03-31
In thesis
1. On approximations and computations in probabilistic classification and in learning of graphical models
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Model based probabilistic classification is heavily used in data mining and machine learning. For computational learning, however, these models may need approximation steps. One popular approximation in classification is to model the class conditional densities by factorization, which in the independence case is usually called the 'Naïve Bayes' classifier. In general, probabilistic independence cannot model all distributions exactly, and not much has been published on how much a discrete distribution can differ from the independence assumption. In this dissertation the approximation quality of factorizations is analyzed in two articles.
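
As a rough illustration of how far a discrete distribution can be from the independence assumption, the sketch below compares a joint distribution with the product of its marginals. The record does not say which distance the thesis uses, so two common choices are shown as assumptions: the largest pointwise difference and the total variation distance. The function names and the example distribution `copies` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def product_of_marginals(joint, n_features):
    """Build the independent distribution with the same marginals as `joint`.

    `joint` maps feature tuples (x_1, ..., x_n) to probabilities."""
    marginals = []
    for i in range(n_features):
        m = {}
        for x, p in joint.items():
            m[x[i]] = m.get(x[i], Fraction(0)) + p
        marginals.append(m)
    support = product(*(sorted(m) for m in marginals))
    return {x: prod(marginals[i][xi] for i, xi in enumerate(x)) for x in support}


def independence_gap(joint, n_features):
    """Return (max pointwise difference, total variation distance)
    between `joint` and the product of its marginals."""
    indep = product_of_marginals(joint, n_features)
    diffs = [abs(joint.get(x, Fraction(0)) - q) for x, q in indep.items()]
    return max(diffs), sum(diffs) / 2


# Hypothetical example: two binary features that are always equal.
copies = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
max_gap, total_variation = independence_gap(copies, 2)

print(max_gap, total_variation)  # 1/4 1/2
```

Both marginals are uniform here, so the independent model assigns probability 1/4 to each of the four outcomes, while the true distribution puts 1/2 on two of them and 0 on the others.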

A specific class of factorizations is the factorizations represented by graphical models. Several challenges arise from the use of statistical methods for learning graphical models from data. Examples of problems include the increase in the number of graphical model structures as a function of the number of nodes, and the equivalence of statistical models determined by different graphical models. In one article an algorithm for learning graphical models is presented. In the final article an algorithm for clustering parts of DNA strings is developed, and a graphical representation for the remaining DNA part is learned.

Place, publisher, year, edition, pages
Matematiska institutionen, 2007. 22 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1141
Keyword
Mathematical statistics, factorizations, probabilistic classification, nodes, DNA strings
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-11429 (URN)
978-91-85895-58-8 (ISBN)
Public defence
2007-12-14, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 10:15 (English)
Opponent
Available from: 2008-03-31 Created: 2008-03-31 Last updated: 2012-11-21

Authority records

Ekdahl, Magnus; Koski, Timo
