On approximations and computations in probabilistic classification and in learning of graphical models
2007 (English) Doctoral thesis, comprehensive summary (Other academic)
Model-based probabilistic classification is heavily used in data mining and machine learning. To make learning computationally tractable, however, these models may require approximation steps. One popular approximation in classification is to model the class-conditional densities by factorization; in the case of complete independence, the resulting model is usually called the 'Naïve Bayes' classifier. In general, probabilistic independence cannot model all distributions exactly, and little has been published on how much a discrete distribution can differ from the independence assumption. In this dissertation the approximation quality of factorizations is analyzed in two articles.
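To illustrate the factorization discussed above, the following sketch compares a small discrete joint distribution with its independence (product-of-marginals) approximation, measuring the gap by total variation distance. The specific probability values are invented for illustration only and are not taken from the dissertation.

```python
import itertools

# A discrete joint distribution over two binary variables (X1, X2),
# e.g. conditional on a fixed class; values chosen purely for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distribution of each variable.
p1 = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p2 = {x: sum(v for (_, b), v in joint.items() if b == x) for x in (0, 1)}

# The independence ('Naïve Bayes') approximation replaces the joint
# by the product of its marginals.
factorized = {(a, b): p1[a] * p2[b]
              for a, b in itertools.product((0, 1), repeat=2)}

# Total variation distance: how far the approximation is from the truth.
tv = 0.5 * sum(abs(joint[k] - factorized[k]) for k in joint)
print(factorized)
print(tv)
```

Here every cell of the factorized table equals 0.25, so the approximation misses the correlation in the joint entirely; the total variation distance comes out to 0.3 for this example.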
A specific class of factorizations is those represented by graphical models. Several challenges arise when statistical methods are used to learn graphical models from data. Examples include the super-exponential growth in the number of graphical model structures as a function of the number of nodes, and the statistical equivalence of models determined by different graphs. One article presents an algorithm for learning graphical models. In the final article an algorithm for clustering parts of DNA strings is developed, and a graphical representation of the remaining part of the DNA is learned.
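The growth in the number of graph structures mentioned above can be made concrete: the number of labelled directed acyclic graphs on n nodes satisfies Robinson's recurrence. A minimal sketch (not part of the dissertation's own algorithms) that evaluates it:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes with no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 6):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281
```

Already at five nodes there are 29,281 possible structures, which is why exhaustive search over graphical model structures quickly becomes infeasible.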
Place, publisher, year, edition, pages
Matematiska institutionen, 2007. 22 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1141
Keywords: Mathematical statistics, factorizations, probabilistic classification, nodes, DNA strings
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-11429
ISBN: 978-91-85895-58-8
OAI: oai:DiVA.org:liu-11429
DiVA: diva2:17846
Public defence: 2007-12-14, 10:15, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping (English)
Opponent: Lozano, Jose Antonio, Professor
List of papers