Generalized PAV algorithm with block refinement for partially ordered monotonic regression
Linköping University, Department of Mathematics, Optimization. Linköping University, The Institute of Technology. ORCID iD: 0000-0003-1836-4200
Linköping University, Department of Computer and Information Science, Statistics. Linköping University, Faculty of Arts and Sciences.
Linköping University, Department of Computer and Information Science, Statistics. Linköping University, Faculty of Arts and Sciences.
2009 (English). In: Proceedings of the Workshop on Learning Monotone Models from Data / [ed] A. Feelders and R. Potharst, 2009, pp. 23-37. Conference paper (Refereed)
Abstract [en]

In this paper, the monotonic regression problem (MR) is considered. We have recently generalized for MR the well-known Pool-Adjacent-Violators algorithm (PAV) from the case of completely to partially ordered data sets. The new algorithm, called GPAV, combines high accuracy with low computational complexity, which grows quadratically with the problem size; the actual growth observed in practice is typically far lower than quadratic. The fitted values of the exact MR solution compose blocks of equal values. Its GPAV approximation also has a block structure. We present here a technique for refining the blocks produced by the GPAV algorithm to bring the new blocks closer to those in the exact solution. This substantially improves the accuracy of the GPAV solution without deteriorating its computational complexity. The computational time for the new technique is approximately triple the time of running the GPAV algorithm. Its efficiency is demonstrated by the results of our numerical experiments.
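The classical PAV algorithm that the paper generalizes can be sketched for the completely ordered case: scan the data once, maintaining a stack of blocks, and pool any adjacent blocks whose means violate monotonicity. This is an illustrative sketch of plain PAV only, not the paper's GPAV algorithm or its block-refinement technique.

```python
# Minimal Pool-Adjacent-Violators (PAV) for a completely ordered
# sequence: least-squares non-decreasing fit. Illustrative sketch only;
# the paper's GPAV handles partially ordered data, which is not shown here.

def pav(y, w=None):
    """Return the least-squares monotone (non-decreasing) fit to y."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []  # each block is [mean, weight, count]
    for i in range(n):
        blocks.append([y[i], w[i], 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand the surviving blocks back to fitted values.
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

print(pav([1, 3, 2, 4]))  # -> [1, 2.5, 2.5, 4]
```

Note how the fitted values form blocks of equal values (here the pooled pair 2.5, 2.5), which is exactly the block structure that the paper's refinement technique operates on.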

Place, publisher, year, edition, pages
2009. 23-37 p.
Keyword [en]
Monotonic regression, Partially ordered data set, Pool-adjacent-violators algorithm, Quadratic programming, Large scale optimization, Least distance problem.
National Category
Computational Mathematics
URN: urn:nbn:se:liu:diva-52535
OAI: diva2:283960
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Bled, Slovenia, September 7-11, 2009
Available from: 2010-01-02 Created: 2010-01-02 Last updated: 2015-06-02
In thesis
1. Monotonic regression for large multivariate datasets
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Monoton regression för stora multivariata datamaterial
Abstract [en]

Monotonic regression is a non-parametric statistical method designed especially for applications in which the expected value of a response variable increases or decreases in one or more explanatory variables. Such applications can be found in business, physics, biology, medicine, signal processing, and other areas. Inasmuch as many of the collected datasets can contain a very large number of multivariate observations, there is a strong need for efficient numerical algorithms. Here, we present new methods that make it feasible to fit monotonic functions to more than one hundred thousand data points. By simulation, we show that our algorithms have high accuracy and represent considerable improvements with respect to computational time and memory requirements. In particular, we demonstrate how segmentation of a large-scale problem can greatly improve the performance of existing algorithms. Moreover, we show how the uncertainty of a monotonic regression model can be estimated. One of the procedures we developed can be employed to estimate the variance of the random error present in the observed response. Other procedures are based on resampling techniques and can provide confidence intervals for the expected response at given levels of a set of predictors.
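The resampling idea mentioned above can be illustrated with a generic residual-bootstrap sketch: fit a monotone curve, resample the residuals, refit, and take percentiles of the refitted values. This is a generic sketch of the bootstrap-percentile approach under simple assumptions (ordered univariate data, i.i.d. residuals); it does not reproduce the specific procedures developed in the thesis.

```python
# Generic residual-bootstrap percentile interval for the expected
# response of a non-decreasing fit. Illustrative only; not the
# thesis's confidence-interval procedures.
import random

def isotonic_fit(y):
    """Least-squares non-decreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block is [mean, count]
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(c1 * m1 + c2 * m2) / (c1 + c2), c1 + c2])
    out = []
    for m, c in blocks:
        out.extend([m] * c)
    return out

def bootstrap_ci(y, i, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the fitted expected response at position i."""
    rng = random.Random(seed)
    n = len(y)
    fit = isotonic_fit(y)
    resid = [y[k] - fit[k] for k in range(n)]
    stats = []
    for _ in range(n_boot):
        # Resample residuals onto the fitted curve and refit.
        y_star = [fit[k] + rng.choice(resid) for k in range(n)]
        stats.append(isotonic_fit(y_star)[i])
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

y = [0.1, 0.5, 0.3, 0.9, 1.2, 1.0, 1.8]
lo, hi = bootstrap_ci(y, i=3)
print(f"95% interval for the expected response at x_3: ({lo:.2f}, {hi:.2f})")
```

Each bootstrap replicate costs one monotone fit, which is why efficient fitting algorithms such as GPAV matter when resampling is applied to large datasets.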

Abstract [sv]

Monotonic regression is a non-parametric statistical method developed especially for applications in which the expected value of a response variable increases or decreases with one or more explanatory variables. Such applications are found in business economics, physics, biology, medicine, signal processing, and other areas. Since many collected datasets can contain a very large number of multivariate observations, there is a strong need for efficient numerical algorithms. Here we present new methods that make it possible to fit monotone functions to more than 100,000 data points. By simulation, we show that our algorithms have high accuracy and bring considerable improvements with respect to computational time and memory requirements. In particular, we show how segmentation of a large-scale problem can greatly improve existing algorithms. Moreover, we show how the uncertainty of a monotonic regression model can be estimated. One of the methods we developed can be used to estimate the variance of the random components that may be present in the observed response variable. Other methods, based on so-called resampling, can provide confidence intervals for the expected response at given values of a set of predictors.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 75 p.
Linköping Studies in Statistics, ISSN 1651-1700 ; 11
Linköping Studies in Arts and Science, ISSN 0282-9800 ; 514
National Category
Probability Theory and Statistics
urn:nbn:se:liu:diva-65349 (URN)
978-91-7393-412-1 (ISBN)
Public defence
2010-04-16, Glashuset, Building B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2011-02-04 Created: 2011-02-04 Last updated: 2012-11-08. Bibliographically approved

Open Access in DiVA

No full text


Search in DiVA

By author/editor
Burdakov, Oleg; Grimvall, Anders; Sysoev, Oleg
By organisation
Optimization; The Institute of Technology; Statistics; Faculty of Arts and Sciences
Computational Mathematics
