Monotonic regression for large multivariate datasets
Linköping University, Department of Computer and Information Science. Linköping University, The Institute of Technology.
2010 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Monoton regression för stora multivariata datamaterial (Swedish)
Abstract [en]

Monotonic regression is a non-parametric statistical method that is designed especially for applications in which the expected value of a response variable increases or decreases in one or more explanatory variables. Such applications can be found in business, physics, biology, medicine, signal processing, and other areas. Inasmuch as many of the collected datasets can contain a very large number of multivariate observations, there is a strong need for efficient numerical algorithms. Here, we present new methods that make it feasible to fit monotonic functions to more than one hundred thousand data points. By simulation, we show that our algorithms have high accuracy and represent considerable improvements with respect to computational time and memory requirements. In particular, we demonstrate how segmentation of a large-scale problem can greatly improve the performance of existing algorithms. Moreover, we show how the uncertainty of a monotonic regression model can be estimated. One of the procedures we developed can be employed to estimate the variance of the random error present in the observed response. Other procedures are based on resampling techniques and can provide confidence intervals for the expected response at given levels of a set of predictors.

Abstract [sv]

Monoton regression är en icke-parametrisk statistisk metod som är utvecklad speciellt för tillämpningar i vilka det förväntade värdet av en responsvariabel ökar eller minskar med en eller flera förklaringsvariabler. Sådana tillämpningar finns inom företagsekonomi, fysik, biologi, medicin, signalbehandling och andra områden. Eftersom många insamlade datamaterial kan innehålla ett mycket stort antal multivariata observationer finns ett starkt behov av effektiva numeriska algoritmer. Här presenterar vi nya metoder som gör det möjligt att anpassa monotona funktioner till mer än 100 000 datapunkter. Genom simulering visar vi att våra algoritmer har hög noggrannhet och innebär betydande förbättringar med avseende på beräkningstid och krav på minnesutrymme. Speciellt visar vi hur segmentering av ett storskaligt problem starkt kan förbättra existerande algoritmer. Dessutom visar vi hur osäkerheten av en monoton regressionsmodell kan uppskattas. En av de metoder vi utvecklat kan användas för att uppskatta variansen för de slumpkomponenter som kan finnas i den observerade responsvariabeln. Andra metoder, baserade på s.k. återsampling, kan ge konfidensintervall för den förväntade responsen för givna värden på ett antal prediktorer.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 75 p.
Series
Linköping Studies in Statistics, ISSN 1651-1700 ; 11
Linköping Studies in Arts and Science, ISSN 0282-9800 ; 514
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:liu:diva-65349
ISBN: 978-91-7393-412-1 (print)
OAI: oai:DiVA.org:liu-65349
DiVA: diva2:395118
Public defence
2010-04-16, Glashuset, Building B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Opponent
Available from: 2011-02-04 Created: 2011-02-04 Last updated: 2012-11-08 Bibliographically approved
List of papers
1. An O(n²) algorithm for isotonic regression problems
2006 (English) In: Large-Scale Nonlinear Optimization / [ed] G. Di Pillo and M. Roma, Springer-Verlag, 2006, 25-33 p. Chapter in book (Refereed)
Abstract [en]

Large-Scale Nonlinear Optimization reviews and discusses recent advances in the development of methods and algorithms for nonlinear optimization and its applications, focusing on the large-dimensional case, the current forefront of much research.

The chapters of the book, authored by some of the most active and well-known researchers in nonlinear optimization, give an updated overview of the field from different and complementary standpoints, including theoretical analysis, algorithmic development, implementation issues, and applications.

Place, publisher, year, edition, pages
Springer-Verlag, 2006
Series
Nonconvex Optimization and Its Applications, 83
Keyword
Quadratic programming, large scale optimization, least distance problem, isotonic regression, pool-adjacent-violators algorithm
National Category
Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-60581 (URN)
978-0-387-30063-4 (ISBN)
0-387-30065-1 (ISBN)
Available from: 2010-10-20 Created: 2010-10-20 Last updated: 2015-06-02 Bibliographically approved
2. Data preordering in generalized PAV algorithm for monotonic regression
2006 (English) In: Journal of Computational Mathematics, ISSN 0254-9409, Vol. 24, no 6, 771-790 p. Article in journal (Refereed) Published
Abstract [en]

Monotonic regression (MR) is a least distance problem with monotonicity constraints induced by a partially ordered data set of observations. In our recent publication [in Ser. Nonconvex Optimization and Its Applications, Springer-Verlag (2006), Vol. 83, pp. 25-33], the Pool-Adjacent-Violators algorithm (PAV) was generalized from completely to partially ordered data sets (posets). The new algorithm, called GPAV, is characterized by very low computational complexity, of second order in the number of observations. It treats the observations in consecutive order, and it can follow any arbitrarily chosen topological order of the poset of observations. The GPAV algorithm produces a sufficiently accurate solution to the MR problem, but the accuracy depends on the chosen topological order. Here we prove that there exists a topological order for which the resulting GPAV solution is optimal. Furthermore, we present results of extensive numerical experiments, from which we draw conclusions about the most and the least preferable topological orders.
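For reference, the classical PAV algorithm for the totally ordered special case, which GPAV generalizes to posets, can be sketched in a few lines of Python. This is a generic textbook implementation, not the GPAV algorithm described above:

```python
def pav(y, w=None):
    """Pool-Adjacent-Violators: weighted least-squares fit of a
    non-decreasing sequence to y (totally ordered case)."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block: [pooled value, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * v1 + w2 * v2) / wt, wt, c1 + c2])
    fit = []  # expand blocks to one fitted value per observation
    for v, _, c in blocks:
        fit.extend([v] * c)
    return fit

print(pav([1, 3, 2, 4]))  # → [1, 2.5, 2.5, 4]
```

Each observation starts as its own block; whenever a block's value drops below its predecessor's, the two are pooled into their weighted mean, which is what keeps the complexity low.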

National Category
Mathematics
Identifiers
urn:nbn:se:liu:diva-36278 (URN)
30826 (Local ID)
30826 (Archive number)
30826 (OAI)
Available from: 2009-10-10 Created: 2009-10-10 Last updated: 2015-06-02
3. Generalized PAV algorithm with block refinement for partially ordered monotonic regression
2009 (English) In: Proceedings of the Workshop on Learning Monotone Models from Data / [ed] A. Feelders and R. Potharst, 2009, 23-37 p. Conference paper, Published paper (Other academic)
Abstract [en]

In this paper, the monotonic regression problem (MR) is considered. We have recently generalized for MR the well-known Pool-Adjacent-Violators algorithm (PAV) from the case of completely to partially ordered data sets. The new algorithm, called GPAV, combines high accuracy with low computational complexity, which grows quadratically with the problem size. The actual growth observed in practice is typically far lower than quadratic. The fitted values of the exact MR solution compose blocks of equal values. Its GPAV approximation also has a block structure. We present here a technique for refining the blocks produced by the GPAV algorithm to make the new blocks closer to those in the exact solution. This substantially improves the accuracy of the GPAV solution and does not deteriorate its computational complexity. The computational time for the new technique is approximately triple the time of running the GPAV algorithm. Its efficiency is demonstrated by results of our numerical experiments.

Keyword
Monotonic regression, Partially ordered data set, Pool-adjacent-violators algorithm, Quadratic programming, Large scale optimization, Least distance problem.
National Category
Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-52535 (URN)
Conference
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Bled, Slovenia, September 7-11, 2009
Available from: 2010-01-02 Created: 2010-01-02 Last updated: 2017-12-06
4. A segmentation-based algorithm for large-scale partially ordered monotonic regression
2011 (English) In: Computational Statistics & Data Analysis, ISSN 0167-9473, E-ISSN 1872-7352, Vol. 55, no 8, 2463-2476 p. Article in journal (Refereed) Published
Abstract [en]

Monotonic regression (MR) is an efficient tool for estimating functions that are monotonic with respect to input variables. A fast and highly accurate approximate algorithm called the GPAV was recently developed for efficiently solving large-scale multivariate MR problems. When such problems are too large, the GPAV becomes too demanding in terms of computational time and memory. An approach that extends the application area of the GPAV to encompass much larger MR problems is presented. It is based on segmentation of a large-scale MR problem into a set of moderate-scale MR problems, each solved by the GPAV. The major contribution is the development of a computationally efficient strategy that produces a monotonic response using the local solutions. A theoretically motivated trend-following technique is introduced to ensure higher accuracy of the solution. The presented results of extensive simulations on very large data sets demonstrate the high efficiency of the new algorithm.
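In one dimension, the segment-then-merge idea can be sketched directly: fit each segment separately, then run one more PAV pass over the concatenated segment fits to restore monotonicity across segment boundaries. This toy sketch (with a hypothetical `pav_weighted` helper) only illustrates the general strategy; it is not the paper's multivariate algorithm, nor its trend-following technique:

```python
def pav_weighted(y, w):
    """Weighted PAV: least-squares non-decreasing fit (1-D)."""
    blocks = []  # each block: [pooled value, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * v1 + w2 * v2) / wt, wt, c1 + c2])
    fit = []
    for v, _, c in blocks:
        fit.extend([v] * c)
    return fit

def segmented_isotonic(y, seg_len):
    """Solve moderate-scale subproblems, then merge: a final PAV
    pass over the per-observation segment fits yields a globally
    monotonic response."""
    fits = []
    for i in range(0, len(y), seg_len):
        seg = y[i:i + seg_len]
        fits.extend(pav_weighted(seg, [1.0] * len(seg)))
    # Merge step: one more PAV pass across segment boundaries.
    return pav_weighted(fits, [1.0] * len(fits))
```

Each segment fit is already monotone internally, so the merge pass only pools across segment boundaries where local solutions disagree.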

Place, publisher, year, edition, pages
Elsevier Science B.V., Amsterdam, 2011
Keyword
Quadratic programming, Large-scale optimization, Least distance problem, Monotonic regression, Partially ordered data set, Pool-adjacent-violators algorithm
National Category
Social Sciences
Identifiers
urn:nbn:se:liu:diva-69182 (URN)
10.1016/j.csda.2011.03.001 (DOI)
000291181000002 ()
Available from: 2011-06-17 Created: 2011-06-17 Last updated: 2017-12-11
5. Bootstrap estimation of the variance of the error term in monotonic regression models
2013 (English) In: Journal of Statistical Computation and Simulation, ISSN 0094-9655, E-ISSN 1563-5163, Vol. 83, no 4, 625-638 p. Article in journal (Refereed) Published
Abstract [en]

The variance of the error term in ordinary regression models and linear smoothers is usually estimated by adjusting the average squared residual for the trace of the smoothing matrix (the degrees of freedom of the predicted response). However, other types of variance estimators are needed when using monotonic regression (MR) models, which are particularly suitable for estimating response functions with pronounced thresholds. Here, we propose a simple bootstrap estimator to compensate for the over-fitting that occurs when MR models are estimated from empirical data. Furthermore, we show that, in the case of one or two predictors, the performance of this estimator can be enhanced by introducing adjustment factors that take into account the slope of the response function and characteristics of the distribution of the explanatory variables. Extensive simulations show that our estimators perform satisfactorily for a great variety of monotonic functions and error distributions.
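The over-fitting problem and a residual-bootstrap correction can be illustrated in the simplest 1-D setting. This is a generic sketch using the standard additive bootstrap bias correction, not the adjusted estimator proposed in the paper:

```python
import random

def pav(y):
    """Minimal unweighted PAV: least-squares non-decreasing fit."""
    blocks = []  # each block: [pooled value, count]
    for yi in y:
        blocks.append([float(yi), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, c2 = blocks.pop()
            v1, c1 = blocks.pop()
            blocks.append([(c1 * v1 + c2 * v2) / (c1 + c2), c1 + c2])
    fit = []
    for v, c in blocks:
        fit.extend([v] * c)
    return fit

def bootstrap_error_variance(y, n_boot=200, seed=0):
    """Illustrative residual bootstrap for the error variance of a 1-D
    monotonic regression.  The naive mean squared residual is biased
    downward because the MR fit over-fits the data; the bootstrap
    estimates that bias and subtracts it."""
    rng = random.Random(seed)
    fit = pav(y)
    n = len(y)
    resid = [yi - fi for yi, fi in zip(y, fit)]
    m = sum(resid) / n
    resid = [r - m for r in resid]  # centre residuals before resampling
    naive = sum(r * r for r in resid) / n
    boot = 0.0
    for _ in range(n_boot):
        # Resample residuals onto the fitted curve and refit.
        yb = [fi + rng.choice(resid) for fi in fit]
        fb = pav(yb)
        boot += sum((a - b) ** 2 for a, b in zip(yb, fb)) / n
    boot /= n_boot
    # Estimated bias is (boot - naive), so corrected = 2*naive - boot.
    return max(2.0 * naive - boot, 0.0)
```

In the bootstrap world the "true" error variance is the empirical residual variance, so the average shortfall of the refitted residual variance estimates how much the naive estimator under-reports.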

Place, publisher, year, edition, pages
Taylor & Francis Group, 2013
Keyword
uncertainty estimation; bootstrap; monotonic regression; pool-adjacent-violators algorithm
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-78858 (URN)
10.1080/00949655.2011.631138 (DOI)
000317276900003 ()
Available from: 2012-06-21 Created: 2012-06-21 Last updated: 2017-12-07
6. Bootstrap confidence intervals for large-scale multivariate monotonic regression problems
2016 (English) In: Communications in Statistics. Simulation and Computation, ISSN 0361-0918, E-ISSN 1532-4141, Vol. 45, no 3, 1025-1040 p. Article in journal (Refereed) Published
Abstract [en]

Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a million observations. Nevertheless, the computational burden can be prohibitively large for resampling techniques in which numerous datasets are processed independently of each other. Here, we present efficient algorithms for estimation of confidence limits in large-scale settings that take into account the similarity of the bootstrap or jackknifed datasets to which MR models are fitted. In addition, we introduce modifications that substantially improve the accuracy of MR solutions for binary response variables. The performance of our algorithms is illustrated using data on death in coronary heart disease for a large population. This example also illustrates that MR can be a valuable complement to logistic regression.

Place, publisher, year, edition, pages
Taylor & Francis, 2016
Keyword
Big data, Bootstrap, Confidence intervals, Monotonic regression, Pool-adjacent-violators algorithm
National Category
Probability Theory and Statistics
Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-85169 (URN)
10.1080/03610918.2014.911899 (DOI)
000372527900014 ()
Note

At the time of the thesis defence, this publication existed as a manuscript.

Available from: 2012-11-08 Created: 2012-11-08 Last updated: 2017-12-07

Open Access in DiVA

No full text

Authority records BETA

Sysoev, Oleg

