Sysoev, Oleg
Publications (10 of 21)
Sysoev, O. & Burdakov, O. (2019). A smoothed monotonic regression via L2 regularization. Knowledge and Information Systems, 59(1), 197-218
A smoothed monotonic regression via L2 regularization
2019 (English)In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 59, no 1, p. 197-218Article in journal (Refereed) Published
Abstract [en]

Monotonic regression is a standard method for extracting a monotone function from non-monotonic data, and it is used in many applications. However, a known drawback of this method is that its fitted response is a piecewise constant function, while practical response functions are often required to be continuous. The method proposed in this paper achieves monotonicity and smoothness of the regression by introducing an L2 regularization term. To achieve low computational complexity while providing high predictive power, we introduce a probabilistically motivated approach for selecting the regularization parameters. In addition, we present a technique for correcting inconsistencies on the boundary. We show that the complexity of the proposed method is O(n²). Our simulations demonstrate that when the data are large and the expected response is a complicated function (which is typical in machine learning applications), or when there is a change point in the response, the proposed method has higher predictive power than many existing methods.
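A worked formulation may help fix ideas. Reading the abstract in the standard notation for this problem (the symbols below are our assumption, not quoted from the paper), with responses y_1, ..., y_n listed in increasing order of the predictor and a regularization parameter \mu > 0, the smoothed monotonic regression is the convex quadratic program

  \min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{n} (x_i - y_i)^2 \;+\; \mu \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2
  \quad \text{subject to} \quad x_1 \le x_2 \le \dots \le x_n .

The first term fits the data, the quadratic penalty discourages the abrupt jumps of a piecewise constant fit, and the constraints enforce monotonicity; \mu = 0 recovers classical monotonic regression.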

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Monotonic regression, Kernel smoothing, Penalized regression, Probabilistic learning, Constrained optimization
National Category
Probability Theory and Statistics; Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-147628 (URN); 10.1007/s10115-018-1201-2 (DOI); 000461390300008 ()
Available from: 2018-04-27 Created: 2018-04-27 Last updated: 2019-04-03. Bibliographically approved.
Sysoev, O., Bartoszek, K., Ekström, E.-C. & Ekström Selling, K. (2019). PSICA: Decision trees for probabilistic subgroup identification with categorical treatments. Statistics in Medicine, 38(22), 4436-4452
PSICA: Decision trees for probabilistic subgroup identification with categorical treatments
2019 (English)In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 38, no 22, p. 4436-4452Article in journal (Refereed) Published
Abstract [en]

Personalized medicine aims at identifying the best treatments for a patient with given characteristics. It has been shown in the literature that these methods can lead to great improvements in medicine compared to traditional approaches that prescribe the same treatment to all patients. Subgroup identification is a branch of personalized medicine that aims at finding subgroups of patients with similar characteristics for which some of the investigated treatments have a better effect than the others. A number of approaches based on decision trees have been proposed to identify such subgroups, but most of them focus on two-arm trials (control/treatment), while a few methods consider quantitative treatments (defined by the dose). However, no subgroup identification method exists that can predict the best treatments in a scenario with a categorical set of treatments. We propose a novel method for subgroup identification in categorical treatment scenarios. This method outputs a decision tree showing the probabilities of a given treatment being the best for a given group of patients, as well as labels showing the possible best treatments. The method is implemented in an R package psica available on CRAN. In addition to a simulation study, we present an analysis of a community-based nutrition intervention trial that demonstrates the validity of our method.
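PSICA itself is implemented in the R package psica; as a language-neutral illustration of the core quantity the tree reports (the probability that each treatment is best within a subgroup of patients), here is a minimal Python sketch that bootstraps those probabilities for pre-defined subgroups. The data layout and the function name are our assumptions for illustration; the actual method grows the subgroups with a decision tree rather than taking them as given.

import numpy as np

def best_treatment_probs(outcome, treatment, subgroup, n_boot=2000, seed=0):
    # For each pre-defined subgroup, bootstrap the probability that each
    # treatment has the highest mean outcome (illustrative; not the PSICA tree).
    rng = np.random.default_rng(seed)
    treatments = np.unique(treatment)
    probs = {}
    for g in np.unique(subgroup):
        y = outcome[subgroup == g]
        t = treatment[subgroup == g]
        wins = np.zeros(len(treatments))
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), len(y))  # resample patients with replacement
            yb, tb = y[idx], t[idx]
            means = [yb[tb == tr].mean() if np.any(tb == tr) else -np.inf
                     for tr in treatments]
            wins[int(np.argmax(means))] += 1
        probs[g] = dict(zip(treatments, wins / n_boot))
    return probs

For each subgroup, the reported numbers play the role of the leaf probabilities in the PSICA tree: a probability close to 1 for one treatment identifies it as the likely best choice for that group.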

Place, publisher, year, edition, pages
John Wiley & Sons, 2019
National Category
Computer Sciences; Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-159305 (URN); 10.1002/sim.8308 (DOI); 000484974200020 (); 31246349 (PubMedID); 2-s2.0-85068189287 (Scopus ID)
Available from: 2019-08-06 Created: 2019-08-06 Last updated: 2020-01-29. Bibliographically approved.
Burdakov, O. & Sysoev, O. (2017). A Dual Active-Set Algorithm for Regularized Slope-Constrained Monotonic Regression. Iranian Journal of Operations Research, 8(2), 40-47
A Dual Active-Set Algorithm for Regularized Slope-Constrained Monotonic Regression
2017 (English)In: Iranian Journal of Operations Research, ISSN 2008-1189, Vol. 8, no 2, p. 40-47Article in journal (Refereed) Published
Abstract [en]

In many problems, it is necessary to take into account monotonic relations. Monotonic (isotonic) Regression (MR) is often involved in solving such problems. The MR solutions are of a step-shaped form with a typical sharp change of values between adjacent steps, which in some applications is regarded as a disadvantage. We recently introduced a Smoothed MR (SMR) problem, obtained from the MR by adding a regularization penalty term. The SMR is aimed at smoothing the aforementioned sharp changes. Moreover, its solution has a far less pronounced step structure, if any at all. The purpose of this paper is to further improve the SMR solution by getting rid of such a structure altogether. This is achieved by introducing a lower bound on the slope in the SMR. We call the result the Smoothed Slope-Constrained MR (SSCMR) problem. It is shown here how to reduce it to the SMR, which is a convex quadratic optimization problem. The Smoothed Pool Adjacent Violators (SPAV) algorithm, developed in our recent publications for solving the SMR problem, is adapted here to solving the SSCMR problem. This algorithm belongs to the class of dual active-set algorithms. Although the complexity of the SPAV algorithm is O(n²), its running time grows almost linearly with n in our computational experiments. We present numerical results which illustrate the predictive performance of our approach. They also show that the SSCMR solution is free of the undesirable features of the MR and SMR solutions.
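In the same notation as the SMR sketch given earlier on this page (again our assumption), with predictor values t_1 < ... < t_n and a lower bound c \ge 0 on the slope, the SSCMR reads

  \min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{n} (x_i - y_i)^2 \;+\; \mu \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2
  \quad \text{subject to} \quad x_{i+1} - x_i \ge c\,(t_{i+1} - t_i), \quad i = 1, \dots, n-1 .

Setting c = 0 recovers the SMR, while c > 0 rules out flat stretches, which is what removes the residual step structure; the paper shows how a change of variables reduces this problem to an SMR instance.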

Place, publisher, year, edition, pages
Tehran, 2017
Keywords
Monotonic regression, Regularization, Quadratic penalty, Convex quadratic optimization, Dual active-set method, Large-scale optimization
National Category
Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-148061 (URN); 10.29252/iors.8.2.40 (DOI)
Available from: 2018-05-29 Created: 2018-05-29 Last updated: 2018-06-07. Bibliographically approved.
Sysoev, O. & Burdakov, O. (2016). A Smoothed Monotonic Regression via L2 Regularization. Linköping: Linköping University Electronic Press
A Smoothed Monotonic Regression via L2 Regularization
2016 (English)Report (Other academic)
Abstract [en]

Monotonic Regression (MR) is a standard method for extracting a monotone function from non-monotonic data, and it is used in many applications. However, a known drawback of this method is that its fitted response is a piecewise constant function, while practical response functions are often required to be continuous. The method proposed in this paper achieves monotonicity and smoothness of the regression by introducing an L2 regularization term, and it is shown that the complexity of this method is O(n²). In addition, our simulations demonstrate that the proposed method normally has higher predictive power than some commonly used alternative methods, such as monotonic kernel smoothers. In contrast to these methods, our approach is probabilistically motivated and has connections to Bayesian modeling.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2016. p. 17
Series
LiTH-MAT-R, ISSN 0348-2960 ; 2016:01
Keywords
Monotonic regression, Kernel smoothing, Penalized regression, Bayesian modeling
National Category
Probability Theory and Statistics; Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-125398 (URN); LiTH-MAT-R--2016/01--SE (ISRN)
Available from: 2016-02-22 Created: 2016-02-22 Last updated: 2016-09-26. Bibliographically approved.
Kalish, M. L., Dunn, J. C., Burdakov, O. P. & Sysoev, O. (2016). A statistical test of the equality of latent orders. Journal of mathematical psychology (Print), 70, 1-11, Article ID YJMPS2051.
A statistical test of the equality of latent orders
2016 (English)In: Journal of mathematical psychology (Print), ISSN 0022-2496, E-ISSN 1096-0880, Vol. 70, p. 1-11, article id YJMPS2051Article in journal (Refereed) Published
Abstract [en]

It is sometimes the case that a theory proposes that the population means on two variables should have the same rank order across a set of experimental conditions. This paper presents a test of this hypothesis. The test statistic is based on the coupled monotonic regression algorithm developed by the authors. The significance of the test statistic is determined by comparison to an empirical distribution specific to each case, obtained via non-parametric or semi-parametric bootstrap. We present an analysis of the power and Type I error control of the test based on numerical simulation. Partial order constraints placed on the variables may sometimes be theoretically justified. These constraints are easily incorporated into the computation of the test statistic and are shown to have substantial effects on power. The test can be applied to any form of data, as long as an appropriate statistical model can be specified.
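The significance computation follows the generic bootstrap-test recipe; a minimal Python skeleton of that recipe is sketched below. The interfaces are our assumptions, and the coupled-monotonic-regression statistic itself is left abstract as a user-supplied function.

import numpy as np

def bootstrap_p_value(data, statistic, sample_under_null, n_boot=5000, seed=0):
    # Generic bootstrap test skeleton (illustrative): compare the observed
    # statistic with its distribution under resamples from a fitted null model.
    # In the paper, `statistic` comes from coupled monotonic regression and
    # `sample_under_null` is a non-parametric or semi-parametric bootstrap.
    rng = np.random.default_rng(seed)
    observed = statistic(data)
    null_stats = np.array([statistic(sample_under_null(data, rng))
                           for _ in range(n_boot)])
    # One-sided p-value: how often the null produces at least as large a misfit.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_boot)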

Place, publisher, year, edition, pages
Academic Press, 2016
Keywords
State-trace analysis, Monotonic regression, Hypothesis test
National Category
Other Mathematics
Identifiers
urn:nbn:se:liu:diva-122765 (URN); 10.1016/j.jmp.2015.10.004 (DOI); 000372686500001 ()
Note


Funding agencies: Australian Research Council [0877510, 0878630, 110100751, 130101535]; National Science Foundation [1256959]; Linkoping University

Available from: 2015-11-21 Created: 2015-11-21 Last updated: 2017-12-01. Bibliographically approved.
Sysoev, O., Grimvall, A. & Burdakov, O. (2016). Bootstrap confidence intervals for large-scale multivariate monotonic regression problems. Communications in statistics. Simulation and computation, 45(3), 1025-1040
Bootstrap confidence intervals for large-scale multivariate monotonic regression problems
2016 (English)In: Communications in statistics. Simulation and computation, ISSN 0361-0918, E-ISSN 1532-4141, Vol. 45, no 3, p. 1025-1040Article in journal (Refereed) Published
Abstract [en]

Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a million observations. Nevertheless, the computational burden can be prohibitively large for resampling techniques in which numerous datasets are processed independently of each other. Here, we present efficient algorithms for the estimation of confidence limits in large-scale settings that take into account the similarity of the bootstrap or jackknifed datasets to which MR models are fitted. In addition, we introduce modifications that substantially improve the accuracy of MR solutions for binary response variables. The performance of our algorithms is illustrated using data on death in coronary heart disease for a large population. This example also illustrates that MR can be a valuable complement to logistic regression.
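The paper's contribution is making such resampling feasible at scale by exploiting the similarity between bootstrap datasets; the end product is the familiar percentile bootstrap interval. Below is a small Python sketch of the brute-force baseline for the univariate case, refitting every bootstrap sample independently, with scikit-learn's isotonic regression standing in for the large-scale MR solvers used in the paper (an assumption on our part).

import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_bootstrap_ci(x, y, grid, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap confidence band for a monotonic fit. Every
    # bootstrap dataset is refitted independently: the brute-force baseline
    # that the paper's algorithms are designed to accelerate.
    rng = np.random.default_rng(seed)
    n = len(x)
    fits = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample (x, y) pairs with replacement
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(x[idx], y[idx])
        fits[b] = iso.predict(grid)
    lo = np.quantile(fits, alpha / 2, axis=0)
    hi = np.quantile(fits, 1 - alpha / 2, axis=0)
    return lo, hi

The cost here is n_boot full MR fits; it is exactly this repeated refitting that the paper's algorithms avoid.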

Place, publisher, year, edition, pages
Taylor & Francis, 2016
Keywords
Big data, Bootstrap, Confidence intervals, Monotonic regression, Pool-adjacent-violators algorithm
National Category
Probability Theory and Statistics; Computational Mathematics
Identifiers
urn:nbn:se:liu:diva-85169 (URN); 10.1080/03610918.2014.911899 (DOI); 000372527900014 ()
Note

At the time of the doctoral defence, this publication was a manuscript.

Available from: 2012-11-08 Created: 2012-11-08 Last updated: 2017-12-13
Burdakov, O. & Sysoev, O. (2016). Regularized monotonic regression. Linköping: Linköping University Electronic Press
Regularized monotonic regression
2016 (English)Report (Other academic)
Abstract [en]

Monotonic (isotonic) Regression (MR) is a powerful tool used for solving a wide range of important applied problems. One of its features, which poses a limitation on its use in some areas, is that it produces a piecewise constant fitted response. For smoothing the fitted response, we introduce a regularization term in the MR, formulated as a least distance problem with monotonicity constraints. The resulting Smoothed Monotonic Regression (SMR) is a convex quadratic optimization problem. We focus on the SMR where the set of observations is completely (linearly) ordered. Our Smoothed Pool-Adjacent-Violators (SPAV) algorithm is designed for solving the SMR. It belongs to the class of dual active-set algorithms. We prove its finite convergence to the optimal solution in at most n iterations, where n is the problem size. One of its advantages is that the active set is progressively enlarged by one or, typically, several constraints per iteration. As a result, large-scale SMR test problems were solved in a few iterations, whereas their size was prohibitively large for conventional quadratic optimization solvers. Although the complexity of the SPAV algorithm is O(n²), in our computational experiments its running time grew in proportion to n^1.16.
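The SPAV algorithm itself is the report's contribution; one way to sanity-check any implementation on small instances is to solve the same quadratic program with a general-purpose solver. A brute-force SciPy reference solve is sketched below (our illustration only; this is not SPAV, and it is usable only for small n).

import numpy as np
from scipy.optimize import LinearConstraint, minimize

def smr_reference(y, mu):
    # Solve  min ||x - y||^2 + mu * sum_i (x[i+1] - x[i])^2
    # subject to x nondecreasing, with a general-purpose solver.
    # A correctness check for small n; this is not the SPAV algorithm.
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]  # row i computes x[i+1] - x[i]
    obj = lambda x: np.sum((x - y) ** 2) + mu * np.sum((D @ x) ** 2)
    grad = lambda x: 2 * (x - y) + 2 * mu * (D.T @ (D @ x))
    cons = LinearConstraint(D, 0.0, np.inf)  # monotonicity constraints
    res = minimize(obj, np.sort(y), jac=grad, method="trust-constr",
                   constraints=[cons])
    return res.x

Starting the solver from the sorted data gives a feasible (nondecreasing) initial point.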

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2016. p. 20
Series
LiTH-MAT-R, ISSN 0348-2960 ; 2016:02
Keywords
Monotonic regression, Regularization, Quadratic penalty, Convex quadratic optimization, Dual active-set method, Large-scale optimization
National Category
Computational Mathematics; Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-128117 (URN); LiTH-MAT-R--2016/02--SE (ISRN)
Available from: 2016-05-17 Created: 2016-05-17 Last updated: 2016-09-28. Bibliographically approved.
Sysoev, O., Grimvall, A. & Burdakov, O. (2013). Bootstrap estimation of the variance of the error term in monotonic regression models. Journal of Statistical Computation and Simulation, 83(4), 625-638
Bootstrap estimation of the variance of the error term in monotonic regression models
2013 (English)In: Journal of Statistical Computation and Simulation, ISSN 0094-9655, E-ISSN 1563-5163, Vol. 83, no 4, p. 625-638Article in journal (Refereed) Published
Abstract [en]

The variance of the error term in ordinary regression models and linear smoothers is usually estimated by adjusting the average squared residual for the trace of the smoothing matrix (the degrees of freedom of the predicted response). However, other types of variance estimators are needed when using monotonic regression (MR) models, which are particularly suitable for estimating response functions with pronounced thresholds. Here, we propose a simple bootstrap estimator to compensate for the over-fitting that occurs when MR models are estimated from empirical data. Furthermore, we show that, in the case of one or two predictors, the performance of this estimator can be enhanced by introducing adjustment factors that take into account the slope of the response function and characteristics of the distribution of the explanatory variables. Extensive simulations show that our estimators perform satisfactorily for a great variety of monotonic functions and error distributions.
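The over-fitting compensation can be illustrated with a generic bias correction; the Python sketch below uses Efron's optimism bootstrap as a stand-in. This is a standard technique, not the authors' estimator, which additionally uses adjustment factors based on the slope of the response function and the distribution of the explanatory variables; scikit-learn's isotonic regression is our assumed MR fitter.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def bootstrap_error_variance(x, y, n_boot=500, seed=0):
    # Residual variance of a monotonic fit, bias-corrected with Efron's
    # optimism bootstrap (a generic stand-in for the paper's estimator).
    rng = np.random.default_rng(seed)
    iso = IsotonicRegression(out_of_bounds="clip").fit(x, y)
    naive = np.mean((y - iso.predict(x)) ** 2)  # too small: the MR fit over-fits
    n, optimism = len(x), 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        iso_b = IsotonicRegression(out_of_bounds="clip").fit(x[idx], y[idx])
        err_in = np.mean((y[idx] - iso_b.predict(x[idx])) ** 2)  # training error
        err_out = np.mean((y - iso_b.predict(x)) ** 2)           # error on full data
        optimism += (err_out - err_in) / n_boot
    return naive + optimism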

Place, publisher, year, edition, pages
Taylor & Francis Group, 2013
Keywords
Uncertainty estimation, Bootstrap, Monotonic regression, Pool-adjacent-violators algorithm
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-78858 (URN); 10.1080/00949655.2011.631138 (DOI); 000317276900003 ()
Available from: 2012-06-21 Created: 2012-06-21 Last updated: 2017-12-07
Sysoev, O., Burdakov, O. & Grimvall, A. (2011). A segmentation-based algorithm for large-scale partially ordered monotonic regression. Computational Statistics & Data Analysis, 55(8), 2463-2476
A segmentation-based algorithm for large-scale partially ordered monotonic regression
2011 (English)In: Computational Statistics & Data Analysis, ISSN 0167-9473, E-ISSN 1872-7352, Vol. 55, no 8, p. 2463-2476Article in journal (Refereed) Published
Abstract [en]

Monotonic regression (MR) is an efficient tool for estimating functions that are monotonic with respect to input variables. A fast and highly accurate approximate algorithm called the GPAV was recently developed for efficiently solving large-scale multivariate MR problems. When such problems are too large, the GPAV becomes too demanding in terms of computational time and memory. An approach that extends the application area of the GPAV to much larger MR problems is presented. It is based on segmentation of a large-scale MR problem into a set of moderate-scale MR problems, each solved by the GPAV. The major contribution is the development of a computationally efficient strategy that produces a monotonic response using the local solutions. A theoretically motivated trend-following technique is introduced to ensure higher accuracy of the solution. The presented results of extensive simulations on very large datasets demonstrate the high efficiency of the new algorithm.
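For the totally ordered (univariate) special case, the flavour of segmentation can be conveyed with a naive two-pass Python sketch: fit each segment independently, then repair monotonicity across segment boundaries with one global pass. This is only a caricature under our own simplifications; the paper treats partially ordered multivariate data, and its merging strategy and trend-following technique are more careful than the final pass below.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def segmented_isotonic(y, n_segments):
    # Naive segmented monotonic regression for totally ordered data:
    # fit each segment independently, then run one global isotonic pass
    # over the stitched local fits to restore monotonicity at the seams.
    parts = np.array_split(np.asarray(y, dtype=float), n_segments)
    local = [IsotonicRegression().fit_transform(np.arange(len(p)), p)
             for p in parts]
    z = np.concatenate(local)
    return IsotonicRegression().fit_transform(np.arange(len(z)), z)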

Place, publisher, year, edition, pages
Elsevier Science B.V., Amsterdam, 2011
Keywords
Quadratic programming, Large-scale optimization, Least distance problem, Monotonic regression, Partially ordered data set, Pool-adjacent-violators algorithm
National Category
Social Sciences
Identifiers
urn:nbn:se:liu:diva-69182 (URN); 10.1016/j.csda.2011.03.001 (DOI); 000291181000002 ()
Available from: 2011-06-17 Created: 2011-06-17 Last updated: 2017-12-11
Sysoev, O. (2010). Monotonic regression for large multivariate datasets. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Monotonic regression for large multivariate datasets
2010 (English)Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Monoton regression för stora multivariata datamaterial
Abstract [en]

Monotonic regression is a non-parametric statistical method that is designed especially for applications in which the expected value of a response variable increases or decreases in one or more explanatory variables. Such applications can be found in business, physics, biology, medicine, signal processing, and other areas. Inasmuch as many of the collected datasets can contain a very large number of multivariate observations, there is a strong need for efficient numerical algorithms. Here, we present new methods that make it feasible to fit monotonic functions to more than one hundred thousand data points. By simulation, we show that our algorithms have high accuracy and represent considerable improvements with respect to computational time and memory requirements. In particular, we demonstrate how segmentation of a large-scale problem can greatly improve the performance of existing algorithms. Moreover, we show how the uncertainty of a monotonic regression model can be estimated. One of the procedures we developed can be employed to estimate the variance of the random error present in the observed response. Other procedures are based on resampling techniques and can provide confidence intervals for the expected response at given levels of a set of predictors.

Abstract [sv]

Monotonic regression is a non-parametric statistical method developed especially for applications in which the expected value of a response variable increases or decreases with one or more explanatory variables. Such applications are found in business, physics, biology, medicine, signal processing, and other areas. Since many collected datasets can contain a very large number of multivariate observations, there is a strong need for efficient numerical algorithms. Here we present new methods that make it feasible to fit monotonic functions to more than 100,000 data points. By simulation, we show that our algorithms have high accuracy and bring considerable improvements in computational time and memory requirements. In particular, we show how segmentation of a large-scale problem can greatly improve the performance of existing algorithms. We also show how the uncertainty of a monotonic regression model can be estimated. One of the methods we developed can be used to estimate the variance of the random error components that may be present in the observed response variable. Other methods, based on resampling techniques, can provide confidence intervals for the expected response at given values of a set of predictors.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. p. 75
Series
Linköping Studies in Statistics, ISSN 1651-1700 ; 11
Linköping Studies in Arts and Science, ISSN 0282-9800 ; 514
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:liu:diva-65349 (URN); 978-91-7393-412-1 (ISBN)
Public defence
2010-04-16, Glashuset, Building B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2011-02-04 Created: 2011-02-04 Last updated: 2012-11-08. Bibliographically approved.