Investigation of multivariate prediction methods for the analysis of biomarker data
Independent thesis, Basic level (professional degree), 20 points / 30 hp
Student thesis
The paper describes predictive modelling of biomarker data from patients with multiple sclerosis. Improvements to the multivariate analysis of the data are investigated, with the goal of increasing the ability to assign samples to the correct subgroups from the data alone.
The effects of different pre-processing scalings of the data are investigated, and combinations of multivariate modelling methods and variable selection methods are evaluated. Attempts are made to merge the predictive capabilities of the method combinations through voting procedures. Bagging, a technique for improving the results of PLS modelling, is also evaluated.
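The abstract does not state which scaling, voting rule or base model the thesis uses. As a minimal sketch, assuming unit-variance (auto)scaling of each variable, simple majority voting over the class labels predicted by several models, and bagging as the averaging of predictions from models fitted to bootstrap resamples, the three steps could look like this. The function names and the `fit` callback are hypothetical; in practice `fit` would train a model such as PLS.

```python
import random
import statistics
from collections import Counter


def autoscale(rows):
    """Centre each variable (column) to mean 0 and scale it to unit sample
    standard deviation. `rows` is a list of samples, each a list of values.
    A variable with zero variance raises ZeroDivisionError and should be
    removed beforehand."""
    n_vars = len(rows[0])
    means = [statistics.mean(r[j] for r in rows) for j in range(n_vars)]
    sds = [statistics.stdev([r[j] for r in rows]) for j in range(n_vars)]
    return [[(r[j] - means[j]) / sds[j] for j in range(n_vars)] for r in rows]


def majority_vote(labels):
    """Return the class label predicted by most models for one sample.
    Ties go to the label that appears first in `labels`."""
    return Counter(labels).most_common(1)[0][0]


def bagged_predict(X, y, fit, x_new, n_bags=25, seed=0):
    """Bagging: fit `n_bags` models on bootstrap resamples (drawn with
    replacement, same size as the training set) and average their
    predictions for `x_new`. `fit(X, y)` is a hypothetical callback that
    returns a predictor function."""
    rng = random.Random(seed)
    n = len(X)
    predictions = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        predictions.append(model(x_new))
    return statistics.mean(predictions)


if __name__ == "__main__":
    print(autoscale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
    # -> [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
    print(majority_vote(["MS", "control", "MS"]))
    # -> MS
```

Autoscaling puts variables measured on very different scales on an equal footing before modelling; whether this matters in practice is exactly what the abstract reports as having little effect for most methods.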
Of the multivariate analysis methods tried, partial least squares (PLS) and support vector machines (SVM) are found to perform best. It is concluded that scaling has little effect on prediction performance for most methods. The method combinations have interesting properties: the default variable selections of the multivariate methods are not always the best. Bagging improves performance, but at a high cost. No reasons for drastically changing the workflows of the biomarker data analysis are found, but slight improvements are possible. Further research is needed.
Place, publisher, year, edition, pages: Institutionen för fysik, kemi och biologi (Department of Physics, Chemistry and Biology), 2006, 56 p.
Keywords: Multivariate analysis, multiple sclerosis, biomarker, predictive modelling, partial least squares, support vector machines, variable selection, bagging, neural networks
Subject category: Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:liu:diva-5889
ISRN: LITH-IFM-EX--06/1556--SE
OAI: oai:DiVA.org:liu-5889
DiVA: diva2:21549
Nilsson, Kerstin; Salter, Hugh