Effective dimensionality of large-scale expression data using principal component analysis
2002 (English) In: Biosystems (Amsterdam. Print), ISSN 0303-2647, Vol. 65, no 2-3, p. 147-156. Article in journal (Refereed). Published
Large-scale expression data are today measured for thousands of genes simultaneously. This development is followed by an exploration of theoretical tools to get as much information out of these data as possible. One line of work is to try to extract the underlying regulatory network. The models used thus far, however, contain many parameters, and a careful investigation is necessary in order not to over-fit the models. We employ principal component analysis to show how, in the context of linear additive models, one can get a rough estimate of the effective dimensionality (the number of information-carrying dimensions) of large-scale gene expression datasets. We treat both the lack of independence of different measurements in a time series and the fact that measurements are subject to some level of noise, both of which reduce the effective dimensionality and thereby constrain the complexity of models that can be built from the data. Copyright © 2002 Elsevier Science Ireland Ltd.
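The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of one generic way to count information-carrying dimensions with principal component analysis, not the authors' method. It assumes independent Gaussian measurement noise of known standard deviation and keeps the components whose singular values exceed the approximate largest singular value of a pure-noise matrix, sigma * (sqrt(n) + sqrt(p)). The function name, the `noise_sigma` parameter, and the `margin` safety factor are all hypothetical choices made for illustration.

```python
import numpy as np


def effective_dimensionality(X, noise_sigma, margin=1.2):
    """Count principal components that stand out above an assumed noise floor.

    X           : array of shape (n_measurements, n_genes)
    noise_sigma : assumed standard deviation of i.i.d. Gaussian noise
    margin      : hypothetical safety factor applied to the noise threshold
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # PCA via SVD of the column-centred data matrix.
    Xc = X - X.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(Xc, compute_uv=False)

    # Heuristic noise floor: the largest singular value of an n x p matrix of
    # i.i.d. N(0, sigma^2) entries is roughly sigma * (sqrt(n) + sqrt(p)).
    threshold = margin * noise_sigma * (np.sqrt(n) + np.sqrt(p))

    return int(np.sum(singular_values > threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic example: 20 measurements of 500 genes driven by 3 latent factors.
    signal = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 500))
    data = signal + rng.normal(scale=0.1, size=signal.shape)
    print(effective_dimensionality(data, noise_sigma=0.1))
```

The sketch treats the rows as independent measurements. Consecutive points in a time series are correlated, which the abstract notes reduces the effective dimensionality further; this simple threshold does not account for that.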
Keywords: Data reduction, Gene regulation, Genetic regulatory network, Noise effects, Principal component analysis
Engineering and Technology
Identifiers: URN: urn:nbn:se:liu:diva-46982; DOI: 10.1016/S0303-2647(02)00011-4; OAI: oai:DiVA.org:liu-46982; DiVA: diva2:267878