Gene Expression Prediction by Soft Integration and the Elastic Net: Best Performance of the DREAM3 Gene Expression Challenge
Linköping University, Department of Science and Technology, Communications and Transport Systems. Linköping University, The Institute of Technology. (Computational systems biology)
Linköping University, Department of Science and Technology, Communications and Transport Systems. Linköping University, The Institute of Technology. ORCID iD: 0000-0003-0528-9782
2010 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 5, no. 2, e9134. Article in journal (Refereed), Published
Abstract [en]

Background: Predicting gene expression is an important endeavour within computational systems biology. It offers both a way to explore how drugs affect the system and a framework for finding which genes are interrelated in a given process. A practical problem, however, is how to assess and discriminate among the various algorithms that have been developed for this purpose. To address this, the DREAM project issued a challenge in 2008 for predicting gene expression values, and here we present the algorithm that achieved the best performance.

Methodology/Principal Findings: We develop an algorithm by exploring various regression schemes with different model selection procedures. The most effective scheme turns out to be based on least squares with a recently developed penalty term called the “elastic net”. Key components of the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge, and the use of transcription factor binding data to guide the inference process towards known interactions. Also important is a cross-validation procedure in which each form of external data is used only to the extent that it increases the expected performance.
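
For readers unfamiliar with the elastic net, the sketch below shows the general idea: least squares with a combined L1 and squared-L2 penalty, fitted here by cyclic coordinate descent. It is an illustration only. This record does not give the paper's exact formulation, penalty weights, or solver, so the function names (`soft_threshold`, `elastic_net`), the parameters `l1` and `l2`, the scaling of the objective, and the toy data are all assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal step for the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, l1, l2, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + l1 * ||b||_1 + (l2/2) * ||b||^2
    by cyclic coordinate descent.

    X is a list of n rows (each a list of p numbers), y a list of n targets.
    The scaling of the objective is an assumption made for this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, since b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + l2
            if denom == 0.0:
                continue  # all-zero column and no ridge term: leave b_j at 0
            # Covariance of column j with the residual that ignores b_j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, l1) / denom
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: the target depends on the first predictor only.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y_toy = [2.0, 0.0, 2.0, 2.0]
beta_toy = elastic_net(X_toy, y_toy, l1=0.5, l2=0.25)
```

With both penalties set to zero the routine reduces to ordinary least squares on this toy data, while a sufficiently large `l1` drives every coefficient to zero. In a setting like the one the abstract describes, `l1` and `l2` would typically be chosen by cross-validation.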

Conclusions/Significance: Our algorithm demonstrates both that information relevant to predicting gene levels can be extracted from large-scale expression data and that integrating different data sources improves the inference. We believe the former is an important message to those still hesitant about the potential of computational approaches, while the latter points to an important way forward for the field of computational systems biology.

Place, publisher, year, edition, pages
2010. Vol. 5, no. 2, e9134.
Keyword [en]
elastic net
National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:liu:diva-54001
DOI: 10.1371/journal.pone.0009134
ISI: 000274590500002
OAI: diva2:296804
Available from: 2010-02-23 Created: 2010-02-18 Last updated: 2014-10-06 (Bibliographically approved)
In thesis
1. Gene networks from high-throughput data: Reverse engineering and analysis
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Experimental innovations starting in the 1990s, leading to the advent of high-throughput experiments in cellular biology, have made it possible to measure thousands of genes simultaneously at a modest cost. This enables the discovery of new, unexpected relationships between genes, as well as the possibility of falsifying existing ones. To benefit as much as possible from these experiments, the new interdisciplinary research field of systems biology has emerged. Systems biology goes beyond the conventional reductionist approach and aims at learning the whole system, under the assumption that the system is greater than the sum of its parts. One emerging enterprise in systems biology is to use the high-throughput data to reverse engineer the web of gene regulatory interactions governing the cellular dynamics. This relatively new endeavor goes further than clustering genes with similar expression patterns and requires separating the cause of gene expression from its effect. Despite the rapid increase in data, we then face the problem of having too few experiments to determine which regulations are active, because the number of putative interactions has increased dramatically as the number of units in the system has increased. One possibility to overcome this problem is to impose more biologically motivated constraints. However, what is or is not a biological fact is often not obvious and may be condition dependent. Moreover, investigations have suggested several statistical facts about gene regulatory networks, which motivate the development of new reverse engineering algorithms relying on different model assumptions. As a result, numerous new reverse engineering algorithms for gene regulatory networks have been proposed. As a consequence, interest has grown in the community in assessing the performance of different approaches in fair trials on “real” biological problems. This resulted in the annually held DREAM conference, which contains computational challenges that can be solved by the participating researchers directly and are evaluated by the chairs of the conference after the submission deadline.

This thesis describes the evolution of regularization schemes for reverse engineering gene networks from high-throughput data within the framework of ordinary differential equations. Furthermore, in order to understand gene networks, a substantial part of it concerns their statistical analysis. First, we reverse engineer a genome-wide regulatory network based solely on microarray data, using an extremely simple strategy that assumes sparseness (LASSO). To validate and analyze this network we also develop some statistical tools. Then we present a refinement of the initial strategy, the algorithm with which we achieved best performer at the DREAM2 conference. This strategy is further refined into a reverse engineering scheme that can also include external high-throughput data, which we confirm to be of relevance as we achieved best performer at the DREAM3 conference as well. Finally, the tools we developed to analyze stability and flexibility in linearized ordinary differential equations representing gene regulatory networks are discussed further.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 36 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1301
National Category
Natural Sciences
URN: urn:nbn:se:liu:diva-54089
ISBN: 978-91-7393-442-8
Public defence
2010-03-26, K3, Kåkenshus, Campus Norrköping, Linköpings universitet, Norrköping, 13:15 (English)
Available from: 2010-02-25 Created: 2010-02-22 Last updated: 2013-09-12 (Bibliographically approved)

Open Access in DiVA
Full text: FULLTEXT01.pdf (297 kB, application/pdf)
Authors: Gustafsson, Mika; Hörnquist, Michael