Gene networks from high-throughput data: Reverse engineering and analysis
Linköping University, Department of Science and Technology, Communications and Transport Systems. Linköping University, The Institute of Technology. (Computational systems biology)
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Experimental innovations beginning in the 1990s, which led to the advent of high-throughput experiments in cellular biology, have made it possible to measure the expression of thousands of genes simultaneously at a modest cost. This enables the discovery of new, unexpected relationships between genes, as well as the falsification of existing ones. To benefit as much as possible from these experiments, the new interdisciplinary research field of systems biology has emerged. Systems biology goes beyond the conventional reductionist approach and aims to understand the system as a whole, under the assumption that the system is greater than the sum of its parts. One emerging enterprise in systems biology is to use high-throughput data to reverse engineer the web of gene regulatory interactions governing cellular dynamics. This relatively new endeavor goes further than clustering genes with similar expression patterns: it requires separating the causes of gene expression from its effects. Despite the rapid growth in data, we still face the problem of having too few experiments to determine which regulatory interactions are active, because the number of putative interactions grows dramatically, roughly with the square of the number of genes, as the system grows. One way to overcome this problem is to impose additional, biologically motivated constraints. However, what constitutes a biological fact is often not obvious and may be condition-dependent. Moreover, studies have suggested several statistical properties of gene regulatory networks, which motivate the development of new reverse engineering algorithms relying on different model assumptions. As a result, numerous new reverse engineering algorithms for gene regulatory networks have been proposed. Consequently, interest has grown in the community in assessing the performance of different approaches in fair trials on “real” biological problems.
This resulted in the annual DREAM conference, which poses computational challenges that participating researchers solve directly and that the conference chairs evaluate after the submission deadline.

This thesis traces the development of regularization schemes for reverse engineering gene networks from high-throughput data within the framework of ordinary differential equations. Furthermore, to understand gene networks, a substantial part of the thesis also concerns their statistical analysis. First, we reverse engineer a genome-wide regulatory network based solely on microarray data, using an extremely simple strategy that assumes sparseness (LASSO). To validate and analyze this network, we also develop some statistical tools. We then present a refinement of the initial strategy: the algorithm with which we were the best performer at the DREAM2 conference. This strategy is further refined into a reverse engineering scheme that can also include external high-throughput data, whose relevance we confirm by being the best performer at the DREAM3 conference as well. Finally, we discuss further the tools we developed to analyze stability and flexibility in linearized ordinary differential equations representing gene regulatory networks.
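For orientation, the linear ODE setting and the LASSO (l1-penalized least squares) estimation problem can be written as below. This is a generic sketch in our own notation (x_j is the expression of gene j, a_ij the influence of gene j on gene i, m the number of time points, lambda the penalty weight); the thesis's exact formulation may differ.

```latex
\begin{aligned}
\frac{dx_i}{dt} &= \sum_{j=1}^{n} a_{ij}\, x_j(t), \qquad i = 1,\dots,n,\\
\hat{a}_{i\cdot} &= \arg\min_{a_{i1},\dots,a_{in}}\;
  \sum_{k=1}^{m} \Big( \dot{x}_i(t_k) - \sum_{j=1}^{n} a_{ij}\, x_j(t_k) \Big)^{2}
  + \lambda \sum_{j=1}^{n} \lvert a_{ij} \rvert .
\end{aligned}
```

Here the derivative is estimated from the data, for example by finite differences, and each gene gives a separate regression problem. The l1 penalty sets many a_ij exactly to zero, which keeps the problem well posed in practice even when the number of measurements m is far smaller than the number of genes n.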

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 36 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1301
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:liu:diva-54089
ISBN: 978-91-7393-442-8 (print)
OAI: oai:DiVA.org:liu-54089
DiVA: diva2:300143
Public defence
2010-03-26, K3, Kåkenshus, Campus Norrköping, Linköpings universitet, Norrköping, 13:15 (English)
Available from: 2010-02-25 Created: 2010-02-22 Last updated: 2013-09-12. Bibliographically approved
List of papers
1. Constructing and analyzing a large-scale gene-to-gene regulatory network: Lasso-constrained inference and biological validation
2005 (English). In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 2, no 3, 254-261 p. Article in journal (Refereed), Published
Abstract [en]

We construct a gene-to-gene regulatory network from time-series data of expression levels for the whole genome of the yeast Saccharomyces cerevisiae, in a case where the number of measurements is much smaller than the number of genes in the network. This network is analyzed with respect to current biological knowledge of all genes (according to the Gene Ontology database), and we find some of its large-scale properties to be in accordance with known facts about the organism. The linear modeling employed here has been explored several times, but because it lacked any validation beyond investigations of individual genes, its applicability to biological systems has been seriously questioned. Our results show the adequacy of the approach and make further investigations of the model meaningful.
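A minimal sketch of LASSO-constrained inference for a linear model dx/dt = A x, assuming that derivatives are approximated by forward differences and that the LASSO is solved by cyclic coordinate descent (the paper may use a different solver). The names `lasso_cd` and `infer_network` are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n_features = X.shape[1]
    b = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)      # squared norm of each column
    r = np.array(y, dtype=float)       # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the residual that excludes b[j]
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)   # keep the residual in sync
            b[j] = new_bj
    return b


def infer_network(x, dt, lam):
    """Estimate A in dx/dt = A x from a time series x of shape (T, n_genes).

    Row i of the result holds the (sparse) estimated regulators of gene i.
    """
    dxdt = (x[1:] - x[:-1]) / dt   # forward-difference derivatives, (T-1, n)
    states = x[:-1]                # states at the left end of each interval
    n_genes = x.shape[1]
    A = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        A[i] = lasso_cd(states, dxdt[:, i], lam)
    return A
```

With `lam` large enough every coefficient is zero; decreasing it lets more candidate regulators enter the model, so in practice `lam` has to be tuned, for example by cross-validation.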

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-29432 (URN); 10.1109/TCBB.2005.35 (DOI); 000235704200008; 14778 (Local ID); 14778 (Archive number); 14778 (OAI)
Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2017-12-13
2. Comparison and validation of community structures in complex networks
2006 (English). In: Physica A: Statistical Mechanics and its Applications, ISSN 0378-4371, E-ISSN 1873-2119, Vol. 367, 559-576 p. Article in journal (Refereed), Published
Abstract [en]

The issue of partitioning a network into communities has recently attracted a great deal of attention. Most authors seem to equate this issue with that of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is believed to be NP-hard, most effort has gone into the construction of search algorithms, and less into other measures of community structure, similarities between various partitionings, and validation with respect to external information.
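For an undirected graph with m edges, Newman's modularity can be written as Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. A minimal pure-Python sketch, assuming a simple unweighted graph without self-loops; the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    inside = defaultdict(int)          # L_c: edges with both ends in c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    total_degree = defaultdict(int)    # d_c: summed degree of nodes in c
    for node, k in degree.items():
        total_degree[community[node]] += k
    return sum(inside[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)
```

For two triangles joined by a single edge, splitting along that edge gives Q = 5/14, while putting every node in one community gives Q = 0.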

Here we concentrate on a class of computer-generated networks and on three well-studied real networks that constitute a benchmark for network studies: the karate club, the US college football teams, and a gene network of yeast. We use some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure and illustrate their features and drawbacks with examples. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is possible here because we know everything about the computer-generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated using the Gene Ontology database, where we show that a community in general corresponds to a biological process.
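As one simple illustration of comparing two partitions of the same node set (not necessarily one of the graph-theoretic distances used in the paper), the sketch below counts the node pairs on which two partitions disagree about co-membership; dividing that count by the number of pairs gives one minus the Rand index. Both function names are hypothetical.

```python
from itertools import combinations


def pair_disagreements(p, q):
    """Count node pairs grouped together in one partition but apart in the other.

    p, q: dicts mapping each node to a community label (same node set).
    """
    count = 0
    for a, b in combinations(list(p), 2):
        if (p[a] == p[b]) != (q[a] == q[b]):
            count += 1
    return count


def rand_index(p, q):
    """Fraction of node pairs on which the two partitions agree."""
    n = len(p)
    n_pairs = n * (n - 1) // 2
    return 1.0 - pair_disagreements(p, q) / n_pairs
```

A distance of 0 means the partitions are identical up to a relabeling of the communities.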

Keyword
Network, Community, Validation, Distance measure, Hierarchical clustering, K-means, GO
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-32261 (URN); 10.1016/j.physa.2005.12.017 (DOI); 000238236700049; 18142 (Local ID); 18142 (Archive number); 18142 (OAI)
Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2017-12-13
3. Reverse Engineering of Gene Networks with LASSO and Nonlinear Basis Functions
2009 (English). In: CHALLENGES OF SYSTEMS BIOLOGY: COMMUNITY EFFORTS TO HARNESS BIOLOGICAL COMPLEXITY, ISSN 0077-8923, Vol. 1158, 265-275 p. Article in journal (Refereed), Published
Abstract [en]

The quest to determine cause from effect is often referred to as reverse engineering in the context of cellular networks. Here we propose and evaluate an algorithm for reverse engineering a gene regulatory network from time-series and steady-state data. Our algorithmic pipeline, which is rather standard in its parts but not in its integrative composition, combines ordinary differential equations, parameter estimation by least angle regression, and cross-validation procedures for determining the in-degrees and selecting nonlinear transfer functions. The result of the algorithm is a complete directed network, in which each edge has been assigned a score from a bootstrap procedure. To evaluate the performance, we submitted the outcome of the algorithm to the reverse engineering assessment competition DREAM2, where we used the data corresponding to the InSilico1 and InSilico2 networks as input. Our algorithm outperformed all other algorithms when inferring one of the directed gene-to-gene networks.
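The bootstrap scoring step can be sketched as follows: resample the experiments with replacement, refit a sparse regression for each target gene, and score each candidate edge by how often its coefficient is nonzero. This is a minimal sketch under stated assumptions: a plain LASSO solved by coordinate descent stands in for the least angle regression used in the paper, the penalty `lam` is fixed rather than cross-validated, and no nonlinear basis functions are included. All names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise 0.5*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = np.array(y, dtype=float)              # residual with b = 0
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b


def bootstrap_edge_scores(X, Y, lam, n_boot=100, seed=0):
    """Fraction of bootstrap fits in which edge (regulator j -> target i) is selected.

    X: (samples, regulators) regressors; Y: (samples, targets) responses.
    Returns an array of shape (targets, regulators) with values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    counts = np.zeros((Y.shape[1], X.shape[1]))
    for _ in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)   # resample rows
        for i in range(Y.shape[1]):
            coef = lasso_cd(X[idx], Y[idx, i], lam)
            counts[i] += coef != 0
    return counts / n_boot
```

Ranking all candidate edges by these scores yields a complete directed network with a confidence value attached to every edge.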

Keyword
reverse engineering, network inference, nonlinear, DREAM conference, LARS, LASSO
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-18289 (URN); 10.1111/j.1749-6632.2008.03764.x (DOI)
Note
This is the authors’ version of the following article: Mika Gustafsson, Michael Hörnquist, Jesper Lundstrom, Johan Bjorkegren and Jesper Tegnér, Reverse Engineering of Gene Networks with LASSO and Nonlinear Basis Functions, 2009, Annals of the New York Academy of Sciences, Volume 1158 Issue, The Challenges of Systems Biology Community Efforts to Harness Biological Complexity, 265-275, which has been published in final form at: http://dx.doi.org/10.1111/j.1749-6632.2008.03764.x Copyright: Blackwell Publishing Ltd http://www.blackwellpublishing.com/
Available from: 2009-05-25 Created: 2009-05-15 Last updated: 2013-09-12. Bibliographically approved
4. Genome-wide system analysis reveals stable yet flexible network dynamics in yeast
2009 (English). In: IET SYSTEMS BIOLOGY, ISSN 1751-8849, Vol. 3, no 4, 219-228 p. Article in journal (Refereed), Published
Abstract [en]

Recently, important insights into the static network topology of biological systems have been obtained, but global dynamical network properties that determine stability and system responsiveness have still not been accessible for analysis. Herein, we explore a genome-wide gene-to-gene regulatory network based on expression data from the cell cycle in Saccharomyces cerevisiae (budding yeast). We recover static properties such as hubs (genes having many outgoing connections), network motifs and modules, which have previously been derived from multiple data sources such as whole-genome expression measurements, literature mining, and protein-protein and transcription factor binding data. Further, our analysis uncovers some novel dynamical design principles: hubs are both repressed and repressors, and intra-modular dynamics are either strongly activating or repressing, whereas inter-modular couplings are weak. Finally, taking advantage of the inferred strength and direction of all interactions, we perform a global dynamical systems analysis of the network. The inferred dynamics of hubs, motifs and modules produce a more stable network than expected from randomised versions. The main contribution of the repressed hubs is to increase system stability, while higher-order dynamic effects (e.g. module dynamics) mainly increase system flexibility. Altogether, the presence of hubs, motifs and modules induces a few flexible modes, along which the network is especially sensitive to external signals. We believe that our approach, and the inferred biological mode of strong flexibility and stability, will also apply to other cellular networks and adaptive systems.
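For a linear model dx/dt = A x, stability can be read off the eigenvalues of A: the system is asymptotically stable when all real parts are negative, and eigenvalues with real part close to zero correspond to slowly decaying modes along which perturbations persist longest. A minimal sketch; the threshold `slow_tol` and the weight-shuffling randomisation (which keeps the edge set and the self-regulation terms fixed) are our assumptions and need not match the paper's procedure. Function names are hypothetical.

```python
import numpy as np


def stability_summary(A, slow_tol=0.1):
    """Return (is_stable, n_slow_modes, eigenvalues) for dx/dt = A x."""
    eig = np.linalg.eigvals(A)
    is_stable = bool(np.all(eig.real < 0))
    n_slow = int(np.sum(np.abs(eig.real) < slow_tol))   # near-marginal modes
    return is_stable, n_slow, eig


def randomised_max_real_parts(A, n_rand=100, seed=0):
    """Largest eigenvalue real part for randomised versions of A.

    Off-diagonal nonzero weights are permuted among the existing edges, so the
    topology and the diagonal (self-regulation) are kept fixed.
    """
    rng = np.random.default_rng(seed)
    off = ~np.eye(A.shape[0], dtype=bool) & (A != 0)
    weights = A[off]
    out = np.empty(n_rand)
    for k in range(n_rand):
        B = A.copy()
        B[off] = rng.permutation(weights)
        out[k] = np.linalg.eigvals(B).real.max()
    return out
```

Comparing `np.linalg.eigvals(A).real.max()` for the inferred matrix with the distribution returned by `randomised_max_real_parts(A)` indicates whether the inferred arrangement of weights makes the network more or less stable than a random assignment of the same weights.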

National Category
Natural Sciences
Identifiers
urn:nbn:se:liu:diva-19799 (URN); 10.1049/iet-syb.2008.0112 (DOI)
Note
This paper is a postprint of a paper submitted to and accepted for publication in IET SYSTEMS BIOLOGY and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at IET Digital Library. Original Publication: Mika Gustafsson, Michael Hörnquist, J Bjorkegren and Jesper Tegnér, Genome-wide system analysis reveals stable yet flexible network dynamics in yeast, 2009, IET SYSTEMS BIOLOGY, (3), 4, 219-228. http://dx.doi.org/10.1049/iet-syb.2008.0112 Copyright: The Institution of Engineering and Technology http://www.theiet.org/
Available from: 2009-08-28 Created: 2009-08-10 Last updated: 2013-12-12. Bibliographically approved
5. Integrating various data sources for improved quality in reverse engineering of gene regulatory networks
2009 (English). In: Handbook of Research on Computational Methodologies in Gene Regulatory Networks / [ed] Sanjoy Das, Doina Caragea, Stephen M. Welch and William H. Hsu, IGI Global, 2009, 1, 476-496 p. Chapter in book (Other academic)
Abstract [en]

In this chapter we outline a methodology to reverse engineer gene regulatory networks (GRNs) from various data sources within an ordinary differential equation (ODE) framework. The methodology is generally applicable and is suitable for handling the broad error distribution present in microarrays. The main effort of this chapter is the exploration of a fully data-driven approach to the integration problem in a “soft evidence” based way. Integration is here seen as the process of incorporating uncertain a priori knowledge, which is therefore relied upon only if it lowers the prediction error. An efficient implementation is obtained through a linear programming (LP) formulation. This LP problem is solved repeatedly with small modifications, so we can benefit from restarting the primal simplex method from nearby solutions, which makes execution computationally efficient. We perform a case study on data from the yeast cell cycle, where all verified genes are putative regulators and the a priori knowledge consists of several types of binding data, text mining and annotation knowledge.
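One way such an integration problem can be posed as a linear program is sketched below, for a single target gene. It assumes an absolute-error (l1) loss, which is robust to heavy-tailed microarray errors, and an l1 penalty whose weights w_j are lowered for regulators j supported by a priori data. The symbols (y_k, x_kj, a_j, w_j, lambda, and the auxiliary variables e_k and s_j) are our own; the chapter's exact formulation may differ.

```latex
\begin{aligned}
\min_{a,\,e,\,s}\quad & \sum_{k=1}^{m} e_k \;+\; \lambda \sum_{j=1}^{n} w_j\, s_j\\
\text{s.t.}\quad & -e_k \;\le\; y_k - \sum_{j=1}^{n} x_{kj}\, a_j \;\le\; e_k, && k = 1,\dots,m,\\
& -s_j \;\le\; a_j \;\le\; s_j, && j = 1,\dots,n .
\end{aligned}
```

For positive weights, an optimum satisfies e_k = |y_k - sum_j x_kj a_j| and s_j = |a_j|, so the LP is equivalent to minimizing the absolute prediction error plus a weighted l1 penalty on the coefficients. Changing lambda or the weights w_j alters only the objective, not the constraints, so the previous optimal basis stays primal feasible; this is what allows the primal simplex method to be restarted from it.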

Place, publisher, year, edition, pages
IGI Global, 2009. Edition: 1
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-54096 (URN); 978-1-60566-685-3 (ISBN); 978-1-60566-686-0 (ISBN)
Available from: 2010-02-23 Created: 2010-02-23 Last updated: 2013-09-12. Bibliographically approved
6. Gene Expression Prediction by Soft Integration and the Elastic Net: Best Performance of the DREAM3 Gene Expression Challenge
2010 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 5, no 2, e9134. Article in journal (Refereed), Published
Abstract [en]

Background: Predicting gene expression is an important endeavour within computational systems biology. It can serve both as a way to explore how drugs affect the system and as a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms that have been developed for this purpose. Therefore, in 2008 the DREAM project issued a challenge for predicting gene expression values, and here we present the algorithm with the best performance.

Methodology/Principal Findings: We develop an algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares with a penalty term of a recently developed form called the “elastic net”. Key components of the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge, and the use of transcription factor binding data to guide the inference process towards known interactions. Also important is a cross-validation procedure in which each form of external data is used only to the extent that it increases the expected performance.
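A minimal sketch of the two ingredients named above: an elastic-net regression solved by coordinate descent, and a k-fold cross-validation check that keeps an external block of regressors only if it lowers the estimated prediction error. The all-or-nothing inclusion rule, the fixed penalties `lam1` and `lam2`, and all function names are our simplifying assumptions, not the paper's algorithm.

```python
import numpy as np


def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Minimise 0.5*||y - Xb||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2 by coordinate descent."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = np.array(y, dtype=float)          # residual y - X b with b = 0
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            denom = col_sq[j] + lam2
            if denom == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * b[j]
            # soft-threshold (l1 part), then shrink by the l2 part
            new_bj = np.sign(rho) * max(abs(rho) - lam1, 0.0) / denom
            r += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b


def kfold_mse(X, y, lam1, lam2, k=5, seed=0):
    """Mean squared prediction error of elastic_net_cd under k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b = elastic_net_cd(X[train_idx], y[train_idx], lam1, lam2)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ b) ** 2))
    return float(np.mean(errors))


def use_external_block(X_base, X_ext, y, lam1, lam2, k=5):
    """Keep the external regressors X_ext only if they lower the CV error."""
    base = kfold_mse(X_base, y, lam1, lam2, k)
    augmented = kfold_mse(np.hstack([X_base, X_ext]), y, lam1, lam2, k)
    return augmented < base, base, augmented
```

Both calls to `kfold_mse` use the same seed and therefore the same folds, so the two error estimates are compared on identical splits.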

Conclusions/Significance: Our algorithm demonstrates both that information concerning the prediction of gene levels can be extracted from large-scale expression data and that integrating different data sources improves the inference. We believe the former is an important message to those still hesitant about the possibilities of computational approaches, while the latter is part of an important way forward for the future development of the field of computational systems biology.

Keyword
elastic net
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:liu:diva-54001 (URN); 10.1371/journal.pone.0009134 (DOI); 000274590500002
Projects
CENIIT
Available from: 2010-02-23 Created: 2010-02-18 Last updated: 2014-10-06. Bibliographically approved
7. System Analysis of Gene Regulatory Networks
(English). Manuscript (preprint) (Other academic)
Abstract [en]

The inference of genome-wide regulatory networks in cells from high-throughput data sets has revealed a diverse picture of only partly overlapping descriptions. Nevertheless, several conclusions can be drawn about the large-scale properties of how these networks are organized. For example, the presence of hubs, a modular structure and certain motifs are recurrent phenomena.

Several authors have recently claimed that cellular systems are stable against perturbations and random errors, yet able to switch rapidly between different states in response to specific stimuli. Since inferred genome-wide systems need to be extremely simple to avoid overfitting, these two features are hard to attain simultaneously in a mathematical model. Here we review and discuss how system stability and flexibility may be manifested and measured in linear ODE models. Furthermore, we review how different network properties contribute to these systems-level properties. It turns out that the presence of repressed hubs, together with other phenomena of a topological nature such as motifs and modules, contributes to the overall stability and/or flexibility of the system.
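One standard way to connect stability and flexibility in a linear ODE model is through the eigen-decomposition of the interaction matrix. Assume dx/dt = A x + u with a constant input u and a diagonalizable, invertible A = V Lambda V^(-1), with eigenvalues lambda_k, right eigenvectors v_k (columns of V) and left eigenvectors w_k^T (rows of V^(-1)). The transient and the steady state then decompose mode by mode; this is a generic derivation, not necessarily the measures used in the manuscript.

```latex
\begin{aligned}
x(t) &= x^{*} + \sum_{k=1}^{n} e^{\lambda_k t}\, v_k\, w_k^{\top}\big(x(0) - x^{*}\big),\\
x^{*} &= -A^{-1}u \;=\; -\sum_{k=1}^{n} \frac{1}{\lambda_k}\, v_k\, w_k^{\top} u .
\end{aligned}
```

Stability requires Re(lambda_k) < 0 for every k, so that the transient vanishes. In the steady state, the component of the input along mode k is scaled by 1/lambda_k, so a few modes with small |lambda_k| make an otherwise stable system strongly responsive to inputs in those particular directions.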

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-54097 (URN)
Available from: 2010-02-23 Created: 2010-02-23 Last updated: 2013-09-12

Open Access in DiVA

Gene networks from high-throughput data – Reverse engineering and analysis (469 kB), 2294 downloads
File information
File name: FULLTEXT01.pdf
File size: 469 kB
Checksum (SHA-512):
2a08ebdbe97a77bfecdbb5dd72042ae6d8195af2661165217059b64e373493bfadc24ceaa54136f7818aa9b15b59e45d538f5a7764402a0e72cdedaeffa47121
Type: fulltext
Mimetype: application/pdf
Cover (113 kB), 68 downloads
File information
File name: COVER01.pdf
File size: 113 kB
Checksum (SHA-512):
4c92f667840c0490df48735573d42ea63a8b162813db2148a856726a6a371034bfdad70c55b945c4d5847bc41a8e3f5f73f2c58dae196cd4c266b6067f883014
Type: cover
Mimetype: application/pdf

Author

Gustafsson, Mika
