liu.seSearch for publications in DiVA
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Comparison and validation of community structures in complex networks
Linköping University, Department of Science and Technology. Linköping University, The Institute of Technology.
Linköping University, Department of Science and Technology. Linköping University, The Institute of Technology.ORCID iD: 0000-0003-0528-9782
Linköping University, Department of Science and Technology. Linköping University, The Institute of Technology.
2006 (English)In: Physica A: Statistical Mechanics and its Applications, ISSN 0378-4371, E-ISSN 1873-2119, Vol. 367, 559-576 p.Article in journal (Refereed) Published
Abstract [en]

The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with the one of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is believed to be NP-hard, most effort has gone into the construction of search algorithms, and less to the question of other measures of community structures, similarities between various partitionings and the validation with respect to external information.

Here we concentrate on a class of computer generated networks and on three well-studied real networks which constitute a bench-mark for network studies; the karate club, the US college football teams and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by examples features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is here possible since we know everything of the computer generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.

Place, publisher, year, edition, pages
2006. Vol. 367, 559-576 p.
Keyword [en]
Network, Community, Validation, Distance measure, Hierarchical clustering, K-means, GO
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-32261DOI: 10.1016/j.physa.2005.12.017ISI: 000238236700049Local ID: 18142OAI: oai:DiVA.org:liu-32261DiVA: diva2:253083
Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2017-12-13
In thesis
1. Gene networks from high-throughput data: Reverse engineering and analysis
Open this publication in new window or tab >>Gene networks from high-throughput data: Reverse engineering and analysis
2010 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Experimental innovations starting in the 1990’s leading to the advent of high-throughput experiments in cellular biology have made it possible to measure thousands of genes simultaneously at a modest cost. This enables the discovery of new unexpected relationships between genes in addition to the possibility of falsify existing. To benefit as much as possible from these experiments the new inter disciplinary research field of systems biology have materialized. Systems biology goes beyond the conventional reductionist approach and aims at learning the whole system under the assumption that the system is greater than the sum of its parts. One emerging enterprise in systems biology is to use the high-throughput data to reverse engineer the web of gene regulatory interactions governing the cellular dynamics. This relatively new endeavor goes further than clustering genes with similar expression patterns and requires the separation of cause of gene expression from the effect. Despite the rapid data increase we then face the problem of having too few experiments to determine which regulations are active as the number of putative interactions has increased dramatic as the number of units in the system has increased. One possibility to overcome this problem is to impose more biologically motivated constraints. However, what is a biological fact or not is often not obvious and may be condition dependent. Moreover, investigations have suggested several statistical facts about gene regulatory networks, which motivate the development of new reverse engineering algorithms, relying on different model assumptions. As a result numerous new reverse engineering algorithms for gene regulatory networks has been proposed. As a consequent, there has grown an interest in the community to assess the performance of different attempts in fair trials on “real” biological problems. This resulted in the annually held DREAM conference which contains computational challenges that can be solved by the prosing researchers directly, and are evaluated by the chairs of the conference after the submission deadline.

This thesis contains the evolution of regularization schemes to reverse engineer gene networks from high-throughput data within the framework of ordinary differential equations. Furthermore, to understand gene networks a substantial part of it also concerns statistical analysis of gene networks. First, we reverse engineer a genome-wide regulatory network based solely on microarray data utilizing an extremely simple strategy assuming sparseness (LASSO). To validate and analyze this network we also develop some statistical tools. Then we present a refinement of the initial strategy which is the algorithm for which we achieved best performer at the DREAM2 conference. This strategy is further refined into a reverse engineering scheme which also can include external high-throughput data, which we confirm to be of relevance as we achieved best performer in the DREAM3 conference as well. Finally, the tools we developed to analyze stability and flexibility in linearized ordinary differential equations representing gene regulatory networks is further discussed.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2010. 36 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1301
National Category
Natural Sciences
Identifiers
urn:nbn:se:liu:diva-54089 (URN)978-91-7393-442-8 (ISBN)
Public defence
2010-03-26, K3, Kåkenshus, Campus Norrköping, Linköpings universitet, Norköping, 13:15 (English)
Opponent
Supervisors
Available from: 2010-02-25 Created: 2010-02-22 Last updated: 2013-09-12Bibliographically approved
2. Large-scale topology, stability and biology of gene networks
Open this publication in new window or tab >>Large-scale topology, stability and biology of gene networks
2006 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Experimental innovations in cell biology have provided a huge amount of genomescale data sets, settling the stage for understanding organisms on a system level. Recently, complex networks have been widely adopted and serve as a unifying language for widely different systems, including social, technological and biological systems. Still- in most biological cases-the number of interacting units vastly exceeds the number of measurements, hence large-scale models must still be very simple or non-specific. In this thesis we explore the limits of a linear (Lasso) network model on a genomic-scale for the Saccharomyces cerevisae organism and the limits of some analysis tools from the research field of complex networks. The former study (Paper I and Paper III) mainly regards validation issues, but also stipulate novel statistical system hypotheses, e.g., the system is significantly more stable than random, but still flexible to target stimuli. The latter study (Paper II) explores different heuristics in the search for communities (i.e., dense sub-graphs) within large complex networks and how the concept relates to functional modules.

Place, publisher, year, edition, pages
Linköpings Universitet: Linköpings universitet, 2006. 30 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1256
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-38819 (URN)45774 (Local ID)91-85523-53-4 (ISBN)45774 (Archive number)45774 (OAI)
Available from: 2009-10-10 Created: 2009-10-10 Last updated: 2013-12-12

Open Access in DiVA

No full text

Other links

Publisher's full text

Authority records BETA

Gustafsson, MikaHörnquist, MichaelLombardi, Anna

Search in DiVA

By author/editor
Gustafsson, MikaHörnquist, MichaelLombardi, Anna
By organisation
Department of Science and TechnologyThe Institute of Technology
In the same journal
Physica A: Statistical Mechanics and its Applications
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 538 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf