Identification of active gene networks by filtering co-occurrence text mining networks with whole genome expression measurements
Linköping University, Department of Physics, Chemistry and Biology, Computational Biology. Linköping University, The Institute of Technology.
Karolinska Institutet, Sweden.
Karolinska Institutet, Sweden.
Linköping University, Department of Physics, Chemistry and Biology, Computational Biology. Linköping University, The Institute of Technology.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Background: Since biological networks are believed to govern cellular behavior under both normal and diseased conditions, there is great interest in developing methods that can identify the underlying structure of these networks. There has been an explosion of studies using text mining to extract useful biological information from the published biomedical literature accessible through PubMed, and co-occurrence of gene symbols in abstracts has been proposed as a method to reconstruct gene networks. However, it is not clear how to validate and assess the quality of such inferred networks beyond visual inspection and case studies. At the same time, rapid progress in microarray technology has produced extensive data sets measuring the activity of the entire genome under different biological conditions, yet it is not feasible to reconstruct gene networks directly from genome-wide expression data. Here we present a novel method that integrates prior knowledge, in the form of published articles, with genome-wide expression measurements.
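The idea of building a network from gene-symbol co-occurrence in abstracts can be illustrated with a minimal sketch. It assumes each abstract is a plain string and that a gene counts as mentioned when its exact symbol appears as an alphanumeric token. The function name `cooccurrence_counts` and the crude tokenizer are hypothetical and are not the authors' text-mining pipeline; a real pipeline would also need synonym handling and disambiguation.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, gene_symbols):
    """Count, for each unordered gene pair, the number of abstracts mentioning both.

    Hypothetical illustration: a gene is "mentioned" if its symbol appears
    verbatim (case-sensitive) as an alphanumeric token in the abstract.
    """
    symbols = set(gene_symbols)
    counts = Counter()
    for text in abstracts:
        # Crude tokenization on non-alphanumeric characters; keep only known symbols.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        genes = sorted(tokens & symbols)
        # Each abstract contributes at most one count per gene pair.
        for a, b in combinations(genes, 2):
            counts[(a, b)] += 1
    return counts
```

For example, `cooccurrence_counts(["CDC28 binds CLN3 in G1.", "CLN3 and CDC28 and SWI4 act together."], ["CDC28", "CLN3", "SWI4"])` counts the pair `("CDC28", "CLN3")` twice, because both abstracts mention it. Each counted pair is a candidate edge, weighted by how often the literature mentions the two genes together.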

Results: We have developed a benchmark system, using a yeast gene network as the reference network, which enables us to determine the optimal parameters for integrating information from both abstracts and full texts of published articles with genome-wide expression data sets. We investigate how the quality of the network reconstruction depends on the number of articles used, and on whether only abstracts or full-text articles are used. We develop a comprehensive network reconstruction algorithm that uses several criteria, including the frequency of co-occurrences in abstracts and full texts, to rank the edges that are most likely to be present in the network.
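A benchmark of this kind can be sketched as follows: filter candidate co-occurrence edges by expression similarity, rank the survivors, and measure precision against a reference network. This is a minimal sketch under stated assumptions and not the authors' algorithm. The filtering rule (keep an edge only if the absolute Pearson correlation of the two genes' expression profiles is at least `min_abs_r`), the ranking by co-occurrence count with ties broken by correlation, and the names `pearson`, `rank_edges` and `precision_at_k` are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_edges(counts, expression, min_abs_r=0.5):
    """Hypothetical rule: drop edges with weak expression correlation, then rank
    by co-occurrence count (descending), breaking ties by |r| (descending)."""
    ranked = []
    for (a, b), c in counts.items():
        if a not in expression or b not in expression:
            continue  # no expression data for one of the genes
        r = abs(pearson(expression[a], expression[b]))
        if r >= min_abs_r:
            ranked.append(((a, b), c, r))
    ranked.sort(key=lambda item: (-item[1], -item[2]))
    return [edge for edge, _, _ in ranked]


def precision_at_k(ranked, reference, k):
    """Fraction of the top-k ranked edges present in an undirected reference edge set."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for a, b in top if (a, b) in reference or (b, a) in reference)
    return hits / len(top)
```

Sweeping `min_abs_r`, or restricting `counts` to pairs drawn from abstracts versus full texts, and comparing `precision_at_k` across settings mirrors the kind of parameter exploration described above. The specific scoring used in the study may differ.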

Conclusions: Our method is a practical tool for identifying as many reliable edges as possible in a gene network by combining text mining and whole-genome expression data. Our scheme could easily be integrated with other methods and other data types, such as sequence information, to find putative interactions between genes.

National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-100843
OAI: oai:DiVA.org:liu-100843
DiVA: diva2:664013
Available from: 2013-11-13 Created: 2013-11-13 Last updated: 2013-11-13
In thesis
1. Computational methods for cellular network inference and compound evaluation
2005 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Most diseases are caused by a mixture of environmental and genetic components. The genetic component is mainly inherited but can also be induced by the environment. Cancer and cardiovascular diseases are not caused by a single gene but more often by a number of genes and by the complex system formed by the interactions between these genes. To understand and treat these complex diseases, we need a better understanding of the underlying gene networks and of which parts of these networks the diseases target. In the first study presented here, we show that text mining of the biological literature together with whole-genome expression data can be used to identify gene networks, and that the resulting network edges can be ranked according to their biological reliability. In the second study, we present a novel algorithm, CutTree, that can identify the genetic targets of compounds using only a small number of whole-genome expression experiments. Computational tools like these will facilitate the exploration of gene networks in health and disease.

Place, publisher, year, edition, pages
Linköping: Linköpings universitet, 2005. 13 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1208
Series
LiU-TEK-LIC, 63
National Category
Natural Sciences
Identifiers
urn:nbn:se:liu:diva-31005 (URN)
LIU-TEK-LIC 2005:63 (ISRN)
16696 (Local ID)
16696 (Archive number)
16696 (OAI)
Available from: 2009-10-09 Created: 2009-10-09 Last updated: 2013-11-13

Open Access in DiVA

No full text

Authority records

Hallén, Kristofer; Tegnér, Jesper
