Identification of active gene networks by filtering co-occurrence text mining networks with whole genome expression measurements
(English) Manuscript (preprint) (Other academic)
Background: Since biological networks are believed to govern cellular behavior under normal and diseased conditions, there is great interest in developing methods that can identify the underlying structure of those networks. There has been an explosion of studies using text mining to extract useful biological information from the published biomedical literature as accessed through PubMed. Co-occurrence of gene symbols in abstracts has been proposed as a method to reconstruct gene networks. At the same time, rapid progress in microarray technology has produced extensive data sets on the activity of the entire genome under different biological conditions. Yet it is not clear how to validate and assess the quality of these text-mined networks beyond visual inspection and case studies, and it is not feasible to reconstruct gene networks directly from whole-genome expression data. Here we present a novel method that integrates prior knowledge in the form of published articles with whole-genome expression measurements.
Results: We have developed a benchmark system, using a yeast gene network as the reference network, which enables us to determine the optimal parameters for integrating information from both abstracts and full texts of published articles with whole-genome expression data sets. We investigate how the quality of the network reconstruction depends on the number of articles used and on whether only abstracts or full-text articles are used. We develop a comprehensive network reconstruction algorithm that uses several criteria, including the frequency of co-occurrences in abstracts and full texts, to rank the edges by how likely they are to be present in the network.
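The general idea of filtering a co-occurrence network with expression data can be illustrated with a minimal sketch. This is not the algorithm of the manuscript, whose criteria and parameters are not given in the abstract. The sketch assumes that an edge is kept when the absolute Pearson correlation between two expression profiles reaches a threshold, and that the remaining edges are ranked by co-occurrence count. The function names, the threshold `min_abs_corr`, the whitespace tokenization, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(documents, gene_symbols):
    """Count, for each unordered gene pair, the documents that mention both."""
    symbols = set(gene_symbols)
    counts = Counter()
    for text in documents:
        # Assumption: naive whitespace tokenization; real gene-symbol
        # recognition (synonyms, punctuation, ambiguity) is much harder.
        found = sorted(symbols.intersection(text.split()))
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_edges(documents, expression, min_abs_corr=0.5):
    """Keep co-occurring pairs supported by expression, ranked by count.

    Assumption: the correlation filter and the ranking key are illustrative
    choices, not the criteria used in the manuscript.
    """
    counts = cooccurrence_counts(documents, expression.keys())
    edges = []
    for (gene_a, gene_b), n_docs in counts.items():
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= min_abs_corr:
            edges.append((gene_a, gene_b, n_docs, r))
    # Most frequently co-mentioned pairs first; ties broken by |correlation|.
    edges.sort(key=lambda e: (-e[2], -abs(e[3])))
    return edges


# Toy input, invented for illustration only (not real measurements).
documents = [
    "GAL4 activates GAL1 and GAL10",
    "GAL4 binds GAL1",
    "GAL1 ACT1 mentioned",
]
expression = {
    "GAL4": [1, 2, 3, 4],
    "GAL1": [2, 4, 6, 8],
    "GAL10": [1, 2, 3, 5],
    "ACT1": [5, 5, 5, 5],
}
edges = rank_edges(documents, expression)
for gene_a, gene_b, n_docs, r in edges:
    print(gene_a, gene_b, n_docs, round(r, 3))
```

In this toy example the pair mentioned together in two documents ranks first, and the pair involving the constant expression profile is filtered out. A benchmark against a reference network, as described above, would be the place to choose the threshold and ranking criteria.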
Conclusions: Our method is a practical tool for identifying as many reliable edges as possible in a gene network by combining text mining and whole-genome expression data. Our scheme could easily be integrated with other methods and other data types, such as sequence information, in order to find putative interactions between genes.
Engineering and Technology
Identifiers: URN: urn:nbn:se:liu:diva-100843; OAI: oai:DiVA.org:liu-100843; DiVA: diva2:664013