Tool for Evaluating Strategies for Grouping of Biological Data
Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, The Institute of Technology.
Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, The Institute of Technology. (IDA/ADIT) ORCID iD: 0000-0002-9084-0470
2007 (English). In: Journal of Integrative Bioinformatics, ISSN 1613-4516, Vol. 4, no. 3. Article in journal (Refereed). Published.
Abstract [en]

During the last decade an enormous amount of biological data has been generated, and techniques and tools to analyze this data have been developed. Many of these tools use some form of grouping and are used in, for instance, data integration, data cleaning, prediction of protein functionality, and correlation of genes based on microarray data. A number of aspects influence the quality of the grouping results: the data sources, the grouping attributes, and the algorithms implementing the grouping procedure. Many methods exist, but it is often not clear which methods perform best for which grouping tasks. Studying, evaluating, and comparing the aspects that influence the quality of the grouping results would give us valuable insight into how the grouping procedures can best be used. It would also lead to recommendations on how to improve the current procedures and develop new ones. To perform such studies and evaluations we need environments that allow us to compare and evaluate different grouping strategies. In this paper we present a framework, KitEGA, for such an environment, and present its current prototype implementation. We illustrate its use by comparing grouping strategies for classifying proteins regarding biological function and isozymes.
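As a rough illustration of what comparing grouping strategies can involve, the sketch below scores two groupings against a reference classification using pairwise precision and recall. The metric, the function names, and the entry identifiers are assumptions made for this example; the abstract does not say which measures KitEGA actually uses.

```python
from itertools import combinations


def same_group_pairs(groups):
    """Return the set of unordered entry pairs that share a group."""
    pairs = set()
    for group in groups:
        # Sorting makes each pair canonical, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, reference):
    """Pairwise precision and recall of a grouping against a reference.

    Hypothetical helper: both arguments are lists of sets of entry ids.
    """
    pred = same_group_pairs(predicted)
    ref = same_group_pairs(reference)
    common = pred & ref
    precision = len(common) / len(pred) if pred else 1.0
    recall = len(common) / len(ref) if ref else 1.0
    return precision, recall


# Made-up data: a reference classification and two candidate groupings.
reference = [{"P1", "P2", "P3"}, {"P4", "P5"}]
strategy_a = [{"P1", "P2"}, {"P3"}, {"P4", "P5"}]   # splits too eagerly
strategy_b = [{"P1", "P2", "P3", "P4", "P5"}]       # merges everything

for name, grouping in [("A", strategy_a), ("B", strategy_b)]:
    print(name, pairwise_scores(grouping, reference))
```

In this toy case strategy A never groups unrelated entries but misses some true pairs, while strategy B finds every true pair at the cost of many false ones, which is the kind of trade-off an evaluation environment would need to expose.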

Place, publisher, year, edition, pages
2007. Vol. 4, no 3
National Category
Engineering and Technology; Computer Science; Bioinformatics (Computational Biology)
URN: urn:nbn:se:liu:diva-14037; DOI: 10.2390/biecoll-jib-2007-83; OAI: diva2:22499
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-05-05
In thesis
1. Integration of Biological Data
2006 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed.

The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases. Further, we develop an environment that supports comparison and evaluation of different grouping strategies.

The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA.
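
The similarity-based grouping described in point 3 of the abstract can be sketched roughly as follows: compute a similarity value for each pair of entries, link pairs whose similarity reaches a threshold, and take the connected components as groups. Jaccard similarity, the threshold values, the single-linkage rule, and all names below are assumptions chosen for illustration, not the method proposed in the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_entries(entries, threshold):
    """Group entries whose pairwise similarity reaches the threshold.

    entries: dict mapping an entry id to its set of attribute values.
    Returns a sorted list of sorted id lists (connected components).
    """
    parent = {entry_id: entry_id for entry_id in entries}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(entries, 2):
        if jaccard(entries[a], entries[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for entry_id in entries:
        groups.setdefault(find(entry_id), []).append(entry_id)
    return sorted(sorted(members) for members in groups.values())


# Made-up entries described by annotation terms.
entries = {
    "e1": {"kinase", "atp"},
    "e2": {"kinase", "atp", "membrane"},
    "e3": {"protease"},
}
print(group_entries(entries, threshold=0.5))
```

Changing the similarity function, the grouping attributes, or the threshold yields a different grouping strategy, which is what makes a common environment for comparing them useful.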

Place, publisher, year, edition, pages
Institutionen för datavetenskap, 2006
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1035
Computer science, integration, grouping, databases, ontologies, biological data, bioinformatics, KitEGA
National Category
Engineering and Technology
urn:nbn:se:liu:diva-7484 (URN); 91-85523-28-3 (ISBN)
Public defence
2006-09-25, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18

By author/editor
Jakoniené, Vaida; Lambrix, Patrick