Integration of Biological Data
Linköping University, Department of Computer and Information Science, IISLAB - Laboratory for Intelligent Information Systems. Linköping University, The Institute of Technology.
2006 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed.

The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases. Further, we develop an environment that supports comparison and evaluation of different grouping strategies.

The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA.

Place, publisher, year, edition, pages
Institutionen för datavetenskap, 2006. 20 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1035
Keyword [en]
Computer science, integration, grouping, databases, ontologies, biological data, bioinformatics, KitEGA
Keyword [sv]
Datalogi
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-7484
ISBN: 91-85523-28-3 (print)
OAI: oai:DiVA.org:liu-7484
DiVA: diva2:22500
Public defence
2006-09-25, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2017-08-15 Bibliographically approved
List of papers
1. Towards transparent access to multiple biological databanks
2003 (English). In: Proceedings of the first Asia-Pacific Bioinformatics Conference, Adelaide, Australia, 2003, Vol. 33, 53-60 p. Conference paper, Published paper (Refereed)
Abstract [en]

Nowadays, biologists use a number of large biological databanks to find relevant information for their research. Users of these databanks face a number of problems. One problem is that users are required to have good knowledge about the contents, implementations and conceptual models of many databanks to be able to ask precise and relevant questions. Further, the terminology that is used in the different databanks may differ. Also, when posing complex queries to multiple databanks, users need to construct a query plan on their own, possibly leading to poor performance or to no results at all. To alleviate these problems, we define an architecture for systems that allow the multiple sources to be queried in a transparent and integrated way. The contribution of this paper is threefold. First, we describe a study of current biological databanks. Then, we propose a base query language that contains operators that should be present in any query language for biological databanks. Further, we present an architecture for a system supporting such a language and providing integrated access to the highly distributed and heterogeneous environment of biological databanks.

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14032 (URN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18
2. Information integration systems for biological data source requirements and opportunities
2006 (English). Report (Other (popular science, discussion, etc.))
National Category
Computer Science
Identifiers
urn:nbn:se:liu:diva-14033 (URN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18
3. Ontology-based integration for bioinformatics
2005 (English). In: Proceedings of the VLDB Workshop on Ontologies-based techniques for DataBases and Information Systems - ODBIS, 2005, 55-58 p. Conference paper, Published paper (Refereed)
Abstract [en]

Information integration systems support researchers in bioinformatics to retrieve data from multiple biological data sources. In this paper we argue that the current approaches should be enhanced by ontological knowledge. We identify the different types of ontological knowledge that are available on the Web and propose an approach to use this knowledge to support integrated access to multiple biological data sources. We also show that current ontology-based integration approaches only cover parts of our approach.

 

National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14034 (URN)
Conference
Workshop on Ontologies-based techniques for DataBases and Information Systems - ODBIS
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18
4. Alignment of Biomedical Ontologies using Life Science Literature
2006 (English). In: Proceedings of the International Workshop on Knowledge Discovery in Life Science Literature / [ed] Eric G. Bremer, Springer Berlin/Heidelberg, 2006, 1-17 p. Chapter in book (Refereed)
Abstract [en]

This book constitutes the refereed proceedings of the International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006, held in Singapore in conjunction with the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006).

The 12 revised full papers presented together with two invited talks were carefully reviewed and selected for inclusion in the book. The papers cover all topics of knowledge discovery in life science data such as text mining, identification and retrieval of documents, passage retrieval, co-reference resolution, extraction of life science entities or relationships from large collections, automated characterization of biological, biomedical and biotechnological entities and processes, extraction and characterization of more complex patterns and interaction networks, automated generation of text summaries, automated construction, expansion and curation of ontologies for different domains, and construction of controlled vocabularies.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2006
Series
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online) ; 3886
National Category
Computer Science
Identifiers
urn:nbn:se:liu:diva-14035 (URN)
10.1007/11683568_1 (DOI)
000237198800001 ()
978-3-540-32809-4 (ISBN)
3-540-32809-2 (ISBN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2016-12-06 Bibliographically approved
5. A Method for Similarity-Based Grouping of Biological Data
2006 (English). In: Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006. Proceedings / [ed] Ulf Leser, Felix Naumann, Barbara Eckman, Springer Berlin/Heidelberg, 2006, 136-151 p. Chapter in book (Refereed)
Abstract [en]

Similarity-based grouping of data entries in one or more data sources is a task underlying many different data management tasks, such as structuring search results, removing redundancy in databases, and data integration. Similarity-based grouping of data entries is not a trivial task in the context of life science data sources, as the stored data is complex, highly correlated and represented at different levels of granularity. The contribution of this paper is two-fold: 1) we propose a method for similarity-based grouping, and 2) we show results from test cases. The main steps of the method are specification of grouping rules, pairwise grouping between entries, actual grouping of similar entries, and evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of these choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of the similarity-based grouping tasks and give deeper insights into the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.
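
The steps named in this abstract can be illustrated with a minimal sketch. It is not the method from the paper: the Jaccard measure, the fixed threshold used as a grouping rule, the grouping by connected components, the purity score, and all entry names and annotations are assumptions chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity measure: overlap of two annotation sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_entries(entries, threshold):
    """Group entries whose pairwise similarity reaches the threshold.

    entries: dict mapping an entry id to a set of annotation terms.
    Entries linked directly or transitively end up in the same group
    (connected components, tracked with a small union-find structure).
    """
    parent = {e: e for e in entries}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Pairwise step: compare every pair and link the similar ones.
    for x, y in combinations(entries, 2):
        if jaccard(entries[x], entries[y]) >= threshold:
            parent[find(x)] = find(y)

    # Grouping step: collect entries by their representative.
    groups = {}
    for e in entries:
        groups.setdefault(find(e), set()).add(e)
    return sorted(groups.values(), key=lambda g: sorted(g))


def purity(groups, classification):
    """Evaluation step: share of entries in the majority class of their group."""
    total = sum(len(g) for g in groups)
    if total == 0:
        return 0.0
    agree = 0
    for g in groups:
        counts = {}
        for e in g:
            counts[classification[e]] = counts.get(classification[e], 0) + 1
        agree += max(counts.values())
    return agree / total


# Hypothetical entries and reference classification, for illustration only.
entries = {
    "P1": {"kinase", "ATP binding"},
    "P2": {"kinase", "ATP binding", "membrane"},
    "P3": {"transport", "membrane"},
    "P4": {"transport", "membrane", "ion channel"},
}
classification = {"P1": "enzyme", "P2": "enzyme",
                  "P3": "transporter", "P4": "transporter"}

strict = group_entries(entries, threshold=0.5)
loose = group_entries(entries, threshold=0.2)
print(strict, purity(strict, classification))
print(loose, purity(loose, classification))
```

With these toy data, the stricter threshold yields two groups that match the reference classification, while the looser one merges all four entries into a single group. This is the kind of influence of a strategy choice that the abstract refers to.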

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2006
Series
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online) ; 4075
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14036 (URN)
10.1007/11799511_13 (DOI)
000239622300011 ()
978-3-540-36595-2 (ISBN)
978-3-540-36593-8 (ISBN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2016-12-06 Bibliographically approved
6. Tool for Evaluating Strategies for Grouping of Biological Data
2007 (English). In: Journal of Integrative Bioinformatics, ISSN 1613-4516, Vol. 4, no 3. Article in journal (Refereed), Published
Abstract [en]

During the last decade an enormous amount of biological data has been generated, and techniques and tools to analyze this data have been developed. Many of these tools use some form of grouping and are used in, for instance, data integration, data cleaning, prediction of protein functionality, and correlation of genes based on microarray data. A number of aspects influence the quality of the grouping results: the data sources, the grouping attributes and the algorithms implementing the grouping procedure. Many methods exist, but it is often not clear which methods perform best for which grouping tasks. Studying the properties of these aspects, and evaluating and comparing them, would give us valuable insight into how the grouping procedures could best be used. It would also lead to recommendations on how to improve the current procedures and develop new ones. To be able to perform such studies and evaluations we need environments that allow us to compare and evaluate different grouping strategies. In this paper we present a framework, KitEGA, for such an environment, and present its current prototype implementation. We illustrate its use by comparing grouping strategies for classifying proteins regarding biological function and isozymes.

National Category
Engineering and Technology; Computer Science; Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:liu:diva-14037 (URN)
10.2390/biecoll-jib-2007-83 (DOI)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-05-05

Open Access in DiVA

fulltext (285 kB), 360 downloads
File information
File name: FULLTEXT01.pdf
File size: 285 kB
Checksum SHA-1: de9c9f9a1863b8721760169579f913927eccf5ab03ccf6f15d0eb98de3625b4502d1ddb4
Type: fulltext
Mimetype: application/pdf

Authority records

Jakonienė, Vaida