Integration of Biological Data
Linköping University, Department of Computer and Information Science, IISLAB - Laboratory for Intelligent Information Systems. Linköping University, The Institute of Technology.
2006 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed.

The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases. Further, we develop an environment that supports comparison and evaluation of different grouping strategies.

The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA.

Place, publisher, year, edition, pages
Department of Computer and Information Science, 2006, p. 20
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1035
Keywords [en]
Computer science, integration, grouping, databases, ontologies, biological data, bioinformatics, KitEGA
Keywords [sv]
Datalogi
National subject category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-7484; ISBN: 91-85523-28-3 (print); OAI: oai:DiVA.org:liu-7484; DiVA, id: diva2:22500
Public defence
2006-09-25, Visionen, Building B, Campus Valla, Linköping University, Linköping, 13:15 (English)
Opponent
Supervisors
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2020-03-24. Bibliographically approved
List of papers
1. Towards transparent access to multiple biological databanks
2003 (English) In: Proceedings of the first Asia-Pacific Bioinformatics Conference, Adelaide, Australia, 2003, Vol. 33, p. 53-60. Conference paper, Published paper (Refereed)
Abstract [en]

Nowadays, biologists use a number of large biological databanks to find relevant information for their research. Users of these databanks face a number of problems. One problem is that users are required to have good knowledge of the contents, implementations and conceptual models of many databanks to be able to ask precise and relevant questions. Further, the terminology used in the different databanks may differ. Also, when posing complex queries to multiple databanks, users need to construct a query plan on their own, possibly leading to poor performance or to no results at all. To alleviate these problems, we define an architecture for systems that allow the multiple sources to be queried in a transparent and integrated way. The contribution of this paper is threefold. First, we describe a study of current biological databanks. Then, we propose a base query language that contains operators that should be present in any query language for biological databanks. Further, we present an architecture for a system supporting such a language and providing integrated access to the highly distributed and heterogeneous environment of biological databanks.

National subject category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14032 (URN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18
2. Information integration systems for biological data source requirements and opportunities
2006 (English) Report (Other (popular science, discussion, etc.))
National subject category
Computer Science
Identifiers
urn:nbn:se:liu:diva-14033 (URN)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2018-01-13
3. Ontology-based integration for bioinformatics
2005 (English) In: Proceedings of the VLDB Workshop on Ontologies-based techniques for DataBases and Information Systems - ODBIS, 2005, p. 55-58. Conference paper, Published paper (Refereed)
Abstract [en]

Information integration systems support researchers in bioinformatics in retrieving data from multiple biological data sources. In this paper we argue that the current approaches should be enhanced by ontological knowledge. We identify the different types of ontological knowledge that are available on the Web and propose an approach to use this knowledge to support integrated access to multiple biological data sources. We also show that current ontology-based integration approaches only cover parts of our approach.

National subject category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14034 (URN)
Conference
Workshop on Ontologies-based techniques for DataBases and Information Systems - ODBIS
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2015-02-18
4. Alignment of Biomedical Ontologies using Life Science Literature
2006 (English) In: KDLL: International Workshop on Knowledge Discovery in Life Science Literature. Knowledge Discovery in Life Science Literature, PAKDD 2006 International Workshop, KDLL 2006, Singapore, April 9, 2006. Proceedings / [ed] Eric G. Bremer, Jörg Hakenberg, Eui-Hong (Sam) Han, Daniel Berrar and Werner Dubitzky, Berlin/Heidelberg: Springer, 2006, p. 1-17. Conference paper, Published paper (Refereed)
Abstract [en]

This book constitutes the refereed proceedings of the International Workshop on Knowledge Discovery in Life Science Literature, KDLL 2006, held in Singapore in conjunction with the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006).

The 12 revised full papers presented together with two invited talks were carefully reviewed and selected for inclusion in the book. The papers cover all topics of knowledge discovery in life science data such as text mining, identification and retrieval of documents, passage retrieval, co-reference resolution, extraction of life science entities or relationships from large collections, automated characterization of biological, biomedical and biotechnological entities and processes, extraction and characterization of more complex patterns and interaction networks, automated generation of text summaries, automated construction, expansion and curation of ontologies for different domains, and construction of controlled vocabularies.

Place, publisher, year, edition, pages
Berlin/Heidelberg: Springer, 2006
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 3886
National subject category
Computer Science
Identifiers
urn:nbn:se:liu:diva-14035 (URN); 10.1007/11683568_1 (DOI); 000237198800001 (); 978-3-540-32809-4 (ISBN); 3-540-32809-2 (ISBN)
Conference
KDLL: International Workshop on Knowledge Discovery in Life Science Literature. Knowledge Discovery in Life Science Literature, PAKDD 2006 International Workshop, KDLL 2006, Singapore, April 9, 2006
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2018-11-27. Bibliographically approved
5. A Method for Similarity-Based Grouping of Biological Data
2006 (English) In: DILS: International Workshop on Data Integration in the Life Sciences. Data Integration in the Life Sciences, Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006. Proceedings / [ed] Ulf Leser, Felix Naumann, Barbara Eckman, Springer Berlin/Heidelberg, 2006, p. 136-151. Conference paper, Published paper (Refereed)
Abstract [en]

Similarity-based grouping of data entries in one or more data sources is a task underlying many different data management tasks, such as structuring search results, removing redundancy in databases, and data integration. Similarity-based grouping of data entries is not a trivial task in the context of life science data sources, as the stored data is complex, highly correlated and represented at different levels of granularity. The contribution of this paper is two-fold: 1) we propose a method for similarity-based grouping, and 2) we show results from test cases. The main steps of the method are specification of grouping rules, pairwise grouping between entries, actual grouping of similar entries, and evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of the choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of the similarity-based grouping tasks and give deeper insights into the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.
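
The first three steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy entries, the Jaccard similarity over annotation terms, the 0.5 threshold, and the use of connected components for the actual grouping are hypothetical choices, not the rules or algorithms used in the paper.

```python
from itertools import combinations

# Hypothetical data entries: identifier -> set of annotation terms.
ENTRIES = {
    "P1": {"kinase", "atp-binding", "transferase"},
    "P2": {"kinase", "atp-binding", "signal"},
    "P3": {"protease", "hydrolase"},
    "P4": {"protease", "hydrolase", "secreted"},
    "P5": {"transport"},
}


def jaccard(a, b):
    """Similarity of two term sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similar(entries, similarity, threshold):
    """Step 2: return the entry pairs whose similarity satisfies the rule."""
    return [
        (x, y)
        for x, y in combinations(sorted(entries), 2)
        if similarity(entries[x], entries[y]) >= threshold
    ]


def group(entries, pairs):
    """Step 3: merge similar pairs into groups (connected components)."""
    parent = {e: e for e in entries}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for x, y in pairs:
        parent[find(x)] = find(y)

    groups = {}
    for e in entries:
        groups.setdefault(find(e), set()).add(e)
    return sorted(sorted(g) for g in groups.values())


# Step 1: the (assumed) grouping rule is a similarity function plus a threshold.
pairs = pairwise_similar(ENTRIES, jaccard, threshold=0.5)
groups = group(ENTRIES, pairs)
print(groups)
```

Swapping in a different similarity function, threshold, or grouping algorithm changes the strategy without touching the other steps, which is the kind of exploration the abstract describes. The fourth step, evaluation, would compare `groups` with a given classification.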

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2006
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 4075
Series
Lecture Notes in Bioinformatics ; 4075
National subject category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-14036 (URN); 10.1007/11799511_13 (DOI); 000239622300011 (); 978-3-540-36595-2 (ISBN); 978-3-540-36593-8 (ISBN)
Conference
DILS: International Workshop on Data Integration in the Life Sciences. Data Integration in the Life Sciences, Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006.
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2018-11-27. Bibliographically approved
6. Tool for Evaluating Strategies for Grouping of Biological Data
2007 (English) In: Journal of Integrative Bioinformatics, ISSN 1613-4516, Vol. 4, no. 3. Article in journal (Refereed) Published
Abstract [en]

During the last decade an enormous amount of biological data has been generated, and techniques and tools to analyze this data have been developed. Many of these tools use some form of grouping and are used in, for instance, data integration, data cleaning, prediction of protein functionality, and correlation of genes based on microarray data. A number of aspects influence the quality of the grouping results: the data sources, the grouping attributes and the algorithms implementing the grouping procedure. Many methods exist, but it is often not clear which methods perform best for which grouping tasks. Studying the properties of these aspects, and evaluating and comparing them, would give us valuable insight into how the grouping procedures could best be used. It would also lead to recommendations on how to improve the current procedures and develop new procedures. To be able to perform such studies and evaluations we need environments that allow us to compare and evaluate different grouping strategies. In this paper we present a framework, KitEGA, for such an environment, and present its current prototype implementation. We illustrate its use by comparing grouping strategies for classifying proteins regarding biological function and isozymes.
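
Comparing grouping strategies against a known classification can be sketched as follows. The abstract does not say which measures KitEGA computes, so pairwise precision, recall and F1 are used here as an assumed stand-in, and the reference classification and the two strategy outputs are hypothetical toy data.

```python
from itertools import combinations


def same_group_pairs(groups):
    """All unordered entry pairs that a grouping places in the same group."""
    return {
        frozenset(pair)
        for g in groups
        for pair in combinations(sorted(g), 2)
    }


def pairwise_scores(groups, reference):
    """Pairwise precision, recall and F1 of a grouping against a reference."""
    found = same_group_pairs(groups)
    expected = same_group_pairs(reference)
    hits = len(found & expected)
    precision = hits / len(found) if found else 1.0
    recall = hits / len(expected) if expected else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical reference classification and outputs of two strategies.
REFERENCE = [{"P1", "P2", "P3"}, {"P4", "P5"}]
STRATEGIES = {
    "strict": [{"P1", "P2"}, {"P3"}, {"P4", "P5"}],
    "loose": [{"P1", "P2", "P3", "P4", "P5"}],
}

results = {
    name: pairwise_scores(grouping, REFERENCE)
    for name, grouping in STRATEGIES.items()
}
for name, (p, r, f) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the strict strategy never groups unrelated entries but misses half of the expected pairs, while the loose strategy finds every expected pair at the cost of many false ones. Which trade-off is preferable depends on the grouping task.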

National subject category
Engineering and Technology; Computer Science; Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:liu:diva-14037 (URN); 10.2390/biecoll-jib-2007-83 (DOI)
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2018-01-13


Author
Jakonienė, Vaida
