liu.seSearch for publications in DiVA
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
A Method for Similarity-Based Grouping of Biological Data
Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, The Institute of Technology.
Linköping University, The Institute of Technology. Linköping University, Department of Computer and Information Science, Database and information techniques.
Linköping University, Department of Computer and Information Science, Database and information techniques. Linköping University, The Institute of Technology. (IDA/ADIT)ORCID iD: 0000-0002-9084-0470
2006 (English)In: Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006. Proceedings / [ed] Ulf Leser, Felix Naumann, Barbara Eckman, Springer Berlin/Heidelberg, 2006, 136-151 p.Chapter in book (Refereed)
Abstract [en]

Similarity-based grouping of data entries in one or more data sources is a task underlying many different data management tasks, such as, structuring search results, removal of redundancy in databases and data integration. Similarity-based grouping of data entries is not a trivial task in the context of life science data sources as the stored data is complex, highly correlated and represented at different levels of granularity. The contribution of this paper is two-fold. 1) We propose a method for similarity-based grouping and 2) we show results from test cases. As the main steps the method contains specification of grouping rules, pairwise grouping between entries, actual grouping of similar entries, and evaluation and analysis of the results. Often, different strategies can be used in the different steps. The method enables exploration of the influence of the choices and supports evaluation of the results with respect to given classifications. The grouping method is illustrated by test cases based on different strategies and classifications. The results show the complexity of the similarity-based grouping tasks and give deeper insights in the selected grouping tasks, the analyzed data source, and the influence of different strategies on the results.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2006. 136-151 p.
Series
Lecture Notes in Computer Science, ISSN 0302-9743 (print), 1611-3349 (online) ; 4075
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-14036DOI: 10.1007/11799511_13ISI: 000239622300011ISBN: 978-3-540-36595-2 (print)ISBN: 978-3-540-36593-8 (print)OAI: oai:DiVA.org:liu-14036DiVA: diva2:22498
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2016-12-06Bibliographically approved
In thesis
1. Integration of Biological Data
Open this publication in new window or tab >>Integration of Biological Data
2006 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data integration is an important procedure underlying many research tasks in the life sciences, as often multiple data sources have to be accessed to collect the relevant data. The data sources vary in content, data format, and access methods, which often vastly complicates the data retrieval process. As a result, the task of retrieving data requires a great deal of effort and expertise on the part of the user. To alleviate these difficulties, various information integration systems have been proposed in the area. However, a number of issues remain unsolved and new integration solutions are needed.

The work presented in this thesis considers data integration at three different levels. 1) Integration of biological data sources deals with integrating multiple data sources from an information integration system point of view. We study properties of biological data sources and existing integration systems. Based on the study, we formulate requirements for systems integrating biological data sources. Then, we define a query language that supports queries commonly used by biologists. Also, we propose a high-level architecture for an information integration system that meets a selected set of requirements and that supports the specified query language. 2) Integration of ontologies deals with finding overlapping information between ontologies. We develop and evaluate algorithms that use life science literature and take the structure of the ontologies into account. 3) Grouping of biological data entries deals with organizing data entries into groups based on the computation of similarity values between the data entries. We propose a method that covers the main steps and components involved in similarity-based grouping procedures. The applicability of the method is illustrated by a number of test cases. Further, we develop an environment that supports comparison and evaluation of different grouping strategies.

The work is supported by the implementation of: 1) a prototype for a system integrating biological data sources, called BioTRIFU, 2) algorithms for ontology alignment, and 3) an environment for evaluating strategies for similarity-based grouping of biological data, called KitEGA.

Place, publisher, year, edition, pages
Institutionen för datavetenskap, 2006. 20 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1035
Keyword
Datalogi, integration, grouping, databases, ontologies, biological data, ioinformatics, KitEGA, Datalogi
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-7484 (URN)91-85523-28-3 (ISBN)
Public defence
2006-09-25, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Opponent
Supervisors
Available from: 2006-09-28 Created: 2006-09-28 Last updated: 2017-08-15Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textfind book at a swedish library/hitta boken i ett svenskt biblioteklink

Authority records BETA

Jakoniené, VaidaLambrix, Patrick

Search in DiVA

By author/editor
Jakoniené, VaidaLambrix, Patrick
By organisation
Database and information techniquesThe Institute of Technology
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 112 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf