Behaviour-driven clustering based on event-sequence similarity metrics
Linköping University, Department of Science and Technology, Visual Information Technology and Applications (VITA). Linköping University, The Institute of Technology. ORCID iD: 0000-0003-4761-8601
Linköping University, Department of Science and Technology, Visual Information Technology and Applications (VITA). Linköping University, The Institute of Technology. ORCID iD: 0000-0002-9466-9826
Linköping University, Department of Science and Technology, Visual Information Technology and Applications (VITA). Linköping University, The Institute of Technology.
2010 (English). Manuscript (preprint) (Other academic)
Abstract [en]

When analysing event data, two key objectives are first to identify interesting subsequences in the data records and then to retrieve groups of records that exhibit similar behaviour. This is especially true when the focus of the exploration is the human, for example when using activity diaries to reveal sub-populations with similar behaviour, medical records to identify groups with similar medical conditions, or web sessions to find groups with similar web-surfing habits. In this paper we propose a visual exploration approach, based on sequence similarity metrics and clustering techniques, that allows an analyst to interactively explore the distribution of sequences across event data records and to group the results according to user-selected similarity preferences. We have identified a set of similarity metrics specific to event-sequences, which we use as input to a clustering algorithm. The user can choose which metrics to use and assign weighting factors to them, which results in groupings that exhibit similar behaviour according to the user's own definition of similarity and interestingness. The resulting clusters can be interactively explored in a multiple linked-view environment showing the clusters, the cluster quality, the similarity metrics and meta (background) information describing the clustered individuals, so that comparisons can be made within and between groups. Such an interactive approach, which considers user preferences and takes advantage of background knowledge, gives a basis for enhanced analytical reasoning by providing a more complete understanding of the retrieved groupings, and can lead to more thorough analysis and more accurate assessments.
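The idea of user-weighted similarity metrics feeding a clustering step can be illustrated with a minimal sketch. This is not the authors' implementation: the two metrics (occurrence count of a sequence and its mean start hour), the weighted Manhattan distance, and the naive average-linkage clustering are all assumptions chosen only to keep the example small and dependency-free.

```python
from itertools import combinations

def normalise(column):
    """Rescale one metric column to [0, 1] so that weights are comparable."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def weighted_distance(a, b, weights):
    """Weighted Manhattan distance between two metric vectors (assumed measure)."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b)) / sum(weights)

def cluster(vectors, weights, k):
    """Naive average-linkage agglomerative clustering into k groups."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(weighted_distance(vectors[i], vectors[j], weights) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        # Merge the two clusters that are closest on average.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical metrics per record: (occurrences of a sequence, mean start hour).
records = [(5, 8.0), (6, 8.5), (1, 20.0), (0, 21.0)]
columns = [normalise(list(col)) for col in zip(*records)]
vectors = list(zip(*columns))

# The analyst weights the occurrence count twice as heavily as the start hour.
groups = cluster(vectors, weights=[2.0, 1.0], k=2)
print(groups)
```

Changing the `weights` list changes which records count as similar, which is the kind of user-driven notion of similarity the abstract describes.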

Keyword [en]
Event-based data, activity diary data, event sequences, similarity metrics, clustering, interactive exploration
National Category
Engineering and Technology
URN: urn:nbn:se:liu:diva-58310
OAI: diva2:338116
Available from: 2010-08-10 Created: 2010-08-10 Last updated: 2015-09-22
In thesis
1. Everyday mining: Exploring sequences in event-based data
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Utforskning av sekvenser i händelsebaserade data
Abstract [en]

Event-based data are encountered daily in many disciplines and are used for various purposes. They are collections of ordered sequences of events where each event has a start time and a duration. Examples of such data include medical records, internet surfing records, transaction records, industrial process or system control records, and activity diary data.
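As a minimal sketch of the data model just described (an ordered sequence of events, each with a start time and a duration), one possible representation is shown below. The field names, the minute-based time unit, and the activity labels are assumptions made for illustration and do not come from the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    label: str     # hypothetical category name, e.g. an activity code
    start: int     # start time in minutes since midnight (assumed unit)
    duration: int  # duration in minutes (assumed unit)

    @property
    def end(self) -> int:
        """End time derived from start and duration."""
        return self.start + self.duration

def as_sequence(events):
    """Return the event labels ordered by start time."""
    return [e.label for e in sorted(events, key=lambda e: e.start)]

# One hypothetical diary record, entered out of order.
diary = [
    Event("work", 540, 480),
    Event("sleep", 0, 420),
    Event("breakfast", 450, 30),
]
sequence = as_sequence(diary)
print(sequence)
```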

This thesis is concerned with the exploration of event-based data, and in particular the identification and analysis of sequences within them. Sequences are interesting in this context since they enable the understanding of the evolving character of event data records over time. They can reveal trends, relationships and similarities across the data, allow for comparisons to be made within and between the records, and can also help predict forthcoming events. The presented work has researched methods for identifying and exploring such event-sequences which are based on modern visualization, interaction and data mining techniques.

An interactive visualization environment that facilitates analysis and exploration of event-based data has been designed and developed, which permits a user to freely explore different aspects of these data and visually identify interesting features and trends. Visual data mining methods have been developed within this environment that facilitate the automatic identification and exploration of interesting sequences as patterns. The first method makes use of a sequence mining algorithm that identifies sequences of events as patterns, in an iterative fashion, according to certain user-defined constraints. The resulting patterns can then be displayed and interactively explored by the user.

The second method has been inspired by web-mining algorithms and the use of graph similarity. A tree-inspired visual exploration environment has been developed that allows a user to systematically and interactively explore interesting event-sequences.

Having identified interesting sequences as patterns, it becomes interesting to explore further how these are incorporated across the data and to classify the records based on similarities in the way these sequences are manifested within them. In the final method developed in this work, a set of similarity metrics has been identified for characterizing event-sequences, which are then used within a clustering algorithm in order to find similarly behaving groups. The resulting clusters, as well as attributes of the clustering parameters and data records, are displayed in a set of linked views, allowing the user to interactively explore relationships within these.
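The first method above, iterative identification of event sequences under user-defined constraints, can be sketched in a generic level-wise form. This is an assumption-laden illustration and not the thesis's algorithm: it considers only contiguous sequences of labels, and the two constraints (`min_support`, the minimum number of records containing a sequence, and `max_length`) are hypothetical stand-ins for whatever constraints the tool exposes.

```python
from collections import Counter

def frequent_sequences(records, min_support, max_length):
    """Level-wise search for contiguous label sequences.

    A sequence is kept when it occurs in at least `min_support` records.
    The search stops early once no sequence of the current length is kept.
    """
    patterns = {}
    for length in range(1, max_length + 1):
        support = Counter()
        for record in records:
            # Count each distinct sequence at most once per record.
            seen = {tuple(record[i:i + length])
                    for i in range(len(record) - length + 1)}
            support.update(seen)
        kept = {seq: n for seq, n in support.items() if n >= min_support}
        if not kept:
            break
        patterns.update(kept)
    return patterns

# Hypothetical diary records reduced to ordered activity labels.
records = [
    ["sleep", "eat", "work", "eat"],
    ["sleep", "eat", "work"],
    ["eat", "work", "sleep"],
]
patterns = frequent_sequences(records, min_support=2, max_length=3)
for seq, n in sorted(patterns.items(), key=lambda item: (len(item[0]), item[0])):
    print(seq, n)
```

Counting support per record rather than per occurrence is a design choice here; counting every occurrence would instead favour records in which a sequence repeats often.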

The research has been focused on the exploration of activity diary data for the study of individuals' time-use and has resulted in a powerful research tool facilitating understanding and thorough analysis of the complexity of everyday life.

Place, publisher, year, edition, pages
Norrköping: Linköping University Electronic Press, 2010. 76 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1331
Keyword [en]
Event-based data, activity diary data, event-sequences, interactive exploration, sequence identification, visual data mining
National Category
Computer Science
urn:nbn:se:liu:diva-58311 (URN)
978-91-7393-343-8 (ISBN)
Public defence
2010-09-10, Domteater, Norrköpings Visualiseringscenter C, Kungsgatan 54, 602 33 Norrköping, 09:15 (English)
Available from: 2010-09-01 Created: 2010-08-10 Last updated: 2015-09-22 Bibliographically approved

Authors: Vrotsou, Katerina; Ynnerman, Anders; Cooper, Matthew