Interactive Quantification of Categorical Variables in Mixed Data Sets
Johansson, Sara; Jern, Mikael; Johansson, Jimmy
Linköping University, Department of Science and Technology, Visual Information Technology and Applications (VITA). Linköping University, The Institute of Technology.
2008 (English). In: Information Visualisation, 2008. IV '08. 12th International Conference / [ed] Ebad Banissi, Liz Stuart, Mikael Jern, Gennady Andrienko, Francis T. Marchese, Nasrullah Memon, Reda Alhajj, Theodor G Wyeld, Remo Aslak Burkhard, Georges Grinstein, Dennis Groth, Anna Ursyn, Carsten Maple, Anthony Faiola and Brock Craft, Los Alamitos, California: IEEE Computer Society, 2008, p. 3-10
Conference paper, Published paper (Refereed)
Abstract [en]

Data sets containing a combination of categorical and continuous variables (mixed data sets) are difficult to analyse since no generalized similarity measure exists for categorical variables. Quantification of categorical variables makes it possible to represent this type of data using techniques designed for numerical data. This paper presents a quantification process of categorical variables in mixed data sets that incorporates information on relationships among the continuous variables into the process, as well as utilizing the domain knowledge of a user. An interactive visualization environment using parallel coordinates as a visual interface is provided, where the user is able to control the quantification process and analyse the result. The efficiency of the approach is demonstrated using two mixed data sets.
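
As an illustration of the general idea of quantification described in the abstract, the following is a minimal sketch and not the paper's algorithm. It assumes a simplified stand-in technique: each category is represented by its centroid in the standardized continuous variables, and the centroids are projected onto their first principal axis to obtain one number per category. The function name `quantify_categories` and its parameters are hypothetical.

```python
import numpy as np

def quantify_categories(categories, continuous):
    """Assign one numeric value per category from the continuous variables.

    categories: sequence of labels, length n
    continuous: array-like of shape (n, p)
    Returns a dict mapping each label to a float.
    Assumption: a centroid projection, used here only for illustration.
    """
    X = np.asarray(continuous, dtype=float)
    # Standardize so no single variable dominates; guard constant columns.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    cats = np.asarray(categories)
    labels = sorted(set(categories))
    # One centroid per category in the standardized continuous space.
    centroids = np.vstack([Z[cats == lab].mean(axis=0) for lab in labels])

    # Project the centered centroids onto their first principal axis.
    centered = centroids - centroids.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return {lab: float(s) for lab, s in zip(labels, scores)}


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "c", "c"]
    cont = [[0, 0], [1, 1], [5, 5], [6, 6], [10, 10], [11, 11]]
    print(quantify_categories(cats, cont))
```

The sign of the resulting axis is arbitrary, so only the relative order and spacing of the categories are meaningful. In an interactive setting such as the one the abstract describes, values like these would serve as a starting suggestion that the user could then adjust.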

Place, publisher, year, edition, pages
Los Alamitos, California: IEEE Computer Society, 2008. p. 3-10
Series
IEEE International Conference on Information Visualisation, ISSN 1550-6037
Keywords [en]
Categorical data, mixed data, parallel coordinates, quantification, correspondence analysis, clustering
National Category
Other Engineering and Technologies
Identifiers
URN: urn:nbn:se:liu:diva-43480; DOI: 10.1109/IV.2008.33; ISI: 000259178400001; Local ID: 73940; ISBN: 978-0-7695-3268-4 (print); OAI: oai:DiVA.org:liu-43480; DiVA, id: diva2:264339
Conference
12th International Conference Information Visualisation, IV '08, London, UK, 9-11 July 2008
Available from: 2009-10-10 Created: 2009-10-10 Last updated: 2025-02-18. Bibliographically approved
In thesis
1. Algorithmically Guided Information Visualization: Explorative Approaches for High Dimensional, Mixed and Categorical Data
2011 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Algoritmiskt vägledd informationsvisualisering för högdimensionell och kategorisk data
Abstract [en]

Facilitated by the technological advances of the last decades, increasing amounts of complex data are being collected within fields such as biology, chemistry and social sciences. The major challenge today is not to gather data, but to extract useful information and gain insights from it. Information visualization provides methods for visual analysis of complex data but, as the amounts of gathered data increase, the challenges of visual analysis become more complex.

This thesis presents work utilizing algorithmically extracted patterns as guidance during interactive data exploration processes, employing information visualization techniques. It provides efficient analysis by taking advantage of fast pattern identification techniques as well as making use of the domain expertise of the analyst. In particular, the presented research is concerned with the issues of analysing categorical data, where the values are names without any inherent order or distance; mixed data, including a combination of categorical and numerical data; and high dimensional data, including hundreds or even thousands of variables.

The contributions of the thesis include a quantification method, assigning numerical values to categorical data, which utilizes an automated method to define category similarities based on underlying data structures, and integrates relationships within numerical variables into the quantification when dealing with mixed data sets. The quantification is incorporated in an interactive analysis pipeline where it provides suggestions for numerical representations, which may interactively be adjusted by the analyst. The interactive quantification enables exploration using commonly available visualization methods for numerical data. Within the context of categorical data analysis, this thesis also contributes the first user study evaluating the performance of what are currently the two main visualization approaches for categorical data analysis.

Furthermore, this thesis contributes two dimensionality reduction approaches, which aim at preserving structure while reducing dimensionality, and provide flexible and user-controlled dimensionality reduction. Through algorithmic quality metric analysis, where each metric represents a structure of interest, potentially interesting variables are extracted from the high dimensional data. The automatically identified structures are visually displayed, using various visualization methods, and act as guidance in the selection of interesting variable subsets for further analysis. The visual representations furthermore provide an overview of structures within the high dimensional data set and may thereby help focus subsequent analysis and enable interactive exploration of both the full high dimensional data set and selected variable subsets. The thesis also contributes the application of algorithmically guided approaches for high dimensional data exploration in the rapidly growing field of microbiology, through the design and development of a quality-guided interactive system in collaboration with microbiologists.
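
The idea of extracting potentially interesting variables through a quality metric can be sketched as follows. This is an illustrative example under stated assumptions, not the thesis's method: the metric chosen here (the largest absolute Pearson correlation a variable has with any other variable) is only one hypothetical structure of interest, and the function name `rank_variables_by_correlation` is invented for this sketch. It also assumes that no column is constant, since correlation is undefined in that case.

```python
import numpy as np

def rank_variables_by_correlation(data, k):
    """Return the indices of the k highest-scoring variables and all scores.

    data: array-like of shape (n_samples, n_variables)
    Score (an assumed metric): max absolute Pearson correlation
    between a variable and any other variable.
    """
    X = np.asarray(data, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # variables are columns
    np.fill_diagonal(corr, 0.0)          # ignore self-correlation
    score = np.abs(corr).max(axis=1)
    order = np.argsort(-score, kind="stable")
    return order[:k].tolist(), score


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2, 4, 6, 8, 10, 12.5]   # nearly linear in x
    z = [1, -1, 1, -1, 1, -1]    # weakly related to x and y
    top, score = rank_variables_by_correlation(np.column_stack([x, y, z]), k=2)
    print(top, score)
```

A subset selected this way would only be a suggestion; the abstract describes the identified structures as guidance for an analyst who makes the final selection interactively.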

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2011. p. 72
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1400
Keywords
Information visualization, data mining, high dimensional data, categorical data, mixed data
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-70860 (URN); 978-91-7393-056-7 (ISBN)
Public defence
2011-11-11, Domen, Norrköpings Visualiseringscenter, Kungsgatan 54, 602 33 Norrköping, 09:15 (English)
Available from: 2011-10-06 Created: 2011-09-20 Last updated: 2019-12-19. Bibliographically approved

Authority records

Johansson, Sara; Jern, Mikael; Johansson, Jimmy
