Word Space Models for Web User Clustering and Page Prefetching
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences.
2012 (English)
Independent thesis, Basic level (degree of Bachelor), 12 credits / 18 HE credits
Student thesis
Abstract [en]

This study evaluates methods for clustering web users with vector space models, with the aim of prefetching web pages as a possible means of server optimization. An experiment using Latent Semantic Analysis (LSA) investigates whether LSA can reproduce the encouraging results obtained in previous research with Random Indexing (RI) and a chaos-based optimization algorithm (CAS-C). The motivation is not only that LSA is another vector space model, but also that an earlier study indicated that LSA outperforms RI in a task similar to web user clustering and prefetching. The prefetching task, in which both RI and CAS-C have shown promising results, was used to verify the applicability of LSA. The original data set from the RI web user clustering and prefetching task was modeled using weighted (tf-idf) LSA. Clusters were defined using a common clustering algorithm (k-means). The least scattered cluster configuration for the model was identified by combining an internal validity measure (SSE) and a relative criterion validity measure (SD index). The assumed optimal cluster configuration was used for the web page prefetching task.

Precision and recall of the LSA-based method are found to be on par with RI and CAS-C, inasmuch as it solves the web user clustering and web page prefetching task with characteristics similar to those of unweighted RI. The hypothesized inherent gains in precision and recall from using LSA were neither confirmed nor conclusively disproved. The effects of different weighting functions for RI are discussed, and a number of methodological factors are identified for further research on LSA-based clustering and prefetching.
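The weighting, clustering, and SSE steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis's implementation: the toy sessions, the choice of users as rows and pages as columns, the function names, and the deterministic centroid initialization are all assumptions made for illustration. The SVD step of LSA and the SD index are omitted to keep the example short.

```python
import math
from collections import Counter


def tfidf(sessions):
    """Build tf-idf weighted user vectors from a dict of user -> visited pages."""
    pages = sorted({p for visits in sessions.values() for p in visits})
    n_users = len(sessions)
    # Document frequency: number of users who requested each page.
    df = Counter(p for visits in sessions.values() for p in set(visits))
    vectors = []
    for visits in sessions.values():
        tf = Counter(visits)
        vectors.append([tf[p] * math.log(n_users / df[p]) for p in pages])
    return pages, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (labels, SSE).

    Assumption: the first k points seed the centroids, so runs are
    deterministic. Real experiments usually use random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(points, centroids)
    # SSE: sum of squared distances from each point to its own centroid.
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, sse


# Hypothetical toy sessions, not data from the thesis.
sessions = {
    "u1": ["/news", "/news", "/sports"],
    "u2": ["/news", "/sports"],
    "u3": ["/shop", "/cart"],
    "u4": ["/shop", "/cart", "/cart"],
}
pages, points = tfidf(sessions)
labels, sse = kmeans(points, k=2)
print(labels, round(sse, 4))
```

Running `kmeans` for several values of `k` and comparing the resulting SSE values is one simple way to look for a compact configuration. SSE alone always decreases as `k` grows, which is presumably why the study combines it with a second measure.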

Place, publisher, year, edition, pages
2012. 44 p.
Keyword [en]
LSA, RI, CAS-C, Clustering, Prefetching, Web mining
Keyword [sv]
Segmentering
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-82012
ISRN: LIU-IDA/KOGVET-G--12/029--SE
OAI: oai:DiVA.org:liu-82012
DiVA: diva2:557437
Subject / course
Cognitive science programme
Uppsok
Technology
Available from: 2012-11-16
Created: 2012-09-27
Last updated: 2012-11-16
Bibliographically approved

Open Access in DiVA

thesis-albsu022-120927 (519 kB), 293 downloads
File information
File name: FULLTEXT01.pdf
File size: 519 kB
Checksum (SHA-512): 6ee643434cbe2406e0c79b881215a5928ddf26011b5e6be8f3773b0635494391b6763afdf85930c855493aaa5bdf5ffe8134b9f4633546232ebbf2ce805243d7
Type: fulltext
Mimetype: application/pdf

Author
Sundin, Albin