Word Space Models for Web User Clustering and Page Prefetching
Linköping University, Department of Computer and Information Science. Linköping University, Faculty of Arts and Sciences.
2012 (English). Independent thesis, Basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis.
Abstract [en]

This study evaluates methods for clustering web users via vector space models, for the purpose of web page prefetching, with possible applications in server optimization. An experiment using Latent Semantic Analysis (LSA) is conducted to investigate whether LSA can reproduce the encouraging results obtained in previous research with Random Indexing (RI) and a chaos-based optimization algorithm (CAS-C). This is motivated not only by LSA being yet another vector space model, but also by a study indicating that LSA outperforms RI in a task similar to the web user clustering and prefetching task. The prefetching task, in which both RI and CAS-C have shown promising results, was used to verify the applicability of LSA. The original data set from the RI web user clustering and prefetching task was modeled using weighted (tf-idf) LSA. Clusters were defined using a common clustering algorithm (k-means). The least scattered cluster configuration for the model was identified by combining an internal validity measure (SSE) and a relative criterion validity measure (SD index). The assumed optimal cluster configuration was used for the web page prefetching task.

Precision and recall of the LSA-based method are found to be on par with RI and CAS-C, inasmuch as it solves the web user clustering and prefetching task with characteristics similar to those of unweighted RI. The hypothesized inherent gains in precision and recall from using LSA were neither confirmed nor conclusively disproved. The effects of different weighting functions for RI are discussed, and a number of methodological factors are identified for further research concerning LSA-based clustering and prefetching.
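To make the evaluation measures concrete, the following is a minimal sketch of how precision and recall might be computed for a prefetching task. It assumes the common definitions (precision as the share of prefetched pages that were actually requested, recall as the share of requested pages that had been prefetched); the thesis may define them differently. The function name and the page paths are hypothetical and serve only as illustration.

```python
def prefetch_precision_recall(prefetched, requested):
    """Return (precision, recall) for one user session.

    Assumed definitions (not taken from the thesis):
      precision = |prefetched AND requested| / |prefetched|
      recall    = |prefetched AND requested| / |requested|
    """
    prefetched = set(prefetched)
    requested = set(requested)
    hits = prefetched & requested  # pages that were prefetched and then used
    precision = len(hits) / len(prefetched) if prefetched else 0.0
    recall = len(hits) / len(requested) if requested else 0.0
    return precision, recall


# Hypothetical session: four pages prefetched, three pages actually requested.
precision, recall = prefetch_precision_recall(
    prefetched=["/index", "/news", "/contact", "/about"],
    requested=["/index", "/news", "/search"],
)
```

In this made-up session two of the four prefetched pages are used, and two of the three requested pages were available in advance. Averaging such per-session values over all users in a cluster would be one way to compare models, though this aggregation is likewise an assumption.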

Place, publisher, year, edition, pages
2012. 44 p.
Keyword [en]
LSA, RI, CAS-C, Clustering, Prefetching, Web mining
National Category
Computer Science; Language Technology (Computational Linguistics)
URN: urn:nbn:se:liu:diva-82012
ISRN: LIU-IDA/KOGVET-G--12/029--SE
OAI: diva2:557437
Subject / course
Cognitive science programme
Available from: 2012-11-16
Created: 2012-09-27
Last updated: 2012-11-16
Bibliographically approved

Author: Sundin, Albin