Cluster Analysis of Discussions on Internet Forums
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems.
2016 (English)
Independent thesis Basic level (degree of Bachelor), 10,5 credits / 16 HE credits
Student thesis
Alternative title
Klusteranalys av Diskussioner på Internetforum (Swedish)
Abstract [en]

The growth of textual content on internet forums over the last decade has been immense, and as a result users struggle to find relevant information quickly and conveniently.

The activity of finding information in large data collections is known as information retrieval, and many tools and techniques have been developed to tackle its common problems. Cluster analysis is a technique for grouping similar objects into smaller groups (clusters) such that the objects within a cluster are more similar to each other than to objects in other clusters.

We have investigated two clustering algorithms, Graclus and Non-Exhaustive Overlapping k-means (NEO-k-means), on textual data taken from Reddit, a social network service. One difficulty with these algorithms is that both take an input parameter controlling how many clusters to find. We have used a greedy modularity maximization algorithm to estimate the number of clusters that exist in discussion threads.
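
As an illustration of the idea of estimating the cluster count by greedy modularity maximization, the following is a minimal sketch and not the implementation used in the thesis. It assumes the discussion thread has already been turned into an undirected, unweighted graph (for example, comments as nodes and similarity links as edges; this representation is an assumption). The function names `modularity` and `estimate_cluster_count` are hypothetical. The sketch starts with every node in its own community and repeatedly applies the merge that raises modularity the most, stopping when no merge helps; the number of communities left is the estimate that could be passed to Graclus or NEO-k-means.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)  # edges with both ends inside community i
    degree = [0] * len(communities)    # sum of node degrees inside community i
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[i] / m - (degree[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


def estimate_cluster_count(edges):
    """Naive greedy agglomeration: merge the best pair until Q stops improving."""
    nodes = sorted({n for edge in edges for n in edge})
    communities = [{n} for n in nodes]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_merge = None
        for i, j in combinations(range(len(communities)), 2):
            candidate = [c for k, c in enumerate(communities) if k not in (i, j)]
            candidate.append(communities[i] | communities[j])
            q = modularity(edges, candidate)
            if q > best_q + 1e-12:  # accept only a strict improvement
                best_q, best_merge = q, candidate
        if best_merge is None:
            break  # no merge increases modularity any further
        communities = best_merge
    return len(communities), communities


# Toy graph (made up for illustration): two triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
k, groups = estimate_cluster_count(toy_edges)
print(k, sorted(sorted(g) for g in groups))
```

This version recomputes modularity from scratch for every candidate merge, which is only practical for small graphs; efficient greedy variants update the gain of each merge incrementally instead.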

We have shown that it is possible to find subtopics within discussions and that in terms of execution time, Graclus has a clear advantage over NEO-k-means.

Place, publisher, year, edition, pages
2016. 62 p.
Keyword [en]
Cluster Analysis, Text Mining, Internet Forum
National Category
Computer Science
Identifiers
URN: urn:nbn:se:liu:diva-129934
ISRN: LIU-IDA/LITH-EX-G--16/037--SE
OAI: oai:DiVA.org:liu-129934
DiVA: diva2:945628
External cooperation
iMatrics AB
Subject / course
Computer science
Supervisors
Examiners
Available from: 2016-07-04 Created: 2016-07-01 Last updated: 2016-09-19
Bibliographically approved

Open Access in DiVA

fulltext (9313 kB), 131 downloads
File information
File name FULLTEXT01.pdf
File size 9313 kB
Checksum SHA-512
38149daa3489f85086e59c8cb5a171d7dee8ea541367148fe86c28375efa7dfaf2b8595549543983e2282819f2bb061631e5b86479f411d4666d98c1ee265876
Type fulltext
Mimetype application/pdf

Search in DiVA

By author/editor
Holm, Rasmus
By organisation
Artificial Intelligence and Integrated Computer Systems
Computer Science

Total: 131 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 4055 hits