liu.seSearch for publications in DiVA
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing
Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
2006 (English)Doctoral thesis, monograph (Other academic)
Abstract [en]

The current surge of interest in search and comparison tasks in natural language processing has brought with it a focus on vector space approaches and vector space dimensionality reduction techniques. Presenting data as points in hyperspace provides opportunities to use a variety of welldeveloped tools pertinent to this representation. Dimensionality reduction allows data to be compressed and generalised. Eigen decomposition and related algorithms are one category of approaches to dimensionality reduction, providing a principled way to reduce data dimensionality that has time and again shown itself capable of enabling access to powerful generalisations in the data. Issues with the approach, however, include computational complexity and limitations on the size of dataset that can reasonably be processed in this way. Large datasets are a persistent feature of natural language processing tasks. This thesis focuses on two main questions. Firstly, in what ways can eigen decomposition and related techniques be extended to larger datasets? Secondly, this having been achieved, of what value is the resulting approach to information retrieval and to statistical language modelling at the ngram level? The applicability of eigen decomposition is shown to be extendable through the use of an extant algorithm; the Generalized Hebbian Algorithm (GHA), and the novel extension of this algorithm to paired data; the Asymmetric Generalized Hebbian Algorithm (AGHA). Several original extensions to the these algorithms are also presented, improving their applicability in various domains. The applicability of GHA to Latent Semantic Analysisstyle tasks is investigated. Finally, AGHA is used to investigate the value of singular value decomposition, an eigen decomposition variant, to ngram language modelling. A sizeable perplexity reduction is demonstrated.

Place, publisher, year, edition, pages
Institutionen för datavetenskap , 2006. , 138 p.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1045
Keyword [en]
Generalized Hebbian Algorithm, Language Modelling, Singular Value Decomposition, Eigen Decomposition, Latent Semantic Analysis, Vector Space Models
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-7922ISBN: 9185643882 (print)OAI: oai:DiVA.org:liu-7922DiVA: diva2:22830
Public defence
2006-11-24, Visionen, Hus E, Campus Valla, Linköpings universitet, Linköping, 00:00 (English)
Opponent
Supervisors
Available from: 2006-12-13 Created: 2006-12-13 Last updated: 2014-01-13

Open Access in DiVA

fulltext(1121 kB)474 downloads
File information
File name FULLTEXT01.pdfFile size 1121 kBChecksum SHA-1
48781b2643744778d32f90a8172296ec452d85ddaccfc99f3397c4c5ca474bc3e3d4c0c5
Type fulltextMimetype application/pdf

Authority records BETA

Gorrell, Genevieve

Search in DiVA

By author/editor
Gorrell, Genevieve
By organisation
NLPLAB - Natural Language Processing LaboratoryThe Institute of Technology
Language Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar
Total: 474 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 413 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • oxford
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf