Towards a Standard Dataset of Swedish Word Vectors
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. (Natural Language Processing)
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. (Natural Language Processing)
Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. (Natural Language Processing) ORCID iD: 0000-0002-2492-9872
2016 (English). In: Proceedings of the Sixth Swedish Language Technology Conference (SLTC), 2016. Conference paper (Refereed)
Abstract [en]

Word vectors, embeddings of words into a low-dimensional space, have been shown to be useful for a large number of natural language processing tasks. Our goal with this paper is to provide a useful dataset of such vectors for Swedish. To this end, we investigate three standard embedding methods: the continuous bag-of-words and the skip-gram model with negative sampling of Mikolov et al. (2013a), and the global vectors of Pennington et al. (2014). We compare these methods using QVEC-CCA (Tsvetkov et al., 2016), an intrinsic evaluation measure that quantifies the correlation of learned word vectors with external linguistic resources. For this purpose we use SALDO, the Swedish Association Lexicon (Borin et al., 2013). Our experiments show that the continuous bag-of-words model produces vectors that are most highly correlated with SALDO, with the skip-gram model very close behind. Our learned vectors will be provided for download at the paper’s website.
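
As a rough illustration of what a correlation-based intrinsic measure computes, the sketch below finds the first canonical correlation between a matrix of word vectors and a matrix of linguistic features for the same words. It is a minimal sketch under stated assumptions, not the evaluation code used in the paper: the function name `first_canonical_correlation`, the `tol` parameter, and the random toy matrices are all hypothetical, and a real evaluation would build the feature matrix from a lexical resource such as SALDO.

```python
import numpy as np


def first_canonical_correlation(X, S, tol=1e-10):
    """Largest canonical correlation between the columns of X and S.

    X: (n_words, d_embedding) word vectors (hypothetical input).
    S: (n_words, d_features) linguistic feature vectors for the same words.
    """
    # Center both views so that correlations are measured around the mean.
    X = X - X.mean(axis=0)
    S = S - S.mean(axis=0)

    # Orthonormal bases for the two column spaces via thin SVD.
    Ux, sx, _ = np.linalg.svd(X, full_matrices=False)
    Us, ss, _ = np.linalg.svd(S, full_matrices=False)

    # Drop numerically rank-deficient directions.
    Ux = Ux[:, sx > tol * sx.max()]
    Us = Us[:, ss > tol * ss.max()]

    # Singular values of Ux^T Us are the canonical correlations.
    corr = np.linalg.svd(Ux.T @ Us, compute_uv=False)
    return float(min(corr[0], 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(200, 5))                    # toy "word vectors"
    features = vectors[:, :2] @ rng.normal(size=(2, 3))    # features linearly tied to the vectors
    print(first_canonical_correlation(vectors, features))
```

A score near 1 means some linear combination of embedding dimensions tracks some linear combination of the linguistic features closely; a score near 0 means the two views are linearly unrelated.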

Place, publisher, year, edition, pages
2016.
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-134901
OAI: oai:DiVA.org:liu-134901
DiVA: diva2:1077779
Conference
Sixth Swedish Language Technology Conference (SLTC), Umeå, 17-18 November 2016
Available from: 2017-03-01. Created: 2017-03-01. Last updated: 2017-03-29. Bibliographically approved.

Open Access in DiVA

fulltext (122 kB), 12 downloads
File information
File name: FULLTEXT01.pdf
File size: 122 kB
Checksum (SHA-512): 1acce2446056aacc0b267e9f6115eebbca3163ee486faf91a9d2267997ae641f6e35b2c12246916cef0cbf93a193df136594ed0d94430f6d6b26704060818c39
Type: fulltext
Mimetype: application/pdf


Authors
Fallgren, Per; Segeblad, Jesper; Kuhlmann, Marco
