Using a Character-Based Language Model for Caption Generation
Linköping University, Department of Computer and Information Science, Interactive and Cognitive Systems.
2019 (English). Independent thesis, advanced level (degree of Master), 20 credits / 30 HE credits. Student thesis.
Alternative title: Användning av teckenbaserad språkmodell för generering av bildtext (Swedish)
Abstract [en]

Using AI to automatically describe images is a challenging task. The aim of this study has been to compare the use of character-based language models with one of the current state-of-the-art token-based language models, im2txt, for generating image captions, with a focus on morphological correctness.

Previous work has shown that character-based language models are able to outperform token-based language models in morphologically rich languages. Other studies show that simple multi-layered LSTM blocks are able to learn to replicate the syntax of their training data.

To study the viability of character-based language models, an alternative model based on TensorFlow's im2txt has been created. The model changes the token-generation architecture to handle character-sized tokens instead of word-sized tokens.
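To make the distinction concrete, here is a minimal sketch, in plain Python with hypothetical helper names (not the actual im2txt code), of word-sized versus character-sized tokenization of a caption:

```python
def word_tokens(caption):
    """Word-sized tokens, as in the original im2txt vocabulary."""
    return caption.lower().split()

def char_tokens(caption):
    """Character-sized tokens, as in the modified model: every
    character, including spaces, becomes its own token."""
    return list(caption.lower())

print(word_tokens("A dog runs"))  # ['a', 'dog', 'runs']
print(char_tokens("A dog"))       # ['a', ' ', 'd', 'o', 'g']
```

One consequence of this change is that the vocabulary shrinks from tens of thousands of words to a few dozen symbols, which in principle lets the model compose morphological variants it has never seen as whole words.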

The results suggest that a character-based language model could outperform current token-based language models, although, due to time and computing-power constraints, this study cannot draw a firm conclusion.

A problem with one of the methods, subsampling, is discussed. When the original method is applied to character-sized tokens, it removes individual characters (including special characters) instead of full words. To solve this issue, a two-phase approach is suggested: the training data is first split into word-sized tokens, on which subsampling is performed; the remaining tokens are then split into character-sized tokens.
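The two-phase approach described above can be sketched as follows. This is an illustrative reconstruction assuming word2vec-style frequency subsampling (keep probability min(1, √(t/f)) for a word with relative frequency f), not the thesis's actual implementation:

```python
import math
import random

def two_phase_char_tokens(corpus, t=1e-3, rng=None):
    """Phase 1: subsample frequent full words (word2vec-style heuristic);
    Phase 2: split the surviving words into character-sized tokens."""
    rng = rng or random.Random(0)
    words = [w for caption in corpus for w in caption.lower().split()]
    total = len(words)
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    kept = []
    for w in words:
        f = freq[w] / total                   # relative frequency of the word
        p_keep = min(1.0, math.sqrt(t / f))   # frequent words are kept less often
        if rng.random() < p_keep:
            kept.append(w)
    # Phase 2: character-sized tokens, with a space token between words,
    # so subsampling never removes characters from inside a word
    chars = []
    for w in kept:
        chars.extend(list(w))
        chars.append(' ')
    return chars
```

Because whole words are dropped before the character split, special characters inside surviving words are left intact, which is exactly the failure mode of applying subsampling directly to character-sized tokens.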

Future work, in which the modified subsampling is applied and the hyperparameters are fine-tuned, is suggested in order to reach a clearer conclusion about the performance of character-based language models.

Place, publisher, year, edition, pages
2019, p. 49
Keywords [en]
Natural Language Processing, NLP, Machine Learning, ML, Neural Network, Caption Generation, Deep Learning, Recurrent Neural Network, Long-Short-Term-Memory, LSTM, word2vec, Language Model
National subject category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-163001
ISRN: LIU-IDA/LITH-EX-A--19/095--SE
OAI: oai:DiVA.org:liu-163001
DiVA, id: diva2:1383356
Subject / course
Computer science
Presentation
2019-11-26, Alan Turing, Linköpings universitet, Linköping, 13:15 (English)
Supervisors
Examiners
Available from: 2020-01-09. Created: 2020-01-07. Last updated: 2020-01-09. Bibliographically approved.

Open Access in DiVA

fulltext (1515 kB)
File information
File name: FULLTEXT01.pdf
File size: 1515 kB
Checksum: SHA-512
8ee38a7071963124f495921eb39614582ccd05e693a86c0cd37f0fafef6ceac11210d96b378e6f0a19a4811e272fde5148a4b093e73b40498f5e7c4dcb0ec339
Type: fulltext
Mime type: application/pdf

By the author/editor
Keisala, Simon