Automatic Speech Act Classification: Bootstrapping an embedding-based classifier from a rule-based classifier for Swedish sentences
Linköping University, Department of Computer and Information Science.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis
Abstract [en]

When we speak, we carry out social actions that are mediated through spoken words. For example, through speech, we may ask for directions. These spoken actions are referred to as speech acts. We humans unconsciously understand and categorize speech acts all the time. But how can we make computers do the same?

The objective of this thesis was to develop an automatic classifier for speech acts in Swedish sentences. To do this, I first annotated a test set of speech acts, following the MATTER development cycle. The sentences in this test set originate from online discussion forums. I then developed and trained a rule-based classifier using a subsample of these sentences. Finally, this rule-based classifier was used to automatically annotate a large training set, which was then used to train a neural network for classifying speech acts, essentially bootstrapping the network from the rule-based classifier. This neural network uses SBERT to compute the embedding of each sentence and then classifies its speech act based on that embedding.
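The bootstrapping idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the three labels, the punctuation rules, and the example sentences are hypothetical, `embed` is a hashed character-trigram stand-in for an SBERT sentence embedding, and a nearest-centroid classifier takes the place of the neural network so that the example runs with the Python standard library alone.

```python
import math
import zlib
from collections import defaultdict


def rule_based_label(sentence):
    """Toy rules (assumption): the actual rules are not given in the abstract."""
    s = sentence.strip()
    if s.endswith("?"):
        return "question"
    if s.endswith("!"):
        return "directive"
    return "assertion"


def embed(sentence, dim=64):
    """Stand-in for SBERT (assumption): hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    text = sentence.lower()
    for i in range(len(text) - 2):
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        vec[zlib.crc32(text[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_centroids(sentences):
    """Label raw sentences with the rules, then average embeddings per label."""
    sums = {}
    counts = defaultdict(int)
    for s in sentences:
        label = rule_based_label(s)
        e = embed(s)
        sums[label] = e if label not in sums else [a + b for a, b in zip(sums[label], e)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(sentence, centroids):
    """Pick the label whose centroid has the largest dot product with the embedding."""
    e = embed(sentence)
    return max(centroids, key=lambda label: sum(a * b for a, b in zip(e, centroids[label])))


# Unlabelled sentences (invented for illustration) are annotated by the rules only.
unlabelled = [
    "Var ligger stationen?",
    "Hur mycket kostar biljetten?",
    "Stäng dörren!",
    "Kom hit nu!",
    "Jag bor i Linköping.",
    "Tåget går klockan åtta.",
]
centroids = train_centroids(unlabelled)

# The embedding-based classifier can label input that the punctuation rules
# would all send to the default class, such as a question without a question mark.
prediction = predict("Vet du var stationen ligger", centroids)
print(prediction)
```

The point of the sketch is the data flow: no human labels enter the training step, so the embedding-based classifier can only be as well supervised as the rules that annotated its training set, and a separately annotated test set is what makes the two classifiers comparable.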

The results indicate that using the MATTER cycle is a feasible approach for creating a test set for speech acts. Furthermore, the results show that the embedding-based classifier outperforms the rule-based classifier, and that the rule-based classifier in turn vastly outperforms the baseline. However, it was not possible to conclude whether the embedding-based classifier's higher performance was due to the larger amount of training data or to its different architecture.

Place, publisher, year, edition, pages
2024.
Keywords [en]
NLP, Speech Act, SBERT, Rule-Based Classification, Semi-supervised Learning
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:liu:diva-205702
ISRN: LIU-IDA/KOGVET-G--24/020--SE
OAI: oai:DiVA.org:liu-205702
DiVA, id: diva2:1880018
Subject / course
Cognitive science
Available from: 2025-05-08
Created: 2024-06-30
Last updated: 2025-05-08
Bibliographically approved

Open Access in DiVA

fulltext (1463 kB), 6 downloads
File information
File name: FULLTEXT01.pdf
File size: 1463 kB
Checksum (SHA-512):
ce6badbb59b092a8e21d53753fa93f3fe69210a428e59146feacb543692e8b46502d00f746934af5d997bf58818560d4f3da492a07e53b8afa82780376ee8198
Type: fulltext
Mimetype: application/pdf

