Improving Speech Recognition for Arabic language Using Low Amounts of Labeled Data
Linköping University, Department of Computer and Information Science.
2021 (English). Independent thesis, advanced level (master's degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The importance of Automatic Speech Recognition (ASR) systems, which generate text from audio, is increasing as the number of applications of these systems grows rapidly. Training ASR systems, however, is difficult and rather tedious, largely because of the lack of training data. ASRs require huge amounts of annotated training data: audio files together with accurately written transcript files. Such annotated (labeled) training data is very difficult to find for most languages; producing it usually requires people to perform the annotation manually, which, apart from its monetary cost, is error-prone. A purely supervised training task is therefore impractical in this scenario.

The Arabic language is one of the languages that lack an abundance of labeled data, which makes the accuracy of its ASR systems very low compared to resource-rich languages such as English, French, or Spanish. In this research, we take advantage of unlabeled voice data by learning general data representations from unlabeled training data (audio files only) in a self-supervised task, or pre-training phase. This phase uses the wav2vec 2.0 framework, which masks the input in the latent space and solves a contrastive task. The model is then fine-tuned on a small amount of labeled data. We also exploit wav2vec 2.0 models that have been pre-trained on other languages, fine-tuning them on annotated Arabic data.
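A toy illustration (not the thesis code, and with arbitrary placeholder values) of the span masking that wav2vec 2.0 applies in the latent space: each frame index is sampled as a span start with some probability, and the span following it is replaced by a mask embedding; the contrastive task then asks the model to identify the true latent for each masked position among distractors.

```python
import random

def mask_latent_spans(frames, p=0.065, span=10, mask_value=0.0, seed=0):
    """Toy span masking over a sequence of latent frames.

    Each index is chosen as a span start with probability `p`; the
    following `span` frames are replaced by `mask_value` (a scalar
    placeholder standing in for a learned mask embedding). Returns the
    masked sequence and the set of masked indices, which a contrastive
    loss would have to recover among distractor latents.
    """
    rng = random.Random(seed)
    masked = list(frames)
    masked_idx = set()
    for start in range(len(frames)):
        if rng.random() < p:
            for i in range(start, min(start + span, len(frames))):
                masked_idx.add(i)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx
```

The default values of `p` and `span` mirror the masking probability and span length reported for wav2vec 2.0, but the function operates on scalars rather than real latent vectors; it only shows the masking pattern, not the quantization or the contrastive loss itself.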

We show that pre-training on Arabic with the wav2vec 2.0 framework is considerably time- and resource-consuming: the model took 21.5 days (about 3 weeks) to complete 662 epochs and reach a validation accuracy of 58%. Arabic is a right-to-left (RTL) language with many diacritics that indicate how letters should be pronounced; these two features make it difficult for Arabic to fit into these models, as the transcript files require heavy pre-processing. We demonstrate that a cross-lingual model, trained on raw waveforms of speech in multiple languages, can be fine-tuned on Arabic data to reach a low word error rate of 36.53%. We also show that by tuning the model parameters we can increase the accuracy, decreasing the word error rate from 54.00% to 36.69%.
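The word error rates quoted above follow the standard WER definition: the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. A minimal sketch (a hypothetical helper, not the evaluation code used in the thesis):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word out of three reference words gives a WER of 1/3; a WER above 100% is possible when the hypothesis contains many insertions.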

Place, publisher, year, edition, pages
2021, p. 37
Series
LIU-IDA/STAT-A--21/045--SE
Keywords [en]
Arabic Language, Speech Recognition, ASR, Signal Processing, wav2vec, XLSR
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:liu:diva-176437
OAI: oai:DiVA.org:liu-176437
DiVA, id: diva2:1567188
External cooperation
DigitalTolk
Presentation
2021-06-02, 09:20 (English)
Supervisors
Examiners
Available from: 2021-06-18 Created: 2021-06-16 Last updated: 2021-06-18 Bibliographically approved

Open Access in DiVA

fulltext (2001 kB), 773 downloads
File information
File name: FULLTEXT01.pdf
File size: 2001 kB
Checksum (SHA-512):
13d450ffa45e3ee98ebd31ae7b2abfc45bcf6f04d7f1f60cc712f220a93e4da37aa00ab0e79baba2c09449c7160d0d8e7852f8741cc2c12a413eb5a152218179
Type: fulltext
MIME type: application/pdf

By organisation
Department of Computer and Information Science
Signal Processing

Search outside of DiVA
Google | Google Scholar
Total: 773 downloads
The number of downloads is the sum of downloads for all full texts. It may include e.g. previous versions that are now no longer available.

Total: 1543 hits