Improving Speech Recognition for Arabic language Using Low Amounts of Labeled Data
Linköping University, Department of Computer and Information Science.
2021 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The importance of Automatic Speech Recognition (ASR) systems, which generate text from audio, is growing as their applications multiply. Training an ASR system, however, is difficult and tedious, largely because of the lack of training data: ASR systems require huge amounts of annotated training data consisting of audio files paired with accurately transcribed text files. Such annotated (labeled) data is very hard to find for most languages; producing it usually requires manual annotation, which, besides its monetary cost, is error-prone. A purely supervised training approach is therefore impractical in this scenario.

The Arabic language is one of the languages that lack an abundance of labeled data, which makes the accuracy of its ASR systems very low compared to resource-rich languages such as English, French, or Spanish. In this research, we take advantage of unlabeled voice data by learning general data representations from unlabeled training data (audio files only) in a self-supervised pre-training phase. This phase uses the wav2vec 2.0 framework, which masks the input in the latent space and solves a contrastive task. The model is then fine-tuned on a small amount of labeled data. We also exploit wav2vec 2.0 models that have been pre-trained on other languages, fine-tuning them on annotated Arabic data.
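The contrastive task mentioned above can be illustrated with a toy sketch: for each masked timestep, the model's context vector must identify the true latent among distractors sampled from other timesteps, via an InfoNCE-style loss. The following pure-Python illustration is our own much-simplified sketch of the idea, not the framework's implementation (real wav2vec 2.0 uses quantized codebooks, learned mask embeddings, and an added diversity loss):

```python
import math
import random

def contrastive_loss(context, targets, mask_idx,
                     num_negatives=5, temperature=0.1, seed=0):
    """InfoNCE-style loss over masked timesteps (wav2vec 2.0, simplified).

    context: list of context vectors (model output at each timestep)
    targets: list of target latent vectors the masked steps should match
    For each masked timestep t, the true latent targets[t] must score
    higher than num_negatives distractors drawn from other timesteps."""
    rnd = random.Random(seed)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb + 1e-9)

    losses = []
    for t in mask_idx:
        negatives = rnd.sample(
            [i for i in range(len(targets)) if i != t], num_negatives)
        candidates = [targets[t]] + [targets[i] for i in negatives]  # true first
        logits = [cosine(context[t], c) / temperature for c in candidates]
        log_z = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_z - logits[0])  # -log p(true latent | context)
    return sum(losses) / len(losses)
```

A perfectly predictive context (context equal to targets) drives the loss toward zero, while a mismatched one does not; in the real framework the encoder and the quantized targets are learned jointly.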

We show that pre-training on Arabic with the wav2vec 2.0 framework is considerably time- and resource-consuming: it took the model 21.5 days (about three weeks) to complete 662 epochs and reach a validation accuracy of 58%. Arabic is a right-to-left (RTL) language with many diacritics that indicate how letters should be pronounced; these two features make it difficult to fit Arabic into these models, as the transcript files require heavy pre-processing. We demonstrate that we can fine-tune a cross-lingual model, trained on raw speech waveforms in multiple languages, on Arabic data and obtain a word error rate as low as 36.53%. We also show that by tuning the model's parameters we can increase accuracy and thus decrease the word error rate from 54.00% to 36.69%.
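The word error rates quoted above are the standard ASR metric: the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the system output, divided by the number of reference words. A minimal self-contained sketch of the computation (not the thesis's actual evaluation code, which would typically use an evaluation library):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits (sub/ins/del) turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why a drop from 54.00% to 36.69% represents a substantial relative improvement.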

Place, publisher, year, edition, pages
2021, p. 37
Series
LIU-IDA/STAT-A--21/045--SE
Keywords [en]
Arabic Language, Speech Recognition, ASR, Signal Processing, wav2vec, XLSR
HSV category
Identifiers
URN: urn:nbn:se:liu:diva-176437; OAI: oai:DiVA.org:liu-176437; DiVA, id: diva2:1567188
External cooperation
DigitalTolk
Presentation
2021-06-02, 09:20 (English)
Supervisor
Examiner
Available from: 2021-06-18 Created: 2021-06-16 Last updated: 2021-06-18 Bibliographically approved

Open Access in DiVA

fulltext (2001 kB), 773 downloads
File information
File: FULLTEXT01.pdf; File size: 2001 kB; Checksum: SHA-512
13d450ffa45e3ee98ebd31ae7b2abfc45bcf6f04d7f1f60cc712f220a93e4da37aa00ab0e79baba2c09449c7160d0d8e7852f8741cc2c12a413eb5a152218179
Type: fulltext; Mimetype: application/pdf

Total: 773 downloads
The number of downloads is the sum of all downloads of all full texts. It may, for example, include earlier versions that are no longer available.

Total: 1543 hits