This thesis addresses the question of what native speakers of Swedish do when words and names originating in English and several other foreign languages occur in their native language. This issue is investigated at the phonological and morphological levels of contemporary Swedish. The perspective is descriptive and the approach is empirical, involving analyses of several corpora of spoken Swedish and some of written Swedish. The focus is on naturally occurring but not yet well-described phonological and morphological patterns that are critical to, and can be applied in, speech technology applications.
The phonetic and phonological aspects are investigated in two studies. In a spoken language production study, well-known foreign names and words were recorded by 491 subjects, yielding almost 24,000 segments of potential interest, which were later transcribed and analyzed at the phonetic level. In a transcription study of proper names, 61,500 of the most common names in Sweden were transcribed under guidelines allowing extensions of the allophonic repertoire. The transcription conventions were developed jointly by four phonetically trained experts during the course of the work. An analysis of the transcriptions shows that several such extensions were deemed necessary for speech generation in a number of instances, and, in even more instances, acceptable as pronunciation variants that should be allowed in speech recognition. A couple of sequences previously impermissible in Swedish phonotactics were also encountered and judged necessary to introduce. Some additional speech sounds were considered possible but have not so far been encountered in the sample of names covered.
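The distinction between extensions needed for generation and variants allowed in recognition can be illustrated with a small variant generator. This is a minimal sketch under stated assumptions, not the transcription system used in the thesis: the `EXTENSIONS` table, the SAMPA-like symbols and the simplified transcription of "Washington" are hypothetical and chosen only for illustration. A lexicon entry is assumed to store a native-adapted base transcription plus the positions where an extended (non-native) phone may occur; a synthesizer would pick one rendering, while a recognition lexicon would list all of them.

```python
from itertools import product

# Hypothetical table (illustrative SAMPA-like symbols): a native Swedish phone
# that may, at marked positions, also be realized with an extended phone.
EXTENSIONS = {
    "t": ["T"],  # English dental fricative as an alternative to /t/
    "v": ["w"],  # English /w/ as an alternative to /v/
}


def variants(base, extendable):
    """Enumerate pronunciation variants of a transcription.

    base       -- list of phone symbols in the native-adapted rendering
    extendable -- set of positions whose phone may take an extended realization
    The native-adapted rendering is always returned first.
    """
    options = []
    for i, phone in enumerate(base):
        if i in extendable:
            options.append([phone] + EXTENSIONS.get(phone, []))
        else:
            options.append([phone])
    # Cartesian product over the per-position alternatives.
    return [" ".join(combo) for combo in product(*options)]


# Simplified, illustrative transcription; only the initial phone is extendable.
print(variants(["v", "O", "S", "I", "N", "t", "O", "n"], {0}))
```

Here the `t` in the last syllable is not marked as extendable, so only the initial `v`/`w` alternation produces a second variant. Marking positions per entry, rather than substituting every matching phone, keeps the variant set from growing combinatorially.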
At the morphological level, it is shown how English word elements take part in Swedish morphological processes such as inflection, derivation and compounding. This is illustrated with examples from several corpora of both spoken and written Swedish. Problems in acquiring enough spoken language data for data-driven methods are also discussed, and it is shown that knowledge-based strategies may in fact be better suited to the task than data-driven alternatives, owing to fundamental properties of the frequency distributions found in large corpora.
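As a small illustration of English-origin stems entering Swedish inflection and compounding, the sketch below attaches Swedish first-conjugation verb endings to adapted stems and forms closed compounds. It is a hypothetical toy, not a model from the thesis. It assumes that the borrowed verbs follow the first conjugation, which holds for examples such as *mejla* and *chatta*, and it ignores linking morphemes in compounds.

```python
# Swedish first-conjugation verb endings (assumed for the borrowed stems below).
FIRST_CONJUGATION = {
    "infinitive": "a",
    "present": "ar",
    "preterite": "ade",
    "supine": "at",
}


def inflect_verb(stem):
    """Return the main finite and non-finite forms of a first-conjugation verb."""
    return {form: stem + ending for form, ending in FIRST_CONJUGATION.items()}


def compound(modifier, head):
    """Join two stems into a closed compound (linking morphemes such as -s ignored)."""
    return modifier + head


print(inflect_verb("mejl"))      # English 'mail' with Swedish spelling
print(compound("chatt", "rum"))  # English 'chat' + Swedish 'rum' (room)
```

A rule-based generator like this covers every borrowed stem once the stem is listed, which is the practical appeal of a knowledge-based approach when many such forms are too rare to be observed in a corpus.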
The overall results suggest that any description of contemporary spoken Swedish (whether formal, pedagogical or technical) needs to be extended with phonological and morphological material of English origin, at the very least. Sociolinguistic and other possible underlying factors governing the variability observed in the data are examined, and it is shown that education and age play a significant role: subjects with higher education, as well as those between the ages of 25 and 45, produced significantly more segments that extend beyond the traditional Swedish allophone set. The results also show large individual variability, and it is suggested that interacting phonological constraints, and their relaxation, may be one way of explaining this.
Drawing on the results of these studies, consequences for Swedish speech technology applications are discussed and a set of requirements is proposed. The lexical transcription conventions that were developed, implemented and evaluated for proper names are also used in implementing a lexical component. A publicly available finite-state tool is first tried out in a pilot study but shown to be inadequate in terms of the linguistic descriptions it supports. A more flexible toolbox is therefore used in a larger-scale proof-of-concept experiment with data from one of the previously analyzed corpora.
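To show the kind of lexical description a finite-state lexical component encodes, here is a minimal sketch of continuation classes in the style of lexc lexicons. It makes several assumptions. The class names, entries and tags are hypothetical. It is implemented as a direct recursive recognizer rather than a compiled automaton. It does not reproduce either tool evaluated in the thesis. Each class lists (surface string, analysis tag, next class) entries, with `#` marking end of word. The example shows how an English-origin stem such as *chatt* yields an ambiguous form, *chattar*, which can be analyzed either as a verb or as a plural noun.

```python
# Hypothetical continuation-class lexicon: class -> [(surface, tag, next class)].
# "#" marks the end of a word.
LEXICON = {
    "Root": [
        ("mejl", "mejla+V", "VerbEnd"),
        ("chatt", "chatta+V", "VerbEnd"),
        ("chatt", "chatt+N", "NounEnd"),
    ],
    "VerbEnd": [("a", "+Inf", "#"), ("ar", "+Pres", "#"),
                ("ade", "+Pret", "#"), ("at", "+Sup", "#")],
    "NounEnd": [("", "+Sg+Indef", "#"), ("en", "+Sg+Def", "#"),
                ("ar", "+Pl+Indef", "#")],
}


def analyses(word, cls="Root", tags=""):
    """Return every analysis of `word` reachable through the continuation classes."""
    results = []
    for surface, tag, nxt in LEXICON[cls]:
        if not word.startswith(surface):
            continue
        rest = word[len(surface):]
        if nxt == "#":
            # Accept only if the whole word has been consumed.
            if rest == "":
                results.append(tags + tag)
        else:
            results.extend(analyses(rest, nxt, tags + tag))
    return results


print(analyses("chattar"))  # verb present and noun plural readings
print(analyses("mejlade"))
```

A real finite-state toolkit would compile such a lexicon into a transducer and could pair each path with a transcription as well as a tag string. The recursive walk above only mirrors the path structure.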
The requirements arrived at in this thesis have previously been used in the development of a concatenative, demi-syllable-based synthesizer for Swedish. As one possible strand of future research, it is suggested that the present results be combined with recent advances in speech alignment and recognition technology on the one hand, and unit-selection synthesis techniques on the other. To be able to choose between different renderings of a particular name, for example to echo the user's own pronunciation in a spoken dialogue system, recognition, dictionary resources, speech alignment and synthesis procedures all need to be controlled.
Linköping: Linköpings universitet, 2004. 151 p.
2004-06-18, Seminarierum Visionen, Hus B, Linköpings universitet, Linköping, 10:15 (Swedish)