liu.se: Search publications in DiVA
  • 1.
    Abrahamsson, Peder
    Linköpings universitet, Institutionen för datavetenskap.
    Mer lättläst: Påbyggnad av ett automatiskt omskrivningsverktyg till lätt svenska
    2011. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits.
    Student thesis (Degree project)
    Abstract [translated from Swedish]

    The Swedish language should be accessible to everyone who lives and works in Sweden. It is therefore important that easy-to-read alternatives exist for those who have difficulty reading Swedish text. This work builds further on showing that it is possible to create an automatic rewriting program that makes texts easier to read. The work is based on CogFLUX, a tool for automatic rewriting into easy-to-read Swedish. CogFLUX contains functions for syntactically rewriting texts into more easily read Swedish. The rewriting is done using rewriting rules developed in an earlier project. In this work, additional rewriting rules are implemented, along with a new module for handling synonyms. With these new rules and the module, the work investigates whether it is possible to create a system that produces more readable text according to established readability measures such as LIX, OVIX and nominal ratio (Nominalkvot). The rewriting rules and the synonym handler are tested on three different texts with a total length of approximately one hundred thousand words. The work shows that both the LIX value and the nominal ratio can be lowered significantly with the help of rewriting rules and a synonym handler. The work also shows that more remains to be done to produce a really good program for automatic rewriting into easy-to-read Swedish.
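
The abstract above evaluates rewriting by its effect on LIX. As a rough illustration only, LIX is commonly defined as the average sentence length plus the percentage of words longer than six letters. The sketch below assumes a simple tokenisation (sentences split on `.`, `!`, `?` and `:`; words as runs of letters); the function name `lix` and these tokenisation choices are assumptions for illustration, not the implementation used in the thesis.

```python
import re


def lix(text: str) -> float:
    """Return the LIX readability score of `text`.

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    """
    # Assumption: sentences end at '.', '!', '?' or ':'.
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    # Assumption: a word is a run of Unicode letters (covers å, ä, ö).
    words = re.findall(r"[^\W\d_]+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


if __name__ == "__main__":
    # 4 words, 2 sentences, 1 long word: 4/2 + 100 * 1/4
    print(lix("Aa bb. Cccccccc dd."))
```

A lower score indicates text that is easier to read under this measure, which is why the thesis reports a reduction in LIX as an improvement.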

  • 2.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts.
    1998. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL'98) / [ed] Pierre Isabelle, Stroudsburg, PA, USA: The Association for Computational Linguistics, 1998, pp. 29-35.
    Conference paper (Refereed)
  • 3.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Alignment-based profiling of Europarl data in an English-Swedish parallel corpus
    2010. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias, Paris, France: European Language Resources Association (ELRA), 2010, pp. 3398-3404.
    Conference paper (Refereed)
    Abstract [en]

    This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison which is based on alignments, both at the token level and the structural level. Although two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions and long distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.

  • 4.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Comparing machine translation and human translation: A case study
    2017. In: RANLP 2017 The First Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT) Proceedings of the Workshop, September 7th, 2017 / [ed] Irina Temnikova, Constantin Orasan, Gloria Corpas and Stephan Vogel, Shoumen, Bulgaria: Association for Computational Linguistics, 2017, pp. 21-28.
    Conference paper (Refereed)
    Abstract [en]

    As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT.

  • 5.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Converting an English-Swedish Parallel Treebank to Universal Dependencies
    2015. In: Proceedings of the Third International Conference on Dependency Linguistics (DepLing 2015), Association for Computational Linguistics, 2015, pp. 10-19, article id W15-2103.
    Conference paper (Refereed)
    Abstract [en]

    The paper reports experiences of automatically converting the dependency analysis of the LinES English-Swedish parallel treebank to universal dependencies (UD). The most tangible result is a version of the treebank that actually employs the relations and parts-of-speech categories required by UD, and no other. It is also more complete in that punctuation marks have received dependencies, which is not the case in the original version. We discuss our method in the light of problems that arise from the desire to keep the syntactic analyses of a parallel treebank internally consistent, while available monolingual UD treebanks for English and Swedish diverge somewhat in their use of UD annotations. Finally, we compare the output from the conversion program with the existing UD treebanks.

  • 6.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Towards a research infrastructure for translation studies.
    2014.
    Conference paper (Other academic)
    Abstract [en]

    In principle the CLARIN research infrastructure provides a good environment to support research on translation. In reality, the progress within CLARIN in this area seems to be fairly slow. In this paper I will give examples of the resources currently available, and suggest what is needed to achieve a relevant research infrastructure for translation studies. Also, I argue that translation studies has more to gain from language technology, and statistical machine translation in particular, than what is generally assumed, and give some examples.

  • 7.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Towards an adequate account of parataxis in Universal Dependencies
    2019. In: Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019) / [ed] Alexandre Rademaker, Francis Tyers, Association for Computational Linguistics, 2019.
    Conference paper (Refereed)
    Abstract [en]

    The parataxis relation as defined for Universal Dependencies 2.0 is general and, for this reason, sometimes hard to distinguish from competing analyses, such as coordination, conj, or apposition, appos. The specific subtypes that are listed for parataxis are also quite different in character. In this study we first show that the actual practice by UD-annotators is varied, using the parallel UD (PUD-) treebanks as data. We then review the current definitions and guidelines and suggest improvements.

  • 8.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Danielsson, Henrik
    Linköpings universitet, Institutet för handikappvetenskap (IHV). Linköpings universitet, Institutionen för beteendevetenskap och lärande, Handikappvetenskap. Linköpings universitet, Filosofiska fakulteten.
    Bengtsson, Staffan
    The Swedish Institute for Disability Research, Jönköping University, Sweden.
    Arvå, Hampus
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Holme, Lotta
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Pedagogik och didaktik. Linköpings universitet, Utbildningsvetenskap.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Studying Disability Related Terms with Swe-Clarin Resources
    2019.
    Conference paper (Refereed)
    Abstract [en]

    In Swedish, as in other languages, the words used to refer to disabilities and people with disabilities are manifold. Recommendations as to which terms to use have been changed several times over the last hundred years. In this exploratory paper we have used textual resources provided by Swe-Clarin to study such changes quantitatively. We demonstrate that old and new recommendations co-exist for long periods of time, and that usage sometimes converges.

  • 9.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Holmer, Daniel
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Holmlid, Stefan
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Analysing Changes in Official Use of the Design Concept Using SweCLARIN Resources
    2022. In: Proceedings of the CLARIN Annual meeting, 2022.
    Conference paper (Refereed)
    Abstract [en]

    We show how the tools and language resources developed within the SweClarin infrastructure can be used to investigate changes in the use and understanding of the Swedish related words arkitektur, design, form, and formgivning. Specifically, we compare their use in two governmental public reports on design, one from 1999 and the other from 2015. We test the hypothesis that their meaning has developed in a way that blurs distinctions that may be important to stakeholders in the respective fields.

  • 10.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Holmer, Daniel
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Holmlid, Stefan
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Analysing changes in official use of the design concept using SweCLARIN resources
    2023. In: Selected papers from the CLARIN Annual Conference 2022 / [ed] Tomaž Erjavec and Maria Eskevich, Linköping: Linköping University Electronic Press, 2023.
    Conference paper (Refereed)
    Abstract [en]

    We investigate changes in the use of four Swedish words from the fields of design and architecture. It has been suggested that their meanings have been blurred, especially in governmental reports and policy documents, so that distinctions between them that are important to stakeholders in the respective fields are lost. Specifically, we compare usage in two governmental public reports on design, one from 1999 and the other from 2015, and additionally in opinion responses to the 2015 report. Our approach is to contextualise occurrences of the words in different representations of the texts using word embeddings, topic modelling and sentiment analysis. Tools and language resources developed within the SweClarin infrastructure have been crucial for the implementation of the study.

  • 11.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Megyesi, Beáta
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Proceedings of the Workshop on NLP and Pseudonymisation
    2019.
    Conference proceedings (editor) (Refereed)
  • 12.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    A knowledge-lite approach to word alignment
    2000. In: Parallel Text Processing: Alignment and Use of Translation Corpora / [ed] Jean Veronis, Dordrecht, The Netherlands: Kluwer Academic Publishers, 2000, pp. 97-116.
    Chapter in book, part of anthology (Other academic)
    Abstract [en]

    The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS-taggers and lemmatizers, rely on considerable linguistic knowledge, while other knowledge-lite sources such as cognate heuristics and word order heuristics can be implemented relatively easily. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user for the creation of a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines the use of statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output, and identify the major strengths and weaknesses of the system.

  • 13.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Correspondence measures for MT evaluation.
    2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, pp. 41-46.
    Conference paper (Refereed)
  • 14.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Ridings, Daniel
    Department of Swedish Language, Goteborg University, Goteborg Sweden.
    Sågvall Hein, Anna
    Department of Linguistics, Uppsala University, Uppsala Sweden.
    Tiedemann, Jörg
    Department of Linguistics, Uppsala University, Uppsala Sweden.
    Automatic Processing of Parallel Corpora: A Swedish Perspective
    1999.
    Report (Other academic)
    Abstract [en]

    As empirical methods have come to the fore in multilingual language technology and translation studies, the processing of parallel texts and parallel corpora has become a major research area in computational linguistics. In this article we review the state of the art in alignment and data extraction techniques for parallel texts, and give an overview of current work in Sweden in this area. In a final section, we summarize the results achieved so far and make some proposals for future research.

  • 15.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Sågvall Hein, Anna
    Institutionen för lingvistik, Uppsala universitet.
    Tiedemann, Jörg
    Institutionen för lingvistik, Uppsala universitet.
    Evaluation of word alignment systems
    2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, pp. 1255-1261.
    Conference paper (Refereed)
  • 16.
    Albertsson, Sarah
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Rennes, Evelina
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Filosofiska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Similarity-Based Alignment of Monolingual Corpora for Text Simplification
    2016. In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop, The COLING 2016 Organizing Committee, 2016, pp. 154-163.
    Conference paper (Refereed)
    Abstract [en]

    Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity by a modified version of a TF-IDF vector space model. A second method (M2), also accounting for part-of-speech tags, was developed, and the methods were compared. For evaluation, a crowdsourcing platform was built for human judgement data collection, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency that including syntactic context in the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
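
The abstract above compares cosine similarity with the Dice coefficient as sentence-similarity measures. The two can be sketched as follows. This is a minimal illustration under stated assumptions: it uses raw term counts rather than the modified TF-IDF weighting described in the paper, and the function names and the whitespace tokenisation in the usage example are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine of the angle between the term-count vectors of two token lists."""
    va, vb = Counter(a), Counter(b)
    # Dot product over the terms the two sentences share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(a: list[str], b: list[str]) -> float:
    """Dice coefficient over the sets of distinct tokens: 2|A∩B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


if __name__ == "__main__":
    # Assumption: whitespace tokenisation, for illustration only.
    s1 = "katten satt på mattan".split()
    s2 = "katten låg på mattan".split()
    print(cosine_similarity(s1, s2), dice_coefficient(s1, s2))
```

The Dice coefficient looks only at which tokens are shared, whereas the cosine measure also reflects term weights, which is one plausible reason the two can rank sentence pairs differently.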

  • 17.
    Anderberg, Caroline
    Linköpings universitet, Institutionen för datavetenskap.
    Text complexity visualisations: An exploratory study on teachers interpretations of radar chart visualisations of text complexity
    2022. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits.
    Student thesis (Degree project)
    Abstract [translated from Swedish]

    Finding a suitable text level for pupils with varying reading abilities is both an important and a demanding task for teachers. Radar chart visualisations of text complexity could potentially support that process, but they need to be evaluated to investigate whether they are intuitive, which measures should be included, and whether they say anything about the complexity of a text. This study explores how visualisations of text complexity in the form of radar charts are interpreted, which measures they should include, and what information they should contain in order to be comprehensible to teachers who work with pupils with language and/or reading difficulties. A pilot study and three focus group sessions were conducted with teachers from upper secondary special needs schools (särgymnasium) and special needs adult education (särvux). After thematic analysis of the focus group data, five themes were generated. The results showed that the visualisations were comprehensible to some extent, but that they need to be adapted to the target group by ensuring that the measures are relevant and that the scale, colours, categories and measures are clearly explained.

  • 18.
    Andersson, Elsa
    Linköpings universitet, Institutionen för datavetenskap.
    Methods for increasing cohesion in automatically extracted summaries of Swedish news articles: Using and extending multilingual sentence transformers in the data-processing stage of training BERT models for extractive text summarization
    2022. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits.
    Student thesis (Degree project)
    Abstract [en]

    Developments in deep learning and machine learning overall have created a plethora of opportunities for easier training of automatic text summarization (ATS) models for producing summaries of higher quality. ATS can be split into extractive and abstractive tasks; extractive models extract sentences from the original text to create summaries. In contrast, abstractive models generate novel sentences to create summaries. While extractive summaries are often preferred over abstractive ones, summaries created by extractive models trained on Swedish texts often lack cohesion, which affects the readability and overall quality of the summary. Therefore, there is a need to improve the process of training ATS models in terms of cohesion, while maintaining other text qualities such as content coverage. This thesis explores and implements methods at the data-processing stage aimed at improving the cohesion of generated summaries. The methods are based on Sentence-BERT for creating advanced sentence embeddings that can be used to rank the sentences in a text according to whether they should be included in the extractive summary. Three models are trained using different methods and evaluated using ROUGE, BERTScore for measuring content coverage and Coh-Metrix for measuring cohesion. The results of the evaluation suggest that the methods can indeed be used to create more cohesive summaries, although content coverage was reduced, which gives rise to the potential for extensive future exploration of further implementation.

  • 19.
    Andersson, Elsa
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Methods for increasing cohesion in automatically extracted summaries of Swedish news articles
    2022.
    Conference paper (Refereed)
  • 20.
    Andersson, Henrik
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
    Anchor-based Topic Modeling with Human Interpretable Results
    2020. Independent thesis, advanced level (Master's degree), 20 credits / 30 HE credits.
    Student thesis (Degree project)
    Abstract [en]

    Topic models are useful tools for exploring large data sets of textual content by exposing a generative process from which the text was produced. Anchor-based topic models utilize the anchor word assumption to define a set of algorithms with provable guarantees which recover the underlying topics with a run time practically independent of corpus size. A number of extensions to the initial anchor word-based algorithms, and enhancements made to tangential models, have been proposed which improve the intrinsic characteristics of the model making them more interpretable by humans. This thesis evaluates improvements to human interpretability due to: low-dimensional word embeddings in combination with a regularized objective function, automatic topic merging using tandem anchors, and utilizing word embeddings to synthetically increase corpus density. Results show that tandem anchors are viable vehicles for automatic topic merging, and that using word embeddings significantly improves the original anchor method across all measured metrics. Combining low-dimensional embeddings and a regularized objective results in computational downsides with small or no improvements to the metrics measured.

  • 21.
    Aretoulaki, Maria
    et al.
    FORWISS, Bavarian Research Centre for Knowledge-Based Systems, Erlangen, Germany.
    Ludwig, Bernd
    FORWISS, Bavarian Research Centre for Knowledge-Based Systems, Erlangen, Germany.
    Automation-Descriptions and Theorem-Proving: A Marriage made in Heaven?
    1999.
    Report (Other academic)
    Abstract [en]

    In this paper, Finite-State-Automata (FSA) and theorem-proving approaches to spoken dialogue systems (SLDS) are contrasted to each other. FSA are too rigid to deal with unpredictable user reactions, such as corrections or counter-questions, whereas plan-based approaches are usually too complex to be effectively used, given the unreliability of word recognition and the elliptical and unconventional nature of spontaneous speech. As an alternative, a Dialogue Manager architecture is proposed which uses knowledge on both the possible sequences of dialogue acts and the dynamic representation of the task and requirements for its fulfillment. The behaviour of the specific user is taken into consideration, including their expectations about the system and the service offered, as are instances of miscommunication and disagreement in the course of the dialogue, and the successful completion of sub-plans relevant to the task and the dialogue flow.

  • 22.
    Askarieh, Sona
    Linköpings universitet, Institutionen för kultur och kommunikation. Linköpings universitet, Filosofiska fakulteten.
    Cohesion and Comprehensibility in Swedish-English Machine Translated Texts
    2014. Independent thesis, advanced level (Master's degree), 20 credits / 30 HE credits.
    Student thesis (Degree project)
    Abstract [en]

    Access to various texts in different languages causes an increasing demand for fast, multi-purpose, and cheap translators. Pervasive internet use intensifies the need for intelligent and cheap translators, since traditional translation methods are too slow for translating different texts. In past years, scientists carried out much research in order to add human and artificial intelligence to the old machine translation systems, and the idea of developing a machine translation system came into existence during the days of World War (Kohenn, 2010). The new invention was useful in helping human translators and many other people who need to translate different types of texts according to their needs. The new translation systems are useful in meeting people’s needs. Since machine translation systems vary in the quality of their output, their performance should be evaluated from a linguistic point of view in order to reach a fair judgment about that quality. To achieve this goal, two different Swedish texts were translated by two different machine translation systems in the thesis. The translated texts were evaluated to examine the extent to which errors affect the comprehensibility of the translations. The performance of the systems was evaluated using three approaches. Firstly, the most common linguistic errors that appear in the machine translation systems’ outputs were analyzed (e.g. word alignment of the translated texts). Secondly, the influence of different types of errors on the cohesion chains was evaluated. Finally, the effect of the errors on the comprehensibility of the translations was investigated.

    Numerical results showed that some types of errors have a greater effect on the comprehensibility of the systems’ outputs. The obtained data illustrated that the subjects’ comprehension of the translated texts depends on the type of error, but not on its frequency. The analysis showed which translation system performed best.

  • 23.
    Asutay, Erkin
    et al.
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Psykologi. Linköpings universitet, Filosofiska fakulteten.
    Genevsky, Alexander
    Erasmus Univ, Netherlands.
    Barrett, Lisa Feldman
    Northeastern Univ, MA 02115 USA.
    Hamilton, Paul
    Linköpings universitet, Institutionen för biomedicinska och kliniska vetenskaper, Centrum för social och affektiv neurovetenskap. Linköpings universitet, Medicinska fakulteten.
    Slovic, Paul
    Decis Res, OR USA.
    Västfjäll, Daniel
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Psykologi. Linköpings universitet, Filosofiska fakulteten. Decis Res, OR USA.
    Affective Calculus: The Construction of Affect Through Information Integration Over Time
    2021. In: Emotion, ISSN 1528-3542, E-ISSN 1931-1516, Vol. 21, no. 1, pp. 159-174.
    Article in journal (Refereed)
    Abstract [en]

    Humans receive a constant stream of input that potentially influences their affective experience. Despite intensive research on affect, it is still largely unknown how various sources of information are integrated into the single, unified affective features that accompany consciousness. Here, we aimed to investigate how a stream of evocative input we receive is dynamically represented in self-reported affect. In 4 experiments, participants viewed a number of sequentially presented images and reported their momentary affective experience on valence and arousal scales. The number and duration of images in a trial varied across studies. In Study 4, we also measured participants' physiological responses while they viewed images. We formulated and compared several models with respect to their capacity to predict self-reported affect based on normative image ratings, physiological measurements, and prior affective experience (measured in the previous trial). Our data best supported a model incorporating a temporally sensitive averaging mechanism for affective integration that assigns higher weights to effectively more potent and recently represented stimuli. Crucially, affective averaging of sensory information and prior affect accounted for distinct contributions to currently experienced affect. Taken together, the current study provides evidence that prior affect and integrated affective impact of stimuli partly shape currently experienced affect.

  • 24.
    Asutay, Erkin
    et al.
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Psykologi. Linköpings universitet, Filosofiska fakulteten.
    Genevsky, Alexander
    Linköpings universitet, Institutionen för ekonomisk och industriell utveckling, Nationalekonomi. Linköpings universitet, Filosofiska fakulteten. Erasmus Univ, Netherlands.
    Hamilton, Paul
    Linköpings universitet, Institutionen för biomedicinska och kliniska vetenskaper, Centrum för social och affektiv neurovetenskap. Linköpings universitet, Medicinska fakulteten. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Västfjäll, Daniel
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Psykologi. Linköpings universitet, Filosofiska fakulteten. Decis Res, OR USA.
    Affective Context and Its Uncertainty Drive Momentary Affective Experience. 2022. In: Emotion, ISSN 1528-3542, E-ISSN 1931-1516, Vol. 22, no. 6, pp. 1336-1346. Journal article (Refereed)
    Abstract [en]

    Affect fluctuates in a moment-to-moment fashion, reflecting the continuous relationship between the individual and the environment. Despite substantial research, there remain important open questions regarding how a stream of sensory input is dynamically represented in experienced affect. Here, approaching affect as a temporally dependent process, we show that momentary affect is shaped by a combination of the affective impact of stimuli (i.e., visual images for the current studies) and previously experienced affect. We also found that this temporal dependency is influenced by uncertainty of the affective context. Participants in each trial viewed sequentially presented images and subsequently reported their affective experience, which was modeled based on the images' normative affect ratings and participants' previously reported affect. Study 1 showed that self-reported valence and arousal in a given trial are partly shaped by the affective impact of the given images and previously experienced affect. In Study 2, we manipulated context uncertainty by controlling occurrence probabilities for normatively pleasant and unpleasant images in separate blocks. Increasing context uncertainty (i.e., random occurrence of pleasant and unpleasant images) was associated with increased negative affect. In addition, the relative contribution of the most recent image to experienced pleasantness increased with increasing context uncertainty. Taken together, these findings provide clear behavioral evidence that momentary affect is a temporally dependent and continuous process, which reflects the affective impact of recent input variables and the previous internal state, and that this process is sensitive to the affective context and its uncertainty.

  • 25.
    Auer, Cornelia
    et al.
    Zuse Institut Berlin, Germany.
    Hotz, Ingrid
    Zuse Institut Berlin, Germany.
    Complete Tensor Field Topology on 2D Triangulated Manifolds embedded in 3D. 2011. In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 30, no. 3, pp. 831-840. Journal article (Refereed)
    Abstract [en]

    This paper is concerned with the extraction of the surface topology of tensor fields on 2D triangulated manifolds embedded in 3D. In scientific visualization topology is a meaningful instrument to get a hold on the structure of a given dataset. Due to the discontinuity of tensor fields on a piecewise planar domain, standard topology extraction methods result in an incomplete topological skeleton. In particular with regard to the high computational costs of the extraction this is not satisfactory. This paper provides a method for topology extraction of tensor fields that leads to complete results. The core idea is to include the locations of discontinuity into the topological analysis. For this purpose the model of continuous transition bridges is introduced, which makes it possible to capture the entire topology on the discontinuous field. The proposed method is applied to piecewise linear three-dimensional tensor fields defined on the vertices of the triangulation and to piecewise constant two or three-dimensional tensor fields given per triangle, e.g. rate of strain tensors of piecewise linear flow fields.

  • 26.
    Axelsson, Nils
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
    Dynamic Programming Algorithms for Semantic Dependency Parsing. 2017. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [en, translated from sv]

    Dependency parsing can be a useful tool for enabling computers to read text. In 2015, Kuhlmann and Jonsson presented a logical deduction system that can parse to noncrossing graphs with an asymptotic time complexity of O(n^3), where n is the length of the sentence being parsed. This work extends Kuhlmann and Jonsson's deduction system so that it can introduce certain crossing arcs, while achieving an asymptotic time complexity of O(n^4).

    To allow the deduction system to introduce crossing arcs, 15 new logical subgraph types, or items, are introduced. These item types allow the deduction system to introduce crossing arcs in such a way that acyclicity is maintained. The number of logical inference rules grows from Kuhlmann and Jonsson's 19 to 172, owing to the larger number of combinations of the now 20 item types.

    The result is a modest increase in coverage on test data (roughly 10 percentage points, i.e. from about 70% to 80%), and a placement comparable to Kuhlmann and Jonsson according to the measures from Task 18 of SemEval 2015. Derivational uniqueness cannot be guaranteed because of how arcs are introduced in the new deduction system. The extended algorithm, QAC, parses to a graph class that is hard to define, which is compared empirically with 1-endpoint-crossing graphs and graphs with pagenumber 2 or less. QAC's graph class has lower coverage than both of these, and has no upper bound on pagenumber or number of crossings.

    The conclusion is that extending a very minimal and specific deduction system is not necessarily optimal, and that it may be better to begin the process with a specific graph class in mind. In addition, several alternative methods for extending Kuhlmann and Jonsson are proposed.

  • 27.
    Axelsson, Robin
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska högskolan.
    Implementation och utvärdering av termlänkare i Java [Implementation and evaluation of a term aligner in Java]. 2013. Independent thesis, basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis (degree project)
    Abstract [en]

    Aligning parallel terms in a parallel corpus can be done by aligning all words and phrases in the corpus and then performing term extraction on the aligned set of word pairs. Alternatively, term extraction in the source and target text can be made separately and then the resulting term candidates can be aligned, forming aligned parallel terms. This thesis describes an implementation of a word aligner that is applied to extracted term candidates in both the source and the target texts. The term aligner uses statistical measures, the tool Giza++ and heuristics in the search for alignments. The evaluation reveals that the best results are obtained when the term alignment relies heavily on the Giza++ tool and the Levenshtein heuristic.
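
    The abstract does not say how the Levenshtein heuristic is applied, and the thesis implementation is in Java. As a rough illustration only, the Python sketch below shows how edit distance could score a pair of term candidates, on the assumption that cognate terms in two languages tend to be spelled similarly. The length normalisation and the function names are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def term_similarity(source_term: str, target_term: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(source_term), len(target_term))
    if longest == 0:
        return 1.0
    distance = levenshtein(source_term.lower(), target_term.lower())
    return 1.0 - distance / longest
```

    A score like `term_similarity("processor", "Prozessor")` could then be combined with the statistical alignment scores, for example as a tie-breaker between competing candidates.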

  • 28.
    Basirat, Ali
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten. Uppsala Univ, Sweden.
    Allassonniere-Tang, Marc
    Univ Lyon, France.
    Berdicevskis, Aleksandrs
    Univ Gothenburg, Sweden.
    An empirical study on the contribution of formal and semantic features to the grammatical gender of nouns. 2021. In: Linguistics Vanguard, E-ISSN 2199-174X, Vol. 7, no. 1, article id 20200048. Journal article (Refereed)
    Abstract [en]

    This study conducts an experimental evaluation of two hypotheses about the contributions of formal and semantic features to the grammatical gender assignment of nouns. One of the hypotheses (Corbett and Fraser 2000) claims that semantic features dominate formal ones. The other hypothesis, formulated within the optimal gender assignment theory (Rice 2006), states that form and semantics contribute equally. Both hypotheses claim that the combination of formal and semantic features yields the most accurate gender identification. In this paper, we operationalize and test these hypotheses by trying to predict grammatical gender using only character-based embeddings (that capture only formal features), only context-based embeddings (that capture only semantic features) and the combination of both. We performed the experiment using data from three languages with different gender systems (French, German and Russian). Formal features are a significantly better predictor of gender than semantic ones, and the difference in prediction accuracy is very large. Overall, formal features are also significantly better than the combination of form and semantics, but the difference is very small and the results for this comparison are not entirely consistent across languages.

  • 29.
    Bilos, Rober
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Incremental scanning and token-based editing. 1987. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    A primary goal with this thesis work has been to investigate the consequences of a token-based program representation. Among the results which are presented here are an incremental scanning algorithm together with a token-based syntax sensitive editing approach for program editing. The design and implementation of an incremental scanner and a practically useful syntax-sensitive editor is described in some detail. The language independent incremental scanner converts textual edit operations to corresponding operations on the token sequence. For example, user input is converted to tokens as it is typed in. This editor design makes it possible to edit programs with almost the same flexibility as with a conventional text editor and also provides some features offered by a syntax-directed editor, such as template instantiation, automatic indentation and prettyprinting, lexical and syntactic error handling. We have found that a program represented as a token sequence can on the average be represented in less than half the storage space required for a program in text form. Also, interactive syntax checking is speeded up since rescanning is not needed. The current implementation, called TOSSED - Token-based Syntax Sensitive Editor, supports editing and development of programs written in Pascal. The user is guaranteed a lexically and syntactically correct program on exit from the editor, which avoids many unnecessary compilations. The scanner, parser, prettyprinter, and syntactic error recovery are table-driven and language independent template specification is supported. Thus, editors supporting other languages can be generated.

  • 30.
    Bissessar, Daniel
    et al.
    Linköpings universitet, Institutionen för datavetenskap.
    Bois, Alexander
    Linköpings universitet, Institutionen för datavetenskap.
    Evaluation of methods for question answering data generation: Using large language models. 2022. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [en]

    One of the largest challenges in the field of artificial intelligence and machine learning is the acquisition of a large quantity of quality data to train models on. This thesis investigates and evaluates approaches to data generation in a telecom domain for the task of extractive QA. To do this a pipeline was built using a combination of BERT-like models and T5 models for data generation. We then evaluated our generated data using the downstream task of QA on a telecom domain data set. We measured the performance using EM and F1-scores. We achieved results that are state of the art on the telecom domain data set. We found that synthetic data generation is a viable approach to obtaining synthetic telecom QA data with the potential of improving model performance when used in addition to human-annotated data. We also found that using models from the general domain provided results that are on par with or better than domain-specific models for the generation, which provides possibilities to use a single generation pipeline for many different domains. Furthermore, we found that increasing the amount of synthetic data provided little benefit for our models on the downstream task, with diminishing returns setting in quickly. We were unable to pinpoint the reason for this. In short, our approach works but much more work remains to understand and optimize it for greater results.
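
    EM (exact match) and F1 are the usual scores for extractive QA. The abstract does not specify how they were computed, so the sketch below follows the common token-overlap definition as an assumption. The normalisation here (lowercasing, dropping ASCII punctuation, splitting on whitespace) is a simplification, and the function names are illustrative.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalisation: lowercase, drop ASCII punctuation, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalised answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)  # both empty counts as a match
    overlap = sum((Counter(pred) & Counter(ref)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

    A predicted span that contains the reference answer plus extra words scores 0 on EM but still earns partial credit on F1, which is why the two are normally reported together.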

  • 31.
    Borggren, Lukas
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem.
    Automatic Categorization of News Articles With Contextualized Language Models. 2021. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [en]

    This thesis investigates how pre-trained contextualized language models can be adapted for multi-label text classification of Swedish news articles. Various classifiers are built on pre-trained BERT and ELECTRA models, exploring global and local classifier approaches. Furthermore, the effects of domain specialization, using additional metadata features and model compression are investigated. Several hundred thousand news articles are gathered to create unlabeled and labeled datasets for pre-training and fine-tuning, respectively. The findings show that a local classifier approach is superior to a global classifier approach and that BERT outperforms ELECTRA significantly. Notably, a baseline classifier built on SVMs yields competitive performance. The effect of further in-domain pre-training varies; ELECTRA’s performance improves while BERT’s is largely unaffected. It is found that utilizing metadata features in combination with text representations improves performance. Both BERT and ELECTRA exhibit robustness to quantization and pruning, allowing model sizes to be cut in half without any performance loss.

  • 32.
    Boye, Johan
    et al.
    Telia Research, Spoken Language Processing, Farsta Sweden.
    Wirén, Mats
    Telia Research, Spoken Language Processing, Farsta Sweden.
    Rayner, Manny
    SRI International, Millers Yard, Cambridge UK.
    Lewin, Ian
    SRI International, Millers Yard, Cambridge UK.
    Carter, David
    SRI International, Millers Yard, Cambridge UK.
    Becket, Ralph
    SRI International, Millers Yard, Cambridge UK.
    Language-Processing Strategies and Mixed-Initiative Dialogues. 1999. Report (Other academic)
    Abstract [en]

    We describe an implemented spoken-language dialogue system for a travel-planning domain, which accesses a commercially available travel-information web-server and supports a flexible mixed-initiative dialogue strategy. We argue, based on data from initial Wizard-of-Oz experiments, that mixed-initiative strategies are appropriate for many types of user, but require more sophisticated architectures for processing of language and dialogue; we then use these observations to motivate an architecture which combines parallel deep and shallow natural language analysis engines and an agenda-driven dialogue manager. We outline the top-level processing strategy used by the dialogue manager, and also a novel formalism, which we call Flat Utterance Description, that allows us to reduce the output of the deep and shallow language-processing engines to a common representation.

  • 33.
    Braun, Marc
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten. University of Stuttgart, Fraunhofer IPA.
    Kunz, Jenny
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten.
    A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models. 2024. Conference paper (Refereed)
  • 34.
    Bremin, Sofia
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Hu, Hongzhan
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Karlsson, Johanna
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Prytz Lillkull, Anna
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Wester, Martin
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Danielsson, Henrik
    Linköpings universitet, Institutet för handikappvetenskap (IHV). Linköpings universitet, Institutionen för beteendevetenskap och lärande, Handikappvetenskap. Linköpings universitet, Filosofiska fakulteten.
    Stymne, Sara
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Methods for human evaluation of machine translation. 2010. In: Proceedings of the Swedish Language Technology Conference (SLTC2010), 2010, pp. 47-48. Conference paper (Other academic)
    Abstract [en]

    Evaluation of machine translation (MT) is a difficult task, both for humans, and using automatic metrics. The main difficulty lies in the fact that there is not one single correct translation, but many alternative good translation options. MT systems are often evaluated using automatic metrics, which commonly rely on comparing a translation to only a single human reference translation. An alternative is different types of human evaluations, commonly ranking between systems or estimations of adequacy and fluency on some scale, or error analyses.

    We have explored four different evaluation methods on output from three different statistical MT systems. The main focus is on different types of human evaluation. We compare two conventional evaluation methods, human error analysis and automatic metrics, to two less commonly used evaluation methods based on reading comprehension and eye-tracking. These two methods of evaluation are performed without the subjects seeing the source sentence. There have been few previous attempts at using reading comprehension and eye-tracking for MT evaluation.

    One example of a reading comprehension study is Fuji (1999) who conducted an experiment to compare English-to-Japanese MT to several versions of manual corrections of the system output. He found significant differences between texts with large differences on reading comprehension questions. Doherty and O’Brien (2009) is the only study we are aware of using eye-tracking for MT evaluation. They found that the average gaze time and fixation counts were significantly lower for sentences judged as excellent in an earlier evaluation, than for bad sentences.

    Like previous research we find that both reading comprehension and eye-tracking can be useful for MT evaluation.

    The results of these methods are consistent with the other methods for comparison between systems with a big quality difference. For systems with similar quality, however, the different evaluation methods often do not show any significant differences.

  • 35.
    Bretan, Ivan
    et al.
    Telia Research AB, Haninge, SWEDEN.
    Eklund, Robert
    Telia Research AB, Haninge, SWEDEN.
    MacDermid, Catriona
    Telia Research AB, Haninge, SWEDEN.
    Approaches to gathering realistic training data for speech translation systems. 1996. In: Proceedings of Third IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1996, Institute of Electrical and Electronics Engineers (IEEE), 1996, pp. 97-100. Conference paper (Refereed)
    Abstract [en]

    The Spoken Language Translator (SLT) is a multi-lingual speech-to-speech translation prototype supporting English, Swedish and French within the air traffic information system (ATIS) domain. The design of SLT is characterized by a strongly corpus-driven approach, which accentuates the need for cost-efficient collection procedures to obtain training data. This paper discusses various approaches to the data collection issue pursued within a speech translation framework. Original American English speech and language data have been collected using traditional Wizard-of-Oz (WOZ) techniques, a relatively costly procedure yielding high-quality results. The resulting corpus has been translated textually into Swedish by a large number of native speakers (427) and used as prompts for training the target language speech model. This "budget" collection method is compared to the accepted method, i.e., gathering data by means of a full-blown WOZ simulation. The results indicate that although translation in this case proved economical and produced considerable data, the method is not sensitive to certain features typical of spoken language, for which WOZ is superior.

  • 36.
    Bridal, Olle
    Linköpings universitet, Institutionen för datavetenskap.
    Named-entity recognition with BERT for anonymization of medical records. 2021. Independent thesis, basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis (degree project)
    Abstract [en]

    Sharing data is an important part of the progress of science in many fields. In the largely deep learning dominated field of natural language processing, textual resources are in high demand. In certain domains, such as that of medical records, the sharing of data is limited by ethical and legal restrictions and therefore requires anonymization. The process of manual anonymization is tedious and expensive, thus automated anonymization is of great value. Since medical records consist of unstructured text, pieces of sensitive information have to be identified in order to be masked for anonymization. Named-entity recognition (NER) is the subtask of information extraction in which named entities, such as person names or locations, are identified and categorized. Recently, models that leverage unsupervised training on large quantities of unlabeled training data have performed impressively on the NER task, which shows promise in their usage for the problem of anonymization. In this study, a small set of medical records was annotated with named-entity tags. Because of the lack of any training data, a BERT model already fine-tuned for NER was then evaluated on the evaluation set. The aim was to find out how well the model would perform on NER on medical records, and to explore the possibility of using the model to anonymize medical records. The most positive result was that the model was able to identify all person names in the dataset. The average accuracy for identifying all entity types was however relatively low. It is discussed that the success of identifying person names shows promise in the model’s application for anonymization. However, because the overall accuracy is significantly worse than that of models fine-tuned on domain-specific data, it is suggested that there might be better methods for anonymization in the absence of relevant training data.

  • 37.
    Capshaw, Riley
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
    Relation Classification using Semantically-Enhanced Syntactic Dependency Paths: Combining Semantic and Syntactic Dependencies for Relation Classification using Long Short-Term Memory Networks. 2018. Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis (degree project)
    Abstract [en]

    Many approaches to solving tasks in the field of Natural Language Processing (NLP) use syntactic dependency trees (SDTs) as a feature to represent the latent nonlinear structure within sentences. Recently, work in parsing sentences to graph-based structures which encode semantic relationships between words—called semantic dependency graphs (SDGs)—has gained interest. This thesis seeks to explore the use of SDGs in place of and alongside SDTs within a relation classification system based on long short-term memory (LSTM) neural networks. Two methods for handling the information in these graphs are presented and compared between two SDG formalisms. Three new relation extraction system architectures have been created based on these methods and are compared to a recent state-of-the-art LSTM-based system, showing comparable results when semantic dependencies are used to enhance syntactic dependencies, but with significantly fewer training parameters.

  • 38.
    Carlsson, Bertil
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Using the pyramid method to create gold standards for evaluation of extraction based text summarization techniques. 2010. In: Proceedings of the Third Swedish Language Technology Conference (SLTC-2010), 2010. Conference paper (Refereed)
  • 39.
    Cederblad, Gustav
    Linköpings universitet, Institutionen för datavetenskap.
    Finding Synonyms in Medical Texts: Creating a system for automatic synonym extraction from medical texts. 2018. Independent thesis, basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis (degree project)
    Abstract [en]

    This thesis describes the work of creating an automatic system for identifying synonyms and semantically related words in medical texts. Before this work, as a part of the project E-care@home, medical texts have been classified as either lay or specialized by both a lay annotator and an expert annotator. The lay annotator, in this case, is a person without any medical knowledge, whereas the expert annotator has professional knowledge in medicine. Using these texts made it possible to create co-occurrence matrices from which the related words could be identified. Fifteen medical terms were chosen as system input. The Dice similarity of these words in a context window of ten words around them was calculated. As output, five candidate related terms for each medical term were returned. Only unigrams were considered. The candidate related terms were evaluated using a questionnaire, where 223 healthcare professionals rated the similarity using a scale from one to five. A Fleiss kappa test showed that the agreement among these raters was 0.28, which is a fair agreement. The evaluation further showed that there was a significant correlation between the human ratings and the relatedness score (Dice similarity). That is, words with higher Dice similarity tended to get a higher human rating. However, the Dice similarity interval in which the words got the highest average human rating was 0.35-0.39. This result means that there is much room for improving the system. Further developments of the system should remove the unigram limitation and expand the corpus to provide a more accurate and reliable result.
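
    The abstract does not give the exact formulation of the Dice score. One plausible reading, sketched below as an assumption, represents each word by the set of words seen within a window around its occurrences and compares two such sets with the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The window handling, tokenisation, and function names are illustrative and not taken from the thesis.

```python
from collections import defaultdict


def context_sets(tokens, window=10):
    """Map each word to the set of words seen within `window` tokens around it.

    The window is split evenly, so window=10 means up to 5 tokens on each side
    (an assumed interpretation of "a context window of ten words").
    """
    half = window // 2
    contexts = defaultdict(set)
    for i, word in enumerate(tokens):
        lo = max(0, i - half)
        hi = min(len(tokens), i + half + 1)
        contexts[word].update(tokens[lo:i] + tokens[i + 1:hi])
    return contexts


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def related_terms(term, contexts, top_k=5):
    """Rank all other words by Dice similarity of their context sets to `term`."""
    scores = [(dice(contexts[term], ctx), other)
              for other, ctx in contexts.items() if other != term]
    return [word for score, word in sorted(scores, reverse=True)[:top_k]]
```

    Calling `related_terms("hypertension", context_sets(tokens))` on a tokenised corpus would return five unigram candidates per input term, matching the output size described in the abstract.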

  • 40.
    Collins, Christopher
    et al.
    Ontario Tech University, Oshawa, Canada.
    Fokkens, Antske
    Free University, Amsterdam, Netherlands.
    Kerren, Andreas
    Linköpings universitet, Institutionen för teknik och naturvetenskap, Medie- och Informationsteknik. Linköpings universitet, Tekniska fakulteten. Linnaeus University, Sweden.
    Weaver, Chris
    University of Oklahoma, Norman, USA.
    Chatzimparmpas, Angelos
    Linnaeus University, Department of Computer Science and Media Technology, ISOVIS Research Group, Sweden.
    Visual Text Analytics: Report from Dagstuhl Seminar 22191. 2022. Report (Other academic)
    Abstract [en]

    Text data is one of the most abundant types of data available, produced every day across all domains of society. Understanding the contents of this data can support important policy decisions, help us understand society and culture, and improve business processes. While machine learning techniques are growing in their power for analyzing text data, there is still a clear role for human analysis and decision-making. This seminar explored the use of visual analytics applied to text data as a means to bridge the complementary strengths of people and computers. The field of visual text analytics applies visualization and interaction approaches which are tightly coupled to natural language processing systems to create analysis processes and systems for examining text and multimedia data. During the seminar, interdisciplinary working groups of experts from visualization, natural language processing, and machine learning examined seven topic areas to reflect on the state of the field, identify gaps in knowledge, and create an agenda for future cross-disciplinary research. This report documents the program and the outcomes of Dagstuhl Seminar 22191 "Visual Text Analytics".

  • 41.
    Dahlbäck, Nils
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Filosofiska fakulteten.
    Forsblad, Mattias
    Linköpings universitet, Institutionen för samhälls- och välfärdsstudier, Avdelningen Åldrande och social förändring. Linköpings universitet, Filosofiska fakulteten.
    Hydén, Lars-Christer
    Linköpings universitet, Institutionen för samhälls- och välfärdsstudier, Avdelningen Åldrande och social förändring. Linköpings universitet, Filosofiska fakulteten.
    Reflections and Comments on Research on Memory and Conversation From an Ethnographic Perspective. 2019. In: Topics in Cognitive Science, ISSN 1756-8757, E-ISSN 1756-8765, Vol. 11, no. 4, pp. 817-820. Journal article (Refereed)
    Abstract [en]

    Reflecting on three papers included in this issue, we suggest that research on memory and conversation could benefit by making more use of analyzing real-life situations or close to real-life scenarios, full speech and body interactions, and the interaction with the physical environment. We also suggest that the process of remembering during conversation be investigated on a level of detail and sequence that allows for locating actual functions of different actions. Finally, we suggest that a life-span perspective on transactive memory systems must also model the development, maintenance, breakdown, and reestablishment of such systems.

  • 42.
    Danielsson, Benjamin
    Linköpings universitet, Institutionen för datavetenskap.
    A Study on Text Classification Methods and Text Features. 2019. Independent thesis, basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis (degree project)
    Abstract [en]

    When it comes to the task of classification the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis attempts to investigate the performance of multiple classifiers depending on the features that are used, the type of classes to classify and the optimization of said classifiers. The classifiers of interest are support-vector machines (SMO) and multilayer perceptron (MLP), the features tested are word vector spaces and text complexity measures, along with principal component analysis on the complexity measures. The features are created based on the Stockholm-Umeå-Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified into either standard or simplified classes. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures when using or not using PCA showed that the performance was largely unchanged between the two, although not using PCA had slightly better performance.

  • 43.
    Danielsson, Benjamin
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem.
    Exploring Patient Classification Based on Medical Records: The case of implant bearing patients. 2022. Independent thesis, advanced level (degree of Master), 20 credits / 30 HE credits. Student thesis (degree project).
    Abstract [en]

    In this thesis, the application of transformer-based models on the real-world task of identifying patients as implant bearing is investigated. The task is approached as a classification task and five transformer-based models relying on the BERT architecture are implemented, along with a Support Vector Machine (SVM) as a baseline for comparison. The models are fine-tuned with Swedish medical texts, i.e. patients’ medical histories.

    The five transformer-based models in question make use of two pre-trained BERT models: one released by the National Library of Sweden, and a second that uses the same pre-trained model but has been further pre-trained on domain-specific language. These are in turn fine-tuned using five different types of architectures: (1) a typical BERT model, (2) GAN-BERT, (3) RoBERT, (4) chunkBERT, and (5) a frequency-based optimized BERT. The final classifier, an SVM baseline, is trained using TF-IDF as the feature space.

    The data used in the thesis comes from a subset of an unreleased corpus from four Swedish clinics that covers a span of five years. The subset contains electronic medical records of patients belonging to the radiology and cardiology clinics. Four training sets were created, containing 100, 200, 300, and 903 labelled records respectively. The test set, containing 300 labelled samples, was also created from this subset. The labels on which the models are trained are created by labelling patients as implant bearing based on the number of implant terms each patient history contains.

    The results are promising and show favourable performance when classifying the patient histories. Models trained on 903 and 300 samples are able to outperform the baseline, and at their peak, BERT, chunkBERT and the frequency-based optimized BERT achieve an F1-measure of 0.97. When trained using 100 and 200 labelled records, all of the transformer-based models are outperformed by the baseline, except for the semi-supervised GAN-BERT, which is able to achieve competitive scores with 200 records.

    There is not a clear delineation between using the pre-trained BERT or the BERT model that has additional pre-training on domain specific language. However, it is believed that further research could shed additional light on the subject since the results are inconclusive.
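
    The labelling step described in this abstract (marking a patient as implant bearing based on the number of implant terms in the history) could look roughly like the sketch below. The term list, the threshold and the function names are hypothetical and chosen only for illustration; the abstract does not state which terms or cut-off the thesis used.

```python
import re

# Hypothetical term list and threshold: neither is given in the abstract.
IMPLANT_TERMS = {"pacemaker", "stent", "protes"}
MIN_TERM_HITS = 2


def count_implant_terms(history: str, terms=IMPLANT_TERMS) -> int:
    """Count the tokens in a patient history that are listed implant terms."""
    tokens = re.findall(r"\w+", history.lower())
    return sum(1 for token in tokens if token in terms)


def label_history(history: str, min_hits: int = MIN_TERM_HITS) -> bool:
    """Label a history as implant bearing if it has at least min_hits term hits."""
    return count_implant_terms(history) >= min_hits
```

    A rule of this kind yields weak labels from term counts alone, which is one plausible reading of how labelled training sets of different sizes could be produced without manual annotation.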

  • 44.
    De Bona, Fabio
    et al.
    Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.
    Riezler, Stefan
    Hall, Keith
    Ciaramita, Massimiliano
    Herdagdelen, Amac
    University of Trento, Rovereto, Italy.
    Holmqvist, Maria
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Learning dense models of query similarity from user click logs. 2010. In: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 474-482. Conference paper (Refereed).
  • 45.
    Debusmann, Ralph
    et al.
    Saarland University, Saarbrücken, Germany.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Dependency Grammar: Classification and Exploration. 2010. In: Resource-Adaptive Cognitive Processes / [ed] Matthew W. Crocker, Jörg Siekmann, Springer Berlin/Heidelberg, 2010, pp. 365-388. Chapter in book, part of anthology (Other academic).
    Abstract [en]

    Syntactic representations based on word-to-word dependencies have a long tradition in descriptive linguistics [29]. In recent years, they have also become increasingly used in computational tasks, such as information extraction [5], machine translation [43], and parsing [42]. Among the purported advantages of dependency over phrase structure representations are conciseness, intuitive appeal, and closeness to semantic representations such as predicate-argument structures. On the more practical side, dependency representations are attractive due to the increasing availability of large corpora of dependency analyses, such as the Prague Dependency Treebank [19].

  • 46.
    Dienes, Péter
    et al.
    Saarland University, Saarbrücken, Germany.
    Koller, Alexander
    Saarland University, Saarbrücken, Germany.
    Kuhlmann, Marco
    Saarland University, Saarbrücken, Germany.
    Statistical A-Star Dependency Parsing. 2003. In: Proceedings of the Workshop on Prospects and Advances in the Syntax/Semantics Interface / [ed] Denys Duchier and Geert-Jan Kruijff, 2003, pp. 85-89. Conference paper (Refereed).
    Abstract [en]

    Extensible Dependency Grammar (XDG; Duchier and Debusmann (2001)) is a recently developed dependency grammar formalism that allows the characterization of linguistic structures along multiple dimensions of description. It can be implemented efficiently using constraint programming (CP; Koller and Niehren 2002). In the CP context, parsing is cast as a search problem: The states of the search are partial parse trees, successful end states are complete and valid parses. In this paper, we propose a probability model for XDG dependency trees and an A-Star search control regime for the XDG parsing algorithm that guarantees the best parse to be found first. Extending XDG with a statistical component has the benefit of bringing the formalism further into the grammatical mainstream; it also enables XDG to efficiently deal with large, corpus-induced grammars that come with a high degree of ambiguity.
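
    The idea of casting parsing as a search in which the best solution is found first can be illustrated with a generic A-Star search. The sketch below is not the paper's XDG parser or its probability model: the state space, the `successors` and `heuristic` callbacks and the toy graph are assumptions for illustration. In a probabilistic setting, step costs could be negative log-probabilities, so that the cheapest goal corresponds to the most probable parse.

```python
import heapq
from itertools import count


def a_star(start, is_goal, successors, heuristic):
    """Return (cost, state) for the first goal taken off the agenda, or None.

    If the heuristic never overestimates the remaining cost, the first goal
    taken off the agenda is a cheapest one.
    """
    tie = count()  # tie-breaker so that states themselves are never compared
    agenda = [(heuristic(start), next(tie), 0.0, start)]
    best_cost = {start: 0.0}
    while agenda:
        _, _, cost, state = heapq.heappop(agenda)
        if cost > best_cost.get(state, float("inf")):
            continue  # stale entry: a cheaper path to this state was found
        if is_goal(state):
            return cost, state
        for nxt, step_cost in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    agenda, (new_cost + heuristic(nxt), next(tie), new_cost, nxt)
                )
    return None


# Toy search space (hypothetical): two routes from "S" to the goal "G".
GRAPH = {"S": [("A", 1.0), ("B", 4.0)], "A": [("G", 5.0)], "B": [("G", 1.0)], "G": []}
result = a_star("S", lambda s: s == "G", lambda s: GRAPH[s], lambda s: 0.0)
```

    With the zero heuristic used here the search degenerates to uniform-cost search; a tighter admissible heuristic would expand fewer partial states while keeping the same guarantee.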

  • 47.
    Doostmohammadi, Ehsan
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten.
    Kuhlmann, Marco
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem.
    On the Effects of Video Grounding on Language Models. 2022. In: Proceedings of the First Workshop on Performance and Interpretability Evaluations of Multimodal, Multipurpose, Massive-Scale Models, 2022. Conference paper (Other academic).
    Abstract [en]

    Transformer-based models trained on text and vision modalities try to improve the performance on multimodal downstream tasks or tackle the problem of lack of grounding, e.g., addressing issues like models’ insufficient commonsense knowledge. While it is more straightforward to evaluate the effects of such models on multimodal tasks, such as visual question answering or image captioning, it is not as well-understood how these tasks affect the model itself, and its internal linguistic representations. In this work, we experiment with language models grounded in videos and measure the models’ performance on predicting masked words chosen based on their imageability. The results show that the smaller model benefits from video grounding in predicting highly imageable words, while the results for the larger model seem harder to interpret.

  • 48.
    Drewes, Frank
    et al.
    Umeå University.
    Knight, Kevin
    University of Southern California, Information Sciences Institute.
    Kuhlmann, Marco
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Formal Models of Graph Transformation in Natural Language Processing (Dagstuhl Seminar 15122). 2015. In: Dagstuhl Reports, ISSN 2192-5283, Vol. 5, no. 3, pp. 143-161. Article in journal (Other academic).
    Abstract [en]

    In natural language processing (NLP) there is an increasing interest in formal models for processing graphs rather than more restricted structures such as strings or trees. Such models of graph transformation have previously been studied and applied in various other areas of computer science, including formal language theory, term rewriting, theory and implementation of programming languages, concurrent processes, and software engineering. However, few researchers from NLP are familiar with this work, and at the same time, few researchers from the theory of graph transformation are aware of the specific desiderata, possibilities and challenges that one faces when applying the theory of graph transformation to NLP problems. The Dagstuhl Seminar 15122 “Formal Models of Graph Transformation in Natural Language Processing” brought researchers from the two areas together. It initiated an interdisciplinary exchange about existing work, open problems, and interesting applications.

  • 49.
    Drewes, Frank
    et al.
    Umeå University, Umeå, Sweden.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    ATANLP 2012 Workshop on Applications of Tree Automata Techniques in Natural Language Processing: Proceedings of the Workshop. 2012. Conference proceedings (editor) (Other academic).
  • 50.
    Drewes, Frank
    et al.
    Umeå University, Umeå, Sweden.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Workshop on Applications of Tree Automata in Natural Language Processing 2010 (ATANLP 2010). 2010. Conference proceedings (editor) (Other academic).