liu.se – Search publications in DiVA
1–50 of 170 hits
  • 1.
    Abrahamsson, Peder
    Linköpings universitet, Institutionen för datavetenskap.
    Mer lättläst: Påbyggnad av ett automatiskt omskrivningsverktyg till lätt svenska. 2011. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits. Student thesis (degree project).
    Abstract [sv]

    The Swedish language should be accessible to everyone who lives and works in Sweden. It is therefore important that easy-to-read alternatives exist for those who have difficulty reading Swedish text. This work builds on earlier efforts to show that it is possible to create an automatic rewriting program that makes texts easier to read. The work is based on CogFLUX, a tool for automatic rewriting into easy-to-read Swedish. CogFLUX contains functions for syntactically rewriting texts into more readable Swedish. The rewriting is done with the help of rewriting rules developed in an earlier project. In this work, additional rewriting rules are implemented, as well as a new module for handling synonyms. With these new rules and the module, the work investigates whether it is possible to create a system that produces a more readable text according to established readability measures such as LIX, OVIX and nominal ratio. The rewriting rules and the synonym handler are tested on three different texts with a total length of about one hundred thousand words. The work shows that both the LIX value and the nominal ratio can be lowered significantly with the help of the rewriting rules and the synonym handler. It also shows that more remains to be done to produce a really good program for automatic rewriting into easy-to-read Swedish.
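
    For orientation, the readability measures named in this abstract are commonly defined as follows (standard formulations from the Swedish readability literature; the exact variants used in the thesis may differ):

    \mathrm{LIX} = \frac{W}{S} + 100 \cdot \frac{W_{>6}}{W}
    \mathrm{OVIX} = \frac{\log W}{\log\left(2 - \frac{\log U}{\log W}\right)}
    \mathrm{NR} = \frac{\#\text{nouns} + \#\text{prepositions} + \#\text{participles}}{\#\text{pronouns} + \#\text{adverbs} + \#\text{verbs}}

    where W is the number of words, S the number of sentences, W_{>6} the number of words longer than six letters, and U the number of unique word types; NR above is the full nominal ratio (the simple variant is nouns divided by verbs).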

  • 2.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts. 1998. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL'98) / [ed] Pierre Isabelle, Stroudsburg, PA, USA: The Association for Computational Linguistics, 1998, pp. 29-35. Conference paper (Refereed).
  • 3.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Alignment-based profiling of Europarl data in an English-Swedish parallel corpus. 2010. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias, Paris, France: European Language Resources Association (ELRA), 2010, pp. 3398-3404. Conference paper (Refereed).
    Abstract [en]

    This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on alignments at both the token level and the structural level. Although two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions and long-distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.

  • 4.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Comparing machine translation and human translation: A case study. 2017. In: RANLP 2017 The First Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT): Proceedings of the Workshop, September 7th, 2017 / [ed] Irina Temnikova, Constantin Orasan, Gloria Corpas and Stephan Vogel, Shoumen, Bulgaria: Association for Computational Linguistics, 2017, pp. 21-28. Conference paper (Refereed).
    Abstract [en]

    As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT.

  • 5.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Converting an English-Swedish Parallel Treebank to Universal Dependencies. 2015. In: Proceedings of the Third International Conference on Dependency Linguistics (DepLing 2015), Association for Computational Linguistics, 2015, pp. 10-19, article id W15-2103. Conference paper (Refereed).
    Abstract [en]

    The paper reports experiences of automatically converting the dependency analysis of the LinES English-Swedish parallel treebank to universal dependencies (UD). The most tangible result is a version of the treebank that actually employs the relations and parts-of-speech categories required by UD, and no other. It is also more complete in that punctuation marks have received dependencies, which is not the case in the original version. We discuss our method in the light of problems that arise from the desire to keep the syntactic analyses of a parallel treebank internally consistent, while available monolingual UD treebanks for English and Swedish diverge somewhat in their use of UD annotations. Finally, we compare the output from the conversion program with the existing UD treebanks.

  • 6.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Towards a research infrastructure for translation studies. 2014. Conference paper (Other academic).
    Abstract [en]

    In principle the CLARIN research infrastructure provides a good environment to support research on translation. In reality, the progress within CLARIN in this area seems to be fairly slow. In this paper I will give examples of the resources currently available, and suggest what is needed to achieve a relevant research infrastructure for translation studies. Also, I argue that translation studies has more to gain from language technology, and statistical machine translation in particular, than what is generally assumed, and give some examples.

  • 7.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    A knowledge-lite approach to word alignment. 2000. In: Parallel Text Processing: Alignment and Use of Translation Corpora / [ed] Jean Veronis, Dordrecht, The Netherlands: Kluwer Academic Publishers, 2000, pp. 97-116. Chapter in book, part of anthology (Other academic).
    Abstract [en]

    The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS-taggers and lemmatizers, rely on considerable linguistic knowledge, while other knowledge-lite sources such as cognate heuristics and word order heuristics can be implemented relatively easily. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user for the creation of a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines the use of statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output, and identify the major strengths and weaknesses of the system.
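
    As an illustration of the kind of knowledge-lite signals discussed in this abstract (a minimal sketch, not the actual LWA implementation; the corpus format, the weights and the cognate measure are assumptions):

    from collections import Counter
    from difflib import SequenceMatcher

    def dice_scores(bitext):
        """Co-occurrence Dice scores for (source word, target word) pairs in a
        list of sentence-aligned pairs (src_tokens, trg_tokens)."""
        src_freq, trg_freq, pair_freq = Counter(), Counter(), Counter()
        for src, trg in bitext:
            for s in set(src):
                src_freq[s] += 1
            for t in set(trg):
                trg_freq[t] += 1
            for s in set(src):
                for t in set(trg):
                    pair_freq[(s, t)] += 1
        return {(s, t): 2 * c / (src_freq[s] + trg_freq[t])
                for (s, t), c in pair_freq.items()}

    def cognate_score(s, t):
        """Knowledge-lite cognate heuristic: string similarity of the word forms."""
        return SequenceMatcher(None, s.lower(), t.lower()).ratio()

    def link_score(s, t, dice, w_cooc=0.8, w_cognate=0.2):
        # The weights are illustrative assumptions, not values from the chapter.
        return w_cooc * dice.get((s, t), 0.0) + w_cognate * cognate_score(s, t)

    # Toy usage on a two-sentence English-Swedish bitext.
    bitext = [("the system works".split(), "systemet fungerar".split()),
              ("the system fails".split(), "systemet fallerar".split())]
    dice = dice_scores(bitext)
    print(link_score("system", "systemet", dice))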

  • 8.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Correspondence measures for MT evaluation. 2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, pp. 41-46. Conference paper (Refereed).
  • 9.
    Ahrenberg, Lars
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Sågvall Hein, Anna
    Institutionen för lingvistik, Uppsala universitet.
    Tiedemann, Jörg
    Institutionen för lingvistik, Uppsala universitet.
    Evaluation of word alignment systems. 2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, pp. 1255-1261. Conference paper (Refereed).
  • 10.
    Albertsson, Sarah
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Rennes, Evelina
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Filosofiska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Similarity-Based Alignment of Monolingual Corpora for Text Simplification. 2016. In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop, The COLING 2016 Organizing Committee, 2016, pp. 154-163. Conference paper (Refereed).
    Abstract [en]

    Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity by a modified version of a TF-IDF vector space model. A second method (M2), also accounting for part-of-speech tags, was developed, and the methods were compared. For evaluation, a crowdsourcing platform was built for human judgement data collection, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency that including syntactic context in the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
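
    A minimal sketch of the two similarity notions compared above, cosine similarity over TF-IDF vectors versus the Dice coefficient over token sets (the tokenization and the absence of M2's part-of-speech extension are simplifications):

    import math
    from collections import Counter

    def tfidf_vectors(sentences):
        """Simple TF-IDF vectors for a list of tokenized sentences."""
        n = len(sentences)
        df = Counter(tok for sent in sentences for tok in set(sent))
        return [{tok: tf * math.log(n / df[tok]) for tok, tf in Counter(sent).items()}
                for sent in sentences]

    def cosine(u, v):
        dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    def dice(a, b):
        a, b = set(a), set(b)
        return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

    sents = ["det var en mörk natt".split(),
             "natten var mörk och kall".split(),
             "solen sken hela dagen".split()]
    vecs = tfidf_vectors(sents)
    print(cosine(vecs[0], vecs[1]), dice(sents[0], sents[1]))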

  • 11.
    Askarieh, Sona
    Linköpings universitet, Institutionen för kultur och kommunikation. Linköpings universitet, Filosofiska fakulteten.
    Cohesion and Comprehensibility in Swedish-English Machine Translated Texts. 2014. Independent thesis, advanced level (Master's degree), 20 credits / 30 HE credits. Student thesis (degree project).
    Abstract [en]

    Access to various texts in different languages causes an increasing demand for fast, multi-purpose, and cheap translators. Pervasive internet use intensifies the necessity for intelligent and cheap translators, since traditional translation methods are excessively slow at translating the many different texts. During the past years, scientists have carried out much research in order to add human and artificial intelligence to the old machine translation systems; the idea of developing a machine translation system came into existence in the days of the World War (Koehn, 2010). The new invention was useful in helping human translators and the many other people who need to translate different types of texts according to their needs. Since machine translation systems vary in the quality of their output, their performance should be evaluated from a linguistic point of view in order to reach a fair judgement about the quality of the systems' output. To achieve this goal, two different Swedish texts were translated by two different machine translation systems in this thesis. The translated texts were evaluated to examine the extent to which errors affect the comprehensibility of the translations. The performance of the systems was evaluated using three approaches. Firstly, the most common linguistic errors appearing in the machine translation systems' output were analyzed (e.g. word alignment of the translated texts). Secondly, the influence of different types of errors on the cohesion chains was evaluated. Finally, the effect of the errors on the comprehensibility of the translations was investigated.

    Numerical results showed that some types of errors have a greater effect on the comprehensibility of the systems' output. The obtained data illustrated that the subjects' comprehension of the translated texts depends on the type of error, but not on its frequency. The analysis also showed which translation system had the best performance.

  • 12.
    Auer, Cornelia
    et al.
    Zuse Institut Berlin, Germany.
    Hotz, Ingrid
    Zuse Institut Berlin, Germany.
    Complete Tensor Field Topology on 2D Triangulated Manifolds embedded in 3D. 2011. In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 30, no. 3, pp. 831-840. Article in journal (Refereed).
    Abstract [en]

    This paper is concerned with the extraction of the surface topology of tensor fields on 2D triangulated manifolds embedded in 3D. In scientific visualization, topology is a meaningful instrument to get a hold on the structure of a given dataset. Due to the discontinuity of tensor fields on a piecewise planar domain, standard topology extraction methods result in an incomplete topological skeleton. In particular with regard to the high computational costs of the extraction, this is not satisfactory. This paper provides a method for topology extraction of tensor fields that leads to complete results. The core idea is to include the locations of discontinuity in the topological analysis. For this purpose the model of continuous transition bridges is introduced, which makes it possible to capture the entire topology of the discontinuous field. The proposed method is applied to piecewise linear three-dimensional tensor fields defined on the vertices of the triangulation and to piecewise constant two- or three-dimensional tensor fields given per triangle, e.g. rate-of-strain tensors of piecewise linear flow fields.

  • 13.
    Axelsson, Nils
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
    Dynamic Programming Algorithms for Semantic Dependency Parsing. 2017. Independent thesis, advanced level (Master's degree), 20 credits / 30 HE credits. Student thesis (degree project).
    Abstract [sv]

    Dependency parsing can be a useful tool for enabling computers to read text. In 2015, Kuhlmann and Jonsson presented a logical deduction system that parses to non-crossing graphs with an asymptotic time complexity of O(n^3), where n is the length of the sentence being parsed. This work extends Kuhlmann and Jonsson's deduction system so that it can introduce certain crossing edges, while an asymptotic time complexity of O(n^4) is attained.

    To allow the deduction system to introduce crossing edges, 15 new types of logical subgraphs, or items, are introduced. These item types allow the deduction system to introduce crossing edges in such a way that acyclicity is preserved. The number of logical inference rules grows from Kuhlmann and Jonsson's 19 to 172, because of the larger number of combinations of the now 20 item types.

    The result is a modest increase in coverage on the test data (roughly 10 percentage points, i.e. from about 70% to 80%), and a placement comparable to Kuhlmann and Jonsson's according to the metrics of task 18 of SemEval 2015. Derivation uniqueness cannot be guaranteed because of how edges are introduced in the new deduction system. The extended algorithm, QAC, parses to a graph class that is hard to define and that is compared empirically with 1-endpoint-crossing graphs and graphs of pagenumber 2 or less. QAC's graph class has lower coverage than both of these, and has no upper bound on pagenumber or on the number of crossings.

    The conclusion is that it is not necessarily optimal to extend a very minimal and specific deduction system, and that it may be better to start the process with a specific graph class in mind. In addition, several alternative ways of extending Kuhlmann and Jonsson's system are proposed.

  • 14.
    Axelsson, Robin
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska högskolan.
    Implementation och utvärdering av termlänkare i Java. 2013. Independent thesis, basic level (Bachelor's degree), 10 credits / 15 HE credits. Student thesis (degree project).
    Abstract [en]

    Aligning parallel terms in a parallel corpus can be done by aligning all words and phrases in the corpus and then performing term extraction on the aligned set of word pairs. Alternatively, term extraction in the source and target text can be performed separately, after which the resulting term candidates can be aligned, forming aligned parallel terms. This thesis describes an implementation of a word aligner that is applied to extracted term candidates in both the source and the target texts. The term aligner uses statistical measures, the tool Giza++ and heuristics in the search for alignments. The evaluation reveals that the best results are obtained when the term alignment relies heavily on the Giza++ tool and the Levenshtein heuristic.
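
    A sketch of the kind of Levenshtein heuristic referred to in the evaluation, used here to score candidate term pairs by normalized edit distance (the threshold and the greedy pairing are assumptions, not details from the thesis):

    def levenshtein(a, b):
        """Standard dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
            prev = curr
        return prev[-1]

    def similarity(a, b):
        """Normalized string similarity in [0, 1]."""
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    def align_terms(src_terms, trg_terms, threshold=0.7):
        """Greedily pair extracted term candidates whose forms are similar enough."""
        pairs = []
        for s in src_terms:
            best = max(trg_terms, key=lambda t: similarity(s.lower(), t.lower()))
            if similarity(s.lower(), best.lower()) >= threshold:
                pairs.append((s, best))
        return pairs

    print(align_terms(["generator", "turbine blade"], ["generatorn", "turbinblad"]))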

  • 15.
    Bilos, Rober
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Incremental scanning and token-based editing. 1987. Licentiate thesis, monograph (Other academic).
    Abstract [en]

    A primary goal of this thesis work has been to investigate the consequences of a token-based program representation. Among the results presented here are an incremental scanning algorithm together with a token-based, syntax-sensitive editing approach for program editing. The design and implementation of an incremental scanner and a practically useful syntax-sensitive editor are described in some detail. The language-independent incremental scanner converts textual edit operations to corresponding operations on the token sequence. For example, user input is converted to tokens as it is typed in. This editor design makes it possible to edit programs with almost the same flexibility as with a conventional text editor and also provides some features offered by a syntax-directed editor, such as template instantiation, automatic indentation and prettyprinting, and lexical and syntactic error handling. We have found that a program represented as a token sequence can on average be represented in less than half the storage space required for a program in text form. Also, interactive syntax checking is speeded up since rescanning is not needed. The current implementation, called TOSSED - Token-based Syntax Sensitive Editor, supports editing and development of programs written in Pascal. The user is guaranteed a lexically and syntactically correct program on exit from the editor, which avoids many unnecessary compilations. The scanner, parser, prettyprinter, and syntactic error recovery are table-driven and language independent, and template specification is supported. Thus, editors supporting other languages can be generated.

  • 16.
    Bremin, Sofia
    et al.
    Hu, Hongzhan
    Karlsson, Johanna
    Prytz Lillkull, Anna
    Wester, Martin
    Danielsson, Henrik
    Linköpings universitet, Institutet för handikappvetenskap (IHV). Linköpings universitet, Institutionen för beteendevetenskap och lärande, Handikappvetenskap. Linköpings universitet, Filosofiska fakulteten.
    Stymne, Sara
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Methods for human evaluation of machine translation. 2010. In: Proceedings of the Swedish Language Technology Conference (SLTC2010), 2010, pp. 47-48. Conference paper (Other academic).
  • 17.
    Bretan, Ivan
    et al.
    Telia Research AB, Haninge, SWEDEN.
    Eklund, Robert
    Telia Research AB, Haninge, SWEDEN.
    MacDermid, Catriona
    Telia Research AB, Haninge, SWEDEN.
    Approaches to gathering realistic training data for speech translation systems. 1996. In: Proceedings of the Third IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1996, Institute of Electrical and Electronics Engineers (IEEE), 1996, pp. 97-100. Conference paper (Refereed).
    Abstract [en]

    The Spoken Language Translator (SLT) is a multi-lingual speech-to-speech translation prototype supporting English, Swedish and French within the air traffic information system (ATIS) domain. The design of SLT is characterized by a strongly corpus-driven approach, which accentuates the need for cost-efficient collection procedures to obtain training data. This paper discusses various approaches to the data collection issue pursued within a speech translation framework. Original American English speech and language data have been collected using traditional Wizard-of-Oz (WOZ) techniques, a relatively costly procedure yielding high-quality results. The resulting corpus has been translated textually into Swedish by a large number of native speakers (427) and used as prompts for training the target language speech model. This "budget" collection method is compared to the accepted method, i.e., gathering data by means of a full-blown WOZ simulation. The results indicate that although translation in this case proved economical and produced considerable data, the method is not sensitive to certain features typical of spoken language, for which WOZ is superior.

  • 18.
    Capshaw, Riley
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system.
    Relation Classification using Semantically-Enhanced Syntactic Dependency Paths: Combining Semantic and Syntactic Dependencies for Relation Classification using Long Short-Term Memory Networks. 2018. Independent thesis, advanced level (Master's degree), 20 credits / 30 HE credits. Student thesis (degree project).
    Abstract [en]

    Many approaches to solving tasks in the field of Natural Language Processing (NLP) use syntactic dependency trees (SDTs) as a feature to represent the latent nonlinear structure within sentences. Recently, work in parsing sentences to graph-based structures which encode semantic relationships between words—called semantic dependency graphs (SDGs)—has gained interest. This thesis seeks to explore the use of SDGs in place of and alongside SDTs within a relation classification system based on long short-term memory (LSTM) neural networks. Two methods for handling the information in these graphs are presented and compared between two SDG formalisms. Three new relation extraction system architectures have been created based on these methods and are compared to a recent state-of-the-art LSTM-based system, showing comparable results when semantic dependencies are used to enhance syntactic dependencies, but with significantly fewer training parameters.

  • 19.
    Carlsson, Bertil
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Using the pyramid method to create gold standards for evaluation of extraction based text summarization techniques. 2010. In: Proceedings of the Third Swedish Language Technology Conference (SLTC-2010), 2010. Conference paper (Refereed).
  • 20.
    Cederblad, Gustav
    Linköpings universitet, Institutionen för datavetenskap.
    Finding Synonyms in Medical Texts: Creating a system for automatic synonym extraction from medical texts. 2018. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits. Student thesis (degree project).
    Abstract [en]

    This thesis describes the work of creating an automatic system for identifying synonyms and semantically related words in medical texts. Before this work, as a part of the project E-care@home, medical texts had been classified as either lay or specialized by both a lay annotator and an expert annotator. The lay annotator, in this case, is a person without any medical knowledge, whereas the expert annotator has professional knowledge in medicine. Using these texts made it possible to create co-occurrence matrices from which the related words could be identified. Fifteen medical terms were chosen as system input. The Dice similarity of these words within a context window of ten words around them was calculated. As output, five candidate related terms for each medical term were returned. Only unigrams were considered. The candidate related terms were evaluated using a questionnaire, where 223 healthcare professionals rated the similarity on a scale from one to five. A Fleiss kappa test showed that the agreement among these raters was 0.28, which is a fair agreement. The evaluation further showed that there was a significant correlation between the human ratings and the relatedness score (Dice similarity). That is, words with higher Dice similarity tended to get a higher human rating. However, the Dice similarity interval in which the words got the highest average human rating was 0.35-0.39. This result means that there is much room for improving the system. Further development of the system should remove the unigram limitation and expand the corpus to provide more accurate and reliable results.
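
    A small sketch of the relatedness computation described above: context profiles are collected in a window of ten words around each term and compared with the Dice coefficient (the corpus format and preprocessing are assumptions; the thesis's exact formulation may differ):

    from collections import defaultdict

    def context_profiles(tokens, window=10):
        """Map each word to the set of words seen within +/- `window` positions."""
        profiles = defaultdict(set)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            profiles[tok].update(t for t in tokens[lo:hi] if t != tok)
        return profiles

    def dice_relatedness(profiles, a, b):
        """Dice coefficient between the context profiles of two words."""
        pa, pb = profiles.get(a, set()), profiles.get(b, set())
        return 2 * len(pa & pb) / (len(pa) + len(pb)) if pa or pb else 0.0

    def top_candidates(profiles, term, k=5):
        """The k words whose context profiles are most Dice-similar to `term`."""
        scores = {w: dice_relatedness(profiles, term, w) for w in profiles if w != term}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    tokens = "patienten har ont i magen och patienten har värk i buken".split()
    profiles = context_profiles(tokens)
    print(top_candidates(profiles, "magen", k=3))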

  • 21.
    Danielsson, Benjamin
    Linköpings universitet, Institutionen för datavetenskap.
    A Study on Text Classification Methods and Text Features. 2019. Independent thesis, basic level (Bachelor's degree), 12 credits / 18 HE credits. Student thesis (degree project).
    Abstract [en]

    When it comes to the task of classification, the data used for training is the most crucial part. It follows that how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features that are used, the type of classes to classify and the optimization of said classifiers. The classifiers of interest are support vector machines (SMO) and the multilayer perceptron (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) of the complexity measures. The features are created from the Stockholm-Umeå Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged between the two, although not using PCA gave slightly better performance.

  • 22.
    De Bona, Fabio
    et al.
    Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.
    Riezler, Stefan
    Hall, Keith
    Ciaramita, Massimiliano
    Herdagdelen, Amac
    University of Trento, Rovereto, Italy.
    Holmqvist, Maria
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Learning dense models of query similarity from user click logs. 2010. In: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 474-482. Conference paper (Refereed).
  • 23.
    Debusmann, Ralph
    et al.
    Saarland University, Saarbrücken, Germany.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Dependency Grammar: Classification and Exploration. 2010. In: Resource-Adaptive Cognitive Processes / [ed] Matthew W. Crocker, Jörg Siekmann, Springer Berlin/Heidelberg, 2010, pp. 365-388. Chapter in book, part of anthology (Other academic).
    Abstract [en]

    Syntactic representations based on word-to-word dependencies have a long tradition in descriptive linguistics [29]. In recent years, they have also become increasingly used in computational tasks, such as information extraction [5], machine translation [43], and parsing [42]. Among the purported advantages of dependency over phrase structure representations are conciseness, intuitive appeal, and closeness to semantic representations such as predicate-argument structures. On the more practical side, dependency representations are attractive due to the increasing availability of large corpora of dependency analyses, such as the Prague Dependency Treebank [19].

  • 24.
    Dienes, Péter
    et al.
    Saarland University, Saarbrücken, Germany.
    Koller, Alexander
    Saarland University, Saarbrücken, Germany.
    Kuhlmann, Marco
    Saarland University, Saarbrücken, Germany.
    Statistical A-Star Dependency Parsing. 2003. In: Proceedings of the Workshop on Prospects and Advances in the Syntax/Semantics Interface / [ed] Denys Duchier and Geert-Jan Kruijff, 2003, pp. 85-89. Conference paper (Refereed).
    Abstract [en]

    Extensible Dependency Grammar (XDG; Duchier and Debusmann (2001)) is a recently developed dependency grammar formalism that allows the characterization of linguistic structures along multiple dimensions of description. It can be implemented efficiently using constraint programming (CP; Koller and Niehren 2002). In the CP context, parsing is cast as a search problem: The states of the search are partial parse trees, successful end states are complete and valid parses. In this paper, we propose a probability model for XDG dependency trees and an A-Star search control regime for the XDG parsing algorithm that guarantees the best parse to be found first. Extending XDG with a statistical component has the benefit of bringing the formalism further into the grammatical mainstream; it also enables XDG to efficiently deal with large, corpus-induced grammars that come with a high degree of ambiguity.

  • 25.
    Drewes, Frank
    et al.
    Umeå University.
    Knight, Kevin
    University of Southern California, Information Sciences Institute.
    Kuhlmann, Marco
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Formal Models of Graph Transformation in Natural Language Processing (Dagstuhl Seminar 15122). 2015. In: Dagstuhl Reports, ISSN 2192-5283, Vol. 5, no. 3, pp. 143-161. Article in journal (Other academic).
    Abstract [en]

    In natural language processing (NLP) there is an increasing interest in formal models for processing graphs rather than more restricted structures such as strings or trees. Such models of graph transformation have previously been studied and applied in various other areas of computer science, including formal language theory, term rewriting, theory and implementation of programming languages, concurrent processes, and software engineering. However, few researchers from NLP are familiar with this work, and at the same time, few researchers from the theory of graph transformation are aware of the specific desiderata, possibilities and challenges that one faces when applying the theory of graph transformation to NLP problems. The Dagstuhl Seminar 15122 “Formal Models of Graph Transformation in Natural Language Processing” brought researchers from the two areas together. It initiated an interdisciplinary exchange about existing work, open problems, and interesting applications.

  • 26.
    Drewes, Frank
    et al.
    Umeå University, Umeå, Sweden.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    ATANLP 2012 Workshop on Applications of Tree Automata Techniques in Natural Language Processing: Proceedings of the Workshop. 2012. Conference proceedings (editor) (Other academic).
  • 27.
    Drewes, Frank
    et al.
    Umeå University, Umeå, Sweden.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Workshop on Applications of Tree Automata in Natural Language Processing 2010 (ATANLP 2010). 2010. Conference proceedings (editor) (Other academic).
  • 28.
    Edholm, Lars
    Linköpings universitet, Institutionen för datavetenskap.
    Automatisk kvalitetskontroll av terminologi i översättningar. 2007. Independent thesis, advanced level (one-year Master's degree), 20 credits / 30 HE credits. Student thesis.
    Abstract [sv]

    The quality of translations depends on the correct use of specialized terms, which can make a translation easier to understand while reducing the time and cost of translating (Lommel, 2007). Consistent use of terminology is important, and something that should be checked in quality control of, for example, translated documentation (Esselink, 2000). Several commercial programs today include functions for automatic terminology checking. This study aims to evaluate such functions, since no previous larger study of this has been found.

    To gain insight into how quality control is carried out in practice, two qualitative interviews were first conducted with people involved in this work at a translation agency. The results were compared with current theories in the field and showed strong agreement with what, for example, Bass (2006) advocates.

    The evaluations began with an examination of the coverage of a real term database compared with subjectively marked terms in a test corpus based on an authentic translation memory. The examination, however, showed relatively low coverage. To increase coverage, the term database was modified; among other things, it was extended with longer terms from the test corpus.

    The terminology-checking functions of four different programs were then run on the test corpus against the modified term database. Finally, the test corpus was also modified: a number of errors were planted in it to obtain a more idealized evaluation. The results, in the form of alarms for potential errors, were categorized and judged as true or false alarms. This formed the basis for measures of the precision of the checks and, in the last evaluation, also of their recall.

    The evaluations showed, among other things, that for terminology in English-Swedish translations it was most advantageous to match the terms of the term database as parts of words in the source and target segments of the translation. In this way, terms with different inflected forms can be captured without support for language-specific morphology. One cause of many problems in the matching was the form of the term database entries, which was better suited to human translators than to machine reading.

    Based on the interview material and the results of the evaluations, recommendations were formulated concerning the introduction of tools for automatic terminology checking. Because of uncertainty factors in the automatic check, a manual review of its results is motivated. By running the check on samples that have already been reviewed manually in other respects, a suitable amount of results to review manually can probably be obtained. The quality of the term database is decisive for its coverage of translations, and ultimately also for the usefulness of using it for automatic checking.

  • 29.
    Eklund, Robert
    Stockholm University, Department of Computational Linguistics, Institute of Linguistics.
    A Probabilistic Tagging Module Based on Surface Pattern Matching. 1993. Independent thesis, basic level (Bachelor's degree), 10 credits / 15 HE credits. Student thesis (degree project).
    Abstract [en]

    A problem with automatic tagging and lexical analysis is that it is never 100% accurate. In order to arrive at better figures, one needs to study the character of what is left untagged by automatic taggers. In this paper the untagged residue output by the automatic analyser SWETWOL (Karlsson 1992) at Helsinki is studied. SWETWOL assigns tags to words in Swedish texts mainly through dictionary lookup. The contents of the untagged residue files are described and discussed, and possible ways of solving different problems are proposed. One method of tagging residual output is proposed and implemented: the left-stripping method, through which untagged words are stripped of their left-most letters, searched for in a dictionary, and, if found, tagged according to the information found in said dictionary. If the stripped word is not found in the dictionary, a match is sought in ending lexica containing statistical information about word classes associated with that particular word form (i.e., final letter cluster, be this a grammatical suffix or not), and the relative frequency of each word class. If a match is found, the word is given graduated tagging according to the statistical information in the ending lexicon. If a match is not found, the word is stripped of what is now its left-most letter and is recursively searched for in a dictionary and ending lexica (in that order). The ending lexica employed in this paper are retrieved from a reversed version of Nusvensk Frekvensordbok (Allén 1970), and contain endings of between one and seven letters. The contents of the ending lexica are described and discussed to a certain degree. The programs working according to the principles described are run on files of untagged residual output. Appendices include, among other things, LISP source code, untagged and tagged files, the ending lexica containing one- and two-letter endings, and excerpts from ending lexica containing three to seven letters.
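
    The left-stripping method is described procedurally above; a compact sketch of that control flow follows. The toy dictionary and ending lexica are assumptions (the paper uses SWETWOL output and lexica derived from Nusvensk Frekvensordbok):

    def left_strip_tag(word, dictionary, ending_lexica, max_ending=7):
        """Strip the left-most letter, look the remainder up in the dictionary and
        then in the ending lexica; recurse if neither yields a match.
        dictionary: word form -> list of tags
        ending_lexica: final letter cluster -> {tag: relative frequency}"""
        stripped = word[1:]
        if not stripped:
            return None
        if stripped in dictionary:
            return [(tag, 1.0) for tag in dictionary[stripped]]
        for length in range(min(max_ending, len(stripped)), 0, -1):
            ending = stripped[-length:]
            if ending in ending_lexica:
                # Graduated tagging: all word classes, ordered by relative frequency.
                return sorted(ending_lexica[ending].items(), key=lambda kv: -kv[1])
        return left_strip_tag(stripped, dictionary, ending_lexica, max_ending)

    # Toy resources, for illustration only.
    dictionary = {"bil": ["NN"]}
    ending_lexica = {"arna": {"NN": 0.95, "VB": 0.05}, "ade": {"VB": 0.85, "JJ": 0.15}}
    print(left_strip_tag("rallybilarna", dictionary, ending_lexica))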

  • 30.
    Eklund, Robert
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Disfluency in Swedish human–human and human–machine travel booking dialogues. 2004. Doctoral thesis, monograph (Other academic).
    Abstract [en]

    This thesis studies disfluency in spontaneous Swedish speech, i.e., the occurrence of hesitation phenomena like eh, öh, truncated words, repetitions and repairs, mispronunciations and so on. The thesis is divided into three parts:

    PART I provides the background, both concerning scientific, personal and industrial–academic aspects in the Tuning in quotes, and the Preamble and Introduction (chapter 1).

    PART II consists of one chapter only, chapter 2, which dives into the etiology of disfluency. Consequently it describes previous research on disfluencies, also including areas that are not the main focus of the present tome, like stuttering, psychotherapy, philosophy, neurology, discourse perspectives, speech production, application-driven perspectives, cognitive aspects, and so on. A discussion on terminology and definitions is also provided. The goal of this chapter is to provide as broad a picture as possible of the phenomenon of disfluency, and how all those different and varying perspectives are related to each other.

    PART III describes the linguistic data studied and analyzed in this thesis, with the following structure: Chapter 3 describes how the speech data were collected, and for what reason. Sum totals of the data and the post-processing method are also described. Chapter 4 describes how the data were transcribed, annotated and analyzed. The labeling method is described in detail, as is the method employed to do frequency counts. Chapter 5 presents the analysis and results for all different categories of disfluencies. Besides general frequency and distribution of the different types of disfluencies, both inter- and intra-corpus results are presented, as are co-occurrences of different types of disfluencies. Also, inter- and intra-speaker differences are discussed. Chapter 6 discusses the results, mainly in light of previous research. Reasons for the observed frequencies and distribution are proposed, as are their relation to language typology, as well as syntactic, morphological and phonetic reasons for the observed phenomena. Future work is also envisaged, both work that is possible on the present data set, work that is possible on the present data set given extended labeling and work that I think should be carried out, but where the present data set fails, in one way or another, to meet the requirements of such studies.

    Appendices 1–4 list the sum total of all data analyzed in this thesis (apart from Tok Pisin data). Appendix 5 provides an example of a full human–computer dialogue.

  • 31.
    Eklund, Robert
    Stockholms universitet, Stockholm, Sverige.
    En introduktion till programmering i prolog. 1996. Other (Other academic).
    Abstract [sv]

    This compendium is intended as a basic introduction to the programming language PROLOG. Since the operative word here is "basic", it should be understood that the compendium makes no claim to satisfy all the desires of a professional hacker. Special consideration has instead been given to those who have no previous programming experience at all. This means that people who are already familiar with other programming languages may find the presentation partly, and in some sense, trivial (and perhaps bordering on incorrect, a danger in every attempt to simplify). It also means that much has been left out, and thus people who already know Prolog may exclaim "But why didn't you include this?!". I have tried to include what indispensably forms a kind of base for going further. What has been left out is of course not unimportant, just not required in order to play and have fun with Prolog as a first acquaintance. If everything is told on the first date, there are no secrets left to discover!

  • 32.
    Eklund, Robert
    Linköpings universitet, Institutionen för kultur och kommunikation, Avdelningen för språk och litteratur. Linköpings universitet, Filosofiska fakulteten.
    Neurala korrelat till fyllda pauser: En fMRI-studie av disfluensperception. 2019. In: Röstläget, ISSN 1103-3983, no. February 2019, pp. 13-17. Article in journal (Other academic).
    Abstract [sv]

    Human, spontaneously produced speech is characterized by not being entirely "fluent". I put the word in quotation marks because opinions differ as to whether "disfluency" in fact facilitates both speech production and speech perception. The most common term for this is disfluencies, but even this term is not fully established. Another thing to keep in mind is that the alternative spelling dysfluencies also occurs, especially in English-language literature.

    Disfluencies have been studied for over a century, and an introduction follows below. This article reports the results of a unique fMRI study of the most "special" of the various disfluency types, the one most often referred to as "filled pauses", such as (in Swedish) "eh" or "öh". Note that this term, too, is not fully established.

  • 33.
    Fagerlund, Martin
    et al.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Eldén, Lars
    Linköpings universitet, Matematiska institutionen, Beräkningsvetenskap. Linköpings universitet, Tekniska högskolan.
    Ahrenberg, Lars
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Computing Word Senses by Semantic Mirroring and Spectral Graph Partitioning. 2010. In: Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing / [ed] Carmen Banea, Alessandro Moschitti, Swapna Somasundaran and Fabio Massimo Zanzotto, Stroudsburg, PA, USA: The Association for Computational Linguistics, 2010, pp. 103-107. Conference paper (Refereed).
    Abstract [en]

    Using the technique of "semantic mirroring", a graph is obtained that represents words and their translations from a parallel corpus or a bilingual lexicon. The connectedness of the graph holds information about the different meanings of words that occur in the translations. Spectral graph theory is used to partition the graph, which leads to a grouping of the words according to different senses. We also report results from an evaluation using a small sample of seed words from a lexicon of Swedish and English adjectives.

  • 34.
    Fahlborg, Daniel
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Rennes, Evelina
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. RISE SICS East, Linköping, Sweden.
    Introducing SAPIS – an API Service for Text Analysis and Simplification. 2016. Conference paper (Refereed).
    Abstract [en]

    In several projects, we are developing tools and techniques for simplifying and analyzing textual data, aiming to enhance the accessibility of texts. We present SAPIS, an API service by which these techniques can be reached from a remote server. The API currently involves four running services, and is designed for easy implementation of new services. SAPIS aims to reach professional or daily users interested in the simplification and analysis of texts.

  • 35.
    Falkenjack, Johan
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Towards a Model of General Text Complexity for Swedish. 2018. Licentiate thesis, monograph (Other academic).
    Abstract [en]

    In an increasingly networked world, where the amount of written information is growing at a rate never before seen, the ability to read and absorb written information is of utmost importance for anything but a superficial understanding of life's complexities. That is an example of a sentence which is not very easy to read. It can be said to have a relatively high degree of text complexity. Nevertheless, the sentence is also true. It is important to be able to read and understand written materials. While not everyone might have a job where they have to read a lot, access to written material is necessary in order to participate in modern society. Most information, from news reporting, to medical information, to governmental information, come primarily in a written form.

    But what makes the sentence at the start of this abstract so complex? We can probably all agree that the length is part of it. But then what? Researchers in the field of readability and text complexity analysis have been studying this question for almost 100 years. That research has over time come to include many computational and data-driven methods within the field of computational linguistics.

    This thesis covers some of my contributions to this field of research, though with a main focus on Swedish rather than English text. It aims to explore two primary questions: (1) Which linguistic features are most important when assessing text complexity in Swedish? and (2) How can we deal with the problem of data sparsity with regards to complexity-annotated texts in Swedish?

    The first issue is tackled by exploring the task of identifying easy-to-read ("lättläst") text using classification with Support Vector Machines. A large set of linguistic features is evaluated with regards to predictive performance and is shown to separate easy-to-read texts from regular texts with a very high accuracy. Meanwhile, using a genetic algorithm for variable selection, we find that almost the same accuracy can be reached with only 8 features. This implies that this classification problem is not very hard and that results might not generalize to comparing less easy-to-read texts.

    This, in turn, brings us to the second question. Except for easy-to-read labeled texts, the data with text complexity annotations is very sparse. It consists of multiple small corpora using different scales to label documents. To deal with this problem, we propose a novel statistical model. The model belongs to the larger family of Probit models, is implemented in a Bayesian fashion and is estimated using a Gibbs sampler that extends a well-established Gibbs sampler for the Ordered Probit model. This model is evaluated using both simulated and real-world readability data with very promising results.
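
    For readers unfamiliar with the model family referred to here, a standard Ordered Probit formulation is sketched below (generic notation, not the thesis's own; the thesis extends the corresponding Gibbs sampler):

    z_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1)
    y_i = k \quad \text{iff} \quad \gamma_{k-1} < z_i \le \gamma_k, \qquad -\infty = \gamma_0 < \gamma_1 < \dots < \gamma_K = \infty

    Here x_i are the feature values of text i, z_i is the latent (readability) variable and y_i the observed ordinal label; a Gibbs sampler alternates between drawing each z_i from a truncated normal, the cut-points gamma, and the coefficients beta.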

  • 36.
    Falkenjack, Johan
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska högskolan. Santa Anna IT Research Institute AB, Linköping, Sweden.
    Heimann Mühlenbock, Katarina
    Språkbanken, University of Gothenburg, Gothenburg.
    Using the probability of readability to order Swedish texts. 2012. In: Proceedings of the Fourth Swedish Language Technology Conference, 2012, pp. 27-28. Conference paper (Refereed).
    Abstract [en]

    In this study we present a new approach to ranking readability in Swedish texts based on lexical, morpho-syntactic and syntactic analysis of text as well as machine learning. The basic premise and theory are presented, as well as a small experiment testing the feasibility, but not the actual performance, of the approach. The experiment shows that it is possible to implement a system based on the approach; however, the actual performance of such a system has not been evaluated, as the necessary resources for such an evaluation do not yet exist for Swedish. The experiment also shows that a classifier based on the aforementioned linguistic analysis, on our limited test set, outperforms classifiers based on established metrics used to assess readability such as LIX, OVIX and Nominal Ratio.

  • 37.
    Falkenjack, Johan
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska högskolan. SICS East Swedish ICT AB .
    Heimann Mühlenbock, Katarina
    Göteborgs Universitet.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska högskolan. SICS East Swedish ICT AB .
    Features indicating readability in Swedish text. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) / [ed] Stephan Oepen, Kristin Hagen, Janne Bondi Johannesse, Linköping, 2013, pp. 27-40. Conference paper (Refereed).
    Abstract [en]

    Studies have shown that modern methods of readability assessment, using automated linguistic analysis and machine learning (ML), are a viable road forward for readability classification and ranking. In this paper we present a study of different levels of analysis and a large number of features, and how they affect an ML system's accuracy when it comes to readability assessment. We test a large number of features proposed for different languages (mainly English) and evaluate their usefulness for readability assessment for Swedish, as well as comparing their performance to that of established metrics. We find that the best performing features are language models based on part-of-speech and dependency type.

  • 38.
    Falkenjack, Johan
    et al.
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Tekniska högskolan.
    Classifying easy-to-read texts without parsing. 2014. In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Association for Computational Linguistics, 2014, pp. 114-122. Conference paper (Refereed).
    Abstract [en]

    Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable road forward for readability assessment. The best models can be trained to decide if a text is easy to read or not with very high accuracy; e.g. a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that total model to find out to what extent different high-performing models tend to consist of the same parameters and whether it is possible to find models that only use features not requiring parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes using the accuracy of a Support Vector Machine classifier as fitness function. Our results show that it is possible to find models almost as good as the currently best models while omitting parsing-based features.
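
    A compact sketch of the search strategy described above: a genetic algorithm over fixed-size feature subsets with the cross-validated accuracy of a Support Vector Machine as fitness function (scikit-learn and random data are used purely for illustration; population size, mutation rate and similar settings are assumptions, not the paper's):

    import random
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def fitness(X, y, subset):
        """Mean cross-validated accuracy of an SVM on the chosen feature columns."""
        return cross_val_score(SVC(), X[:, subset], y, cv=3).mean()

    def genetic_feature_selection(X, y, subset_size=8, pop_size=10, generations=5):
        n_features = X.shape[1]
        population = [random.sample(range(n_features), subset_size) for _ in range(pop_size)]
        for _ in range(generations):
            scored = sorted(population, key=lambda s: fitness(X, y, s), reverse=True)
            parents = scored[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                pool = list(set(a) | set(b))          # crossover: mix two parents
                random.shuffle(pool)
                child = pool[:subset_size]
                if random.random() < 0.3:             # mutation: swap in an unused feature
                    unused = [f for f in range(n_features) if f not in child]
                    child[random.randrange(subset_size)] = random.choice(unused)
                children.append(child)
            population = parents + children
        return max(population, key=lambda s: fitness(X, y, s))

    # Random stand-in for a corpus with readability features and easy-to-read labels.
    X = np.random.rand(60, 20)
    y = np.random.randint(0, 2, 60)
    print(sorted(genetic_feature_selection(X, y, subset_size=5)))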

  • 39.
    Falkenjack, Johan
    et al.
    SICS East Swedish ICT AB.
    Jönsson, Arne
    SICS East Swedish ICT AB.
    Implicit readability ranking using the latent variable of a Bayesian Probit model. 2016. In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop, 2016, pp. 104-112. Conference paper (Refereed).
    Abstract [en]

    Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent-variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.

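    A minimal sketch of the ranking idea, with two simplifications: synthetic data replaces real readability features, and a plain maximum-likelihood probit (via statsmodels) stands in for the paper's Bayesian estimation. The point carried over is that the latent variable behind the binary easy-to-read label is continuous and can be used to rank documents.

    import numpy as np
    import statsmodels.api as sm

    # Toy feature matrix: rows are documents, columns are readability features;
    # y = 1 for easy-to-read documents, 0 for ordinary ones (synthetic data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=200) > 0).astype(int)

    # A plain maximum-likelihood probit; the paper instead estimates a Bayesian
    # probit, but the ranking idea is the same: the latent variable behind the
    # binary label is a continuous readability score.
    Xc = sm.add_constant(X)
    result = sm.Probit(y, Xc).fit(disp=False)

    # The linear predictor estimates the latent variable; sorting by it yields an
    # implicit readability ranking even though training used only binary labels.
    latent_score = Xc @ result.params
    ranking = np.argsort(-latent_score)
    print("documents ranked from most to least 'easy-to-read':", ranking[:10])
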
  • 40.
    Falkenjack, Johan
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Rennes, Evelina
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Fahlborg, Daniel
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Johansson, Vida
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Filosofiska fakulteten.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Filosofiska fakulteten.
    Services for text simplification and analysis2017Ingår i: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, Linköping University Electronic Press, 2017, Vol. 131, s. 309-313, artikel-id 044Konferensbidrag (Refereegranskat)
    Abstract [en]

    We present a language technology service for web editors’ work on making texts easier to understand, including tools for text complexity analysis, text simplification and text summarization. We also present a text analysis service focusing on measures of text complexity.

  • 41.
    Falkenjack, Johan
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    Santini, Marina
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. SICS East Swedish ICT AB, Linköping, Sweden.
    Jönsson, Arne
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Filosofiska fakulteten. SICS East Swedish ICT AB, Linköping, Sweden.
    An Exploratory Study on Genre Classification using Readability Features2016Konferensbidrag (Övrigt vetenskapligt)
    Abstract [en]

    We present a preliminary study that explores whether text features used for readability assessment are reliable genre-revealing features. We empirically explore the difference between genre and domain. We carry out two sets of experiments with both supervised and unsupervised methods. Findings on the Swedish national corpus (the SUC) show that readability cues are good indicators of genre variation.

  • 42.
    Fallgren, Per
    Linköpings universitet, Institutionen för datavetenskap. Linköpings universitet, Filosofiska fakulteten.
    Användning av Self Organizing Maps som en metod att skapa semantiska representationer ur text2015Självständigt arbete på grundnivå (kandidatexamen), 12 poäng / 18 hpStudentuppsats (Examensarbete)
    Abstract [sv]

    This study is a cognitive science degree project that aims to create a model which builds semantic representations using a more biologically plausible approach than traditional methods. The model can be seen as a first step in investigating this approach. The study examines whether Self-Organizing Maps can be used to create semantic representations from large amounts of text, using an approach inspired by distributed representations. The results point to a potentially working system, but one that needs to be examined further in future studies for a higher degree of verification.

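    A minimal sketch of the idea, assuming the third-party MiniSom library and toy co-occurrence vectors in place of large text corpora (neither of which is taken from the thesis): words with similar distributional profiles should end up on nearby cells of the map.

    import numpy as np
    from minisom import MiniSom  # third-party SOM library; an assumption, not the thesis's own code

    # Tiny toy corpus; real experiments would use large amounts of text.
    corpus = [
        "the cat chased the mouse",
        "the dog chased the cat",
        "the stock market fell today",
        "the market rose after the report",
    ]
    tokens = [sentence.split() for sentence in corpus]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Distributional word vectors from symmetric co-occurrence counts (window = 1).
    cooc = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, w in enumerate(sent):
            for j in (i - 1, i + 1):
                if 0 <= j < len(sent):
                    cooc[index[w], index[sent[j]]] += 1

    # Train a small SOM; words with similar co-occurrence profiles should map
    # to nearby cells, giving a topographic semantic representation.
    som = MiniSom(4, 4, len(vocab), sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(cooc, 500)

    for word in vocab:
        print(word, "->", som.winner(cooc[index[word]]))
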
  • 43.
    Fallgren, Per
    Linköpings universitet, Institutionen för datavetenskap.
    Thoughts don't have Colour, do they?: Finding Semantic Categories of Nouns and Adjectives in Text Through Automatic Language Processing2017Självständigt arbete på avancerad nivå (masterexamen), 20 poäng / 30 hpStudentuppsats (Examensarbete)
    Abstract [en]

    Not all combinations of nouns and adjectives are possible and some are clearly more frequent than others. With this in mind, this study aims to construct semantic representations of the two parts of speech, based on how they occur with each other. By investigating these ideas via automatic natural language processing paradigms, the study aims to find evidence for a semantic mutuality between nouns and adjectives; this notion suggests that the semantics of a noun can be captured by its corresponding adjectives, and vice versa. Furthermore, a set of proposed categories of adjectives and nouns, based on the ideas of Gärdenfors (2014), is presented that is hypothesised to fall in line with the produced representations. Four evaluation methods were used to analyze the results, ranging from subjective discussion of nearest neighbours in vector space to accuracy computed from manual annotation. The results provided some evidence for the hypothesis, which suggests that further research is of value.

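    The noun-adjective mutuality idea can be sketched as follows; the (adjective, noun) pairs are hypothetical stand-ins for pairs extracted from parsed text. Each noun is represented by the adjectives that modify it and compared to other nouns by cosine similarity.

    import numpy as np

    # Toy (adjective, noun) modification pairs; a real study would extract
    # these from parsed text.
    pairs = [
        ("red", "car"), ("fast", "car"), ("red", "apple"), ("sweet", "apple"),
        ("sweet", "cake"), ("fast", "train"), ("abstract", "thought"),
        ("abstract", "idea"), ("clever", "idea"), ("clever", "thought"),
    ]
    adjectives = sorted({a for a, _ in pairs})
    nouns = sorted({n for _, n in pairs})
    a_idx = {a: i for i, a in enumerate(adjectives)}
    n_idx = {n: i for i, n in enumerate(nouns)}

    # Each noun is represented by the distribution of adjectives that modify it,
    # mirroring the idea that a noun's semantics is captured by its adjectives.
    M = np.zeros((len(nouns), len(adjectives)))
    for a, n in pairs:
        M[n_idx[n], a_idx[a]] += 1

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Nouns sharing adjectives ("idea"/"thought", "car"/"train") come out similar.
    for n1 in nouns:
        best = max((n2 for n2 in nouns if n2 != n1),
                   key=lambda n2: cosine(M[n_idx[n1]], M[n_idx[n2]]))
        print(n1, "is most similar to", best)
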
  • 44.
    Fallgren, Per
    et al.
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Segeblad, Jesper
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Kuhlmann, Marco
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Towards a Standard Dataset of Swedish Word Vectors2016Ingår i: Proceedings of the Sixth Swedish Language Technology Conference (SLTC), 2016Konferensbidrag (Refereegranskat)
    Abstract [en]

    Word vectors, embeddings of words into a low-dimensional space, have been shown to be useful for a large number of natural language processing tasks. Our goal with this paper is to provide a useful dataset of such vectors for Swedish. To this end, we investigate three standard embedding methods: the continuous bag-of-words and the skip-gram model with negative sampling of Mikolov et al. (2013a), and the global vectors of Pennington et al. (2014). We compare these methods using QVEC-CCA (Tsvetkov et al., 2016), an intrinsic evaluation measure that quantifies the correlation of learned word vectors with external linguistic resources. For this purpose we use SALDO, the Swedish Association Lexicon (Borin et al., 2013). Our experiments show that the continuous bag-of-words model produces vectors that are most highly correlated to SALDO, with the skip-gram model very close behind. Our learned vectors will be provided for download at the paper’s website.

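    As a sketch of the two Mikolov-style embedding methods compared in the paper (not the authors' training setup), the gensim library, assumed here with its version-4 parameter names, can train a CBOW and a skip-gram model with negative sampling on a toy corpus; the paper's actual evaluation uses QVEC-CCA against SALDO rather than the nearest-neighbour inspection shown at the end.

    from gensim.models import Word2Vec  # gensim 4.x parameter names assumed

    # Toy tokenised corpus; the paper trains on much larger Swedish corpora.
    sentences = [
        ["kungen", "bor", "i", "stockholm"],
        ["drottningen", "bor", "i", "stockholm"],
        ["hunden", "jagar", "katten"],
        ["katten", "jagar", "musen"],
    ] * 50

    # Continuous bag-of-words (sg=0) and skip-gram (sg=1), both with
    # negative sampling, as in Mikolov et al. (2013a).
    cbow = Word2Vec(sentences, vector_size=50, window=2, sg=0, negative=5,
                    min_count=1, epochs=50, seed=0)
    skipgram = Word2Vec(sentences, vector_size=50, window=2, sg=1, negative=5,
                        min_count=1, epochs=50, seed=0)

    # Intrinsic evaluation would use QVEC-CCA against SALDO; here we just
    # inspect nearest neighbours in the learned spaces.
    print(cbow.wv.most_similar("kungen", topn=3))
    print(skipgram.wv.most_similar("kungen", topn=3))
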
  • 45.
    Ferrara Boston, Marisa
    et al.
    Department of Linguistics, Cornell University, Ithaca, NY, USA.
    Hale, John
    Department of Linguistics, Cornell University, Ithaca, NY, USA.
    Kuhlmann, Marco
    Uppsala universitet, Institutionen för lingvistik och filologi.
    Dependency Structures Derived from Minimalist Grammars2010Ingår i: The Mathematics of Language: 10th and 11th Biennial Conference, MOL 10, Los Angeles, CA, USA, July 28–30, 2007, and MOL 11, Bielefeld, Germany, August 20–21, 2009, Revised Selected Papers, Springer Berlin/Heidelberg, 2010, s. 1-12Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper provides an interpretation of Minimalist Grammars (Stabler, 1997; Stabler & Keenan, 2003) in terms of dependency structures. Under this interpretation, merge operations derive projective dependency structures, and movement operations create both non-projective and illnested structures. This provides a new characterization of the generative capacity of Minimalist Grammar, and makes it possible to discuss the linguistic relevance of non-projectivity and illnestedness based on grammars that derive structures with these properties.

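    The projectivity property referred to above can be made concrete with a short check (an illustration, not code from the paper): an arc (h, d) is projective if every token between h and d is a descendant of h, and a tree is projective if all of its arcs are.

    def is_projective(heads):
        """Check whether a dependency tree is projective.

        heads[i] is the head of token i+1 (tokens are numbered from 1, and 0 is
        the artificial root), e.g. heads = [2, 0, 2] means tokens 1 and 3 depend
        on token 2, which attaches to the root.
        """
        def descends_from(tok, ancestor):
            while True:
                if tok == ancestor:
                    return True
                if tok == 0:
                    return False
                tok = heads[tok - 1]

        for d in range(1, len(heads) + 1):
            h = heads[d - 1]
            for k in range(min(h, d) + 1, max(h, d)):
                if not descends_from(k, h):
                    return False
        return True

    # Merge-style derivations yield projective structures ...
    print(is_projective([2, 0, 2]))      # True
    # ... while movement can create crossing (non-projective) arcs.
    print(is_projective([3, 4, 0, 3]))   # False
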
  • 46.
    Foo, Jody
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Computational Terminology: Exploring Bilingual and Monolingual Term Extraction2012Licentiatavhandling, sammanläggning (Övrigt vetenskapligt)
    Abstract [en]

    Terminologies are becoming more important to modern day society as technology and science continue to grow at an accelerating rate in a globalized environment. Agreeing upon which terms should be used to represent which concepts and how those terms should be translated into different languages is important if we wish to be able to communicate with as little confusion and misunderstandings as possible.

    Since the 1990s, an increasing amount of terminology research has been devoted to facilitating and augmenting terminology-related tasks by using computers and computational methods. One focus for this research is Automatic Term Extraction (ATE).

    In this compilation thesis, studies on both bilingual and monolingual ATE are presented. First, two publications report on how bilingual ATE using the align-extract approach can be used to extract patent terms. The result in this case was 181,000 manually validated English-Swedish patent terms, which were to be used in a machine translation system for patent documents. A critical component of the method used is the Q-value metric, presented in the third paper, which can be used to rank extracted term candidates (TC) in an order that correlates with TC precision. The use of Machine Learning (ML) in monolingual ATE is the topic of the two final contributions. The first ML-related publication shows that rule-induction-based ML can be used to generate linguistic term selection patterns, and in the second ML-related publication, contrastive n-gram language models are used in conjunction with SVM ML to improve the precision of term candidates selected using linguistic patterns.

    Delarbeten
    1. Computer aided term bank creation and standardization: Building standardized term banks through automated term extraction and advanced editing tools
    2010 (Engelska)Ingår i: Terminology in Everyday Life / [ed] Marcel Thelen and Frieda Steurs, John Benjamins Publishing Company , 2010, s. 163-180Kapitel i bok, del av antologi (Övrigt vetenskapligt)
    Abstract [en]

    Using a standardized term bank in both authoring and translation processes can facilitate the use of consistent terminology, which in turn minimizes confusion and frustration for readers. One of the problems of creating a standardized term bank is the time and effort required. Recent developments in term extraction techniques based on word alignment can improve the extraction of term candidates when parallel texts are available. The aligned units are processed automatically, but a large quantity of term candidates will still have to be processed by a terminologist to select which candidates should be promoted to standardized terms. To minimize the work needed to process the extracted term candidates, we propose a method based on efficient editing tools as well as on ranking the extracted set of term candidates by quality. This sorted set of term candidates can then be edited, categorized and filtered in a more effective way. In this paper, the process and methods used to arrive at a standardized term bank are presented and discussed.

     

    Ort, förlag, år, upplaga, sidor
    John Benjamins Publishing Company, 2010
    Serie
    Terminology and Lexicography Research and Practice, ISSN 1388-8455 ; 13
    Nyckelord
    terminology, extraction, term bank, automation
    Nationell ämneskategori
    Språkteknologi (språkvetenskaplig databehandling) Datavetenskap (datalogi)
    Identifikatorer
    urn:nbn:se:liu:diva-59842 (URN)978 90 272 2337 1 (ISBN)
    Tillgänglig från: 2010-09-27 Skapad: 2010-09-27 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
    2. Automatic Extraction and Manual Validation of Hierarchical Patent Terminology
    2009 (Engelska)Ingår i: NORDTERM 16. Ontologier og taksonomier.: Rapport fra NORDTERM 2009 / [ed] B. Nistrup Madsen & H. Erdman Thomsen, Copenhagen, Denmark: Copenhagen Business School Press, 2009, s. 249-262Konferensbidrag, Publicerat paper (Refereegranskat)
    Abstract [en]

    Several methods can be applied to create a set of validated terms from existing documents. In this paper we describe an automatic bilingual term candidate extraction method and the validation process used to create a hierarchical patent terminology. The process described was used to extract terms from patent texts, commissioned by the Swedish Patent Office with the purpose of using the terms for machine translation. Information on the correct linguistic inflection patterns and on the hierarchical partitioning of terms based on their use is of utmost importance. The process contains six phases: 1) Analysis of the source material and system configuration; 2) Term candidate extraction; 3) Term candidate filtering and initial linguistic validation; 4) Manual validation by domain experts; 5) Final linguistic validation; and 6) Publishing the validated terms. Input to the extraction process consisted of more than 91,000 patent document pairs in English and Swedish, 565 million words in English and 450 million words in Swedish. The English documents were supplied in EBD SGML format and the Swedish documents were supplied as OCR-processed scans of patent documents. After grammatical and statistical analysis, the documents were word-aligned. Using the word-aligned material, candidate terms were extracted based on linguistic patterns. 750,000 term candidates were extracted and stored in a relational database. The term candidates were processed in 8 months, resulting in 181,000 unique validated term pairs that were exported into several hierarchically organized OLIF files.

    Ort, förlag, år, upplaga, sidor
    Copenhagen, Denmark: Copenhagen Business School Press, 2009
    Nyckelord
    automatic term extraction, computational terminology, patent terminology
    Nationell ämneskategori
    Språkteknologi (språkvetenskaplig databehandling)
    Identifikatorer
    urn:nbn:se:liu:diva-75236 (URN)978-87-994577-0-0 (ISBN)
    Konferens
    NORDTERM 2009, København, Danmark 9‐12. juni 2009
    Tillgänglig från: 2012-02-23 Skapad: 2012-02-22 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
    3. Terminology extraction and term ranking for standardizing term banks
    2007 (Engelska)Ingår i: Proceedings of 16th Nordic Conference of Computational Linguistics Nodalida,2007 / [ed] Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit, Tartu, Estonia: University of Tartu , 2007, s. 349-354Konferensbidrag, Publicerat paper (Refereegranskat)
    Abstract [en]

    This paper presents how word alignment techniques can be used for building standardized term banks. It is shown that time and effort can be saved by a relatively simple evaluation metric based on frequency data from term pairs and on source and target distributions inside the alignment results. The proposed Q-value metric is shown to outperform other tested metrics such as Dice's coefficient and simple pair frequency.

     

    Ort, förlag, år, upplaga, sidor
    Tartu, Estonia: University of Tartu, 2007
    Nyckelord
    terminology extraction, metric, word alignment
    Nationell ämneskategori
    Datavetenskap (datalogi)
    Identifikatorer
    urn:nbn:se:liu:diva-41011 (URN)54924 (Lokalt ID)978-9985-4-0513-0 (ISBN)54924 (Arkivnummer)54924 (OAI)
    Konferens
    NODALIDA 2007, 16th Nordic Conference of Computational Linguistics, 24-26 May 2007, University of Tartu, Estonia
    Tillgänglig från: 2010-09-29 Skapad: 2009-10-10 Senast uppdaterad: 2018-01-13Bibliografiskt granskad
    4. Using machine learning to perform automatic term recognition
    2010 (Engelska)Ingår i: Proceedings of the LREC 2010 Workshop on Methods for automatic acquisition of Language Resources and their evaluation methods / [ed] Núria Bel, Béatrice Daille, Andrejs Vasiljevs, European Language Resources Association, 2010, s. 49-54Konferensbidrag, Publicerat paper (Refereegranskat)
    Abstract [en]

    In this paper a machine learning approach is applied to Automatic Term Recognition (ATR). Similar approaches have been successfully used in Automatic Keyword Extraction (AKE). Using a dataset consisting of Swedish patent texts and validated terms belonging to these texts, unigrams and bigrams are extracted and annotated with linguistic and statistical feature values. Experiments using a varying ratio between positive and negative examples in the training data are conducted on the annotated n-grams. The results indicate that a machine learning approach is viable for ATR. Furthermore, a machine learning approach for bilingual ATR is discussed. Preliminary analysis, however, indicates that some modifications have to be made to apply the monolingual machine learning approach to a bilingual context.

    Ort, förlag, år, upplaga, sidor
    European Language Resources Association, 2010
    Nationell ämneskategori
    Språkteknologi (språkvetenskaplig databehandling)
    Identifikatorer
    urn:nbn:se:liu:diva-75237 (URN)000356879501100 ()978-2-9517408-6-0 (ISBN)
    Konferens
    LREC 2010 Workshop on Methods for automatic acquisition of Language Resources and their evaluation methods, 23 May 2010, Valletta, Malta
    Tillgänglig från: 2012-03-01 Skapad: 2012-02-22 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
    5. Exploring termhood using language models
    2011 (Engelska)Ingår i: Proceedings of the Workshop CHAT 2011: Creation, Harmonization and Application of Terminology Resources / [ed] Tatiana Gornostay, Andrejs Vasiljevs, Tartu University Library (Estonia): Northern European Association for Language Technology (NEALT) , 2011, s. 32-35Konferensbidrag, Publicerat paper (Refereegranskat)
    Abstract [en]

    Term extraction metrics are mostly based on frequency counts. This can be a problem when trying to extract previously unseen multi-word terms. This paper explores whether smoothed language models can be used instead. Although a simplistic use of language models is examined in this paper, the results indicate that with more refinement, smoothed language models may be used instead of unsmoothed frequency-count based termhood metrics.

    Ort, förlag, år, upplaga, sidor
    Tartu University Library (Estonia): Northern European Association for Language Technology (NEALT), 2011
    Serie
    NEALT Proceedings Series, ISSN 1736-8197, E-ISSN 1736-6305 ; Vol. 12
    Nyckelord
    automatic term extraction, computational terminology, machine learning
    Nationell ämneskategori
    Språkteknologi (språkvetenskaplig databehandling)
    Identifikatorer
    urn:nbn:se:liu:diva-75238 (URN)
    Konferens
    NODALIDA 2011 Workshop Creation, Harmonization and Application of Terminology Resources, May 11, 2011, Riga, Latvia
    Tillgänglig från: 2012-02-23 Skapad: 2012-02-22 Senast uppdaterad: 2018-01-12Bibliografiskt granskad
  • 47.
    Foo, Jody
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Exploring termhood using language models2011Ingår i: Proceedings of the Workshop CHAT 2011: Creation, Harmonization and Application of Terminology Resources / [ed] Tatiana Gornostay, Andrejs Vasiljevs, Tartu University Library (Estonia): Northern European Association for Language Technology (NEALT) , 2011, s. 32-35Konferensbidrag (Refereegranskat)
    Abstract [en]

    Term extraction metrics are mostly based on frequency counts. This can be a problem when trying to extract previously unseen multi-word terms. This paper explores whether smoothed language models can be used instead. Although a simplistic use of language models is examined in this paper, the results indicate that with more refinement, smoothed language models may be used instead of unsmoothed frequency-count based termhood metrics.

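    A minimal sketch of the idea, assuming NLTK's language-model module and a toy domain corpus (neither taken from the paper): a Laplace-smoothed bigram model assigns non-zero probability to unseen word pairs, so a candidate's average bigram log-probability can serve as a termhood-style score even for multi-word candidates never observed as contiguous phrases.

    from nltk.lm import Laplace
    from nltk.lm.preprocessing import padded_everygram_pipeline

    # Toy domain corpus (tokenised sentences); the paper works with patent text.
    domain = [
        ["the", "hydraulic", "brake", "assembly", "engages", "the", "disc"],
        ["the", "brake", "assembly", "comprises", "a", "hydraulic", "cylinder"],
    ]

    # A Laplace-smoothed bigram model assigns non-zero probability even to
    # unseen word pairs, which is the point of using smoothed LMs for termhood.
    train, vocab = padded_everygram_pipeline(2, domain)
    lm = Laplace(2)
    lm.fit(train, vocab)

    def candidate_logscore(candidate):
        """Average bigram log-probability of a multi-word term candidate."""
        tokens = candidate.split()
        scores = [lm.logscore(w, [tokens[i - 1]]) for i, w in enumerate(tokens) if i > 0]
        return sum(scores) / max(len(scores), 1)

    # A seen candidate vs. one never observed as a contiguous phrase.
    print(candidate_logscore("brake assembly"))
    print(candidate_logscore("hydraulic disc"))
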
  • 48.
    Foo, Jody
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Computer aided term bank creation and standardization: Building standardized term banks through automated term extraction and advanced editing tools2010Ingår i: Terminology in Everyday Life / [ed] Marcel Thelen and Frieda Steurs, John Benjamins Publishing Company , 2010, s. 163-180Kapitel i bok, del av antologi (Övrigt vetenskapligt)
    Abstract [en]

    Using a standardized term bank in both authoring and translation processes can facilitate the use of consistent terminology, which in turn minimizes confusion and frustration for readers. One of the problems of creating a standardized term bank is the time and effort required. Recent developments in term extraction techniques based on word alignment can improve the extraction of term candidates when parallel texts are available. The aligned units are processed automatically, but a large quantity of term candidates will still have to be processed by a terminologist to select which candidates should be promoted to standardized terms. To minimize the work needed to process the extracted term candidates, we propose a method based on efficient editing tools as well as on ranking the extracted set of term candidates by quality. This sorted set of term candidates can then be edited, categorized and filtered in a more effective way. In this paper, the process and methods used to arrive at a standardized term bank are presented and discussed.

     

  • 49.
    Foo, Jody
    et al.
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Merkel, Magnus
    Linköpings universitet, Institutionen för datavetenskap, NLPLAB - Laboratoriet för databehandling av naturligt språk. Linköpings universitet, Tekniska högskolan.
    Using machine learning to perform automatic term recognition2010Ingår i: Proceedings of the LREC 2010 Workshop on Methods for automatic acquisition of Language Resources and their evaluation methods / [ed] Núria Bel, Béatrice Daille, Andrejs Vasiljevs, European Language Resources Association, 2010, s. 49-54Konferensbidrag (Refereegranskat)
    Abstract [en]

    In this paper a machine learning approach is applied to Automatic Term Recognition (ATR). Similar approaches have been successfully used in Automatic Keyword Extraction (AKE). Using a dataset consisting of Swedish patent texts and validated terms belonging to these texts, unigrams and bigrams are extracted and annotated with linguistic and statistical feature values. Experiments using a varying ratio between positive and negative examples in the training data are conducted on the annotated n-grams. The results indicate that a machine learning approach is viable for ATR. Furthermore, a machine learning approach for bilingual ATR is discussed. Preliminary analysis, however, indicates that some modifications have to be made to apply the monolingual machine learning approach to a bilingual context.

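    A rough sketch of the experimental setup, using synthetic stand-in data rather than the paper's patent corpus and feature set: n-gram candidates are described by feature vectors, and a classifier is trained with a varying number of negative examples per positive example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    # Synthetic stand-in: each row is an n-gram candidate described by a few
    # statistical/linguistic feature values; y = 1 if it is a validated term.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 6))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.5).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    def train_with_ratio(neg_per_pos):
        """Train on all positive examples and `neg_per_pos` negatives per positive,
        mimicking the varying positive/negative ratios explored in the paper."""
        pos = np.flatnonzero(y_train == 1)
        neg = np.flatnonzero(y_train == 0)
        neg = rng.choice(neg, size=min(len(neg), neg_per_pos * len(pos)), replace=False)
        idx = np.concatenate([pos, neg])
        clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        return precision_score(y_test, clf.predict(X_test))

    for ratio in (1, 3, 10):
        print(f"{ratio} negatives per positive -> test precision {train_with_ratio(ratio):.2f}")
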
  • 50.
    Gavin, Jacob
    et al.
    Linköpings universitet.
    Hammarbäck, Jimmy
    Linköpings universitet.
    Hammarbäck, Madeleine
    Linköpings universitet.
    Helmersson, Benjamin
    Linköpings universitet.
    Nyberg, Martina
    Linköpings universitet.
    Svensson, Cassandra
    Linköpings universitet.
    Foo, Jody
    Linköpings universitet, Institutionen för datavetenskap, Interaktiva och kognitiva system. Linköpings universitet, Tekniska fakulteten.
    Koch, Felix-Sebastian
    Linköpings universitet, Institutionen för beteendevetenskap och lärande, Psykologi. Linköpings universitet, Filosofiska fakulteten.
    An eye-tracking study on the importance of consistent terminology2014Ingår i: Proceedings of the Fifth Swedish Language Technology Conference (SLTC-14), 2014Konferensbidrag (Refereegranskat)
    Abstract [en]

    Using inconsistent terminology, e.g. different terms in the documentation and in user interface labels and menu items, is believed to be confusing to users. However, few empirical studies exist on this particular topic. In this paper we show how users' interaction with an interface is affected by inconsistent terminology. An experimental eye-tracking study with 30 participants was conducted where the participants were shown a user interface and a task description. The terminology in the interface and the task description was manipulated to be either consistent or inconsistent. The results show that terminological inconsistencies led to a significantly higher number of visual fixations, more time needed to perform the task, and more returns to the task description. The conclusion is that inconsistent use of terms creates unnecessary cognitive workload for the user that can be avoided by ensuring terminological consistency within a system.
