Publications (10 of 85)
Santini, M., Jönsson, A., Strandqvist, W., Cederblad, G., Nyström, M., Alirezaie, M., . . . Kristoffersson, A. (2019). Designing an Extensible Domain-Specific Web Corpus for “Layfication”: A Case Study in eCare at Home. In: Maya Dimitrova and Hiroaki Wagatsuma (Ed.), Cyber-Physical Systems for Social Applications (pp. 98-155). Hershey, PA, USA: IGI Global
2019 (English) In: Cyber-Physical Systems for Social Applications / [ed] Maya Dimitrova and Hiroaki Wagatsuma, Hershey, PA, USA: IGI Global, 2019, p. 98-155 Chapter in book (Refereed)
Abstract [en]

In the era of data-driven science, corpus-based language technology is an essential part of cyber physical systems. In this chapter, the authors describe the design and the development of an extensible domain-specific web corpus to be used in a distributed social application for the care of the elderly at home. The domain of interest is the medical field of chronic diseases. The corpus is conceived as a flexible and extensible textual resource, where additional documents and additional languages will be appended over time. The main purpose of the corpus is to be used for building and training language technology applications for the “layfication” of the specialized medical jargon. “Layfication” refers to the automatic identification of more intuitive linguistic expressions that can help laypeople (e.g., patients, family caregivers, and home care aides) understand medical terms, which often appear opaque. Exploratory experiments are presented and discussed.

Place, publisher, year, edition, pages
Hershey, PA, USA: IGI Global, 2019
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:liu:diva-156964 (URN); 10.4018/978-1-5225-7879-6.ch006 (DOI); 9781522593454 (ISBN); 9781522578802 (ISBN)
Projects
E-care@home
Funder
Knowledge Foundation, 20140217
Available from: 2019-05-17 Created: 2019-05-17 Last updated: 2019-05-17 Bibliographically approved
Santini, M., Strandqvist, W. & Jönsson, A. (2019). Profiling specialized web corpus qualities: A progress report on "Domainhood". Argentinian Journal of Applied Linguistics, 7(1), 8-26
2019 (English) In: Argentinian Journal of Applied Linguistics, ISSN 0136-006X, E-ISSN 1478-6362, Vol. 7, no 1, p. 8-26 Article in journal (Refereed) Published
Abstract [en]

In this article we describe ways to profile the domain specificity, a.k.a. domainhood, of specialized web corpora in English and in Swedish. Several studies have been carried out to measure the "qualities" of general-purpose web corpora. By contrast, less attention has been paid to the evaluation of specialized or domain-specific web corpora. To fill this gap, in this article we present case studies where we explore the effectiveness of several statistical measures, namely rank correlation coefficients (Kendall and Spearman), Kullback–Leibler divergence, log-likelihood and burstiness, to assess domainhood. Our findings indicate that it is possible to profile the domainhood quality of a corpus. However, further research is needed to generalize the results.

Abstract [es]

En este artículo describimos formas de trazar la especificidad del dominio ("domainhood") de los corpus de webs especializados en inglés y en sueco. Muchos estudios se han llevado a cabo para medir las "cualidades" de los corpus de webs de carácter general. Sin embargo, se ha prestado menos atención a la evaluación de corpus de web especializados o de dominios específicos. Para llenar este vacío, en este artículo presentamos estudios de caso donde exploramos la efectividad de diferentes medidas estadísticas, a saber, coeficientes de correlación de rango (Kendall and Spearman), divergencia Kullback–Leibler, probabilidad de registro y burstiness – para evaluar la especificidad del dominio. Nuestros resultados indican que es posible perfilar la calidad de dominio de un corpus. Sin embargo, es necesaria una mayor investigación para generalizar en los resultados.

Place, publisher, year, edition, pages
KMK Scientific Press Ltd., 2019
Keywords
corpus evaluation; term extraction; log-likelihood; rank correlation; Kullback-Leibler divergence; evaluación de corpus; extracción de términos; probabilidad de registro; correlación de rango; divergencia Kullback-Leibler
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-158299 (URN)
Available from: 2019-06-28 Created: 2019-06-28 Last updated: 2019-08-08 Bibliographically approved
Jönsson, S., Rennes, E., Falkenjack, J. & Jönsson, A. (2018). A Component based Approach to Measuring Text Complexity. Paper presented at The Seventh Swedish Language Technology Conference (SLTC-18), Stockholm, Sweden, 7-9 November 2018.
2018 (English) Conference paper, Published paper (Other academic)
Abstract [en]

We present results from assessing text complexity based on a factorisation of text property measures into components. The components are evaluated by investigating their ability to classify texts belonging to different genres. Our results show that the text complexity components correctly classify texts belonging to specific genres, given that the genres adhere to certain textual conventions. We also propose a radar chart visualisation to communicate component based text complexity.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-154146 (URN)
Conference
The Seventh Swedish Language Technology Conference (SLTC-18), Stockholm, Sweden, 7-9 November 2018
Available from: 2019-01-29 Created: 2019-01-29 Last updated: 2019-04-08 Bibliographically approved
Santini, M., Strandqvist, W., Nyström, M., Alirezaie, M. & Jönsson, A. (2018). Can We Quantify Domainhood?: Exploring Measures to Assess Domain-Specificity in Web Corpora. In: Elloumi M. et al. (Ed.), Communications in Computer and Information Science, vol 903. Springer, Cham. Paper presented at Database and Expert Systems Applications. DEXA 2018.
2018 (English) In: Communications in Computer and Information Science, vol 903. Springer, Cham / [ed] Elloumi M. et al., 2018, Vol. 903 Conference paper, Published paper (Refereed)
Abstract [en]

Web corpora are a cornerstone of modern Language Technology. Corpora built from the web are convenient because their creation is fast and inexpensive. Several studies have been carried out to assess the representativeness of general-purpose web corpora by comparing them to traditional corpora. Less attention has been paid to assessing the representativeness of specialized or domain-specific web corpora. In this paper, we focus on the assessment of domain representativeness of web corpora and we claim that it is possible to assess the degree of domain-specificity, or domainhood, of web corpora. We present a case study where we explore the effectiveness of different measures, namely the Mann-Whitney-Wilcoxon Test, Kendall correlation coefficient, Kullback–Leibler divergence, log-likelihood and burstiness, to gauge domainhood. Our findings indicate that burstiness is the most suitable measure to single out domain-specific words from a specialized corpus and to allow for the quantification of domainhood.

Series
Communications in Computer and Information Science, ISSN 1865-0929, E-ISSN 1865-0937 ; 903
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-151423 (URN); 10.1007/978-3-319-99133-7_17 (DOI); 000460552400017 (); 978-3-319-99132-0 (ISBN); 978-3-319-99133-7 (ISBN)
Conference
Database and Expert Systems Applications. DEXA 2018.
Note

Funding agencies: E-care@home, a "SIDUS - Strong Distributed Research Environment" project - Swedish Knowledge Foundation

Available from: 2018-09-20 Created: 2018-09-20 Last updated: 2019-03-20
Santini, M., Strandqvist, W. & Jönsson, A. (2018). Profiling Domain Specificity of Specialized Web Corpora using Burstiness. Explorations and Open Issues. Paper presented at Proceedings of The Seventh Swedish Language Technology Conference 2018 (SLTC-18), Stockholm, Sweden, 7-9 November 2018.
2018 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

In this paper we describe an approach to profile the domain specificity of specialized web corpora in Swedish. The proposed approach is based on burstiness. Burstiness is a statistical measure that identifies words with uneven distribution across the documents of a corpus. We apply burstiness to two medical web corpora that have different size and different domain granularity. Results are promising and show that burstiness is an appropriate measure to profile the domain specificity when matched against reference lists (gold standards) that represent the target domains. However, further research is needed to find adequate evaluation metrics, less empirical cut-off points and more principled gold standard design.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-154147 (URN)
Conference
Proceedings of The Seventh Swedish Language Technology Conference 2018 (SLTC-18), Stockholm, Sweden, 7-9 November 2018
Available from: 2019-01-29 Created: 2019-01-29 Last updated: 2019-08-06 Bibliographically approved
Santini, M., Jönsson, A., Nyström, M. & Alirezaie, M. (2017). A Web Corpus for eCare: Collection, Lay Annotation and Learning - First Results. Paper presented at 2nd International Workshop on Language Technologies and Applications (LTA'17), Prague, Czech Republic, 3-6 September, 2017.
2017 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this position paper, we put forward two claims: 1) it is possible to design a dynamic and extensible corpus without running into scalability problems; 2) it is possible to devise noise-resistant Language Technology applications without affecting performance. To support our claims, we describe the design, construction and limitations of a very specialized medical web corpus, called eCare_Sv_01, and we present two experiments on lay-specialized text classification. eCare_Sv_01 is a small corpus of web documents written in Swedish. The corpus contains documents about chronic diseases. The sublanguage used in each document has been labelled as “lay” or “specialized” by a lay annotator. The corpus is designed as a flexible text resource, where additional medical documents will be appended over time. Experiments show that the lay-specialized labels assigned by the lay annotator are reliably learned by standard classifiers. More specifically, Experiment 1 shows that scalability is not an issue when increasing the size of the datasets to be learned from 156 up to 801 documents. Experiment 2 shows that lay-specialized labels can be learned despite the large number of disturbing factors, such as machine-translated documents or low-quality texts, that are numerous in the corpus.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-141054 (URN)
Conference
2nd International Workshop on Language Technologies and Applications (LTA'17), Prague, Czech Republic, 3-6 September, 2017
Available from: 2017-09-21 Created: 2017-09-21 Last updated: 2018-01-13 Bibliographically approved
Johansson, R. & Jönsson, A. (2017). Consider Clojure: A modern Lisp that runs on Java and Javascript hosts. Paper presented at PROCEEDINGS OF THE 12’TH SWECOG CONFERENCE, GÖTEBORG, OCTOBER 6-7, 2016.
2017 (English) Conference paper, Oral presentation only (Refereed)
Series
Studies in Informatics, ISSN 1653-2325 ; 2
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-141055 (URN); 978-91-983667-0-9 (ISBN)
Conference
PROCEEDINGS OF THE 12’TH SWECOG CONFERENCE, GÖTEBORG, OCTOBER 6-7, 2016
Available from: 2017-09-21 Created: 2017-09-21 Last updated: 2018-01-13
Santini, M. & Jönsson, A. (2017). E-care@home: Towards a better communication between patients and doctors using Language Technology. Paper presented at Medicinteknikdagarna, Västerås, Sweden, October 10-11, 2017.
2017 (English) Conference paper, Oral presentation only (Refereed)
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-141058 (URN)
Conference
Medicinteknikdagarna, Västerås, Sweden, October 10-11, 2017
Available from: 2017-09-21 Created: 2017-09-21 Last updated: 2018-01-13 Bibliographically approved
Falkenjack, J., Rennes, E., Fahlborg, D., Johansson, V. & Jönsson, A. (2017). Services for text simplification and analysis. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa. Paper presented at 21st Nordic Conference on Computational Linguistics, NoDaLiDa, Wallenberg Conference Center, Gothenburg, Sweden, May 23-24, 2017 (pp. 309-313). Linköping University Electronic Press, 131, Article ID 044.
2017 (Swedish) In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, Linköping University Electronic Press, 2017, Vol. 131, p. 309-313, article id 044 Conference paper, Published paper (Refereed)
Abstract [en]

We present a language technology service for web editors’ work on making texts easier to understand, including tools for text complexity analysis, text simplification and text summarization. We also present a text analysis service focusing on measures of text complexity.

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2017
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3686, E-ISSN 1650-3740
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:liu:diva-141053 (URN); 978-91-7685-601-7 (ISBN)
Conference
21st Nordic Conference on Computational Linguistics, NoDaLiDa, Wallenberg Conference Center, Gothenburg, Sweden, May 23-24, 2017
Available from: 2017-09-21 Created: 2017-09-21 Last updated: 2019-07-03 Bibliographically approved
Loutfi, A., Jönsson, A., Karlsson, L., Lind, L., Linden, M., Pecora, F. & Voigt, T. (2016). Ecare@Home: A Distributed Research Environment on Semantic Interoperability. In: Mobyen Uddin Ahmed, Shahina Begum, Wasim Raad (Ed.), Internet of Things Technologies for HealthCare: Third International Conference, HealthyIoT 2016, Västerås, Sweden, October 18-19, 2016, Revised Selected Papers. Paper presented at Third International Conference, HealthyIoT 2016, Västerås, Sweden, October 18-19, 2016 (pp. 3-8). Springer
2016 (English) In: Internet of Things Technologies for HealthCare: Third International Conference, HealthyIoT 2016, Västerås, Sweden, October 18-19, 2016, Revised Selected Papers / [ed] Mobyen Uddin Ahmed; Shahina Begum; Wasim Raad, Springer, 2016, p. 3-8 Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the motivation for and challenges of developing semantic interoperability for an Internet of Things network that is used in the context of home-based care. The paper describes a research environment which examines these challenges and illustrates the motivation through a scenario whereby a network of devices in the home is used to provide high-level information about elderly patients by leveraging techniques in context awareness, automated reasoning, and configuration planning.

Place, publisher, year, edition, pages
Springer, 2016
Series
Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Lecture Notes, ISSN 1867-8211, E-ISSN 1867-822X
Keywords
Semantic interoperability, Configuration planning, Health and care, Internet of Things
National Category
Other Medical Engineering
Identifiers
urn:nbn:se:liu:diva-141052 (URN); 10.1007/978-3-319-51234-1_1 (DOI); 978-3-319-51233-4 (ISBN); 978-3-319-51234-1 (ISBN)
Conference
Third International Conference, HealthyIoT 2016, Västerås, Sweden, October 18-19, 2016
Available from: 2017-09-21 Created: 2017-09-21 Last updated: 2019-07-02 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-4899-588X
