Experimenting with modeling-specific word embeddings
Linköping University, Department of Computer and Information Science, Software and Systems. Linköping University, Faculty of Science & Engineering.
Univ Murcia, Spain.
Univ Murcia, Spain.
2024 (English). In: Software and Systems Modeling, ISSN 1619-1366, E-ISSN 1619-1374. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

The application of machine learning techniques to MDE problems often requires transforming raw information (e.g., software models) into a numerical representation that machine learning algorithms can use. To this end, pretrained embeddings are a key technology for facilitating the construction of such applications. However, previous works have demonstrated that these embeddings struggle to generalize effectively in the MDE domain because they are trained on general-purpose corpora. To tackle this issue, we developed WordE4MDE, a set of specialized word embeddings trained specifically on modeling documents. In this study, we aim to overcome several limitations of WordE4MDE and conduct additional experiments to assess its efficacy. Key limitations we address include: (1) mitigating the out-of-vocabulary issue through the use of sub-word embeddings, (2) adding contextualization to the embeddings by training a BERT model on our specific modeling corpus, and (3) addressing the constraint of limited training data by investigating the augmentation of our modeling corpus with StackOverflow and StackExchange data.
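The sub-word idea behind point (1) can be illustrated with a minimal toy sketch (hypothetical code, not the WordE4MDE implementation): a word's vector is the average of vectors for its character n-grams, fastText-style, so an out-of-vocabulary word still receives a representation via n-grams it shares with in-vocabulary words. Here, hashed pseudo-random vectors stand in for learned n-gram weights.

```python
# Toy sketch of sub-word embeddings for OOV mitigation (not WordE4MDE code).
import hashlib
import math

DIM = 8          # toy embedding dimension
BUCKETS = 1000   # hash buckets for n-gram vectors

def ngrams(word, n_min=3, n_max=5):
    # Boundary markers, as in fastText, distinguish prefixes/suffixes.
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def ngram_vector(gram):
    # Deterministic pseudo-random vector per n-gram; in a real model these
    # would be learned weights.
    h = int(hashlib.md5(gram.encode()).hexdigest(), 16) % BUCKETS
    return [math.sin(h * (d + 1)) for d in range(DIM)]

def word_vector(word):
    # A word's vector is the average of its n-gram vectors, so any word,
    # seen or unseen, gets a representation.
    vecs = [ngram_vector(g) for g in ngrams(word)]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# An "unseen" word still gets a vector built from its shared n-grams:
sim = cosine(word_vector("metamodeling"), word_vector("metamodel"))
print(f"similarity(metamodeling, metamodel) = {sim:.3f}")
```

With learned n-gram weights (as in fastText, or the sub-word embeddings the abstract refers to), morphologically related modeling terms end up with nearby vectors even when one of them never occurred in the training corpus.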

Place, publisher, year, edition, pages
Springer Heidelberg, 2024.
Keywords [en]
Embeddings; Classification; Clustering; Recommendation; Machine Learning; Model-Driven Engineering
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:liu:diva-210726
DOI: 10.1007/s10270-024-01250-5
ISI: 001376228600001
Scopus ID: 2-s2.0-85212045326
OAI: oai:DiVA.org:liu-210726
DiVA, id: diva2:1926131
Note

Funding Agencies: Agencia Estatal de Investigación [TED2021-129381B-C22, MCIN/AEI/10.13039/501100011033, PID2022-140109NB-I00, MCIN/AEI/10.13039/501100011033]; FEDER/UE [CNS2022-135578, MICIU/AEI/10.13039/501100011033]

Available from: 2025-01-10 Created: 2025-01-10 Last updated: 2025-01-10

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Search in DiVA

By author/editor
Hernández López, José Antonio
By organisation
Software and Systems
Faculty of Science & Engineering
In the same journal
Software and Systems Modeling
Computer Systems
