Interpretable Word Embeddings via Informative Priors
Linköping University, Department of Management and Engineering, The Institute for Analytical Sociology, IAS. Linköping University, Faculty of Arts and Sciences.
Linköping University, The Institute for Analytical Sociology, IAS. Linköping University, Faculty of Arts and Sciences. ORCID iD: 0000-0003-4648-2829
Department of Computer Science, Aalto University, Finland.
2019 (English). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) / [ed] Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan. Association for Computational Linguistics, 2019, Vol. D19-1, p. 6324-6330, article id D19-1661.
Conference paper, Published paper (Refereed)
Abstract [en]

Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensible priors can capture latent semantic concepts better than or on par with the current state of the art, while retaining the simplicity and generalizability of using priors.
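The idea of an informative prior on a single embedding dimension can be illustrated with a toy conjugate-normal update. This is a minimal sketch, not the paper's model: the Gaussian likelihood, the seed words, the variances, and the function names are all assumptions made for illustration. Seed words receive a tight prior centred on +1 or -1 on the chosen dimension, while all other words receive a wide prior centred on 0.

```python
# Toy illustration only: a conjugate normal update on ONE embedding dimension.
# The seed words, variances, and "data estimates" below are hypothetical.

def posterior_mean(data_estimate, data_var, prior_mean, prior_var):
    """Precision-weighted average of a data estimate and a prior mean."""
    data_prec = 1.0 / data_var
    prior_prec = 1.0 / prior_var
    return (data_prec * data_estimate + prior_prec * prior_mean) / (data_prec + prior_prec)

# Hypothetical seed words anchoring the two poles of a sentiment dimension.
SEED_PRIOR_MEANS = {"good": 1.0, "bad": -1.0}
INFORMATIVE_VAR = 0.01  # tight prior for seed words
DEFAULT_VAR = 1.0       # wide, zero-centred prior for every other word

def interpretable_dimension(data_estimates, data_var=0.25):
    """Combine per-word data estimates with the priors on the chosen dimension."""
    result = {}
    for word, estimate in data_estimates.items():
        if word in SEED_PRIOR_MEANS:
            result[word] = posterior_mean(
                estimate, data_var, SEED_PRIOR_MEANS[word], INFORMATIVE_VAR
            )
        else:
            result[word] = posterior_mean(estimate, data_var, 0.0, DEFAULT_VAR)
    return result

# Made-up values standing in for what the text alone would suggest.
estimates = {"good": 0.2, "bad": 0.1, "table": 0.3}
dim = interpretable_dimension(estimates)
for word, value in dim.items():
    print(f"{word}: {value:.3f}")
```

In this sketch the seed words are pulled strongly towards their poles, while a non-seed word such as "table" stays close to what the data suggest. In a full embedding model the non-seed words would then be positioned relative to the anchored seed words through their co-occurrence patterns.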

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2019. Vol. D19-1, p. 6324-6330, article id D19-1661
National Category
Natural Language Processing Peace and Conflict Studies Other Social Sciences not elsewhere specified Sociology (excluding Social Work, Social Psychology and Social Anthropology)
Identifiers
URN: urn:nbn:se:liu:diva-161824; ISI: 000854193306072; Scopus ID: 2-s2.0-85084290483; OAI: oai:DiVA.org:liu-161824; DiVA, id: diva2:1369269
Conference
Empirical Methods in Natural Language Processing
Funder
Swedish Research Council, 2018–05170
Available from: 2019-11-11 Created: 2019-11-11 Last updated: 2025-02-20
Bibliographically approved
In thesis
1. Beyond Generative Sufficiency: On Interactions, Heterogeneity & Middle-Range Dynamics
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Explaining how properties at the level of individuals translate into properties at the level of collectives is a core objective of sociology. Because the social world is characterized by complex webs of social interdependencies, establishing how micro and macro are related to one another requires a detailed understanding of how individuals are influenced by their social environments and the consequences that such influences have for the dynamics of the social process. However, until very recently, it has been difficult to conduct detailed empirical investigations of micro-macro linkages due to the lack of large-scale data containing information on how individuals interact with one another. In the absence of such data, substantive research has tended to (a) focus its attention elsewhere: studying how social factors influence individual outcomes, rather than how actors in interaction with one another bring about collective outcomes, or (b) propose models of micro-macro linkages that—for reasons of parsimony and tractability—often assume artificially high levels of homogeneity. Against this background, this thesis sets out to investigate, first, how the data and tools that have emerged from the digital and computational revolution can help sociologists construct empirically well-founded mappings from the micro to the macro level, and second, how the conclusions about the role of social interdependencies and networks change when the analysis is informed by real-world heterogeneities.

In the introductory chapter, a conceptual and analytical framework for studying micro-macro processes is proposed that integrates the theoretical principles of analytical sociology with the data and methods of computational social science. This framework constitutes the foundation of the thesis. It is used in Essays I-III, and it is methodologically built upon in Essay IV.

In Essay I, the role of social networks in labor-market segregation processes is examined. Scholarship on labor-market segregation commonly assumes that social networks have a segregating effect because of homophilous selection tendencies in network-based recruitment. Using large-scale register data and focusing attention on individuals’ heterogeneous opportunities to form same-category ties in different workplaces, Essay I finds that opportunity structures often dominate homophilic preferences. In particular, a mechanism is identified which shows, in contradiction with the main tenet of previous research, that networks often reduce rather than increase segregation by triggering mobility events that counteract the impact of segregating mobility events.

Essay II examines the conditions under which social influence can decouple adoption behaviour from individual preferences and thereby bring about unexpected collective outcomes. Prior research has shown that such decoupling can occur, but conflicting evidence and implicit assumptions of strong homogeneity mean that we still know little about the conditions under which this is likely to occur in the real world. Addressing these limitations, this study uses fine-grained, real-world behavioural data from Spotify to estimate heterogeneous social influence effects conditional on properties of individuals’ social environments, and then examine their macro-implications in empirically calibrated simulations. It is found that partial overlap in preferences and strong social ties between the senders and receivers of social influence are needed for social influence to produce decoupling.

Essay III centers on the phenomenon of urban scaling and examines the relationship between within-city and between-city inequality. Previous urban scaling research has documented how cities’ total outputs increase more than proportionally with city size and has proposed theoretical models which demonstrate impressive predictive accuracy at aggregate levels. However, this research has overlooked the stark inequalities that exist within cities. Using microdata from multiple countries, it is found that between 36% and 80% of the previously reported scaling effects can be explained by differences in the distributional tails of cities. Providing explanatory depth to these findings, a cumulative advantage mechanism is identified which elucidates one important channel through which differences in the size of cities’ tails emerge.
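"More than proportionally" is usually expressed in the urban scaling literature as a power law, Y = Y0 * N^beta with beta > 1, where N is city population and Y is total output. The following is a minimal sketch of how such an exponent can be estimated by least squares on log-transformed values. The data are synthetic and the function name is hypothetical; it does not reproduce the essay's analysis of distributional tails.

```python
import math

def fit_scaling_exponent(populations, outputs):
    """OLS slope of log(output) on log(population): beta in Y = Y0 * N**beta."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic cities generated with a known exponent of 1.15 (an assumed value).
populations = [1e4, 1e5, 1e6, 1e7]
outputs = [2.0 * n ** 1.15 for n in populations]

beta = fit_scaling_exponent(populations, outputs)
print(f"estimated beta = {beta:.3f}")  # beta > 1 indicates superlinear scaling
```

Because the fit uses only city-level totals, it cannot by itself show how much of the estimated exponent is driven by the upper tail of a within-city distribution; that question requires microdata.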

In Essay IV, a method is proposed for inferring theoretically meaningful dimensions from complex high-dimensional data such as text. The results show that the method captures latent semantic concepts better than or on par with the current state of the art. For the study of social interactions, the method constitutes a new and potentially important tool for inferring theoretically meaningful dimensions about individuals and their social environments, and in so doing, improves our ability to adjust for specific types of homophily and enables richer and more precise measures of heterogeneity in social interaction processes.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022. p. 69
Series
Linköping Studies in Arts and Sciences, ISSN 0282-9800 ; 836
Institute for Analytical Sociology Dissertation Series, ISSN 2004-268X, E-ISSN 2004-2698 ; 04
Keywords
Collective dynamics, Social networks, Heterogeneity, Digital trace data, Agent-based simulation, Social mechanisms, Analytical sociology, Computational social science
National Category
Sociology
Identifiers
urn:nbn:se:liu:diva-184330 (URN), 10.3384/9789179293314 (DOI), 9789179293307 (ISBN), 9789179293314 (ISBN)
Public defence
2022-05-11, Kåkenhus, K1 room, Campus Norrköping, Norrköping, 14:00 (English)
Available from: 2022-04-13 Created: 2022-04-13 Last updated: 2024-04-10
Bibliographically approved
2. Mining for Meaning: using computational text analysis for social inquiry
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

People interpret their surroundings through associations, determining what they perceive as belonging or not belonging together. For instance, one individual may view immigrants as a beneficial addition to the domestic labor market, while another may perceive them as a threat to job opportunities for native citizens. Despite differing viewpoints on immigration, these individuals share a similar economic interpretation of immigration as a concept. Explaining how these interpretations develop and evolve is a fundamental and open question related to the social world.

For a long time, people’s interpretations of the world have been hidden away in their minds, and researchers have primarily relied on surveys to try to measure them. However, individuals and groups leave behind traces of their understandings of the world in their communication and written expressions. Consequently, textual data hold immense potential for sociological research. This thesis pursues three primary objectives. First, to discuss the use of text data for social inquiry. Second, to introduce and explore intrinsically interpretable text models for sociological inquiry. Third, to explore rigorous ways of studying meaning and meaning-making in the Swedish immigration discourse using computational text analysis. The introductory chapter and four research articles presented in this thesis all speak to at least one of these aims.

Essay I addresses the question of how researchers can assess the data quality of a corpus to determine its suitability for addressing research questions. Drawing inspiration from survey research, this essay presents a general approach to evaluating the scientific value of a given text dataset. The framework outlined in this essay delineates potential errors that could affect the reliability and validity of any measures derived from a corpus, and offers methods for quantifying some of them.

Essay II presents a novel extension to standard word embedding models. Our extension gives researchers the ability to study how the meaning of words relates to pre-specified binary dimensions. We find that our proposed intrinsically interpretable model outperforms current standard approaches on classification tasks related to sentiment and gender. The methodology presented in Essay II will thus help sociologists to measure and test theories pertaining to binary concepts.

Essay III contributes to the ongoing discussions in sociology regarding the identification of more formal ways to measure aggregate-level meanings. This essay traces prevailing frames of immigration in Swedish national news media from the end of World War II until 2019, providing an unprecedented macro-level perspective on immigration frames. The analysis indicates that the framing of immigration in the Swedish media changes following periods of rupture rather than single events.

Essay IV delves into the mechanisms that influence changes in online discussions on Flashback following Jihadist terrorist attacks. We examine two mechanisms: changes in discussion content (within-individual change) and changes in the composition of discussion participants (compositional change). Our findings reveal that interpretations of immigration related to culture and security become more prominent following terror attacks, and that both of the mechanisms examined play a role in shaping post-attack discussions.
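The distinction between within-individual change and compositional change can be made concrete with a simple accounting exercise. The sketch below is one possible decomposition under stated assumptions, not the essay's estimation strategy: the user IDs and values are invented, each value stands for the share of a user's posts using a given framing, and the compositional component is defined as the residual of the total change after subtracting the average change among users present in both periods.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def decompose_change(pre, post):
    """Split the change in the average into within-individual and compositional parts.

    pre and post map user IDs to a per-user measure in each period.
    Within-individual change is the mean change among users seen in both
    periods; compositional change is the remainder of the total change.
    """
    stayers = sorted(set(pre) & set(post))
    if not stayers:
        raise ValueError("at least one user must appear in both periods")
    total = mean(post.values()) - mean(pre.values())
    within = mean(post[user] - pre[user] for user in stayers)
    compositional = total - within
    return total, within, compositional

# Invented data: user "c" leaves after the event and user "d" joins.
pre = {"a": 0.2, "b": 0.4, "c": 0.3}
post = {"a": 0.3, "b": 0.5, "d": 0.9}

total, within, compositional = decompose_change(pre, post)
print(f"total={total:.3f} within={within:.3f} compositional={compositional:.3f}")
```

In this toy case both components are positive: continuing users shift somewhat, and the newcomer shifts the average further.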

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024. p. 78
Series
Linköping Studies in Arts and Sciences, ISSN 0282-9800 ; 879
Institute for Analytical Sociology Dissertation Series, ISSN 2004-268X, E-ISSN 2004-2698 ; 8
Keywords
Text-as-data, Analytical sociology, Meaning-making, Computational text analysis, Computational social science
National Category
Sociology (excluding Social Work, Social Psychology and Social Anthropology)
Identifiers
urn:nbn:se:liu:diva-202422 (URN), 10.3384/9789180756181 (DOI), 9789180756174 (ISBN), 9789180756181 (ISBN)
Public defence
2024-05-13, Online through Zoom (contact madelene.topfer@liu.se) and K4, Kåkenhus, Campus Norrköping, Norrköping, 14:00 (English)
Note

Funding agency: The Swedish Research Council (2018–05170). The computations and data storage were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2018-05973 and no. 2022-06725.

Available from: 2024-04-10 Created: 2024-04-10 Last updated: 2024-04-10
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Hurtado Bodell, Miriam; Arvidsson, Martin