Conceptualizing Treatment Leakage in Text-based Causal Inference
Linköping University, Department of Management and Engineering, The Institute for Analytical Sociology, IAS. Linköping University, Faculty of Arts and Sciences. Harvard Univ, MA 02138 USA; Chalmers Univ Technol, Sweden.
Linköping University, Department of Management and Engineering, The Institute for Analytical Sociology, IAS. Linköping University, Faculty of Arts and Sciences. Harvard Univ, MA 02138 USA.
Univ Gothenburg, Sweden; Chalmers Univ Technol, Sweden.
2022 (English). In: NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022, p. 5638-5645. Conference paper, Published paper (Refereed)
Abstract [en]

Causal inference methods that control for text-based confounders are becoming increasingly important in the social sciences and other disciplines where text is readily available. However, these methods rely on a critical assumption that there is no treatment leakage: that is, the text contains information only about the confounder and none about treatment assignment. When this assumption does not hold, methods that control for text to adjust for confounders face the problem of posttreatment (collider) bias. Yet the no-leakage assumption may be unrealistic in real-world situations involving text, as human language is rich and flexible. Language appearing in a public policy document or in health records may refer to the future and the past simultaneously, and thereby reveal information about the treatment assignment. In this article, we first define the treatment-leakage problem and discuss the identification and estimation challenges it raises. Second, we delineate the conditions under which leakage can be addressed by removing the treatment-related signal from the text in a preprocessing step we define as text distillation. Lastly, using simulation, we show how treatment leakage introduces a bias in estimates of the average treatment effect (ATE) and how text distillation can mitigate this bias.
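
The bias mechanism described in the abstract can be sketched in a small simulation. This is an illustrative sketch under assumptions that are not taken from the paper: a linear outcome model, a scalar confounder `u`, two hypothetical numeric text features (`w_conf`, which reflects only the confounder, and `w_leak`, which is also shifted by the treatment), and distillation simplified to dropping the feature known to carry the leaked signal. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate(n=20000, tau=1.0, beta=2.0, gamma=1.5, seed=0):
    """Draw one synthetic data set (all parameter values are assumptions)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                         # unobserved confounder
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-u)))  # treatment depends on u
    y = tau * t + beta * u + rng.normal(size=n)    # outcome; true ATE is tau
    # Hypothetical text features: one reflects only the confounder,
    # the other is additionally shifted by the treatment (leakage).
    w_conf = u + 0.1 * rng.normal(size=n)
    w_leak = u + gamma * t + 0.1 * rng.normal(size=n)
    return t, y, w_conf, w_leak


def ols_ate(t, y, *controls):
    """Coefficient on t from a least-squares fit of y on [1, t, controls]."""
    X = np.column_stack([np.ones_like(y), t, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def compare_estimators(**kwargs):
    """Return ATE estimates under three adjustment strategies."""
    t, y, w_conf, w_leak = simulate(**kwargs)
    return {
        "naive": ols_ate(t, y),                     # no adjustment
        "leaky": ols_ate(t, y, w_conf, w_leak),     # adjusts for leaked feature
        "distilled": ols_ate(t, y, w_conf),         # leaked feature dropped
    }


if __name__ == "__main__":
    for name, estimate in compare_estimators().items():
        print(f"{name:>9}: {estimate:.3f}")
```

Under these assumptions, the estimate that adjusts for `w_leak` should land far from `tau`, because `w_leak` is a posttreatment variable, while the distilled estimate should land close to `tau`. The naive estimate remains confounded. Real text representations are rarely this cleanly separable, so the sketch only illustrates the direction of the argument.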

Place, publisher, year, edition, pages
ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022. p. 5638-5645
National Category
Information Studies
Identifiers
URN: urn:nbn:se:liu:diva-189966; ISI: 000859869505055; ISBN: 9781955917711 (print); OAI: oai:DiVA.org:liu-189966; DiVA, id: diva2:1711230
Conference
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) - Human Language Technologies, Seattle, WA, July 10-15, 2022
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Royal Swedish Academy of Letters, History and Antiquities

Available from: 2022-11-16 Created: 2022-11-16 Last updated: 2022-11-16

Open Access in DiVA

No full text in DiVA

By author/editor
Daoud, Adel; Jerzak, Connor