Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Univ British Columbia, Canada.
Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-2796-6820
Univ British Columbia, Canada.
2023 (English). In: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI 2023), Association for Computing Machinery, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

The work involved in gathering, wrangling, cleaning, and otherwise preparing data for analysis is often the most time-consuming and tedious aspect of data work. Although many studies describe data preparation within the context of data science workflows, there has been little research on data preparation in data journalism. We address this gap with a hybrid form of thematic analysis that combines deductive codes derived from existing accounts of data science workflows and inductive codes arising from an interview study with 36 professional data journalists. We extend a previous model of data science work to incorporate detailed activities of data preparation. We synthesize 60 dirty data issues from 16 taxonomies on dirty data and our interview data, and we provide a novel taxonomy to characterize these dirty data issues as discrepancies between mental models. We also identify four challenges faced by journalists: diachronic, regional, fragmented, and disparate data sources.

Place, publisher, year, edition, pages
Association for Computing Machinery, 2023.
Keywords [en]
data journalism; data science; data wrangling; data cleaning; thematic analysis
National Category
Media Engineering
Identifiers
URN: urn:nbn:se:liu:diva-198993
DOI: 10.1145/3544548.3581271
ISI: 001048393802036
ISBN: 9781450394215 (print)
OAI: oai:DiVA.org:liu-198993
DiVA, id: diva2:1810275
Conference
CHI Conference on Human Factors in Computing Systems (CHI), Hamburg, Germany, April 23-28, 2023
Note

Funding agencies: NSERC [RGPIN-2014-06309]; Wallenberg AI, Autonomous Systems and Software Program (WASP)

Available from: 2023-11-07 Created: 2023-11-07 Last updated: 2024-11-22

By author/editor
Berret, Charles