Ensembles of GANs for synthetic training data generation
Linköpings universitet, Institutionen för teknik och naturvetenskap, Medie- och Informationsteknik. Linköpings universitet, Tekniska fakulteten. ORCID iD: 0000-0002-9217-9997
Linköpings universitet, Institutionen för teknik och naturvetenskap, Medie- och Informationsteknik. Linköpings universitet, Tekniska fakulteten. ORCID iD: 0000-0003-0298-937X
Linköpings universitet, Institutionen för teknik och naturvetenskap, Medie- och Informationsteknik. Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV. ORCID iD: 0000-0002-9368-0177
Linköpings universitet, Institutionen för teknik och naturvetenskap, Medie- och Informationsteknik. Linköpings universitet, Tekniska fakulteten. ORCID iD: 0000-0002-7765-1747
2021 (English) Conference paper, oral presentation with published abstract (Refereed)
Abstract [en]

Insufficient training data is a major bottleneck for most deep learning practices, not least in medical imaging, where data are difficult to collect and publicly available datasets are scarce due to ethics and privacy. This work investigates the use of synthetic images, created by generative adversarial networks (GANs), as the only source of training data. We demonstrate that for this application it is of great importance to use multiple GANs to improve the diversity of the generated data, i.e., to sufficiently cover the data distribution. While a single GAN can generate seemingly diverse image content, training on this data in most cases leads to severe over-fitting. We test the impact of GAN ensembles on synthetic 2D data as well as on common image datasets (SVHN and CIFAR-10), using both DCGANs and progressively growing GANs. As a specific use case, we focus on synthesizing digital pathology patches to provide anonymized training data.
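The core idea of the paper, pooling samples from several independently trained generators so the synthetic training set covers more of the data distribution than any single GAN, can be sketched as follows. The helpers below are illustrative stand-ins rather than the authors' implementation: each "generator" is just a fixed random affine map from latent noise to 2D points, loosely mimicking the paper's synthetic 2D experiments.

```python
import numpy as np

def make_toy_generator(seed, latent_dim=8):
    # Stand-in for one trained GAN generator: a fixed random affine map
    # from latent noise to 2D points. Different seeds play the role of
    # independently trained GANs, each covering part of the distribution.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(2, latent_dim))
    b = rng.normal(size=2)
    return lambda z: z @ W.T + b

def sample_from_ensemble(generators, n_samples, latent_dim=8, seed=0):
    # Draw an equal share of samples from every generator and pool them,
    # so the synthetic training set mixes all generators' output modes.
    rng = np.random.default_rng(seed)
    per_gen = n_samples // len(generators)
    batches = [g(rng.normal(size=(per_gen, latent_dim))) for g in generators]
    return np.concatenate(batches, axis=0)

# Five "independently trained" generators pooled into one training set.
ensemble = [make_toy_generator(s) for s in range(5)]
synthetic_train = sample_from_ensemble(ensemble, n_samples=1000)
```

In a real setting each element of `ensemble` would be a trained DCGAN or progressively growing GAN generator; the pooling step is the same.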

Place, publisher, year, edition, pages
2021.
HSV category
Identifiers
URN: urn:nbn:se:liu:diva-175900
OAI: oai:DiVA.org:liu-175900
DiVA id: diva2:1557585
Conference
ICLR 2021 workshop on Synthetic Data Generation: Quality, Privacy, Bias
Research funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Vinnova, grant 2019-05144 and grant 2017-02447 (AIDA); ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2021-05-26 Created: 2021-05-26 Last updated: 2022-01-17
Part of thesis
1. Synthetic data for visual machine learning: A data-centric approach
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Deep learning allows computers to learn from observations, i.e., training data. Successful application development requires skills in neural network design, adequate computational resources, and a training data distribution that covers the application domain. We are currently witnessing an artificial intelligence (AI) boom, with enough computational power to train very deep networks and build models that match or exceed human performance. The crucial factor for these algorithms to succeed has proven to be the training data fed to the learning process. Too little data, low-quality data, or data outside the target distribution will lead to poorly performing models, no matter the model capacity or the regularization methods used.

This thesis takes a data-centric approach to AI and presents a set of contributions related to synthesizing images for training supervised visual machine learning. It is motivated by the profound potential of synthetic data in cases of low availability of captured data, expensive acquisition and annotation, and privacy and ethical issues. The presented work aims to generate images similar to samples drawn from the target distribution, and to evaluate the generated data both as the sole training data source and in conjunction with captured imagery. For this, two synthesis methods are explored: computer graphics and generative modeling. Computer graphics-based generation methods and synthetic datasets for computer vision tasks are thoroughly reviewed. In the same context, a system employing procedural modeling and physically based rendering is introduced for data generation for urban scene understanding. The scheme is flexible, easily scalable, and produces complex and diverse images with pixel-perfect annotations at no cost. Generative adversarial networks (GANs) are also used to generate images for augmentation in small-data scenarios, a strategy that improves the models' performance and robustness. Finally, ensembles of independently trained GANs are investigated as a way to improve image diversity and to create synthetic data that can serve as the only training source.
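The augmentation recipe mentioned above, mixing captured images with synthetic ones for small-data scenarios, can be illustrated with a short sketch. The function name, parameters, and toy data below are hypothetical, chosen for illustration only; they are not taken from the thesis.

```python
import numpy as np

def augment_with_synthetic(real_x, real_y, synth_x, synth_y,
                           synth_ratio=0.5, seed=0):
    # Append synthetic samples so they make up `synth_ratio` of the
    # combined training set, then shuffle real and synthetic together.
    rng = np.random.default_rng(seed)
    n_real = len(real_x)
    n_synth = int(n_real * synth_ratio / (1.0 - synth_ratio))
    pick = rng.permutation(len(synth_x))[:n_synth]
    x = np.concatenate([real_x, synth_x[pick]])
    y = np.concatenate([real_y, synth_y[pick]])
    order = rng.permutation(len(x))
    return x[order], y[order]

# Toy stand-ins: 100 captured and 400 synthetic 32x32 RGB images.
real_x = np.zeros((100, 32, 32, 3)); real_y = np.zeros(100, dtype=int)
synth_x = np.ones((400, 32, 32, 3)); synth_y = np.ones(400, dtype=int)
mix_x, mix_y = augment_with_synthetic(real_x, real_y, synth_x, synth_y)
```

With `synth_ratio=0.5` the combined set ends up half captured and half synthetic; in practice the ratio is a tuning knob that trades domain fidelity against added diversity.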

The application areas of the presented contributions relate to two image modalities, natural and histopathology images, covering different aspects of the generation methods and of the tasks' characteristics and requirements. Synthesized examples are showcased for natural images in automotive applications and weather classification, and for histopathology images in breast cancer and colon adenocarcinoma metastasis detection. As a whole, this thesis promotes data-centric supervised deep learning development by highlighting the potential of synthetic data as a training data resource. It emphasizes control over the formation process, support for multi-modality formats, and the automatic generation of annotations.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022. p. 115
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2202
Keywords
Training data, Synthetic images, Computer graphics, Generative modeling, Natural images, Histopathology, Digital pathology, Machine learning, Deep learning
HSV category
Identifiers
urn:nbn:se:liu:diva-182336 (URN)
10.3384/9789179291754 (DOI)
9789179291747 (ISBN)
9789179291754 (ISBN)
Public defence
2022-02-14, Domteatern, Visualiseringscenter C, Kungsgatan 54, Norrköping, 09:15 (English)
Opponent
Supervisor
Note

The ISBN for the PDF has been added in the PDF version.

Available from: 2022-01-17 Created: 2022-01-17 Last updated: 2025-02-09, bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

https://arxiv.org/abs/2104.11797

Person

Eilertsen, Gabriel; Tsirikoglou, Apostolia; Lundström, Claes; Unger, Jonas
