On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0003-3428-6564
Arriver Sweden Software AB.
Arriver Sweden Software AB.
Linköping University, Department of Electrical Engineering, Automatic Control. Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0003-3749-5820
2024 (English). In: 35th British Machine Vision Conference 2024, Glasgow, UK, November 25-28, 2024. Conference paper, oral presentation only (Other academic).
Abstract [en]

A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations while simultaneously fitting the mixture model leads to the representation collapse problem. Regularizing the distribution of data points over the clusters is the prevalent strategy for avoiding this issue. While this is sufficient to prevent full representation collapse, we show that a partial prototype collapse problem still exists in the DINO family of methods, which leads to significant redundancies in the prototypes. Such prototype redundancies serve as shortcuts for the method to achieve a marginal latent class distribution that matches the prescribed prior. We show that the partial prototype collapse can be mitigated by encouraging the model to use diverse prototypes. We study the downstream impact of effective utilization of the prototypes during pre-training and show that it enables the methods to learn more fine-grained clusters, encouraging more informative representations. We demonstrate that this is especially beneficial when pre-training on a long-tailed, fine-grained dataset.
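The abstract describes redundancies among the prototypes of DINO-style methods, i.e. prototype vectors that partially collapse onto one another. As a minimal illustrative sketch (not code from the paper), one way to quantify such redundancy is to count near-duplicate prototype pairs by cosine similarity; the function name and the similarity threshold below are assumptions chosen for illustration.

```python
import numpy as np

def prototype_redundancy(prototypes, tau=0.9):
    """Count near-duplicate prototype pairs via cosine similarity.

    prototypes: (K, D) array of prototype vectors (e.g. the last-layer
    weights of a DINO-style projection head). tau is an illustrative
    threshold; partially collapsed prototypes have cosine similarity
    close to 1.
    """
    # L2-normalize each prototype, then take all pairwise dot products.
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = P @ P.T
    # Upper triangle (k=1) restricts to unique unordered pairs.
    iu = np.triu_indices(len(P), k=1)
    return int(np.sum(sim[iu] > tau))

# Toy example: three distinct directions plus one near-duplicate.
protos = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.99, 0.01],   # almost identical direction to the first prototype
    [-1.0, 0.0],
])
print(prototype_redundancy(protos))  # prints 1
```

On a fully diverse set of prototypes this count is zero; a large count relative to the number of prototypes would indicate the kind of partial collapse the paper studies.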

Place, publisher, year, edition, pages
2024.
Keywords [en]
self-supervised learning, vision transformers, long-tailed classification, dino, collapse, few-shot learning, clustering based methods, representation learning
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-211237
OAI: oai:DiVA.org:liu-211237
DiVA, id: diva2:1932134
Conference
British Machine Vision Conference (BMVC) 2024
Funder
Swedish Research Council, 2020-04122
Wallenberg AI, Autonomous Systems and Software Program (WASP)
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2025-01-28 Created: 2025-01-28 Last updated: 2025-02-13

Open Access in DiVA

No full text in DiVA

Other links

Conference website / Link to conference

Authority records

Govindarajan, Hariprasath
Lindsten, Fredrik

Search in DiVA

By author/editor
Govindarajan, Hariprasath
Lindsten, Fredrik
By organisation
The Division of Statistics and Machine Learning
Faculty of Science & Engineering
Automatic Control
Computer graphics and computer vision
