Flexible Distribution Alignment: Towards Long-Tailed Semi-supervised Learning with Proper Calibration
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.ORCID iD: 0000-0001-9874-737X
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.ORCID iD: 0009-0003-4516-5685
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.ORCID iD: 0000-0002-6857-0152
Linköping University, Faculty of Science & Engineering. Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems.ORCID iD: 0000-0002-2492-9872
2024 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIV / [ed] Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, Springer Nature Switzerland, 2024, Vol. 15112, p. 307-327. Conference paper, Published paper (Refereed)
Abstract [en]

Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
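The logit-adjustment idea behind FlexDA can be illustrated with a small sketch. This is a generic distribution-alignment sketch under assumed forms, not the paper's exact objective: the names `logit_adjust` and `ema_update`, and the momentum value, are illustrative choices.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_adjust(logits, est_prior, target_prior, tau=1.0):
    """Shift logits away from the estimated class prior and toward a
    target prior; with tau=1 this rescales softmax outputs by
    target_prior / est_prior before renormalization."""
    return logits + tau * (np.log(target_prior) - np.log(est_prior))

def ema_update(est_prior, batch_probs, momentum=0.999):
    """Running (EMA) estimate of the unlabeled class distribution,
    updated from a batch of predicted probabilities."""
    return momentum * est_prior + (1.0 - momentum) * batch_probs.mean(axis=0)
```

Adjusting toward a uniform target boosts rare-class probabilities relative to head classes, which is the balancing effect the abstract describes.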

Place, publisher, year, edition, pages
Springer Nature Switzerland, 2024. Vol. 15112, p. 307-327
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15112
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:liu:diva-209223
DOI: 10.1007/978-3-031-72949-2_18
ISI: 001352860600018
Scopus ID: 2-s2.0-85208545165
ISBN: 9783031729485 (print)
ISBN: 9783031729492 (electronic)
OAI: oai:DiVA.org:liu-209223
DiVA, id: diva2:1911046
Conference
18th European Conference, Milan, Italy, September 29–October 4, 2024
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-11-06 Created: 2024-11-06 Last updated: 2025-11-18
In thesis
1. Robust Visual Learning across Class Imbalance and Distributional Shift
2025 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computer vision aims to equip machines with perceptual understanding—detecting, recognizing, localizing, and relating visual entities to existing sources of knowledge. Machine learning provides the mechanism: models learn representations and decision rules from data and are expected to generalize beyond the training distribution. These systems already support biodiversity monitoring, autonomous driving, and geospatial mapping. In practice, however, textbook assumptions break down: the concept space is vast, data is sparse and imbalanced, many categories are rare, and high-quality annotations are costly. In addition, deployment conditions shift over time—class frequencies and visual domains evolve—biasing models toward frequent scenarios and eroding reliability.

In this work, we develop methods for training reliable visual recognition models under more realistic conditions: class imbalance, limited labeled data, and distribution shift. Our contributions span three themes: (1) debiasing strategies for imbalanced classification that remain reliable under changes in class priors; (2) semi-supervised learning techniques tailored to imbalanced data to reduce annotation cost while preserving minority-class performance; and (3) a unified multimodal retrieval approach for remote sensing (RS) that narrows the domain gap.

In Paper A, we study long-tailed image recognition, where skewed training data biases classifiers toward frequent classes. During deployment, changes in class priors can further amplify this bias. We propose an ensemble of skill-diverse experts, each trained under a distinct target prior, and aggregate their predictions to balance head and tail performance. We theoretically show that the ensemble’s prior bias equals the mean expert bias and that choosing complementary target priors cancels it, yielding an unbiased predictor that minimizes balanced error. With calibrated experts—achieved in practice via Mixup—the ensemble attains state-of-the-art accuracy and remains reliable under label shift.
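The bias-cancellation claim in Paper A can be checked numerically. Below is a minimal sketch under an assumed additive-bias model (each expert's logits shifted by the log of its target prior; not the paper's exact training setup): choosing the second prior proportional to the elementwise inverse of the first makes the mean log-prior constant across classes, so the ensemble's bias is class-independent and vanishes after softmax.

```python
import numpy as np

base = np.array([1.0, 0.5, -0.5])      # hypothetical unbiased logits
pi_head = np.array([0.6, 0.3, 0.1])    # head-biased target prior
pi_tail = 1.0 / pi_head                # complementary prior (log-space mirror)
pi_tail /= pi_tail.sum()               # renormalize to a distribution

expert_head = base + np.log(pi_head)   # each expert carries a prior bias
expert_tail = base + np.log(pi_tail)
ensemble = 0.5 * (expert_head + expert_tail)

bias = ensemble - base                 # equals the mean expert bias
# bias is identical for every class, so softmax/argmax are unchanged
```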

In Paper B, we investigate long-tailed recognition in the semi-supervised setting, where a small, imbalanced labeled set is paired with a large unlabeled pool. Semi-supervised learning leverages unlabeled data to reduce annotation costs, typically through pseudo-labeling, but the unlabeled class distribution is often unknown and skewed. Naïve pseudo-labeling propagates the labeled bias, reinforcing head classes and overlooking rare ones. We propose a flexible distribution-alignment framework that estimates the unlabeled class mix online and reweights pseudo-labels accordingly, guiding the model first toward the unlabeled distribution to stabilize training and then toward a balanced classifier for fair inference. The proposed approach leverages unlabeled data more effectively, improving accuracy, calibration, and robustness to unknown unlabeled priors.
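One way to sketch the "first toward the unlabeled distribution, then toward a balanced classifier" schedule of Paper B is log-space interpolation of the target prior. The schedule and function names below are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def anneal_target(est_prior, progress):
    """Interpolate in log space from the estimated unlabeled prior
    (progress=0) toward a uniform, balanced target (progress=1)."""
    k = len(est_prior)
    log_t = (1.0 - progress) * np.log(est_prior) \
        + progress * np.log(np.full(k, 1.0 / k))
    t = np.exp(log_t)
    return t / t.sum()

def reweight_pseudo(probs, est_prior, target_prior):
    """Rescale predicted probabilities toward the target prior and
    renormalize before taking argmax pseudo-labels."""
    w = probs * (target_prior / est_prior)
    return w / w.sum(axis=-1, keepdims=True)
```

With `target_prior` equal to the current estimate, reweighting is the identity; as training progresses the annealed target pushes pseudo-labels toward balance.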

In Paper C, we move beyond recognition to unified multimodal retrieval for remote sensing—a domain with scarce image–text annotations and a challenging shift from natural images. Prior solutions are fragmented: RS dual encoders lack interleaved input support; universal embedders miss spatial metadata and degrade under domain shift; and RS generative assistants reason over regions but lack scalable retrieval. To overcome these limitations, we introduce VLM2GeoVec, a single-encoder, instruction-following embedder that aligns images, text, regions, and geocoordinates in a shared space. For comprehensive evaluation, we also propose RSMEB, a unified retrieval benchmark that spans conventional tasks (e.g., classification, cross-modal retrieval) and novel interleaved tasks (e.g., visual grounding, spatial localization, semantic geo-localization). In RSMEB, VLM2GeoVec narrows the domain gap relative to universal embedders and matches specialized baselines in conventional tasks in zero-shot settings. It further enables interleaved spatially-aware search, delivering several-fold gains in metadata-aware RS applications.
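At inference time, the shared-embedding retrieval described in Paper C reduces to nearest-neighbor search by cosine similarity. The sketch below uses random vectors as stand-ins for encoder outputs; VLM2GeoVec itself is not invoked.

```python
import numpy as np

def normalize(x):
    """L2-normalize along the last axis so dot products are cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 8))               # 5 items in an 8-d space
query = candidates[3] + 0.01 * rng.normal(size=8)  # query near item 3

scores = normalize(candidates) @ normalize(query)  # cosine similarities
ranking = np.argsort(-scores)                      # best match first
```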

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025. p. 67
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2487
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-219564
DOI: 10.3384/9789181183085
ISBN: 9789181183078
ISBN: 9789181183085
Public defence
2025-12-17, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Note

Funding agency: The Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation

Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2025-11-18. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Sanchez Aimar, Emanuel; Helgesen, Nathaniel; Xu, Yonghao; Kuhlmann, Marco; Felsberg, Michael
