Robust Visual Learning across Class Imbalance and Distributional Shift
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0001-9874-737X
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computer vision aims to equip machines with perceptual understanding—detecting, recognizing, localizing, and relating visual entities to existing sources of knowledge. Machine learning provides the mechanism: models learn representations and decision rules from data and are expected to generalize beyond the training distribution. These systems already support biodiversity monitoring, autonomous driving, and geospatial mapping. In practice, however, textbook assumptions break down: the concept space is vast, data is sparse and imbalanced, many categories are rare, and high-quality annotations are costly. In addition, deployment conditions shift over time—class frequencies and visual domains evolve—biasing models toward frequent scenarios and eroding reliability.

In this work, we develop methods for training reliable visual recognition models under more realistic conditions: class imbalance, limited labeled data, and distribution shift. Our contributions span three themes: (1) debiasing strategies for imbalanced classification that remain reliable under changes in class priors; (2) semi-supervised learning techniques tailored to imbalanced data to reduce annotation cost while preserving minority-class performance; and (3) a unified multimodal retrieval approach for remote sensing (RS) that narrows the domain gap.

In Paper A, we study long-tailed image recognition, where skewed training data biases classifiers toward frequent classes. During deployment, changes in class priors can further amplify this bias. We propose an ensemble of skill-diverse experts, each trained under a distinct target prior, and aggregate their predictions to balance head and tail performance. We theoretically show that the ensemble’s prior bias equals the mean expert bias and that choosing complementary target priors cancels it, yielding an unbiased predictor that minimizes balanced error. With calibrated experts—achieved in practice via Mixup—the ensemble attains state-of-the-art accuracy and remains reliable under label shift.
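The cancellation argument admits a small numerical sketch. Under logit adjustment, an expert trained toward a target prior produces logits shifted by that prior's log-deviation from uniform, so complementary target priors make the shifts average to zero. The toy below is purely illustrative, not the thesis code; the variable names and the symmetric three-expert choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 4

# Hypothetical "ideal" balanced logits for one sample (unknown in practice).
balanced_logits = rng.normal(size=n_classes)

# Long-tailed training prior and the uniform reference prior.
train_prior = np.array([0.7, 0.2, 0.07, 0.03])
uniform = np.full(n_classes, 1.0 / n_classes)

# An expert aimed at a target prior produces logits shifted by that prior's
# log-deviation from uniform. Pick three complementary targets (forward,
# uniform, backward) so the shifts average to zero.
shift = np.log(train_prior) - np.log(uniform)
expert_shifts = [shift, 0.0 * shift, -shift]

expert_logits = [balanced_logits + s for s in expert_shifts]
ensemble_logits = np.mean(expert_logits, axis=0)

# The per-expert prior biases cancel in the average.
assert np.allclose(ensemble_logits, balanced_logits)
```

The same cancellation holds for any set of target priors whose log-deviations sum to zero, which is the freedom the ensemble design exploits.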

In Paper B, we investigate long-tailed recognition in the semi-supervised setting, where a small, imbalanced labeled set is paired with a large unlabeled pool. Semi-supervised learning leverages unlabeled data to reduce annotation costs, typically through pseudo-labeling, but the unlabeled class distribution is often unknown and skewed. Naïve pseudo-labeling propagates the labeled bias, reinforcing head classes and overlooking rare ones. We propose a flexible distribution-alignment framework that estimates the unlabeled class mix online and reweights pseudo-labels accordingly, guiding the model first toward the unlabeled distribution to stabilize training and then toward a balanced classifier for fair inference. The proposed approach leverages unlabeled data more effectively, improving accuracy, calibration, and robustness to unknown unlabeled priors.
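The alignment step can be sketched as a simple reweighting of predicted probabilities: divide out an online estimate of the unlabeled prior, multiply in a target prior that moves from that estimate toward uniform as training progresses, and renormalize. This is an illustrative simplification under stated assumptions; the function names and the linear interpolation schedule are not the paper's exact formulation.

```python
import numpy as np

def align_pseudo_probs(probs, est_prior, target_prior, eps=1e-8):
    """Reweight predictions: divide out the estimated unlabeled prior,
    multiply in the target prior, and renormalize."""
    w = target_prior / (est_prior + eps)
    aligned = probs * w
    return aligned / aligned.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_classes = 3
probs = rng.dirichlet(np.ones(n_classes), size=5)  # predictions on unlabeled data

# Online prior estimate (in practice an EMA of the average prediction).
est_prior = probs.mean(axis=0)

# Curriculum: target moves from the estimated unlabeled prior (stabilizes
# early training) toward uniform (balanced classifier at the end).
progress = 0.5  # fraction of training completed
uniform = np.full(n_classes, 1.0 / n_classes)
target_prior = (1.0 - progress) * est_prior + progress * uniform

aligned = align_pseudo_probs(probs, est_prior, target_prior)
pseudo_labels = aligned.argmax(axis=-1)
```

At `progress = 0` the pseudo-labels follow the estimated unlabeled mix; at `progress = 1` they are debiased toward a balanced classifier.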

In Paper C, we move beyond recognition to unified multimodal retrieval for remote sensing—a domain with scarce image–text annotations and a challenging shift from natural images. Prior solutions are fragmented: RS dual encoders lack interleaved input support; universal embedders miss spatial metadata and degrade under domain shift; and RS generative assistants reason over regions but lack scalable retrieval. To overcome these limitations, we introduce VLM2GeoVec, a single-encoder, instruction-following embedder that aligns images, text, regions, and geocoordinates in a shared space. For comprehensive evaluation, we also propose RSMEB, a unified retrieval benchmark that spans conventional tasks (e.g., classification, cross-modal retrieval) and novel interleaved tasks (e.g., visual grounding, spatial localization, semantic geo-localization). In RSMEB, VLM2GeoVec narrows the domain gap relative to universal embedders and matches specialized baselines in conventional tasks in zero-shot settings. It further enables interleaved spatially-aware search, delivering several-fold gains in metadata-aware RS applications.
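Once every modality is mapped into one shared space, retrieval reduces to nearest-neighbor search by cosine similarity. The sketch below shows only that retrieval primitive over random placeholder embeddings; the encoder itself, and all names here, are illustrative assumptions rather than anything from the source.

```python
import numpy as np

def retrieve(query_emb, index_embs, k=3):
    """Rank indexed items by cosine similarity to a query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    scores = idx @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 64))  # placeholder for encoder outputs
# A query embedding near item 42 (e.g., text or region mapped nearby).
query = image_embs[42] + 0.01 * rng.normal(size=64)

top, scores = retrieve(query, image_embs, k=3)
assert top[0] == 42
```

Because queries and index items live in the same space, the same index serves text-to-image, region-to-image, or interleaved queries without modality-specific machinery.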

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025, p. 67
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2487
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-219564
DOI: 10.3384/9789181183085
ISBN: 9789181183078 (print)
ISBN: 9789181183085 (electronic)
OAI: oai:DiVA.org:liu-219564
DiVA, id: diva2:2014470
Public defence
2025-12-17, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Opponent
Supervisors
Note

Funding agency: The Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation

Available from: 2025-11-18. Created: 2025-11-18. Last updated: 2025-11-18. Bibliographically approved
List of papers
1. Balanced Product of Calibrated Experts for Long-Tailed Recognition
2023 (English). In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2023, p. 19967-19977. Conference paper, Published paper (Refereed)
Abstract [en]

Many real-world recognition problems are characterized by long-tailed label distributions. These distributions make representation learning highly challenging due to limited generalization over the tail classes. If the test distribution differs from the training distribution, e.g. uniform versus long-tailed, the problem of the distribution shift needs to be addressed. A recent line of work proposes learning multiple diverse experts to tackle this issue. Ensemble diversity is encouraged by various techniques, e.g. by specializing different experts in the head and the tail classes. In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE). BalPoE combines a family of experts with different test-time target distributions, generalizing several previous approaches. We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions, by proving that the ensemble is Fisher-consistent for minimizing the balanced error. Our theoretical analysis shows that our balanced ensemble requires calibrated experts, which we achieve in practice using mixup. We conduct extensive experiments and our method obtains new state-of-the-art results on three long-tailed datasets: CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018. Our code is available at https://github.com/emasa/BalPoE-CalibratedLT.
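For a single expert, the logit-adjustment idea the paper extends can be illustrated as cross-entropy computed on prior-shifted logits, whose minimizer is consistent for balanced error. This is a generic NumPy illustration of logit-adjusted cross-entropy, not the BalPoE implementation; all names are assumptions.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_prior, tau=1.0):
    """Cross-entropy with logit adjustment: add tau * log(prior) to the
    logits so the minimizer targets balanced (per-class) error."""
    adjusted = logits + tau * np.log(class_prior)
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)  # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))
labels = rng.integers(0, 4, size=8)
prior = np.array([0.7, 0.2, 0.07, 0.03])  # long-tailed training prior

loss = logit_adjusted_ce(logits, labels, prior)
```

Varying `tau` per expert changes the expert's effective test-time target distribution, which is the knob the ensemble construction builds on.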

Place, publisher, year, edition, pages
IEEE Computer Society, 2023
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-199347 (URN)
10.1109/CVPR52729.2023.01912 (DOI)
001062531304028 ()
9798350301298 (ISBN)
9798350301304 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 17-24, 2023
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2023-11-28 Created: 2023-11-28 Last updated: 2025-11-18
2. Flexible Distribution Alignment: Towards Long-Tailed Semi-supervised Learning with Proper Calibration
2024 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIV / [ed] Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, Springer Nature Switzerland, 2024, Vol. 15112, p. 307-327. Conference paper, Published paper (Refereed)
Abstract [en]

Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
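The distillation-based consistency idea, training the student on the teacher's soft targets so underconfident samples still contribute a gradient, can be illustrated with a soft cross-entropy. This is a generic sketch, not ADELLO's exact loss; names are assumptions.

```python
import numpy as np

def distill_consistency(student_logits, teacher_probs):
    """Soft cross-entropy from teacher soft targets to the student. Unlike
    hard pseudo-labeling with a confidence threshold, low-confidence teacher
    outputs are not discarded; they still contribute to the loss."""
    s = student_logits - student_logits.max(axis=1, keepdims=True)  # stability
    log_q = s - np.log(np.exp(s).sum(axis=1, keepdims=True))        # log-softmax
    return -(teacher_probs * log_q).sum(axis=1).mean()

rng = np.random.default_rng(0)
student_logits = rng.normal(size=(6, 4))
teacher_probs = rng.dirichlet(np.ones(4), size=6)  # teacher soft targets

loss = distill_consistency(student_logits, teacher_probs)
```

Combined with the distribution-alignment step, this keeps rare-class samples, which tend to be underconfident, in play during training.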

Place, publisher, year, edition, pages
Springer Nature Switzerland, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15112
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-209223 (URN)
10.1007/978-3-031-72949-2_18 (DOI)
001352860600018 ()
2-s2.0-85208545165 (Scopus ID)
9783031729485 (ISBN)
9783031729492 (ISBN)
Conference
18th European Conference on Computer Vision (ECCV), Milan, Italy, September 29–October 4, 2024
Note

Funding Agencies|Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-11-06 Created: 2024-11-06 Last updated: 2025-11-18

Open Access in DiVA

fulltext (10062 kB), 114 downloads
File information
File name: FULLTEXT01.pdf
File size: 10062 kB
Checksum (SHA-512): 57dcc38f718de7a91efc0cd845e595a2067a0bc445efa1be14fc57c3f1f2e40f23ad60e81b87fa9dfc74f57d6e5e7f006a68f177b9a96d389dad0682a0fb09bc
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text

Authority records

Sánchez Aimar, Emanuel

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1412 hits