Learning Robot Vision under Insufficient Data
Jonnarth, Arvi
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-3434-2522
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Machine learning is used today in a wide variety of applications, especially within computer vision, robotics, and autonomous systems. Example use cases include detecting people or other objects using cameras in autonomous vehicles, or navigating robots through collision-free paths to solve different tasks. The flexibility of machine learning is attractive as it can be applied to a wide variety of challenging tasks, without detailed prior knowledge of the problem domain. However, training machine learning models requires vast amounts of data, which leads to a significant manual effort, both for collecting the data and for annotating it. 

In this thesis, we study and develop methods for training machine learning models under insufficient data within computer vision, robotics, and autonomous systems, for the purpose of reducing the manual effort. In summary, we study (1) weakly-supervised learning for reducing the annotation cost, (2) methods for reducing model bias under highly imbalanced training data, (3) methods for obtaining trustworthy uncertainty estimates, and (4) the use of simulated and semi-virtual environments for reducing the amount of real-world data in reinforcement learning.

In the first part of this thesis, we investigate how weakly-supervised learning can be used within image segmentation. In contrast to fully supervised learning, weakly-supervised learning uses a weaker form of annotation, which reduces the annotation effort. Typically, in image segmentation, each object needs to be precisely annotated in every image on the pixel level. Creating this type of annotation is both time-consuming and costly. In weakly-supervised segmentation, however, the only information required is which objects are depicted in the images. This significantly reduces the annotation time. In Papers A and B, we propose two loss functions for improving the predicted object segmentations, especially their contours, in weakly-supervised segmentation.

In the next part of the thesis, we tackle class imbalance in image classification. During data collection, some classes naturally occur more frequently than others, which leads to an imbalance in the amount of data between the different classes. Models trained on such datasets may become biased towards the more common classes. Overcoming this effect by collecting more data of the rare classes may take a very long time. Instead, we develop an ensemble method for image classification in Paper C, which is unbiased despite being trained on highly imbalanced data. 

When using machine learning models within autonomous systems, a desirable property is that they predict trustworthy uncertainty estimates. This is especially important when the training data is limited, as the probability of encountering previously unseen cases is large. In short, a model making a prediction with a certain confidence should be correct with the corresponding probability. This is not the case in general, as machine learning models are notorious for predicting overconfident uncertainty estimates. We apply methods for improving the uncertainty estimates for classification in Paper C and for regression in Paper D.
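The requirement that confidence should match accuracy can be made concrete with the expected calibration error (ECE), a standard metric not specific to this thesis. The following minimal NumPy sketch bins predictions by confidence and compares the average confidence to the accuracy within each bin:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| over equally spaced
    confidence bins, weighted by the fraction of samples per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# A perfectly calibrated model: 80%-confident predictions
# are correct 80% of the time, so the ECE is zero.
conf = np.full(10, 0.8)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(expected_calibration_error(conf, hits))  # -> 0.0
```

An overconfident model, in contrast, would place most samples in high-confidence bins whose empirical accuracy falls short, yielding a large ECE.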

In the final part of this thesis, we utilize reinforcement learning for teaching a robot to perform coverage path planning, e.g. for lawn mowing or search-and-rescue. In reinforcement learning, the robot interacts with an environment and gets rewards based on how well it solves the task. Initially, its actions are random, but they improve over time as it explores the environment and gathers data. It typically takes a long time for this learning process to converge. This is problematic in real-world environments where the robot needs to operate during the full duration, which may require human supervision. At the same time, a large variety in the training data is important for generalisation, which is difficult to achieve in real-world environments. Instead, we utilize a simulated environment in Paper E for accelerating the training process, where we procedurally generate random environments. To simplify the transfer from simulation to reality, we fine-tune the model in a semi-virtual indoor environment on the real robot in Paper F.

Abstract [sv] (English translation)

Machine learning is used widely today in many areas, in particular within computer vision, robotics, and autonomous systems. It can, for example, be used to detect people and other objects with cameras in autonomous cars, or to steer robots along collision-free paths to solve various tasks. The flexibility of machine learning is attractive, as it can be applied to solve difficult problems without detailed knowledge of the problem domain in question. However, large amounts of data are required to train machine learning models, which entails a large manual workload, both for collecting the data and for annotating it.

In this thesis, we investigate and develop methods for training machine learning models with limited access to data within computer vision, robotics, and autonomous systems, with the aim of reducing the manual workload. In summary, we investigate (1) weakly-supervised learning for reducing the annotation time, (2) methods that are unbiased under highly imbalanced data, (3) methods for obtaining reliable uncertainty estimates, and (4) simulated and semi-virtual environments for reducing the amount of real-world data in reinforcement learning.

In the first part of the thesis, we investigate how weakly-supervised learning can be used within image segmentation. In contrast to fully supervised learning, a weaker form of annotation is used, which reduces the manual annotation burden. For image segmentation, a precise annotation of each individual object in every image at the pixel level is normally required. Creating this type of annotation is both time-consuming and costly. With weakly-supervised learning, only knowledge of which types of objects appear in each image is required, which significantly reduces the annotation time. In Papers A and B, we design two loss functions adapted to better segment the objects of interest, in particular their contours.

In the next part, we address an undesired effect that can arise during data collection. Some classes naturally occur more often than others, which leads to an imbalance in the amount of data between different classes. A model trained on such a dataset can become biased towards the classes that occur more frequently. If some classes are rare, it can take a very long time to collect enough data to overcome this effect. To counteract the effect in image classification, we develop an ensemble method in Paper C that is unbiased despite being trained on highly imbalanced data.

For machine learning models to be useful within autonomous systems, it is advantageous if they can estimate their uncertainty reliably. This is particularly important when training data is limited, since the probability increases that unknown situations arise which the model has not seen during training. In short, a model that makes a prediction with a certain confidence should be correct with the corresponding probability. This is generally not the case for machine learning models; instead, they tend to be overconfident. We apply methods for improving the uncertainty estimates for classification in Paper C and for regression in Paper D.

In the final part of the thesis, we investigate how reinforcement learning can be applied to teach a robot coverage path planning, for example for lawn mowing or for finding missing persons. During reinforcement learning, the robot interacts with the intended environment and receives rewards based on how well it performs the task. Initially, its actions are random, but they improve over time. In many cases this takes a very long time, which is problematic in real-world environments since the robot needs to be kept in operation throughout the entire training process. At the same time, varied training environments are important for generalisation to new environments, which is difficult to achieve. Instead, we use a simulated environment in Paper E to speed up the training process, where we make use of randomly generated environments. To then simplify the transfer from simulation to reality, we fine-tune the model in a semi-virtual indoor environment in Paper F.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2024, p. 57
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2397
National Category
Robotics
Identifiers
URN: urn:nbn:se:liu:diva-207606
DOI: 10.3384/9789180757218
ISBN: 9789180757201 (print)
ISBN: 9789180757218 (electronic)
OAI: oai:DiVA.org:liu-207606
DiVA, id: diva2:1897561
Public defence
2024-10-11, Ada Lovelace, B-building, Campus Valla, Linköping, 10:15 (English)
Available from: 2024-09-13 Created: 2024-09-13 Last updated: 2024-10-02. Bibliographically approved
List of papers
1. Importance Sampling CAMs for Weakly-Supervised Segmentation
2022 (English) In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2022, p. 2639-2643. Conference paper, Published paper (Refereed)
Abstract [en]

Classification networks can be used to localize and segment objects in images by means of class activation maps (CAMs). However, without pixel-level annotations, classification networks are known to (1) mainly focus on discriminative regions, and (2) to produce diffuse CAMs without well-defined prediction contours. In this work, we approach both problems with two contributions for improving CAM learning. First, we incorporate importance sampling based on the class-wise probability mass function induced by the CAMs to produce stochastic image-level class predictions. This results in CAMs which activate over a larger extent of objects. Second, we formulate a feature similarity loss term which aims to match the prediction contours with edges in the image. As a third contribution, we conduct experiments on the PASCAL VOC 2012 benchmark dataset to demonstrate that these modifications significantly increase the performance in terms of contour accuracy, while being comparable to current state-of-the-art methods in terms of region similarity.
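The importance-sampling idea can be illustrated in a few lines: instead of averaging each CAM over all spatial locations (as global average pooling does), one location is sampled from the probability mass function the CAM induces, and its activation becomes the stochastic class score. This is a simplified sketch of the idea, not the authors' implementation:

```python
import numpy as np

def sampled_class_scores(cams, rng):
    """For each class, normalize its CAM into a spatial pmf, sample one
    location from it, and use the activation there as the image-level
    class score. Because high-activation regions are sampled more often
    but not exclusively, the CAM is encouraged to activate over a larger
    extent of the object than with global average pooling.

    cams: array of shape (C, H, W) with non-negative activations."""
    C, H, W = cams.shape
    flat = cams.reshape(C, -1)
    pmf = flat / flat.sum(axis=1, keepdims=True)  # per-class spatial pmf
    idx = np.array([rng.choice(H * W, p=pmf[c]) for c in range(C)])
    return flat[np.arange(C), idx]  # one sampled activation per class

rng = np.random.default_rng(0)
cams = rng.random((3, 8, 8))  # toy CAMs for 3 classes
scores = sampled_class_scores(cams, rng)
print(scores.shape)  # (3,)
```

In training, such stochastic scores would be fed to the classification loss in place of the pooled averages.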

Place, publisher, year, edition, pages
IEEE, 2022
Series
International Conference on Acoustics, Speech and Signal Processing (ICASSP), ISSN 1520-6149
Keywords
weakly supervised; semantic segmentation; importance sampling; feature similarity; class activation maps
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-190963 (URN)
10.1109/ICASSP43922.2022.9746641 (DOI)
000864187902183 ()
9781665405409 (ISBN)
9781665405416 (ISBN)
Conference
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, May 22-27, 2022
Note

Funding agencies: Wallenberg AI, Autonomous Systems and Software Program (WASP) - KAW foundation; SNIC - VR [2018-05973]

Available from: 2023-01-10 Created: 2023-01-10 Last updated: 2024-09-13
2. High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
2024 (English) In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 999-1008. Conference paper, Published paper (Refereed)
Abstract [en]

Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset further show that our proposed add-on is well-suited for large-scale settings. Our code implementation is available at https://github.com/arvijj/hfpl.
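The shift from a single multinomial posterior to multiple binomial posteriors can be illustrated as follows. This is a generic sketch of the two posterior forms, not the paper's code: with a softmax, classes compete for probability mass, whereas independent sigmoids let several classes be confidently present at once, which suits multi-label images.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def multinomial_posterior(logits):
    """Softmax posterior: classes are mutually exclusive,
    so the probabilities sum to one."""
    return softmax(logits)

def binomial_posteriors(logits):
    """Independent sigmoid posteriors: each class is its own
    binary present/absent problem."""
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([3.0, 3.0, -2.0])   # two objects present in the image
print(multinomial_posterior(logits))  # the two positives split the mass
print(binomial_posteriors(logits))    # both positives confident independently
```

Under the softmax, the two present classes each get roughly half of the probability mass, while the sigmoid formulation assigns each a confidence above 0.9 on its own.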

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
weakly supervised, semantic segmentation, importance sampling, feature similarity, class activation maps
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-202446 (URN)
10.1109/WACV57701.2024.00105 (DOI)
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, Jan 3-8, 2024
Available from: 2024-04-15 Created: 2024-04-15 Last updated: 2024-09-13. Bibliographically approved
3. Balanced Product of Calibrated Experts for Long-Tailed Recognition
2023 (English) In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2023, p. 19967-19977. Conference paper, Published paper (Refereed)
Abstract [en]

Many real-world recognition problems are characterized by long-tailed label distributions. These distributions make representation learning highly challenging due to limited generalization over the tail classes. If the test distribution differs from the training distribution, e.g. uniform versus long-tailed, the problem of the distribution shift needs to be addressed. A recent line of work proposes learning multiple diverse experts to tackle this issue. Ensemble diversity is encouraged by various techniques, e.g. by specializing different experts in the head and the tail classes. In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE). BalPoE combines a family of experts with different test-time target distributions, generalizing several previous approaches. We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions, by proving that the ensemble is Fisher-consistent for minimizing the balanced error. Our theoretical analysis shows that our balanced ensemble requires calibrated experts, which we achieve in practice using mixup. We conduct extensive experiments and our method obtains new state-of-the-art results on three long-tailed datasets: CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018. Our code is available at https://github.com/emasa/BalPoE-CalibratedLT.
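The core mechanism, logit adjustment by the log class prior, can be sketched as below. The ensemble combination shown is a simplified average of adjusted expert logits under illustrative priors and temperatures, not the exact BalPoE formulation:

```python
import numpy as np

def adjusted_logits(logits, class_priors, tau=1.0):
    """Logit adjustment: subtract tau * log(prior) so that rare (tail)
    classes are no longer penalized by their low training frequency."""
    return logits - tau * np.log(class_priors)

def balanced_ensemble(expert_logits, class_priors, taus):
    """Combine experts aimed at different test-time target distributions
    by averaging their adjusted logits (a product of experts in
    probability space corresponds to a sum in log space)."""
    adjusted = [adjusted_logits(z, class_priors, t)
                for z, t in zip(expert_logits, taus)]
    return np.mean(adjusted, axis=0)

priors = np.array([0.90, 0.09, 0.01])  # long-tailed training distribution
logits = np.array([1.0, 1.0, 1.0])     # model output with no clear preference
print(np.argmax(adjusted_logits(logits, priors)))  # -> 2 (the tail class)
```

The adjustment makes the decision rule consistent with a balanced (uniform) test distribution; the paper's analysis shows this only yields unbiased ensemble predictions when the individual experts are calibrated.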

Place, publisher, year, edition, pages
IEEE Computer Society, 2023
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-199347 (URN)
10.1109/CVPR52729.2023.01912 (DOI)
001062531304028 ()
9798350301298 (ISBN)
9798350301304 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, Jun 17-24, 2023
Note

Funding agencies: Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2023-11-28 Created: 2023-11-28 Last updated: 2024-09-13
4. Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning
2024 (English) In: Proceedings of the 41st International Conference on Machine Learning / [ed] Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp, PMLR, 2024, p. 22491-22508. Conference paper, Published paper (Refereed)
Abstract [en]

Coverage path planning (CPP) is the problem of finding a path that covers the entire free space of a confined area, with applications ranging from robotic lawn mowing to search-and-rescue. When the environment is unknown, the path needs to be planned online while mapping the environment, which cannot be addressed by offline planning methods that do not allow for a flexible path space. We investigate how suitable reinforcement learning is for this challenging problem, and analyze the involved components required to efficiently learn coverage paths, such as action space, input feature representation, neural network architecture, and reward function. We propose a computationally feasible egocentric map representation based on frontiers, and a novel reward term based on total variation to promote complete coverage. Through extensive experiments, we show that our approach surpasses the performance of both previous RL-based approaches and highly specialized methods across multiple CPP variations.
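A total-variation term can promote complete coverage because fragmented, hole-filled coverage maps have a large total variation, while compact coverage has a small one. The following sketch is illustrative only (the paper defines its reward over its own map representation); it computes the anisotropic total variation of a binary coverage grid:

```python
import numpy as np

def total_variation(coverage):
    """Anisotropic total variation of a binary coverage map: the number
    of horizontally or vertically adjacent cell pairs that disagree.
    A reward can penalize this quantity to discourage leaving holes."""
    c = coverage.astype(int)
    dx = np.abs(np.diff(c, axis=1)).sum()  # horizontal transitions
    dy = np.abs(np.diff(c, axis=0)).sum()  # vertical transitions
    return dx + dy

# Two maps covering the same number of cells, 18 each:
compact = np.zeros((6, 6), dtype=int)
compact[:3, :] = 1          # one solid covered block
fragmented = np.zeros((6, 6), dtype=int)
fragmented[::2, ::2] = 1    # scattered covered cells
fragmented[1::2, 1::2] = 1
print(total_variation(compact), total_variation(fragmented))
```

Both maps cover half of the grid, but the scattered pattern has a much larger total variation, so a reward term of the form `-lambda * total_variation(coverage)` (with `lambda` a tunable weight, our notation) would steer the agent towards contiguous, complete coverage.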

Place, publisher, year, edition, pages
PMLR, 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 235
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-207087 (URN)
Conference
International Conference on Machine Learning, 21-27 July 2024, Vienna, Austria
Note

Funding agencies: the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg (KAW) Foundation; the Vinnova project, human centered autonomous regional airport, Dnr 2022-02678. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the KAW Foundation at the National Supercomputer Centre (NSC).

Available from: 2024-08-30 Created: 2024-08-30 Last updated: 2024-09-13
5. Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks
2024 (English) In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2024, p. 3471-3480. Conference paper, Published paper (Refereed)
Abstract [en]

Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values and here it is common to detect uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm which can represent multimodal distributions, without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence, in the absence of full ground truth distributions. In order to alleviate these issues, we propose hinge-Wasserstein – a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes, and allows training on datasets where full ground truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.
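The plain Wasserstein-1 loss between 1D histograms, and one way a hinge can weaken its penalty, can be sketched as follows. The hinged variant below is an illustrative assumption on our part, not necessarily the paper's exact definition: per-bin CDF differences under a margin go unpenalized, so weak secondary modes in the prediction are not pushed to zero.

```python
import numpy as np

def wasserstein1(p, q):
    """W1 distance between two 1D histograms on the same unit-spaced
    bins: the L1 distance between their cumulative distributions."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def hinge_wasserstein(p, q, eps=0.1):
    """Hinged variant (illustrative): CDF differences smaller than the
    margin eps incur no loss, tolerating weak secondary modes."""
    return np.maximum(np.abs(np.cumsum(p) - np.cumsum(q)) - eps, 0.0).sum()

target = np.array([0.00, 1.00, 0.00, 0.00])  # one-hot ground truth bin
pred = np.array([0.05, 0.85, 0.05, 0.05])    # keeps weak secondary modes
print(wasserstein1(pred, target), hinge_wasserstein(pred, target))
```

The plain W1 loss penalizes the small side modes even though the prediction's main mode is correct; the hinged loss lets them survive, which is what allows multimodal distributions to be learned without full ground truth distributions.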

Place, publisher, year, edition, pages
IEEE, 2024
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-208088 (URN)
10.1109/cvprw63382.2024.00351 (DOI)
9798350365474 (ISBN)
9798350365481 (ISBN)
Conference
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17-18 June 2024
Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2024-10-02

Open Access in DiVA

fulltext (PDF, 9309 kB)

Authority records

Jonnarth, Arvi
