Evidential Deep Learning for Class-Incremental Semantic Segmentation
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-8677-8715
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Office of the National Police Commissioner, The Swedish Police Authority, Sweden. ORCID iD: 0000-0001-5094-5844
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6096-3648
2023 (English). In: Image Analysis. SCIA 2023 / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, p. 32-48. Conference paper, Published paper (Refereed)
Abstract [en]

Class-Incremental Learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful if the system is able to classify new objects despite the original training data being unavailable. Although the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background, past and future classes are correlated and there exists a background-shift.

In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated by the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with increasing number of increments.
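The factorization described above can be illustrated with a minimal sketch. It assumes the common Evidential Deep Learning convention in which non-negative evidence per class gives Dirichlet parameters alpha_k = evidence_k + 1 and the uncertainty is K divided by the Dirichlet strength. The softplus evidence function and the names `softplus` and `evidential_pixel` are illustrative assumptions, not the authors' implementation, and the paper's exact formulation may differ.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus; maps a raw logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_pixel(logits):
    """Return (expected_probs, uncertainty) for the foreground logits of one pixel.

    Assumptions (not taken from the paper): evidence = softplus(logit),
    alpha_k = evidence_k + 1, and uncertainty = K / sum(alpha).
    """
    alpha = [softplus(z) + 1.0 for z in logits]   # Dirichlet concentration parameters
    strength = sum(alpha)                          # Dirichlet strength S
    expected = [a / strength for a in alpha]       # expected value of the Dirichlet
    uncertainty = len(alpha) / strength            # mass left for "unknown"/background
    return expected, uncertainty


# A pixel with strong evidence for class 0 versus a pixel with almost no evidence.
confident_probs, confident_u = evidential_pixel([10.0, -10.0, -10.0])
unsure_probs, unsure_u = evidential_pixel([-10.0, -10.0, -10.0])
print(confident_u, unsure_u)
```

In this sketch a pixel with little evidence for any foreground class ends up with an uncertainty close to 1, which plays the role of the background score without a dedicated background output.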

Place, publisher, year, edition, pages
Springer, 2023. p. 32-48
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13886
Keywords [en]
Class-incremental learning, Continual-learning, Semantic Segmentation
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-193265; DOI: 10.1007/978-3-031-31438-4_3; ISBN: 9783031314377 (print); ISBN: 9783031314384 (electronic); OAI: oai:DiVA.org:liu-193265; DiVA, id: diva2:1753366
Conference
SCIA 2023, 23rd Scandinavian Conference on Image Analysis. Sirkka, Finland, April 18–21, 2023
Available from: 2023-04-26. Created: 2023-04-26. Last updated: 2024-04-27. Bibliographically approved
In thesis
1. Data-Driven Robot Perception in the Wild
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

As technology continues to advance, interest in relieving humans of tedious or dangerous tasks through automation increases. Some of the tasks that have received increasing attention are autonomous driving, disaster relief, and forestry inspection. Developing and deploying an autonomous robotic system in this type of unconstrained environment, in a safe way, is highly challenging. The system requires precise control and high-level decision making, both of which require a robust and reliable perception system to understand the surroundings correctly.

The main purpose of perception is to extract meaningful information from the environment, be it in the form of 3D maps, dense classification of the types of objects and surfaces, or high-level information about the position and direction of moving objects. Depending on the limitations and application of the system, various types of sensors can be used: lidars, to collect sparse depth information; cameras, to collect dense information for different parts of the visual spectrum, often the red-green-blue (RGB) bands; Inertial Measurement Units (IMUs), to estimate the ego motion; microphones, to interact with and respond to humans; and GPS receivers, to get global position information, to mention just a few.

This thesis investigates some of what is needed to meet the requirements of this type of system. Specifically, it focuses on data-driven approaches, that is, machine learning, which has been shown time and again to be the main contender for high-performance perception tasks in recent years. Although precision requirements might be high in industrial production plants, the environment there is relatively controlled and the task is fixed. This thesis instead studies some of the aspects necessary for complex, unconstrained environments, primarily outdoors and potentially near humans or other systems. The term in the wild refers exactly to the unconstrained nature of these environments, where the system can easily encounter something previously unseen and might interact with unknowing humans. Some examples of such environments are city traffic, disaster relief scenarios, and dense forests.

This thesis will mainly focus on the following three key aspects necessary to handle the types of tasks and situations that could occur in the wild: 1) generalizing to a new environment, 2) adapting to new tasks and requirements, and 3) modeling uncertainty in the perception system. 

First, a robotic system should be able to generalize to new environments and still function reliably. Papers B and G address this by using an intermediate representation that allows the system to handle much more diverse types of environment than would otherwise be possible. Paper B also investigates how robust the proposed autonomous driving system is to incorrect predictions, which are a likely result of changing the environment.

Second, a robot should be sufficiently adaptive to allow it to learn new tasks without forgetting the previous ones. Paper E proposed a way to allow incrementally adding new semantic classes to a trained model without access to the previous training data. The approach is based on utilizing the uncertainty in the predictions to model the unknown classes, marked as background. 

Finally, the perception system will always be partially flawed, either because of the lack of modeling capabilities or because of ambiguities in the sensor data. To properly take this into account, it is fundamental that the system has the ability to estimate the certainty in the predictions. Paper F proposed a method for predicting the uncertainty in the model predictions when interpolating sparse data. Paper G addresses the ambiguities that exist when estimating the 3D pose of a human from a single camera image. 

Abstract [sv]

As technology advances, interest is growing in relieving humans by automating certain dangerous or strenuous tasks. Some of the areas with potential for automation are: transport, through self-driving cars; rescue work in connection with disasters; and inventory of forests and similar. These kinds of complicated and potentially dangerous environments require advanced decision systems as well as precise control systems. Both of these parts require a robust and reliable perception of the surroundings.

The main purpose of perception is to extract meaningful information from the surroundings that can facilitate the planning and execution of different types of tasks. The information can take the form of 3D maps, detailed information about the type of ground surface, and information about individual objects in the form of their position and motion. An autonomous system can be constructed in several ways, but some of the commonly used sensors are: lidar, to collect sparse 3D measurements of the ground and obstacles; cameras, to collect colour or temperature information from objects in the surroundings; IMU, to estimate how the system moves; and GPS, to position the system outdoors in a global coordinate system.

This thesis investigates some of the components required to meet the perception requirements that exist. The focus of the thesis is on machine learning, which has been shown to handle many advanced tasks robustly. The thesis does not focus on the high-precision requirements found in industrial manufacturing; instead, the focus is on handling the complicated and challenging environments classed as in the wild. Some examples of this type of environment are: city traffic, disaster areas, and dense forests.

Three aspects of the problem are treated in this thesis: 1) generalization to other environments, 2) adaptation to new tasks and environments, and 3) modeling possible uncertainties.

An autonomous system should preferably not be limited to one type of environment; for example, a self-driving car should not only be able to handle bright sunshine on motorways in good condition. Papers B and G address this to some extent by separating the task into two subproblems, where the first generates input data for the second. Training data for the first subproblem is easier to collect from varied environments, which makes it more general than if only training data for the whole problem is available. Paper B also investigates how sources of error in this representation affect the system as a whole.

An autonomous system should also be designed so that it can be adapted to new tasks efficiently. Paper E studied this problem from the perspective of extending the set of classes known to the system without retraining it completely.

Finally, one has to accept that perception will never be perfect in all types of environments; there will always be some uncertainty. This uncertainty can come partly from the model itself, but it is also possible that the sensor data is insufficient to determine which of several possibilities is the true one. Paper F designed a system able to estimate the uncertainty in its estimates, while Paper G focuses on how to handle the uncertainty about a person's pose when part of the body is occluded.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2023. p. 45
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2293
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-192087 (URN); 10.3384/9789180750677 (DOI); 9789180750660 (ISBN); 9789180750677 (ISBN)
Public defence
2023-03-31, Ada Lovelace, B-building and online via: https://liu-se.zoom.us/j/63470801417, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: the European Union's Horizon 2020 Program; Sweden's Innovation Agency (Vinnova); the Swedish Research Council (VR); and the Swedish Foundation for Strategic Research (SSF).

Available from: 2023-03-01. Created: 2023-03-01. Last updated: 2023-04-26. Bibliographically approved

Open Access in DiVA

fulltext (2815 kB), 19 downloads
File information
File name: FULLTEXT01.pdf
File size: 2815 kB
Checksum SHA-512: d169b45d217707267d85ed4c9176cfb9f8ee265d8b6acce3fc814a770c246364a08a32dd1bdea83ca96f4561ceeaee12ebdcf55d8705b881bfa08695669246a3
Type: fulltext
Mimetype: application/pdf


Authority records

Holmquist, Karl; Klasén, Lena; Felsberg, Michael
