Publications (6 of 6)
Holmquist, K. (2023). Data-Driven Robot Perception in the Wild. (Doctoral dissertation). Linköping: Linköping University Electronic Press
Data-Driven Robot Perception in the Wild
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

As technology continues to advance, interest grows in relieving humans of tedious or dangerous tasks through automation. Tasks that have received increasing attention include autonomous driving, disaster relief, and forestry inspection. Developing and safely deploying an autonomous robotic system in this type of unconstrained environment is highly challenging. The system requires precise control and high-level decision making, both of which depend on a robust and reliable perception system to understand the surroundings correctly.

The main purpose of perception is to extract meaningful information from the environment, be it in the form of 3D maps, dense classification of object and surface types, or high-level information about the position and direction of moving objects. Depending on the limitations and application of the system, various types of sensors can be used: lidars, to collect sparse depth information; cameras, to collect dense information for different parts of the visual spectrum, often the red-green-blue (RGB) bands; inertial measurement units (IMUs), to estimate the ego-motion; microphones, to interact with and respond to humans; and GPS receivers, to get global position information, to mention just a few.

This thesis investigates some of the necessities of approaching the requirements of this type of system. Specifically, it focuses on data-driven approaches, that is, machine learning, which has been shown time and again in recent years to be the leading approach for high-performance perception tasks. Although precision requirements might be high in industrial production plants, the environment there is relatively controlled and the task is fixed. Instead, this thesis studies some of the aspects necessary for complex, unconstrained environments, primarily outdoors and potentially near humans or other systems. The term in the wild refers exactly to the unconstrained nature of these environments, where the system can easily encounter something previously unseen and might interact with unknowing humans. Examples of such environments are city traffic, disaster relief scenarios, and dense forests.

This thesis will mainly focus on the following three key aspects necessary to handle the types of tasks and situations that could occur in the wild: 1) generalizing to a new environment, 2) adapting to new tasks and requirements, and 3) modeling uncertainty in the perception system. 

First, a robotic system should be able to generalize to new environments and still function reliably. Papers B and G address this by using an intermediate representation that allows the system to handle much more diverse environments than otherwise possible. Paper B also investigates how robust the proposed autonomous driving system is to incorrect predictions, which are a likely result of a change of environment.

Second, a robot should be sufficiently adaptive to allow it to learn new tasks without forgetting the previous ones. Paper E proposes a way to incrementally add new semantic classes to a trained model without access to the previous training data. The approach utilizes the uncertainty in the predictions to model the unknown classes, marked as background.

Finally, the perception system will always be partially flawed, either because of limited modeling capabilities or because of ambiguities in the sensor data. To properly take this into account, it is fundamental that the system can estimate the certainty of its predictions. Paper F proposes a method for predicting the uncertainty of the model predictions when interpolating sparse data. Paper G addresses the ambiguities that arise when estimating the 3D pose of a human from a single camera image.

Abstract [sv]

As technology advances, interest grows in easing the burden on humans by automating certain dangerous or strenuous tasks. Some of the areas with potential for automation are transport, through self-driving cars; rescue work in connection with disasters; and inventory of forests and the like. These kinds of complicated and potentially dangerous environments require advanced decision-making systems as well as precise control systems. Both of these parts require robust and reliable perception of the surroundings.

The main purpose of perception is to extract meaningful information from the surroundings that can facilitate the planning and execution of different types of tasks. The information itself can be in the form of 3D maps, detailed information about the type of surface, and information about individual objects in terms of their position and motion. An autonomous system can be constructed in several ways, but some of the commonly used sensors are: lidar, to collect sparse 3D measurements of surfaces and obstacles; cameras, to collect color or temperature information from objects in the surroundings; IMUs, to estimate how the system moves; and GPS, to position the system outdoors in a global coordinate system.

This thesis investigates some of the components required to fulfill the perception requirements that exist. The focus of the thesis is on machine learning, which has been shown to handle many advanced tasks in a robust way. The thesis does not focus on the high-precision requirements found in industrial manufacturing; instead, the focus is on handling the complicated and challenging environments classified as in the wild. Some examples of this type of environment are city traffic, disaster areas, and dense forests.

Three aspects of the problem are treated in this thesis: 1) generalization to other environments, 2) adaptation to new tasks and environments, and 3) modeling of possible uncertainties.

An autonomous system should preferably not be limited to one type of environment; for example, a self-driving car should not only be able to handle bright sunshine on highways in good condition. Papers B and G address this to some extent by separating the task into two subproblems, where the first generates input data for the second. The training data for the first subproblem is easier to collect from varied environments, which makes it more general than if only training data for the whole problem were available. Paper B also examines how error sources in this representation affect the system as a whole.

An autonomous system should also be designed so that it can be adapted to new tasks in an efficient way. Paper E examined this problem from the perspective of extending the set of classes known to the system without retraining it completely.

Finally, one must accept that perception will never be perfect in all types of environments; there will always be some uncertainty. This uncertainty can come partly from the model itself, but it is also possible that the sensor data is insufficient to determine which of several possibilities is the true one. Paper F designed a system for estimating the uncertainty of its estimates, while Paper G focuses on how to handle the uncertainty about a person's pose when part of the body is occluded.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2023. p. 45
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2293
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-192087 (URN)
10.3384/9789180750677 (DOI)
9789180750660 (ISBN)
9789180750677 (ISBN)
Public defence
2023-03-31, Ada Lovelace, B-building and online via: https://liu-se.zoom.us/j/63470801417, Campus Valla, Linköping, 09:15 (English)
Note

Funding agencies: the European Union's Horizon 2020 Program; Sweden's Innovation Agency (Vinnova); the Swedish Research Council (VR); and the Swedish Foundation for Strategic Research (SSF).

Available from: 2023-03-01 Created: 2023-03-01 Last updated: 2025-02-07 Bibliographically approved
Holmquist, K. & Wandt, B. (2023). Diffpose: Multi-hypothesis human pose estimation using diffusion models. Paper presented at ICCV 2023, Paris, France, October 4-6, 2023.
Diffpose: Multi-hypothesis human pose estimation using diffusion models
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Traditionally, monocular 3D human pose estimation employs a machine learning model to predict the most likely 3D pose for a given input image. However, a single image can be highly ambiguous and induce multiple plausible solutions for the 2D-3D lifting step, which results in overly confident 3D pose predictors. To this end, we propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. Compared to similar approaches, our diffusion model is straightforward and avoids intensive hyperparameter tuning, complex network structures, mode collapse, and unstable training. Moreover, we tackle the over-simplification of the intermediate representation in common two-step approaches, which first estimate a distribution of 2D joint locations via joint-wise heatmaps and then use only its maximum argument for the 3D pose estimation step. Since such a simplification of the heatmaps removes valid information about possibly correct, though labeled unlikely, joint locations, we propose to represent the heatmaps as a set of 2D joint candidate samples. To extract information about the original distribution from these samples, we introduce our embedding transformer, which conditions the diffusion model. Experimentally, we show that DiffPose improves upon the state of the art for multi-hypothesis pose estimation by 3-5% for simple poses and outperforms it by a large margin for highly ambiguous poses.
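The multi-hypothesis idea can be sketched with a toy DDPM-style reverse process. This is an illustration, not the paper's network: the trained, condition-aware denoiser is replaced by a dummy (`dummy_eps_model`) that pulls each sample toward the nearest of two hand-picked 3D joint positions consistent with the same 2D observation (a front/back depth ambiguity); the modes, noise schedule, and all values are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule (assumed values for this sketch).
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# Two plausible 3D positions for the same 2D observation (hypothetical).
modes = np.array([[0.0, 0.0, 1.0],    # joint in front of the body plane
                  [0.0, 0.0, -1.0]])  # mirrored solution behind it

def dummy_eps_model(x, t):
    # Stand-in for the conditioned network: predict the noise that would
    # move each hypothesis away from its nearest plausible mode.
    d = np.linalg.norm(x[:, None, :] - modes[None], axis=-1)
    target = modes[d.argmin(axis=1)]
    return (x - target) / np.sqrt(1.0 - alpha_bar[t])

def sample_hypotheses(n_hyp=8, dim=3):
    x = rng.standard_normal((n_hyp, dim))       # one noise seed per hypothesis
    for t in reversed(range(T)):                # standard DDPM reverse updates
        eps = dummy_eps_model(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

hyps = sample_hypotheses()
```

Because each hypothesis starts from an independent noise seed, the set of samples can cover both modes of the ambiguity instead of collapsing to a single over-confident answer.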

National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-198612 (URN)
Conference
ICCV 2023, Paris, France, October 4-6, 2023.
Available from: 2023-10-20 Created: 2023-10-20 Last updated: 2025-02-07 Bibliographically approved
Holmquist, K., Klasén, L. & Felsberg, M. (2023). Evidential Deep Learning for Class-Incremental Semantic Segmentation. In: Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen (Eds.), Image Analysis, SCIA 2023. Paper presented at SCIA 2023, 23rd Scandinavian Conference on Image Analysis, Sirkka, Finland, April 18-21, 2023 (pp. 32-48). Springer
Evidential Deep Learning for Class-Incremental Semantic Segmentation
2023 (English) In: Image Analysis, SCIA 2023 / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, p. 32-48. Conference paper, Published paper (Refereed)
Abstract [en]

Class-Incremental Learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful if the system should be able to classify new objects even though the original training data is unavailable. Although the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background and the past and future classes are correlated, and there exists a background shift.

In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated by the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with an increasing number of increments.
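The Dirichlet factorization can be illustrated with a minimal sketch (assuming a per-pixel evidence vector; this is not the paper's implementation, and `evidential_probs` is a hypothetical name). A network outputs non-negative evidence for each of K known classes; the expected value of the resulting Dirichlet gives the foreground class probabilities, and the remaining uncertainty mass is read as the background probability.

```python
import numpy as np

def evidential_probs(evidence):
    """evidence: (K,) non-negative -> (foreground class probs, background prob)."""
    alpha = evidence + 1.0          # Dirichlet concentration parameters
    S = alpha.sum()                 # total Dirichlet strength
    p_fg = alpha / S                # expected class probabilities
    u_bg = len(alpha) / S           # uncertainty mass, read as background prob
    return p_fg, u_bg

# Strong evidence for class 0: confident foreground, little background mass.
p, u = evidential_probs(np.array([9.0, 0.0, 0.0]))
# No evidence at all: maximal uncertainty, the pixel reads as background.
p0, u0 = evidential_probs(np.zeros(3))
```

A pixel of an unlabeled (past or future) class simply accumulates little evidence, so it ends up as background implicitly rather than being forced toward an explicit background score.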

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13886
Keywords
Class-incremental learning, Continual-learning, Semantic Segmentation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-193265 (URN)
10.1007/978-3-031-31438-4_3 (DOI)
001592157300003 ()
2-s2.0-85161371821 (Scopus ID)
9783031314377 (ISBN)
9783031314384 (ISBN)
Conference
SCIA 2023, 23rd Scandinavian Conference on Image Analysis. Sirkka, Finland, April 18–21, 2023
Note

Funding agencies: Sweden's Innovation Agency (Vinnova)

Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2026-02-05 Bibliographically approved
Holmquist, K., Klasén, L. & Felsberg, M. (2021). Class-Incremental Learning for Semantic Segmentation - A study. In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS). Paper presented at 2021 Swedish Artificial Intelligence Society Workshop (SAIS), 14-15 June 2021, Sweden (pp. 25-28). IEEE
Class-Incremental Learning for Semantic Segmentation - A study
2021 (English) In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS), IEEE, 2021, p. 25-28. Conference paper, Published paper (Refereed)
Abstract [en]

One of the main challenges of applying deep learning in robotics is the difficulty of efficiently adapting to new tasks while maintaining the same performance on previous tasks. Incrementally learning new tasks commonly suffers from catastrophic forgetting, in which the previous knowledge is lost. Class-incremental learning for semantic segmentation addresses this problem in the setting where we want to learn new semantic classes without access to labeled data for previously learned classes. This is a problem in industry, where few pre-trained models and open datasets exactly match the requirements. In these cases it is both expensive and labour-intensive to collect an entirely new fully-labeled dataset; collecting a smaller dataset and labeling only the new classes is much more efficient in terms of data collection. In this paper we present the class-incremental learning problem for semantic segmentation, discuss related work in terms of the more thoroughly studied classification task, and experimentally validate the current state of the art for semantic segmentation. This lays the foundation as we discuss some of the problems that still need to be investigated and improved upon in order to reach a new state of the art for class-incremental semantic segmentation.

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Industries, Deep learning, Conferences, Semantics, Labeling, Task analysis, Artificial intelligence
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-189039 (URN)
10.1109/sais53221.2021.9483955 (DOI)
000855522600007 ()
9781665442367 (ISBN)
9781665442374 (ISBN)
Conference
2021 Swedish Artificial Intelligence Society Workshop (SAIS), 14-15 June 2021, Sweden
Funder
Vinnova
Note

Funding agencies: Vinnova [2020-02838]

Available from: 2022-10-08 Created: 2022-10-08 Last updated: 2023-03-01 Bibliographically approved
Eldesokey, A., Felsberg, M., Holmquist, K. & Persson, M. (2020). Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 12011-12020). IEEE
Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End
2020 (English) In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 12011-12020. Conference paper, Published paper (Refereed)
Abstract [en]

The focus in deep learning research has mostly been on pushing the limits of prediction accuracy. However, this has often been achieved at the cost of increased complexity, raising concerns about the interpretability and reliability of deep networks. Recently, increasing attention has been given to untangling the complexity of deep networks and quantifying their uncertainty for different computer vision tasks. In contrast, the task of depth completion has not received enough attention, despite the inherently noisy nature of depth sensors. In this work, we thus focus on modeling the uncertainty of depth data in depth completion, from the sparse noisy input all the way to the final prediction. We propose a novel approach to identify disturbed measurements in the input by learning an input confidence estimator in a self-supervised manner based on normalized convolutional neural networks (NCNNs). Further, we propose a probabilistic version of NCNNs that produces a statistically meaningful uncertainty measure for the final prediction. When we evaluate our approach on the KITTI dataset for depth completion, we outperform all existing Bayesian deep learning approaches in terms of prediction accuracy, quality of the uncertainty measure, and computational efficiency. Moreover, our small network with 670k parameters performs on par with conventional approaches with millions of parameters. These results give strong evidence that separating the network into parallel uncertainty and prediction streams leads to state-of-the-art performance with accurate uncertainty estimates.
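The principle behind normalized convolution, on which NCNNs build, can be shown in a toy 1-D sketch: each sparse depth sample carries a confidence (0 where missing), and the output is a confidence-weighted local average, with the accumulated confidence propagated alongside the data. The function name, kernel, and values here are assumptions for illustration; the paper learns the filters and stacks such layers into a deep network.

```python
import numpy as np

def normalized_conv(depth, conf, kernel):
    num = np.convolve(depth * conf, kernel, mode="same")
    den = np.convolve(conf, kernel, mode="same")
    out = num / np.maximum(den, 1e-8)     # confidence-weighted local average
    out_conf = den / kernel.sum()         # propagated output confidence
    return out, out_conf

depth = np.array([2.0, 0.0, 0.0, 4.0, 0.0])   # zeros mark missing samples
conf  = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # confidence 0 where missing
kernel = np.array([1.0, 1.0, 1.0])
filled, out_conf = normalized_conv(depth, conf, kernel)
```

Missing positions are filled from confident neighbors instead of being averaged with the zeros that encode them, and `out_conf` indicates how much real data supported each output value.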

Place, publisher, year, edition, pages
IEEE, 2020
Series
Conference on Computer Vision and Pattern Recognition (CVPR), ISSN 1063-6919, E-ISSN 2575-7075
Keywords
Uncertainty, Task analysis, Probabilistic logic, Measurement uncertainty, Noise measurement, Convolution, Computer vision
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-169106 (URN)
10.1109/CVPR42600.2020.01203 (DOI)
001309199904086 ()
978-1-7281-7168-5 (ISBN)
978-1-7281-7169-2 (ISBN)
Conference
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Available from: 2020-09-09 Created: 2020-09-09 Last updated: 2025-02-07
Holmquist, K., Senel, D. & Felsberg, M. (2018). Computing a Collision-Free Path using the monogenic scale space. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Paper presented at IROS 2018, Madrid, Spain, October 1-5, 2018 (pp. 8097-8102). IEEE
Computing a Collision-Free Path using the monogenic scale space
2018 (English) In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, p. 8097-8102. Conference paper, Published paper (Refereed)
Abstract [en]

Mobile robots are used for various purposes with different functionalities, which require them to move freely in environments containing both static and dynamic obstacles to accomplish given tasks. One of the most relevant capabilities for navigating a mobile robot in such an environment is finding a safe path to a goal position. This paper shows that there exists an accurate solution to the Laplace equation which allows finding a collision-free path, and that it can be efficiently calculated for a rectangular bounded domain such as a map represented as an image. This is accomplished by the use of the monogenic scale space, resulting in a vector field which describes the attracting and repelling forces from the obstacles and the goal. The method is shown to work in reasonably convex domains and, by the use of tessellation of the environment map, in non-convex environments.
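The harmonic-potential idea can be sketched on a small grid map (an illustration only; the paper computes the solution efficiently via the monogenic scale space, and the grid size, obstacle layout, and iteration count here are assumptions). Obstacles and the border are clamped to potential 1 and the goal to 0; Jacobi iteration relaxes the free cells toward a solution of the Laplace equation, which has no spurious local minima in the free space, so greedy descent of the potential yields a collision-free path to the goal.

```python
import numpy as np

H, W = 12, 12
free = np.zeros((H, W), dtype=bool)
free[1:-1, 1:-1] = True
free[4:8, 6] = False                  # a wall of obstacle cells
goal = (10, 10)

pot = np.ones((H, W))
for _ in range(2000):                 # Jacobi relaxation of the free cells
    avg = 0.25 * (np.roll(pot, 1, 0) + np.roll(pot, -1, 0)
                  + np.roll(pot, 1, 1) + np.roll(pot, -1, 1))
    pot = np.where(free, avg, 1.0)    # obstacles and border stay at 1
    pot[goal] = 0.0                   # goal stays at 0

pos, path = (1, 1), [(1, 1)]          # steepest-descent path from the start
while pos != goal and len(path) < H * W:
    r, c = pos
    pos = min(((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)),
              key=lambda p: pot[p])
    path.append(pos)
```

Because the relaxed potential is (approximately) harmonic, it decreases strictly toward the goal along the path, so the greedy walk never gets stuck at an obstacle or a local minimum.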

Place, publisher, year, edition, pages
IEEE, 2018
Series
International Conference on Intelligent Robots and Systems (IROS), ISSN 2153-0858
National Category
Computer graphics and computer vision; Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-152713 (URN)
10.1109/IROS.2018.8593583 (DOI)
000458872707044 ()
978-1-5386-8094-0 (ISBN)
978-1-5386-8095-7 (ISBN)
978-1-5386-8093-3 (ISBN)
Conference
IROS 2018, Madrid, Spain, October 1-5, 2018
Note

Funding agencies: This work was funded by the European Union's Horizon 2020 Programme under grant agreement 644839 (CENTAURO).

Available from: 2018-11-16 Created: 2018-11-16 Last updated: 2025-02-01
Identifiers
ORCID iD: orcid.org/0000-0002-8677-8715
