liu.se: Search for publications in DiVA
1 - 50 of 213
  • 1.
    Sanchez Aimar, Emanuel
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Jonnarth, Arvi
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Medicine and Health Sciences. Husqvarna Grp, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Univ KwaZulu Natal, South Africa.
    Kuhlmann, Marco
    Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, Faculty of Science & Engineering.
    Balanced Product of Calibrated Experts for Long-Tailed Recognition, 2023. In: 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE COMPUTER SOC, 2023, p. 19967-19977. Conference paper (Refereed)
    Abstract [en]

    Many real-world recognition problems are characterized by long-tailed label distributions. These distributions make representation learning highly challenging due to limited generalization over the tail classes. If the test distribution differs from the training distribution, e.g. uniform versus long-tailed, the problem of the distribution shift needs to be addressed. A recent line of work proposes learning multiple diverse experts to tackle this issue. Ensemble diversity is encouraged by various techniques, e.g. by specializing different experts in the head and the tail classes. In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE). BalPoE combines a family of experts with different test-time target distributions, generalizing several previous approaches. We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions, by proving that the ensemble is Fisher-consistent for minimizing the balanced error. Our theoretical analysis shows that our balanced ensemble requires calibrated experts, which we achieve in practice using mixup. We conduct extensive experiments and our method obtains new state-of-the-art results on three long-tailed datasets: CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018. Our code is available at https://github.com/emasa/BalPoE-CalibratedLT.
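
    A minimal numerical sketch of the product-of-experts idea described above: each expert's logits are adjusted with the (long-tailed) training prior before the experts are fused. The function name, the per-expert lambda parameterization, and all shapes are illustrative assumptions, not the authors' released code.

        import numpy as np

        def balanced_product_of_experts(expert_logits, train_prior, lambdas):
            """Fuse K calibrated experts via logit adjustment (illustrative sketch).

            expert_logits: (K, C) array, one logit vector per expert.
            train_prior:   (C,) training label distribution (long-tailed).
            lambdas:       (K,) per-expert adjustment strengths; their spread controls
                           the diversity of the test-time target distributions.
            """
            log_prior = np.log(train_prior)
            # Debias each expert by subtracting a scaled log-prior (assumed parameterization).
            adjusted = expert_logits - lambdas[:, None] * log_prior[None, :]
            # A product of experts corresponds to averaging log-probabilities / logits.
            fused = adjusted.mean(axis=0)
            probs = np.exp(fused - fused.max())
            return probs / probs.sum()

        # Toy example: 3 experts, 4 classes with a long-tailed prior.
        rng = np.random.default_rng(0)
        prior = np.array([0.7, 0.2, 0.07, 0.03])
        print(balanced_product_of_experts(rng.normal(size=(3, 4)), prior, np.array([0.0, 1.0, 2.0])))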

  • 2.
    Edstedt, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Athanasiadis, Ioannis
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Wadenbäck, Mårten
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    DKM: Dense Kernelized Feature Matching for Geometry Estimation, 2023. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Communications Society, 2023, p. 17765-17775. Conference paper (Refereed)
    Abstract [en]

    Feature matching is a challenging computer vision task that involves finding correspondences between two images of a 3D scene. In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. Perhaps counter-intuitively, dense methods have previously shown inferior performance to their sparse and semi-sparse counterparts for estimation of two-view geometry. This changes with our novel dense method, which outperforms both dense and sparse methods on geometry estimation. The novelty is threefold: First, we propose a kernel regression global matcher. Secondly, we propose warp refinement through stacked feature maps and depthwise convolution kernels. Thirdly, we propose learning dense confidence through consistent depth and a balanced sampling approach for dense confidence maps. Through extensive experiments we confirm that our proposed dense method, Dense Kernelized Feature Matching, sets a new state-of-the-art on multiple geometry estimation benchmarks. In particular, we achieve an improvement on MegaDepth-1500 of +4.9 and +8.9 AUC@5° compared to the best previous sparse method and dense method respectively. Our code is provided at the following repository: https://github.com/Parskatt/DKM.

  • 3.
    Holmquist, Karl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Klasén, Lena
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Office of the National Police Commissioner, The Swedish Police Authority, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Evidential Deep Learning for Class-Incremental Semantic Segmentation, 2023. In: Image Analysis. SCIA 2023. / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, p. 32-48. Conference paper (Refereed)
    Abstract [en]

    Class-Incremental Learning is a challenging problem in machine learning that aims to extend previously trained neural networks with new classes. This is especially useful if the system is able to classify new objects despite the original training data being unavailable. Although the semantic segmentation problem has received less attention than classification, it poses distinct problems and challenges, since previous and future target classes can be unlabeled in the images of a single increment. In this case, the background, past and future classes are correlated and there exists a background-shift.

    In this paper, we address the problem of how to model unlabeled classes while avoiding spurious feature clustering of future uncorrelated classes. We propose to use Evidential Deep Learning to model the evidence of the classes as a Dirichlet distribution. Our method factorizes the problem into a separate foreground class probability, calculated by the expected value of the Dirichlet distribution, and an unknown class (background) probability corresponding to the uncertainty of the estimate. In our novel formulation, the background probability is implicitly modeled, avoiding the feature space clustering that comes from forcing the model to output a high background score for pixels that are not labeled as objects. Experiments on the incremental Pascal VOC and ADE20k benchmarks show that our method is superior to the state of the art, especially when repeatedly learning new classes with increasing number of increments.
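
    The Dirichlet-based split sketched below follows the standard evidential deep learning recipe (expected class probabilities from the concentration parameters, an "unknown" mass from the total evidence); it illustrates the mechanism the abstract refers to, not the paper's exact network head or losses.

        import numpy as np

        def dirichlet_foreground_and_unknown(evidence):
            """Per-pixel foreground probabilities and an unknown/background mass
            from non-negative predicted evidence of shape (C,)."""
            alpha = evidence + 1.0               # Dirichlet concentration parameters
            strength = alpha.sum()               # total evidence S
            expected_prob = alpha / strength     # E[p_c] of the Dirichlet distribution
            uncertainty = len(alpha) / strength  # vacuity: high when evidence is scarce
            return expected_prob, uncertainty

        print(dirichlet_foreground_and_unknown(np.array([9.0, 1.0, 0.0])))  # confident pixel: low unknown mass
        print(dirichlet_foreground_and_unknown(np.zeros(3)))                # no evidence: uniform, unknown = 1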

    The full text will be freely available from 2024-04-27 20:26
  • 4.
    Zhang, Yushan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Robinson, Andreas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Magnusson, Maria
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Leveraging Optical Flow Features for Higher Generalization Power in Video Object Segmentation, 2023. In: 2023 IEEE International Conference on Image Processing: Proceedings, 2023, p. 326-330. Conference paper (Refereed)
    Abstract [en]

    We propose to leverage optical flow features for higher generalization power in semi-supervised video object segmentation. Optical flow is usually exploited as additional guidance information in many computer vision tasks. However, its relevance in video object segmentation was mainly in unsupervised settings or using the optical flow to warp or refine the previously predicted masks. Different from the latter, we propose to directly leverage the optical flow features in the target representation. We show that this enriched representation improves the encoder-decoder approach to the segmentation task. A model to extract the combined information from the optical flow and the image is proposed, which is then used as input to the target model and the decoder network. Unlike previous methods, e.g. in tracking where concatenation is used to integrate information from image data and optical flow, a simple yet effective attention mechanism is exploited in our work. Experiments on DAVIS 2017 and YouTube-VOS 2019 show that integrating the information extracted from optical flow into the original image branch results in a strong performance gain, especially in unseen classes which demonstrates its higher generalization power.
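
    A sketch of attention-based fusion of image and optical-flow features in the spirit described above; the layer sizes and the exact attention mechanism are assumptions, since the abstract does not specify them.

        import torch
        import torch.nn as nn

        class FlowImageAttentionFusion(nn.Module):
            """Blend image and flow feature maps with learned per-channel attention weights."""

            def __init__(self, channels: int):
                super().__init__()
                self.attention = nn.Sequential(
                    nn.Conv2d(2 * channels, channels, kernel_size=1),
                    nn.Sigmoid(),
                )

            def forward(self, image_feat, flow_feat):
                weights = self.attention(torch.cat([image_feat, flow_feat], dim=1))
                # Weighted blend of the two modalities instead of plain concatenation.
                return weights * image_feat + (1.0 - weights) * flow_feat

        fusion = FlowImageAttentionFusion(channels=64)
        print(fusion(torch.randn(1, 64, 30, 40), torch.randn(1, 64, 30, 40)).shape)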

  • 5.
    Ljungbergh, William
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Gothenburg, Sweden.
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Gothenburg, Sweden.
    Petersson, Christoffer
    Zenseact, Gothenburg, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Raw or Cooked?: Object Detection on RAW Images, 2023. In: Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I. / [ed] Rikke Gade, Michael Felsberg, Joni-Kristian Kämäräinen, Springer, 2023, Vol. 13885, p. 374-385. Conference paper (Refereed)
    Abstract [en]

    Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
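
    A sketch of the general idea of a learnable ISP operation placed in front of the detector and optimized for the detection loss rather than for visual quality; the single gamma parameter is an illustrative stand-in, not the operation proposed in the paper.

        import torch
        import torch.nn as nn

        class LearnableToneCurve(nn.Module):
            """A trainable tone curve applied to RAW intensities before detection."""

            def __init__(self):
                super().__init__()
                self.log_gamma = nn.Parameter(torch.zeros(1))  # gamma = 1 at initialization

            def forward(self, raw):
                return raw.clamp(min=1e-6) ** self.log_gamma.exp()

        isp = LearnableToneCurve()
        detector_input = isp(torch.rand(1, 1, 64, 64))  # RAW mosaic treated as a single channel
        print(detector_input.shape, isp.log_gamma.requires_grad)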

  • 6.
    Brissman, Emil
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Saab, Linkoping, Sweden.
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Sweden.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Univ KwaZulu Natal, South Africa.
    Recurrent Graph Neural Networks for Video Instance Segmentation, 2023. In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 131, p. 471-495. Article in journal (Refereed)
    Abstract [en]

    Video instance segmentation is one of the core problems in computer vision. Formulating a purely learning-based method, which models the generic track management required to solve the video instance segmentation task, is a highly challenging problem. In this work, we propose a novel learning framework where the entire video instance segmentation problem is modeled jointly. To this end, we design a graph neural network that in each frame jointly processes all detections and a memory of previously seen tracks. Past information is considered and processed via a recurrent connection. We demonstrate the effectiveness of the proposed approach in comprehensive experiments. Our approach operates online at over 25 FPS and obtains 16.3 AP on the challenging OVIS benchmark, setting a new state-of-the-art. We further conduct detailed ablative experiments that validate the different aspects of our approach. Code is available at https://github.com/emibr948/RGNNVIS-PlusPlus.

  • 7.
    Javed, Sajid
    et al.
    Khalifa Univ Sci & Technol, U Arab Emirates.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. MBZUAI, U Arab Emirates.
    Khan, Muhammad Haris
    MBZUAI, U Arab Emirates.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Matas, Jiri
    Czech Tech Univ, Czech Republic.
    Visual Object Tracking With Discriminative Filters and Siamese Networks: A Survey and Outlook, 2023. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no 5, p. 6552-6574. Article in journal (Refereed)
    Abstract [en]

    Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for distinguished open challenges based on our analysis.

  • 8.
    Johnander, Joakim
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact AB, Sweden.
    Edstedt, Johan
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed bin Zayed Univ AI, U Arab Emirates.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Dense Gaussian Processes for Few-Shot Segmentation, 2022. In: COMPUTER VISION, ECCV 2022, PT XXIX, SPRINGER INTERNATIONAL PUBLISHING AG, 2022, Vol. 13689, p. 217-234. Conference paper (Refereed)
    Abstract [en]

    Few-shot segmentation is a challenging dense prediction task, which entails segmenting a novel query image given only a small annotated support set. The key problem is thus to design a method that aggregates detailed information from the support set, while being robust to large variations in appearance and context. To this end, we propose a few-shot segmentation method based on dense Gaussian process (GP) regression. Given the support set, our dense GP learns the mapping from local deep image features to mask values, capable of capturing complex appearance distributions. Furthermore, it provides a principled means of capturing uncertainty, which serves as another powerful cue for the final segmentation, obtained by a CNN decoder. Instead of a one-dimensional mask output, we further exploit the end-to-end learning capabilities of our approach to learn a high-dimensional output space for the GP. Our approach sets a new state-of-the-art on the PASCAL-5(i) and COCO-20(i) benchmarks, achieving an absolute gain of +8.4 mIoU in the COCO-20(i) 5-shot setting. Furthermore, the segmentation quality of our approach scales gracefully when increasing the support set size, while achieving robust cross-dataset transfer.
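
    The core regression step can be illustrated with a standard Gaussian-process posterior mean from support features to mask values; the RBF kernel, the noise level, and the shapes below are simplifying assumptions rather than the paper's learned model.

        import numpy as np

        def gp_posterior_mean(support_feats, support_masks, query_feats, noise=1e-2, gamma=0.5):
            """Map (N, D) support features with (N, K) mask encodings to (M, K) query predictions."""
            def rbf(a, b):
                d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
                return np.exp(-gamma * d2)

            K_ss = rbf(support_feats, support_feats) + noise * np.eye(len(support_feats))
            K_qs = rbf(query_feats, support_feats)
            # Standard GP posterior mean: K_qs (K_ss + sigma^2 I)^-1 y
            return K_qs @ np.linalg.solve(K_ss, support_masks)

        rng = np.random.default_rng(0)
        pred = gp_posterior_mean(rng.normal(size=(50, 8)), rng.normal(size=(50, 1)), rng.normal(size=(20, 8)))
        print(pred.shape)  # (20, 1)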

  • 9.
    Bhunia, Ankan Kumar
    et al.
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Khan, Salman
    Mohamed bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
    Cholakkal, Hisham
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Anwer, Rao Muhammad
    Mohamed bin Zayed Univ AI, U Arab Emirates; Aalto Univ, Finland.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed bin Zayed Univ AI, U Arab Emirates.
    Laaksonen, Jorma
    Aalto Univ, Finland.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    DoodleFormer: Creative Sketch Drawing with Transformers, 2022. In: COMPUTER VISION - ECCV 2022, PT XVII, SPRINGER INTERNATIONAL PUBLISHING AG, 2022, Vol. 13677, p. 338-355. Conference paper (Refereed)
    Abstract [en]

    Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state-of-the-art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Frechet inception distance (FID) over state-of-the-art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.

  • 10.
    Jonnarth, Arvi
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Husqvarna Grp, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Univ KwaZulu Natal, South Africa.
    IMPORTANCE SAMPLING CAMS FOR WEAKLY-SUPERVISED SEGMENTATION, 2022. In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 2022, p. 2639-2643. Conference paper (Refereed)
    Abstract [en]

    Classification networks can be used to localize and segment objects in images by means of class activation maps (CAMs). However, without pixel-level annotations, classification networks are known to (1) mainly focus on discriminative regions, and (2) to produce diffuse CAMs without well-defined prediction contours. In this work, we approach both problems with two contributions for improving CAM learning. First, we incorporate importance sampling based on the class-wise probability mass function induced by the CAMs to produce stochastic image-level class predictions. This results in CAMs which activate over a larger extent of objects. Second, we formulate a feature similarity loss term which aims to match the prediction contours with edges in the image. As a third contribution, we conduct experiments on the PASCAL VOC 2012 benchmark dataset to demonstrate that these modifications significantly increase the performance in terms of contour accuracy, while being comparable to current state-of-the-art methods in terms of region similarity.
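
    A sketch of the first contribution: the normalized CAM is treated as a probability mass function over pixels, and the activation at a sampled location is used as a stochastic image-level class score. Only the sampling idea is shown; the paper's estimator and the feature similarity loss are not reproduced.

        import numpy as np

        def stochastic_class_score(cam, rng):
            """Sample one pixel according to the CAM-induced pmf and return its activation."""
            flat = cam.ravel()
            pmf = flat / flat.sum()             # class-wise pmf induced by the CAM
            idx = rng.choice(flat.size, p=pmf)  # importance-sampled pixel location
            return flat[idx]                    # stochastic image-level class prediction

        rng = np.random.default_rng(0)
        cam = rng.random((32, 32))
        print([round(stochastic_class_score(cam, rng), 3) for _ in range(5)])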

  • 11.
    Stromann, Oliver
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Scania CV AB, Sweden.
    Razavi, Alireza
    Scania CV AB, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    LEARNING TO INTEGRATE VISION DATA INTO ROAD NETWORK DATA, 2022. In: 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 2022, p. 4548-4552. Conference paper (Refereed)
    Abstract [en]

    Road networks are the core infrastructure for connected and autonomous vehicles, but creating meaningful representations for machine learning applications is a challenging task. In this work, we propose to integrate remote sensing vision data into road network data for improved embeddings with graph neural networks. We present a segmentation of road edges based on spatio-temporal road and traffic characteristics, which allows enriching the attribute set of road networks with visual features of satellite imagery and digital surface models. We show that both the segmentation and the integration of vision data increase performance on a road type classification task, and we achieve state-of-the-art performance on the OSM+DiDi Chuxing dataset of Chengdu, China.

  • 12.
    Melnyk, Pavlo
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Wadenbäck, Mårten
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Steerable 3D Spherical Neurons, 2022. In: Proceedings of the 39th International Conference on Machine Learning / [ed] Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato, PMLR, 2022, Vol. 162, p. 15330-15339. Conference paper (Refereed)
    Abstract [en]

    Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and, thus, obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.

  • 13.
    Thawakar, Omkar
    et al.
    MBZUAI, U Arab Emirates.
    Narayan, Sanath
    IIAI, U Arab Emirates.
    Cao, Jiale
    Tianjin Univ, Peoples R China.
    Cholakkal, Hisham
    MBZUAI, U Arab Emirates.
    Anwer, Rao Muhammad
    MBZUAI, U Arab Emirates.
    Khan, Muhammad Haris
    MBZUAI, U Arab Emirates.
    Khan, Salman
    MBZUAI, U Arab Emirates.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. MBZUAI, U Arab Emirates.
    Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer, 2022. In: COMPUTER VISION, ECCV 2022, PT XXIX, SPRINGER INTERNATIONAL PUBLISHING AG, 2022, Vol. 13689, p. 666-681. Conference paper (Refereed)
    Abstract [en]

    State-of-the-art transformer-based video instance segmentation (VIS) approaches typically utilize either single-scale spatio-temporal features or per-frame multi-scale features during the attention computations. We argue that such an attention computation ignores the multiscale spatio-temporal feature relationships that are crucial to tackle target appearance deformations in videos. To address this issue, we propose a transformer-based VIS framework, named MS-STS VIS, that comprises a novel multi-scale spatio-temporal split (MS-STS) attention module in the encoder. The proposed MS-STS module effectively captures spatio-temporal feature relationships at multiple scales across frames in a video. We further introduce an attention block in the decoder to enhance the temporal consistency of the detected instances in different frames of a video. Moreover, an auxiliary discriminator is introduced during training to ensure better foreground-background separability within the multiscale spatio-temporal feature space. We conduct extensive experiments on two benchmarks: Youtube-VIS (2019 and 2021). Our MS-STS VIS achieves state-of-the-art performance on both benchmarks. When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in literature by 2.7% and by 4.8% at higher overlap threshold of AP75, while being comparable in model size and speed on Youtube-VIS 2019 val. set. When using the Swin Transformer backbone, MS-STS VIS achieves mask AP of 61.0% on Youtube-VIS 2019 val. set.

  • 14.
    Edstedt, Johan
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Berg, Amanda
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Karlsson, Johan
    Statens Medierad, Sweden.
    Benavente, Francisca
    Statens Medierad, Sweden.
    Novak, Anette
    Statens Medierad, Sweden.
    Pihlgren, Gustav Grund
    Lulea Univ Technol, Sweden.
    VidHarm: A Clip Based Dataset for Harmful Content Detection, 2022. In: 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE, 2022, p. 1543-1549. Conference paper (Refereed)
    Abstract [en]

    Automatically identifying harmful content in video is an important task with a wide range of applications. However, there is a lack of professionally labeled open datasets available. In this work VidHarm, an open dataset of 3589 video clips from film trailers annotated by professionals, is presented. An analysis of the dataset is performed, revealing among other things the relation between clip and trailer level annotations. Audiovisual models are trained on the dataset and an in-depth study of modeling choices conducted. The results show that performance is greatly improved by combining the visual and audio modality, pre-training on large-scale video recognition datasets, and class balanced sampling. Lastly, biases of the trained models are investigated using discrimination probing. VidHarm is openly available, and further details are available at the webpage https://vidharm.github.io/

  • 15.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Visual tracking: Tracking in scenes containing multiple moving objects, 2022. In: Advanced Methods and Deep Learning in Computer Vision / [ed] E. R. Davies, Matthew A. Turk, London: Elsevier, 2022, p. 305-336. Chapter in book (Refereed)
  • 16.
    Gharaee, Zahra
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Holmquist, Karl
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    He, Linbo
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    A Bayesian Approach to Reinforcement Learning of Vision-Based Vehicular Control, 2021. In: 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE COMPUTER SOC, 2021, p. 3947-3954. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a state-of-the-art reinforcement learning method for autonomous driving. Our approach employs temporal difference learning in a Bayesian framework to learn vehicle control signals from sensor data. The agent has access to images from a forward facing camera, which are pre-processed to generate semantic segmentation maps. We trained our system using both ground truth and estimated semantic segmentation input. Based on our observations from a large set of experiments, we conclude that training the system on ground truth input data leads to better performance than training the system on estimated input even if estimated input is used for evaluation. The system is trained and evaluated in a realistic simulated urban environment using the CARLA simulator. The simulator also contains a benchmark that allows for comparing to other systems and methods. The required training time of the system is shown to be lower and the performance on the benchmark superior to competing approaches.
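
    A minimal tabular TD(0) update, only to recall the temporal-difference mechanism the abstract builds on; the paper's Bayesian treatment and deep function approximation are not reproduced here.

        import numpy as np

        values = np.zeros(5)             # state values V(s)
        alpha, gamma = 0.1, 0.99         # learning rate and discount factor
        transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]  # (state, reward, next_state)
        for s, r, s_next in transitions:
            td_error = r + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
        print(values)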

  • 17.
    Holmquist, Karl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Klasén, Lena
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Office of the National Police Commissioner, The Swedish Police Authority.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. University of KwaZulu-Natal, Durban, South Africa.
    Class-Incremental Learning for Semantic Segmentation - A study, 2021. In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS), IEEE, 2021, p. 25-28. Conference paper (Refereed)
    Abstract [en]

    One of the main challenges of applying deep learning for robotics is the difficulty of efficiently adapting to new tasks while still maintaining the same performance on previous tasks. The problem of incrementally learning new tasks commonly struggles with catastrophic forgetting, in which the previous knowledge is lost. Class-incremental learning for semantic segmentation addresses this problem, in which we want to learn new semantic classes without having access to labeled data for previously learned classes. This is a problem in industry, where few pre-trained models and open datasets match the requirements exactly. In these cases it is both expensive and labour-intensive to collect an entirely new fully-labeled dataset. Instead, collecting a smaller dataset and only labeling the new classes is much more efficient in terms of data collection. In this paper we present the class-incremental learning problem for semantic segmentation, we discuss related work in terms of the more thoroughly studied classification task, and we experimentally validate the current state-of-the-art for semantic segmentation. This lays the foundation as we discuss some of the problems that still need to be investigated and improved upon in order to reach a new state-of-the-art for class-incremental semantic segmentation.

  • 18.
    Robinson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Eldesokey, Abdelrahman
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Distractor-aware video object segmentation, 2021. Conference paper (Refereed)
    Abstract [en]

    Semi-supervised video object segmentation is a challenging task that aims to segment a target throughout a video sequence given an initial mask at the first frame. Discriminative approaches have demonstrated competitive performance on this task at a sensible complexity. These approaches typically formulate the problem as a one-versus-one classification between the target and the background. However, in reality, a video sequence usually encompasses a target, background, and possibly other distracting objects. Those objects increase the risk of introducing false positives, especially if they share visual similarities with the target. Therefore, it is more effective to separate distractors from the background, and handle them independently.

    We propose a one-versus-many scheme to address this situation by separating distractors into their own class. This separation allows paying special attention to the challenging regions that are most likely to degrade the performance. We demonstrate the prominence of this formulation by modifying the learning-what-to-learn method to be distractor-aware. Our proposed approach sets a new state-of-the-art on the DAVIS val dataset, and improves over the baseline on the DAVIS test-dev benchmark by 4.8 percentage points.

  • 19.
    Melnyk, Pavlo
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Wadenbäck, Mårten
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Embed Me If You Can: A Geometric Perceptron, 2021. In: Proceedings 2021 IEEE/CVF International Conference on Computer Vision ICCV 2021, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 1256-1264. Conference paper (Refereed)
    Abstract [en]

    Solving geometric tasks involving point clouds by using machine learning is a challenging problem. Standard feed-forward neural networks combine linear or, if the bias parameter is included, affine layers and activation functions. Their geometric modeling is limited, which motivated the prior work introducing the multilayer hypersphere perceptron (MLHP). Its constituent part, i.e., the hypersphere neuron, is obtained by applying a conformal embedding of Euclidean space. By virtue of Clifford algebra, it can be implemented as the Cartesian dot product of inputs and weights. If the embedding is applied in a manner consistent with the dimensionality of the input space geometry, the decision surfaces of the model units become combinations of hyperspheres and make the decision-making process geometrically interpretable for humans. Our extension of the MLHP model, the multilayer geometric perceptron (MLGP), and its respective layer units, i.e., geometric neurons, are consistent with the 3D geometry and provide a geometric handle of the learned coefficients. In particular, the geometric neuron activations are isometric in 3D, which is necessary for rotation and translation equivariance. When classifying the 3D Tetris shapes, we quantitatively show that our model requires no activation function in the hidden layers other than the embedding to outperform the vanilla multilayer perceptron. In the presence of noise in the data, our model is also superior to the MLHP.
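
    The hypersphere-neuron idea can be checked numerically with one common convention for the conformal embedding (the signs and component ordering used in the paper may differ): a Euclidean point and a sphere are embedded so that their plain dot product equals 0.5 * (r^2 - ||x - c||^2), i.e. it is positive inside the sphere, zero on it, and negative outside.

        import numpy as np

        def embed_point(x):
            """Conformal embedding of a Euclidean point (one common convention)."""
            return np.concatenate([x, [-1.0, -0.5 * (x @ x)]])

        def embed_sphere(center, radius):
            """Embedding of a hypersphere so a dot product acts as the neuron's decision value."""
            return np.concatenate([center, [0.5 * (center @ center - radius**2), 1.0]])

        c, r = np.array([1.0, 2.0, 3.0]), 2.0
        for x in (c, c + np.array([r, 0.0, 0.0]), c + np.array([3.0, 0.0, 0.0])):
            print(embed_point(x) @ embed_sphere(c, r))  # 2.0 (inside), 0.0 (on), -2.5 (outside)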

  • 20.
    Gharaee, Zahra
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Kowshik, Shreyas
    Department of Mathematics, Indian Institute of Technology Kharagpur, India.
    Stromann, Oliver
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Autonomous Transport Solutions Research, Scania CV AB, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Graph representation learning for road type classification, 2021. In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 120, article id 108174. Article in journal (Refereed)
    Abstract [en]

    We present a novel learning-based approach to graph representations of road networks employing state-of-the-art graph convolutional neural networks. Our approach is applied to realistic road networks of 17 cities from Open Street Map. While edge features are crucial to generate descriptive graph representations of road networks, graph convolutional networks usually rely on node features only. We show that the highly representative edge features can still be integrated into such networks by applying a line graph transformation. We also propose a method for neighborhood sampling based on a topological neighborhood composed of both local and global neighbors. We compare the performance of learning representations using different types of neighborhood aggregation functions in transductive and inductive tasks and in supervised and unsupervised learning. Furthermore, we propose a novel aggregation approach, Graph Attention Isomorphism Network, GAIN. Our results show that GAIN outperforms state-of-the-art methods on the road type classification problem.
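
    The line graph transformation mentioned above can be sketched with networkx: road segments (edges) become nodes of the transformed graph, so per-edge attributes are available to node-based graph neural networks. The attribute names below are illustrative, not the paper's schema.

        import networkx as nx

        roads = nx.Graph()
        roads.add_edge("A", "B", road_type="residential", length=120.0)
        roads.add_edge("B", "C", road_type="primary", length=300.0)
        roads.add_edge("B", "D", road_type="residential", length=80.0)

        line = nx.line_graph(roads)  # road segments become nodes
        for u, v in line.nodes:
            # Copy each original edge's attributes onto the corresponding line-graph node.
            line.nodes[(u, v)].update(roads.edges[u, v])

        print(line.nodes(data=True))
        print(list(line.edges))  # two segments are adjacent iff they share an intersection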

  • 21.
    Eldesokey, Abdelrahman
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Normalized Convolution Upsampling for Refined Optical Flow Estimation, 2021. In: Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, SciTePress, 2021, Vol. 5, p. 742-752. Conference paper (Refereed)
    Abstract [en]

    Optical flow is a regression task where convolutional neural networks (CNNs) have led to major breakthroughs. However, this comes at major computational demands due to the use of cost-volumes and pyramidal representations. This was mitigated by producing flow predictions at quarter the resolution, which are upsampled using bilinear interpolation during test time. Consequently, fine details are usually lost and post-processing is needed to restore them. We propose the Normalized Convolution UPsampler (NCUP), an efficient joint upsampling approach to produce the full-resolution flow during the training of optical flow CNNs. Our proposed approach formulates the upsampling task as a sparse problem and employs normalized convolutional neural networks to solve it. We evaluate our upsampler against existing joint upsampling approaches when trained end-to-end with a coarse-to-fine optical flow CNN (PWCNet) and we show that it outperforms all other approaches on the FlyingChairs dataset while having at least one order of magnitude fewer parameters. Moreover, we test our upsampler with a recurrent optical flow CNN (RAFT) and we achieve state-of-the-art results on the Sintel benchmark with ∼6% error reduction, and on-par results on the KITTI dataset, while having 7.5% fewer parameters (see Figure 1). Finally, our upsampler shows better generalization capabilities than RAFT when trained and evaluated on different datasets.
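
    The underlying normalized-convolution operation can be sketched in a few lines: the sparse signal is weighted by its confidence, filtered, and re-normalized by the filtered confidence. The paper learns such filters inside a CNN and uses them for joint upsampling of the flow; the fixed box filter below is only an illustration.

        import numpy as np
        from scipy.ndimage import convolve

        def normalized_convolution(signal, confidence, kernel, eps=1e-8):
            """One non-learned normalized-convolution step for sparse 2D data."""
            numerator = convolve(signal * confidence, kernel, mode="nearest")
            denominator = convolve(confidence, kernel, mode="nearest")
            return numerator / (denominator + eps), denominator  # dense estimate + propagated confidence

        rng = np.random.default_rng(0)
        signal = rng.random((8, 8))                       # e.g. one flow component
        conf = (rng.random((8, 8)) < 0.25).astype(float)  # observed at roughly 25% of the pixels
        dense, new_conf = normalized_convolution(signal, conf, np.ones((3, 3)) / 9.0)
        print(np.round(dense, 2))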

  • 22.
    Häger, Gustav
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Persson, Mikael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Predicting Disparity Distributions, 2021. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021. Conference paper (Refereed)
    Abstract [en]

    We investigate a novel deep-learning-based approach to estimate uncertainty in stereo disparity prediction networks. Current state-of-the-art methods often formulate disparity prediction as a regression problem with a single scalar output in each pixel. This can be problematic in practical applications as in many cases there might not exist a single well-defined disparity, for example in cases of occlusions or at depth boundaries. While current neural-network-based disparity estimation approaches obtain good performance on benchmarks, the disparity prediction is treated as a black box at inference time. In this paper we show that by formulating the learning problem as a regression with a distribution target, we obtain a robust estimate of the uncertainty in each pixel, while maintaining the performance of the original method. The proposed method is evaluated both on a large-scale standard benchmark, as well as on our own data. We also show that the uncertainty estimate significantly improves by maximizing the uncertainty in those pixels that have no well-defined disparity during learning.
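
    A sketch of reading out a predicted disparity distribution instead of a single scalar: a softmax over discretized disparity bins yields both a point estimate and a per-pixel uncertainty. The binning and the bimodal toy logits (mimicking a depth boundary) are illustrative assumptions.

        import numpy as np

        def disparity_distribution_stats(logits, disparities):
            """Mean and variance of the per-pixel softmax distribution over disparity bins."""
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            mean = (probs * disparities).sum()
            var = (probs * (disparities - mean) ** 2).sum()
            return mean, var

        bins = np.arange(0, 64, dtype=float)
        # A bimodal prediction, e.g. at a depth boundary: the large variance flags the ambiguity.
        logits = -0.5 * np.minimum((bins - 10.0) ** 2, (bins - 40.0) ** 2) / 4.0
        print(disparity_distribution_stats(logits, bins))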

  • 23.
    Brissman, Emil
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Predicting Signed Distance Functions for Visual Instance Segmentation, 2021. In: 33rd Annual Workshop of the Swedish-Artificial-Intelligence-Society (SAIS), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 5-10. Conference paper (Refereed)
    Abstract [en]

    Visual instance segmentation is a challenging problem and becomes even more difficult if objects of interest vary unconstrained in shape. Some objects are well described by a rectangle; however, this is hardly always the case. Consider for instance long, slender objects such as ropes. Anchor-based approaches classify predefined bounding boxes as either negative or positive and thus provide a limited set of shapes that can be handled. Defining anchor boxes that fit well to all possible shapes leads to an infeasible number of prior boxes. We explore a different approach and propose to train a neural network to compute distance maps along different directions. The network is trained at each pixel to predict the distance to the closest object contour in a given direction. By pooling the distance maps we obtain an approximation to the signed distance function (SDF). The SDF may then be thresholded in order to obtain a foreground-background segmentation. We compare this segmentation to foreground segmentations obtained from the state-of-the-art instance segmentation method YOLACT. On the COCO dataset, our segmentation yields a higher performance in terms of foreground intersection over union (IoU). However, while the distance maps contain information on the individual instances, it is not straightforward to map them to the full instance segmentation. We still believe that this idea is a promising research direction for instance segmentation, as it better captures the different shapes found in the real world.

  • 24.
    Kristan, Matej
    et al.
    Univ Ljubljana, Slovenia.
    Matas, Jiri
    Czech Tech Univ, Czech Republic.
    Leonardis, Ales
    Univ Birmingham, England.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pflugfelder, Roman
    Austrian Inst Technol, Austria; TU Wien, Austria.
    Kamarainen, Joni-Kristian
    Tampere Univ, Finland.
    Chang, Hyung Jin
    Univ Birmingham, England.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Zajc, Luka Cehovin
    Univ Ljubljana, Slovenia.
    Lukezic, Alan
    Univ Ljubljana, Slovenia.
    Drbohlav, Ondrej
    Czech Tech Univ, Czech Republic.
    Kapyla, Jani
    Tampere Univ, Finland.
    Häger, Gustav
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Yan, Song
    Tampere Univ, Finland.
    Yang, Jinyu
    Univ Birmingham, England.
    Zhang, Zhongqun
    Univ Birmingham, England.
    Fernandez, Gustavo
    Austrian Inst Technol, Austria.
    Abdelpakey, Mohamed
    Univ British Columbia, Canada.
    Bhat, Goutam
    Swiss Fed Inst Technol, Switzerland.
    Cerkezi, Llukman
    Istanbul Tech Univ, Turkey.
    Cevikalp, Hakan
    Eskisehir Osmangazi Univ, Turkey.
    Chen, Shengyong
    Tianjin Univ Technol, Peoples R China.
    Chen, Xin
    Dalian Univ Technol, Peoples R China.
    Cheng, Miao
    Zhejiang Dahua Technol CO, Peoples R China.
    Cheng, Ziyi
    Kyushu Univ, Japan.
    Chiu, Yu-Chen
    Tamkang Univ, Taiwan.
    Cirakman, Ozgun
    Istanbul Tech Univ, Turkey.
    Cui, Yutao
    Nanjing Univ, Peoples R China.
    Dai, Kenan
    Dalian Univ Technol, Peoples R China.
    Dasari, Mohana Murali
    Indian Inst Technol Tirupati, India.
    Deng, Qili
    ByteDance, Peoples R China.
    Dong, Xingping
    Incept Inst Artificial Intelligence, Peoples R China.
    Du, Daniel K.
    ByteDance, Peoples R China.
    Dunnhofer, Matteo
    Univ Udine, Italy.
    Feng, Zhen-Hua
    Univ Surrey, England.
    Feng, Zhiyong
    Tianjin Univ, Peoples R China.
    Fu, Zhihong
    Beihang Univ, Peoples R China.
    Ge, Shiming
    Univ Chinese Acad Sci, Peoples R China.
    Gorthi, Rama Krishna
    Indian Inst Technol Tirupati, India.
    Gu, Yuzhang
    SIMIT, Peoples R China.
    Gunsel, Bilge
    Istanbul Tech Univ, Turkey.
    Guo, Qing
    Nanyang Technol Univ, Singapore.
    Gurkan, Filiz
    Istanbul Tech Univ, Turkey.
    Han, Wencheng
    Beijing Inst Technol, Peoples R China.
    Huang, Yanyan
    Fuzhou Univ, Peoples R China.
    Järemo-Lawin, Felix
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Jhang, Shang-Jhih
    Tamkang Univ, Taiwan.
    Ji, Rongrong
    Xiamen Univ, Peoples R China.
    Jiang, Cheng
    Nanjing Univ, Peoples R China.
    Jiang, Yingjie
    Jiangnan Univ, Peoples R China.
    Juefei-Xu, Felix
    Xiamen Univ, Peoples R China.
    Jun, Yin
    Zhejiang Dahua Technol CO, Peoples R China.
    Ke, Xiao
    Fuzhou Univ, Peoples R China.
    Khan, Fahad Shahbaz
    Mohamed Bin Zayed Univ Artificial Intelligence, U Arab Emirates.
    Kim, Byeong Hak
    Korea Inst Ind Technol KITECH, South Korea.
    Kittler, Josef
    Univ Surrey, England.
    Lan, Xiangyuan
    Hong Kong Baptist Univ, Peoples R China.
    Lee, Jun Ha
    Korea Inst Ind Technol KITECH, South Korea.
    Leibe, Bastian
    Rhein Westfal TH Aachen, Germany.
    Li, Hui
    Jiangnan Univ, Peoples R China.
    Li, Jianhua
    Dalian Univ Technol, Peoples R China.
    Li, Xianxian
    Guangxi Normal Univ, Peoples R China.
    Li, Yuezhou
    Fuzhou Univ, Peoples R China.
    Liu, Bo
    JD Finance Amer Corp, CA USA.
    Liu, Chang
    Dalian Univ Technol, Peoples R China.
    Liu, Jingen
    JD Finance Amer Corp, CA USA.
    Liu, Li
    Shenzhen Res Inst Big Data, Peoples R China.
    Liu, Qingjie
    Beihang Univ, Peoples R China.
    Lu, Huchuan
    Dalian Univ Technol, Peoples R China; Peng Cheng Lab, Peoples R China.
    Lu, Wei
    Zhejiang Dahua Technol CO, Peoples R China.
    Luiten, Jonathon
    Rhein Westfal TH Aachen, Germany.
    Ma, Jie
    Huaqiao Univ, Peoples R China.
    Ma, Ziang
    Zhejiang Dahua Technol CO, Peoples R China.
    Martinel, Niki
    Univ Udine, Italy.
    Mayer, Christoph
    Swiss Fed Inst Technol, Switzerland.
    Memarmoghadam, Alireza
    Univ Isfahan, Iran.
    Micheloni, Christian
    Univ Udine, Italy.
    Niu, Yuzhen
    Fuzhou Univ, Peoples R China.
    Paudel, Danda
    Swiss Fed Inst Technol, Switzerland.
    Peng, Houwen
    Microsoft Res Asia, Peoples R China.
    Qiu, Shoumeng
    SIMIT, Peoples R China.
    Rajiv, Aravindh
    Indian Inst Technol Tirupati, India.
    Rana, Muhammad
    Univ Surrey, England.
    Robinson, Andreas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Saribas, Hasan
    Eskisehir Tech Univ, Turkey.
    Shao, Ling
    Incept Inst Artificial Intelligence, Peoples R China.
    Shehata, Mohamed
    Univ British Columbia, Canada.
    Shen, Furao
    Nanjing Univ, Peoples R China.
    Shen, Jianbing
    Incept Inst Artificial Intelligence, Peoples R China.
    Simonato, Kristian
    Univ Udine, Italy.
    Song, Xiaoning
    Jiangnan Univ, Peoples R China.
    Tang, Zhangyong
    Jiangnan Univ, Peoples R China.
    Timofte, Radu
    Swiss Fed Inst Technol, Switzerland.
    Torr, Philip
    Univ Oxford, England.
    Tsai, Chi-Yi
    Tamkang Univ, Taiwan.
    Uzun, Bedirhan
    Eskisehir Osmangazi Univ, Turkey.
    Van Gool, Luc
    Swiss Fed Inst Technol, Switzerland.
    Voigtlaender, Paul
    Rhein Westfal TH Aachen, Germany.
    Wang, Dong
    Dalian Univ Technol, Peoples R China.
    Wang, Guangting
    Univ Sci & Technol China, Peoples R China.
    Wang, Liangliang
    ByteDance, Peoples R China.
    Wang, Lijun
    Dalian Univ Technol, Peoples R China.
    Wang, Limin
    Nanjing Univ, Peoples R China.
    Wang, Linyuan
    Zhejiang Dahua Technol CO, Peoples R China.
    Wang, Yong
    Sun Yat Sen Univ, Peoples R China.
    Wang, Yunhong
    Beihang Univ, Peoples R China.
    Wu, Chenyan
    Penn State Univ, PA 16802 USA.
    Wu, Gangshan
    Nanjing Univ, Peoples R China.
    Wu, Xiao-Jun
    Jiangnan Univ, Peoples R China.
    Xie, Fei
    Southeast Univ, Peoples R China.
    Xu, Tianyang
    Jiangnan Univ, Peoples R China; Univ Surrey, England.
    Xu, Xiang
    Nanjing Univ, Peoples R China.
    Xue, Wanli
    Tianjin Univ Technol, Peoples R China.
    Yan, Bin
    Dalian Univ Technol, Peoples R China.
    Yang, Wankou
    Southeast Univ, Peoples R China.
    Yang, Xiaoyun
    Remark AI, England.
    Ye, Yu
    Fuzhou Univ, Peoples R China.
    Yin, Jun
    Zhejiang Dahua Technol CO, Peoples R China.
    Zhang, Chengwei
    Dalian Maritime Univ, Peoples R China.
    Zhang, Chunhui
    Univ Chinese Acad Sci, Peoples R China.
    Zhang, Haitao
    Zhejiang Dahua Technol CO, Peoples R China.
    Zhang, Kaihua
    Nanjing Univ Informat Sci & Technol, Peoples R China.
    Zhang, Kangkai
    Univ Chinese Acad Sci, Peoples R China.
    Zhang, Xiaohan
    Dalian Univ Technol, Peoples R China.
    Zhang, Xiaolin
    SIMIT, Peoples R China.
    Zhang, Xinyu
    Dalian Univ Technol, Peoples R China.
    Zhang, Zhibin
    Tianjin Univ Technol, Peoples R China.
    Zhao, Shaochuan
    Jiangnan Univ, Peoples R China.
    Zhen, Ming
    ByteDance, Peoples R China.
    Zhong, Bineng
    Guangxi Normal Univ, Peoples R China.
    Zhu, Jiawen
    Dalian Univ Technol, Peoples R China.
    Zhu, Xue-Feng
    Jiangnan Univ, Peoples R China.
    The Ninth Visual Object Tracking VOT2021 Challenge Results, 2021. In: 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), IEEE COMPUTER SOC, 2021, p. 2711-2738. Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2021 focused on long-term tracking, namely coping with target disappearance and reappearance, and (iv) VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduces a training dataset and a sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit, and the results are publicly available at the challenge website.

  • 25.
    Johnander, Joakim
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenseact, Gothenburg, Sweden.
    Brissman, Emil
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Saab, Linköping, Sweden.
    Danelljan, Martin
    Computer Vision Lab, ETH Zürich, Zürich, Switzerland.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. School of Engineering, University of KwaZulu-Natal, Durban, South Africa.
    Video Instance Segmentation with Recurrent Graph Neural Networks2021In: Pattern Recognition: 43rd DAGM German Conference, DAGM GCPR 2021, Bonn, Germany, September 28 – October 1, 2021, Proceedings. / [ed] Bauckhage C., Gall J., Schwing A., Springer, 2021, p. 206-221Conference paper (Refereed)
    Abstract [en]

    Video instance segmentation is one of the core problems in computer vision. Formulating a purely learning-based method, which models the generic track management required to solve the video instance segmentation task, is a highly challenging problem. In this work, we propose a novel learning framework where the entire video instance segmentation problem is modeled jointly. To this end, we design a graph neural network that in each frame jointly processes all detections and a memory of previously seen tracks. Past information is considered and processed via a recurrent connection. We demonstrate the effectiveness of the proposed approach in comprehensive experiments. Our approach, operating at over 25 FPS, outperforms previous real-time methods. We further conduct detailed ablative experiments that validate the different aspects of our approach.
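    Purely as an illustration of the kind of track management described above, the sketch below matches per-frame detections to a recurrent memory of tracks and updates the matched tracks with a GRU cell. It is a toy simplification under assumed feature shapes, not the architecture proposed in the paper; all class and variable names are hypothetical.

```python
# Toy sketch (hypothetical names and shapes): match detections to a recurrent
# track memory and update the matched tracks with a GRU cell.
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment

class RecurrentTrackMemory(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=64):
        super().__init__()
        self.update = nn.GRUCell(feat_dim, hidden_dim)  # recurrent track update
        self.embed = nn.Linear(hidden_dim, feat_dim)    # project tracks into matching space

    def forward(self, track_states, det_feats):
        # track_states: (T, hidden_dim), det_feats: (D, feat_dim)
        cost = torch.cdist(self.embed(track_states), det_feats)    # (T, D) matching cost
        rows, cols = linear_sum_assignment(cost.detach().numpy())  # Hungarian assignment
        new_states = track_states.clone()
        for t, d in zip(rows, cols):
            new_states[t] = self.update(det_feats[d:d + 1], track_states[t:t + 1])[0]
        return new_states, list(zip(rows.tolist(), cols.tolist()))

# usage with random tensors
memory = RecurrentTrackMemory()
tracks = torch.zeros(3, 64)   # three existing tracks
dets = torch.randn(4, 64)     # four detections in the current frame
tracks, matches = memory(tracks, dets)
print(matches)
```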

  • 26.
    Eldesokey, Abdelrahman
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Confidence Propagation through CNNs for Guided Sparse Depth Regression2020In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, Vol. 42, no 10Article in journal (Refereed)
    Abstract [en]

    Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g. data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5% of the number of parameters compared to the state-of-the-art methods.
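    As a rough sketch of the normalized-convolution idea described above (data convolved jointly with a confidence map, with the output renormalized and a confidence map propagated to the next layer), one common formulation looks as follows. This is an illustrative simplification, not necessarily the authors' exact layer; the non-negativity constraint via softplus and the propagation rule are assumptions of the sketch.

```python
# Sketch of a confidence-aware (normalized) convolution layer for sparse input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.pad = kernel_size // 2
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x, conf):
        # x, conf: (B, in_ch, H, W); conf is ~0 where the input is missing
        w = F.softplus(self.weight)                    # non-negative applicability weights
        num = F.conv2d(x * conf, w, padding=self.pad)  # confidence-weighted data term
        den = F.conv2d(conf, w, padding=self.pad)      # accumulated confidence
        out = num / (den + self.eps) + self.bias.view(1, -1, 1, 1)
        new_conf = den / (w.sum(dim=(1, 2, 3)).view(1, -1, 1, 1) + self.eps)
        return out, new_conf

# usage on sparse depth input (~5% valid measurements)
layer = NormalizedConv2d(1, 8)
depth = torch.randn(2, 1, 64, 64)
conf = (torch.rand(2, 1, 64, 64) > 0.95).float()
dense, conf_out = layer(depth * conf, conf)
```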

  • 27.
    Grelsson, Bertil
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Saab Dynam, Dept Dev and Technol, Linkoping, Sweden.
    Robinson, Andreas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Inception Inst Artificial Intelligence, U Arab Emirates.
    GPS-level accurate camera localization with HorizonNet2020In: Journal of Field Robotics, ISSN 1556-4959, E-ISSN 1556-4967, Vol. 37, no 6, p. 951-971Article in journal (Refereed)
    Abstract [en]

    This paper investigates the problem of position estimation of unmanned surface vessels (USVs) operating in coastal areas or in the archipelago. We propose a position estimation method where the horizon line is extracted in a 360 degrees panoramic image around the USV. We design a convolutional neural network (CNN) architecture to determine an approximate horizon line in the image and implicitly determine the camera orientation (the pitch and roll angles). The panoramic image is warped to compensate for the camera orientation and to generate an image from an approximately level camera. A second CNN architecture is designed to extract the pixelwise horizon line in the warped image. The extracted horizon line is correlated with digital elevation model data in the Fourier domain using a minimum output sum of squared error correlation filter. Finally, we determine the location of the maximum correlation score over the search area to estimate the position of the USV. Comprehensive experiments are performed in field trials conducted over 3 days in the archipelago. Our approach provides excellent results by achieving robust position estimates with global positioning system (GPS)-level accuracy in previously unvisited test areas.
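    To make the matching step concrete, here is a rough sketch of correlating an extracted horizon profile against DEM-rendered candidate profiles with FFT-based circular correlation and picking the best-scoring grid position. It is a bare-bones illustration with synthetic data (no MOSSE-style regularization); all names and shapes are assumptions.

```python
# Sketch: rotation-invariant matching of a horizon profile against DEM candidates.
import numpy as np

def correlation_score(extracted, rendered):
    # extracted, rendered: 1-D horizon elevation profiles sampled over 360 degrees
    F1 = np.fft.rfft(extracted - extracted.mean())
    F2 = np.fft.rfft(rendered - rendered.mean())
    corr = np.fft.irfft(F1 * np.conj(F2), n=len(extracted))  # circular cross-correlation
    return corr.max()                                        # best score over heading offsets

def localize(extracted, dem_profiles):
    # dem_profiles: dict mapping (x, y) grid positions to rendered horizon profiles
    scores = {pos: correlation_score(extracted, prof) for pos, prof in dem_profiles.items()}
    return max(scores, key=scores.get)

# usage with synthetic data: only the true grid cell holds a (shifted) copy of the
# observed profile, all other cells hold unrelated profiles
rng = np.random.default_rng(0)
observed = rng.standard_normal(360)
grid = {(x, y): rng.standard_normal(360) for x in range(5) for y in range(5)}
grid[(2, 3)] = np.roll(observed, 40) + 0.1 * rng.standard_normal(360)
print(localize(observed, grid))   # -> (2, 3)
```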

  • 28.
    Robinson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Järemo-Lawin, Felix
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    CVL, ETH Zurich, Switzerland.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. IIAI, UAE.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Learning Fast and Robust Target Models for Video Object Segmentation2020In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 7404-7413, article id 9156406Conference paper (Refereed)
    Abstract [en]

    Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art. Code and trained models are available at https://github.com/andr345/frtm-vos.
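    The light-weight, inference-time target model can be illustrated with a minimal sketch: a small convolutional predictor fitted to first-frame features with a few optimization steps, producing coarse target scores for later frames that an offline-trained segmentation network would then refine. This is not the authors' optimizer or architecture; shapes and hyperparameters below are placeholders.

```python
# Minimal sketch of an online-learned, light-weight target model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_target_model(feats, mask, steps=20, lr=1e-1):
    # feats: (1, C, H, W) backbone features; mask: (1, 1, H, W) first-frame target mask
    model = nn.Conv2d(feats.shape[1], 1, kernel_size=3, padding=1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):                      # fast online optimization
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(feats), mask)
        loss.backward()
        opt.step()
    return model

# usage: coarse target scores on a new frame
feats0 = torch.randn(1, 256, 30, 54)
mask0 = (torch.rand(1, 1, 30, 54) > 0.9).float()
target_model = fit_target_model(feats0, mask0)
coarse_scores = torch.sigmoid(target_model(torch.randn(1, 256, 30, 54)))
```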

  • 29.
    Goutam, Bhat
    et al.
    ETH Zürich, Switzerland.
    Järemo-Lawin, Felix
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    ETH Zürich, Switzerland.
    Robinson, Andreas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Van Gool, Luc
    ETH Zürich, Switzerland.
    Timofte, Radu
    ETH Zürich, Switzerland.
    Learning What to Learn for Video Object Segmentation2020In: Computer Vision: ECCV 2020 Workshop / [ed] Vedaldi A., Bischof H., Brox T., Frahm JM, 2020, p. 777-794Conference paper (Refereed)
    Abstract [en]

    Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.

  • 30.
    Kristan, M.
    et al.
    University of Ljubljana, Ljubljana, Slovenia.
    Leonardis, A.
    University of Birmingham, Birmingham, United Kingdom.
    Matas, J.
    Czech Technical University, Prague, Czech Republic.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pflugfelder, R.
    Austrian Institute of Technology, Seibersdorf, Austria; TU Wien, Vienna, Austria.
    Kämäräinen, J.-K.
    Tampere University, Tampere, Finland.
    Danelljan, M.
    ETH Zürich, Zürich, Switzerland.
    Zajc, L.C.
    University of Ljubljana, Ljubljana, Slovenia.
    Lukežic, A.
    University of Ljubljana, Ljubljana, Slovenia.
    Drbohlav, O.
    Czech Technical University, Prague, Czech Republic.
    He, Linbo
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Zhang, Yushan
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Beijing Institute of Technology, Beijing, China.
    Yan, S.
    Tampere University, Tampere, Finland.
    Yang, J.
    University of Birmingham, Birmingham, United Kingdom.
    Fernández, G.
    Austrian Institute of Technology, Seibersdorf, Austria.
    Hauptmann, A.
    Carnegie Mellon University, Pittsburgh, United States.
    Memarmoghadam, A.
    University of Isfahan, Isfahan, Iran.
    García-Martín, Á.
    Universidad Autónoma de Madrid, Madrid, Spain.
    Robinson, Andreas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Varfolomieiev, A.
    National Technical University of Ukraine, Kiev, Ukraine.
    Gebrehiwot, A.H.
    Universidad Autónoma de Madrid, Madrid, Spain.
    Uzun, B.
    Eskisehir Osmangazi University, Eskisehir, Turkey.
    Yan, B.
    Dalian University of Technology, Dalian, China.
    Li, B.
    Institute of Automation, Chinese Academy of Sciences, Beijing, China.
    Qian, C.
    Sensetime, Taiwan, Hong Kong.
    Tsai, C.-Y.
    Tamkang University, New Taipei City, Taiwan.
    Micheloni, C.
    University of Udine, Udine, Italy.
    Wang, D.
    Dalian University of Technology, Dalian, China.
    Wang, F.
    Sensetime, Taiwan, Hong Kong.
    Xie, F.
    Southeast University, Nanjing, China.
    Järemo-Lawin, Felix
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Gustafsson, F.
    Uppsala University, Uppsala, Sweden.
    Foresti, G.L.
    University of Udine, Udine, Italy.
    Bhat, G.
    ETH Zürich, Zürich, Switzerland.
    Chen, G.
    Sensetime, Taiwan, Hong Kong.
    Ling, H.
    Stony Brook University, Stony Brook, United States.
    Zhang, H.
    Zhejiang Dahua Technology, Binjiang, China.
    Cevikalp, H.
    Eskisehir Osmangazi University, Eskisehir, Turkey.
    Zhao, H.
    Dalian University of Technology, Dalian, China.
    Bai, H.
    Sichuan University, Chengdu, China.
    Kuchibhotla, H.C.
    Indian Institute of Technology, Tirupati, Tirupati, India.
    Saribas, H.
    Eskisehir Technical University, Eskisehir, Turkey.
    Fan, H.
    Stony Brook University, Stony Brook, United States.
    Ghanei-Yakhdan, H.
    Yazd University, Yazd, Iran.
    Li, H.
    University of Science and Technology of China, Hefei, China.
    Peng, H.
    Microsoft Research, Redmond, United States.
    Lu, H.
    Dalian University of Technology, Dalian, China.
    Li, H.
    Jiangnan University, Wuxi, China.
    Khaghani, J.
    University of Alberta, Edmonton, Canada.
    Bescos, J.
    Universidad Autónoma de Madrid, Madrid, Spain.
    Li, J.
    Dalian University of Technology, Dalian, China.
    Fu, J.
    Microsoft Research, Redmond, United States.
    Yu, J.
    Samsung Research China-Beijing (SRC-B), Beijing, China.
    Xu, J.
    Samsung Research China-Beijing (SRC-B), Beijing, China.
    Kittler, J.
    University of Surrey, Guildford, United Kingdom.
    Yin, J.
    Zhejiang Dahua Technology, Binjiang, China.
    Lee, J.
    Korea University, Seoul, South Korea.
    Yu, K.
    High School Affiliated to Renmin University of China, Beijing, China.
    Liu, K.
    Institute of Automation, Chinese Academy of Sciences, Beijing, China.
    Yang, K.
    Nanjing University of Information Science and Technology, Nanjing, China.
    Dai, K.
    Dalian University of Technology, Dalian, China.
    Cheng, L.
    University of Alberta, Edmonton, Canada.
    Zhang, L.
    University of Oxford, Oxford, United Kingdom.
    Wang, L.
    Dalian University of Technology, Dalian, China.
    Wang, L.
    Zhejiang Dahua Technology, Binjiang, China.
    Van Gool, L.
    ETH Zürich, Zürich, Switzerland.
    Bertinetto, L.
    Five AI, London, United Kingdom.
    Dunnhofer, M.
    University of Udine, Udine, Italy.
    Cheng, M.
    Zhejiang Dahua Technology, Binjiang, China.
    Dasari, M.M.
    Indian Institute of Technology, Tirupati, Tirupati, India.
    Wang, N.
    Nanjing University of Information Science and Technology, Nanjing, China.
    Wang, N.
    University of Science and Technology of China, Hefei, China.
    Zhang, P.
    Dalian University of Technology, Dalian, China.
    Torr, P.H.S.
    University of Oxford, Oxford, United Kingdom.
    Wang, Q.
    NLP, Beijing, China.
    Timofte, R.
    ETH Zürich, Zürich, Switzerland.
    Gorthi, R.K.S.
    Indian Institute of Technology, Tirupati, Tirupati, India.
    Choi, S.
    KAIST, Daejeon, South Korea.
    Marvasti-Zadeh, S.M.
    University of Alberta, Edmonton, Canada.
    Zhao, S.
    Jiangnan University, Wuxi, China.
    Kasaei, S.
    Sharif University of Technology, Tehran, Iran.
    Qiu, S.
    Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China.
    Chen, S.
    Dalian University of Technology, Dalian, China.
    Schön, T.B.
    Uppsala University, Uppsala, Sweden.
    Xu, T.
    University of Surrey, Guildford, United Kingdom.
    Lu, W.
    Zhejiang Dahua Technology, Binjiang, China.
    Hu, W.
    Institute of Automation, Chinese Academy of Sciences, Beijing, China; NLP, Beijing, China.
    Zhou, W.
    University of Science and Technology of China, Hefei, China.
    Qiu, X.
    Megvii, Beijing, China.
    Ke, X.
    Fuzhou University, Fuzhou, China.
    Wu, X.-J.
    Jiangnan University, Wuxi, China.
    Zhang, X.
    Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China.
    Yang, X.
    Remark Holdings, London, United Kingdom.
    Zhu, X.
    Jiangnan University, Wuxi, China.
    Jiang, Y.
    Jiangnan University, Wuxi, China.
    Wang, Y.
    Dalian University of Technology, Dalian, China.
    Chen, Y.
    Samsung Research China-Beijing (SRC-B), Beijing, China.
    Ye, Y.
    Fuzhou University, Fuzhou, China.
    Li, Y.
    Fuzhou University, Fuzhou, China.
    Yao, Y.
    Southeast University, Nanjing, China.
    Lee, Y.
    Korea University, Seoul, South Korea.
    Gu, Y.
    Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China.
    Wang, Z.
    Dalian University of Technology, Dalian, China.
    Tang, Z.
    Jiangnan University, Wuxi, China.
    Feng, Z.-H.
    University of Surrey, Guildford, United Kingdom.
    Mai, Z.
    University of Electronic Science and Technology of China, Chengdu, China.
    Zhang, Z.
    Institute of Automation, Chinese Academy of Sciences, Beijing, China.
    Wu, Z.
    Microsoft Research, Redmond, United States.
    Ma, Z.
    Zhejiang Dahua Technology, Binjiang, China.
    The Eighth Visual Object Tracking VOT2020 Challenge Results2020In: Computer Vision: ECCV 2020 Workshops, Glasgow, UK, August 23–28, 2020 / [ed] Adrien Bartoli; Andrea Fusiello, 2020, Vol. 12539, p. 547-601Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) VOT-RT2020 challenge focused on “real-time” short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and of segmentation ground truth in the VOT-ST2020 challenge – bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. The performance of the tested trackers typically exceeds the standard baselines by a large margin. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

  • 31.
    Eldesokey, Abdelrahman
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Holmquist, Karl
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Persson, Mikael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End2020In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2020, p. 12011-12020Conference paper (Refereed)
    Abstract [en]

    The focus in deep learning research has been mostly to push the limits of prediction accuracy. However, this was often achieved at the cost of increased complexity, raising concerns about the interpretability and the reliability of deep networks. Recently, increasing attention has been given to untangling the complexity of deep networks and quantifying their uncertainty for different computer vision tasks. In contrast, the task of depth completion has not received enough attention despite the inherent noisy nature of depth sensors. In this work, we thus focus on modeling the uncertainty of depth data in depth completion starting from the sparse noisy input all the way to the final prediction. We propose a novel approach to identify disturbed measurements in the input by learning an input confidence estimator in a self-supervised manner based on the normalized convolutional neural networks (NCNNs). Further, we propose a probabilistic version of NCNNs that produces a statistically meaningful uncertainty measure for the final prediction. When we evaluate our approach on the KITTI dataset for depth completion, we outperform all the existing Bayesian Deep Learning approaches in terms of prediction accuracy, quality of the uncertainty measure, and computational efficiency. Moreover, our small network with 670k parameters performs on par with conventional approaches with millions of parameters. These results give strong evidence that separating the network into parallel uncertainty and prediction streams leads to state-of-the-art performance with accurate uncertainty estimates.
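    One standard way to obtain a statistically meaningful uncertainty for a regression output, in the spirit of the probabilistic formulation mentioned above, is a heteroscedastic Gaussian negative log-likelihood in which the network predicts both a depth value and a per-pixel log-variance. The snippet below shows only that generic loss; it is not claimed to be the paper's exact objective, and the tensor names are placeholders.

```python
# Generic heteroscedastic (Gaussian NLL) loss for regression with sparse ground truth.
import torch

def gaussian_nll(pred, log_var, target, valid):
    # pred, log_var, target, valid: (B, 1, H, W); valid masks pixels with ground truth
    nll = 0.5 * (log_var + (pred - target) ** 2 * torch.exp(-log_var))
    return (nll * valid).sum() / valid.sum().clamp(min=1)

# usage with dummy tensors
pred = torch.randn(2, 1, 64, 64)
log_var = torch.zeros(2, 1, 64, 64, requires_grad=True)
target = torch.randn(2, 1, 64, 64)
valid = (torch.rand(2, 1, 64, 64) > 0.95).float()   # sparse ground truth
loss = gaussian_nll(pred, log_var, target, valid)
loss.backward()
```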

  • 32.
    Berg, Amanda
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Unsupervised Adversarial Learning of Anomaly Detection in the Wild2020In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI) / [ed] Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang, Amsterdam: IOS Press, 2020, Vol. 325, p. 1002-1008Conference paper (Refereed)
    Abstract [en]

    Unsupervised learning of anomaly detection in high-dimensional data, such as images, is a challenging problem recently subject to intense research. Through careful modelling of the data distribution of normal samples, it is possible to detect deviant samples, so-called anomalies. Generative Adversarial Networks (GANs) can model the highly complex, high-dimensional data distribution of normal image samples, and have been shown to be a suitable approach to the problem. Previously published GAN-based anomaly detection methods often assume that anomaly-free data is available for training. However, this assumption is not valid in most real-life scenarios, i.e., in the wild. In this work, we evaluate the effects of anomaly contaminations in the training data on state-of-the-art GAN-based anomaly detection methods. As expected, detection performance deteriorates. To address this performance drop, we propose to add an additional encoder network already at training time and show that joint generator-encoder training stratifies the latent space, mitigating the problem with contaminated data. We show experimentally that the norm of a query image in this stratified latent space becomes a highly significant cue to discriminate anomalies from normal data. The proposed method achieves state-of-the-art performance on CIFAR-10 as well as on a large, previously untested dataset with cell images.
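    The scoring idea, combining a reconstruction error with the latent-norm cue highlighted above, can be sketched as follows. The encoder and generator here are trivial placeholders, the weighting alpha is a made-up parameter, and only the anomaly-scoring logic is illustrated.

```python
# Toy sketch: score a query image by reconstruction error plus latent norm.
import torch
import torch.nn as nn

latent_dim = 32
E = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, latent_dim))   # placeholder encoder
G = nn.Sequential(nn.Linear(latent_dim, 64 * 64 * 3), nn.Tanh())      # placeholder generator

def anomaly_score(x, alpha=0.5):
    z = E(x)                                      # latent code of the query image
    recon = G(z)
    rec_err = (recon - x.flatten(1)).pow(2).mean(dim=1)
    latent_norm = z.norm(dim=1)                   # the latent-norm cue
    return alpha * rec_err + (1 - alpha) * latent_norm

x = torch.rand(4, 3, 64, 64) * 2 - 1
print(anomaly_score(x))
```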

  • 33.
    Johnander, Joakim
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenuity, Sweden.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ETH Zurich, Switzerland.
    Brissman, Emil
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Saab, Sweden.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. IIAI, UAE.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    A generative appearance model for end-to-end video object segmentation2019In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 8945-8954Conference paper (Refereed)
    Abstract [en]

    One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
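    A much-simplified version of a generative appearance module can be written down directly: fit Gaussian models to target and background features using the first-frame mask, then evaluate per-pixel class posteriors on new frames. Single Gaussians with a shared diagonal covariance are used purely for illustration; the paper's module is more elaborate, and all shapes below are assumptions.

```python
# Sketch: Gaussian target/background appearance models and per-pixel posteriors.
import torch

def fit_appearance_model(feats, mask):
    # feats: (C, H, W) features, mask: (H, W) binary target mask from the first frame
    f = feats.flatten(1).t()                       # (H*W, C)
    m = mask.flatten().bool()
    mu_t, mu_b = f[m].mean(0), f[~m].mean(0)
    var = f.var(0) + 1e-6                          # shared diagonal covariance
    return mu_t, mu_b, var

def posterior_target(feats, mu_t, mu_b, var, prior_t=0.5):
    f = feats.flatten(1).t()
    log_t = -0.5 * ((f - mu_t) ** 2 / var).sum(1) + torch.log(torch.tensor(prior_t))
    log_b = -0.5 * ((f - mu_b) ** 2 / var).sum(1) + torch.log(torch.tensor(1 - prior_t))
    post = torch.softmax(torch.stack([log_t, log_b], dim=1), dim=1)[:, 0]
    return post.reshape(feats.shape[1:])           # (H, W) target probability map

feats0 = torch.randn(64, 30, 54)
mask0 = (torch.rand(30, 54) > 0.9).float()
mu_t, mu_b, var = fit_appearance_model(feats0, mask0)
prob = posterior_target(torch.randn(64, 30, 54), mu_t, mu_b, var)
```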

  • 34.
    Danelljan, Martin
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Swiss Fed Inst Technol, Switzerland.
    Bhat, Goutam
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Swiss Fed Inst Technol, Switzerland.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Incept Inst Artificial Intelligence, U Arab Emirates.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    ATOM: Accurate tracking by overlap maximization2019In: 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), IEEE, 2019, p. 4655-4664Conference paper (Refereed)
    Abstract [en]

    While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High-level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.
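    The target-estimation step can be illustrated schematically as gradient ascent of the box parameters on a predicted IoU score. The IoU predictor below is an untrained placeholder standing in for the offline-learned network described in the abstract; only the refinement loop is the point of the sketch, and all names are hypothetical.

```python
# Sketch: refine a bounding box by maximizing a predicted IoU score.
import torch
import torch.nn as nn

iou_predictor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))  # placeholder

def refine_box(box, steps=10, lr=1e-2):
    # box: (4,) tensor [cx, cy, w, h], initial estimate
    box = box.clone().requires_grad_(True)
    opt = torch.optim.SGD([box], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -iou_predictor(box).sum()   # gradient ascent on the predicted IoU
        loss.backward()
        opt.step()
    return box.detach()

print(refine_box(torch.tensor([120.0, 80.0, 40.0, 30.0])))
```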

  • 35.
    Danelljan, Martin
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Bhat, Goutam
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Gladh, Susanna
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Deep motion and appearance cues for visual tracking2019In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 124, p. 74-81Article in journal (Refereed)
    Abstract [en]

    Generic visual tracking is a challenging computer vision problem, with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle in case of ambiguous appearance information, leading to tracking failure. In such cases, we argue that the motion cue provides discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied to action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.

  • 36.
    Robinson, Andreas
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Järemo-Lawin, Felix
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ETH Zürich.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Discriminative Learning and Target Attention for the 2019 DAVIS Challenge on Video Object Segmentation2019In: CVPR 2019 workshops: DAVIS Challenge on Video Object Segmentation, 2019Conference paper (Refereed)
    Abstract [en]

    In this work, we address the problem of semi-supervised video object segmentation, where the task is to segment a target object in every image of the video sequence, given a ground truth only in the first frame. To be successful, it is crucial to robustly handle unpredictable target appearance changes and distracting objects in the background. In this work we obtain a robust and efficient representation of the target by integrating a fast and light-weight discriminative target model into a deep segmentation network. Trained during inference, the target model learns to discriminate between the local appearances of target and background image regions. Its predictions are enhanced to accurate segmentation masks in a subsequent refinement stage. To further improve the segmentation performance, we add a new module trained to generate global target attention vectors, given the input mask and image feature maps. The attention vectors add semantic information about the target from a previous frame to the refinement stage, complementing the predictions provided by the target appearance model. Our method is fast and requires no network fine-tuning. We achieve a combined J and F-score of 70.6 on the DAVIS 2019 test-challenge data.

  • 37.
    Klamt, Tobias
    et al.
    Univ Bonn, Germany.
    Rodriguez, Diego
    Univ Bonn, Germany.
    Baccelliere, Lorenzo
    Ist Italiano Tecnol, Italy.
    Chen, Xi
    KTH Royal Inst Technol, Sweden.
    Chiaradia, Domenico
    St Anna Sch Adv Studies, Italy.
    Cichon, Torben
    Rheinisch Westfalische TH Aachen, Germany.
    Gabardi, Massimiliano
    St Anna Sch Adv Studies, Italy.
    Guria, Paolo
    Ist Italiano Tecnol, Italy.
    Holmquist, Karl
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Kamedula, Malgorzata
    Ist Italiano Tecnol, Italy.
    Karaoguz, Hakan
    KTH Royal Inst Technol, Sweden.
    Kashiri, Navvab
    Ist Italiano Tecnol, Italy.
    Laurenzi, Arturo
    Ist Italiano Tecnol, Italy.
    Lenz, Christian
    Univ Bonn, Germany.
    Leonardis, Daniele
    St Anna Sch Adv Studies, Italy.
    Hoffman, Enrico Mingo
    Ist Italiano Tecnol, Italy.
    Muratore, Luca
    Ist Italiano Tecnol, Italy; Univ Manchester, England.
    Pavlichenko, Dmytro
    Univ Bonn, Germany.
    Porcini, Francesco
    St Anna Sch Adv Studies, Italy.
    Ren, Zeyu
    Ist Italiano Tecnol, Italy.
    Schilling, Fabian
    Swiss Fed Inst Technol, Switzerland.
    Schwarz, Max
    Univ Bonn, Germany.
    Solazzi, Massimiliano
    St Anna Sch Adv Studies, Italy.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Frisoli, Antonio
    St Anna Sch Adv Studies, Italy.
    Gustmann, Michael
    Kerntech Hilfsdienst, Germany.
    Jensfelt, Patric
    KTH Royal Inst Technol, Sweden.
    Nordberg, Klas
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Rossmann, Juergen
    Rheinisch Westfalische TH Aachen, Germany.
    Suess, Uwe
    Kerntech Hilfsdienst, Germany.
    Tsagarakis, Nikos G.
    Ist Italiano Tecnol, Italy.
    Behnke, Sven
    Univ Bonn, Germany.
    Flexible Disaster Response of Tomorrow: Final Presentation and Evaluation of the CENTAURO System2019In: IEEE robotics & automation magazine, ISSN 1070-9932, E-ISSN 1558-223X, Vol. 26, no 4, p. 59-72Article in journal (Refereed)
    Abstract [en]

    Mobile manipulation robots have great potential for roles in support of rescuers on disaster-response missions. Robots can operate in places too dangerous for humans and therefore can assist in accomplishing hazardous tasks while their human operators work at a safe distance. We developed a disaster-response system that consists of the highly flexible Centauro robot and suitable control interfaces, including an immersive telepresence suit and support-operator controls offering different levels of autonomy.

  • 38.
    Johnander, Joakim
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Bhat, Goutam
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    On the Optimization of Advanced DCF-Trackers2019In: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018, Proceedings, Part I / [ed] Laura Leal-TaixéStefan Roth, Cham: Springer Publishing Company, 2019, p. 54-69Conference paper (Refereed)
    Abstract [en]

    Trackers based on discriminative correlation filters (DCF) have recently seen widespread success and in this work we dive into their numerical core. DCF-based trackers interleave learning of the target detector and target state inference based on this detector. Whereas the original formulation includes a closed-form solution for the filter learning, recently introduced improvements to the framework no longer have known closed-form solutions. Instead a large-scale linear least squares problem must be solved each time the detector is updated. We analyze the procedure used to optimize the detector and let the popular scheme introduced with ECO serve as a baseline. The ECO implementation is revisited in detail and several mechanisms are provided with alternatives. With comprehensive experiments we show which configurations are superior in terms of tracking capabilities and optimization performance.
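    The large-scale linear least-squares problem mentioned above is typically attacked with an iterative solver such as conjugate gradient, which is the kind of scheme used in the ECO baseline. A generic CG solver for the regularized normal equations is sketched below; this is textbook CG, not the trackers' Fourier-domain implementation, and the problem sizes are arbitrary.

```python
# Textbook conjugate gradient for (A^T A + lam I) w = A^T y.
import numpy as np

def conjugate_gradient(matvec, b, x0, iters=20, tol=1e-8):
    x = x0.copy()
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# usage: regularized least squares on random data
A = np.random.randn(200, 50)
y = np.random.randn(200)
lam = 1e-2
matvec = lambda w: A.T @ (A @ w) + lam * w
w = conjugate_gradient(matvec, A.T @ y, np.zeros(50))
```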

  • 39.
    Eldesokey, Abdelrahman
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Inception Institute of Artificial Intelligence Abu Dhabi, UAE.
    Propagating Confidences through CNNs for Sparse Data Regression2019In: British Machine Vision Conference 2018, BMVC 2018, BMVA Press , 2019Conference paper (Refereed)
    Abstract [en]

    In most computer vision applications, convolutional neural networks (CNNs) operate on dense image data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open problem with numerous applications in autonomous driving, robotics, and surveillance. To tackle this challenging problem, we introduce an algebraically-constrained convolution layer for CNNs with sparse input and demonstrate its capabilities for the scene depth completion task. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. Furthermore, we propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. Comprehensive experiments are performed on the KITTI depth benchmark and the results clearly demonstrate that the proposed approach achieves superior performance while requiring three times fewer parameters than the state-of-the-art methods. Moreover, our approach produces a continuous pixel-wise confidence map enabling information fusion, state inference, and decision support.

  • 40.
    Berg, Amanda
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB, Linköping, Sweden.
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Zenuity AB, Göteborg, Sweden.
    Durand de Gevigney, Flavie
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Grenoble INP, France.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB, Linköping, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Semi-automatic Annotation of Objects in Visual-Thermal Video2019In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Institute of Electrical and Electronics Engineers (IEEE), 2019Conference paper (Refereed)
    Abstract [en]

    Deep learning requires large amounts of annotated data. Manual annotation of objects in video is, regardless of annotation type, a tedious and time-consuming process. In particular, for scarcely used image modalities, human annotation is hard to justify. In such cases, semi-automatic annotation provides an acceptable option.

    In this work, a recursive, semi-automatic annotation method for video is presented. The proposed method utilizes a state-of-the-art video object segmentation method to propose initial annotations for all frames in a video based on only a few manual object segmentations. In the case of a multi-modal dataset, the multi-modality is exploited to refine the proposed annotations even further. The final tentative annotations are presented to the user for manual correction.

    The method is evaluated on a subset of the RGBT-234 visual-thermal dataset, reducing the workload for a human annotator by approximately 78% compared to full manual annotation. Utilizing the proposed pipeline, sequences are annotated for the VOT-RGBT 2019 challenge.

  • 41.
    Kristan, Matej
    et al.
    Univ Ljubljana, Slovenia.
    Matas, Jiri
    Czech Tech Univ, Czech Republic.
    Leonardis, Ales
    Univ Birmingham, England.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pflugfelder, Roman
    Austrian Acad Sci, Austria; TU Wien, Austria.
    Kamarainen, Joni-Kristian
    Tampere Univ, Finland.
    Zajc, Luka Cehovin
    Univ Ljubljana, Slovenia.
    Drbohlav, Ondrej
    Czech Tech Univ, Czech Republic.
    Lukezic, Alan
    Univ Ljubljana, Slovenia.
    Berg, Amanda
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemtekn AB, Sweden.
    Eldesokey, Abdelrahman
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Kapyla, Jani
    Tampere Univ, Finland.
    Fernandez, Gustavo
    Austrian Acad Sci, Austria.
    Gonzalez-Garcia, Abel
    Comp Vis Ctr, Spain.
    Memarmoghadam, Alireza
    Univ Isfahan, Iran.
    Lu, Andong
    Anhui Univ, Peoples R China.
    He, Anfeng
    Univ Sci & Technol China, Peoples R China.
    Varfolomieiev, Anton
    NTUU Igor Sikorsky Kyiv Polytech Inst, Ukraine.
    Chan, Antoni
    City Univ Hong Kong, Peoples R China.
    Tripathi, Ardhendu Shekhar
    Swiss Fed Inst Technol, Switzerland.
    Smeulders, Arnold
    Univ Amsterdam, Netherlands.
    Pedasingu, Bala Suraj
    IIT Tirupati, India.
    Chen, Bao Xin
    York Univ, Canada.
    Zhang, Baopeng
    Beijing Jiaotong Univ, Peoples R China.
    Wu, Baoyuan
    Tencent AI Lab, Peoples R China.
    Li, Bi
    Chinese Acad Sci, Peoples R China; Huazhong Univ Sci & Technol, Peoples R China.
    He, Bin
    Baidu Inc, Peoples R China.
    Yan, Bin
    Dalian Univ Technol, Peoples R China.
    Bai, Bing
    Didi Chuxing, Peoples R China.
    Li, Bing
    Chinese Acad Sci, Peoples R China.
    Li, Bo
    SenseTime, Peoples R China.
    Kim, Bycong Hak
    Hanwha Syst Co, South Korea; Kyungpook Natl Univ, South Korea.
    Ma, Chao
    Shanghai Jiao Tong Univ, Peoples R China.
    Fang, Chen
    Nanjing Normal Univ, Peoples R China.
    Qian, Chen
    SenseTime, Peoples R China.
    Chen, Cheng
    Peking Univ, Peoples R China.
    Li, Chenglong
    Anhui Univ, Peoples R China.
    Zhang, Chengquan
    Baidu Inc, Peoples R China.
    Tsai, Chi-Yi
    Tamkang Univ, Taiwan.
    Luo, Chong
    Microsoft Res, Peoples R China.
    Micheloni, Christian
    Austrian Acad Sci, Austria.
    Zhang, Chunhui
    Chinese Acad Sci, Peoples R China.
    Tao, Dacheng
    Univ Sydney, Australia.
    Gupta, Deepak
    Univ Amsterdam, Netherlands.
    Song, Dejia
    Huazhong Univ Sci & Technol, Peoples R China.
    Wang, Dong
    Dalian Univ Technol, Peoples R China.
    Gavves, Efstratios
    Univ Amsterdam, Netherlands.
    Yi, Eunu
    Hanwha Syst Co, South Korea.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Inception Inst Artificial Intelligence, U Arab Emirates.
    Zhang, Fangyi
    Chinese Acad Sci, Peoples R China.
    Wang, Fei
    SenseTime, Peoples R China.
    Zhao, Fei
    Chinese Acad Sci, Peoples R China.
    De Ath, George
    Univ Exeter, England.
    Bhat, Goutam
    Swiss Fed Inst Technol, Switzerland.
    Chen, Guanqi
    SenseTime, Peoples R China.
    Wang, Guangting
    Univ Sci & Technol China, Peoples R China.
    Li, Guoxuan
    SenseTime, Peoples R China.
    Cevikalp, Hakan
    Eskisehir Osmangazi Univ, Turkey.
    Du, Hao
    Microsoft Res, Peoples R China.
    Zhao, Haojie
    Dalian Univ Technol, Peoples R China.
    Saribas, Hasan
    Eskisehir Tech Univ, Turkey.
    Jung, Ho Min
    Kyungpook Natl Univ, South Korea.
    Bai, Hongliang
    Beijing FaceAll Co, Peoples R China.
    Yu, Hongyuan
    Chinese Acad Sci, Peoples R China; Microsoft Res, Peoples R China.
    Peng, Houwen
    Microsoft Res, Peoples R China.
    Lu, Huchuan
    Dalian Univ Technol, Peoples R China.
    Li, Hui
    Jiangnan Univ, Peoples R China.
    Li, Jiakun
    Beijing Jiaotong Univ, Peoples R China.
    Li, Jianhu
    Dalian Univ Technol, Peoples R China.
    Fu, Jianlong
    Microsoft Res, Peoples R China.
    Chen, Jie
    Xidian Univ, Peoples R China.
    Gao, Jie
    Xidian Univ, Peoples R China.
    Zhao, Jie
    Dalian Univ Technol, Peoples R China.
    Tang, Jin
    Anhui Univ, Peoples R China.
    Li, Jing
    Harbin Inst Technol, Peoples R China.
    Wu, Jingjing
    Hefei Univ Technol, Peoples R China.
    Liu, Jingtuo
    Baidu Inc, Peoples R China.
    Wang, Jinqiao
    Chinese Acad Sci, Peoples R China.
    Qi, Jingqing
    Dalian Univ Technol, Peoples R China.
    Zhang, Jingyue
    Xidian Univ, Peoples R China.
    Tsotsos, John K.
    York Univ, Canada.
    Lee, John Hyuk
    Kyungpook Natl Univ, South Korea.
    van de Weijer, Joost
    Comp Vis Ctr, Spain.
    Kittler, Josef
    Univ Surrey, England.
    Lee, Jun Ha
    Kyungpook Natl Univ, South Korea.
    Zhuang, Junfei
    Beijing Univ Posts & Telecommun, Peoples R China.
    Zhang, Kangkai
    Chinese Acad Sci, Peoples R China.
    wang, Kangkang
    Baidu Inc, Peoples R China.
    Dai, Kenan
    Dalian Univ Technol, Peoples R China.
    Chen, Lei
    SenseTime, Peoples R China.
    Liu, Lei
    Anhui Univ, Peoples R China.
    Guo, Leida
    YouTu Lab, Peoples R China.
    Zhang, Li
    Comp Vis Ctr, Spain; Univ Oxford, England.
    Wang, Liang
    Chinese Acad Sci, Peoples R China; Huazhong Univ Sci & Technol, Peoples R China.
    Wang, Liangliang
    Huazhong Univ Sci & Technol, Peoples R China.
    Zhang, Lichao
    Comp Vis Ctr, Spain.
    Wang, Lijun
    Dalian Univ Technol, Peoples R China.
    Zhou, Lijun
    Univ Chinese Acad Sci, Peoples R China.
    Zheng, Linyu
    Chinese Acad Sci, Peoples R China.
    Rout, Litu
    SAC ISRO, India.
    Van Gool, Luc
    Swiss Fed Inst Technol, Switzerland.
    Bertinetto, Luca
    FiveAI, England.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Dunnhofer, Matteo
    Univ Udine, Italy.
    Ni, Meng
    Dalian Univ Technol, Peoples R China.
    Kim, Min Young
    Kyungpook Natl Univ, South Korea.
    Tang, Ming
    Chinese Acad Sci, Peoples R China.
    Yang, Ming-Hsuan
    Univ Calif Merced, CA USA.
    Paluru, Naveen
    IIT Tirupati, India.
    Martine, Niki
    Univ Udine, Italy.
    Xu, Pengfei
    Didi Chuxing, Peoples R China.
    Zhang, Pengfei
    Univ Sydney, Australia.
    Zheng, Pengkun
    Peking Univ, Peoples R China.
    Zhang, Pengyu
    Dalian Univ Technol, Peoples R China.
    Torr, Philip H. S.
    Univ Oxford, England.
    Wang, Qi Zhang Qiang
    Chinese Acad Sci, Peoples R China; IINTELLIMIND LTD, Peoples R China.
    Gua, Qing
    Tianjin Univ, Peoples R China.
    Timofte, Radu
    Swiss Fed Inst Technol, Switzerland.
    Gorthi, Rama Krishna
    IIT Tirupati, India.
    Everson, Richard
    Univ Exeter, England.
    Han, Ruize
    Tianjin Univ, Peoples R China.
    Zhang, Ruohan
    Xidian Univ, Peoples R China.
    You, Shan
    SenseTime, Peoples R China.
    Zhao, Shao-Chuan
    Jiangnan Univ, Peoples R China.
    Zhao, Shengwei
    Chinese Acad Sci, Peoples R China.
    Li, Shihu
    Baidu Inc, Peoples R China.
    Li, Shikun
    Chinese Acad Sci, Peoples R China.
    Ge, Shiming
    Chinese Acad Sci, Peoples R China.
    Bai, Shuai
    Beijing Univ Posts & Telecommun, Peoples R China.
    Guan, Shuosen
    YouTu Lab, Peoples R China.
    Xing, Tengfei
    Didi Chuxing, Peoples R China.
    Xu, Tianyang
    Jiangnan Univ, Peoples R China.
    Yang, Tianyu
    City Univ Hong Kong, Peoples R China.
    Zhang, Ting
    China Natl Elect Import Export Corp, Peoples R China.
    Vojir, Tomas
    Univ Cambridge, England.
    Feng, Wei
    Tianjin Univ, Peoples R China.
    Hu, Weiming
    Chinese Acad Sci, Peoples R China.
    Wang, Weizhao
    Peking Univ, Peoples R China.
    Tang, Wenjie
    China Natl Elect Import Export Corp, Peoples R China.
    Zeng, Wenjun
    Microsoft Res, Peoples R China.
    Liu, Wenyu
    Huazhong Univ Sci & Technol, Peoples R China.
    Chen, Xi
    Chinese Acad Sci, Peoples R China; Xidian Univ, Peoples R China; Zhejiang Univ, Peoples R China.
    Qiu, Xi
    Xianan JiaoTong Univ, Peoples R China.
    Bai, Xiang
    Huazhong Univ Sci & Technol, Peoples R China.
    Wu, Xiao-Jun
    Jiangnan Univ, Peoples R China.
    Yang, Xiaoyun
    Chinese Academy of Sciences, China.
    Chen, Xier
    Xidian Univ, Peoples R China.
    Li, Xin
    Harbin Inst Technol, Peoples R China.
    Sun, Xing
    YouTu Lab, Peoples R China.
    Chen, Xingyu
    Chinese Acad Sci, Peoples R China.
    Tian, Xinmei
    Univ Sci & Technol China, Peoples R China.
    Tang, Xu
    Baidu Inc, Peoples R China.
    Zhu, Xue-Feng
    Jiangnan Univ, Peoples R China.
    Huang, Yan
    Chinese Acad Sci, Peoples R China.
    Chen, Yanan
    Xidian Univ, Peoples R China.
    Lian, Yanchao
    Xidian Univ, Peoples R China.
    Gu, Yang
    Didi Chuxing, Peoples R China.
    Liu, Yang
    North China Elect Power Univ, Peoples R China.
    Chen, Yanjie
    SenseTime, Peoples R China.
    Zhang, Yi
    YouTu Lab, Peoples R China.
    Xu, Yinda
    Zhejiang Univ, Peoples R China.
    Wang, Yingming
    Dalian Univ Technol, Peoples R China.
    Li, Yingping
    Xidian Univ, Peoples R China.
    Zhou, Yu
    Huazhong Univ Sci & Technol, Peoples R China.
    Dong, Yuan
    Beijing Univ Posts & Telecommun, Peoples R China.
    Xu, Yufei
    Univ Sci & Technol China, Peoples R China.
    Zhang, Yunhua
    Dalian Univ Technol, Peoples R China.
    Li, Yunkun
    Jiangnan Univ, Peoples R China.
    Luo, Zeyu Wang Zhao
    Chinese Acad Sci, Peoples R China.
    Zhang, Zhaoliang
    China Natl Elect Import Export Corp, Peoples R China.
    Feng, Zhen-Hua
    Univ Surrey, England.
    He, Zhenyu
    Harbin Inst Technol, Peoples R China.
    Song, Zhichao
    Didi Chuxing, Peoples R China.
    Chen, Zhihao
    Tianjin Univ, Peoples R China.
    Zhang, Zhipeng
    Chinese Acad Sci, Peoples R China.
    Wu, Zhirong
    Microsoft Res, Peoples R China.
    Xiong, Zhiwei
    Univ Sci & Technol China, Peoples R China.
    Huang, Zhongjian
    Xidian Univ, Peoples R China.
    Teng, Zhu
    Beijing Jiaotong Univ, Peoples R China.
    Ni, Zihan
    Baidu Inc, Peoples R China.
    The Seventh Visual Object Tracking VOT2019 Challenge Results2019In: 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), IEEE COMPUTER SOC , 2019, p. 2206-2241Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking, and tracking with multi-channel imagery. The performance of the tested trackers typically exceeds the standard baselines by a large margin. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website(1).

  • 42.
    Kristan, Matej
    et al.
    University of Ljubljana, Slovenia.
    Leonardis, Aleš
    University of Birmingham, United Kingdom.
    Matas, Jirí
    Czech Technical University, Czech Republic.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pflugfelder, Roman
    Austrian Institute of Technology, Austria / TU Wien, Austria.
    Zajc, Luka Cehovin
    University of Ljubljana, Slovenia.
    Vojíř, Tomáš
    Czech Technical University, Czech Republic.
    Bhat, Goutam
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Lukezič, Alan
    University of Ljubljana, Slovenia.
    Eldesokey, Abdelrahman
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Fernández, Gustavo
    García-Martín, Álvaro
    Iglesias-Arias, Álvaro
    Alatan, A. Aydin
    González-García, Abel
    Petrosino, Alfredo
    Memarmoghadam, Alireza
    Vedaldi, Andrea
    Muhič, Andrej
    He, Anfeng
    Smeulders, Arnold
    Perera, Asanka G.
    Li, Bo
    Chen, Boyu
    Kim, Changick
    Xu, Changsheng
    Xiong, Changzhen
    Tian, Cheng
    Luo, Chong
    Sun, Chong
    Hao, Cong
    Kim, Daijin
    Mishra, Deepak
    Chen, Deming
    Wang, Dong
    Wee, Dongyoon
    Gavves, Efstratios
    Gundogdu, Erhan
    Velasco-Salido, Erik
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Yang, Fan
    Zhao, Fei
    Li, Feng
    Battistone, Francesco
    De Ath, George
    Subrahmanyam, Gorthi R. K. S.
    Bastos, Guilherme
    Ling, Haibin
    Galoogahi, Hamed Kiani
    Lee, Hankyeol
    Li, Haojie
    Zhao, Haojie
    Fan, Heng
    Zhang, Honggang
    Possegger, Horst
    Li, Houqiang
    Lu, Huchuan
    Zhi, Hui
    Li, Huiyun
    Lee, Hyemin
    Chang, Hyung Jin
    Drummond, Isabela
    Valmadre, Jack
    Martin, Jaime Spencer
    Chahl, Javaan
    Choi, Jin Young
    Li, Jing
    Wang, Jinqiao
    Qi, Jinqing
    Sung, Jinyoung
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Henriques, Joao
    Choi, Jongwon
    van de Weijer, Joost
    Herranz, Jorge Rodríguez
    Martínez, José M.
    Kittler, Josef
    Zhuang, Junfei
    Gao, Junyu
    Grm, Klemen
    Zhang, Lichao
    Wang, Lijun
    Yang, Lingxiao
    Rout, Litu
    Si, Liu
    Bertinetto, Luca
    Chu, Lutao
    Che, Manqiang
    Maresca, Mario Edoardo
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Yang, Ming-Hsuan
    Abdelpakey, Mohamed
    Shehata, Mohamed
    Kang, Myunggu
    Lee, Namhoon
    Wang, Ning
    Miksik, Ondrej
    Moallem, P.
    Vicente-Moñivar, Pablo
    Senna, Pedro
    Li, Peixia
    Torr, Philip
    Raju, Priya Mariam
    Ruihe, Qian
    Wang, Qiang
    Zhou, Qin
    Guo, Qing
    Martín-Nieto, Rafael
    Gorthi, Rama Krishna
    Tao, Ran
    Bowden, Richard
    Everson, Richard
    Wang, Runling
    Yun, Sangdoo
    Choi, Seokeon
    Vivas, Sergio
    Bai, Shuai
    Huang, Shuangping
    Wu, Sihang
    Hadfield, Simon
    Wang, Siwen
    Golodetz, Stuart
    Ming, Tang
    Xu, Tianyang
    Zhang, Tianzhu
    Fischer, Tobias
    Santopietro, Vincenzo
    Štruc, Vitomir
    Wei, Wang
    Zuo, Wangmeng
    Feng, Wei
    Wu, Wei
    Zou, Wei
    Hu, Weiming
    Zhou, Wengang
    Zeng, Wenjun
    Zhang, Xiaofan
    Wu, Xiaohe
    Wu, Xiao-Jun
    Tian, Xinmei
    Li, Yan
    Lu, Yan
    Law, Yee Wei
    Wu, Yi
    Demiris, Yiannis
    Yang, Yicai
    Jiao, Yifan
    Li, Yuhong
    Zhang, Yunhua
    Sun, Yuxuan
    Zhang, Zheng
    Zhu, Zheng
    Feng, Zhen-Hua
    Wang, Zhihui
    He, Zhiqun
    The Sixth Visual Object Tracking VOT2018 Challenge Results2019In: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8–14, 2018 Proceedings, Part I / [ed] Laura Leal-Taixé and Stefan Roth, Cham: Springer Publishing Company, 2019, p. 3-53Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been introduced to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled, and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. Performance of the tested trackers typically far exceeds the standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

    Download full text (pdf)
    The Sixth Visual Object Tracking VOT2018 Challenge Results
  • 43.
    Berg, Amanda
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB, Linköping, Sweden.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Termisk Systemteknik AB, Linköping, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Visual Spectrum Image Generation from Thermal Infrared2019Conference paper (Other academic)
    Abstract [en]

    We address short-term, single-object tracking, a topic that is currently seeing fast progress for visual video, for the case of thermal infrared (TIR) imagery. Tracking methods designed for TIR are often subject to a number of constraints, e.g., warm objects, low spatial resolution, and static camera. As TIR cameras become less noisy and reach higher resolutions, these constraints become less relevant, and for emerging civilian applications, e.g., surveillance and automotive safety, new tracking methods are needed. Due to the special characteristics of TIR imagery, we argue that template-based trackers based on distribution fields should have an advantage over trackers based on spatial structure features. In this paper, we propose a template-based tracking method (ABCD) designed specifically for TIR and not restricted by any of the constraints above. The proposed tracker is evaluated on the VOT-TIR2015 and VOT2015 datasets using the VOT evaluation toolkit, and a comparison of the relative ranking of all trackers participating in both challenges is provided. Experimental results show that the ABCD tracker performs particularly well on thermal infrared sequences.
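
    The abstract argues for template trackers built on distribution fields for TIR imagery. As a rough illustration of that representation only (not of the ABCD tracker itself), the sketch below explodes a grayscale patch into per-intensity layers and smooths them, assuming NumPy/SciPy are available; the bin count and smoothing widths are arbitrary illustrative choices.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def distribution_field(patch, n_bins=16, spatial_sigma=2.0, feature_sigma=1.0):
            """Explode a grayscale patch (values in [0, 1]) into per-intensity layers
            and smooth them, yielding a soft, translation-tolerant template."""
            h, w = patch.shape
            bins = np.clip((patch * n_bins).astype(int), 0, n_bins - 1)
            field = np.zeros((n_bins, h, w))
            field[bins, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
            # Smooth spatially within each layer and softly across neighbouring layers.
            field = gaussian_filter(field, sigma=(feature_sigma, spatial_sigma, spatial_sigma))
            # Renormalize so the layer values at each pixel form a distribution.
            field /= field.sum(axis=0, keepdims=True) + 1e-12
            return field

    Matching a candidate region against such a template can then be done with, e.g., an L1 distance between the two fields, which is more forgiving of small misalignments than comparing raw intensities directly.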

  • 44.
    Öfjäll, Kristoffer
    et al.
    Visionists AB, Gothenburg, Sweden.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Approximative Coding Methods for Channel Representations2018In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 60, no 6, p. 929-940Article in journal (Refereed)
    Abstract [en]

    Most methods that address computer vision problems require powerful visual features. Many successful approaches apply techniques motivated from nonparametric statistics. The channel representation provides a framework for nonparametric distribution representation. Although early work has focused on a signal processing view of the representation, the channel representation can be interpreted in probabilistic terms, e.g., representing the distribution of local image orientation. In this paper, a variety of approximative channel-based algorithms for probabilistic problems are presented: a novel efficient algorithm for density reconstruction, a novel and efficient scheme for nonlinear gridding of densities, and finally a novel method for estimating Copula densities. The experimental results provide evidence that by relaxing the requirements for exact solutions, efficient algorithms are obtained.

    Download full text (pdf)
    fulltext
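
    To make the channel representation discussed above concrete: a scalar sample activates a few overlapping, smooth basis functions ("channels"), and averaging the encodings of many samples yields a soft histogram from which a density estimate can be read out. Below is a minimal sketch using the commonly used cos² basis with unit channel spacing; it illustrates generic channel encoding and a crude read-out, not the specific approximative algorithms proposed in the paper.

        import numpy as np

        def cos2_kernel(d):
            """cos^2 basis function with support |d| < 1.5 (unit channel spacing)."""
            return np.where(np.abs(d) < 1.5, np.cos(np.pi * d / 3.0) ** 2, 0.0)

        def channel_encode(samples, centers):
            """Encode each scalar sample into a channel vector (three active channels)."""
            return cos2_kernel(samples[:, None] - centers[None, :])

        # Soft histogram of a sample set: the average of the individual encodings.
        centers = np.arange(0.0, 11.0, 1.0)                       # channel centers 0..10
        samples = np.random.default_rng(0).normal(5.0, 1.0, 1000)
        soft_hist = channel_encode(np.clip(samples, 0.0, 10.0), centers).mean(axis=0)

        # Crude density read-out: evaluate the kernel expansion on a dense grid.
        grid = np.linspace(0.0, 10.0, 200)
        density = (soft_hist[None, :] * cos2_kernel(grid[:, None] - centers[None, :])).sum(axis=1)
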
  • 45.
    Bhat, Goutam
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Incept Inst Artificial Intelligence, U Arab Emirates.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Combining Local and Global Models for Robust Re-detection2018In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 25-30Conference paper (Refereed)
    Abstract [en]

    Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual tracking. However, these methods still struggle in occlusion and out-of-view scenarios due to the absence of a re-detection component. While such a component requires global knowledge of the scene to ensure robust re-detection of the target, the standard DCF is only trained on the local target neighborhood. In this paper, we augment the state-of-the-art DCF tracking framework with a re-detection component based on a global appearance model. First, we introduce a tracking confidence measure to detect target loss. Next, we propose a hard negative mining strategy to extract background distractor samples, which are used for training the global model. Finally, we propose a robust re-detection strategy that combines the global and local appearance model predictions. We perform comprehensive experiments on the challenging UAV123 and LTB35 datasets. Our approach shows consistent improvements over the baseline tracker, setting a new state-of-the-art on both datasets.

    Download full text (pdf)
    Combining Local and Global Models for Robust Re-detection
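
    The three ingredients described in the abstract (a tracking confidence measure, a globally trained appearance model, and a fusion of local and global predictions) can be summarized schematically as below. The interfaces local_tracker and global_detector are placeholders for this sketch, not the authors' implementation, and the threshold and fusion weight are arbitrary.

        def track_frame(frame, local_tracker, global_detector,
                        conf_threshold=0.25, fusion_weight=0.6):
            """One step of a confidence-gated re-detection scheme (schematic).

            local_tracker.predict(frame)  -> (box, score) from the local DCF model
            local_tracker.score(frame, b) -> local score of an arbitrary box b
            global_detector.detect(frame) -> list of (box, score) candidates, full frame
            """
            local_box, local_score = local_tracker.predict(frame)

            if local_score >= conf_threshold:
                # Local model is confident: accept its prediction and update it.
                local_tracker.update(frame, local_box)
                return local_box

            # Possible target loss: consult the global appearance model.
            candidates = global_detector.detect(frame)
            if not candidates:
                return local_box  # nothing better found; keep the local estimate

            # Fuse global and local evidence with a simple weighted score.
            def fused(candidate):
                box, global_score = candidate
                return (fusion_weight * global_score
                        + (1.0 - fusion_weight) * local_tracker.score(frame, box))

            best_box, _ = max(candidates, key=fused)
            local_tracker.update(frame, best_box)
            return best_box
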
  • 46.
    Holmquist, Karl
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Senel, Deniz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Computing a Collision-Free Path using the monogenic scale space2018In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, p. 8097-8102Conference paper (Refereed)
    Abstract [en]

    Mobile robots are used for various purposes and with different functionalities, which require them to move freely in environments containing both static and dynamic obstacles in order to accomplish given tasks. One of the most important capabilities when navigating a mobile robot in such an environment is finding a safe path to a goal position. This paper shows that there exists an accurate solution to the Laplace equation which allows finding a collision-free path, and that it can be efficiently calculated for a rectangular bounded domain such as a map represented as an image. This is accomplished by means of the monogenic scale space, resulting in a vector field which describes the attracting and repelling forces from the obstacles and the goal. The method is shown to work in reasonably convex domains and, by tessellating the environment map, also in non-convex environments.

    Download full text (pdf)
    fulltext
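
    The paper obtains the guiding vector field from the monogenic scale space; purely as a point of reference for the underlying boundary-value problem, the sketch below solves the same Laplace equation directly by Jacobi iteration on an occupancy grid (obstacles held at potential 1, the goal at 0) and then follows the negative gradient to the goal. It assumes the outer border of the grid is marked as an obstacle so the periodic wrap of np.roll is harmless; the iteration counts are illustrative.

        import numpy as np

        def harmonic_potential(occupancy, goal, iters=5000):
            """Solve Laplace's equation on free space with Dirichlet boundaries:
            obstacles and walls held at 1 (repelling), the goal held at 0 (attracting).
            occupancy: 2D bool array, True = obstacle; goal: (row, col) index."""
            u = np.ones_like(occupancy, dtype=float)
            free = ~occupancy
            for _ in range(iters):
                # Jacobi update: each free cell becomes the mean of its four neighbours.
                avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                              np.roll(u, 1, 1) + np.roll(u, -1, 1))
                u = np.where(free, avg, 1.0)
                u[goal] = 0.0
            return u

        def follow_gradient(u, start, max_steps=10000):
            """Greedy steepest descent on the potential; harmonic potentials have no
            spurious interior minima, so the descent ends at the goal (or stalls if
            the potential is numerically flat far from it)."""
            path, pos = [start], start
            for _ in range(max_steps):
                r, c = pos
                nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= r + dr < u.shape[0] and 0 <= c + dc < u.shape[1]]
                nxt = min(nbrs, key=lambda p: u[p])
                if u[nxt] >= u[pos]:
                    break
                path.append(nxt)
                pos = nxt
            return path
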
  • 47.
    Häger, Gustav
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Countering bias in tracking evaluations2018In: Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications / [ed] Francisco Imai, Alain Tremeau and Jose Braz, Science and Technology Publications, Lda , 2018, Vol. 5, p. 581-587Conference paper (Refereed)
    Abstract [en]

    Recent years have witnessed a significant leap in visual object tracking performance, mainly due to powerful features, sophisticated learning methods and the introduction of benchmark datasets. Despite this significant improvement, the evaluation of state-of-the-art object trackers still relies on the classical intersection over union (IoU) score. In this work, we argue that object tracking evaluations based on the classical IoU score are sub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of large target objects and favors over-estimated target prediction sizes. As our second contribution, we propose a new score that is unbiased with respect to target prediction size. We systematically evaluate our proposed approach on benchmark tracking data with variations in relative target size. Our empirical results clearly suggest that the proposed score is unbiased in general.

    Download full text (pdf)
    Countering bias in tracking evaluations
  • 48.
    Järemo Lawin, Felix
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Forssén, Per-Erik
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Density Adaptive Point Set Registration2018In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018, p. 3829-3837Conference paper (Refereed)
    Abstract [en]

    Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need for re-sampling.
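
    The density-adaptive latent scene model itself is not reproduced here; as a generic illustration of the EM registration loop that the abstract builds on, the sketch below alternates soft correspondences under a Gaussian mixture centred on one point set (E-step) with a closed-form rigid Kabsch update and a variance refit (M-step). The interface, initialization and variance handling are simplifications for this sketch.

        import numpy as np

        def em_rigid_registration(x, y, n_iter=50, sigma2=1.0):
            """Schematic EM registration of point set y (M x D) onto x (N x D) with a
            rigid transform. The scene is modelled as a Gaussian mixture centred on
            the points of x; this is a generic GMM/EM sketch, not the paper's
            density-adaptive latent scene model."""
            r, t = np.eye(x.shape[1]), np.zeros(x.shape[1])
            for _ in range(n_iter):
                ty = y @ r.T + t                                       # transformed y
                # E-step: responsibilities of each x-centred component for each y point.
                d2 = ((ty[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # (M, N)
                w = np.exp(-0.5 * d2 / sigma2)
                w /= w.sum(axis=1, keepdims=True) + 1e-12
                x_hat = w @ x                                          # soft targets (M, D)
                # M-step: weighted Kabsch alignment of y onto its soft targets.
                mu_y, mu_x = y.mean(axis=0), x_hat.mean(axis=0)
                h = (y - mu_y).T @ (x_hat - mu_x)
                u, _, vt = np.linalg.svd(h)
                d = np.sign(np.linalg.det(vt.T @ u.T))
                r = vt.T @ np.diag([1.0] * (x.shape[1] - 1) + [d]) @ u.T
                t = mu_x - r @ mu_y
                # Crude isotropic variance refit from the soft-target residuals.
                resid = (y @ r.T + t) - x_hat
                sigma2 = max((resid ** 2).sum() / (y.shape[0] * x.shape[1]), 1e-6)
            return r, t

    Here x plays the role of a fixed reference set; in the multi-view setting addressed by the paper, a jointly inferred latent scene model replaces this asymmetric choice.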