liu.se: Search for publications in DiVA
1 - 50 of 64
  • 1.
    Bhat, Goutam
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Incept Inst Artificial Intelligence, U Arab Emirates.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Combining Local and Global Models for Robust Re-detection (2018). In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, pp. 25-30. Conference paper (Refereed)
    Abstract [en]

    Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual tracking. However, these methods still struggle in occlusion and out-of-view scenarios due to the absence of a re-detection component. While such a component requires global knowledge of the scene to ensure robust re-detection of the target, the standard DCF is only trained on the local target neighborhood. In this paper, we augment the state-of-the-art DCF tracking framework with a re-detection component based on a global appearance model. First, we introduce a tracking confidence measure to detect target loss. Next, we propose a hard negative mining strategy to extract background distractor samples, used for training the global model. Finally, we propose a robust re-detection strategy that combines the global and local appearance model predictions. We perform comprehensive experiments on the challenging UAV123 and LTB35 datasets. Our approach shows consistent improvements over the baseline tracker, setting a new state-of-the-art on both datasets.
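
    As a rough, hedged illustration of the kind of tracking confidence measure mentioned in the abstract (the paper's exact measure is not reproduced here), the sketch below computes the classical peak-to-sidelobe ratio of a correlation response map, a common confidence proxy in DCF tracking; the threshold is purely illustrative.

        import numpy as np

        def peak_to_sidelobe_ratio(response, exclude=5):
            # (peak - sidelobe mean) / sidelobe std, with a small window around
            # the peak excluded from the sidelobe statistics
            r0, c0 = np.unravel_index(np.argmax(response), response.shape)
            peak = response[r0, c0]
            mask = np.ones_like(response, dtype=bool)
            mask[max(0, r0 - exclude):r0 + exclude + 1,
                 max(0, c0 - exclude):c0 + exclude + 1] = False
            sidelobe = response[mask]
            return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)

        response_map = np.random.rand(50, 50)    # toy response map
        response_map[25, 25] = 3.0               # a sharp peak means a confident detection
        target_lost = peak_to_sidelobe_ratio(response_map) < 6.0   # illustrative threshold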

    Full text (pdf)
    Combining Local and Global Models for Robust Re-detection
  • 2.
    Bhat, Goutam
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Johnander, Joakim
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Unveiling the power of deep tracking (2018). In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II / [ed] Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu and Yair Weiss, Cham: Springer Publishing Company, 2018, pp. 493-509. Conference paper (Refereed)
    Abstract [en]

    In the field of generic object tracking numerous attempts have been made to exploit deep features. Despite all expectations, deep trackers are yet to reach an outstanding level of performance compared to methods solely based on handcrafted features. In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. We systematically study the characteristics of both deep and shallow features, and their relation to tracking accuracy and robustness. We identify the limited data and low spatial resolution as the main challenges, and propose strategies to counter these issues when integrating deep features for tracking. Furthermore, we propose a novel adaptive fusion approach that leverages the complementary properties of deep and shallow features to improve both robustness and accuracy. Extensive experiments are performed on four challenging datasets. On VOT2017, our approach significantly outperforms the top performing tracker from the challenge with a relative gain of >17% in EAO.

    Full text (pdf)
    Unveiling the power of deep tracking
  • 3.
    Bhunia, Ankan Kumar
    et al.
    Mohamed bin Zayed University of AI, UAE.
    Khan, Salman
    Mohamed bin Zayed University of AI, UAE; 2Australian National University, Australia.
    Cholakkal, Hisham
    Mohamed bin Zayed University of AI, UAE.
    Anwer, Rao Muhammad
    Mohamed bin Zayed University of AI, UAE.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Mohamed bin Zayed University of AI, UAE.
    Shah, Mubarak
    University of Central Florida, USA.
    Handwriting Transformers (2021). Other (Other academic)
    Abstract [en]

    We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns. The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state-of-the-art demonstrated through extensive qualitative, quantitative and human-based evaluations. The proposed HWT can handle arbitrary length of text and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images.

  • 4.
    Cao, Jiale
    et al.
    School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
    Pang, Yanwei
    School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
    Anwer, Rao Muhammad
    Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
    Cholakkal, Hisham
    Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
    Shao, Ling
    Terminus Group, Beijing, China.
    SipMaskv2: Enhanced Fast Image and Video Instance Segmentation (2023). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no. 3, pp. 3798-3812. Journal article (Refereed)
    Abstract [en]

    We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves the instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding-box, enabling a better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, including a sample selection scheme and an instance refinement module, to address these two issues. Experiments are performed on both the image instance segmentation dataset MS COCO and the video instance segmentation dataset YouTube-VIS. On the MS COCO test-dev set, our method achieves a state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by a gain of 3.0% (mask AP) under similar settings, while operating at a comparable speed. On the YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.

  • 5.
    Cao, Jiale
    et al.
    Tianjin University, China.
    Pang, Yanwei
    Tianjin University, China.
    Xie, Jin
    Tianjin University, China.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Shao, Ling
    School of Computing Sciences, University of East Anglia, UK.
    From Handcrafted to Deep Features for Pedestrian Detection: A Survey (2022). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 44, no. 9, pp. 4913-4934. Journal article (Refereed)
    Abstract [en]

    Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that includes handcrafted features based methods and deep features based approaches. For handcrafted features based methods, we present an extensive review of approaches and find that handcrafted features with large degrees of freedom in shape and space have better performance. In the case of deep features based approaches, we split them into pure CNN based methods and those employing both handcrafted and CNN based features. We give a statistical analysis of these methods and their trends, where feature-enhanced, part-aware, and post-processing methods have attracted the most attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides features that are more robust to illumination variation. Furthermore, we introduce some related datasets and evaluation metrics, and a deep experimental analysis. We conclude this survey by emphasizing open problems that need to be addressed and highlighting various future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.

  • 6.
    Cholakkal, Hisham
    et al.
    Incept Inst Artificial Intelligence, U Arab Emirates.
    Sun, Guolei
    Incept Inst Artificial Intelligence, U Arab Emirates.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Incept Inst Artificial Intelligence, U Arab Emirates.
    Shao, Ling
    Incept Inst Artificial Intelligence, U Arab Emirates.
    Object Counting and Instance Segmentation with Image-level Supervision (2019). In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, June 16-20, 2019, IEEE, 2019, pp. 12389-12397. Conference paper (Refereed)
    Abstract [en]

    Common object counting in a natural scene is a challenging problem in computer vision with numerous real-world applications. Existing image-level supervised common object counting approaches only predict the global object count and rely on additional instance-level supervision to also determine object locations. We propose an image-level supervised approach that provides both the global object count and the spatial distribution of object instances by constructing an object category density map. Motivated by psychological studies, we further reduce image-level supervision using a limited object count information (up to four). To the best of our knowledge, we are the first to propose image-level supervised density map estimation for common object counting and demonstrate its effectiveness in image-level supervised instance segmentation. Comprehensive experiments are performed on the PASCAL VOC and COCO datasets. Our approach outperforms existing methods, including those using instance-level supervision, on both datasets for common object counting. Moreover, our approach improves state-of-the-art image-level supervised instance segmentation [34] with a relative gain of 17.8% in terms of average best overlap, on the PASCAL VOC 2012 dataset.

  • 7.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Gladh, Susanna
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Deep motion and appearance cues for visual tracking (2019). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 124, pp. 74-81. Journal article (Refereed)
    Abstract [en]

    Generic visual tracking is a challenging computer vision problem, with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle in case of ambiguous appearance information, leading to tracking failure. In such cases, we argue that motion cue provides discriminative and complementary information that can improve tracking performance. Contrary to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.

  • 8.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Swiss Fed Inst Technol, Switzerland.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Swiss Fed Inst Technol, Switzerland.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Incept Inst Artificial Intelligence, U Arab Emirates.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    ATOM: Accurate tracking by overlap maximization (2019). In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), IEEE, 2019, pp. 4655-4664. Conference paper (Refereed)
    Abstract [en]

    While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High-level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.
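
    The target estimation step described above can be pictured as maximizing a learned, differentiable overlap predictor with respect to the box coordinates. The sketch below is a minimal illustration of such a gradient-ascent loop under that assumption; the toy predictor, step size and iteration count are stand-ins, not the ATOM network or its settings.

        import torch

        def refine_box(iou_predictor, feats, box, steps=10, lr=2.0):
            # gradient ascent on the predicted overlap w.r.t. the box (x, y, w, h)
            box = box.clone().requires_grad_(True)
            for _ in range(steps):
                iou = iou_predictor(feats, box)      # scalar predicted overlap
                iou.backward()
                with torch.no_grad():
                    box += lr * box.grad             # ascend the predicted IoU
                    box.grad.zero_()
            return box.detach()

        # toy stand-in for the learned predictor: "overlap" peaks at a fixed box
        target = torch.tensor([10., 10., 40., 40.])
        toy_predictor = lambda _feats, b: -((b - target) ** 2).sum() / 100.0
        refined = refine_box(toy_predictor, None, torch.tensor([0., 0., 30., 30.]))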

  • 9.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    ECO: Efficient Convolution Operators for Tracking (2017). In: Proceedings 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2017, pp. 6931-6939. Conference paper (Refereed)
    Abstract [en]

    In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with a massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and Temple-Color. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top-ranked method [12] in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.
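
    The factorized convolution operator can be sketched as a learned projection of the D feature channels onto C << D channels, so that only C filters need to be stored and applied. In the snippet below the projection matrix and filters are random stand-ins; in ECO they are learned jointly with the tracking loss.

        import numpy as np

        D, C, H, W = 512, 64, 50, 50              # channel counts and spatial size (illustrative)
        features = np.random.randn(H, W, D)       # deep feature map of the search region
        P = np.random.randn(D, C) / np.sqrt(D)    # factorized projection (learned in ECO)
        compressed = features @ P                 # (H, W, C) projected features

        # DCF-style response using only C filters, computed in the Fourier domain
        filters_hat = np.fft.fft2(np.random.randn(H, W, C), axes=(0, 1))
        response = np.real(np.fft.ifft2(
            (np.fft.fft2(compressed, axes=(0, 1)) * np.conj(filters_hat)).sum(-1)))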

    Full text (pdf)
    ECO: Efficient Convolution Operators for Tracking
  • 10.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Khan, Fahad
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Accurate Scale Estimation for Robust Visual Tracking (2014). In: Proceedings of the British Machine Vision Conference 2014 / [ed] Michel Valstar, Andrew French and Tony Pridmore, BMVA Press, 2014. Conference paper (Refereed)
    Abstract [en]

    Robust scale estimation is a challenging problem in visual object tracking. Most existing methods fail to handle large scale variations in complex image sequences. This paper presents a novel approach for robust scale estimation in a tracking-by-detection framework. The proposed approach works by learning discriminative correlation filters based on a scale pyramid representation. We learn separate filters for translation and scale estimation, and show that this improves the performance compared to an exhaustive scale search. Our scale estimation approach is generic as it can be incorporated into any tracking method with no inherent scale estimation.

    Experiments are performed on 28 benchmark sequences with significant scale variations. Our results show that the proposed approach significantly improves the performance by 18.8 % in median distance precision compared to our baseline. Finally, we provide both quantitative and qualitative comparisons of our approach with state-of-the-art trackers in the literature. The proposed method is shown to outperform the best existing tracker by 16.6 % in median distance precision, while operating in real time.
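
    The scale search can be pictured as a one-dimensional correlation over a geometric scale pyramid: the target patch is sampled at a set of relative scales, each sample is scored by a learned scale filter, and the argmax gives the scale update. The sketch below shows only this search structure, with toy features and a random filter standing in for what the tracker would actually learn.

        import numpy as np

        n_scales, step = 33, 1.02
        scales = step ** (np.arange(n_scales) - n_scales // 2)      # geometric scale factors

        def toy_features(scale):                  # stand-in for resized-patch features
            return np.array([scale, scale ** 2])

        scale_feats = np.stack([toy_features(s) for s in scales])   # (n_scales, d)
        scale_filter = np.random.randn(scale_feats.shape[1])        # learned in practice
        best_scale = scales[np.argmax(scale_feats @ scale_filter)]  # relative scale update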

    Full text (pdf)
    fulltext
    Download (pdf)
    Extended Abstract
  • 11.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking (2016). In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2016, pp. 1430-1438. Conference paper (Refereed)
    Abstract [en]

    Tracking-by-detection methods have demonstrated competitive performance in recent years. In these approaches, the tracking model heavily relies on the quality of the training set. Due to the limited amount of labeled training data, additional samples need to be extracted and labeled by the tracker itself. This often leads to the inclusion of corrupted training samples, due to occlusions, misalignments and other perturbations. Existing tracking-by-detection methods either ignore this problem, or employ a separate component for managing the training set. We propose a novel generic approach for alleviating the problem of corrupted training samples in tracking-by-detection frameworks. Our approach dynamically manages the training set by estimating the quality of the samples. Contrary to existing approaches, we propose a unified formulation by minimizing a single loss over both the target appearance model and the sample quality weights. The joint formulation enables corrupted samples to be down-weighted while increasing the impact of correct ones. Experiments are performed on three benchmarks: OTB-2015 with 100 videos, VOT-2015 with 60 videos, and Temple-Color with 128 videos. On the OTB-2015, our unified formulation significantly improves the baseline, with a gain of 3.8% in mean overlap precision. Finally, our method achieves state-of-the-art results on all three datasets.

    Full text (pdf)
    Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking
  • 12.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Coloring Channel Representations for Visual Tracking (2015). In: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Rasmus R. Paulsen, Kim S. Pedersen, Springer, 2015, Vol. 9127, pp. 117-129. Conference paper (Refereed)
    Abstract [en]

    Visual object tracking is a classical, but still open research problem in computer vision, with many real-world applications. The problem is challenging due to several factors, such as illumination variation, occlusions, camera motion and appearance changes. Such problems can be alleviated by constructing robust, discriminative and computationally efficient visual features. Recently, biologically-inspired channel representations [felsberg06PAMI] have been shown to provide promising results in many applications ranging from autonomous driving to visual tracking.

    This paper investigates the problem of coloring channel representations for visual tracking. We evaluate two strategies, channel concatenation and channel product, to construct channel coded color representations. The proposed channel coded color representations are generic and can be used beyond tracking.

    Experiments are performed on 41 challenging benchmark videos. Our experiments clearly suggest that a careful selection of color features together with an optimal fusion strategy significantly outperforms the standard luminance-based channel representation. Finally, we show promising results compared to state-of-the-art tracking methods in the literature.

    Full text (pdf)
    fulltext
  • 13.
    Danelljan, Martin
    et al.
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Häger, Gustav
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Khan, Fahad Shahbaz
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Felsberg, Michael
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Convolutional Features for Correlation Filter Based Visual Tracking (2015). In: 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), IEEE conference proceedings, 2015, pp. 621-629. Conference paper (Refereed)
    Abstract [en]

    Visual object tracking is a challenging computer vision problem with numerous real-world applications. This paper investigates the impact of convolutional features for the visual tracking problem. We propose to use activations from the convolutional layer of a CNN in discriminative correlation filter based tracking frameworks. These activations have several advantages compared to the standard deep features (fully connected layers). Firstly, they mitigate the need for task-specific fine-tuning. Secondly, they contain structural information crucial for the tracking problem. Lastly, these activations have low dimensionality. We perform comprehensive experiments on three benchmark datasets: OTB, ALOV300++ and the recently introduced VOT2015. Surprisingly, in contrast to image classification, our results suggest that activations from the first layer provide superior tracking performance compared to the deeper layers. Our results further show that the convolutional features provide improved results compared to standard handcrafted features. Finally, results comparable to state-of-the-art trackers are obtained on all three benchmark datasets.
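
    A hedged sketch of extracting such early-layer activations as tracking features (VGG-16 is an illustrative backbone choice; the paper evaluates several networks and layers, and a tracker would feed these activations into a correlation filter rather than use them directly):

        import torch
        from torchvision import models

        # first convolutional layer (plus ReLU) of an ImageNet-pretrained network
        backbone = models.vgg16(weights="IMAGENET1K_V1").features[:2].eval()
        with torch.no_grad():
            patch = torch.rand(1, 3, 224, 224)    # stand-in for a normalized target patch
            feats = backbone(patch)               # (1, 64, 224, 224): low-level, high-resolution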

    Full text (pdf)
    fulltext
  • 14.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Discriminative Scale Space Tracking (2017). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 39, no. 8, pp. 1561-1575. Journal article (Refereed)
    Abstract [en]

    Accurate scale estimation of a target is a challenging research problem in visual object tracking. Most state-of-the-art methods employ an exhaustive scale search to estimate the target size. The exhaustive search strategy is computationally expensive and struggles when encountered with large scale variations. This paper investigates the problem of accurate and robust scale estimation in a tracking-by-detection framework. We propose a novel scale adaptive tracking approach by learning separate discriminative correlation filters for translation and scale estimation. The explicit scale filter is learned online using the target appearance sampled at a set of different scales. Contrary to standard approaches, our method directly learns the appearance change induced by variations in the target scale. Additionally, we investigate strategies to reduce the computational cost of our approach. Extensive experiments are performed on the OTB and the VOT2014 datasets. Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5 percent in average overlap precision on the OTB dataset. Additionally, our method is computationally efficient, operating at a 50 percent higher frame rate compared to the exhaustive scale search. Our method obtains the top rank in performance by outperforming 19 state-of-the-art trackers on OTB and 37 state-of-the-art trackers on VOT2014.

    Full text (pdf)
    fulltext
  • 15.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Learning Spatially Regularized Correlation Filters for Visual Tracking (2015). In: Proceedings of the International Conference on Computer Vision (ICCV), 2015, IEEE Computer Society, 2015, pp. 4310-4318. Conference paper (Refereed)
    Abstract [en]

    Robust and accurate visual tracking is one of the most challenging computer vision problems. Due to the inherent lack of training data, a robust approach for constructing a target appearance model is crucial. Recently, discriminatively learned correlation filters (DCF) have been successfully applied to address this problem for tracking. These methods utilize a periodic assumption of the training samples to efficiently learn a classifier on all patches in the target neighborhood. However, the periodic assumption also introduces unwanted boundary effects, which severely degrade the quality of the tracking model.

    We propose Spatially Regularized Discriminative Correlation Filters (SRDCF) for tracking. A spatial regularization component is introduced in the learning to penalize correlation filter coefficients depending on their spatial location. Our SRDCF formulation allows the correlation filters to be learned on a significantly larger set of negative training samples, without corrupting the positive samples. We further propose an optimization strategy, based on the iterative Gauss-Seidel method, for efficient online learning of our SRDCF. Experiments are performed on four benchmark datasets: OTB-2013, ALOV++, OTB-2015, and VOT2014. Our approach achieves state-of-the-art results on all four datasets. On OTB-2013 and OTB-2015, we obtain an absolute gain of 8.0% and 8.2% respectively, in mean overlap precision, compared to the best existing trackers.
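
    Schematically (notation simplified here, not quoted from the paper), the spatially regularized learning problem replaces the constant DCF regularizer with a spatial penalty w that grows outside the target region. With training samples x_k (feature channels d = 1..D), desired responses y_k and sample weights alpha_k, the objective takes the form

        \varepsilon(f) = \sum_{k=1}^{t} \alpha_k \Bigl\| \sum_{d=1}^{D} x_k^{d} \ast f^{d} - y_k \Bigr\|^{2} + \sum_{d=1}^{D} \bigl\| w \cdot f^{d} \bigr\|^{2}

    where \ast denotes circular correlation and w · f^d is a pointwise product; choosing w constant recovers the standard regularization \lambda \| f \|^{2}.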

    Full text (pdf)
    fulltext
  • 16.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Granström, Karl
    Linköpings universitet, Institutionen för systemteknik, Reglerteknik. Linköpings universitet, Tekniska högskolan.
    Heintz, Fredrik
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska högskolan.
    Rudol, Piotr
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska högskolan.
    Wzorek, Mariusz
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska högskolan.
    Kvarnström, Jonas
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska högskolan.
    Doherty, Patrick
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska högskolan.
    A Low-Level Active Vision Framework for Collaborative Unmanned Aircraft Systems (2015). In: Computer Vision - ECCV 2014 Workshops, Part I / [ed] Lourdes Agapito, Michael M. Bronstein and Carsten Rother, Springer Publishing Company, 2015, Vol. 8925, pp. 223-237. Conference paper (Refereed)
    Abstract [en]

    Micro unmanned aerial vehicles are becoming increasingly interesting for aiding and collaborating with human agents in a myriad of applications, but in particular they are useful for monitoring inaccessible or dangerous areas. In order to interact with and monitor humans, these systems need robust and real-time computer vision subsystems that allow them to detect and follow persons.

    In this work, we propose a low-level active vision framework to accomplish these challenging tasks. Based on the LinkQuad platform, we present a system study that implements the detection and tracking of people under fully autonomous flight conditions, keeping the vehicle within a certain distance of a person. The framework integrates state-of-the-art methods from visual detection and tracking, Bayesian filtering, and AI-based control. The results from our experiments clearly suggest that the proposed framework performs real-time detection and tracking of persons in complex scenarios.

    Full text (pdf)
    fulltext
  • 17.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Meneghetti, Giulia
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    A Probabilistic Framework for Color-Based Point Set Registration (2016). In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2016, pp. 1818-1826. Conference paper (Refereed)
    Abstract [en]

    In recent years, sensors capable of measuring both color and depth information have become increasingly popular. Despite the abundance of colored point set data, state-of-the-art probabilistic registration techniques ignore the available color information. In this paper, we propose a probabilistic point set registration framework that exploits available color information associated with the points. Our method is based on a model of the joint distribution of 3D-point observations and their color information. The proposed model captures discriminative color information, while being computationally efficient. We derive an EM algorithm for jointly estimating the model parameters and the relative transformations. Comprehensive experiments are performed on the Stanford Lounge dataset, captured by an RGB-D camera, and two point sets captured by a Lidar sensor. Our results demonstrate a significant gain in robustness and accuracy when incorporating color information. On the Stanford Lounge dataset, our approach achieves a relative reduction of the failure rate by 78% compared to the baseline. Furthermore, our proposed model outperforms standard strategies for combining color and 3D-point information, leading to state-of-the-art results.

    Full text (pdf)
    A Probabilistic Framework for Color-Based Point Set Registration
  • 18.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Meneghetti, Giulia
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Aligning the Dissimilar: A Probabilistic Feature-Based Point Set Registration Approach (2016). In: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR) 2016, IEEE, 2016, pp. 247-252. Conference paper (Refereed)
    Abstract [en]

    3D-point set registration is an active area of research in computer vision. In recent years, probabilistic registration approaches have demonstrated superior performance for many challenging applications. Generally, these probabilistic approaches rely on the spatial distribution of the 3D-points, and only recently color information has been integrated into such a framework, significantly improving registration accuracy. Other than local color information, high-dimensional 3D shape features have been successfully employed in many applications such as action recognition and 3D object recognition. In this paper, we propose a probabilistic framework to integrate high-dimensional 3D shape features with color information for point set registration. The 3D shape features are distinctive and provide complementary information beneficial for robust registration. We validate our proposed framework by performing comprehensive experiments on the challenging Stanford Lounge dataset, acquired by a RGB-D sensor, and an outdoor dataset captured by a Lidar sensor. The results clearly demonstrate that our approach provides superior results both in terms of robustness and accuracy compared to state-of-the-art probabilistic methods.

    Full text (pdf)
    fulltext
  • 19.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Robinson, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking (2016). In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V / [ed] Bastian Leibe, Jiri Matas, Nicu Sebe and Max Welling, Cham: Springer, 2016, pp. 472-488. Conference paper (Refereed)
    Abstract [en]

    Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments.

    Full text (pdf)
    Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
  • 20.
    Danelljan, Martin
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Shahbaz Khan, Fahad
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    van de Weijer, Joost
    Computer Vision Center, CS Dept. Universitat Autonoma de Barcelona, Spain.
    Adaptive Color Attributes for Real-Time Visual Tracking (2014). In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014, IEEE Computer Society, 2014, pp. 1090-1097. Conference paper (Refereed)
    Abstract [en]

    Visual tracking is a challenging problem in computer vision. Most state-of-the-art visual trackers either rely on luminance information or use simple color representations for image description. Contrary to visual tracking, for object recognition and detection, sophisticated color features, when combined with luminance, have been shown to provide excellent performance. Due to the complexity of the tracking problem, the desired color feature should be computationally efficient, and possess a certain amount of photometric invariance while maintaining high discriminative power.

    This paper investigates the contribution of color in a tracking-by-detection framework. Our results suggest that color attributes provide superior performance for visual tracking. We further propose an adaptive low-dimensional variant of color attributes. Both quantitative and attribute-based evaluations are performed on 41 challenging benchmark color sequences. The proposed approach improves the baseline intensity-based tracker by 24% in median distance precision. Furthermore, we show that our approach outperforms state-of-the-art tracking methods while running at more than 100 frames per second.
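
    The color-attribute (color names) representation can be sketched as a per-pixel lookup from quantized RGB values to probabilities over eleven linguistic color labels, followed by a projection to a few dimensions. The lookup table below is a random stand-in for the learned table, and the PCA-style projection only schematically mirrors the adaptive low-dimensional variant.

        import numpy as np

        lut = np.random.dirichlet(np.ones(11), size=(32, 32, 32))   # stand-in for the learned table

        def color_names(image):                       # image: (H, W, 3) uint8
            idx = (image // 8).astype(int)            # quantize 0..255 -> 0..31 per channel
            return lut[idx[..., 0], idx[..., 1], idx[..., 2]]   # (H, W, 11) label probabilities

        cn = color_names(np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8))

        # schematic low-dimensional projection of the 11 color-name channels
        flat = cn.reshape(-1, 11) - cn.reshape(-1, 11).mean(0)
        _, _, vt = np.linalg.svd(flat, full_matrices=False)
        cn_low = flat @ vt[:2].T                      # (H*W, 2) compressed color representation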

    Full text (pdf)
    fulltext
    Download (zip)
    software
  • 21.
    Eldesokey, Abdelrahman
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Confidence Propagation through CNNs for Guided Sparse Depth Regression (2020). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, Vol. 42, no. 10. Journal article (Refereed)
    Abstract [en]

    Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g. data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5% of the number of parameters compared to the state-of-the-art methods.
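
    The normalized-convolution building block that this line of work starts from can be written in a few lines: convolve the confidence-weighted data and the confidence map with the same kernel and divide, which also yields a simple propagated confidence. The kernel is fixed here but learned (and constrained to be non-negative) in the paper; this sketch illustrates the classical operation, not the full network.

        import numpy as np
        from scipy.signal import convolve2d

        def normalized_convolution(data, conf, kernel, eps=1e-8):
            # confidence-weighted filtering of sparse data; returns the filled-in
            # signal and a simple propagated confidence
            num = convolve2d(data * conf, kernel, mode="same")
            den = convolve2d(conf, kernel, mode="same")
            return num / (den + eps), den / kernel.sum()

        depth = np.random.rand(64, 64) * 10.0                  # toy dense signal
        conf = (np.random.rand(64, 64) < 0.05).astype(float)   # ~5% valid measurements
        dense, new_conf = normalized_convolution(depth, conf, np.ones((5, 5)) / 25.0)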

    Full text (pdf)
    fulltext
  • 22.
    Eldesokey, Abdelrahman
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Ellipse Detection for Visual Cyclists Analysis “In the Wild” (2017). In: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, Vol. 10424, pp. 319-331. Conference paper (Refereed)
    Abstract [en]

    Autonomous driving safety is becoming a paramount issue due to the emergence of many autonomous vehicle prototypes. The safety measures ensure that autonomous vehicles are safe to operate among pedestrians, cyclists and conventional vehicles. While safety measures for pedestrians have been widely studied in literature, little attention has been paid to safety measures for cyclists. Visual cyclists analysis is a challenging problem due to the complex structure and dynamic nature of the cyclists. The dynamic model used for cyclists analysis heavily relies on the wheels. In this paper, we investigate the problem of ellipse detection for visual cyclists analysis in the wild. Our first contribution is the introduction of a new challenging annotated dataset for bicycle wheels, collected in real-world urban environment. Our second contribution is a method that combines reliable arcs selection and grouping strategies for ellipse detection. The reliable selection and grouping mechanism leads to robust ellipse detections when combined with the standard least square ellipse fitting approach. Our experiments clearly demonstrate that our method provides improved results, both in terms of accuracy and robustness in challenging urban environment settings.
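
    Once candidate arcs have been selected and grouped (the selection and grouping strategies are the paper's contribution and are not reproduced here), the final step is a standard least-squares ellipse fit over the grouped edge points. A minimal sketch using OpenCV's fitEllipse on a synthetic partial arc:

        import numpy as np
        import cv2

        theta = np.linspace(0, 1.5 * np.pi, 200)               # a partial, noisy arc
        pts = np.stack([120 + 60 * np.cos(theta) + np.random.randn(200),
                        100 + 35 * np.sin(theta) + np.random.randn(200)], axis=1)
        (cx, cy), (ax1, ax2), angle = cv2.fitEllipse(pts.astype(np.float32))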

    Full text (pdf)
    fulltext
  • 23.
    Eldesokey, Abdelrahman
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Inception Institute of Artificial Intelligence Abu Dhabi, UAE.
    Propagating Confidences through CNNs for Sparse Data Regression (2019). In: British Machine Vision Conference 2018, BMVC 2018, BMVA Press, 2019. Conference paper (Refereed)
    Abstract [en]

    In most computer vision applications, convolutional neural networks (CNNs) operate on dense image data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open problem with numerous applications in autonomous driving, robotics, and surveillance. To tackle this challenging problem, we introduce an algebraically-constrained convolution layer for CNNs with sparse input and demonstrate its capabilities for the scene depth completion task. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. Furthermore, we propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. Comprehensive experiments are performed on the KITTI depth benchmark and the results clearly demonstrate that the proposed approach achieves superior performance while requiring three times fewer parameters than the state-of-the-art methods. Moreover, our approach produces a continuous pixel-wise confidence map enabling information fusion, state inference, and decision support.

    Full text (pdf)
    Propagating Confidences through CNNs for Sparse Data Regression
  • 24.
    Felsberg, Michael
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Berg, Amanda
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Termisk Systemteknik AB, Linköping, Sweden.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Ahlberg, Jörgen
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Termisk Systemteknik AB, Linköping, Sweden.
    Kristan, Matej
    University of Ljubljana, Slovenia.
    Matas, Jiri
    Czech Technical University, Czech Republic.
    Leonardis, Ales
    University of Birmingham, United Kingdom.
    Cehovin, Luka
    University of Ljubljana, Slovenia.
    Fernandez, Gustavo
    Austrian Institute of Technology, Austria.
    Vojir, Tomas
    Czech Technical University, Czech Republic.
    Nebehay, Georg
    Austrian Institute of Technology, Austria.
    Pflugfelder, Roman
    Austrian Institute of Technology, Austria.
    Lukezic, Alan
    University of Ljubljana, Slovenia.
    Garcia-Martin, Alvaro
    Universidad Autonoma de Madrid, Spain.
    Saffari, Amir
    Affectv, United Kingdom.
    Li, Ang
    Xi’an Jiaotong University.
    Solis Montero, Andres
    University of Ottawa, Canada.
    Zhao, Baojun
    Beijing Institute of Technology, China.
    Schmid, Cordelia
    INRIA Grenoble Rhône-Alpes, France.
    Chen, Dapeng
    Xi’an Jiaotong University.
    Du, Dawei
    University at Albany, USA.
    Shahbaz Khan, Fahad
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Porikli, Fatih
    Australian National University, Australia.
    Zhu, Gao
    Australian National University, Australia.
    Zhu, Guibo
    NLPR, Chinese Academy of Sciences, China.
    Lu, Hanqing
    NLPR, Chinese Academy of Sciences, China.
    Kieritz, Hilke
    Fraunhofer IOSB, Germany.
    Li, Hongdong
    Australian National University, Australia.
    Qi, Honggang
    University at Albany, USA.
    Jeong, Jae-chan
    Electronics and Telecommunications Research Institute, Korea.
    Cho, Jae-il
    Electronics and Telecommunications Research Institute, Korea.
    Lee, Jae-Yeong
    Electronics and Telecommunications Research Institute, Korea.
    Zhu, Jianke
    Zhejiang University, China.
    Li, Jiatong
    University of Technology, Australia.
    Feng, Jiayi
    Institute of Automation, Chinese Academy of Sciences, China.
    Wang, Jinqiao
    NLPR, Chinese Academy of Sciences, China.
    Kim, Ji-Wan
    Electronics and Telecommunications Research Institute, Korea.
    Lang, Jochen
    University of Ottawa, Canada.
    Martinez, Jose M.
    Universidad Autónoma de Madrid, Spain.
    Xue, Kai
    INRIA Grenoble Rhône-Alpes, France.
    Alahari, Karteek
    INRIA Grenoble Rhône-Alpes, France.
    Ma, Liang
    Harbin Engineering University, China.
    Ke, Lipeng
    University at Albany, USA.
    Wen, Longyin
    University at Albany, USA.
    Bertinetto, Luca
    Oxford University, United Kingdom.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Arens, Michael
    Fraunhofer IOSB, Germany.
    Tang, Ming
    Institute of Automation, Chinese Academy of Sciences, China.
    Chang, Ming-Ching
    University at Albany, USA.
    Miksik, Ondrej
    Oxford University, United Kingdom.
    Torr, Philip H S
    Oxford University, United Kingdom.
    Martin-Nieto, Rafael
    Universidad Autónoma de Madrid, Spain.
    Laganiere, Robert
    University of Ottawa, Canada.
    Hare, Sam
    Obvious Engineering, United Kingdom.
    Lyu, Siwei
    University at Albany, USA.
    Zhu, Song-Chun
    University of California, USA.
    Becker, Stefan
    Fraunhofer IOSB, Germany.
    Hicks, Stephen L
    Oxford University, United Kingdom.
    Golodetz, Stuart
    Oxford University, United Kingdom.
    Choi, Sunglok
    Electronics and Telecommunications Research Institute, Korea.
    Wu, Tianfu
    University of California, USA.
    Hubner, Wolfgang
    Fraunhofer IOSB, Germany.
    Zhao, Xu
    Institute of Automation, Chinese Academy of Sciences, China.
    Hua, Yang
    INRIA Grenoble Rhône-Alpes, France.
    Li, Yang
    Zhejiang University, China.
    Lu, Yang
    University of California, USA.
    Li, Yuezun
    University at Albany, USA.
    Yuan, Zejian
    Xi’an Jiaotong University.
    Hong, Zhibin
    University of Technology, Australia.
    The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results (2015). In: Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp. 639-651. Conference paper (Refereed)
    Abstract [en]

    The Thermal Infrared Visual Object Tracking challenge 2015, VOT-TIR2015, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2015 is the first benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2015 challenge is based on the VOT2013 challenge, but introduces the following novelties: (i) the newly collected LTIR (Linköping TIR) dataset is used, (ii) the VOT2013 attributes are adapted to TIR data, (iii) the evaluation is performed using insights gained during VOT2013 and VOT2014 and is similar to VOT2015.

    Fulltekst (pdf)
    fulltext
  • 25.
    Felsberg, Michael
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Kristan, Matej
    University of Ljubljana, Slovenia.
    Matas, Jiri
    Czech Technical University, Czech Republic.
    Leonardis, Ales
    University of Birmingham, England.
    Pflugfelder, Roman
    Austrian Institute of Technology, Austria.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Berg, Amanda
    Linköpings universitet, Tekniska fakulteten. Linköpings universitet, Institutionen för systemteknik, Datorseende. Termisk Syst Tekn AB, Linkoping, Sweden.
    Eldesokey, Abdelrahman
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Ahlberg, Jörgen
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Termisk Syst Tekn AB, Linkoping, Sweden.
    Cehovin, Luka
    University of Ljubljana, Slovenia.
    Vojir, Tomas
    Czech Technical University, Czech Republic.
    Lukezic, Alan
    University of Ljubljana, Slovenia.
    Fernandez, Gustavo
    Austrian Institute of Technology, Austria.
    Petrosino, Alfredo
    Parthenope University of Naples, Italy.
    Garcia-Martin, Alvaro
    Universidad Autónoma de Madrid, Spain.
    Solis Montero, Andres
    University of Ottawa, Canada.
    Varfolomieiev, Anton
    Kyiv Polytechnic Institute, Ukraine.
    Erdem, Aykut
    Hacettepe University, Turkey.
    Han, Bohyung
    POSTECH, South Korea.
    Chang, Chang-Ming
    University at Albany, USA.
    Du, Dawei
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Erdem, Erkut
    Hacettepe University, Turkey.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Porikli, Fatih
    ARC Centre of Excellence for Robotic Vision, Australia; CSIRO, Australia.
    Zhao, Fei
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Bunyak, Filiz
    University of Missouri, MO 65211 USA.
    Battistone, Francesco
    Parthenope University of Naples, Italy.
    Zhu, Gao
    University of Missouri, Columbia, USA.
    Seetharaman, Guna
    US Navy, DC 20375 USA.
    Li, Hongdong
    ARC Centre of Excellence for Robotic Vision, Australia.
    Qi, Honggang
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Bischof, Horst
    Graz University of Technology, Austria.
    Possegger, Horst
    Graz University of Technology, Austria.
    Nam, Hyeonseob
    NAVER Corp, South Korea.
    Valmadre, Jack
    University of Oxford, England.
    Zhu, Jianke
    Zhejiang University, Peoples R China.
    Feng, Jiayi
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Lang, Jochen
    University of Ottawa, Canada.
    Martinez, Jose M.
    Universidad Autónoma de Madrid, Spain.
    Palaniappan, Kannappan
    University of Missouri, MO 65211 USA.
    Lebeda, Karel
    University of Surrey, England.
    Gao, Ke
    University of Missouri, MO 65211 USA.
    Mikolajczyk, Krystian
    Imperial College London, England.
    Wen, Longyin
    University at Albany, USA.
    Bertinetto, Luca
    University of Oxford, England.
    Poostchi, Mahdieh
    University of Missouri, MO 65211 USA.
    Maresca, Mario
    Parthenope University of Naples, Italy.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Arens, Michael
    Fraunhofer IOSB, Germany.
    Tang, Ming
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Baek, Mooyeol
    POSTECH, South Korea.
    Fan, Nana
    Harbin Institute of Technology, Peoples R China.
    Al-Shakarji, Noor
    University of Missouri, MO 65211 USA.
    Miksik, Ondrej
    University of Oxford, England.
    Akin, Osman
    Hacettepe University, Turkey.
    Torr, Philip H. S.
    University of Oxford, England.
    Huang, Qingming
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Martin-Nieto, Rafael
    Universidad Autónoma de Madrid, Spain.
    Pelapur, Rengarajan
    University of Missouri, MO 65211 USA.
    Bowden, Richard
    University of Surrey, England.
    Laganiere, Robert
    University of Ottawa, Canada.
    Krah, Sebastian B.
    Fraunhofer IOSB, Germany.
    Li, Shengkun
    University at Albany, USA.
    Yao, Shizeng
    University of Missouri, MO 65211 USA.
    Hadfield, Simon
    University of Surrey, England.
    Lyu, Siwei
    University at Albany, USA.
    Becker, Stefan
    Fraunhofer IOSB, Germany.
    Golodetz, Stuart
    University of Oxford, England.
    Hu, Tao
    Australian National University, Australia; Chinese Academy of Sciences, Peoples R China.
    Mauthner, Thomas
    Graz University of Technology, Austria.
    Santopietro, Vincenzo
    Parthenope University of Naples, Italy.
    Li, Wenbo
    Lehigh University, PA 18015 USA.
    Huebner, Wolfgang
    Fraunhofer IOSB, Germany.
    Li, Xin
    Harbin Institute of Technology, Peoples R China.
    Li, Yang
    Zhejiang University, Peoples R China.
    Xu, Zhan
    Zhejiang University, Peoples R China.
    He, Zhenyu
    Harbin Institute of Technology, Peoples R China.
    The Thermal Infrared Visual Object Tracking VOT-TIR2016 Challenge Results2016Inngår i: Computer Vision – ECCV 2016 Workshops. ECCV 2016. / [ed] Hua G., Jégou H., Springer International Publishing AG, 2016, s. 824-849Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The Thermal Infrared Visual Object Tracking challenge 2016, VOT-TIR2016, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2016 is the second benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2016 challenge is similar to the 2015 challenge; the main difference is the introduction of new, more difficult sequences into the dataset. Furthermore, the VOT-TIR2016 evaluation adopted the improvements regarding overlap calculation introduced in VOT2016. Compared to VOT-TIR2015, a significant general improvement of results has been observed, which partly compensates for the more difficult sequences. The dataset, the evaluation kit, as well as the results are publicly available at the challenge website.

    Fulltekst (pdf)
    fulltext
  • 26.
    Gladh, Susanna
    et al.
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Danelljan, Martin
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Khan, Fahad Shahbaz
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Felsberg, Michael
    Linköpings universitet, Tekniska högskolan. Linköpings universitet, Institutionen för systemteknik, Datorseende.
    Deep motion features for visual tracking2016Inngår i: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, s. 1243-1248Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Robust visual tracking is a challenging computer vision problem, with many real-world applications. Most existing approaches employ hand-crafted appearance features, such as HOG or Color Names. Recently, deep RGB features extracted from convolutional neural networks have been successfully applied for tracking. Despite their success, these features only capture appearance information. On the other hand, motion cues provide discriminative and complementary information that can improve tracking performance. Contrary to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. This paper presents an investigation of the impact of deep motion features in a tracking-by-detection framework. We further show that hand-crafted, deep RGB, and deep motion features contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly suggest that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.
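    The fusion investigated above amounts, at its simplest, to concatenating the per-frame feature channels of the different cues before the tracking-by-detection filter is learned. A minimal sketch of such a concatenation step, assuming pre-computed hand-crafted, deep RGB and deep motion feature maps resampled to a common spatial size (all function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def fuse_feature_maps(hog_feat, deep_rgb_feat, deep_motion_feat):
    """Concatenate per-frame feature maps along the channel axis.

    All inputs are assumed to be (H, W, C_i) arrays already resampled to the
    same spatial resolution; each feature type is L2-normalised so that no
    single cue dominates the fused representation.
    """
    fused = []
    for feat in (hog_feat, deep_rgb_feat, deep_motion_feat):
        fused.append(feat / (np.linalg.norm(feat) + 1e-8))
    return np.concatenate(fused, axis=-1)

# Toy usage with random stand-ins for the three feature types.
h, w = 50, 50
fused = fuse_feature_maps(np.random.rand(h, w, 31),    # e.g. HOG
                          np.random.rand(h, w, 512),   # deep RGB features
                          np.random.rand(h, w, 512))   # deep motion features
print(fused.shape)  # (50, 50, 1055)
```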

    Fulltekst (pdf)
    fulltext
  • 27.
    Grelsson, Bertil
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Robinson, Andreas
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    HorizonNet for visual terrain navigation2018Inngår i: Proceedings of 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), Institute of Electrical and Electronics Engineers (IEEE), 2018, s. 149-155Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper investigates the problem of position estimation of unmanned surface vessels (USVs) operating in coastal areas or in the archipelago. We propose a position estimation method where the horizon line is extracted in a 360 degree panoramic image around the USV. We design a CNN architecture to determine an approximate horizon line in the image and implicitly determine the camera orientation (the pitch and roll angles). The panoramic image is warped to compensate for the camera orientation and to generate an image from an approximately level camera. A second CNN architecture is designed to extract the pixelwise horizon line in the warped image. The extracted horizon line is correlated with digital elevation model (DEM) data in the Fourier domain using a MOSSE correlation filter. Finally, we determine the location of the maximum correlation score over the search area to estimate the position of the USV. Comprehensive experiments are performed in a field trial in the archipelago. Our approach provides promising results by achieving position estimates with GPS-level accuracy.
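    The final matching step above reduces to correlating the extracted horizon with horizon profiles predicted from the DEM, which is cheap in the Fourier domain. A minimal sketch of a circular FFT correlation over heading angle, assuming both horizons are available as 1D elevation profiles sampled over 360 degrees (a simplification of the MOSSE-based matching described in the paper; names are illustrative):

```python
import numpy as np

def correlate_horizons(observed, candidate):
    """Circular cross-correlation of two 1D horizon profiles via the FFT.

    Returns the best correlation score and the shift (in samples) by which
    `observed` must be rolled to best align with `candidate`.
    """
    obs = observed - observed.mean()
    cand = candidate - candidate.mean()
    corr = np.fft.ifft(np.fft.fft(cand) * np.conj(np.fft.fft(obs))).real
    shift = int(np.argmax(corr))
    return corr[shift], shift

# Toy example: the candidate profile is a rotated copy of the observation.
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
observed = np.sin(3 * angles) + 0.05 * np.random.randn(360)
candidate = np.roll(observed, 40)
score, shift = correlate_horizons(observed, candidate)
print(shift)  # ~40: the heading offset at which the profiles align
```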

    Fulltekst (pdf)
    fulltext
  • 28.
    Häger, Gustav
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Rudol, Piotr
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten.
    Doherty, Patrick
    Linköpings universitet, Institutionen för datavetenskap, Artificiell intelligens och integrerade datorsystem. Linköpings universitet, Tekniska fakulteten.
    Combining Visual Tracking and Person Detection for Long Term Tracking on a UAV2016Inngår i: Proceedings of the 12th International Symposium on Advances in Visual Computing, Springer, 2016Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Visual object tracking performance has improved significantly in recent years. Most trackers are based on either of two paradigms: online learning of an appearance model or the use of a pre-trained object detector. Methods based on online learning provide high accuracy but are prone to model drift. Model drift occurs when the tracker fails to correctly estimate the tracked object’s position. Methods based on a detector, on the other hand, typically have good long-term robustness but reduced accuracy compared to online methods.

    Despite the complementarity of the aforementioned approaches, the problem of fusing them into a single framework is largely unexplored. In this paper, we propose a novel fusion between an online tracker and a pre-trained detector for tracking humans from a UAV. The system operates in real time on a UAV platform. In addition, we present a novel dataset for long-term tracking in a UAV setting that includes scenarios that are typically not well represented in standard visual tracking datasets.

    Fulltekst (pdf)
    fulltext
  • 29.
    Häger, Gustav
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Countering bias in tracking evaluations2018Inngår i: Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications / [ed] Francisco Imai, Alain Tremeau and Jose Braz, Science and Technology Publications, Lda, 2018, Vol. 5, s. 581-587Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Recent years have witnessed a significant leap in visual object tracking performance mainly due to powerful features, sophisticated learning methods and the introduction of benchmark datasets. Despite this significant improvement, the evaluation of state-of-the-art object trackers still relies on the classical intersection over union (IoU) score. In this work, we argue that the object tracking evaluations based on classical IoU score are sub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of large target objects and favors over-estimated target prediction sizes. As our second contribution, we propose a new score that is unbiased with respect to target prediction size. We systematically evaluate our proposed approach on benchmark tracking data with variations in relative target size. Our empirical results clearly suggest that the proposed score is unbiased in general.
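    As a rough numerical illustration of the bias analysed above (not the proposed unbiased score), the sketch below computes the classical IoU for predictions that share the same centre error but over- respectively under-estimate the target size by the same factor; the over-estimated box receives the higher score:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def box(cx, cy, size, scale=1.0):
    """Square box centred at (cx, cy) with its side length scaled by `scale`."""
    half = scale * size / 2
    return (cx - half, cy - half, cx + half, cy + half)

# Ground truth of side 100; both predictions have the same 20-pixel centre
# error, but one is 1.2x too large and the other 1.2x too small.
gt = box(50, 50, 100)
over = box(70, 50, 100, scale=1.2)
under = box(70, 50, 100, scale=1 / 1.2)
print(round(iou(gt, over), 3), round(iou(gt, under), 3))  # ~0.584 vs ~0.544
```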

    Fulltekst (pdf)
    Countering bias in tracking evaluations
  • 30.
    Johnander, Joakim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    On the Optimization of Advanced DCF-Trackers2019Inngår i: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018, Proceedings, Part I / [ed] Laura Leal-TaixéStefan Roth, Cham: Springer Publishing Company, 2019, s. 54-69Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Trackers based on discriminative correlation filters (DCF) have recently seen widespread success and in this work we dive into their numerical core. DCF-based trackers interleave learning of the target detector and target state inference based on this detector. Whereas the original formulation includes a closed-form solution for the filter learning, recently introduced improvements to the framework no longer have known closed-form solutions. Instead a large-scale linear least squares problem must be solved each time the detector is updated. We analyze the procedure used to optimize the detector and let the popular scheme introduced with ECO serve as a baseline. The ECO implementation is revisited in detail and several mechanisms are provided with alternatives. With comprehensive experiments we show which configurations are superior in terms of tracking capabilities and optimization performance.
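    The large-scale linear least squares problem referred to above is typically attacked with an iterative solver such as the conjugate gradient method applied to the regularised normal equations. A minimal sketch under the assumption of a generic data matrix A, desired responses y and regularisation weight lam (illustrative only, not the ECO implementation):

```python
import numpy as np

def solve_filter_cg(A, y, lam, n_iters=50):
    """Solve (A^T A + lam I) f = A^T y with the conjugate gradient method.

    This is the regularised least-squares problem that appears when the
    correlation filter f is re-learned after every detector update.
    """
    n = A.shape[1]
    b = A.T @ y
    op = lambda v: A.T @ (A @ v) + lam * v   # normal-equations operator
    f = np.zeros(n)
    r = b - op(f)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        Ap = op(p)
        alpha = rs / (p @ Ap)
        f += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-10:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return f

# Toy usage: recover a random filter from noisy observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 40))
f_true = rng.standard_normal(40)
y = A @ f_true + 0.01 * rng.standard_normal(200)
f_hat = solve_filter_cg(A, y, lam=1e-3)
print(np.linalg.norm(f_hat - f_true))  # small residual error
```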

    Fulltekst (pdf)
    On the Optimization of Advanced DCF-Trackers
  • 31.
    Johnander, Joakim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Zenuity, Sweden.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. ETH Zurich, Switzerland.
    Brissman, Emil
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Saab, Sweden.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. IIAI, UAE.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    A generative appearance model for end-to-end video object segmentation2019Inngår i: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2019, s. 8945-8954Konferansepaper (Fagfellevurdert)
    Abstract [en]

    One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance. The best performing approaches resort to extensive fine-tuning of a convolutional neural network for this purpose. Besides being prohibitively expensive, this strategy cannot be truly trained end-to-end since the online fine-tuning procedure is not integrated into the offline training of the network. To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. The introduced appearance module learns a probabilistic generative model of target and background feature distributions. Given a new image, it predicts the posterior class probabilities, providing a highly discriminative cue, which is processed in later network modules. Both the learning and prediction stages of our appearance module are fully differentiable, enabling true end-to-end training of the entire segmentation pipeline. Comprehensive experiments demonstrate the effectiveness of the proposed approach on three video object segmentation benchmarks. We close the gap to approaches based on online fine-tuning on DAVIS17, while operating at 15 FPS on a single GPU. Furthermore, our method outperforms all published approaches on the large-scale YouTube-VOS dataset.
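    Conceptually, the appearance module fits class-conditional distributions to target and background features and evaluates posterior class probabilities for new pixels. A minimal sketch with a single isotropic Gaussian per class and equal priors (a deliberate simplification for illustration, not the published network module; all names are hypothetical):

```python
import numpy as np

def fit_class_means(features, mask):
    """Mean feature vector for target (mask==1) and background (mask==0)."""
    feats = features.reshape(-1, features.shape[-1])
    labels = mask.reshape(-1).astype(bool)
    return feats[labels].mean(axis=0), feats[~labels].mean(axis=0)

def class_posteriors(features, mu_fg, mu_bg, sigma=1.0):
    """Per-pixel posterior of 'target' under equal priors and isotropic Gaussians."""
    feats = features.reshape(-1, features.shape[-1])
    log_fg = -np.sum((feats - mu_fg) ** 2, axis=1) / (2 * sigma ** 2)
    log_bg = -np.sum((feats - mu_bg) ** 2, axis=1) / (2 * sigma ** 2)
    post = 1.0 / (1.0 + np.exp(log_bg - log_fg))     # softmax over the two classes
    return post.reshape(features.shape[:2])

# Toy usage on a random feature map with a square "target" region.
rng = np.random.default_rng(1)
feat = rng.standard_normal((32, 32, 8))
mask = np.zeros((32, 32))
mask[10:20, 10:20] = 1
feat[10:20, 10:20] += 2.0                            # make the target region distinct
mu_fg, mu_bg = fit_class_means(feat, mask)
posterior = class_posteriors(feat, mu_fg, mu_bg)
print(posterior[15, 15] > posterior[0, 0])           # True: target pixel scores higher
```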

    Fulltekst (pdf)
    fulltext
  • 32.
    Johnander, Joakim
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    DCCO: Towards Deformable Continuous Convolution Operators for Visual Tracking2017Inngår i: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, Vol. 10424, s. 55-67Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Discriminative Correlation Filter (DCF) based methods have shown competitive performance on tracking benchmarks in recent years. Generally, DCF based trackers learn a rigid appearance model of the target. However, this reliance on a single rigid appearance model is insufficient in situations where the target undergoes non-rigid transformations. In this paper, we propose a unified formulation for learning a deformable convolution filter. In our framework, the deformable filter is represented as a linear combination of sub-filters. Both the sub-filter coefficients and their relative locations are inferred jointly in our formulation. Experiments are performed on three challenging tracking benchmarks: OTB-2015, TempleColor and VOT2016. Our approach improves the baseline method, leading to performance comparable to state-of-the-art.
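    A deformable filter of this kind can be evaluated by correlating each sub-filter with the image, displacing its response by the sub-filter's learned offset, and summing the weighted responses. A minimal sketch of that evaluation step only (the joint inference of coefficients and offsets described above is omitted; names are illustrative):

```python
import numpy as np
from scipy.signal import correlate2d

def deformable_response(image, sub_filters, offsets, weights):
    """Detection score map of a filter built from displaced sub-filters.

    Each sub-filter is correlated with the image, its response is shifted
    by the sub-filter's offset, and the shifted responses are combined
    with the mixing weights.
    """
    total = np.zeros_like(image, dtype=float)
    for filt, (dy, dx), w in zip(sub_filters, offsets, weights):
        resp = correlate2d(image, filt, mode="same", boundary="wrap")
        total += w * np.roll(resp, shift=(dy, dx), axis=(0, 1))
    return total

# Toy usage: two 3x3 sub-filters with small relative displacements.
rng = np.random.default_rng(2)
img = rng.standard_normal((64, 64))
subs = [rng.standard_normal((3, 3)) for _ in range(2)]
score = deformable_response(img, subs, offsets=[(0, 0), (4, -3)], weights=[0.6, 0.4])
print(score.shape)  # (64, 64)
```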

    Fulltekst (pdf)
    fulltext
  • 33.
    Joseph, KJ
    et al.
    Indian Institute of Technology Hyderabad, India.
    Khan, Salman
    Mohamed bin Zayed University of AI, UAE, Australian National University, Australia.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Mohamed bin Zayed University of AI, UAE.
    Balasubramanian, Vineeth N
    Indian Institute of Technology Hyderabad, India.
    Towards Open World Object Detection2021Konferansepaper (Annet vitenskapelig)
    Abstract [en]

    Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them, when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called: ‘Open World Object Detection’, where a model is tasked to: 1) identify objects that have not been introduced to it as ‘unknown’, without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy based unknown identification. Our experimental evaluation and ablation studies analyse the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterising unknown instances helps to reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance, with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial research direction.
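    The energy-based unknown identification mentioned above scores a proposal by the free energy of its classification logits. A common formulation of such an energy score is sketched below, assuming a standard softmax classification head (an illustration of the general idea, not the authors' exact model):

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Free energy of a logit vector: E(x) = -T * logsumexp(logits / T).

    Low energy indicates a confident known-class prediction; proposals whose
    energy exceeds a threshold can be flagged as 'unknown'.
    """
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max()
    return -temperature * (m + np.log(np.exp(z - m).sum()))

# A peaked (confident) logit vector has lower energy than a flat one.
known = energy_score([9.0, 0.5, 0.2, 0.1])
ambiguous = energy_score([1.1, 1.0, 0.9, 1.0])
print(known < ambiguous)  # True
```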

  • 34.
    Joseph, KJ
    et al.
    Indian Institute of Technology Hyderabad, India.
    Khan, Salman
    Mohamed bin Zayed University of AI, UAE, Australian National University, Australia.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Mohamed bin Zayed University of AI, UAE.
    Balasubramanian, Vineeth N
    Indian Institute of Technology Hyderabad, India.
    Towards Open World Object Detection2021Annet (Annet vitenskapelig)
    Abstract [en]

    Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them, when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called: ‘Open World Object Detection’, where a model is tasked to: 1) identify objects that have not been introduced to it as ‘unknown’, without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy based unknown identification. Our experimental evaluation and ablation studies analyse the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterising unknown instances helps to reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance, with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial research direction.

  • 35.
    Järemo Lawin, Felix
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Forssén, Per-Erik
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Density Adaptive Point Set Registration2018Inngår i: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2018, s. 3829-3837Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need of re-sampling.
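    The probabilistic machinery underlying this family of methods is Expectation Maximization over a Gaussian mixture model of the scene. The sketch below shows one EM iteration for an isotropic 3D mixture (an illustration of that machinery only; it omits the density-adaptive weighting and the registration parameters that are the paper's contribution):

```python
import numpy as np

def em_step(points, means, covs, weights):
    """One EM iteration for an isotropic Gaussian mixture over 3D points."""
    n, k = points.shape[0], means.shape[0]
    resp = np.zeros((n, k))
    for j in range(k):                       # E-step: posterior responsibilities
        sq = np.sum((points - means[j]) ** 2, axis=1)
        resp[:, j] = weights[j] * np.exp(-0.5 * sq / covs[j]) / (2 * np.pi * covs[j]) ** 1.5
    resp /= resp.sum(axis=1, keepdims=True)
    nk = resp.sum(axis=0)                    # M-step: update mixture parameters
    means = (resp.T @ points) / nk[:, None]
    covs = np.array([
        np.sum(resp[:, j] * np.sum((points - means[j]) ** 2, axis=1)) / (3 * nk[j])
        for j in range(k)
    ])
    weights = nk / n
    return means, covs, weights

# Toy usage: two clusters of 3D points, a few EM iterations.
rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(0, 0.2, (100, 3)), rng.normal(3, 0.2, (100, 3))])
means, covs, weights = pts[[0, -1]].copy(), np.ones(2), np.ones(2) / 2
for _ in range(10):
    means, covs, weights = em_step(pts, means, covs, weights)
print(np.round(means, 1))   # approximately the two cluster centres
```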

    Fulltekst (pdf)
    Density Adaptive Point Set Registration
  • 36.
    Järemo-Lawin, Felix
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Tosteberg, Patrik
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Deep Projective 3D Semantic Segmentation2017Inngår i: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, s. 95-107Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets.

    Fulltekst (pdf)
    Deep Projective 3D Semantic Segmentation
  • 37.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Anwer, Rao Muhammad
    Universitat Autonoma de Barcelona, Spain.
    van de Weijer, Joost
    Universitat Autonoma de Barcelona, Spain.
    Bagdanov, Andrew D.
    Universitat Autonoma de Barcelona, Spain.
    Vanrell, Maria
    Universitat Autonoma de Barcelona, Spain.
    Lopez, Antonio M.
    Universitat Autonoma de Barcelona, Spain.
    Color Attributes for Object Detection2012Inngår i: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, IEEE , 2012, s. 3306-3313Konferansepaper (Fagfellevurdert)
    Abstract [en]

    State-of-the-art object detectors typically use shape information as a low level feature representation to capture the local structure of an object. This paper shows that early fusion of shape and color, as is popular in image classification, leads to a significant drop in performance for object detection. Moreover, such approaches also yield suboptimal results for object categories with varying importance of color and shape. In this paper we propose the use of color attributes as an explicit color representation for object detection. Color attributes are compact, computationally efficient, and when combined with traditional shape features provide state-of-the-art results for object detection. Our method is tested on the PASCAL VOC 2007 and 2009 datasets and results clearly show that our method improves over state-of-the-art techniques despite its simplicity. We also introduce a new dataset consisting of cartoon character images in which color plays a pivotal role. On this dataset, our approach yields a significant gain of 14% in mean AP over conventional state-of-the-art methods.

  • 38.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Beigpour, Shida
    Norwegian Colour and Visual Computing Laboratory, Gjovik University College, Gjøvik, Norway.
    van de Weijer, Joost
    Computer Vision Center, CS Dept. Universitat Autonoma de Barcelona, Spain.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Painting-91: a large scale database for computational painting categorization2014Inngår i: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 25, nr 6, s. 1385-1397Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Computer analysis of visual art, especially paintings, is an interesting cross-disciplinary research domain. Most of the research in the analysis of paintings involves medium- to small-scale datasets with their own specific settings. Interestingly, significant progress has been made in the field of object and scene recognition lately. A key factor in this success is the introduction and availability of benchmark datasets for evaluation. Surprisingly, such a benchmark setup is still missing in the area of computational painting categorization. In this work, we propose a novel large scale dataset of digital paintings. The dataset consists of paintings from 91 different painters. We further show three applications of our dataset namely: artist categorization, style classification and saliency detection. We investigate how local and global features popular in image classification perform for the tasks of artist and style categorization. For both categorization tasks, our experimental results suggest that combining multiple features significantly improves the final performance. We show that state-of-the-art computer vision methods can correctly classify 50% of unseen paintings to their painter in a large dataset and correctly attribute their artistic style in over 60% of the cases. Additionally, we explore the task of saliency detection on paintings and show experimental findings using state-of-the-art saliency estimation algorithms.

    Fulltekst (pdf)
    fulltext
  • 39.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Muhammad Anwer, Rao
    Department of Information and Computer Science, Aalto University School of Science, Finland.
    van de Weijer, Joost
    Computer Vision Center, CS Dept. Universitat Autonoma de Barcelona, Spain.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV. Linköpings universitet, Tekniska högskolan.
    Laaksonen, Jorma
    Department of Information and Computer Science, Aalto University School of Science, Finland.
    Compact color–texture description for texture classification2015Inngår i: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 51, s. 16-22Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Describing textures is a challenging problem in computer vision and pattern recognition. The classification problem involves assigning a category label to the texture class it belongs to. Several factors such as variations in scale, illumination and viewpoint make the problem of texture description extremely challenging. A variety of histogram-based texture representations exist in the literature. However, combining multiple texture descriptors and assessing their complementarity is still an open research problem. In this paper, we first show that combining multiple local texture descriptors significantly improves the recognition performance compared to using a single best method alone. This gain in performance is achieved at the cost of a high-dimensional final image representation. To counter this problem, we propose to use an information-theoretic compression technique to obtain a compact texture description without any significant loss in accuracy. In addition, we perform a comprehensive evaluation of pure color descriptors, popular in object recognition, for the problem of texture classification. Experiments are performed on four challenging texture datasets namely, KTH-TIPS-2a, KTH-TIPS-2b, FMD and Texture-10. The experiments clearly demonstrate that our proposed compact multi-texture approach outperforms the single best texture method alone. In all cases, discriminative color names outperform other color features for texture classification. Finally, we show that combining discriminative color names with compact texture representation outperforms state-of-the-art methods by 7.8%, 4.3% and 5.0% on the KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, respectively.

  • 40.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Rao, Muhammad Anwer
    Computer vision Center Barcelona, Universitat Autonoma de Barcelona, Spain.
    van de Weijer, Joost
    Computer vision Center Barcelona, Universitat Autonoma de Barcelona, Spain.
    Bagdanov, Andrew
    Media Integration and Communication Center, University of Florence, Florence, Italy.
    Lopez, Antonio
    Computer vision Center Barcelona, Universitat Autonoma de Barcelona, Spain.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Coloring Action Recognition in Still Images2013Inngår i: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 105, nr 3, s. 205-221Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    In this article we investigate the problem of human action recognition in static images. By action recognition we intend a class of problems which includes both action classification and action detection (i.e. simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well for object detection. The representations for action recognition typically use only shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. Our experiments demonstrate that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color–shape fusion approaches result in complementary information and combining them yields state-of-the-art performance for action classification.
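    Late fusion here means combining the outputs of separately trained color and shape classifiers rather than concatenating their features. A minimal sketch of score-level late fusion with a single mixing weight (illustrative, not the exact weighting used in the paper):

```python
import numpy as np

def late_fusion(color_scores, shape_scores, alpha=0.5):
    """Weighted sum of per-class scores from independent color and shape classifiers.

    Scores are assumed to be comparable (e.g. already normalised); alpha
    trades off the contribution of the two cues.
    """
    return alpha * np.asarray(color_scores) + (1 - alpha) * np.asarray(shape_scores)

# Toy usage: per-class scores for one image from the two cues.
color = [0.1, 0.7, 0.2]
shape = [0.3, 0.4, 0.3]
fused = late_fusion(color, shape, alpha=0.6)
print(fused.argmax())  # index of the predicted action class
```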

    Fulltekst (pdf)
    fulltext
  • 41.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Rao, Muhammad Anwer
    Department of Information and Computer Science, Aalto University School of Science, Aalto, Finland.
    van de Weijer, Joost
    Computer Vision Center, CS Department, Universitet Autonoma de Barcelona, Barcelona, Spain.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Laaksonen, Jorma
    Department of Information and Computer Science, Aalto University School of Science, Aalto, Finland.
    Deep Semantic Pyramids for Human Attributes and Action Recognition2015Inngår i: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer, 2015, Vol. 9127, s. 341-353Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, the semantic pyramids approach [1] for pose normalization has been shown to provide excellent results for gender and action recognition. The performance of the semantic pyramids approach relies on robust image description and is therefore limited by the use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs) or deep features have been shown to improve performance over conventional shallow features.

    We propose deep semantic pyramids for human attributes and action recognition. The method works by constructing spatial pyramids based on CNNs of different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attributes classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide a significant gain of 17.2%, 13.9%, 24.3% and 22.6% compared to the standard shallow semantic pyramids on Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance compared to best methods in literature.

  • 42.
    Khan, Fahad Shahbaz
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Van de Weijer, Joost
    Universitat Autonoma de Barcelona, Spain .
    Ali, Sadiq
    Universitat Autonoma de Barcelona, Spain .
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Evaluating the Impact of Color on Texture Recognition2013Inngår i: Computer Analysis of Images and Patterns: 15th International Conference, CAIP 2013, York, UK, August 27-29, 2013, Proceedings, Part I / [ed] Richard Wilson, Edwin Hancock, Adrian Bors, William Smith, Springer Berlin/Heidelberg, 2013, s. 154-162Konferansepaper (Fagfellevurdert)
    Abstract [en]

    State-of-the-art texture descriptors typically operate on grey scale images while ignoring color information. A common way to obtain a joint color-texture representation is to combine the two visual cues at the pixel level. However, such an approach provides sub-optimal results for the texture categorisation task.

    In this paper we investigate how to optimally exploit color information for texture recognition. We evaluate a variety of color descriptors, popular in image classification, for texture categorisation. In addition we analyze different fusion approaches to combine color and texture cues. Experiments are conducted on the challenging scenes and 10-class texture datasets. Our experiments clearly suggest that in all cases color names provide the best performance. Late fusion is the best strategy to combine color and texture. Selecting the best color descriptor with the optimal fusion strategy provides a gain of 5% to 8% compared to texture alone on the scenes and texture datasets.

    Fulltekst (pdf)
    fulltext
  • 43.
    Khan, Fahad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Van, De Weijer J.
    Computer Vision Center, CS Department, Universitat Autonoma de Barcelona, Spain.
    Bagdanov, A.D.
    Computer Vision Center, CS Department, Universitat Autonoma de Barcelona, Spain.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan. Linköpings universitet, Centrum för medicinsk bildvetenskap och visualisering, CMIV.
    Scale coding bag-of-words for action recognition2014Inngår i: 2014 22nd International Conference on Pattern Recognition (ICPR), Institute of Electrical and Electronics Engineers Inc., 2014, nr 6976979, s. 1514-1519Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Recognizing human actions in still images is a challenging problem in computer vision due to the significant amount of scale, illumination and pose variation. Given the bounding box of a person both at training and test time, the task is to classify the action associated with each bounding box in an image. Most state-of-the-art methods use the bag-of-words paradigm for action recognition. The bag-of-words framework employing a dense multi-scale grid sampling strategy is the de facto standard for feature detection. This results in a scale invariant image representation where all the features at multiple scales are binned in a single histogram. We argue that such a scale invariant strategy is sub-optimal since it ignores the multi-scale information available with each bounding box of a person. This paper investigates alternative approaches to scale coding for action recognition in still images. We encode multi-scale information explicitly in three different histograms for small, medium and large scale visual-words. Our first approach exploits multi-scale information with respect to the image size. In our second approach, we encode multi-scale information relative to the size of the bounding box of a person instance. In each approach, the multi-scale histograms are then concatenated into a single representation for action classification. We validate our approaches on the Willow dataset which contains seven action categories: interacting with computer, photography, playing music, riding bike, riding horse, running and walking. Our results clearly suggest that the proposed scale coding approaches outperform the conventional scale invariant technique. Moreover, we show that our approach obtains promising results compared to more complex state-of-the-art methods.
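    The scale coding idea amounts to binning visual words into separate histograms according to the scale at which each local feature was extracted, relative to the image or person bounding box, and concatenating the per-scale histograms instead of pooling everything into one. A minimal sketch of the bounding-box-relative variant, with illustrative thresholds and names (not the authors' code):

```python
import numpy as np

def scale_coded_bow(word_ids, feature_scales, box_size, vocab_size,
                    thresholds=(0.1, 0.3)):
    """Concatenated small/medium/large-scale bag-of-words histograms.

    Each local feature is assigned to a scale bin by comparing its extraction
    scale with the bounding-box size; one histogram is built per bin and the
    three histograms are concatenated instead of pooling all scales together.
    """
    rel_scales = np.asarray(feature_scales, dtype=float) / box_size
    hists = np.zeros((3, vocab_size))
    for word, rel in zip(word_ids, rel_scales):
        bin_idx = 0 if rel < thresholds[0] else (1 if rel < thresholds[1] else 2)
        hists[bin_idx, word] += 1
    hists /= np.maximum(hists.sum(axis=1, keepdims=True), 1)   # L1-normalise each bin
    return hists.reshape(-1)

# Toy usage: 6 visual words sampled at different scales inside a 200-pixel box.
rep = scale_coded_bow(word_ids=[3, 3, 7, 1, 7, 2],
                      feature_scales=[8, 16, 24, 64, 80, 120],
                      box_size=200, vocab_size=10)
print(rep.shape)  # (30,) = 3 scale bins x 10 visual words
```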

    Fulltekst (pdf)
    fulltext
  • 44.
    Khan, Fahad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    van de Weijer, Joost
    Comp Vis Centre, Spain .
    Muhammad Anwer, Rao
    Aalto University, Finland .
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Gatta, Carlo
    Comp Vis Centre, Spain .
    Semantic Pyramids for Gender and Action Recognition2014Inngår i: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 23, nr 8, s. 3633-3645Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.

    Fulltekst (pdf)
    fulltext
  • 45.
    Khan, Fahad
    et al.
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Xu, Jiaolong
    Comp Vis Centre Barcelona, Spain.
    van de Weijer, Joost
    Comp Vis Centre Barcelona, Spain.
    Bagdanov, Andrew D.
    Comp Vis Centre Barcelona, Spain.
    Muhammad Anwer, Rao
    Aalto University, Finland.
    Lopez, Antonio M.
    Comp Vis Centre Barcelona, Spain.
    Recognizing Actions Through Action-Specific Person Detection2015Inngår i: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 24, nr 11, s. 4422-4432Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) the existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, the direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, the transfer learning is able to adapt an existing detector to propose higher quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test time, outperforms state-of-the-art methods on both data sets, even though those methods do use person locations.

  • 46.
    Khan, Rahat
    et al.
    Université de Saint- Étienne, France.
    Van de Weijer, Joost
    Computer Vision Center, Barcelona, Spain.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska högskolan.
    Muselet, Damien
    Université de Saint- Étienne, France.
    Ducottet, Christophe
    Université de Saint- Étienne, France.
    Barat, Cecile
    Université de Saint- Étienne, France.
    Discriminative Color Descriptors2013Inngår i: Computer Vision and Pattern Recognition (CVPR), 2013, IEEE Computer Society, 2013, s. 2866-2873Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Color description is a challenging task because of large variations in RGB values which occur due to scene accidental events, such as shadows, shading, specularities, illuminant color changes, and changes in viewing geometry. Traditionally, this challenge has been addressed by capturing the variations in physics-based models, and deriving invariants for the undesired variations. The drawback of this approach is that sets of distinguishable colors in the original color space are mapped to the same value in the photometric invariant space. This results in a drop of discriminative power of the color description. In this paper we take an information theoretic approach to color description. We cluster color values together based on their discriminative power in a classification problem. The clustering has the explicit objective to minimize the drop of mutual information of the final representation. We show that such a color description automatically learns a certain degree of photometric invariance. We also show that a universal color representation, which is based on other data sets than the one at hand, can obtain competing performance. Experiments show that the proposed descriptor outperforms existing photometric invariants. Furthermore, we show that combined with shape description these color descriptors obtain excellent results on four challenging datasets, namely, PASCAL VOC 2007, Flowers-102, Stanford dogs-120 and Birds-200.
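    The core of the approach is clustering color values such that the mutual information between the clustered representation and the class labels drops as little as possible. A greedy sketch of that information-theoretic criterion, operating on a toy joint distribution over color bins and classes (illustrative only, not the published algorithm or its data):

```python
import numpy as np

def mutual_information(joint):
    """I(C; Y) for a joint distribution over (color bin, class)."""
    joint = joint / joint.sum()
    pc = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pc @ py)[nz])))

def merge_color_bins(joint, n_clusters):
    """Greedily merge color bins, minimising the drop in I(color; class)."""
    bins = [joint[i:i + 1].copy() for i in range(joint.shape[0])]
    while len(bins) > n_clusters:
        best = None
        for i in range(len(bins)):
            for j in range(i + 1, len(bins)):
                merged = bins[:i] + bins[i + 1:j] + bins[j + 1:] + [bins[i] + bins[j]]
                mi = mutual_information(np.vstack(merged))
                if best is None or mi > best[0]:
                    best = (mi, i, j)
        _, i, j = best
        bins = bins[:i] + bins[i + 1:j] + bins[j + 1:] + [bins[i] + bins[j]]
    return np.vstack(bins)

# Toy usage: 8 color bins and 3 classes, compressed to 4 discriminative clusters.
rng = np.random.default_rng(4)
joint = rng.random((8, 3))
compressed = merge_color_bins(joint, n_clusters=4)
print(mutual_information(joint), mutual_information(compressed))  # small MI drop
```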

  • 47.
    Kristan, Matej
    et al.
    University of Ljubljana, Slovenia.
    Leonardis, Aleš
    University of Birmingham, United Kingdom.
    Matas, Jirí
    Czech Technical University, Czech Republic.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Pflugfelder, Roman
    Austrian Institute of Technology, Austria / TU Wien, Austria.
    Zajc, Luka Cehovin
    University of Ljubljana, Slovenia.
    Vojíř, Tomáš
    Czech Technical University, Czech Republic.
    Bhat, Goutam
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Lukezič, Alan
    University of Ljubljana, Slovenia.
    Eldesokey, Abdelrahman
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Fernández, Gustavo
    García-Martín, Álvaro
    Iglesias-Arias, Álvaro
    Alatan, A. Aydin
    González-García, Abel
    Petrosino, Alfredo
    Memarmoghadam, Alireza
    Vedaldi, Andrea
    Muhič, Andrej
    He, Anfeng
    Smeulders, Arnold
    Perera, Asanka G.
    Li, Bo
    Chen, Boyu
    Kim, Changick
    Xu, Changsheng
    Xiong, Changzhen
    Tian, Cheng
    Luo, Chong
    Sun, Chong
    Hao, Cong
    Kim, Daijin
    Mishra, Deepak
    Chen, Deming
    Wang, Dong
    Wee, Dongyoon
    Gavves, Efstratios
    Gundogdu, Erhan
    Velasco-Salido, Erik
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Yang, Fan
    Zhao, Fei
    Li, Feng
    Battistone, Francesco
    De Ath, George
    Subrahmanyam, Gorthi R. K. S.
    Bastos, Guilherme
    Ling, Haibin
    Galoogahi, Hamed Kiani
    Lee, Hankyeol
    Li, Haojie
    Zhao, Haojie
    Fan, Heng
    Zhang, Honggang
    Possegger, Horst
    Li, Houqiang
    Lu, Huchuan
    Zhi, Hui
    Li, Huiyun
    Lee, Hyemin
    Chang, Hyung Jin
    Drummond, Isabela
    Valmadre, Jack
    Martin, Jaime Spencer
    Chahl, Javaan
    Choi, Jin Young
    Li, Jing
    Wang, Jinqiao
    Qi, Jinqing
    Sung, Jinyoung
    Johnander, Joakim
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Henriques, Joao
    Choi, Jongwon
    van de Weijer, Joost
    Herranz, Jorge Rodríguez
    Martínez, José M.
    Kittler, Josef
    Zhuang, Junfei
    Gao, Junyu
    Grm, Klemen
    Zhang, Lichao
    Wang, Lijun
    Yang, Lingxiao
    Rout, Litu
    Si, Liu
    Bertinetto, Luca
    Chu, Lutao
    Che, Manqiang
    Maresca, Mario Edoardo
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Yang, Ming-Hsuan
    Abdelpakey, Mohamed
    Shehata, Mohamed
    Kang, Myunggu
    Lee, Namhoon
    Wang, Ning
    Miksik, Ondrej
    Moallem, P.
    Vicente-Moñivar, Pablo
    Senna, Pedro
    Li, Peixia
    Torr, Philip
    Raju, Priya Mariam
    Ruihe, Qian
    Wang, Qiang
    Zhou, Qin
    Guo, Qing
    Martín-Nieto, Rafael
    Gorthi, Rama Krishna
    Tao, Ran
    Bowden, Richard
    Everson, Richard
    Wang, Runling
    Yun, Sangdoo
    Choi, Seokeon
    Vivas, Sergio
    Bai, Shuai
    Huang, Shuangping
    Wu, Sihang
    Hadfield, Simon
    Wang, Siwen
    Golodetz, Stuart
    Ming, Tang
    Xu, Tianyang
    Zhang, Tianzhu
    Fischer, Tobias
    Santopietro, Vincenzo
    Štruc, Vitomir
    Wei, Wang
    Zuo, Wangmeng
    Feng, Wei
    Wu, Wei
    Zou, Wei
    Hu, Weiming
    Zhou, Wengang
    Zeng, Wenjun
    Zhang, Xiaofan
    Wu, Xiaohe
    Wu, Xiao-Jun
    Tian, Xinmei
    Li, Yan
    Lu, Yan
    Law, Yee Wei
    Wu, Yi
    Demiris, Yiannis
    Yang, Yicai
    Jiao, Yifan
    Li, Yuhong
    Zhang, Yunhua
    Sun, Yuxuan
    Zhang, Zheng
    Zhu, Zheng
    Feng, Zhen-Hua
    Wang, Zhihui
    He, Zhiqun
    The Sixth Visual Object Tracking VOT2018 Challenge Results (2019). In: Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8–14, 2018, Proceedings, Part I / [ed] Laura Leal-Taixé and Stefan Roth, Cham: Springer Publishing Company, 2019, pp. 3-53. Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis, as well as a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been added to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled, and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking subchallenges. The performance of the tested trackers typically exceeds standard baselines by a large margin. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
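    The “real-time” experiment mentioned above constrains a tracker to keep pace with a fixed frame rate. Below is a minimal sketch of that idea, assuming a hypothetical `tracker.update(frame)` callable and a 20 fps sensor; both are illustrative assumptions, not the official VOT toolkit interface.

```python
import time

def run_realtime_experiment(tracker, frames, fps=20.0):
    """Crude simulation of a "real-time" tracking setting: frames arrive at a
    fixed rate, and when the tracker falls behind, the lagging frames are
    answered with the last reported bounding box instead of being processed.
    `tracker.update(frame)` is a hypothetical interface returning a box."""
    interval = 1.0 / fps              # time budget per frame
    predictions = []
    last_box = None
    deadline = time.monotonic() + interval
    for frame in frames:
        if last_box is not None and time.monotonic() > deadline:
            # The tracker is still "busy" with an earlier frame:
            # reuse its last prediction for this frame.
            predictions.append(last_box)
        else:
            last_box = tracker.update(frame)
            predictions.append(last_box)
        deadline += interval
    return predictions
```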

    Full text (pdf)
    The Sixth Visual Object Tracking VOT2018 Challenge Results
  • 48.
    Kristan, Matej
    et al.
    University of Ljubljana, Ljubljana, Slovenia.
    Pflugfelder, Roman P.
    Austrian Institute of Technology, Vienna, Austria.
    Leonardis, Ales
    University of Birmingham, Birmingham, UK.
    Matas, Jiri
    Czech Technical University, Prague, Czech Republic.
    Cehovin, Luka
    University of Ljubljana, Ljubljana, Slovenia.
    Nebehay, Georg
    Austrian Institute of Technology, Vienna, Austria.
    Vojir, Tomas
    Czech Technical University, Prague, Czech Republic.
    Fernandez, Gustavo
    Austrian Institute of Technology, Vienna, Austria.
    Lukezic, Alan
    University of Ljubljana, Ljubljana, Slovenia.
    Dimitriev, Aleksandar
    University of Ljubljana, Ljubljana, Slovenia.
    Petrosino, Alfredo
    Parthenope University of Naples, Naples, Italy.
    Saffari, Amir
    Affectv Limited, London, UK.
    Li, Bo
    Panasonic R&D Center, Singapore, Singapore.
    Han, Bohyung
    POSTECH, Pohang, Korea.
    Heng, CherKeng
    Panasonic R&D Center, Singapore, Singapore.
    Garcia, Christophe
    LIRIS, Lyon, France.
    Pangersic, Dominik
    University of Ljubljana, Ljubljana, Slovenia.
    Häger, Gustav
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Oven, Franci
    University of Ljubljana, Ljubljana, Slovenia.
    Possegger, Horst
    Graz University of Technology, Graz, Austria.
    Bischof, Horst
    Graz University of Technology, Graz, Austria.
    Nam, Hyeonseob
    POSTECH, Pohang, Korea.
    Zhu, Jianke
    Zhejiang University, Hangzhou, China.
    Li, JiJia
    Shanghai Jiao Tong University, Shanghai, China.
    Choi, Jin Young
    ASRI Seoul National University, Gwanak, Korea.
    Choi, Jin-Woo
    Electronics and Telecommunications Research Institute, Daejeon, Korea.
    Henriques, Joao F.
    University of Coimbra, Coimbra, Portugal.
    van de Weijer, Joost
    Universitat Autonoma de Barcelona, Barcelona, Spain.
    Batista, Jorge
    University of Coimbra, Coimbra, Portugal.
    Lebeda, Karel
    University of Surrey, Surrey, UK.
    Ofjall, Kristoffer
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Yi, Kwang Moo
    EPFL CVLab, Lausanne, Switzerland.
    Qin, Lei
    ICT CAS, Beijing, China.
    Wen, Longyin
    Chinese Academy of Sciences, Beijing, China.
    Maresca, Mario Edoardo
    Parthenope University of Naples, Naples, Italy.
    Danelljan, Martin
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Cheng, Ming-Ming
    University of Oxford, Oxford, UK.
    Torr, Philip
    University of Oxford, Oxford, UK.
    Huang, Qingming
    Harbin Institute of Technology, Harbin, China.
    Bowden, Richard
    University of Surrey, Surrey, UK.
    Hare, Sam
    Obvious Engineering Limited, London, UK.
    YueYing Lim, Samantha
    Panasonic R&D Center, Singapore, Singapore.
    Hong, Seunghoon
    POSTECH, Pohang, Korea.
    Liao, Shengcai
    Chinese Academy of Sciences, Beijing, China.
    Hadfield, Simon
    University of Surrey, Surrey, UK.
    Li, Stan Z.
    Chinese Academy of Sciences, Beijing, China.
    Duffner, Stefan
    LIRIS, Lyon, France.
    Golodetz, Stuart
    University of Oxford, Oxford, UK.
    Mauthner, Thomas
    Graz University of Technology, Graz, Austria.
    Vineet, Vibhav
    University of Oxford, Oxford, UK.
    Lin, Weiyao
    Shanghai Jiao Tong University, Shanghai, China.
    Li, Yang
    Zhejiang University, Hangzhou, China.
    Qi, Yuankai
    Harbin Institute of Technology, Harbin, China.
    Lei, Zhen
    Chinese Academy of Sciences, Beijing, China.
    Niu, ZhiHeng
    Panasonic R&D Center, Singapore, Singapore.
    The Visual Object Tracking VOT2014 Challenge Results (2015). In: Computer Vision - ECCV 2014 Workshops, Part II, Springer, 2015, Vol. 8926, pp. 191-217. Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2014 challenge that go beyond its VOT2013 predecessor are introduced: (i) a new VOT2014 dataset with full annotation of targets by rotated bounding boxes and per-frame attributes, (ii) extensions of the VOT2013 evaluation methodology, (iii) a new unit for tracking speed assessment that is less dependent on hardware, and (iv) the VOT2014 evaluation toolkit that significantly speeds up execution of experiments. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).
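    Accuracy in the VOT methodology is based on region overlap, and the VOT2014 dataset annotates targets with rotated bounding boxes, so overlap must be computed between general quadrilaterals rather than axis-aligned boxes. Below is a minimal sketch using the shapely library; the library choice and the example boxes are illustrative assumptions, not the toolkit's own implementation.

```python
from shapely.geometry import Polygon

def rotated_box_overlap(corners_a, corners_b):
    """Intersection-over-union between two rotated bounding boxes,
    each given as four (x, y) corner points."""
    poly_a = Polygon(corners_a)
    poly_b = Polygon(corners_b)
    inter = poly_a.intersection(poly_b).area
    union = poly_a.union(poly_b).area
    return inter / union if union > 0 else 0.0

# Example: a unit square vs. the same square rotated by 45 degrees
a = [(0, 0), (1, 0), (1, 1), (0, 1)]
b = [(0.5, -0.207), (1.207, 0.5), (0.5, 1.207), (-0.207, 0.5)]
print(rotated_box_overlap(a, b))  # roughly 0.7
```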

    Full text (pdf)
    fulltext
  • 49.
    Kristan, Matej
    et al.
    Univ Ljubljana, Slovenia.
    Matas, Jiri
    Czech Tech Univ, Czech Republic.
    Leonardis, Ales
    Univ Birmingham, England.
    Felsberg, Michael
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Pflugfelder, Roman
    Austrian Acad Sci, Austria; TU Wien, Austria.
    Kamarainen, Joni-Kristian
    Tampere Univ, Finland.
    Zajc, Luka Cehovin
    Univ Ljubljana, Slovenia.
    Drbohlav, Ondrej
    Czech Tech Univ, Czech Republic.
    Lukezic, Alan
    Univ Ljubljana, Slovenia.
    Berg, Amanda
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Termisk Systemtekn AB, Sweden.
    Eldesokey, Abdelrahman
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten.
    Kapyla, Jani
    Tampere Univ, Finland.
    Fernandez, Gustavo
    Austrian Acad Sci, Austria.
    Gonzalez-Garcia, Abel
    Comp Vis Ctr, Spain.
    Memarmoghadam, Alireza
    Univ Isfahan, Iran.
    Lu, Andong
    Anhui Univ, Peoples R China.
    He, Anfeng
    Univ Sci & Technol China, Peoples R China.
    Varfolomieiev, Anton
    NTUU Igor Sikorsky Kyiv Polytech Inst, Ukraine.
    Chan, Antoni
    City Univ Hong Kong, Peoples R China.
    Tripathi, Ardhendu Shekhar
    Swiss Fed Inst Technol, Switzerland.
    Smeulders, Arnold
    Univ Amsterdam, Netherlands.
    Pedasingu, Bala Suraj
    IIT Tirupati, India.
    Chen, Bao Xin
    York Univ, Canada.
    Zhang, Baopeng
    Beijing Jiaotong Univ, Peoples R China.
    Wu, Baoyuan
    Tencent AI Lab, Peoples R China.
    Li, Bi
    Chinese Acad Sci, Peoples R China; Huazhong Univ Sci & Technol, Peoples R China.
    He, Bin
    Baidu Inc, Peoples R China.
    Yan, Bin
    Dalian Univ Technol, Peoples R China.
    Bai, Bing
    Didi Chuxing, Peoples R China.
    Li, Bing
    Chinese Acad Sci, Peoples R China.
    Li, Bo
    SenseTime, Peoples R China.
    Kim, Byeong Hak
    Hanwha Syst Co, South Korea; Kyungpook Natl Univ, South Korea.
    Ma, Chao
    Shanghai Jiao Tong Univ, Peoples R China.
    Fang, Chen
    Nanjing Normal Univ, Peoples R China.
    Qian, Chen
    SenseTime, Peoples R China.
    Chen, Cheng
    Peking Univ, Peoples R China.
    Li, Chenglong
    Anhui Univ, Peoples R China.
    Zhang, Chengquan
    Baidu Inc, Peoples R China.
    Tsai, Chi-Yi
    Tamkang Univ, Taiwan.
    Luo, Chong
    Microsoft Res, Peoples R China.
    Micheloni, Christian
    Austrian Acad Sci, Austria.
    Zhang, Chunhui
    Chinese Acad Sci, Peoples R China.
    Tao, Dacheng
    Univ Sydney, Australia.
    Gupta, Deepak
    Univ Amsterdam, Netherlands.
    Song, Dejia
    Huazhong Univ Sci & Technol, Peoples R China.
    Wang, Dong
    Dalian Univ Technol, Peoples R China.
    Gavves, Efstratios
    Univ Amsterdam, Netherlands.
    Yi, Eunu
    Hanwha Syst Co, South Korea.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Inception Inst Artificial Intelligence, U Arab Emirates.
    Zhang, Fangyi
    Chinese Acad Sci, Peoples R China.
    Wang, Fei
    SenseTime, Peoples R China.
    Zhao, Fei
    Chinese Acad Sci, Peoples R China.
    De Ath, George
    Univ Exeter, England.
    Bhat, Goutam
    Swiss Fed Inst Technol, Switzerland.
    Chen, Guanqi
    SenseTime, Peoples R China.
    Wang, Guangting
    Univ Sci & Technol China, Peoples R China.
    Li, Guoxuan
    SenseTime, Peoples R China.
    Cevikalp, Hakan
    Eskisehir Osmangazi Univ, Turkey.
    Du, Hao
    Microsoft Res, Peoples R China.
    Zhao, Haojie
    Dalian Univ Technol, Peoples R China.
    Saribas, Hasan
    Eskisehir Tech Univ, Turkey.
    Jung, Ho Min
    Kyungpook Natl Univ, South Korea.
    Bai, Hongliang
    Beijing FaceAll Co, Peoples R China.
    Yu, Hongyuan
    Chinese Acad Sci, Peoples R China; Microsoft Res, Peoples R China.
    Peng, Houwen
    Microsoft Res, Peoples R China.
    Lu, Huchuan
    Dalian Univ Technol, Peoples R China.
    Li, Hui
    Jiangnan Univ, Peoples R China.
    Li, Jiakun
    Beijing Jiaotong Univ, Peoples R China.
    Li, Jianhu
    Dalian Univ Technol, Peoples R China.
    Fu, Jianlong
    Microsoft Res, Peoples R China.
    Chen, Jie
    Xidian Univ, Peoples R China.
    Gao, Jie
    Xidian Univ, Peoples R China.
    Zhao, Jie
    Dalian Univ Technol, Peoples R China.
    Tang, Jin
    Anhui Univ, Peoples R China.
    Li, Jing
    Harbin Inst Technol, Peoples R China.
    Wu, Jingjing
    Hefei Univ Technol, Peoples R China.
    Liu, Jingtuo
    Baidu Inc, Peoples R China.
    Wang, Jinqiao
    Chinese Acad Sci, Peoples R China.
    Qi, Jinqing
    Dalian Univ Technol, Peoples R China.
    Zhang, Jingyue
    Xidian Univ, Peoples R China.
    Tsotsos, John K.
    York Univ, Canada.
    Lee, John Hyuk
    Kyungpook Natl Univ, South Korea.
    van de Weijer, Joost
    Comp Vis Ctr, Spain.
    Kittler, Josef
    Univ Surrey, England.
    Lee, Jun Ha
    Kyungpook Natl Univ, South Korea.
    Zhuang, Junfei
    Beijing Univ Posts & Telecommun, Peoples R China.
    Zhang, Kangkai
    Chinese Acad Sci, Peoples R China.
    Wang, Kangkang
    Baidu Inc, Peoples R China.
    Dai, Kenan
    Dalian Univ Technol, Peoples R China.
    Chen, Lei
    SenseTime, Peoples R China.
    Liu, Lei
    Anhui Univ, Peoples R China.
    Guo, Leida
    YouTu Lab, Peoples R China.
    Zhang, Li
    Comp Vis Ctr, Spain; Univ Oxford, England.
    Wang, Liang
    Chinese Acad Sci, Peoples R China; Huazhong Univ Sci & Technol, Peoples R China.
    Wang, Liangliang
    Huazhong Univ Sci & Technol, Peoples R China.
    Zhang, Lichao
    Comp Vis Ctr, Spain.
    Wang, Lijun
    Dalian Univ Technol, Peoples R China.
    Zhou, Lijun
    Univ Chinese Acad Sci, Peoples R China.
    Zheng, Linyu
    Chinese Acad Sci, Peoples R China.
    Rout, Litu
    SAC ISRO, India.
    Van Gool, Luc
    Swiss Fed Inst Technol, Switzerland.
    Bertinetto, Luca
    FiveAI, England.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Dunnhofer, Matteo
    Univ Udine, Italy.
    Ni, Meng
    Dalian Univ Technol, Peoples R China.
    Kim, Min Young
    Kyungpook Natl Univ, South Korea.
    Tang, Ming
    Chinese Acad Sci, Peoples R China.
    Yang, Ming-Hsuan
    Univ Calif Merced, CA USA.
    Paluru, Naveen
    IIT Tirupati, India.
    Martinel, Niki
    Univ Udine, Italy.
    Xu, Pengfei
    Didi Chuxing, Peoples R China.
    Zhang, Pengfei
    Univ Sydney, Australia.
    Zheng, Pengkun
    Peking Univ, Peoples R China.
    Zhang, Pengyu
    Dalian Univ Technol, Peoples R China.
    Torr, Philip H. S.
    Univ Oxford, England.
    Wang, Qi Zhang Qiang
    Chinese Acad Sci, Peoples R China; IINTELLIMIND LTD, Peoples R China.
    Guo, Qing
    Tianjin Univ, Peoples R China.
    Timofte, Radu
    Swiss Fed Inst Technol, Switzerland.
    Gorthi, Rama Krishna
    IIT Tirupati, India.
    Everson, Richard
    Univ Exeter, England.
    Han, Ruize
    Tianjin Univ, Peoples R China.
    Zhang, Ruohan
    Xidian Univ, Peoples R China.
    You, Shan
    SenseTime, Peoples R China.
    Zhao, Shao-Chuan
    Jiangnan Univ, Peoples R China.
    Zhao, Shengwei
    Chinese Acad Sci, Peoples R China.
    Li, Shihu
    Baidu Inc, Peoples R China.
    Li, Shikun
    Chinese Acad Sci, Peoples R China.
    Ge, Shiming
    Chinese Acad Sci, Peoples R China.
    Bai, Shuai
    Beijing Univ Posts & Telecommun, Peoples R China.
    Guan, Shuosen
    YouTu Lab, Peoples R China.
    Xing, Tengfei
    Didi Chuxing, Peoples R China.
    Xu, Tianyang
    Jiangnan Univ, Peoples R China.
    Yang, Tianyu
    City Univ Hong Kong, Peoples R China.
    Zhang, Ting
    China Natl Elect Import Export Corp, Peoples R China.
    Vojir, Tomas
    Univ Cambridge, England.
    Feng, Wei
    Tianjin Univ, Peoples R China.
    Hu, Weiming
    Chinese Acad Sci, Peoples R China.
    Wang, Weizhao
    Peking Univ, Peoples R China.
    Tang, Wenjie
    China Natl Elect Import Export Corp, Peoples R China.
    Zeng, Wenjun
    Microsoft Res, Peoples R China.
    Liu, Wenyu
    Huazhong Univ Sci & Technol, Peoples R China.
    Chen, Xi
    Chinese Acad Sci, Peoples R China; Xidian Univ, Peoples R China; Zhejiang Univ, Peoples R China.
    Qiu, Xi
    Xi'an Jiaotong Univ, Peoples R China.
    Bai, Xiang
    Huazhong Univ Sci & Technol, Peoples R China.
    Wu, Xiao-Jun
    Jiangnan Univ, Peoples R China.
    Yang, Xiaoyun
    Chinese Academy of Sciences, China.
    Chen, Xier
    Xidian Univ, Peoples R China.
    Li, Xin
    Harbin Inst Technol, Peoples R China.
    Sun, Xing
    YouTu Lab, Peoples R China.
    Chen, Xingyu
    Chinese Acad Sci, Peoples R China.
    Tian, Xinmei
    Univ Sci & Technol China, Peoples R China.
    Tang, Xu
    Baidu Inc, Peoples R China.
    Zhu, Xue-Feng
    Jiangnan Univ, Peoples R China.
    Huang, Yan
    Chinese Acad Sci, Peoples R China.
    Chen, Yanan
    Xidian Univ, Peoples R China.
    Lian, Yanchao
    Xidian Univ, Peoples R China.
    Gu, Yang
    Didi Chuxing, Peoples R China.
    Liu, Yang
    North China Elect Power Univ, Peoples R China.
    Chen, Yanjie
    SenseTime, Peoples R China.
    Zhang, Yi
    YouTu Lab, Peoples R China.
    Xu, Yinda
    Zhejiang Univ, Peoples R China.
    Wang, Yingming
    Dalian Univ Technol, Peoples R China.
    Li, Yingping
    Xidian Univ, Peoples R China.
    Zhou, Yu
    Huazhong Univ Sci & Technol, Peoples R China.
    Dong, Yuan
    Beijing Univ Posts & Telecommun, Peoples R China.
    Xu, Yufei
    Univ Sci & Technol China, Peoples R China.
    Zhang, Yunhua
    Dalian Univ Technol, Peoples R China.
    Li, Yunkun
    Jiangnan Univ, Peoples R China.
    Luo, Zeyu Wang Zhao
    Chinese Acad Sci, Peoples R China.
    Zhang, Zhaoliang
    China Natl Elect Import Export Corp, Peoples R China.
    Feng, Zhen-Hua
    Univ Surrey, England.
    He, Zhenyu
    Harbin Inst Technol, Peoples R China.
    Song, Zhichao
    Didi Chuxing, Peoples R China.
    Chen, Zhihao
    Tianjin Univ, Peoples R China.
    Zhang, Zhipeng
    Chinese Acad Sci, Peoples R China.
    Wu, Zhirong
    Microsoft Res, Peoples R China.
    Xiong, Zhiwei
    Univ Sci & Technol China, Peoples R China.
    Huang, Zhongjian
    Xidian Univ, Peoples R China.
    Teng, Zhu
    Beijing Jiaotong Univ, Peoples R China.
    Ni, Zihan
    Baidu Inc, Peoples R China.
    The Seventh Visual Object Tracking VOT2019 Challenge Results (2019). In: 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), IEEE Computer Society, 2019, pp. 2206-2241. Conference paper (Refereed)
    Abstract [en]

    The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis, as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, and (iii) the VOT-LT2019 challenge focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery, and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed, while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking and tracking with multi-channel imagery. The performance of the tested trackers typically exceeds standard baselines by a large margin. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.
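    The new VOT-RGBT2019 and VOT-RGBD2019 sub-challenges pair RGB frames with a second, aligned modality. One simple way a tracker might consume such multi-channel input is to stack the modalities into a single array; the sketch below does this for an RGB + thermal pair. The file layout and the stacking step are illustrative assumptions, not the VOT toolkit's interface.

```python
import cv2
import numpy as np

def load_rgbt_frame(rgb_path, thermal_path):
    """Load one synchronized RGB + thermal frame pair and stack them into a
    single 4-channel array (H x W x 4). Paths and stacking are illustrative;
    a real tracker may keep the modalities separate."""
    rgb = cv2.imread(rgb_path, cv2.IMREAD_COLOR)              # H x W x 3, uint8
    thermal = cv2.imread(thermal_path, cv2.IMREAD_GRAYSCALE)  # H x W, uint8
    if rgb is None or thermal is None:
        raise FileNotFoundError("missing RGB or thermal frame")
    # Resize the thermal image to the RGB resolution if the sensors differ.
    thermal = cv2.resize(thermal, (rgb.shape[1], rgb.shape[0]))
    return np.dstack([rgb, thermal])
```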

    Full text (pdf)
    fulltext
  • 50.
    Narayan, Sanath
    et al.
    Inception Institute of Artificial Intelligence, UAE.
    Cholakkal, Hisham
    Mohamed Bin Zayed University of AI, UAE.
    Hayat, Munawar
    Monash University, Australia.
    Khan, Fahad Shahbaz
    Linköpings universitet, Institutionen för systemteknik, Datorseende. Linköpings universitet, Tekniska fakulteten. Mohamed Bin Zayed University of AI, UAE.
    Yang, Ming-Hsuan
    University of California, Merced; Google Research; Yonsei University.
    Shao, Ling
    Inception Institute of Artificial Intelligence, UAE.
    D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations (2021). Other (Other academic)
    Abstract [en]

    This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and the robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, thereby leading to more robust predictions. Comprehensive experiments are performed on multiple benchmarks, including THUMOS14 and ActivityNet1.2. Our D2-Net performs favorably in comparison to the existing methods on all datasets, achieving gains as high as 2.3% in terms of mAP at IoU=0.5 on THUMOS14.
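    The loss described above combines a discriminative term with a denoising term. The sketch below shows only the general shape of such a two-term objective: an attention-weighted video-level classification loss plus a crude foreground/background embedding-separation penalty. The tensor shapes, the `margin` hyperparameter and the `alpha` weighting factor are illustrative assumptions and do not reproduce the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_loss(class_logits, embeddings, attention, video_labels,
                           margin=1.0, alpha=0.5):
    """Sketch of a two-term weakly-supervised localization objective.

    class_logits : (B, T, C)  per-snippet class scores
    embeddings   : (B, T, D)  per-snippet latent embeddings
    attention    : (B, T)     foreground attention in [0, 1]
    video_labels : (B, C)     multi-hot video-level labels (float)
    """
    weights = attention.unsqueeze(-1)                                   # (B, T, 1)

    # Discriminative term: attention-weighted temporal pooling followed by a
    # video-level multi-label classification loss.
    video_logits = (class_logits * weights).sum(1) / weights.sum(1).clamp(min=1e-6)
    cls_loss = F.binary_cross_entropy_with_logits(video_logits, video_labels)

    # Separation term: push attention-weighted foreground and background mean
    # embeddings at least `margin` apart (a crude stand-in for the paper's
    # denoising / mutual-information term).
    fg = (embeddings * weights).sum(1) / weights.sum(1).clamp(min=1e-6)
    bg = (embeddings * (1 - weights)).sum(1) / (1 - weights).sum(1).clamp(min=1e-6)
    sep_loss = F.relu(margin - (fg - bg).norm(dim=-1)).mean()

    return cls_loss + alpha * sep_loss
```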
