Khan, Fahad Shahbaz
Publications (10 of 65)
Naseer, M., Khan, S., Porikli, F. & Khan, F. S. (2024). Guidance Through Surrogate: Toward a Generic Diagnostic Attack. IEEE Transactions on Neural Networks and Learning Systems, 35(2), 2042-2053
2024 (English). In: IEEE Transactions on Neural Networks and Learning Systems, ISSN 2162-237X, E-ISSN 2162-2388, Vol. 35, no. 2, p. 2042-2053. Article in journal (Refereed), Published
Abstract [en]

Adversarial training (AT) is an effective approach to making deep neural networks robust against adversarial attacks. Recently, different AT defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks, such as projected gradient descent (PGD). High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as "gradient masking." In this work, we analyze the effect of label smoothing on AT as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed guided projected gradient attack (G-PGA). Our attack approach is based on a "match and deceive" loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size. Furthermore, our proposed G-PGA is generic; thus, it can be combined with an ensemble attack strategy, as we demonstrate in the case of auto-attack, leading to improvements in efficiency and convergence speed. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
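
The abstract above describes using guidance from a surrogate model during attack optimization. As a rough, hypothetical PyTorch sketch of that general idea only (not the authors' released code; the exact "match and deceive" loss and all function and variable names here are assumptions), a surrogate-guided PGD loop might look like:

```python
# Hypothetical sketch of a surrogate-guided PGD attack (illustrative only).
import torch
import torch.nn.functional as F

def guided_pgd(target_model, surrogate, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples for `target_model`, using a surrogate
    model's predictions as extra guidance for the attack direction."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits_t = target_model(x_adv)
        logits_s = surrogate(x_adv)
        # Illustrative "match and deceive"-style objective: deceive the
        # target model while matching the surrogate's (unmasked) predictions.
        loss = F.cross_entropy(logits_t, y) + F.kl_div(
            F.log_softmax(logits_t, dim=1),
            F.softmax(logits_s, dim=1),
            reduction="batchmean",
        )
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # gradient ascent step
        x_adv = torch.clamp(x_adv, x - eps, x + eps)   # project to the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep a valid pixel range
    return x_adv
```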

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Smoothing methods; Robustness; Training; Optimization; Behavioral sciences; Computational modeling; Perturbation methods; Adversarial attack; gradient masking; guided optimization; image classification; label smoothing
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-187408 (URN); 10.1109/TNNLS.2022.3186278 (DOI); 000826080200001 (); 35816520 (PubMedID)
Available from: 2022-08-22 Created: 2022-08-22 Last updated: 2024-12-23. Bibliographically approved
Cao, J., Pang, Y., Anwer, R. M., Cholakkal, H., Khan, F. S. & Shao, L. (2023). SipMaskv2: Enhanced Fast Image and Video Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 3798-3812
2023 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no. 3, p. 3798-3812. Article in journal (Refereed), Published
Abstract [en]

We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves the instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a lightweight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding box, enabling a better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, a sample selection scheme and an instance refinement module, to address them. Experiments are performed on both the image instance segmentation dataset MS COCO and the video instance segmentation dataset YouTube-VIS. On the MS COCO test-dev set, our method achieves state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by 3.0% mask AP under similar settings, while operating at a comparable speed. On the YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.
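
As a loose illustration of the sub-region coefficient idea only (not the released SipMask implementation; the 2x2 split, shapes, and names are assumptions for this sketch), per-quadrant coefficients can be combined with shared prototype masks as follows:

```python
# Illustrative sketch: assembling an instance mask from per-sub-region
# coefficients and shared prototype masks (simplified, hypothetical).
import torch

def assemble_mask(protos, coeffs):
    """protos:  (K, H, W) prototype basis masks shared across instances
    coeffs:  (4, K) one coefficient vector per 2x2 sub-region of the box
    returns: (H, W) instance mask where each spatial quadrant uses its
             own linear combination of the prototypes."""
    K, H, W = protos.shape
    full = torch.einsum("rk,khw->rhw", coeffs, protos).sigmoid()   # (4, H, W)
    top = torch.cat([full[0, : H // 2, : W // 2], full[1, : H // 2, W // 2 :]], dim=1)
    bot = torch.cat([full[2, H // 2 :, : W // 2], full[3, H // 2 :, W // 2 :]], dim=1)
    return torch.cat([top, bot], dim=0)
```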

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Image instance segmentation; video instance segmentation; real-time; single-stage method; spatial information preservation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-186695 (URN); 10.1109/tpami.2022.3180564 (DOI); 000966968900001 ()
Note

Funding agencies:

National Key Research and Development Program of China (Grant Numbers: 2018AAA0102800 and 2018AAA0102802)

National Natural Science Foundation of China (Grant Numbers: 61906131 and 61929104)

Tianjin Science and Technology Program (Grant Number: 19ZXZNGX00050)

Natural Science Foundation of Tianjin City (Grant Number: 21JCQNJC00420)

CAAI-Huawei MindSpore Open Fund

Available from: 2022-06-30 Created: 2022-06-30 Last updated: 2025-02-07
Cao, J., Pang, Y., Xie, J., Khan, F. S. & Shao, L. (2022). From Handcrafted to Deep Features for Pedestrian Detection: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 4913-4934
2022 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 44, no. 9, p. 4913-4934. Article in journal (Refereed), Published
Abstract [en]

Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that covers both methods based on handcrafted features and approaches based on deep features. For handcrafted-features-based methods, we present an extensive review of approaches and find that handcrafted features with large degrees of freedom in shape and space perform better. In the case of deep-features-based approaches, we split them into pure CNN-based methods and those employing both handcrafted and CNN-based features. We provide a statistical analysis of these methods and their trends, where feature-enhanced, part-aware, and post-processing methods have attracted the most attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides features that are more robust to illumination variation. Furthermore, we introduce related datasets and evaluation metrics, and present an in-depth experimental analysis. We conclude this survey by emphasizing open problems that need to be addressed and highlighting various future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.

Place, publisher, year, edition, pages
New York: IEEE, 2022
Keywords
Feature extraction; Proposals; Cameras; Deep learning; Task analysis; Object detection; Support vector machines; Pedestrian detection; handcrafted features based methods; deep features based methods; multi-spectral pedestrian detection
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-179899 (URN); 10.1109/TPAMI.2021.3076733 (DOI); 000836666600033 (); 33929956 (PubMedID); 2-s2.0-85105102179 (Scopus ID)
Note

Funding agencies: National Natural Science Foundation of China (Grant Numbers: 61906130 and 61632018); National Key Research and Development Program of China (Grant Number: 2018AAA0102800)

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2025-02-07. Bibliographically approved
Naseer, M., Ranasinghe, K., Khan, S., Khan, F. S. & Porikli, F. (2022). On Improving Adversarial Transferability of Vision Transformers. Paper presented at The Tenth International Conference on Learning Representations (Virtual), April 25-29, 2022.
2022 (English). Conference paper, Poster (with or without abstract) (Other academic)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-186698 (URN)
Conference
The Tenth International Conference on Learning Representations (Virtual), April 25-29, 2022
Available from: 2022-06-30 Created: 2022-06-30 Last updated: 2025-02-07
Narayan, S., Cholakkal, H., Hayat, M., Khan, F. S., Yang, M.-H. & Shao, L. (2021). D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations.
2021 (English). Other (Other academic)
Abstract [en]

This work proposes a weakly-supervised temporal action localization framework, called D2-Net, which strives to temporally localize actions using video-level supervision. Our main contribution is the introduction of a novel loss formulation, which jointly enhances the discriminability of latent embeddings and the robustness of the output temporal class activations with respect to foreground-background noise caused by weak supervision. The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization. The discriminative term incorporates a classification loss and utilizes a top-down attention mechanism to enhance the separability of latent foreground-background embeddings. The denoising loss term explicitly addresses the foreground-background noise in class activations by simultaneously maximizing intra-video and inter-video mutual information using a bottom-up attention mechanism. As a result, activations in the foreground regions are emphasized whereas those in the background regions are suppressed, thereby leading to more robust predictions. Comprehensive experiments are performed on multiple benchmarks, including THUMOS14 and ActivityNet 1.2. Our D2-Net performs favorably in comparison to the existing methods on all datasets, achieving gains as high as 2.3% in terms of mAP at IoU=0.5 on THUMOS14.
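
A much-simplified sketch of combining a video-level classification term with a denoising term over temporal class activations (not the authors' formulation, which uses mutual-information objectives; all names, shapes, and the exact loss form are assumptions):

```python
# Loose, hypothetical illustration of a discriminative + denoising loss
# on temporal class-activation sequences under weak (video-level) labels.
import torch
import torch.nn.functional as F

def d2_style_loss(cas, attention, video_labels, lambda_denoise=0.5):
    """cas:          (B, T, C) temporal class activations
    attention:    (B, T) bottom-up foreground attention in [0, 1]
    video_labels: (B, C) float multi-hot video-level labels"""
    # Discriminative term: attention-weighted temporal pooling, then
    # video-level classification against the weak labels.
    weights = attention.unsqueeze(-1)                                  # (B, T, 1)
    video_logits = (cas * weights).sum(1) / weights.sum(1).clamp(min=1e-6)
    cls_loss = F.binary_cross_entropy_with_logits(video_logits, video_labels)
    # Denoising term (simplified): emphasize foreground activations and
    # suppress background activations, i.e. reduce activation noise.
    peak = cas.softmax(-1).max(-1).values                              # (B, T)
    denoise_loss = (1 - (peak * attention).mean()) + (peak * (1 - attention)).mean()
    return cls_loss + lambda_denoise * denoise_loss
```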

Series
arXiv.org ; 2012.06440
Identifiers
urn:nbn:se:liu:diva-179907 (URN)
Note

ICCV 2021

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12
Narayan, S., Gupta, A., Khan, S., Khan, F. S., Shao, L. & Shah, M. (2021). Discriminative Region-based Multi-Label Zero-Shot Learning.
2021 (English). Other (Other academic)
Abstract [en]

Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL, since several objects can co-exist in a natural image. However, the occurrence of multiple objects complicates the reasoning and requires region-specific processing of visual features to preserve their contextual cues. We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features, with a common set of attention maps for all the classes. Such shared maps lead to diffused attention, which does not discriminatively focus on relevant locations when the number of classes is large. Moreover, mapping spatially-pooled visual features to the class semantics leads to inter-class feature entanglement, thus hampering the classification. Here, we propose an alternate approach towards region-based discriminability-preserving multi-label zero-shot classification. Our approach maintains the spatial resolution to preserve region-level characteristics and utilizes a bi-level attention module (BiAM) to enrich the features by incorporating both region and scene context information. The enriched region-level features are then mapped to the class semantics, and only their class predictions are spatially pooled to obtain image-level predictions, thereby keeping the multi-class features disentangled. Our approach sets a new state of the art on two large-scale multi-label zero-shot benchmarks: NUS-WIDE and Open Images. On NUS-WIDE, our approach achieves an absolute gain of 6.9% mAP for ZSL, compared to the best published results.
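
A minimal sketch of the "classify regions first, pool predictions last" idea described above (not the authors' BiAM code; tensor shapes and names are assumptions):

```python
# Hypothetical sketch: region-level classification followed by pooling of
# predictions rather than features, keeping class evidence disentangled.
import torch

def region_then_pool(region_feats, class_embeddings):
    """region_feats:     (B, R, D) enriched region-level features
    class_embeddings: (C, D) class semantic vectors (incl. unseen classes)
    returns:          (B, C) image-level scores"""
    # Score every region against every class semantic vector ...
    region_scores = torch.einsum("brd,cd->brc", region_feats, class_embeddings)
    # ... then spatially pool the *predictions* (top score per class).
    return region_scores.max(dim=1).values
```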

Series
arXiv.org ; 2108.09301
Identifiers
urn:nbn:se:liu:diva-179904 (URN)
Note

Accepted to ICCV 2021

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12
Bhunia, A. K., Khan, S., Cholakkal, H., Anwer, R. M., Khan, F. S. & Shah, M. (2021). Handwriting Transformers.
2021 (English). Other (Other academic)
Abstract [en]

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn style-content entanglement as well as global and local writing style patterns. The proposed HWT captures the long- and short-range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state of the art, as demonstrated through extensive qualitative, quantitative and human-based evaluations. The proposed HWT can handle text of arbitrary length and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images.
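
A rough structural sketch of a style encoder with character-query cross-attention in the spirit of the description above (not the released HWT model; layer sizes and names are placeholders, and the CNN backbone and image decoder are omitted):

```python
# Hypothetical structural sketch: encoder over style examples, decoder whose
# queries are the characters of the text to be rendered.
import torch
import torch.nn as nn

class TinyHWTSketch(nn.Module):
    def __init__(self, d_model=256, vocab_size=80):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)

    def forward(self, style_tokens, char_ids):
        """style_tokens: (B, S, d_model) features of the writer's style examples
        char_ids:     (B, L) character indices of the text to generate"""
        style_memory = self.encoder(style_tokens)      # global and local style patterns
        queries = self.char_embed(char_ids)            # one query per character
        # Each character query gathers its own style representation via
        # encoder-decoder (cross) attention; an image decoder would follow.
        return self.decoder(queries, style_memory)     # (B, L, d_model)
```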

Series
arXiv.org ; 2104.03964
Identifiers
urn:nbn:se:liu:diva-179902 (URN)
Note

ICCV 2021

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12
Naseer, M., Khan, S., Hayat, M., Khan, F. S. & Porikli, F. (2021). On Generating Transferable Targeted Perturbations.
2021 (English). Other (Other academic)
Abstract [en]

While the untargeted black-box transferability of adversarial perturbations has been extensively studied before, changing an unseen model's decisions to a specific 'targeted' class remains a challenging feat. In this paper, we propose a new generative approach for highly transferable targeted perturbations. We note that the existing methods are less suitable for this task due to their reliance on class-boundary information that changes from one model to another, thus reducing transferability. In contrast, our approach matches the perturbed image 'distribution' with that of the target class, leading to high targeted transferability rates. To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains. Based on the proposed objective, we train a generator function that can adaptively synthesize perturbations specific to a given input. Our generative approach is independent of the source or target domain labels, while consistently performing well against state-of-the-art methods on a wide range of attack settings. As an example, we achieve 32.63% target transferability from (an adversarially weak) VGG19BN to (a strong) WideResNet on the ImageNet validation set, which is 4× higher than the previous best generative attack and 16× better than the instance-specific iterative attack.
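
An illustrative-only sketch of training a perturbation generator by matching the perturbed-image prediction distribution to that of target-class images (a simplification of the described objective; the local neighbourhood-matching term is omitted and all names and bounds are assumptions):

```python
# Hypothetical sketch of one generator training step for targeted,
# distribution-matching perturbations (not the authors' implementation).
import torch
import torch.nn.functional as F

def generator_step(generator, source_model, x_src, x_tgt, eps=16/255):
    """The generator perturbs source images so that a white-box source model
    assigns them the prediction distribution of real target-class images."""
    delta = generator(x_src)
    x_adv = torch.clamp(x_src + eps * torch.tanh(delta), 0.0, 1.0)   # bounded perturbation
    p_adv = F.log_softmax(source_model(x_adv), dim=1)
    with torch.no_grad():
        p_tgt = F.softmax(source_model(x_tgt), dim=1)                # target-class distribution
    # Global distribution matching only; minimize this loss w.r.t. the generator.
    return F.kl_div(p_adv, p_tgt, reduction="batchmean")
```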

Series
arXiv.org ; 2103.14641
Identifiers
urn:nbn:se:liu:diva-179905 (URN)
Note

ICCV 2021

Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12
Ranasinghe, K., Naseer, M., Hayat, M., Khan, S. & Khan, F. S. (2021). Orthogonal Projection Loss.
2021 (English). Other (Other academic)
Abstract [en]

Deep neural networks have achieved remarkable performance on a range of classification tasks, with softmax cross-entropy (CE) loss emerging as the de facto objective function. The CE loss encourages features of a class to have a higher projection score on the true class vector compared to the negative classes. However, this is a relative constraint and does not explicitly force different class features to be well-separated. Motivated by the observation that ground-truth class representations in CE loss are orthogonal (one-hot encoded vectors), we develop a novel loss function termed 'Orthogonal Projection Loss' (OPL) which imposes orthogonality in the feature space. OPL augments the properties of CE loss and directly enforces inter-class separation alongside intra-class clustering in the feature space through orthogonality constraints at the mini-batch level. Compared to other alternatives to CE, OPL offers unique advantages: it adds no learnable parameters, does not require careful negative mining, and is not sensitive to the batch size. Given the plug-and-play nature of OPL, we evaluate it on a diverse range of tasks including image recognition (CIFAR-100), large-scale classification (ImageNet), domain generalization (PACS) and few-shot learning (miniImageNet, CIFAR-FS, tiered-ImageNet and Meta-dataset) and demonstrate its effectiveness across the board. Furthermore, OPL offers better robustness against practical nuisances such as adversarial attacks and label noise.
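
A minimal sketch of an orthogonality-style loss on mini-batch features, approximating the idea described above (not necessarily the exact OPL implementation; the loss form here is an assumption):

```python
# Hypothetical sketch: pull same-class features together, push different-class
# features toward orthogonality, computed over a mini-batch.
import torch
import torch.nn.functional as F

def orthogonal_projection_loss(features, labels):
    """features: (B, D) penultimate-layer features; labels: (B,) class ids."""
    f = F.normalize(features, dim=1)
    cos = f @ f.t()                                            # (B, B) pairwise cosine similarity
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=features.device)
    same_mask = same - eye                                     # same class, excluding self-pairs
    diff_mask = 1.0 - same
    s = (cos * same_mask).sum() / same_mask.sum().clamp(min=1.0)        # intra-class similarity
    d = (cos.abs() * diff_mask).sum() / diff_mask.sum().clamp(min=1.0)  # inter-class |similarity|
    return (1.0 - s) + d
```

Used as an auxiliary term alongside cross-entropy, such a loss adds no learnable parameters, which mirrors the plug-and-play property emphasized in the abstract.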

Series
arXiv.org ; 2103.14021
Identifiers
urn:nbn:se:liu:diva-179906 (URN)
Available from: 2021-10-05 Created: 2021-10-05 Last updated: 2021-10-12
Joseph, K., Khan, S., Khan, F. S. & Balasubramanian, V. N. (2021). Towards Open World Object Detection. Paper presented at CVPR 2021, June 19-25, 2021.
2021 (English). Conference paper, Oral presentation only (Other academic)
Abstract [en]

Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them, when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called: 'Open World Object Detection', where a model is tasked to: 1) identify objects that have not been introduced to it as 'unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy-based unknown identification. Our experimental evaluation and ablation studies analyse the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterising unknown instances helps to reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance, with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial research direction.
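
As a hedged sketch of the energy-based part of the idea only (a generic energy score over classifier logits, not necessarily ORE's exact formulation, and omitting the contrastive clustering component):

```python
# Hypothetical sketch: energy-based scoring of detections for flagging unknowns.
import torch

def energy_score(logits, temperature=1.0):
    """Lower energy suggests a known class; a high energy score can be
    thresholded to label a detection as 'unknown'."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

# Usage sketch: flag detections whose energy exceeds a validation-chosen threshold tau.
# unknown = energy_score(det_logits) > tau
```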

Identifiers
urn:nbn:se:liu:diva-180212 (URN)
Conference
CVPR 2021, June 19-25, 2021
Note

Based on a manuscript (preprint) on arXiv: https://arxiv.org/abs/2103.02603

Available from: 2021-10-12 Created: 2021-10-12 Last updated: 2021-10-12