SSMTL++: Revisiting self-supervised multi-task learning for video anomaly detection
Univ Bucharest, Romania.
Univ Bucharest, Romania; SecurifAI, Romania; MBZ Univ Artificial Intelligence, U Arab Emirates.
Univ Bucharest, Romania; SecurifAI, Romania.
Aalborg Univ, Denmark; Milestone Syst, Denmark.
2023 (English). In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 229, article id 103656. Article in journal (Refereed), Published.
Abstract [en]

A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal, e.g. objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. As such, we alternatively introduce both 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
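
As a rough illustration of the motion-based detection idea mentioned in the abstract, the sketch below marks pixels that differ from a static background frame and returns one bounding box around them. It is not the authors' implementation: the function names (`motion_mask`, `bounding_box`), the fixed `threshold`, and the single static background frame are all assumptions made for this example. The approach uses no object classes, so in principle it can flag moving objects that a class-specific detector was never trained on.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def motion_mask(background: Frame, frame: Frame, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than `threshold`.

    Hypothetical helper: a fixed threshold on the absolute difference is the
    simplest form of background subtraction, chosen here only for illustration.
    """
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, around all marked pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Toy data: a uniform 5x6 background and a frame with a small bright moving blob.
background = [[10] * 6 for _ in range(5)]
frame = [row[:] for row in background]
frame[1][2] = frame[2][2] = frame[2][3] = 200

box = bounding_box(motion_mask(background, frame))
```

With the toy frames above, `box` covers rows 1 to 2 and columns 2 to 3. A practical system would more likely maintain an adaptive background model or use optical flow, and would separate multiple moving regions rather than merging them into one box.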

Place, publisher, year, edition, pages
ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023. Vol. 229, article id 103656
Keywords [en]
Anomaly detection; Self-supervised learning; Multi-task learning; Neural networks; Transformers
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-192689
DOI: 10.1016/j.cviu.2023.103656
ISI: 000944170600001
OAI: oai:DiVA.org:liu-192689
DiVA, id: diva2:1746634
Note

Funding Agencies: Romanian Ministry of Education and Research, CNCS-UEFISCDI [PN-III-P2-2.1-PED-2021-0195]; Milestone Research Programme at AAU; SecurifAI

Available from: 2023-03-29; Created: 2023-03-29; Last updated: 2025-02-07

Search in DiVA

By author/editor
Khan, Fahad
By organisation
Computer Vision; Faculty of Science & Engineering