SipMaskv2: Enhanced Fast Image and Video Instance Segmentation
School of Electrical and Information Engineering, Tianjin University, Tianjin, China. ORCID iD: 0000-0002-5160-6841
School of Electrical and Information Engineering, Tianjin University, Tianjin, China. ORCID iD: 0000-0001-6670-3727
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE. ORCID iD: 0000-0002-9041-2214
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE. ORCID iD: 0000-0002-8230-9065
2023 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 45, no 3, p. 3798-3812. Article in journal (Refereed). Published
Abstract [en]

We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a lightweight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding box, enabling a better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, a sample selection scheme and an instance refinement module, to address them. Experiments are performed on both the image instance segmentation dataset MS COCO and the video instance segmentation dataset YouTube-VIS. On the MS COCO test-dev set, our method achieves state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by 3.0% (mask AP) under similar settings, while operating at a comparable speed. On the YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.
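
The idea of predicting a separate set of coefficients per sub-region can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that a mask is formed by linearly combining a few shared basis maps, that the bounding box is split into a 2 x 2 grid, and that a sigmoid turns the combined value into a mask probability. None of these details are stated in the abstract, and the names `subregion_mask`, `basis`, `coeffs`, and `grid` are hypothetical.

```python
import math


def sigmoid(value):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def subregion_mask(basis, coeffs, box, grid=(2, 2)):
    """Combine shared basis maps using per-sub-region coefficients.

    basis  -- list of K maps, each an H x W list of lists (assumed input)
    coeffs -- dict mapping (row, col) of a sub-region to K coefficients
    box    -- (x0, y0, x1, y1) pixel bounds, end-exclusive
    grid   -- number of sub-regions per box side (2 x 2 is an assumption)

    Returns an H x W map of mask probabilities, zero outside the box.
    """
    height, width = len(basis[0]), len(basis[0][0])
    x0, y0, x1, y1 = box
    rows, cols = grid
    mask = [[0.0] * width for _ in range(height)]
    for y in range(max(y0, 0), min(y1, height)):
        # Index of the sub-region row that this pixel falls into.
        r = min((y - y0) * rows // (y1 - y0), rows - 1)
        for x in range(max(x0, 0), min(x1, width)):
            # Index of the sub-region column that this pixel falls into.
            c = min((x - x0) * cols // (x1 - x0), cols - 1)
            weights = coeffs[(r, c)]
            # Each sub-region applies its own coefficients to the same bases.
            logit = sum(w * b[y][x] for w, b in zip(weights, basis))
            mask[y][x] = sigmoid(logit)
    return mask
```

Because each sub-region has its own coefficients, one part of a box can respond strongly to a basis map while a neighbouring part suppresses it. A single coefficient vector for the whole box cannot express that, which is the intuition behind the claim about delineating spatially adjacent instances.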

Place, publisher, year, edition, pages
IEEE, 2023. Vol. 45, no 3, p. 3798-3812
Keywords [en]
Image instance segmentation; video instance segmentation; real-time; single-stage method; spatial information preservation
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-186695
DOI: 10.1109/tpami.2022.3180564
ISI: 000966968900001
OAI: oai:DiVA.org:liu-186695
DiVA, id: diva2:1679024
Note

Funding agencies:

National Key Research and Development Program of China (Grant Number: 2018AAA0102800 and 2018AAA0102802)

10.13039/501100001809-National Natural Science Foundation of China (Grant Number: 61906131 and 61929104)

10.13039/501100019065-Tianjin Science and Technology Program (Grant Number: 19ZXZNGX00050)

10.13039/501100006606-Natural Science Foundation of Tianjin City (Grant Number: 21JCQNJC00420)

CAAI-Huawei MindSpore Open Fund

Available from: 2022-06-30 Created: 2022-06-30 Last updated: 2024-01-10

Open Access in DiVA

No full text in DiVA

Authority records

Khan, Fahad Shahbaz

By author/editor
Cao, Jiale; Pang, Yanwei; Anwer, Rao Muhammad; Cholakkal, Hisham; Khan, Fahad Shahbaz; Shao, Ling
By organisation
Computer Vision; Faculty of Science & Engineering