We tackle the problem of partially supervised instance segmentation in which we are given box annotations for all classes, but masks for only a subset of classes. We show that the architecture of the mask head plays a surprisingly important role in generalizing to masks of unseen classes. The figure below shows improved mask predictions for unseen classes as we use better mask-head architectures.
Just by using better mask-head architectures (no extra losses or modules), we achieve state-of-the-art performance on the partially supervised instance segmentation task. We call our model DeepMAC, short for Deep Mask-heads Above CenterNet.
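To make the idea of a "mask-head architecture" concrete, here is a minimal Keras sketch of a deep fully-convolutional mask head applied to per-instance cropped features. The function name and layer counts are illustrative only; the actual architectures studied in the paper (e.g. hourglass mask heads) are defined in the Deep-MAC code linked below.

```python
import tensorflow as tf

def build_mask_head(num_layers=8, channels=64):
  """Illustrative deep mask head: a stack of conv layers plus a 1x1 output.

  This is a sketch of the general idea, not the hourglass heads used in
  the paper; see the Deep-MAC code for the real architectures.
  """
  layers = []
  for _ in range(num_layers):
    layers.append(tf.keras.layers.Conv2D(channels, 3, padding='same'))
    layers.append(tf.keras.layers.BatchNormalization())
    layers.append(tf.keras.layers.ReLU())
  # Final 1x1 conv produces one mask logit per pixel.
  layers.append(tf.keras.layers.Conv2D(1, 1))
  return tf.keras.Sequential(layers)

# Per-instance features cropped from the backbone (e.g. via ROIAlign):
# shape [num_instances, crop_h, crop_w, feature_channels].
mask_head = build_mask_head()
instance_features = tf.random.normal([4, 32, 32, 64])
mask_logits = mask_head(instance_features)  # shape [4, 32, 32, 1]
```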
Code
- Deep-MAC code - Used for most experiments with the CenterNet architecture.
- Deep-MARC code - Used for our Mask-RCNN based model.
Demos
- Colab for interactively trying out a pre-trained model.
- iWildCam Notebook to visualize instance masks generated by DeepMAC on the iWildCam dataset.
Main Results
In the table below, X → Y indicates that we train with masks from classes in split X and evaluate with masks from classes in split Y. These experiments are done on the COCO dataset. The VOC split contains 20 classes and the non-VOC split contains the remaining 60 classes. Bounding boxes are provided for all classes in both settings.
| Model | Train → Eval | Mask mAP | Config |
|-------|--------------|----------|--------|
| Deep-MAC (CenterNet based) | VOC → Non-VOC | 35.5 | Link |
| Deep-MAC (CenterNet based) | Non-VOC → VOC | 39.1 | Link |
| Deep-MARC (Mask R-CNN based) | VOC → Non-VOC | 38.7 | Link |
| Deep-MARC (Mask R-CNN based) | Non-VOC → VOC | 41.0 | Link |
Checkpoints
Both models take an image and a set of boxes as input and produce per-box instance masks as output.
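A minimal sketch of what inference could look like with one of these checkpoints, assuming a TF2 SavedModel export that accepts an image and normalized boxes. The path, call signature, and output key below are hypothetical; the Colab linked above shows the exact preprocessing and signature.

```python
import tensorflow as tf

# Hypothetical path; see the repo for the actual checkpoint links.
model = tf.saved_model.load('path/to/deepmac_saved_model')

# Batched uint8 image, shape [1, H, W, 3].
image = tf.zeros([1, 512, 512, 3], dtype=tf.uint8)

# Normalized [ymin, xmin, ymax, xmax] boxes for the instances to segment.
boxes = tf.constant([[[0.1, 0.1, 0.6, 0.5],
                      [0.4, 0.3, 0.9, 0.8]]], dtype=tf.float32)

# Assumed call signature: the model returns a dict with per-box masks;
# the 'detection_masks' key is an assumption, not a confirmed API.
outputs = model(image, boxes)
masks = outputs['detection_masks']  # [1, num_boxes, mask_h, mask_w]
```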
Citation
@misc{birodkar2021surprising,
title={The surprising impact of mask-head architecture on novel class segmentation},
author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang},
year={2021},
eprint={2104.00613},
archivePrefix={arXiv},
primaryClass={cs.CV}
}