We tackle the problem of partially supervised instance segmentation in which we are given box annotations for all classes, but masks for only a subset of classes. We show that the architecture of the mask head plays a surprisingly important role in generalizing to masks of unseen classes. The figure below shows improved mask predictions for unseen classes as we use better mask-head architectures.
Just by using better mask-head architectures (no extra losses or modules), we achieve state-of-the-art performance on the partially supervised instance segmentation task. We call our model DeepMAC, short for Deep Mask-heads Above CenterNet.
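To make the idea of a "mask-head architecture" concrete, here is a minimal Keras sketch of a deep fully-convolutional mask head applied to per-instance cropped features. The function name and layer counts are illustrative only; the actual architectures studied in the paper (e.g. hourglass mask heads) are defined in the Deep-MAC code linked below.

```python
import tensorflow as tf

def build_mask_head(num_layers=8, channels=64):
  """Illustrative deep mask head: a stack of conv layers plus a 1x1 output.

  This is a sketch of the general idea, not the hourglass heads used in
  the paper; see the Deep-MAC code for the real architectures.
  """
  layers = []
  for _ in range(num_layers):
    layers.append(tf.keras.layers.Conv2D(channels, 3, padding='same'))
    layers.append(tf.keras.layers.BatchNormalization())
    layers.append(tf.keras.layers.ReLU())
  # Final 1x1 conv produces one mask logit per pixel.
  layers.append(tf.keras.layers.Conv2D(1, 1))
  return tf.keras.Sequential(layers)

# Per-instance features cropped from the backbone (e.g. via ROIAlign):
# shape [num_instances, crop_h, crop_w, feature_channels].
mask_head = build_mask_head()
instance_features = tf.random.normal([4, 32, 32, 64])
mask_logits = mask_head(instance_features)  # shape [4, 32, 32, 1]
```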
Code
- Deep-MAC code - Used for most experiments with the CenterNet architecture.
- Deep-MARC code - Used for our Mask-RCNN based model.
Demos
- Colab for interactively trying out a pre-trained model.
- iWildCam Notebook to visualize instance masks generated by DeepMAC on the iWildCam dataset.
Main Results
In the table below, X → Y indicates that we train with masks from classes in split X and evaluate with masks from classes in split Y. These experiments are done on the COCO dataset. The VOC split contains 20 classes and the non-VOC split contains the remaining 60 classes. Bounding boxes are provided for all classes in both settings.
| Model | Train → Eval | Mask mAP | Config |
|-------|--------------|----------|--------|
| Deep-MAC (CenterNet based) | VOC → Non-VOC | 35.5 | Link |
| Deep-MAC (CenterNet based) | Non-VOC → VOC | 39.1 | Link |
| Deep-MARC (Mask R-CNN based) | VOC → Non-VOC | 38.7 | Link |
| Deep-MARC (Mask R-CNN based) | Non-VOC → VOC | 41.0 | Link |
Checkpoints
Both models take an image and a set of boxes as input and produce per-box instance masks as output.
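A minimal sketch of what inference could look like with one of these checkpoints, assuming a TF2 SavedModel export that accepts an image and normalized boxes. The path, call signature, and output key below are hypothetical; the Colab linked above shows the exact preprocessing and signature.

```python
import tensorflow as tf

# Hypothetical path; see the repo for the actual checkpoint links.
model = tf.saved_model.load('path/to/deepmac_saved_model')

# Batched uint8 image, shape [1, H, W, 3].
image = tf.zeros([1, 512, 512, 3], dtype=tf.uint8)

# Normalized [ymin, xmin, ymax, xmax] boxes for the instances to segment.
boxes = tf.constant([[[0.1, 0.1, 0.6, 0.5],
                      [0.4, 0.3, 0.9, 0.8]]], dtype=tf.float32)

# Assumed call signature: the model returns a dict with per-box masks;
# the 'detection_masks' key is an assumption, not a confirmed API.
outputs = model(image, boxes)
masks = outputs['detection_masks']  # [1, num_boxes, mask_h, mask_w]
```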
Citation
@misc{birodkar2021surprising,
title={The surprising impact of mask-head architecture on novel class segmentation},
author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang},
year={2021},
eprint={2104.00613},
archivePrefix={arXiv},
primaryClass={cs.CV}
}