In Facebook AI Research's work, the COCO dataset is used. It's a large-scale dataset for object detection, segmentation, and captioning, with over 200,000 labeled images containing 1.5 million object instances. Mask R-CNN takes about one to two days to train on this dataset using an 8-GPU machine, and it achieves good results even on challenging images. Here's a comparison against the previous state of the art, the fully convolutional instance segmentation system FCIS. FCIS is an alternate framework that also combines semantic segmentation and object detection to classify, box, and mask objects in an image, and it does so fast. But FCIS exhibits systematic errors on overlapping instances and creates spurious edges, showing that it is challenged by the fundamental difficulties of segmenting instances.
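Instance segmentation results on COCO are typically scored by intersection-over-union (IoU) between predicted and ground-truth masks; overlapping instances, where FCIS struggles, are exactly the cases where per-pixel mask overlap matters. A minimal NumPy sketch of that comparison (the `mask_iou` helper is illustrative, not part of any of these systems' code):

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean instance masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

# Two overlapping 4x4 instance masks: top half vs. middle rows.
a = np.zeros((4, 4), dtype=bool)
a[:2, :] = True
b = np.zeros((4, 4), dtype=bool)
b[1:3, :] = True

print(mask_iou(a, b))  # intersection = 4 px, union = 12 px -> 1/3
```

Because the metric is computed per instance rather than per class, two people standing in front of each other must each get their own mask, which is where a system that merges or splits overlapping instances loses points.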