Let's take a look at semantic segmentation with FCNs, fully convolutional networks. FCNs are used to predict the mask for each ROI, or region of interest. So why are we using convolutional layers? This is because convolutional layers retain the spatial layout of the input: each output value is computed from a small neighborhood of pixels and stays at the corresponding position in the feature map. Such information is crucial for location-specific tasks like creating an object mask. So you can see why the traditional use of fully connected layers won't work here. In fully connected layers, the spatial arrangement of pixels with respect to each other is lost, as they are flattened into a single feature vector.

Facebook AI Research uses the COCO dataset. It's a large-scale dataset for object detection, segmentation, and captioning. There are over 200,000 labeled images containing 1.5 million object instances.
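To make the point about spatial layout concrete, here is a minimal sketch in plain Python. It is not the actual mask head of any particular model: the helper names `conv2d_same` and `flatten`, the 5x5 toy ROI, the 3x3 box kernel, and the threshold of 4 are all assumptions chosen for illustration. It shows that a convolution returns a grid of the same height and width as the ROI, so a per-pixel mask can be read straight off it, while flattening leaves only a one-dimensional vector.

```python
def conv2d_same(image, kernel):
    """Naive 2D cross-correlation with zero padding.

    The output has the same height and width as the input, so every
    output value still corresponds to a pixel location.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Positions outside the image count as zero (padding).
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x] * kernel[di][dj]
            out[i][j] = total
    return out


def flatten(image):
    """Row-major flattening, as done before a fully connected layer."""
    return [value for row in image for value in row]


# Hypothetical 5x5 ROI containing a 2x2 object.
roi = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hand-picked 3x3 box kernel; a trained network would learn its weights.
box_kernel = [[1.0] * 3 for _ in range(3)]

features = conv2d_same(roi, box_kernel)

# Threshold the feature map to get one mask value per pixel.
mask = [[1 if value >= 4 else 0 for value in row] for row in features]

# The flattened vector has no rows or columns left.
vector = flatten(roi)

for row in mask:
    print(row)
print(len(vector))
```

In this toy case the thresholded feature map reproduces the 2x2 object exactly, because the grid structure survives the convolution. The flattened vector holds the same 25 values, but nothing in a list of 25 numbers says which entries were neighbors in the image.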