Non-sciency explanation: Given two frames from a stereo camera, we want to know the depth of each pixel. As long as something is identifiable in both frames, that is not much of a problem. Otherwise (e.g. specularities, occlusions, translucent objects), we let the user specify very rough regions in world space, and the algorithm selects the accurate depth within each rough region. Human scene understanding + algorithmic precision = win.
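To make the idea concrete, here is a toy sketch (not the method from the paper): plain block matching along one scanline, where a user-supplied rough depth region is assumed to translate into a disparity interval `[d_min, d_max]`. The function name `best_disparity`, the sum-of-absolute-differences cost, and the sample rows are all illustrative assumptions. The repetitive texture makes the unconstrained search ambiguous; restricting the search to the hinted interval resolves it.

```python
def best_disparity(left_row, right_row, x, d_min, d_max, half_window=1):
    """Return the disparity in [d_min, d_max] with the lowest SAD cost.

    Hypothetical helper: pixel x in the left row is compared against
    pixel x - d in the right row over a small window. Returns None if
    no candidate window fits inside both rows.
    """
    best_d, best_cost = None, float("inf")
    for d in range(d_min, d_max + 1):
        cost = 0.0
        valid = True
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                valid = False
                break
            cost += abs(left_row[xl] - right_row[xr])
        # Strict '<' keeps the first (smallest) disparity among ties.
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Periodic texture: many disparities match equally well (assumed toy data).
left_row = [10, 50] * 6
right_row = [10, 50] * 6

# Without a hint, the search settles on the first zero-cost candidate.
unconstrained = best_disparity(left_row, right_row, x=7, d_min=0, d_max=6)

# With a rough hint (here: "somewhere between 3 and 5 pixels"),
# only one candidate inside the interval matches.
hinted = best_disparity(left_row, right_row, x=7, d_min=3, d_max=5)

print(unconstrained, hinted)  # prints "0 4"
```

The hint does not need to be precise: it only has to exclude the wrong matches, and the matching cost picks the exact value inside the interval.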
Published at the European Conference on Visual Media Production (CVMP) 2013.