Published on Oct 7, 2016
Grounding and Generation of Natural Language Descriptions for Images and Videos
Abstract: In recent years, many challenging problems have emerged in the field of language and vision. Frequently, the only form of available annotation is the natural language sentence associated with an image or video. How can we address complex tasks like automatic video description or visual grounding of textual phrases with these weak and noisy annotations? In my talk I will first present our pioneering work on automatic movie description. We collected a large-scale dataset and proposed an approach to learn visual semantic concepts from weak sentence annotations. I will then talk about our approach to grounding arbitrary language phrases in images. It is able to operate in unsupervised and semi-supervised settings (with respect to the localization annotations) by learning to reconstruct the input phrase.
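To make the grounding-by-reconstruction idea concrete, here is a minimal sketch of one plausible architecture: a phrase encoder attends over candidate image regions, and the model is trained to reconstruct the input phrase from the attended visual feature, so the attention weights themselves serve as the grounding. All class names, dimensions, and design choices below are illustrative assumptions, not taken from the talk.

import torch
import torch.nn as nn

class PhraseGrounder(nn.Module):
    """Hypothetical sketch of grounding by phrase reconstruction.

    Given a phrase and a set of region features, attend over the
    regions, then decode the phrase back from the attended feature.
    The reconstruction loss supervises the attention (the grounding)
    without any localization annotations.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, region_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.phrase_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.decoder_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.vocab_out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, phrase_tokens, region_feats):
        # phrase_tokens: (B, T) word indices; region_feats: (B, R, region_dim)
        _, h = self.phrase_rnn(self.embed(phrase_tokens))      # h: (1, B, H)
        regions = torch.tanh(self.region_proj(region_feats))   # (B, R, H)
        scores = self.att_score(regions + h.transpose(0, 1))   # (B, R, 1)
        attn = torch.softmax(scores, dim=1)                    # soft grounding over regions
        attended = (attn * regions).sum(dim=1)                 # (B, H)
        # Reconstruct the phrase from the attended region (teacher forcing).
        dec_in = self.embed(phrase_tokens[:, :-1])
        out, _ = self.decoder_rnn(dec_in, attended.unsqueeze(0).contiguous())
        logits = self.vocab_out(out)                           # (B, T-1, V)
        return attn.squeeze(-1), logits

# Illustrative usage: cross-entropy between the reconstructed logits and
# the input phrase is the only training signal in the unsupervised
# setting; in the semi-supervised setting, a loss on the attention
# weights could be added for the subset of phrases with boxes.
model = PhraseGrounder(vocab_size=10000)
phrase = torch.randint(0, 10000, (4, 6))       # 4 phrases, 6 tokens each
regions = torch.randn(4, 20, 2048)             # 20 region proposals per image
attn, logits = model(phrase, regions)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), phrase[:, 1:].reshape(-1))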