Published on Aug 9, 2014
This paper appeared in NIME 2013. Please read the original publication to understand what's going on in the video.
Abstract: We present a method for automatic feature extraction and cross-modal mapping using deep learning. Our system uses stacked autoencoders to learn a layered feature representation of the data. Feature vectors from two (or more) different domains are mapped to each other, effectively creating a cross-modal mapping. Our system can either run fully unsupervised, or it can use high-level labeling to fine-tune the mapping according to a user’s needs. We show several applications for our method, mapping sound to or from images or gestures. We evaluate system performance both in standalone inference tasks and in cross-modal mappings.
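
To make the idea in the abstract concrete, here is a minimal sketch of greedy layer-wise autoencoder training followed by a mapping between the learned feature vectors of two domains. It is not the authors' implementation. The tied weights, the layer sizes, the random toy data standing in for "sound" and "image" inputs, and the least-squares linear map between feature spaces are all assumptions made for illustration, and every function name (train_autoencoder, train_stack, encode, fit_mapping, apply_mapping) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one tied-weight autoencoder layer by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))  # encoder weights; decoder uses W.T
    b = np.zeros(n_hidden)                   # encoder bias
    c = np.zeros(d)                          # decoder bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)               # hidden features
        R = sigmoid(H @ W.T + c)             # reconstruction of X
        dR = (R - X) * R * (1 - R) / n       # gradient at decoder pre-activation
        dH = (dR @ W) * H * (1 - H)          # gradient at encoder pre-activation
        W -= lr * (X.T @ dH + dR.T @ H)      # tied weights receive both gradients
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

def train_stack(X, layer_sizes):
    """Greedy layer-wise training: each layer learns from the previous layer's output."""
    layers, H = [], X
    for k, size in enumerate(layer_sizes):
        W, b = train_autoencoder(H, size, seed=k)
        layers.append((W, b))
        H = sigmoid(H @ W + b)
    return layers

def encode(X, layers):
    """Pass data through all trained encoder layers to get top-level features."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H

def fit_mapping(FA, FB):
    """Assumed stand-in for the cross-modal step: affine least-squares map FA -> FB."""
    FA1 = np.hstack([FA, np.ones((len(FA), 1))])  # append a bias column
    M, *_ = np.linalg.lstsq(FA1, FB, rcond=None)
    return M

def apply_mapping(FA, M):
    return np.hstack([FA, np.ones((len(FA), 1))]) @ M

# Toy paired data in [0, 1]; real inputs would be audio, image, or gesture features.
rng = np.random.default_rng(1)
sound = rng.random((32, 12))
image = rng.random((32, 20))

sound_layers = train_stack(sound, [8, 4])
image_layers = train_stack(image, [10, 4])
F_sound = encode(sound, sound_layers)
F_image = encode(image, image_layers)

M = fit_mapping(F_sound, F_image)
predicted_image_features = apply_mapping(F_sound, M)
```

In this sketch the mapping is learned only from paired examples and no labels are used, which corresponds loosely to the fully unsupervised mode mentioned in the abstract. The label-based fine-tuning step is not shown.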