Hi, my name is Yalan, and today I'm going to talk about texture classification by audio-tactile congruence. Humans can easily recognize and classify textures by touching a surface. This process goes beyond the limits of vision and relies on both auditory and tactile feedback and their interplay. A variety of multimodal features has been explored for texture classification, but most work neglects the congruence between auditory and tactile feedback, without considering that both arise from the same source, the interaction. In addition, scan parameters such as scan speed and applied force largely affect the multimodal feedback and thus the classification robustness, which has not been considered in most studies.

In this work, based on signals from unconstrained tool-surface interactions, we present a large-scale texture classification method that analyzes audio-tactile cross-modal congruence in the frequency domain. First, we synchronize the recorded sound and vibration by canonical correlation analysis to ensure temporal congruence. The synchronized sound and vibration are decomposed into spectrograms of power spectral density with the same number of frequency bands as their time-frequency representations. Based on that, we track the frequency components of the two modalities at the current time step and compare them within the same band using a distance measure, resulting in a 1D mapping. However, the information captured by this 1D spectral mapping is still limited. Instead, at the current time step, we compare the frequency components between sound and vibration across all frequency bands to obtain a two-dimensional representation of the congruence, named inter-band spectral mapping. This 2D representation is averaged over all time steps to compensate for time-varying scan parameters, and it conveys the interplay of touch-produced sound and vibration in the frequency domain. Its surface plots show large variance between textures but high similarity within the same category.
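The inter-band spectral mapping step can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the band count, the log-power scaling, and the absolute-difference comparison are assumptions standing in for whatever congruence measure the paper actually uses.

```python
import numpy as np
from scipy.signal import spectrogram

def inter_band_spectral_mapping(sound, vibration, fs_sound, fs_vib, n_bands=64):
    """Compare sound and vibration power spectral densities across all
    frequency-band pairs, then average the 2D comparison over time."""
    # Spectrograms of power spectral density with the same number of bands
    # for both modalities (nperseg chosen so each has n_bands frequency bins).
    nper = 2 * (n_bands - 1)
    _, _, S = spectrogram(sound, fs=fs_sound, nperseg=nper)
    _, _, V = spectrogram(vibration, fs=fs_vib, nperseg=nper)
    # Align the number of time steps (assumes the signals were already
    # synchronized, e.g. by canonical correlation analysis).
    T = min(S.shape[1], V.shape[1])
    S, V = np.log1p(S[:, :T]), np.log1p(V[:, :T])
    # At each time step, compare every sound band with every vibration band
    # (absolute log-power difference here, a placeholder distance), then
    # average the resulting 2D map over all time steps.
    ibsm = np.mean(np.abs(S[:, None, :] - V[None, :, :]), axis=2)
    return ibsm  # shape: (n_bands, n_bands)
```

Averaging over the time axis is what removes the dependence on where in the scan each frame occurred, so recordings of different lengths and speeds yield maps of the same fixed size.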
Leveraging the 2D structure of the inter-band spectral mapping, we treat it as an image and enhance its quality with image-based measures. We first conduct a singular value decomposition to obtain the dominant eigenimages from the training samples. We then project each input sample onto the eigenspace formed by these eigenimages, and define the projection as the feature for that sample, named projected spectral mapping (PSM), which is used in our classification.

For the evaluation, we conduct a classification test on the textures of the LMT texture database. Compared to a variety of multimodal features from previous work, we achieve higher accuracies of 79% and 85% for the PSM feature alone and for the PSM combined with image features, respectively. We also achieve an overall 89% accuracy across categories. Moreover, we performed participant-specific cross-validation with 10 folds to show that the features are robust against different interaction styles. For more information, please check out our paper. Thank you.
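The eigenimage decomposition and projection can be sketched in a PCA-style way. This is an assumed implementation: the number of retained eigenimages `k` and the mean-centering are illustrative choices, not details from the paper.

```python
import numpy as np

def fit_eigenimages(train_maps, k=20):
    """Learn the dominant eigenimages of the training inter-band spectral
    maps via singular value decomposition (k is an illustrative choice)."""
    X = np.stack([m.ravel() for m in train_maps])     # (n_samples, n_pixels)
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the eigenimages,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                               # (n_pixels,), (k, n_pixels)

def projected_spectral_mapping(ibsm, mean, eigenimages):
    """Project one map onto the eigenspace to get the PSM feature vector."""
    return eigenimages @ (ibsm.ravel() - mean)        # shape: (k,)
```

The PSM vectors would then feed any standard classifier, e.g. a nearest-neighbor or SVM model trained per texture class; projecting onto a small eigenspace both denoises the maps and keeps the feature dimension fixed.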