Hi everyone. My name is Lan Chen. Today I'll talk about my project on predicting fusion of dichotic vowels in normal-hearing listeners with a physiologically based model.

First, some background about vowels. Vowels are a type of speech sound produced with a comparatively open configuration of the vocal tract and vibration of the vocal cords. The important features of vowels can be seen in the spectrum. The narrow spectral lines are harmonics, with spacing determined by the fundamental frequency f0, or pitch. The spectral peaks are formants, which are used to discriminate vowels, especially the first and second formants, f1 and f2. Here's an example vowel space: the x-axis is f1 and the y-axis is f2. You can see that each vowel maps onto a focal region of the space. Note that vowel spaces vary across listeners and with context, such as f0.

In a recent study, Reiss and Molis showed something interesting about dichotic vowel identification. Two different vowels were dichotically presented, one to each ear, through headphones, like this. The fundamental frequencies of the two vowels could be the same or different. The subjects were not told that two vowels were always presented, and they could choose one or two vowels. The study found that when the fundamental frequencies were different, the subjects usually correctly identified both vowels. But when the fundamental frequencies were the same, the subjects usually identified only one vowel; that is, the subjects fused the two vowels, and sometimes the fused vowel was not one of the vowels presented. Reiss and Molis used a spectral averaging model to predict these fused percepts in human listeners. Here we extend the concept to a more physiologically realistic model, based on individualized vowel maps, to predict responses in the dichotic vowel identification task.

In this study, seven vowels were used, as shown here.
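As a rough illustration of how an f1-f2 vowel space supports identification, a nearest-centroid lookup can be sketched as below. This is not the model from the talk; the formant centroids and vowel labels are illustrative placeholders, not measured data or the study's stimuli.

```python
# Sketch: classifying a vowel token by its first two formants (f1, f2)
# via nearest-centroid lookup in a hypothetical vowel space.
import math

# Hypothetical f1/f2 centroids in Hz for a few vowels (placeholder values).
VOWEL_CENTROIDS = {
    "i":  (270, 2290),
    "ae": (660, 1720),
    "a":  (730, 1090),
    "u":  (300,  870),
}

def classify_vowel(f1: float, f2: float) -> str:
    """Return the vowel whose (f1, f2) centroid is closest to the input."""
    return min(
        VOWEL_CENTROIDS,
        key=lambda v: math.hypot(f1 - VOWEL_CENTROIDS[v][0],
                                 f2 - VOWEL_CENTROIDS[v][1]),
    )

print(classify_vowel(280, 2250))  # lands near the "i" centroid
```

Because vowel spaces vary across listeners, an individualized map (as used later in the talk) would replace these fixed centroids with each subject's own labeling of sampled tokens.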
The symbols are what we presented to the subjects, and an example pronunciation is underlined in the word next to each symbol. The subjects were trained to correctly identify these vowels with at least 80% accuracy in the left and right ears separately before they continued. Four of the vowels, circled here, were used in the dichotic vowel-pair stimuli. All possible combinations of these four target vowels, with the same or different fundamental frequencies, were presented. The subjects were told that they could be presented with either one or two vowels, and their task was to select which one or two. All subjects were tested remotely.

The results are shown as a confusion matrix, which helps us visualize responses versus stimuli. Each column shows a specific vowel pair presented, and each row represents the percentage of a specific response combination. Darker shading corresponds to a higher percentage of that response. For example, this cell shows that for the E and A pair, the subject responded 84% of the time with A alone. The confusion matrix can be divided into three parts. The first part indicates that the subjects had a fused percept, because only one vowel was selected in response to the dichotic vowel pair. The second part indicates an unfused percept outside the target categories. The third part indicates an unfused percept within the target categories; however, only the boxed cells mean that the subject correctly identified both vowels in the pair.

For this specific subject, when the two vowels had the same fundamental frequency, the subject fused most of the vowel pairs, consistent with the previous study. However, we do notice that sometimes the subject did not fuse the vowel pairs. This could be due to a noisier testing environment than in the previous study, as more noise can decrease fusion. When the f0s differed between the two vowels, the subject fused only a few of the vowel pairs, again consistent with the previous study.
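The confusion-matrix bookkeeping described above can be sketched as follows. The trial data here are fabricated for illustration; stimulus and response labels are arbitrary stand-ins, not the study's vowel symbols.

```python
# Sketch: assembling a confusion matrix of response percentages.
# Columns = presented vowel pair, rows = response combination,
# cell = percentage of trials with that response (made-up trials below).
from collections import Counter, defaultdict

trials = [
    ("i+a", "a"), ("i+a", "a"), ("i+a", "i+a"), ("i+a", "a"),
    ("u+ae", "u+ae"), ("u+ae", "u"),
]

def confusion_percentages(trials):
    counts = defaultdict(Counter)
    for stimulus, response in trials:
        counts[stimulus][response] += 1
    return {
        stim: {resp: 100.0 * n / sum(c.values()) for resp, n in c.items()}
        for stim, c in counts.items()
    }

matrix = confusion_percentages(trials)
print(matrix["i+a"]["a"])  # 75.0: a single-vowel ("fused") response
```

Single-vowel responses like `"a"` correspond to the fused part of the matrix; two-vowel responses like `"i+a"` to the unfused part.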
We used a physiologically based model to simulate neural responses to vowels. This is a diagram of the ascending auditory pathway. There are two important levels in the model. The first is the auditory nerve (AN) model here. The second is the inferior colliculus (IC), located in the middle of the auditory pathway. IC neurons are known to be sensitive to amplitude modulations. Here, we chose to use band-suppressed model neurons, which respond more strongly to spectral peaks. There are 90 characteristic frequencies (CFs) used throughout the model, logarithmically spaced between 125 Hz and 4 kHz. There are also 16 time constants used for the IC model, each sensitive to a different modulation rate. Here is an example model response to a monaural vowel; brighter indicates stronger responses. The strong responses line up with the vowel's formants, as shown here with the dashed lines.

Now I will talk about how the model neural responses are used to predict human performance. There are two steps. The first step is to use the AN model output to determine whether the two vowels have the same or different f0s. The decision variable here is adapted from Chintanpalli et al., originally from Meddis and Hewitt. First, cross-correlations were calculated between the AN model outputs of the two channels that have the same CF. Then the cross-correlations were summed across CFs to obtain the pooled cross-correlation function. If the magnitude of the highest peak in the spectrum of the pooled cross-correlation function exceeded our criterion, the f0s were determined to be the same; otherwise they were considered different, and the two cases were treated differently in step two. In the second step, if the f0s of the two vowels were the same, the IC model outputs for the two vowels were averaged across all CFs and time constants; otherwise, the outputs of the two channels were analyzed independently. Template matching was used to predict the vowel or vowels selected. In this talk, I'll focus on the same-f0 condition.
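The step-one decision variable can be sketched as below. This is a conceptual sketch only: pure cosines stand in for AN model outputs, the three identical channels and all signal parameters are assumptions, and the criterion is left as a free parameter rather than the value fitted in the study.

```python
# Sketch of the step-1 f0 decision: cross-correlate the two ears'
# AN outputs at matching CFs, pool across CFs, and test the height of
# the largest peak in the pooled correlation's spectrum.
import numpy as np

def pooled_xcorr_peak(an_left, an_right):
    """Peak spectral magnitude of the pooled cross-correlation.

    an_left, an_right: (n_cf, n_samples) arrays of AN model output.
    """
    pooled = np.zeros(2 * an_left.shape[1] - 1)
    for left, right in zip(an_left, an_right):
        # correlate the two ears' responses at the same CF
        pooled += np.correlate(left - left.mean(),
                               right - right.mean(), mode="full")
    spectrum = np.abs(np.fft.rfft(pooled))
    return spectrum[1:].max()  # skip the DC bin

def same_f0(an_left, an_right, criterion):
    """Decision: f0s judged 'same' if the peak exceeds the criterion."""
    return pooled_xcorr_peak(an_left, an_right) > criterion

# Toy demo: three identical "CF channels" per ear, cosine modulations.
fs, n = 4000, 400
t = np.arange(n) / fs
left = np.tile(np.cos(2 * np.pi * 100 * t), (3, 1))
right_same = np.tile(np.cos(2 * np.pi * 100 * t), (3, 1))
right_diff = np.tile(np.cos(2 * np.pi * 140 * t), (3, 1))
assert pooled_xcorr_peak(left, right_same) > pooled_xcorr_peak(left, right_diff)
```

Matched f0s yield a strongly periodic pooled correlation and hence a tall spectral peak; mismatched f0s do not, which is what the criterion exploits.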
Here is the vowel space again, to remind you of the vowel stimuli. For individualized vowel-space characterization, we used the 90 tokens shown here as small dots on the grid, including the ones on the edge. They were used to sample the entire vowel space. Subjects were asked to identify each token as one of the seven vowel options provided.

For template matching, we calculate the difference between the model neural responses to the dichotic vowel pairs and to each of the 90 tokens, to find which token has the most similar response to the test stimulus. Here's an example. This is the model neural response to the E and R pair, compared with the template response for token number one, located here. We calculated the distance between the two, and likewise for all tokens. Here is a plot that shows the distance between the E and R pair and all the sample tokens. A lighter color indicates a smaller distance, and here is the winning token. Just to remind you again, here are E and R located on the vowel space. After we find the winning token, we look at how the subject characterized this token according to their vowel map. Here's an example: the subject identified this token 58% of the time as one vowel and 48% of the time as another. That is the final output of the model.

Here are the behavioral data and the model predictions for one of our normal-hearing subjects in the same-f0 condition. The subject's data are on the left and the model predictions on the right. In general, the model was able to capture the subject's overall response pattern at the lower f0. At the higher f0, the model made more mistakes, especially for certain vowel pairs. Here are the data and predictions when the f0s are different. In general, the model was able to predict both vowels correctly, just like the subject.

In conclusion, the model can predict some fused percepts of human listeners with normal hearing. And we have several things in mind to try.
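The template-matching step can be sketched as below. The template responses, noise level, and the vowel-map entry are all fabricated for illustration; real templates would be IC model outputs across the 90 CFs and 16 time constants, and the map would come from each subject's token labeling.

```python
# Sketch of step-2 template matching: find the token whose model
# response is closest to the response to the dichotic pair, then read
# out the subject's labels for that token from their vowel map.
import numpy as np

rng = np.random.default_rng(0)
# 90 tokens x (90 CFs x 16 time constants), random placeholder templates.
templates = rng.random((90, 90, 16))
# Fake "dichotic pair" response: token 17's template plus small noise.
response = templates[17] + 0.01 * rng.standard_normal((90, 16))

# Euclidean distance between the pair response and every template.
distances = np.linalg.norm(
    templates.reshape(90, -1) - response.ravel(), axis=1)
winner = int(np.argmin(distances))

# Map the winning token through the subject's individualized vowel map:
# labeling proportions from their token-identification session
# (hypothetical entry below).
vowel_map = {17: {"a": 0.58, "uh": 0.42}}
prediction = vowel_map.get(winner, {})
print(winner, prediction)
```

The model's final output is thus a distribution over vowel labels for the winning token, which can be compared directly with the subject's behavioral responses.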
For example, we definitely would like to explore why the model uses different strategies for different f0s. We will also explore different binaural combinations and try other decision variables. We're also interested in modeling the fused percepts of listeners with hearing loss. With that, I would like to thank my lab, especially the lab member who helped with remote testing, my collaborator, and the funding source. Thank you for your attention, and I'm happy to take any questions.