Hello, my name is Mark, and I'll be talking about an ongoing project using artificial neural networks to model the effects of hearing loss on speech recognition.

I want to start by saying that we really know a lot about the ear. Decades of neurophysiology and modeling work have resulted in good computational descriptions of the ear's signal processing. Given an arbitrary sound waveform like the one on the left, state-of-the-art peripheral auditory models can produce pictures like the one on the right, where each row represents an auditory nerve fiber responding to a particular band of frequencies in the input. The x axis is time, the y axis is position along the cochlear frequency axis, and white pixels indicate spikes. Time-averaged firing rates are plotted on the right.

The peripheral physiology of sensorineural hearing loss has also been extensively studied. We know a lot about how damage to specific structures in the ear affects peripheral signal processing. And from human psychophysics, we also know a lot about the behavioral consequences of hearing loss. Substantially less, however, is known about how the first causes the second, especially with respect to complicated behaviors like understanding speech. Since it's usually impossible to directly examine peripheral auditory damage in humans, we often turn to computational models to gain insights about how specific peripheral changes lead to specific perceptual effects. And while these approaches have been highly productive, especially for explaining behavior in psychoacoustic tasks, until quite recently a major limitation of computational models has been their inability to perform complex real-world tasks as well as humans do. In particular, the primary complaint of hearing-impaired listeners is that they have difficulty understanding speech in noisy environments. To directly model this behavior, what one would really like is a system that is capable of recognizing speech from noisy sound waveforms. Today I'll be talking about how deep artificial neural networks are a good candidate for such a model.

In our lab, we often train models like these to recognize words from simulated cochlear representations. The network's task is fairly straightforward: we play it a two-second audio clip of speech and ask it to report which word, out of about 800 options, appeared in the middle of the clip. We train these networks on millions of utterances from hundreds of different speakers using backpropagation and gradient descent. To make the task more difficult and realistic, the speech is embedded in background noise from YouTube soundtracks.

We use this task because it's quite natural for humans to do. Plotted on the left are the results of humans doing this word recognition task: performance is on the y axis, and the different colored lines represent different types of background noise. Naturally, they get the most words right when there's no background noise, the infinite signal-to-noise-ratio condition, and they get progressively worse as the SNR decreases. Humans also reliably find some types of background noise to be more difficult than others. Work by Alex Kell and colleagues showed that speech-trained networks replicate these patterns of human behavior. The middle plot shows the results of a network performing the same experiment with the same stimuli, and you can see that the network produces a remarkably similar pattern of behavior. If we look at the scatter plot on the right, which compares corresponding data points between the humans and the network, we see the results are highly correlated and clustered around the diagonal.
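To make that setup concrete, here is a minimal sketch of the kind of training pipeline just described. Everything in it is illustrative rather than our actual code: the spectrogram front end is a crude stand-in for the detailed peripheral auditory model, the small convolutional network is not the architecture we use, and the data are random placeholders standing in for the noisy speech clips.

```python
# Minimal sketch (not the actual model): 800-way word recognition from a
# simulated cochlear-like representation, trained with backpropagation.
import torch
import torch.nn as nn

N_FIBERS = 100   # frequency channels standing in for auditory nerve fibers
N_WORDS = 800    # ~800-way word classification

def simulate_auditory_nerve(waveform: torch.Tensor) -> torch.Tensor:
    """Placeholder front end: waveform -> (batch, 1, freq, time) representation.

    A real peripheral model would apply cochlear filtering, hair-cell
    transduction, and spike generation; a magnitude spectrogram is used here
    only so the example runs end to end.
    """
    spec = torch.stft(waveform, n_fft=2 * N_FIBERS - 1, return_complex=True).abs()
    return spec.unsqueeze(1)

class WordRecognitionNet(nn.Module):
    """Small convolutional classifier over the simulated peripheral input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(64 * 4 * 4, N_WORDS)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One gradient-descent step: noisy speech clip in, word label out.
net = WordRecognitionNet()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
waveform = torch.randn(8, 40000)          # stand-in batch of 2 s clips at 20 kHz
labels = torch.randint(0, N_WORDS, (8,))  # which of ~800 words was in each clip
optimizer.zero_grad()
logits = net(simulate_auditory_nerve(waveform))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```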
It seems, then, that these networks can serve as a good model of normal-hearing speech recognition behavior. The question today, however, is whether we can extend this approach to model hearing-impaired behavior. In particular, if we simulate hearing loss in the cochlear representation, will the network then exhibit the behavioral characteristics of hearing-impaired listeners?

For instance, hearing-impaired humans are generally able to recognize speech quite well in quiet conditions, but their performance drops off very quickly in noisier conditions. Additionally, hearing-impaired people often lose the ability to listen in the dips of fluctuating noise. What I mean by this is that, for the same long-term signal-to-noise ratio, normal-hearing listeners have better word recognition when the background noise is fluctuating in amplitude (the dashed line) than when the noise is stationary (the solid line). To demo this effect, try to listen for the phrase "the advantage of going downstream" in these two audio clips. Normal-hearing listeners should find it easier to hear in the fluctuating-noise case, the second condition. Hearing-impaired listeners often won't show this effect: the dashed and solid lines are essentially overlapping, meaning they receive no performance benefit from fluctuating noise. Today we'll test whether deep neural networks with simulated hearing loss exhibit these kinds of behavioral characteristics.

To briefly outline the general approach, there will be three main conditions to keep straight. First, we'll test the network using a healthy cochlear model as the front end; this is our model of normal hearing. Second, we'll take that same exact trained network, fix the learned weights, and switch in a cochlear model that has been damaged in some way. We can perhaps think of this condition as modeling the onset of hearing loss: the network is an auditory system that has, over the course of evolution and development, been optimized to deal with input from a healthy ear; the input representation changes as a result of cochlear damage, and the system has no ability to adapt to its new ears. We'll call this condition the static hearing-impaired model. Lastly, we'll also train networks from scratch using the damaged cochlear models. We can perhaps think of this condition as an auditory system that is able to maximally adapt to its new ears following cochlear damage. We'll refer to this condition as the plastic hearing-impaired model. If the plastic model exhibits impaired performance, we might conclude that the behavioral deficits are simply an inevitable consequence of the peripheral representation: the cochlear damage has resulted in a periphery that is just inherently less well suited for conveying speech information. Alternatively, if the plastic model looks normal and only the static model exhibits hearing-impaired behavior, it could suggest that a lack of plasticity following cochlear damage contributes to the behavioral deficits of hearing-impaired listeners.

The way our model is set up, we can plausibly simulate many different modes of sensorineural hearing loss. For instance, we can disable the outer hair cells in the peripheral model, which contribute to the healthy ear's sharp frequency tuning and sensitivity to very quiet sounds. Eliminating them thus leads to broader frequency tuning and reduced cochlear amplification, as illustrated by these simulated auditory nerve fiber tuning curves.
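Before moving on, here is a schematic sketch of the three conditions just outlined. All of the names and functions are hypothetical stand-ins (the training loop, evaluation, and cochlear front ends are placeholders, not our actual codebase); the point is only to show how the normal-hearing, static hearing-impaired, and plastic hearing-impaired models differ.

```python
# Schematic of the three experimental conditions (placeholder functions only).
from typing import Callable

def healthy_cochlea(waveform):
    """Stand-in for the intact peripheral auditory model."""
    return waveform

def impaired_cochlea(waveform):
    """Stand-in for a damaged peripheral model, e.g. outer hair cell loss."""
    return waveform * 0.1  # crude placeholder for reduced cochlear amplification

def train_network(front_end: Callable):
    """Placeholder: train the word-recognition network on top of `front_end`."""
    return {"front_end": front_end}  # pretend this is a trained network

def evaluate(network, front_end: Callable) -> float:
    """Placeholder: return word-recognition accuracy with the given front end."""
    return 0.0

# 1. Normal hearing: trained and tested with the healthy cochlea.
normal_net = train_network(healthy_cochlea)
acc_normal = evaluate(normal_net, healthy_cochlea)

# 2. Static hearing impaired: the same trained network with weights fixed,
#    but the damaged cochlea swapped in at test time (onset of hearing loss,
#    no opportunity for the central system to adapt).
acc_static = evaluate(normal_net, impaired_cochlea)

# 3. Plastic hearing impaired: a network trained from scratch on the damaged
#    cochlea (a central system that maximally adapts to its new ears).
plastic_net = train_network(impaired_cochlea)
acc_plastic = evaluate(plastic_net, impaired_cochlea)
```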
We can also simulate inner hair cell loss, producing dead regions along the cochlear frequency axis where the ear loses sensitivity to particular frequencies. And we can simulate auditory nerve fiber loss, which has recently gained a lot of interest as a possible mechanism for hidden hearing loss. To model a healthy ear, we might have 20 or so auditory nerve fibers innervating each inner hair cell, and you can see that the summed spiking activity quite nicely encodes temporal changes in the inner hair cell membrane potential. To model the impaired ear, we can simply sample fewer spike trains per inner hair cell, which effectively degrades the fidelity of temporal coding in the periphery.

Today I'll primarily be showing results for networks with simulated outer hair cell loss. As I showed before, the network trained and tested with the healthy cochlea exhibits normal-hearing, human-like performance. We'll now take that same network and switch in the cochlear model with outer hair cell loss. Tested on the same stimuli, the network's performance tanks. Of course, we made kind of a silly mistake here: in experiments with hearing-impaired people, it is important to present stimuli loud enough for them to hear. If we make the stimuli 30 decibels louder, to put them in the audible range for our networks, we see that the network can still do the task with the impaired ears, but not as well as it could with the healthy ears. Consistent with hearing-impaired humans, the network's performance is still pretty good when the stimuli contain no noise but rapidly falls off as the noise level increases. The scatter plot below compares corresponding data points between the normal-hearing and the hearing-impaired model.

Now let's see what happens when we allow the network to fully adapt to the impaired ears. Retraining the network with the impaired ears produces a remarkable recovery. Performance is almost as good as the normal-hearing model, indicating that the peripheral representation following outer hair cell loss actually contains sufficient information to support close to normal-hearing-level performance. As an additional experiment, we also tested these networks in stationary and fluctuating masking noise. We see that the normal-hearing model does receive a benefit from fluctuating maskers. Consistent with hearing-impaired listeners, switching in the impaired ears at test time greatly reduces that benefit. However, training with the impaired ears restores the network's ability to listen in the dips of fluctuating maskers. Taken together, these results suggest that the peripheral auditory representation following outer hair cell loss is capable of mediating near-normal-hearing-level speech recognition behavior, provided the sounds are presented at an audible level. Perhaps, then, the reason humans with outer hair cell loss exhibit impaired speech recognition is that their central auditory system is insufficiently plastic.

I'll now briefly show some preliminary results of similar experiments with simulated auditory nerve fiber loss. Once again, the normal-hearing model is the same as before: the network trained and tested on the summed spiking activity of 20 nerve fibers per inner hair cell exhibits good speech recognition performance in noise, consistent with normal-hearing humans.
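To illustrate the nerve fiber manipulation itself, here is a small sketch of how sampling fewer spike trains per inner hair cell degrades temporal coding. The Poisson spike generator below is a generic stand-in, not our actual peripheral model, and the sinusoidally modulated firing rate is a toy signal.

```python
# Sketch of simulated auditory nerve fiber loss: fewer spike trains per inner
# hair cell -> noisier population response -> less reliable temporal coding.
from typing import Optional
import numpy as np

def sample_spike_trains(ihc_rate: np.ndarray, n_fibers: int, dt: float = 1e-4,
                        rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Draw independent Poisson spike trains from one inner hair cell's firing rate.

    ihc_rate: instantaneous firing rate (spikes/s) over time for one inner hair cell.
    n_fibers: number of auditory nerve fibers innervating that hair cell
              (e.g. 20 for a healthy ear, fewer for simulated fiber loss).
    Returns a (n_fibers, len(ihc_rate)) array of 0/1 spike counts per time bin.
    """
    rng = rng or np.random.default_rng()
    p_spike = np.clip(ihc_rate * dt, 0.0, 1.0)
    return (rng.random((n_fibers, ihc_rate.size)) < p_spike).astype(np.int8)

# Summing across fibers gives the population response the network sees; with
# fewer fibers the sum is noisier, so fine temporal structure is encoded less reliably.
t = np.arange(0, 0.05, 1e-4)
rate = 100 + 80 * np.sin(2 * np.pi * 200 * t)   # toy modulated firing rate
healthy_sum = sample_spike_trains(rate, n_fibers=20).sum(axis=0)
impaired_sum = sample_spike_trains(rate, n_fibers=5).sum(axis=0)
```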
Losing 25% of auditory nerve fibers uniformly across the cochlea, with no retraining of the network, produces only slight deficits. Losing 50% of auditory nerve fibers results in severely impaired speech recognition: the networks are virtually unable to recognize any words if noise is present, and their performance in quiet is also quite poor. We also see that this type of hearing loss reduces the benefit networks get from fluctuating maskers, though here it is a bit difficult to see given how poorly the network is doing overall. So far, all of these auditory nerve fiber loss results are from networks that were trained with 20 auditory nerve fibers per inner hair cell and tested with fewer. We're still actively looking at the extent to which these deficits can be mitigated by a maximally plastic auditory system, i.e., by training the network with fewer nerve fibers.

To wrap up for today: with deep neural networks, we have for the first time systems that can recognize speech from audio as well as humans do, and can thus directly model the complicated auditory behaviors that are most pressing for hearing loss. We saw that the behavioral characteristics of hearing loss seem to emerge when there is a mismatch between the cochlea a system was optimized for and the one it is forced to use. Allowing networks to adapt to their impaired input leads to remarkably unimpaired behavior in the case of outer hair cell loss, perhaps suggesting that an infinitely plastic central auditory system could actually overcome outer hair cell loss. Future work will investigate more plausible modes of hearing loss, especially simulating more typical combinations of outer hair cell, inner hair cell, and auditory nerve fiber loss. We're particularly interested in the possibility of using network predictions to improve diagnostics for inferring types of cochlear damage from behavior.

And with that, I'd like to thank my collaborators Janelle, Andrew, and Ray, my advisor Josh McDermott, and the funding sources who made this work possible. Thank you.