Hello and welcome to my talk about estimating the distortion component of hearing impairment from attenuation-based model predictions using machine learning. When you are sitting in a restaurant and talking with your friends and family, you typically have no problem at all understanding them, so you can socially interact with them. People with hearing impairment face a more difficult challenge, since they do not understand everything that is being said, so they cannot socially interact with the others in the same way. If we now use a model to predict how a normal-hearing person understands the speech, we can also use the same model, make it hearing impaired, and simulate what the hearing-impaired person understands, in order to figure out what might be impaired in these listeners and why they do not understand speech as well as normal-hearing listeners.

The test that we use to figure this out is the so-called Matrix sentence test, at least the German version of it. Each sentence consists of five words; for the German version these are a name, a verb, a number, an adjective, and a noun, which can be combined into nonsense sentences like "Peter gets four green knives". With this test we can measure the so-called speech recognition threshold, the SRT, which is the signal level or signal-to-noise ratio at which 50% word recognition performance is reached. This test is available in more than 20 different languages, for example German, Swedish, or Japanese.

Back in 1978, Plomp developed the concept of the attenuation component and the distortion component. If you plot the SRT in dB SPL on the y-axis against the noise level in dB SPL on the x-axis, you can see this pattern here; the dashed line indicates an SNR of 0 dB. People with normal hearing have no problem understanding speech in quiet environments, where their normal absolute hearing threshold defines their listening performance, but as the noise level increases, the SRT is determined by the noise level. The same holds true for people with an A component of hearing impairment, that is, an increased absolute hearing threshold: it limits their ability to understand speech in quiet environments, but they can become as good as normal-hearing listeners in noisy environments. People who suffer from a distortion component of hearing impairment have a more difficult task, since they have more difficulties understanding speech in quiet, and they also have more difficulties understanding speech in noisy environments (a small numerical sketch of this A/D picture follows below).

What we did to estimate the D component was to take SRTs measured in a stationary noise, presented at 65 dB SPL, from 315 ears. So in this plot, we were around here. If you now plot the hearing threshold across frequency, we had three different groups: Group A, who had better audiograms, so normal hearing to slightly or moderately impaired, and Group B, with moderately to severely impaired listeners. Group E consisted of special cases with steeply sloping hearing losses, so they were close to normal hearing at low frequencies and rather deaf at frequencies from 2 kHz upwards. If you plot the SRT on the y-axis against the average hearing loss from 0.5 to 4 kHz, these patterns emerge. People with better audiograms performed the task in noise, so they could hear the noise, although their SRTs also increased with the average hearing loss, which cannot be due to the masker, since they heard the masker, so it must be due to something else.
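To make the A/D picture above a bit more concrete, here is a minimal numerical sketch, not Plomp's original formulation: the quiet branch of the SRT is shifted by both the A and the D component, the noise-limited branch only by the D component, and the overall SRT is approximated as the louder of the two. The normal-hearing quiet SRT and the SNR at threshold used below are illustrative assumptions.

```python
import numpy as np

def predicted_srt(noise_level_db, A=0.0, D=0.0,
                  srt_quiet_normal=20.0, snr_at_srt_normal=-7.0):
    """Qualitative A/D sketch (illustrative values, not Plomp's equation).

    Quiet branch: limited by the absolute threshold, shifted by A and D.
    Noise branch: limited by the noise level, shifted by D only.
    The overall SRT is approximated as the higher of the two branches.
    """
    quiet_branch = srt_quiet_normal + A + D
    noise_branch = noise_level_db + snr_at_srt_normal + D
    return np.maximum(quiet_branch, noise_branch)

noise_levels = np.arange(20, 81, 10)                # noise level in dB SPL
print(predicted_srt(noise_levels))                  # normal hearing
print(predicted_srt(noise_levels, A=30.0))          # attenuation component only
print(predicted_srt(noise_levels, A=30.0, D=6.0))   # attenuation plus distortion
```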
People of Group B performed worse; their SRTs were determined by their absolute hearing threshold, so they effectively performed the task in quiet. And then there was Group E, who were somewhere in between: at low frequencies they could definitely hear the noise, but at high frequencies they could not, so they fall outside these two patterns.

Our model assumption was that if we use only the absolute hearing threshold for the simulations, then the prediction error of the SRTs reflects, to some extent, the D component. To examine this we looked at three different models, namely the Speech Intelligibility Index, the SII, a modified version of it termed PAV, since it was inspired by the work of Pavlovic et al. from 1986, and the Framework for Auditory Discrimination Experiments, in short FADE.

The SII works as follows. We take the speech, the noise, and the audiogram, calculate the speech spectrum level from the speech, and from the noise and the audiogram some kind of disturbance spectrum level. These two are weighted against each other and then summed to determine an index value, which has to be mapped to an SRT. The PAV modification was to use the audiogram to adjust the frequency weighting, which we did, but we also normalized the frequency weighting afterwards. After that, the output was weighted again to determine some kind of index value, which was mapped to an SRT.

FADE was inspired by automatic speech recognition systems and works as follows. First, you take speech signals and noise and mix them together at different signal-to-noise ratios. These mixtures can then be processed with some kind of hearing aid algorithm, although this was not done in this work. After that, features are extracted, based on the so-called log Mel spectrogram, a spectro-temporal representation of the signal, which can also be modified to include hearing impairment. From the log Mel spectrogram, features are extracted that are sensitive to spectro-temporal modulations. These features were used to train an automatic speech recognition system based on a hidden Markov model with Gaussian mixture models, and the same signals with different excerpts of the noise were used to test the ASR system. After that, the performance was evaluated by means of psychometric functions: here you can see the recognition rate plotted against the signal-to-noise ratio, and we find the SRT, that is, the point where 50% of the words are understood, at about minus 9 dB for this example (a short fitting sketch follows below).

If you now plot the prediction errors, here on the y-axis, against the average hearing loss, so our estimated distortion component, you can see this pattern for the SII. For low average hearing losses the SII performs well, which is as expected. For slightly higher hearing losses the SII predicts too well, so it predicts too low SRTs, but as the average hearing loss increases, the predictions get worse in the other direction: the predicted SRTs are higher than the measured ones. This indicates that the SII already considers too much in its prediction, such that adding more impairment, in the form of some kind of distortion component, could not improve the predictions but only make them worse, since adding more hearing impairment would increase the predicted SRTs even further. The SII therefore cannot be modified to consider this D component in its prediction, which means its prediction error is not a valid estimate of the distortion component.
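The last step of the FADE evaluation, reading off the SRT from the simulated recognition rates, could be sketched roughly as below. The recognition rates are made-up example numbers, and fitting a logistic psychometric function is one common choice for this step, not necessarily the exact procedure used here.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, srt, slope):
    """Logistic psychometric function: word recognition rate vs. SNR in dB."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

# Hypothetical recognition rates of the simulated (GMM-HMM ASR) listener,
# tested at several SNRs; the numbers are invented for illustration.
snrs  = np.array([-21, -18, -15, -12, -9, -6, -3, 0], dtype=float)  # dB SNR
rates = np.array([0.02, 0.05, 0.15, 0.33, 0.52, 0.75, 0.90, 0.97])

# Fit the function and read off the 50% point, i.e. the SRT.
(srt, slope), _ = curve_fit(psychometric, snrs, rates, p0=[-9.0, 0.5])
print(f"Estimated SRT (50% word recognition): {srt:.1f} dB SNR")
```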
This pattern looks very different for the PAV modification, since PAV always predicts too low SRTs, so it is always better than the measured data, but these prediction errors also increase linearly. So what I did was use these blue data points to fit a linear function, and this linear function describes all the different SRTs that were measured here, especially for the steeply sloping group, depicted in red. This means that the A component was accurately considered. The same pattern was found with FADE, so linearly increasing prediction errors, which means this model, just like PAV, misses something, some form of hearing impairment, to make accurate predictions. And since these prediction errors remain for both models, which are completely different but both accurately take into account the contribution of different frequency bands to speech recognition thresholds, it seems that these prediction errors indicate the D component.

The nice thing about FADE, however, is that it also extends to other conditions, like speech in fluctuating maskers, and also to psychoacoustic tasks, like the simulation of tone-in-noise detection experiments, or to binaural tasks, so this approach is applicable to many different problems that we might want to predict or simulate. Another nice thing is that these approaches directly support diagnostics in a general and in an individual sense: general, by just measuring the audiogram and seeing that a person has, for example, an average hearing loss of 90 dB HL, so most likely the D component is in the range of 10 dB; individual, by making an individual prediction and looking at the difference between the attenuation-component-based prediction and the actual measurement and regarding that as a D-component estimate (a short numerical sketch of this follows at the end).

With that, I would like to thank you for your attention, and I am looking forward very much to a nice discussion with you. Thank you.
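The group-level and individual D-component estimates described above can be sketched in a few lines. All numbers below are invented for illustration, and np.polyfit is just one way to perform the linear fit mentioned for the blue data points; it is not claimed to be the exact procedure used in the study.

```python
import numpy as np

# Hypothetical data: average hearing loss (0.5-4 kHz, dB HL), measured SRTs,
# and SRTs predicted by an attenuation-only (audiogram-based) model.
avg_hearing_loss = np.array([ 5, 15, 25, 35, 45, 55, 65], dtype=float)  # dB HL
measured_srt     = np.array([-7, -6, -4, -2,  0,  3,  6], dtype=float)  # dB SNR
predicted_srt    = np.array([-8, -7, -6, -5, -4, -3, -2], dtype=float)  # dB SNR

# Individual D-component estimate: measured SRT minus attenuation-based prediction.
d_component = measured_srt - predicted_srt

# Group-level description: linear fit of the prediction error vs. hearing loss.
slope, intercept = np.polyfit(avg_hearing_loss, d_component, deg=1)
print(f"D-component trend: {slope:.2f} dB per dB average hearing loss, "
      f"offset {intercept:.1f} dB")

# Individual diagnostics: each listener's residual is their D estimate.
print("Individual D estimates (dB):", d_component)
```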