Hi everybody. Thanks for joining this talk on a hearing test using smart speakers: speech audiometry with Alexa. My name is Bernd Meyer from the Communication Acoustics group in Oldenburg, and I'm happy to present this research, which is a collaboration between Jasper Oster, Melanie Krüger, Jörg-Hedrich Wach, Kirsten Wagner, Vegard Kolmeier, and myself.

The topic is speech audiometry, so let me briefly introduce it. It can be used for diagnostic measurements of hearing impairment, for hearing-device fitting, and for psychoacoustic experiments. The target is to find the signal-to-noise ratio (SNR) at which a listener correctly recognizes 50% of the words. We use a dynamic measurement procedure in which the SNR is adapted: if the listener gets more than 50% of the words correct, the SNR is decreased and the task is made harder, and the other way around if fewer than 50% of the words are correctly identified, so that the procedure converges to the 50% word recognition rate. Usually the listener's responses are logged by an audiometrist, the person shown here, who enters the responses in a graphical user interface. In a previous study, we showed that such logging can also be performed by an automatic speech recognizer. This illustration shows the typical components of such a recognition system: feature extraction, a deep neural network producing phoneme probabilities, and a decoding stage. The resulting transcripts can replace the responses logged by the human supervisor, and in that study we showed that this produces results very close to the clinical measurement procedure. But we were also curious whether we could take the next step and bring the measurement procedure into the living room.

Here is the experimental design of the study I want to cover in this talk, which, as the title says, is about smart speakers. On the one side, we are looking at clinical speech audiometry in a double-walled booth.
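The adaptive procedure described above can be sketched in a few lines. This is a minimal illustration, not the exact clinical staircase: the 1 dB step size and the per-sentence word count of five (typical for matrix sentences) are assumptions for the sketch.

```python
def adapt_snr(snr_db, n_correct, n_words=5, step_db=1.0):
    """One step of a simple adaptive SNR rule (illustrative only).

    If the listener gets more than half the words right, the SNR is
    decreased (task made harder); if fewer than half, it is increased
    (task made easier). Repeating this converges toward the SNR at
    which 50% of words are recognized, i.e. the SRT.
    """
    rate = n_correct / n_words
    if rate > 0.5:
        return snr_db - step_db  # too easy: lower the SNR
    if rate < 0.5:
        return snr_db + step_db  # too hard: raise the SNR
    return snr_db                # exactly 50%: keep the SNR


# Example: 4 of 5 words correct at -5 dB SNR -> next trial at -6 dB
next_snr = adapt_snr(-5.0, 4)
```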
And on the other side, we are investigating smart-speaker audiometry in a simulated realistic room setting. Many parameters differ here: there is reverberation, we have no control over the sound pressure level, for copyright reasons we are not using the regular matrix-test recordings but synthetic speech, and of course there are potential errors from the automatic speech recognizer, which could affect the SRT measurement. This is what I will talk about in the following.

Here is the structure of the smart-speaker app, or "skill" as it is called in the Amazon Alexa world. We use the Alexa speakers to run the dynamic procedure of the listening test. You start the test by saying "Alexa, start hearing test"; currently the skill only works on German smart speakers, so the actual invocation is in German. The smart speaker, or the app on your phone, then reads the instructions, and after that the stimulus presentation starts. The listener then responds with what he or she recognized. The automatic speech recognition component transcribes the response, searches for specific lexical patterns, and outputs which keywords from the matrix test were correctly identified. From this we calculate the word score and then adapt the signal-to-noise ratio. As in the usual test, we repeat this 20 times, and at the end we have an estimate of the speech recognition threshold.

There are different acoustic conditions that we covered in the study. These are simulated rooms: they were simulated in a room called the communication acoustics simulator, which has microphones and loudspeakers on the walls and a powerful computer that can simulate rooms larger than the communication acoustics simulator itself. We looked at a living room, an acoustically poor classroom, and a concert hall, all with different reverberation times.
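The keyword scoring step in the skill can be illustrated with a simple lexical match. This is only a sketch of the idea: the actual skill searches for specific lexical patterns in the ASR transcript, which may be more elaborate than the plain token lookup assumed here, and the example sentence is an invented matrix-style sentence.

```python
def score_response(presented_words, transcript):
    """Count how many of the presented matrix-test keywords appear in
    the ASR transcript of the listener's response.

    Simple case-insensitive token matching; the real skill's lexical
    pattern search is assumed to follow the same idea.
    """
    tokens = transcript.lower().split()
    return sum(1 for word in presented_words if word.lower() in tokens)


# Example: listener misheard one of the five matrix words,
# so 4 of 5 keywords are found in the transcript.
n_correct = score_response(
    ["Peter", "kauft", "drei", "nasse", "Sessel"],
    "peter kauft drei grüne sessel",
)
```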
We covered these rooms in the measurement sessions with the participants we invited. There were two sessions, session one and session two, and we shuffled the order of the rooms between them so that there is no systematic order effect. The second factor we varied was the hearing loss of the user groups: the participants ranged from young normal-hearing listeners to older normal-hearing listeners and listeners with a mild or moderate hearing loss.

The first thing we looked into was the recognition error rates produced by Amazon Alexa's automatic speech recognition engine. On the y-axis you see the error rates, and we show violin plots for the different listener groups. The data shown in red and blue are score deletion errors and score insertion errors, respectively. "Score" means that these are matrix words contained in the matrix sentences, and whenever one of these words is deleted or inserted, it might affect the SRT measurement in the end. As you can see, for the deletion rates (shown in red) the median values, marked by the white circles, are at or below 5%, and the insertion error rates are around 2.5%. The question, of course, is how this affects the reliability of the test, and how the SRT is affected. That is what we looked at next.

First, we looked at the SRT differences and also at the intra-subject standard deviations of the SRT across the rooms: blue is the living room, green the classroom, and yellow the concert hall. For the intra-subject standard deviation when conducting the test with a smart speaker, we did not see a large effect between the rooms; it ranged from approximately 0.8 to 0.9 dB. But we did see a significant effect regarding the bias: the errors that the ASR system makes introduce a specific bias, and that bias differs between the three rooms we analyzed. The effects are not huge, though, as you can see.
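The score deletion and insertion errors discussed above can be expressed as a multiset comparison between the matrix words the listener actually spoke and the matrix words the ASR returned. This is an illustrative definition under that assumption; the study's exact scoring procedure may differ in detail.

```python
from collections import Counter


def score_errors(spoken_words, recognized_words):
    """Illustrative score-error counting.

    Score deletions: matrix words the listener spoke that the ASR
    failed to return. Score insertions: matrix words the ASR returned
    although the listener did not speak them. Both error types can
    shift the SRT estimate.
    """
    spoken = Counter(spoken_words)
    recognized = Counter(recognized_words)
    deletions = sum((spoken - recognized).values())
    insertions = sum((recognized - spoken).values())
    return deletions, insertions


# Example: ASR missed "kauft" (deletion) and hallucinated "neun" (insertion).
dels, ins = score_errors(["peter", "kauft", "drei"], ["peter", "drei", "neun"])
```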
The bias ranges from approximately 1.1 to 1.5 dB. Next, we compared the intra-subject standard deviation when using a smart speaker to the intra-subject standard deviation of the reference, that is, the clinical measurement shown in the middle. We did this for the listener groups I talked about earlier, and we also investigated the overall bias between the clinical measurement and the Alexa-based measurement. When you compare the bars on the left with the bars in the middle, you see that the standard deviation is elevated when using the smart speaker, but it is not a huge effect: the largest difference we see is from 0.49 dB for the red bar in the middle to 1.09 dB when using the Alexa speaker. We do see a bias, which also depends on the listener group. Interestingly, the largest bias, over 2 dB, occurs for the young normal-hearing listeners, and it is reduced down to 0.70 dB for the listeners with a moderate hearing loss.

At the end of the test, people using it at home should probably not be told their SRT, because for most people this value would be meaningless. So we asked ourselves what feedback should be provided based on this screening procedure. The binary decision here is the question: is the measured SRT significantly different from the 95% range of the SRTs we observed for the young normal-hearing listeners? For this we can specify sensitivity and specificity, and we can also calculate the Youden index, which takes both of these measures into account and weights them equally. This is what is shown here: on the x-axis you see the SRT measured with the smart speaker and the corresponding decision threshold. When we use this criterion based on the Youden index, we still get an area under the curve of 0.95.
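The Youden index used to pick the decision threshold is the standard quantity J = sensitivity + specificity - 1, which weights both measures equally. Here is a small sketch of threshold selection based on it; the SRT values and candidate thresholds below are toy numbers, not the study's measurements.

```python
def youden_threshold(srts_normal, srts_elevated, candidates):
    """Pick the SRT decision threshold maximizing the Youden index.

    "Positive" means an elevated SRT, i.e. an SRT above the threshold.
    Sensitivity: fraction of elevated-SRT listeners flagged.
    Specificity: fraction of normal-range listeners not flagged.
    Returns (best_threshold, best_youden_index).
    """
    best = None
    for t in candidates:
        sensitivity = sum(s > t for s in srts_elevated) / len(srts_elevated)
        specificity = sum(s <= t for s in srts_normal) / len(srts_normal)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Toy example: young normal-hearing SRTs vs. clearly elevated SRTs (dB SNR)
threshold, j = youden_threshold([-7.0, -6.5, -6.0], [-4.0, -3.0, -2.0],
                                candidates=[-6.0, -5.0, -4.0])
```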
So when we reduce our decision to this binary screening result, SRT elevated or not, we get this 0.95 score. To conclude this talk: when we use an Alexa skill to conduct speech audiometry with matrix sentences, we see an intra-subject standard deviation of around 1 dB, which holds for different rooms and also for different subject groups. We do see a bias that decreases with increasing hearing loss, a result that we have not completely understood so far, but we are looking into it. And if we reduce the output to a binary recommendation given at the end of the test, we still get good classification performance, with an area under the curve of 0.95. With this, I am at the end of the talk. Thank you for listening. If you want more information about the skill, which has been online since September last year, please check our website, CAULDE, slash Alexa, test my hearing. And please try the skill if you'd like, and test your hearing. Thanks for listening, and I hope to see you soon. Bye-bye.