Hello everyone, my name is Clément Gaultier, and I work with Tobias Goehring on new speech enhancement strategies for cochlear implant listeners. More specifically, in this presentation you will hear about our research project using single- or multi-microphone deep learning approaches to tackle the joint problem of noise and reverberation for cochlear implant listeners.

The cochlear implant is a neuroprosthesis with several components, represented in the animation on the left. One of them is an electrode array used to restore a sense of sound by electrically stimulating the auditory nerve in place of the damaged sensory hair cells. People with profound sensorineural hearing loss listening with a cochlear implant can achieve good speech comprehension in quiet, even without lip reading. However, there are still strong limitations in challenging listening situations with noise and reverberation. This is where our research project lies, especially as previous work reported that interactions between noise and reverberation are likely to impair speech intelligibility for people with cochlear implants. Previous work also reported significant improvements in speech intelligibility for cochlear implant recipients using deep neural network algorithms. That is why, with this project, we try to establish whether more realistic listening situations with noise and reverberation can be mitigated using more advanced DNN speech enhancement with one or more microphones, to improve speech perception for cochlear implant listeners.

For the single-microphone algorithm in our study, we chose a typical approach where, from degraded speech examples, the algorithm learns a model to estimate enhanced speech. This is done using the dual-path RNN time-domain audio separation network (DPRNN-TasNet), where the algorithm first forms a sound representation of the time-domain noisy reverberant speech example with the encoder block on the left.
It then estimates a mask on that sound representation with the masking network (the middle part), before transforming it back to time-domain enhanced speech with the decoder block.

For the multi-microphone approach, this time from multi-channel degraded speech examples, the algorithm learns a model to estimate enhanced speech signals. This is done using the filter-and-sum network architecture, or FaSNet. This architecture, which we can also call a neural beamformer, learns, from a representation formed of 2D sound representations and cross-microphone features, time-domain spatial filters, which are then applied to obtain enhanced and hopefully more intelligible speech.

As we chose a data-driven approach for this study, we trained the algorithms on 6,000 simulated sound scenes, using target speech and rooms from the Clarity CEC1 data set and noises from the one data set, and trained the different algorithms to remove both noise and reverberation. We compared three different cases, where we have access to either one microphone, two microphones placed unilaterally, or six microphones placed bilaterally, with three microphones on each ear of the listener.

We decided to evaluate the models on mismatched target speakers, using BKB sentences and 16-talker babble noise, as well as on unseen rooms from the Clarity CEC1 data set. We also used cochlear implant simulations to process our estimated speech signals before computing distortion and predicted intelligibility scores, which I will present next. We compared the three approaches against the unprocessed noisy reverberant signal and an ideal speech enhancement condition. On the x-axis here, lower values represent noisier situations. On the left graph you see a measure of signal-to-distortion ratio, and on the right one the normalized covariance measure, used in previous studies as a proxy for speech intelligibility with cochlear implants.
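The filter-and-sum principle behind FaSNet, and the signal-to-distortion ratio used in the left graph, can both be illustrated in a minimal sketch. This is not the actual FaSNet implementation: the function names, the fixed identity filter taps, and the toy two-microphone scene are illustrative assumptions; FaSNet instead predicts the filter taps per frame with a neural network.

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Apply one FIR filter per microphone channel, then average the outputs.

    channels: (n_mics, n_samples) array of time-domain microphone signals
    filters:  (n_mics, n_taps) array of FIR taps; FaSNet learns these from
              cross-microphone features, here they are fixed and illustrative.
    """
    out = np.zeros(channels.shape[1])
    for x, h in zip(channels, filters):
        out += np.convolve(x, h, mode="same")
    return out / len(channels)

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB (higher is better)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

# Toy scene: the same target reaches two time-aligned microphones, each with
# independent noise. Identity taps reduce filter-and-sum to plain averaging,
# which attenuates the uncorrelated noise while preserving the target.
rng = np.random.default_rng(0)
target = np.sin(np.linspace(0.0, 40.0 * np.pi, 8000))
channels = target + rng.normal(0.0, 0.5, size=(2, 8000))
enhanced = filter_and_sum(channels, np.ones((2, 1)))

single_mic_sdr = sdr_db(target, channels[0])
beamformed_sdr = sdr_db(target, enhanced)
```

With identity taps and M microphones carrying uncorrelated noise, averaging reduces the residual noise power by roughly a factor of M, which is one intuition for why the two- and six-microphone conditions help most at the lower SNRs.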
For both metrics, higher is better. We see possible benefits for all methods, and especially for the multi-microphone conditions at lower SNRs.

We also ran a listening study including 15 volunteers with typical hearing using a cochlear implant simulation, and we are also collecting data with cochlear implant listeners; we are aiming to test 12 volunteers for that group. For both groups we compare the 50% speech reception threshold (SRT) measured using an adaptive procedure. On this slide you see individual results for the group using the cochlear implant simulation, where on the x-axis you have the volunteers, with their average results denoted by M, and on the y-axis you have the speech reception threshold, where lower is better. We compared the five different conditions, and while the results are a bit mixed with the single-microphone approach, we note that almost all volunteers improved their SRT with the two-microphone approach, and they all obtained their best SRTs with the six-microphone approach.

Let's look at the results from the first eight participants who volunteered for the cochlear implant listener group. Results are for now a bit more variable, but four of them improved their SRTs with the single-microphone method, six of them got better SRTs with the two-microphone approach, while all of them improved with the six-microphone approach.

Finally, let me present the statistical analysis we ran for the group using the cochlear implant simulation. We see a significant effect of the tested condition with a repeated-measures analysis of variance, and highly significant differences between conditions with Bonferroni-corrected pairwise comparisons, except between the unprocessed and the single-microphone cases. I am not presenting any statistics for the cochlear implant listener group, as we have not yet finished the data collection, but we can see similar trends in the results and probably a superiority of the multi-microphone approaches.
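As a sketch of how such an adaptive SRT procedure works: a one-up/one-down staircase targets the 50% correct point, lowering the SNR after a correct response and raising it after an incorrect one. This is a generic staircase, not the exact procedure used in the study; the simulated listener, the 2 dB step size, and the trial count are all illustrative assumptions (real procedures typically shrink the step size after a few reversals and average the reversal SNRs).

```python
def adaptive_srt(respond, start_snr=10.0, step=2.0, n_trials=30):
    """One-up/one-down adaptive track converging on the 50% SRT.

    respond(snr) -> True if the (simulated) listener repeats the sentence
    correctly at that SNR. After a correct response the next trial gets
    harder (lower SNR); after an incorrect one it gets easier.
    """
    snr = start_snr
    track = []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if respond(snr) else step
    # Estimate the SRT as the mean SNR over the late, oscillating trials.
    return sum(track[-10:]) / 10.0

# Deterministic toy listener with a true threshold of -3 dB SNR: the track
# descends in 2 dB steps, then oscillates between -2 and -4 dB around it.
estimated_srt = adaptive_srt(lambda snr: snr >= -3.0)
```

A lower SRT means the listener tolerates more noise at the same intelligibility, which is why the six-microphone condition yielding the lowest SRTs is the strongest result here.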
To conclude, we showed a clear superiority of the multi-microphone deep neural network approaches in improving speech reception thresholds for more realistic, challenging listening situations with both background noise and reverberation. These are promising findings for cochlear implant listeners, as even with only two microphones placed unilaterally we reported improved speech reception thresholds, with bilateral processing potentially giving larger benefits still. The mixed results for the single-microphone case, possibly coming from the joint task here of denoising and dereverberation, motivate further investigation, as does, for example, work studying the effect of such speech enhancement strategies on the auditory perception of the acoustic environment. Thank you very much for your attention.