Okay, so good morning. The topic is signal processing for hearing aids. My name is Rainer Martin; I am a professor of communication acoustics at the Ruhr-Universität Bochum, Germany. If you don't know where Bochum is: it's close to Dortmund, and all soccer aficionados know where Dortmund is. Can you hear me clearly? Okay. In my presentation I will explain some aspects of signal processing for hearing aids, and hearing instruments more generally. I have to say that research in this area is truly interdisciplinary, so there are a lot of different aspects and details, and it is nearly impossible to cover all of them. I will do my best to give you an overview and go into detail at some points, but I also invite you to ask questions if you feel that I treated some aspect in too little depth or left some questions open.

Before I start I would like to acknowledge all the contributions from staff and students of the Institute of Communication Acoustics in Bochum, and especially my former PhD students Colin Breithaupt, Nilesh Madhu, Timo Gerkmann, and Dirk Mauler, who worked on most of the topics that I will present today.

A brief outline of my presentation: after a short introduction, I will have one section on fundamentals of hearing and hearing instruments. We will then talk about spectral analysis and synthesis for processing acoustic signals in hearing aids. And finally, the largest section, signal enhancement for hearing instruments, because speech reception in adverse acoustic conditions, and adverse means noisy and possibly reverberant, is still the number one issue that users have to deal with when using hearing aids. There will be a summary, and there will also be breaks in between, as we had yesterday.

Okay, so what is a difficult listening scenario? The prototypical scenario is the cocktail party. At a cocktail party you would assume that everybody is happy, given the booze, or the raki on Crete. But if you look at these people, you see a lot of happy faces and one face that is not very happy. The most likely reason is that this person suffers from a hearing loss and thus feels socially isolated: everybody else is talking, there is a lot of noise around, and it is hard to communicate in such a situation with a hearing loss. And this is quite a widespread problem. Here are some statistics on hearing loss prevalence and hearing aid adoption rates in several countries: Germany, the UK, France, and the United States. As an estimate, roughly 10 to 13 percent of the population has some hearing problem, and there is also evidence that only a third of these are actually using hearing aids. That is good news for the hearing aid companies and manufacturers: there is still a lot of room for sales and improvements. The situation is very similar across the different countries, as you can see from these statistics.
So here the blue bar is the hearing-impaired percentage, the hearing aid adoption rate is in red, and in green the hearing aid adoption within the stated hearing-impaired population, which is only roughly 30 percent. Okay, some facts about the ear — [question from the audience about the chart] — well, the green one is the red one, just expressed as a percentage of those who are hearing impaired. If you look at the red one, it is roughly one-third of the blue one, and the green bar is this one-third: the green is the hearing aid adoption in percent of the hearing impaired. Yes, it's not entirely clear, but I think it's a bit redundant.

Okay, so some facts about the ear, the essentials for our purposes here. We look at the outer ear, with the pinna, the eardrum, and the ear canal; the middle ear, with the malleus, incus, and stapes; and then the inner ear, where the cochlea, the sensory organ of hearing, is located. A very prominent feature of the ear is the spectral analysis that it performs, also known as the frequency-to-place mapping: different sections within the cochlea are sensitive to different frequencies, so there are prominent places for different frequencies. It starts here at the base, at the place where the stapes interfaces with the cochlea, with the high frequencies, and as you go along the cochlea, along the basilar membrane, you go to lower frequencies. This is an interesting and very important feature of our ear, and it also gives you an indication of why the typical hearing loss is a high-frequency hearing loss. Especially age-induced hearing loss is very often a high-frequency hearing loss, because this point is the point of entry for the sound waves, so the wear, or the trauma you suffer when you listen to very high levels of noise, is largest here at the entry of the cochlea, where the high frequencies are located. That is one explanation why age-related hearing loss is often a high-frequency hearing loss: this is the first part of the cochlea that is hit by the sound waves whenever you listen to, for example, loud noises.

[Question from the audience; to repeat it: which part is connected?] All of it: along the basilar membrane you have nerve fibers connected to the various places, to the hair cells on the basilar membrane, and they all travel along the auditory pathway to the auditory cortex. It really is like a spectrum analyzer: the inner ear delivers spectral information to the higher auditory system.
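To make this frequency-to-place idea concrete, here is a minimal sketch using the Greenwood map for the human cochlea. The constants are the commonly cited values from Greenwood (1990), not numbers from the talk.

```python
import numpy as np

def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane, with x = 0 at the apex and x = 1 at the base.
    Constants are the commonly cited values for the human cochlea."""
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

# The base (entry point near the stapes) codes high frequencies,
# the apex codes low frequencies:
for x in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"position {x:.2f} -> {greenwood_frequency(x):8.1f} Hz")
```

Running this shows roughly 20 kHz at the base, falling to about 20 Hz at the apex, which matches the tonotopic picture described above.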
The ear has some very remarkable properties in terms of dynamic range. Here is a plot of the hearing area: on the x-axis you have the frequency range from 16 Hz to roughly 16 or 20 kHz, and on the y-axis a dynamic range from −10 to 120 dB. In the different colors you see the typical ranges of sounds and where they are located within this diagram: speech ranges typically from 100 Hz up to, well, 14 kHz; music has a larger range. Here is the threshold of hearing — you can only hear sounds above it — and then, also very important, a limit which is the threshold of pain. If you really feel pain, then you are in a situation you should not be in: you are listening to very loud sounds. But there is also a range below the threshold of pain where frequent exposure will harm your hearing; that is the area of hazardous sounds here above this dashed line. The damage the ear suffers is a function of the intensity, the level, and the duration of the exposure.

The normal-hearing ear has quite remarkable capabilities in terms of speech reception, of speech understanding. Here is a diagram taken from a very nice book by Plomp, "The Intelligent Ear", where you can see the percentage of sentences correct in an intelligibility task, from zero to a hundred percent, for two types of noise: steady-state noise and a competing voice. You can see that even in steady-state noise you have about 50 percent intelligibility at negative signal-to-noise ratios, which is quite remarkable, and with a competing voice you can go to much lower signal-to-noise ratios, simply because you have gaps between the words and syllables of the competing speaker, sometimes called glimpses. The auditory system makes use of these gaps in the interfering voice and picks out the information of the target voice, and for that reason you can go to even lower signal-to-noise ratios and still have a fairly high percentage of speech correct.

Okay, that was normal hearing. When you have a hearing loss, the situation is much different. Here is a diagram with frequency on the horizontal axis and the hearing loss on the vertical axis: zero means no hearing loss, and it can go down to 120 dB. Here you have typical phones, for example vowels, or here the fricatives like /f/ and /s/ — essentially their center of gravity in terms of frequency and their relative level. Also on the scale you find the different grades of hearing loss: from normal, in the range up to 25 dB, to a mild hearing loss, moderate, moderately severe, severe, and finally profound. Obviously, if you have a hearing loss of, say, 30 dB, which is still a mild form, you will miss, for example, the high-frequency sounds /f/ and /s/, and thus important information that would allow you to understand certain words and parts of speech. If you have a larger hearing loss, like 60 dB, you will miss out on almost all speech sounds; you can then still listen to not-so-interesting noise sources like chainsaws and lawn mowers, but that is not really a pleasure.

Okay, so hearing loss is classified in terms of degree: mild, moderate, moderate to severe, profound, and finally deaf. Hearing loss is often caused by exposure to loud noise and music. You probably all have had the experience of going to some jazz club or rock-and-roll club in a basement somewhere with reflecting concrete walls, and then you go out and have temporary tinnitus, some sounds in your ear. You should not do that too often. But there are also aging effects, as explained earlier, genetic disposition, infections, and drugs: some antibiotics are ototoxic; they should not be used, but have been used in some countries in the past. And here is a quite interesting scan of the hair cells in the inner ear, on the cochlea, showing the three rows of outer hair cells. This is from a healthy person, and this is a damaged inner ear.
It really looks like a hurricane has swept through the inner ear: lots of hair cells are lost, and then of course the sensory function of the ear is lost as well.

Okay, so there are also different types of hearing loss. Conductive hearing loss is usually related to problems of the outer or middle ear: the ossicles in the middle ear, whose function is to translate the airborne sound into liquid-borne sound within the cochlea, do not function properly. A conductive loss means an attenuation of 20 to 50 dB with a relatively flat frequency response, and I have indicated this in this diagram. Here is the air conduction, which shows a loss that is fairly flat in frequency. Typically the audiologist will also measure the bone conduction, where the sound is delivered via the skull. For a conductive loss you typically do not see a loss in bone conduction, because your inner ear, your cochlea, is still functional; you see the hearing loss only in air conduction, and that is a clear indication that the problem is in the middle or outer ear.

The most common form, however, is the sensorineural hearing loss, where the cochlea and its sensory function are damaged, or the auditory nerve is damaged. That is often related to a loss of, or damage to, inner or outer hair cells on the basilar membrane, or to dead regions in the cochlea. It comes along with an attenuation that has a strong frequency dependence, because of this tonotopic, frequency-to-place organization within the cochlea, and it often comes along with some form of tinnitus: noises that you hear, sinusoidal or oscillating sounds. The audiogram then looks a bit different from the previous one: now we have a clearly sloping hearing loss, a higher loss at high frequencies and fairly good residual hearing at low frequencies. The difference between the air conduction and the bone conduction is relatively small, because now it does not matter whether you deliver the sound via the bone, the skull, or as an acoustic signal; the problem is in the cochlea, so you see a similar hearing loss for both modes of excitation.

So we have conductive and sensorineural hearing loss, and then there are all kinds of central hearing disorders, sound localization disorders, and of course all forms of mixed hearing loss as well. What are the consequences of being hearing impaired? The most obvious one is that the threshold of hearing is elevated, so soft sounds are not heard anymore, as explained previously, and speech intelligibility will be insufficient.
So you have to keep asking "What did you say?", and at some point your partner will tell you: you had better get a hearing aid. Luckily, this elevated threshold of hearing can be compensated by amplifying the signal so that it is again above the threshold of hearing. The amount of amplification is derived from so-called fitting rules. Nowadays this is a software program used by the hearing aid dispenser or the audiologist to fit your hearing aid and determine the appropriate amplification from your audiogram, and it is typically frequency dependent.

But because you amplify everything, and unfortunately the level of discomfort, the threshold of pain, is not elevated — it can even decrease a bit when you have a hearing loss — the dynamic range you have available is much compressed. So whenever you amplify sounds in a hearing aid, you have to make sure that under no circumstances you go across the threshold of discomfort, or even the threshold of pain. That means you need compression within the hearing aid: you amplify the soft sounds but also limit the maximum level for the loud sounds, which of course still need to be represented. That is the consequence of the reduced dynamic range: you need a compression algorithm within the hearing aid. Then there is something called the recruitment phenomenon, which describes a strong increase of loudness above the threshold of hearing: at very high levels the loudness you perceive is often very similar to that of normal-hearing people, but to make this transition you have a very steep loudness gradient close to the threshold of hearing. That is called recruitment.
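As an illustration of how a fitting rule turns an audiogram into frequency-dependent gains, here is a sketch of the classic half-gain rule of thumb (prescribe roughly half the hearing loss as insertion gain at each frequency). Real fitting formulas such as NAL or DSL are considerably more elaborate, and the audiogram values below are made up for the example.

```python
import numpy as np

# Hypothetical audiogram: hearing loss in dB HL at standard audiometric
# frequencies, showing a typical sloping (high-frequency) loss.
freqs_hz = np.array([250, 500, 1000, 2000, 4000, 8000])
loss_db = np.array([15, 20, 30, 45, 60, 70])

# Half-gain rule of thumb: prescribe roughly half the loss as gain.
gain_db = 0.5 * loss_db

for f, L, g in zip(freqs_hz, loss_db, gain_db):
    print(f"{f:5d} Hz: loss {L:3.0f} dB -> prescribed gain {g:4.1f} dB")
```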
Another consequence is that a hearing loss often goes along with a loss or reduction of spectral and/or temporal resolution in the inner ear. That means speech sounds may be loud enough but still not intelligible: even if they are amplified above the threshold, according to the prescription and the audiogram, the sounds may not be intelligible. That is related to a widening of the filters implemented in the cochlea; they are widened when you have a hearing loss, so sounds are smeared out over frequency, and you no longer have the sharp filtering function of the inner ear. This essentially means that speech communication in noisy environments is severely degraded, because the ear cannot tell apart the target signal and the noise — it is all mixed together — so your ability to understand speech in noisy situations is severely degraded. And unfortunately, unlike the elevated threshold of hearing, which you can compensate by amplifying the signal, this is an effect that you cannot easily compensate. There is no inversion: you cannot run your acoustic signal through an inverse system and cancel these effects, so no direct compensation is possible. The only, or the best, strategy really is to improve the signal-to-noise ratio in noisy listening situations, so that the loss of spectral resolution is somehow compensated. It also helps, of course, to talk clearly and slowly, but that is not always possible. So speech enhancement and noise reduction pre-processing is very important for the successful application of hearing aids, and the main reason is the loss of spectral and/or temporal resolution in the inner ear.

Okay, so here are some hearing aids. The lifestyle wave has also reached medical technology, so they now come in fancy shapes and colors, in different designs. Here, for example, are hearing glasses with up to four microphones on each side, so you can really implement powerful beamforming algorithms. They all have in common that they have to operate from a small battery, and that is one of the big constraints: you do not have the computational power of your PC or even your smartphone — it is much, much lower. And users do not want to change batteries all the time, so the requirement is that the battery lasts at least for a day; for hearing aids you would actually require three to five days, or even a week. So it is very important to design all the signal processing functions in a very power-efficient way. That is one of the key issues.

Some insights into the historic evolution: 20 years ago all hearing aids were analog. There were some filters for tone control — in most hearing aids two, three, four, five filters — to provide frequency-dependent gain: according to the audiogram you need to amplify sounds at different frequencies by different amounts to compensate for the hearing loss. There is usually some gain control attached to the microphone, to compensate for the different levels of sounds in the environment around you, and then the amplification and the receiver, that is, the loudspeaker; in the hearing aid domain one talks about the receiver. The next step was that the acoustic signal processing was still analog, with analog filters, but the hearing aid had a digital interface for programming these filters and reading out some data, so you could attach a PC, for example during the fitting session, to adjust all the different parameters. Then, for roughly 15 to 20 years now, we have had fully digital hearing aids, where all the processing is done in the digital domain. This seems quite natural to us now, but it was a major move in the industry to go from analog to digital. It started with the high-end, expensive devices, but now roughly 95 percent of all devices sold are digital.

Again, some different form factors, sizes, and shapes: there are some very small ones that go completely into the ear canal.
They do not provide the maximum possible amplification — they still provide a lot of amplification, I should say, but not the maximum — because the battery is also relatively small in these aids. Then there are devices in the ear canal in different shapes, in the ear, the typical behind-the-ear devices, and finally the very large pocket devices, which also provide a lot of power and use larger batteries. Here is some information about the maximum gain that can be achieved with these devices, the battery type, and the capacity — just some data. Again, the expected battery life is three to ten days; otherwise users will not really like it, and it also becomes expensive for them to operate, as batteries are replaced often.

The computational power figure is just an estimate, because the implementation is a hardware-software co-design, with system-on-chip designs, but it is roughly 100 to 200 mega-operations per second. Compare this with the giga-operations you have on your PC, and you know what we are talking about here. Of course, that is also increasing as semiconductor technology improves. This obviously requires optimization on all levels, of the algorithms and the methods. You need to deal with fixed-point designs: you do not have a floating-point 64-bit arithmetic unit on your processor; it is all optimized for word lengths of typically 16 to 20 bits, which also means that you need to re-optimize your algorithms to run under fixed-point conditions. Memory footprint: you do not have giga- or terabytes of memory; kilobytes, maybe. And hardware and firmware components: the industry trend is to have more software in these devices and fewer hard-wired functions.

[Question about off-loading the processing.] Yes, that would be one way to increase the computational power of the overall system: to have a smartphone or another device do all the computations. But you would then need to stream the acoustic signals back and forth, and you need to do this with very low latency — I will come to this later — and that is a bit of a problem. So you cannot do everything on a smartphone or a third device that you have available, unless you can solve the latency problem of the transmission, and the battery problem.

Here is a look into a hearing aid, at some of the components. Here are the microphones, typically two of them, with omnidirectional characteristics, which can then be combined into a directional characteristic. Here is what is called the receiver, the loudspeaker, which generates the output signal; it is connected to a tube that then leads into the ear canal. In most modern devices you have a coil for wireless connectivity. And here is all the DSP. There are different ways this is designed; this is from one company, which uses flip-chip technology: one chip has all the analog components and another chip all the digital components, so you can update them independently and take advantage of the advances in digital design. They are just glued together — the whole thing is about the size of a grain of salt — and contain all the signal processing functionality. As I said before, a trend is to put more functions into software and fewer into hardware. Years ago everything was cast into full-custom ASICs, simply because of the power constraints, to get the maximum computational power for the least energy.
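To get a feel for the fixed-point constraint mentioned above, here is a small sketch that emulates a 16-bit fixed-point multiply of a signal with a gain, the kind of re-optimization these short word lengths force on an algorithm. The Q15 format choice is my own example, not a detail from the talk.

```python
import numpy as np

def to_q15(x):
    """Quantize a float in [-1, 1) to a 16-bit Q15 fixed-point integer."""
    return np.clip(np.round(x * 32768.0), -32768, 32767).astype(np.int32)

def q15_mul(a, b):
    """Multiply two Q15 numbers: 32-bit product, shifted back to Q15."""
    return np.clip((a * b) >> 15, -32768, 32767)

x = np.sin(2 * np.pi * 440 * np.arange(160) / 16000)  # a 440 Hz tone
gain = 0.5

y_float = gain * x                           # 64-bit float reference
y_fix = q15_mul(to_q15(x), to_q15(gain))     # 16-bit fixed-point version

err = y_float - y_fix / 32768.0
print("max quantization error:", np.max(np.abs(err)))  # on the order of 2**-15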
But with the emergence of ultra-low-power digital signal processors, there is now a trend to move more functions to the DSP and implement them in software, which will probably also allow manufacturers to lower the cost of the device, move more functions to the software side, and make them programmable and more easily updateable.

Okay, some of the signal processing functions, which I will not treat in much detail but have compiled here in the first part of my talk. For example, dynamic compression, whose task is to adjust the gain to match the reduced dynamic range of the listener. Typically that is implemented in several frequency bands: in the first step you have a filter bank with band-pass filters, in each band you measure the level of the acoustic signal, and then you have some form of compression characteristic, which typically looks like this, with a knee point. In the lower range you amplify the sounds — these are the soft sounds — and at the higher levels you have to compress, in order not to go beyond the level of discomfort. That is a typical compression characteristic; it controls this multiplier here, which regulates the gain for, say, band-pass filter one, and then you do this in several frequency bands and add up the signals to produce the output signal. Typically you have two to twenty channels, with the larger numbers of channels in modern devices. There is no single standardized method; all the manufacturers have their own proprietary methods, because the compression also contributes a lot to the success users have with a hearing aid — the benefit you can get out of it is very much related to the compression, especially when listening in quiet.
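Here is a minimal sketch of such a static compression characteristic for one band: linear gain below a knee point and a fixed compression ratio above it. The knee point, gain, and ratio are made-up illustration values, not any manufacturer's rule.

```python
import numpy as np

def band_gain_db(level_db, gain_db=20.0, knee_db=50.0, ratio=3.0):
    """Static input/output characteristic for one frequency band:
    constant (linear) gain below the knee point, compression with the
    given ratio above it. All parameters are illustrative."""
    level_db = np.asarray(level_db, dtype=float)
    out = level_db + gain_db                      # linear region
    above = level_db > knee_db
    # Above the knee, the output grows only 1/ratio dB per input dB:
    out[above] = knee_db + gain_db + (level_db[above] - knee_db) / ratio
    return out - level_db                         # the applied gain in dB

for lvl in (30, 50, 70, 90):
    print(f"input {lvl} dB SPL -> gain {band_gain_db([lvl])[0]:5.1f} dB")
```

Note how the gain shrinks (and eventually goes negative, i.e., limiting) as the input level rises, which is exactly the protection against the unchanged threshold of discomfort described earlier.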
A very nice development of the past ten years is hearing aids with so-called open fittings. In the old days you would have an earpiece that would more or less close the ear canal, and you needed that mostly for feedback suppression. Have a look again at this slide with the components of a hearing aid: here you have an amplifier and loudspeaker, which provides a gain of, say, 60 decibels, and here are your microphones, just millimeters away from this loudspeaker. Obviously you have a feedback problem, via the device itself and also via the acoustic path from the loudspeaker into the microphones. To avoid the acoustic feedback, hearing aids used to come with an earpiece that would more or less close the ear canal, so that no sound would leak back from the ear canal into the microphones; otherwise you would get these disturbing howling noises or beeps, which were very annoying to users, and also to people close to hearing aid users. Now, with the emergence of powerful feedback cancellation algorithms, manufacturers were able to get away from the earpiece and build open-fitted systems, where you just insert a tube with a little tip that holds it in the ear canal. This is still not possible for the most powerful devices, but at least for moderate hearing losses you can use moderate amplifications. That is of course a lot more comfortable to wear than the old ones with a closed or almost closed earpiece, and it also helps to improve the own-voice reproduction. When you listen to your own voice — an experiment you can easily make: close your ears with your fingers and listen to your own voice; it sounds different — when the ear canal is open, your own-voice reproduction is much more natural. But, as I said, it requires powerful feedback cancellation.

Okay, so here is a kind of — not really a signal processing model, that will come in a few seconds — sketch of an open-fitted device. What I want to show with the small experiment that we will listen to in a second is that open-fitted devices require a very short processing latency. Why is that? Well, mostly because the speech sounds will be picked up by the microphone of the device, amplified, and played into the ear canal; but since the ear canal is more or less open, the speech sounds will also reach the tympanic membrane directly. And as you have seen before, the typical hearing loss is a sloping hearing loss, where users still have fairly good residual hearing at low frequencies. So especially at low frequencies you will have a superposition of the direct acoustic sound and the sound processed by the hearing aid: twice the same signal, but with a delay, a latency, between the two. And once you have a delay, you get interference between the two sounds, a change in timbre, and other effects. I can try to reproduce this here: I have a signal where one speech sentence is on the left channel of the audio system and the same sentence, a bit delayed, on the right channel. If you listen carefully, you will hear that even for very small delays, like two or five or ten milliseconds, you perceive a change in the timbre of the sounds, and for large delays, of course, echoes, reverb, and all kinds of effects. First, with no delay: "Only the best players enjoy popularity. We like blue cheese, but Victor prefers Swiss cheese." Now with one millisecond of delay between the two. [The sentence is repeated.] Two milliseconds; five. [The sentence is repeated.] Now you can already hear that there are two signals, one a bit delayed; with the first ones it was more a change in timbre, and it also depends a bit on where you sit in the room. And now you can hear really reverberant effects and echoes: 100 milliseconds, and 500.

[Question from the audience.] Yes, you listen to the superposition of both, and you can clearly hear how harmful delays in the processing are when you listen to both the original and the delayed signal. That means you have to keep the processing delay, the latency, as short as possible. Typically you try to keep the latency below 10 milliseconds, because above 10 milliseconds you really start to get these reverberant impressions. So hearing aids are designed to provide a latency not larger than 10 milliseconds; you might still have some changes of timbre, depending on the situation.
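What you hear in that demo is essentially a comb filter. Here is a minimal sketch of the magnitude response when a signal is superposed with a delayed copy of itself; the delay and mixing gain are free parameters of the illustration.

```python
import numpy as np

def comb_response_db(f_hz, delay_ms, a=1.0):
    """|H(f)| in dB for y(t) = x(t) + a * x(t - tau), i.e.
    H(f) = 1 + a * exp(-j * 2*pi * f * tau)."""
    tau = delay_ms * 1e-3
    H = 1.0 + a * np.exp(-2j * np.pi * f_hz * tau)
    return 20 * np.log10(np.abs(H) + 1e-12)

f = np.array([100.0, 250.0, 500.0, 1000.0])
# With a 1 ms delay the first notch sits at 500 Hz
# (notches at odd multiples of 1/(2*tau)):
print(comb_response_db(f, delay_ms=1.0))
```

The deep spectral notches move down in frequency as the delay grows, which is why even a few milliseconds of processing delay audibly change the timbre at the well-heard low frequencies.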
It really is not the synchrony with for example lip movements Which would be also an aspect because you don't want to delay the process signal as So much that you are Asynchronous with lip movements, but that would only be for delays larger than 50 milliseconds and above so here really the constraint is much lower and given by the superposition effects that you need to take care of Okay, so here's the Signal processing model So here's the hearing aid and speech Enters the microphone and noise as well, of course and as the microphone, but also is added here on the acoustic side and That gives rise to then these superposition effects and change of timbre Okay feedback control. I also will not talk in much detail about feedback control There's also a lot of propriety algorithms around The the Task of the feedback control is to Yeah, get rid of the feet strong amplification or the feedback between the receiver and the microphone otherwise your hearing aid will squeal or howl and Conventional solution is to use notch filters to take out The signal at the howling or squealing frequency But this will also give speech distortions because you're also taking out the target signal the speech signal at these frequencies and It's better to use some adaptive filter and cancel out the feedback and By identifying the feedback path here from the loudspeaker to the microphone By means of this adaptive filter and subtracting the feedback here At the microphone input so that it is cancelled out This is an also not a trivial task because Your disturbing signal and your target signal are highly correlated. So if you're familiar with adaptive filters They work very nicely if your target signal and your disturbance is uncorrelated. So target signal in statistically independent Gaussian noise, then they converge very nicely And you are able to identify the unknown Feedback path, but here the feedback is essentially the same signal that is the acoustic signal that you want to listen to so this the disturbance and the target signal are highly correlated and so the standard algorithms to adapt such filters will fail and so you need some E-correlation measures to control these adaptive filters okay, so hearing aids and I slightly like to extend this also to Implantable devices simply to show the full range of possibilities. There are so-called bone anchored hearing aids so your Who is implanted into into into your skull behind your ear and then a vibrating device is attached to it and You're using essentially the effects of bone conduction and that's Typical Measure when you have a conductive Hearing loss. So here this how it looks like There are also middle ear implants where you're attached in Oscillating mass directly to the ossicles in the middle ear. So kind of shake the staples or another ossicle and in this way Provide an amplified sound to the inner ear as well and Finally, they're cochlear implants where you insert an electrode chain into the cochlear into the inner ear and and Provide a direct electric stimulation of the auditory nerve In the inner ear. That's also shown here again in a larger sketch Cochlear implant consisting of a voice processor, which is worn externally transmission coil Which transmits? 
Okay. From hearing aids I would like to extend the picture slightly to implantable devices, simply to show the full range of possibilities. There are so-called bone-anchored hearing aids: a screw is implanted into your skull behind the ear, and a vibrating device is attached to it; you are essentially using the effects of bone conduction, and that is a typical measure for a conductive hearing loss — here is how it looks. There are also middle-ear implants, where an oscillating mass is attached directly to the ossicles in the middle ear, to shake the stapes or another ossicle and in this way deliver an amplified sound to the inner ear. And finally there are cochlear implants, where an electrode array is inserted into the cochlea, into the inner ear, to provide direct electrical stimulation of the auditory nerve. That is also shown here in a larger sketch: a cochlear implant consists of a speech processor, which is worn externally, a transmission coil, which transmits the information to the implanted device, and then a wire and the chain of electrodes that is inserted by the surgeon into the cochlea. Due to this tonotopic, frequency-to-place organization of the inner ear, you can now stimulate different frequency ranges and thus provide some spectral information, which in fact enables completely deaf people to hear again, to understand speech, even to have telephone conversations. It is quite a remarkable invention — the first successful technical replacement of a human sense.

Here again is a diagram that shows how this electrode array is inserted into the cochlea. There are only three or four major companies which provide these cochlear implants, and the number of electrodes used is between eight and twenty. Of course, you need to compare this to the number of hair cells in your cochlea, which is in the range of several thousand. The hearing that you can establish with a cochlear implant is not comparable to normal hearing — it is a very coarse excitation of the auditory nerve — but you still get some frequency discrimination and can understand speech sounds. Again, for the electrical stimulation strategies, how the acoustic signal is translated into stimulation patterns for the electrodes, the manufacturers each have their own favorite method, and acoustic front-end processing like speech enhancement and noise reduction is only now moving into these devices. The research done by the companies in the past focused on what they call the coding strategy: how sounds are coded and translated into these electrical stimulation patterns. Now they are also integrating all the front-end processing that we know from hearing aids into their devices. Bimodal stimulation is also possible, that means electric and acoustic: if you have a good surgeon, he will insert the electrode array such that low-frequency residual hearing, if there is some, will be preserved, and that can be used to enhance the listening experience.
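The manufacturers' coding strategies are proprietary, but a generic, CIS-style sketch of the front end — a bandpass filter bank, rectification, and envelope smoothing per electrode channel — looks roughly like this. The band edges and filter orders are my own arbitrary choices, not any vendor's design.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000
# Made-up logarithmic band edges for 8 "electrode" channels:
edges = np.logspace(np.log10(200), np.log10(7000), 9)

def channel_envelopes(x):
    """CIS-style front end: bandpass -> rectify -> lowpass per channel.
    The smoothed envelopes would then modulate pulse trains on the
    tonotopically ordered electrodes."""
    env_lp = butter(2, 200, fs=fs, output='sos')      # envelope smoother
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfilt(bp, x)
        envs.append(sosfilt(env_lp, np.abs(band)))    # rectified + smoothed
    return np.array(envs)

x = np.random.randn(fs)             # stand-in for one second of speech
print(channel_envelopes(x).shape)   # (8, fs): one envelope per channel
```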
And finally, cost: a cochlear implant is quite expensive, an order of magnitude more than a hearing aid. For hearing aids, the cost to the end user, which also includes the fitting, adjustments, and service, is in the range of 600 euros for a relatively simple one up to, I would say, 2000 to 2500 euros for the high-end devices, and you typically use two of them — there is still a 3 dB gain there. Okay, the cochlear implant speech processor is designed in similar ways as a hearing aid, but you need a lot more power here, because the signal is transmitted through the skin to the implanted device, and that is a lossy process: you lose a bit of energy in the signal transmission from the outer processor to the implanted device, and therefore you need a lot more battery power for a cochlear implant. But the rest of the device, the speech processor and the microphones, is very similar to a hearing aid, and if it is a bimodal system, then you would also have a loudspeaker here.

Okay. So far that was the single hearing aid, and the big trend is to interconnect hearing aids, and to interconnect them also with smartphones or other devices capable of streaming audio data into your hearing aid. For quite some time we have already had what is called a wireless link: if you have a left-side and a right-side hearing aid, they will talk to each other over a wireless system. The older ones can transmit only a few bits per second, also because of power constraints; the newer ones are now aiming towards really streaming audio between the left and the right side. But then again you have the power constraints and the latency constraints, because you probably want to code, to quantize, your signal before you stream it over, in order to reduce the bit rate. So there are all kinds of nice technical questions attached to this, which are also related to communication engineering. What can be done now is at least an exchange of settings and parameters: if users, for example, change the volume on one side, it will be transmitted to the other side, so they do not need to also change the volume on the other side, and things like this. Also state of the art is that you can stream audio data from, for example, your smartphone into the hearing aid via a wireless relay: a device that translates a Bluetooth stream into the proprietary protocols used in the hearing aid. That allows streaming from the smartphone to the hearing aids. And there are now some systems already which allow you to stream directly from your smartphone via a low-power variant of Bluetooth; standard Bluetooth, even though its power consumption is low, is still too high. The goal, of course, is to have it fully bidirectional, so you can stream data both ways and even process it in the smartphone, but as I said, power and latency are the big issues.

Okay, that brings me to the first summary. I hope I have shown that hearing instruments are complex, time-varying, nonlinear signal processing devices. Many improvements have been achieved in the last 10 years, mostly the move to fully digital and open-fitted hearing aids with advanced feedback control; in the area of cochlear implants the introduction of multi-channel implants; and wireless connectivity is a major issue now. Yet there is room for improvement.
If you talk to users, they are quite happy now with listening in quiet: when they sit at home in the living room and talk to their partner or other people, they are usually quite happy with the performance. But there are several sources of dissatisfaction, including insufficient battery life — as you put more features into the hearing aid, similar to your smartphones or computers, even better digital technology and better battery technology are easily eaten up by the new features — and insufficient speech quality and speech intelligibility in noisy environments, which relates to the second part of my talk about speech enhancement. And possibly also the quality of music reproduction: lots of hearing aid users do not enjoy listening to music, or at least complain about the quality, for various reasons.

Okay, that is the end of the first part, and now I would invite you to ask questions; then we can have a quick break and some drinks.

[Question about device lifetime.] The lifetime of these devices depends on how long the user uses them, but typically it is, I would say, three to five years, and then it depends on the healthcare system in your country; typically you are eligible for a replacement after four or five years. The implanted systems have a longer life expectancy: the implanted part stays there for quite some time — I am not sure exactly how long, but 15 to 20 years — while the processor, the outside part, is updated in between, but it of course needs to remain compatible with the implant.

[Question about what the wireless link carries.] Parameters, let's say. It is not yet all working with streaming audio, but that will come; it will be there soon, and we will see that in the later parts. The main benefit is that you can use binaural algorithms to process your signals, and that will also be covered in the hands-on session later this afternoon. You can then use the two microphone signals, from the left and the right side, and provide more information for the processing within the hearing aid and for speech enhancement.

[Question about separate settings for music.] That is typically the case already: a different setting for listening to music, for example — you do not want the noise reduction processing on when you listen to music. So there are typically different settings, but there are still distortions that are not so easily compensated, for example from the compression algorithm and other effects. And of course, when you are hearing impaired, you are hearing impaired: the quality of what you hear also depends on possible spectral smearing effects in your cochlea, and that contributes to the degraded experience.

[Question about the feedback decorrelation.] No, unfortunately that is not part of this presentation, but there are some references I could give you. In order to make the adaptive filter converge and identify the true feedback path, you have to get rid of the correlation between the signals, and there are some ways to do it, for example using linear prediction techniques. It is not as easy as it looks at first glance.

[Question: what sort of settings?] For example, different programs for listening to speech at home in relatively quiet situations, or in very noisy situations, where you would turn on a beamformer or some noise reduction processing; and programs for listening to music. These are the main settings. Experience shows that users are — I would not say lazy, but they do not want to change these settings manually all the time.
So some hearing aids also have automatic environment classification algorithms inside, so that they can change these settings automatically. Usually you now have a remote control, which controls both devices. And yes: when you have a hearing loss on both sides, you should also have a hearing aid on both sides, in order to regain the localization of sounds.

I don't see further questions, and I have here, for demonstration purposes, a cochlear implant that I can pass around. It is a demonstration device, so you cannot use it; the main difference to the real one is that there are no electrodes on the implantable part. I can show it to you here: this is the implantable part, which is implanted into the skull behind the ear, and there are two electrodes. One is just a reference electrode, and the curly one here at the end in fact has no electrode contacts on it, because it is very expensive to manufacture these electrodes, but at least this curly end gives you an idea of how large your cochlea is, because that is the part that is inserted into your cochlea. And then there is the outer speech processor, which looks like a hearing aid but has this transmission coil, which is attached with a magnet to the implant through the skin and transmits, through the skin, the excitation signals for the electrodes. There is also a remote control that comes with it. I will pass this around, and then we have maybe two or three minutes for questions.

[Question about standards.] I mean, there are of course standards on medical devices and the safety of medical devices, so there are some regulations around all aspects of hearing aids — some of them very general, some specifically for hearing aids and their performance. But there is no standard algorithm. It is not like mobile communications, where you have speech codecs and you need to standardize them in order to spread them all over the world and allow people to communicate across devices of different manufacturers. So essentially, up to this point there has been no real need to standardize algorithms for these devices, because they do not need to be interoperable between different manufacturers, and I am not aware of any effort to create a standardized hearing aid.

[Question about radiation safety.] Yes, there is some experience from the cell phone community, and there are limits for cell phones regarding the radiation and the heat that is generated in the tissue by the radiation. The power levels used in hearing aids are much, much lower, so as far as I would expect, they can be considered safe. Also, you would not have the wireless connection on all the time, simply for power reasons; you would only use, for example, the binaural streaming facility when you are in a difficult listening situation and need binaural processing, using all the microphones, to get the best possible result.

Okay, so let's continue. The next slide I picked is a typical block diagram of the signal processing functions in hearing aids. Typically you have two microphones; these are omnidirectional microphones, because this allows you to combine them into a directional microphone.
How that is done I will show you a bit later. Then you have here the feedback cancellation that we already talked about. Most of the processing is done in frequency bands: both microphone signals are decomposed into several frequency bands, and how many depends on the device and the manufacturer — it could be four, eight, or sixteen in the older devices, but now we are talking more like a hundred channels. Then there is directional processing, which combines the two microphones of the device into a beamformer. There is possibly an audio mixer, which allows you to integrate an audio stream from a wireless interface into the pathway. There is noise reduction, amplification and compression as explained earlier, and finally, at the end, you need a synthesis filter bank, especially if you do a downsampling after the analysis filter bank in each of the frequency bands. And, as briefly mentioned, in many devices you now have an automatic classification and control system that switches between programs.

Well, I would like to talk now about the first part here, the spectral analysis and synthesis. That is also related to the hands-on exercise later this afternoon, and to the issue of latency: you cannot just use the standard filter bank; you have to optimize it for this specific purpose in terms of latency and other aspects. In the old days, in the analog hearing instruments, you had only a very small number of frequency bands, because you had to build them with operational amplifiers and discrete components, and there was not a lot of space to do that. But as we have seen, the hearing loss often shows a strong dependency on frequency, and we also know that some enhancement schemes perform better when a larger number of frequency bands is available. From the technical point of view there are of course a lot of benefits, which I do not need to explain in much detail, because Janis talked about the properties of speech signals quite a bit yesterday. Essentially, you can resolve formants and spectral harmonics; you get some decorrelation of the spectral coefficients when you transform the signal from the time domain into the frequency domain; you can exploit signal sparsity effects — especially competing talkers will not show up at the same time in the same frequency bands; you can include psychoacoustic processing models; and finally, processing in the frequency domain, especially if you downsample your signals after the analysis filter bank, leads to efficient implementations.

A very brief reminder of what typical speech spectra look like. Here is the time-domain signal on the left-hand side; on the right-hand side, in blue, is the spectrum from the short-time Fourier transform. You can clearly see the harmonics of the voiced sound, and I have also indicated what we call the envelope, the spectral shape, here represented by an LPC spectrum of order six. The nice feature of speech is that the envelope can be modeled by a low-order filter. The same for a consonant: note the noisy appearance; the spectrum now looks quite different, with an emphasis on high frequencies, and again the envelope is indicated here in red.
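Since the talk mentions an LPC spectrum of order six as the envelope model, here is a sketch that estimates such an envelope via the autocorrelation method and the Levinson-Durbin recursion — a minimal implementation on a synthetic voiced frame, not production code.

```python
import numpy as np

def lpc(x, order):
    """LPC via the autocorrelation method and Levinson-Durbin."""
    r = np.correlate(x, x, mode='full')[len(x) - 1 : len(x) + order]
    a, err = np.array([1.0]), r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1 : 0 : -1]) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]          # Levinson-Durbin order update
        err *= 1.0 - k * k
    return a, err

# Stand-in for a windowed voiced frame (a real one would come from speech):
fs, n = 16000, 512
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * 150 * (m + 1) * t) / (m + 1) for m in range(20))
x *= np.hanning(n)

a, err = lpc(x, order=6)                           # order six, as on the slide
H = np.sqrt(err) / np.abs(np.fft.rfft(a, 2048))    # smooth spectral envelope
print("envelope peak near", np.argmax(H) * fs / 2048, "Hz")
```

The all-pole spectrum 1/|A(f)| rides over the harmonic fine structure, which is exactly the red envelope curve on the slide.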
So what is the purpose of going through this spectral analysis and synthesis? I have given some reasons before, but in terms of enhancement algorithms we are also looking for optimal performance. For example, we would like a transformation that gives us a high compaction of the signal energy into few transform coefficients, so that all the energy is concentrated in a few frequency bands and not spread out, smeared out, over a lot of bands or coefficients. In this way we are able to separate the target signal from the noisy signal. For speech, as you can see here again in the spectrum, that requires quite a high spectral resolution, because we want to resolve the harmonics of the speech signal, for example in order to pick out the noise between the speech harmonics; that also gives us a good separation of speech and noise. But at the same time we would also like a high temporal resolution, especially for the transient sounds in speech, because they are also very important for speech intelligibility. These are obviously conflicting requirements: you cannot easily get a high frequency resolution and a high temporal resolution at the same time.

We would also like a high stopband attenuation of our filter bank, especially on the synthesis side. Why on the synthesis side? Because before the synthesis we insert these huge gains of 60 or 70 dB in order to compensate for the hearing loss, and if you amplify specific bands by a very high gain, then all the aliasing and imaging artifacts of your filter bank are also amplified by the same gain. So for the reconstruction of your signal, the synthesis filter bank needs filters with a high stopband attenuation; otherwise you get all kinds of distortions in your system. We would also like perfect or near-perfect reconstruction, which means that whatever signal we put in, if we do not modify it, comes out without any distortion — also a nice property to have. And, as I have mentioned several times already, we would like a low algorithmic delay — otherwise this whole system will not function as we would like — and a high computational efficiency. That is a lot of requirements, as you can see, and you have to make some trade-offs, strike a balance in a way.

So what methods are available? Well, the workhorse of digital signal processing is the discrete Fourier transform, or the fast Fourier transform, and that is related to uniform filter banks — uniform meaning a uniform spacing of the different band-pass filters. That is a highly scalable and non-parametric approach of transforming a signal, which can give us high resolution and perfect reconstruction, and it is also highly efficient — so a very good choice.
Then there are non-uniform filter banks, for example the gammatone filter bank and other designs, which mimic the resolution of the auditory system: for example, narrower filters at low frequencies and wider filters at higher frequencies. Typically they do not give you exact perfect reconstruction, but they can be designed to be near-perfect, with relatively low distortion. Then there are optimal, signal-dependent or signal-adaptive approaches, like eigenvalue decompositions and subspace approaches. In a way they give you the optimal decomposition, also in the sense of optimal compaction of the signal energy into only a few coefficients that you can then manipulate, but they are computationally expensive and currently not feasible for hearing aids. Then there are some low-delay designs — I would like to mention here the so-called low-delay filter-bank equalizer, published by Löllmann and Vary several years ago — and of course there are also the low-order parametric models: linear prediction coefficients and mel-frequency cepstral coefficients.

What I want to do is show you a design that we developed some years ago, which also gives you a lot of flexibility and leads to nice trade-offs between computational efficiency, latency, frequency resolution, and so on. It is all based on what we call the overlap-add analysis-synthesis system, which you might have seen before. Here are the samples of the input signal x(k), running from the left to the right. We take out M signal samples and apply a window function w_A to these M samples; then we perform a DFT, or fast Fourier transform; then we can do some kind of processing on these coefficients and go back to the time domain using an IDFT and a synthesis window. The time-domain samples generated here are multiplied by the synthesis window w_S, which gives us M samples again, and these are overlap-added with the previously processed frames. Here is the frame at time λ−2, here λ−1, here λ, and there is a shift of R samples between these signal frames or signal segments. So step by step we can process the whole signal, always taking out M samples, multiplying with the window function, computing the DFT, and going back through the whole process.

With such a system you can achieve perfect reconstruction if the following condition is met: if you multiply the analysis window with the synthesis window, and overlap-add this product with the same shift pattern with which you take the signal samples out of your signal, then the summation over all these multiplied and shifted window functions should add up to one:

    Σ_λ w_A(k − λR) · w_S(k − λR) = 1   for all k.

If this succession of windows adds up to one, then you have met the constant overlap-add constraint, which means you will have no distortions in the analysis-synthesis process, provided, of course, that you did not modify the signal in between. Okay, so that is one of the conditions that we will also need in the hands-on exercise, to see whether our windows are really doing the right thing: we multiply them, then we add them together, and they should add up to one.
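Here is a sketch of exactly this check for the square-root Hann pair with half overlap (R = M/2); the window product, overlap-added over the shifted frames, should be constant. This is my illustration of the condition above, not code from the exercise.

```python
import numpy as np

M = 512                      # frame length
R = M // 2                   # frame shift (half overlap)
# Periodic Hann window; square-root Hann for analysis and synthesis:
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)
w_a = np.sqrt(hann)
w_s = np.sqrt(hann)

# Constant overlap-add check:
# sum_lambda w_a(k - lambda*R) * w_s(k - lambda*R) == 1
prod = w_a * w_s
cola = np.zeros(M + 10 * R)
for lam in range(11):
    cola[lam * R : lam * R + M] += prod
# Ignore the ramp-up/ramp-down at the edges of the test segment:
print(np.allclose(cola[M : -M], 1.0))   # True -> perfect reconstruction
```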
So here is one and then we shift our point of reference by our samples and then we Extract the next segment of signals and apply this window here and then we shift again and we shift again So we cover the full signal with this succession of Windows and a typical window which works well is for example the han window and It's known that if you half overlap this han window. It gives you this perfect reconstruction Now I said we also want to use a window on the synthesis side for various reasons So we need to make sure that the product of the analysis and the synthesis window Gives you the perfect reconstruction one way to do it is to use here the square root of a han window For the analysis side and the square root of the han window For the synthesis side now when you multiply them you get the han window as a total window And if you overlap them by half their length then you get perfect reconstruction This is shown here in the analysis and the synthesis part So that's an easy way to create such a perfect reconstruction system Just take the han a square root han window for the analysis and the square root han window for the synthesis now this Question I could ask now or later is but we might want to think about it already is The delay or the latency that you get with such a system how large is that but let me continue a bit And then I will ask this question again so the delay of such a system is Essentially governed by the length of these windows and if these windows are long Because you want to have a high frequency resolution For a high frequency resolution you need to put a lot of data into your spectral analysis in order to get the high resolution That requires a long analysis window then the delay might be too long for all purposes So the idea is to modify the system now in order to achieve a low delay or low latency spectral analysis system and possibly also with an adaptive resolution and the key idea is to use an non-symmetric analysis windows and Relatively short windows for synthesis. So how does this work? Well, there's one more line latency is I Okay, here's the answer to your to the question already So the latency that you then get is Identical to the length of the synthesis window and I will explain later why this is a case So here on this graph if you just focus on the figure here first on the left-hand side You see the traditional way to do it the way I explained it using the square root hun window in blue for the analysis and in red the form For synthesis the synthesis wins and they are both the same and if you multiply them You get the hun window and if you overlap them by half you get perfect construction So our design now looks a bit different here on the right-hand side. 
We are still using some type of window for the synthesis part — it looks maybe similar to this one, but a bit compressed — and now we are using this long, non-symmetric window for the analysis part. The key idea is to use in the synthesis only the first part of whatever is analyzed by this long window. By using this long window for the analysis we get a high frequency resolution, because we are putting a lot of data into the analysis; and by reconstructing only the first part of this analysis frame in the synthesis process — by applying this short, dashed red synthesis window — we keep the delay short, because we are just reconstructing a short portion of what we analyzed before. This allows us to control the trade-off between delay, computational efficiency, and spectral properties quite easily, because we can make the synthesis window essentially arbitrarily short — in principle, at least, you can reduce it to a single tap, which would lead to a very unsymmetric analysis window. And you can easily see, or imagine, that by making the synthesis window wider, you go continuously back to the situation where both windows are the same, the analysis window becoming less unsymmetric. [In response to a question:] Yeah, I would like to explain that — maybe we wait for two more slides, where I have a good illustration for it.

Okay, so the idea is to use an unsymmetric, long analysis window and a relatively short synthesis window. And then the next idea is that you can make this adaptive — adaptive to the type of speech sound that you have. For example, when you are analyzing a vowel, you would like to have this high frequency resolution, because you really want to see the harmonics of the vowel in order to differentiate between speech and noise; so for analyzing a vowel you would use an analysis window like this one, and this goes along with the synthesis window, which is relatively short, because we want to have the short delay. But in a situation where you would like to have a high temporal resolution, you could use a short analysis window, and then also a short synthesis window. One of the tricks here is that with this design you can switch between these windows, and you can create arbitrary other pairs of windows, at any time, without losing the perfect reconstruction property. Why is that the case? The trick is that the right-hand slope of this window function is always the same; it could be, for example, the right-hand half of a square-root Hann window. The right-hand side is always the same, here and here and here, so when you multiply the windows you always get a Hann window on the right-hand side, and by design you also make sure that when you multiply the analysis and the synthesis window you get a Hann window on the left-hand side.
So if you multiply these, and these are standard square-root Hann windows for analysis and synthesis, it is absolutely clear that you get a Hann window as the product. Here you might have noticed that this shape is a bit distorted, and this is because it is multiplied with this window here: there is also a bit of a slope in this part of your analysis window, and that is compensated in the shape of your synthesis window. So if you multiply these two windows, you also get a Hann window of length 2M. Now you can switch between these pairs of windows arbitrarily, depending on what sound you are analyzing, and always achieve perfect reconstruction.

Okay, here is another depiction of the process: the long window, then we switch to the short window and stay, for example, with the short windows; in the synthesis process we also need to switch the synthesis window, but we always use a short one. So now back to the question: why is the delay identical to the length of the synthesis window? Well, in order to be able to process the next section of your input signal you need to collect R samples. You have processed this frame here, the last long frame, once you have collected all your input data up to this point. Then, in order to be able to process this next short frame, you need another R samples; you have to wait for R samples, and that's the frame shift, the frame advance. At this point here you can process the data.

No, the long window just reaches into the past, and the long windows are also shifted by just R samples; they are always shifted by R samples. So you only need to wait for the next R samples in order to be able to process the next frame of the signal. Yes, the overlap is bigger on the analysis side, but you are only using old information, past samples, which you already have, so you don't need to wait on the analysis side. Even when you are using a long window, you always need only R fresh signal samples, and then you can process the next frame, and the next.

So that is the analysis side: on the analysis side we need R samples, so we get a delay of R samples there. On the synthesis side we have only short windows, regardless of what analysis window we are using. These windows, whenever we have processed one, overlap the previous windows by their length minus R. So whenever we have processed, for example, this segment here, which we can process at this point, we have this information, but it overlaps with, or needs to be combined with, the information of the previous signal frame. So we can start streaming the samples out at this point here: we could start processing here, and we can then stream out samples at this point, because they will have been overlap-added. In total this amounts to just one length of the synthesis filter. So the trick is really to always use a short synthesis filter, and by making the synthesis filter shorter you can make this delay, this latency, arbitrarily short.

Yes, there are clearly disadvantages, for example the stop-band attenuation; so there is this kind of trade-off.
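To put numbers on this, here is a small worked example with values chosen purely for illustration (a 16 kHz sampling rate is assumed; these are not figures from the talk). In the symmetric design the latency equals the analysis window length, whereas in the low-delay design it equals the synthesis window length:

$$
\frac{N}{f_s} = \frac{512}{16\,000\ \mathrm{Hz}} = 32\ \mathrm{ms}
\qquad\text{versus}\qquad
\frac{L_s}{f_s} = \frac{128}{16\,000\ \mathrm{Hz}} = 8\ \mathrm{ms},
$$

while the analysis length, and thus the frequency resolution, stays at N = 512 samples in both cases.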
Yes, you can only make it very short if you do not downsample much in between. If you want a high downsampling ratio between your time-domain signal and your frequency bands, then you also need a decent synthesis filter, otherwise you get too many distortions. But nevertheless, this is an easy way to achieve perfect reconstruction, and as I said, you can start with the square-root Hann windows and then design families of window pairs, designed for specific speech sounds, so that you get high spectral resolution when you need it, for vowels for example, and high temporal resolution when you need it, for transient sounds. If you really think it through to the end, you would need some classification. I have some more slides on this, but we simply look for transient behavior in the signal, for non-stationarity, and we switch whenever we see something non-stationary; when we are in a stationary vowel, we just use the long window. In the system we implemented we used only two types of windows, to not make it overly complicated, but in principle you could use four, five, six different versions.

Well, in this case you could use, for example, a pruned DFT, because whenever you use the short windows you have a lot of zero samples coming from your window. We have not optimized it to the last bit, because there are lots of variations of FFT algorithms which you could use to optimize the performance, or the efficiency, but one good approach here is certainly a pruned FFT algorithm, which avoids processing the zero samples. As for whether those optimizations can be used with any type of window: I'm not sure, and I'm afraid to say probably not. It's actually a very interesting problem, how the FFT can be adapted to this type of window; you would probably need to compensate somehow.

Okay, so what can be gained by such a scheme? Here is a very theoretical, best-case experiment where we say: we are looking for transient signals. We have some noise in this part of the signal, and suddenly there is a burst of some signal; it's really just a constructed, synthetic signal, so here is the noise, and here is the noise plus some burst signal. Then we analyze the signal with either a long window or a short window, shown here either as rectangular windows or as our specifically designed tapered windows. The rectangular window serves more as a reference, a best-case limit, because a rectangular window would capture all the energy in this burst right away. What you get is this gain function, with the SNR on the x-axis, ranging from -15 to 30 dB, and on the y-axis the power gain that you get by using a shorter window to analyze this transient burst, shown in relation to the length of the long window. The solid lines are for the rectangular windows, which we can ignore for now, and the dashed lines are for the tapered windows that we just designed. You can see that at a certain SNR, say 10 dB, and a certain ratio of the long and the short window lengths,
for example a short window of 64 taps against a long window of 512 taps, you can gain up to 6 dB more power; you could say you can concentrate up to 6 dB more power into your spectral bins. This is, as I said, a best case with a constructed signal; realistically you could possibly get half of it, maybe two or three dB of power concentration, by switching to shorter windows whenever you have a transient, instead of just continuing with the long analysis window and smearing out all the information of the transient.

Then, as I said, we designed a maximum-likelihood detector. I don't want to go into detail here, but essentially it allows us to detect transient signals and non-stationarities in the signal. The typical diagram looks like this: this is the detector performance, where these lines indicate the detection threshold that you can pre-select, and above them, in this area, is what you can then detect as a transient. Just to give you an idea of what this means: for high SNRs, and here again we have the SNR as a parameter, you can detect very small steps in the power. On the vertical axis we see the power step that comes along with a transient, or this non-stationarity. At high SNRs it is of course very easy to detect even a small increase in power, so this region reaches down to 10 dB or 5 dB, while if you go to negative input SNRs it becomes very hard, in fact impossible, to detect very small changes in the power. But if the power change is relatively large, then we are still able to detect it.

The dotted, grey-shaded blobs and points here are taken from the TIMIT database. These are the /t/ sounds, the /t/ phonemes, plotted into this diagram in terms of their signal-to-noise ratio (a 10 dB segmental signal-to-noise ratio experiment) and the kind of power change that we saw. You can see on the right-hand side the /t/ sounds that could be detected with this detector; they are here in this area. That's not all of them; there are also some /t/ sounds which cannot be detected because they are too weak, their power change is not strong enough. But most of them can be detected.

No, we make the assumption that the signal is Gaussian, we do some statistics on the signal, and from that we construct a maximum-likelihood detector by comparing the power before and after, or for short and long windows, and then make a decision whether the signal is still stationary or is non-stationary. The effect, hopefully, you can see here in a spectrogram: the upper graph is a spectrogram using the long analysis window only, and the lower graph is a spectrogram using the adaptive window-switching technique, where we have also indicated the points where the system detected transients and used the short window. So here, in this range, the long window was used, and here the short window. You can clearly see that the transients now stick out more prominently than when just using the long analysis window.
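Since the maximum-likelihood detector itself is not spelled out here, the following is only a simplified stand-in to illustrate the general idea of detecting a power step against a smoothed long-term power estimate; the threshold, parameter values, and all names are my own choices, and this is not the detector from the lecture:

```python
import numpy as np

def detect_transients(x, R=64, alpha=0.95, threshold_db=6.0):
    """Flag frames whose short-term power jumps above a slowly smoothed
    long-term power estimate. Simplified power-ratio detector only."""
    n_frames = len(x) // R
    flags = np.zeros(n_frames, dtype=bool)
    p_long = None
    for i in range(n_frames):
        frame = x[i * R:(i + 1) * R]
        p_short = np.mean(frame ** 2) + 1e-12      # short-term power
        if p_long is None:
            p_long = p_short                       # initialize on first frame
        # Flag a transient when the power step exceeds the threshold
        if 10 * np.log10(p_short / p_long) > threshold_db:
            flags[i] = True
        # Recursive smoothing of the long-term power estimate
        p_long = alpha * p_long + (1 - alpha) * p_short
    return flags

# Synthetic test: noise with a short burst, as in the best-case experiment
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
x[8000:8256] += 4 * rng.standard_normal(256)       # burst: clear power step
print(np.nonzero(detect_transients(x))[0])         # burst frames (around 125) flagged
```

In the real system such a decision would drive the window switching: a flagged frame selects the short analysis window, otherwise the long one is used.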
So this gives you some improvement.

Okay, once more the window design example, because we will need it in the hands-on, but I will also explain it again there. If you are just looking at one pair of windows, and you want to use a long analysis window while keeping the latency short, then you can design it such that the right edge here is taken from a short square-root Hann window, and the left side from a long square-root Hann window, and you simply glue them together at this point here. That's the easiest way to construct these windows: take a short square-root Hann window for the right-hand side and a long square-root Hann window for the left-hand side, glue them together, and then you have your long analysis window.

For the synthesis window, which should always be short, because the delay is governed by the length of the synthesis window, you use the right half of the short square-root Hann window; exactly the same part that you used for the long window, you use here on the right-hand side again, because then the product will simply give you a Hann window. For the left-hand side you use a Hann window which will then be multiplied by this part of your analysis window, so you need to correct for the distortion that you get here, and that is why it looks a bit distorted here as well. When you multiply this part of your analysis window with this part of the synthesis window, you also get a Hann window on the left-hand side, and the rest is multiplied by zeros, so that doesn't matter anyhow. Okay, so that's the design rule for one pair of these windows, which already gives you decent performance; but if you really want to use it in a hearing aid, you need to optimize numerically in the end, in order to really get the stop-band performance and all that.
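Here is a minimal NumPy sketch of this glueing rule as I understand it from the description; the concrete lengths, the half-overlap frame shift, and all variable names are my assumptions, and these are the un-optimized windows, not the numerically optimized ones just mentioned:

```python
import numpy as np

def sqrt_hann(L):
    n = np.arange(L)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / L))

N  = 512          # long analysis window (high frequency resolution)
Ls = 128          # short synthesis window; also the latency in samples
R  = Ls // 2      # frame shift: synthesis windows half-overlap

# Analysis window: long sqrt-Hann rising slope on the left,
# short sqrt-Hann falling slope on the right, glued together.
w_a = np.concatenate([
    sqrt_hann(2 * (N - R))[:N - R],   # left part, length N - R
    sqrt_hann(2 * R)[R:],             # right edge, length R
])

# Synthesis window: the same short falling slope on the right; on the
# left, a Hann half divided by the analysis slope it will be multiplied
# with, so that the product w_a * w_s becomes a Hann window of length Ls.
hann_Ls = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(Ls) / Ls)
w_s = np.concatenate([
    hann_Ls[:R] / w_a[N - Ls:N - R],  # compensated left half
    sqrt_hann(2 * R)[R:],             # right half, identical to w_a's edge
])

# Check: the product over the last Ls samples is a Hann window, and
# half-overlapped Hann windows sum to one, giving perfect reconstruction.
prod = w_a[N - Ls:] * w_s
print(np.allclose(prod, hann_Ls))              # True
print(np.allclose(prod[:R] + prod[R:], 1.0))   # True
```

Note that the same right-hand slope appears in both windows, which is exactly the property that allows switching between window pairs without losing perfect reconstruction.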
Okay, I have a few more slides on some very basic stuff, namely power spectrum estimation, because we will also need that in the hands-on later on. I'm pretty sure you know all of this, so I can go through these slides fairly quickly, but of course you can stop me anytime if you have questions. Once we are in the spectral domain, in order to control the enhancement and noise reduction algorithms, you typically need a power spectrum estimate of the incoming signal. For deterministic signals, where x(k) is the signal and X(e^jΩ) is its Fourier transform, the power spectrum is just the magnitude squared of the Fourier transform; that's well known. For stochastic signals, the power spectral density is defined as the Fourier transform of the autocorrelation function.

That's probably also well known: here is the autocorrelation function, written as the statistical expectation of x(k) multiplied with a shifted version of x(k), and its Fourier transform is then the power spectrum. Obviously, then, an estimate of the power spectrum can be obtained by first estimating the autocorrelation function and then applying the Fourier transform. While this type of autocorrelation estimate is unbiased and consistent (consistent meaning that when you use more data, it converges to the true value), after the Fourier transform the power spectrum estimate is unfortunately not unbiased anymore.

There are different methods, and a good way to start, because after our analysis filter bank we are already in the spectral domain, is to use the Fourier coefficients directly and take the magnitude squared of the Fourier coefficients. That is called the periodogram, because it also gives you an indication of the periodic components in your signal. You simply take the output of your DFT analysis stage and take the magnitude squared; you can normalize it, but that's just a normalization. So the periodogram is computed by just taking the magnitude squared of the DFT coefficients. Now, if you use a tapered analysis window, which we always do because we want better spectral properties on the analysis side, then we have a so-called modified periodogram, because now the signal is multiplied with the window function before we compute the Fourier transform and take the magnitude squared; and in order to give the window unity power, you also normalize by the energy of the window function.

Okay, so this is the periodogram, and its properties are essentially derived by taking the statistical expectation of the periodogram. That is just another way of writing the periodogram, which is not so important here; the important point is that if you take the statistical expectation of the periodogram, you find that it is the Fourier transform of the autocorrelation function, as it should be, but of an autocorrelation function that is tapered before the Fourier transform, and that means you get some bias in your periodogram. Because of these finite limits of summation (whenever you use the DFT you have only a finite set of data), the periodogram is a biased estimator of the power spectrum. The periodogram is asymptotically unbiased if you let M, the DFT length, go to infinity. Unfortunately, the variance (any estimator also has a variance) does not approach zero when you use long chunks of data, so the periodogram is not a consistent estimator of your power spectrum. Whenever you put more data into your DFT, by making your window longer, the additional information is used to increase the frequency resolution: you get more frequency bins out of it. If you put M signal samples into your DFT, you get M frequency bins out. So all the additional information in your data is used to increase the frequency resolution when you increase the window length, but not to reduce the variance.
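As a small illustration of the modified periodogram and of this inconsistency, here is a sketch (my own construction; the normalization follows the window-energy rule described above). For unit-power white noise the true spectrum is flat at one, and the per-bin standard deviation stays roughly equal to the mean no matter how large M gets:

```python
import numpy as np

def modified_periodogram(frame, window):
    # Magnitude-squared DFT of the windowed frame, normalized by the
    # window energy so the taper does not bias the overall power level.
    X = np.fft.rfft(frame * window)
    return np.abs(X) ** 2 / np.sum(window ** 2)

rng = np.random.default_rng(1)

# Doubling M adds frequency bins, but the per-bin relative variance
# stays the same: the periodogram is not a consistent estimator.
for M in (256, 1024, 4096):
    P = modified_periodogram(rng.standard_normal(M), np.hanning(M))
    inner = P[1:-1]                        # skip the DC and Nyquist bins
    print(M, inner.mean(), inner.std())    # std stays close to the mean for all M
```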
So whenever we use the DFT coefficients as a power spectrum estimate, we need to do some smoothing; that is the message here. You cannot, or should not, use the periodogram directly as a power estimate, because it has a high variance, and it does not help to increase the DFT length; you should do some smoothing in order to achieve a low-variance power spectral estimate.

Okay, so consistent estimates of the power spectrum require some form of smoothing. The most obvious way is simply to collect several successive frames of your signal and average them; that is non-recursive smoothing, where you simply sum several successive periodograms. But that is inefficient for most algorithms, because you need to store all these past periodograms in order to average them; you need a lot of memory, and you need to do this summation, and so on. A better way is to do it recursively, as a first-order recursive process, and this is again an equation that we will need in the afternoon. Here we simply use a first-order recursive system with a parameter alpha to generate a running estimate of our power spectrum: you take the input periodogram of the new frame, the latest frame, multiply it by 1 minus alpha, then take the last estimate you had, multiplied by alpha, and add the two, and this gives you the new estimate for the new frame at time lambda plus 1. (The other index, by the way, is the frequency bin index; lambda is the frame index.) So this is a very easy way to compute the power spectral density of a signal x when you have the DFT coefficients available: simply take the magnitude squared, and with alpha and 1 minus alpha you can control the duration, the effective range, of the smoothing. In the hands-on we will need to compute three power spectral densities in order to construct our enhancement filter, and we will use this equation, but we will not do the normalization, because the normalization cancels out later in the processing; so you don't necessarily need to do it in the algorithm.

Okay, that was the second part of the first part, and I guess you are all hungry and ready to go, but nevertheless, if you have one or two questions, we could take them now. In a way, they are both low-pass filters; one is a non-recursive and one is a recursive low-pass filter. This lower one corresponds to a moving average with an exponential weighting. Yes, this recursive form, this exponential weighting, is in a way beneficial, because it gives more weight to the more recent data and less weight to the older data; so especially for non-stationary signals the recursive form is preferable. It is also preferable because you only need to store this one value. The non-recursive summation you could think of as a rectangular window, while the recursive form amounts to a moving average with an exponential window.
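Finally, a minimal sketch of this first-order recursive smoothing, as we will use it in the hands-on; the frame loop, the choice alpha = 0.85, and all names are my own, and the window-energy normalization is omitted since, as said, it cancels later:

```python
import numpy as np

def smoothed_psd(frames, window, alpha=0.85):
    """First-order recursive PSD estimate, per frequency bin k:
       P(lambda+1, k) = alpha * P(lambda, k) + (1 - alpha) * |X(lambda+1, k)|^2
    frames: iterable of time-domain frames, each len(window) samples long."""
    P = None
    for frame in frames:
        periodogram = np.abs(np.fft.rfft(frame * window)) ** 2
        if P is None:
            P = periodogram                        # initialize with first frame
        else:
            P = alpha * P + (1 - alpha) * periodogram
    return P

# Usage sketch: half-overlapped frames of a noisy signal
M, R = 512, 256
x = np.random.randn(16000)
frames = (x[s:s + M] for s in range(0, len(x) - M + 1, R))
P = smoothed_psd(frames, np.hanning(M))
```

A larger alpha gives a longer effective smoothing range, roughly 1/(1 - alpha) frames; only the current estimate P has to be stored, which is the memory advantage over the non-recursive average.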