The next part I want to talk about is audio, or auditory, localization. How do we know where sounds are coming from? Well, let us compare to the vision case. How do we know where light is coming from? I can just turn my head and look. If someone is talking to me, I look at them and I can see exactly where their eyes are. So I have a lot of information I can use: how my head is oriented, which our bodies are keeping track of; how the eyes are oriented; and where the image appears on the retina, which is usually on the fovea because I am looking at the object. With all of this information, I know where the light from a particular stimulus is coming from. If I am reading the clock on the wall, I know exactly where it is relative to me in terms of angles. I might not know the distance, but we have depth perception, and we talked about that: if I know the scale of something, then based on the size of the image on the retina I can estimate the distance. You could also do psychophysics studies to figure out how good you are at estimating distance based on retinal image size. So we can do these kinds of things. Now, how do we know where sound is coming from using only our ears? If you close your eyes, can you determine the source of the audio? Every morning when I wake up I hear the loud Asian koel outside, on the campus here, a very loud bird, and I feel like I can narrow down where that bird is to within a degree or two. Do you feel like that? Can you usually find the bird in the tree? How are we able to do that? Auditory localization is a very important part of perception. If we are going to make a virtual reality system that produces virtual sounds, and in the real world we have the ability to localize, to figure out where sounds are coming from, then we had better not mess that up.
We had better not fail when we do it in virtual reality, and that is why understanding this is going to be very important. So, where is the sound coming from? We can generally think of three coordinates for that; let me draw a kind of coordinate system here. Suppose this is the location of the ear, and the source is at some distance d from the origin. Projecting down into the horizontal plane gives an angle theta, and the angle with respect to that plane, measured upward, I will call phi. So there are three components. (1) The direction in the horizontal plane, which is called azimuth and which I have represented as theta: some direction from 0 to 2 pi telling where the sound is coming from. This is related to yaw in the coordinate systems we have been using for head transformations. (2) The vertical direction: how high or low is the sound? This is called elevation, and I represented it with phi. (3) The distance, which we have represented with d. So these are just spherical coordinates, which end up being convenient for the way the ears are arranged and for the type of information we get and are able to infer when resolving where sounds come from. I am going to give an example of a just noticeable difference with regard to auditory localization: it is called the minimum audible angle, or MAA. This would be exactly as I said: trying to localize where the bird sound is coming from, in terms of the azimuth. So we have the head, and let us suppose I draw a kind of nose here, so we are looking top down, and I want to understand the smallest angular change delta theta that can be detected.
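The three coordinates above can be sketched in a few lines of code. This is my own small illustration, not from the lecture, and the axis convention (x forward, y to the left, z up, head at the origin) is an assumption:

```python
import math

def source_coordinates(x, y, z):
    """Convert a sound-source position (x, y, z) relative to the head
    into the three localization coordinates from the lecture:
    azimuth theta (horizontal-plane direction, like yaw),
    elevation phi (angle above the horizontal plane), and distance d.
    Assumed convention: x forward, y left, z up, head at the origin."""
    d = math.sqrt(x * x + y * y + z * z)   # distance
    theta = math.atan2(y, x)               # azimuth in the horizontal plane
    phi = math.asin(z / d)                 # elevation above the plane
    return theta, phi, d

# A source 1 m ahead and 1 m to the left, at ear height:
theta, phi, d = source_coordinates(1.0, 1.0, 0.0)
print(math.degrees(theta), math.degrees(phi), d)  # 45 degrees azimuth, 0 elevation
```

This is just the usual Cartesian-to-spherical conversion; the point is that azimuth, elevation, and distance match the way the ears deliver localization information.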
So, you make a very small change and ask people to tell whether or not the source has in fact moved. One interesting finding is that when the source is closer to the front we are much better at this; when it gets to the sides we are not as good, and that is one thing to pay attention to. When the source is straight ahead, the MAA is around one degree if the stimulus is below 1000 hertz; off to the side it is around 5 degrees. There are exact plots in the book that contain much more information; I just want to give you the general idea and point out that, just based on the geometry of the ears and the way the senses developed, there is no simple answer. It is not always 1 degree: it depends on the frequency, and it depends on the direction of the source. Once you go over to the side, the MAA goes from 1 degree up to about 5 degrees, and it still varies depending on how far over you get. It turns out that we are terrible at, and perhaps completely unable to, localize the direction of sound around 1500 to 1800 hertz on the side; for sources to the side in that frequency range, we have a very difficult time localizing them. So there is no simple answer: you have to take into account the frequency and the direction relative to your head, and then you can answer questions about the minimum audible angle, that is, how much you can change the direction of the sound source and still have it be detectable. I find that interesting. If you were going to try to reproduce this in virtual reality and wanted to do some experiments, you could have people put on a virtual reality headset, close their eyes, and listen to where the sound is coming from, and see if you can match the real-world results.
Maybe I record the Asian koel making its sounds and then try to somehow reproduce that in a speaker system, and see if I can get humans to respond and give minimum audible angles in the same way; that is how you would know you have done it right. Knowing that you have done it right in audio seems significantly more challenging than with video. In the visual case you look at the images and you say, I see pixels, or, the colors do not look right; it is very easy for us to give simple feedback and to make heuristics or hacks that seem to work well enough. Here it may be quite complicated: things might sound ok, they might sound all right, but you cannot be completely sure unless you have done systematic experiments to verify that you have reproduced the sound as closely as possible to the physical world. Now, remember that in the visual case we had depth cues: monocular depth cues and binocular depth cues. I said that we tend to emphasize the binocular ones, and I really had to emphasize the monocular ones to you. The same thing is going to happen here: we have monaural cues for localization and we have binaural cues. Let me go over some monaural cues. (1) We get a significant amount of information from the pinna, that is, the shape or geometry of the outer part of our ears, together with the shape of the external ear canal. Basically, the funneling part provides a significant amount of information about where sound is coming from; a kind of signal processing filter or transform is performed by our outer ear. I will get into more details of that shortly, but I just want to point out that this is a significant source of information that lets us determine where sound is coming from.
The sound is distorted in different ways across the frequency spectrum, depending on where it is coming from, just based on how the sound waves propagate through your pinna and external ear canal. (2) The intensity decreases by the inverse square law. We talked about that in tracking systems for light; the same holds for audio. This may be considered equivalent to the monocular depth cue based on retinal image size: if you know how loud something should be, again, maybe it is the Asian koel and you know how loud that bird typically is, then if I can barely hear it, it is probably far away. I do not need two ears to determine that; one ear is enough. If it is a very unusual sound that you have never heard before, this cue will not be as good, because you are not sure how loud it is supposed to be. (3) The frequency spectrum of sounds. It turns out that lower frequency components tend to travel further through air. If you hear thunder in a lightning storm (I think we had one last night), then when it is very far away it just sounds like a low frequency rumble, but when it is very close you hear the high frequency components. That is a distortion in the frequency spectrum that gives you a cue about how far away it is: low frequencies travel further, or let us say there is less dissipation for low frequencies than for high frequencies. So that is a kind of distortion, as if a filter had been applied. (4) Finally, there is direct versus reverberant energy, in the case of reflecting waves. As I speak in this room, my voice is bouncing off of the walls, the tables, and off of you; there is reverberant energy. That causes phase shifts in these waves, so you are hearing multiple versions of me at different times.
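The inverse-square cue in item (2) can be sketched numerically. This is a minimal illustration of my own, not from the lecture, assuming an idealized point source with a known reference intensity at 1 m:

```python
def intensity_at(distance, intensity_at_1m):
    """Inverse square law: intensity falls off as 1 / distance^2
    for an idealized point source in free space (no reflections)."""
    return intensity_at_1m / distance ** 2

def distance_from_intensity(heard, intensity_at_1m):
    """Invert the law: if we know how loud the source 'should be'
    (its intensity at 1 m), the heard intensity gives the distance.
    This mirrors the monaural loudness cue from the lecture."""
    return (intensity_at_1m / heard) ** 0.5

ref = 1.0                        # known loudness of the bird at 1 m (arbitrary units)
heard = intensity_at(20.0, ref)  # the bird is actually 20 m away
print(distance_from_intensity(heard, ref))  # 20.0
```

Note how the inversion only works because the reference loudness is known, which is exactly why the cue fails for an unfamiliar sound.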
Do I seem to be echoing to you in this room? Very strongly? Not really, not too much. You can hear echo in many cases, but it does not seem like it here; in some enormous church hall, for example, you may hear echoes, because the sound waves have to travel a large distance before coming back, and you may perceive these kinds of temporal displacements, hearing second and third echoes coming back as I talk. This leads me to an interesting example. Remember that we had optical illusions for the case of the visual sense. Shouldn't there be audio illusions as well? Have you ever heard of any audio illusions? Let me give you a simple one that is related to reverberation and helps you understand this cue. Let us put up a stereo speaker system: I have a speaker here and a speaker here (they are supposed to look like they are the same size), and let us say we put our head here in the middle, listening to sounds from the speakers. We do this all the time. Suppose I transmit the same sound to both speakers; do not worry about stereo separation. You are listening to some kind of music, but it is really mono, and I am just putting the same audio track out to both channels. So I have a left track and a right track: maybe that is the left track and that is the right track. Everything seems fine. Now what I want to do is move my head over here, and I ask you: do you hear the far speaker at all?
If I go over here, it should be the case that I hear the sound from this speaker first and then the other one comes in significantly later; there should be a time shift, because the sound from the far speaker travels further. But I do not hear that, do I? I just hear the sound from this one speaker; I do not hear them both. Try it sometime: arrange two speakers and walk back and forth. Some of you have done this before, I think; am I just making this up? Have you noticed this? You get really close to one speaker and you hear only that one; you get out to the perfect place in the middle and you hear both of them, and you think, wow, this is perfect, this is where I am supposed to be; then you get over to the other speaker and you only hear that one. Of course, your ears are taking in the sound from both, but your brain is masking away the sound from the far one because it is a secondary effect: it is perceived as the same audio, but time shifted and lower in amplitude, so it just gets masked away. It is an audio or auditory illusion: you do not hear the extra echo, the time-shifted version of the sound that is falling on your ears. If, while I am over here, I were to suddenly turn off the near speaker, of course I would hear the far one, but you do not perceive it at all while the near one is on. So, auditory illusions. Questions so far? I want to talk about binaural cues now. Just like we have stereo with our eyes, we should have some kind of stereo with our ears. Another interesting comparison to the eyes: we had a vestibulo-ocular reflex, so should we not have something like a vestibulo-aural reflex? Why do we not have that? Some animals can rotate their ears: I think horses can do that, for example, and cats can do that.
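As a rough sketch of why the far speaker gets masked (my own illustration, with assumed numbers: speakers 3 m apart, listener 0.5 m from the left one, speed of sound 343 m/s), here are the arrival-time and level differences between the two speakers' signals:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air, roughly (assumed)

def arrival_difference(listener, left_speaker, right_speaker):
    """Return (time difference in ms, intensity ratio) between the two
    speakers' signals at the listener. A positive time difference means
    the right speaker's copy of the sound arrives later."""
    dl = math.dist(listener, left_speaker)
    dr = math.dist(listener, right_speaker)
    dt_ms = (dr - dl) / SPEED_OF_SOUND * 1000.0
    level_ratio = (dl / dr) ** 2   # inverse square law: the far copy is weaker
    return dt_ms, level_ratio

# Listener 0.5 m in front of the left speaker; speakers 3 m apart on a wall.
dt_ms, ratio = arrival_difference((0.0, 0.5), (0.0, 0.0), (3.0, 0.0))
print(dt_ms, ratio)
```

For this geometry the far speaker's copy arrives several milliseconds later at only a few percent of the intensity, which is exactly the kind of weaker, time-shifted duplicate that the brain fuses into the near source rather than hearing as a separate sound.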
So, if you could orient your ears, then you could also connect them to your vestibular signal, from your vestibular sense, and keep your ears pointed at some audio source. We do not have that; we cannot even reorient our ears, but some animals can. I am just pointing out some of the interesting differences; I think it is nice to compare these and understand the differences. So we do not have a vestibulo-aural reflex; it could have happened, but it did not seem so important for our survival, I suppose. All right, let us go to binaural cues. There are two concepts here. One is called the ILD, the interaural level difference. One thing that becomes very important here is what is called the acoustic shadow of the head. If I am facing this way and one of you were to speak, it is much louder for this ear than for the ear that is in the acoustic shadow, just as if the sound were light. Due to diffraction I may get some bending of the sound waves around the corner, and due to reverberation off the board I will hear more, but generally speaking the sound will be louder for this ear than for that ear. So that is the interaural level difference, one very important binaural cue; it is a bit like the back faces we talked about in rendering. The other one is the ITD, the interaural time difference. This is based on the different arrival times at the two ears. Just for reference, the distance between the ears is about 14 centimeters; it depends on your head size, of course, but that is roughly the maximum separation. Think about a sound source coming from, say, 45 degrees away from center: it hits this ear first and then the other.
So, based on that time difference, believe it or not, our brains, our neural structure, resolve that temporal difference: they measure the time difference, or the phase shift, between the waves arriving at the two ears, and use that information to determine where the sound is coming from. What I find really interesting is that the same thing has been done in engineering for a long time. If you have studied sensing systems in engineering, this is called multilateration, using what is called time difference of arrival, or TDOA. If you want to read about the engineering of these systems, you can explore it this way: just look up multilateration and time difference of arrival. If some transmitter is emitting sound and there are receivers out in a field somewhere, you can figure out where the transmitted sound is coming from. Interestingly enough, this was even used in World War II, in the Decca system for submarine localization; you can look that up as well if you like. Let me say something about the geometry of this, and then we can take a break. I have receivers at two locations, maybe here and here, with some distance between them; as I said, it is about 14 centimeters for human ears. Now there is a sound source somewhere in the space, and we want to look at this distance and this distance; these should be straight lines, although my drawing is not particularly good.
So, straight lines there: we have the distance to the left, dl, and the distance to the right, dr, and based on the difference in arrival times there is some delta t, which should be equal to (dl - dr) divided by s, where s is the propagation speed in the medium, the speed of sound in this particular case. Having made this calculation, I now need to think about the set of all possible places, working backwards. I started with the sound source and said that we look at the difference between these two distances, which gives a difference in time based on the propagation speed of the waves. Now let us work backwards: suppose two ears hear a sound source and a difference in time has been detected; what is the set of possible places where the sound could be coming from? If you work through the algebra, it turns out to be a hyperboloid. You may remember from basic conic sections and analytic geometry that hyperboloids come in two sheets: you get one sheet if one signal arrived first and the other sheet if the other signal arrived first, so which sheet you get depends on the actual order of arrival. I am drawing it in 2D, but you actually get a hyperboloid, peaking right on the axis here. This hyperboloid is referred to by perceptual psychologists by the great name: the cone of confusion. So there is a cone-shaped, hyperbolic region over which you cannot localize any further using only interaural time differences. Now, one thing I find fascinating about that is that we can in fact determine where sound is coming from inside of the cone of confusion.
Now, part of that is because we are using interaural level differences, but part of it is because of some more information that comes later; to give you just a hint of it, it has to do with the pinna. So we can use more information, but if you are only looking at interaural time differences, you have a cone of confusion: a region within which you cannot distinguish any further where the sound is coming from, based only on the time difference of arrival of the sound waves. Questions about that?