And if you want to now figure out what the displacement is, notice that if I keep my collar bone fixed and I just start looking around, then my pupils are clearly translating around in space, even though I am just rotating my head. So, if I estimate the rotation of your head, I should be able to additionally estimate how much your pupils have translated, assuming you kept your torso fixed; say your collar bone was screwed down, let us suppose. Using a reasonable average human body model, I should be able to estimate how much translation occurred, and that will get you pretty far; it is better than just assuming that your eyeballs are individually rotating in place, because really it is your head that is turning. So, this is a very helpful approximation for compensating for some of the position change. But suppose the user decides to move back and forth like this; you might remember one of the motion cues called parallax, where you see nearby objects moving at a faster rate than faraway objects. If you would like to make these kinds of motions and it does not work, because the back-and-forth position is not being tracked, then it ends up being uncomfortable. So, that is the motivation to bring in more sensors. If you want to estimate position, the question is what more do we need in addition to orientation. Again, you can fake some of the position. Here is the greatest hope: if you have studied purely in textbooks, you may say, well, I figured out how to separate up from the linear acceleration of the body; why not take this linear acceleration component, and instead of throwing it away, integrate it once to get velocity, integrate it twice, and get position?
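The neck-model idea above can be sketched in a few lines. This is a hedged illustration, not the lecture's actual implementation: I assume the eyes sit at a fixed offset from the head's rotation pivot (roughly the collar bone or base of the neck), so a pure head rotation R moves each pupil by (R - I) applied to that offset. The offset values are illustrative guesses, not measured anthropometric data.

```python
import numpy as np

# Hypothetical "average human" neck model: the eye sits at a fixed
# offset from the rotation pivot, so head rotation R induces a pupil
# translation of (R - I) @ offset. Offset values are illustrative only.

def pupil_translation(R, offset=np.array([0.0, 0.15, 0.10])):
    """Translation induced by head rotation R (3x3); y is up, z is forward."""
    return (R - np.eye(3)) @ offset

# Example: yaw the head 90 degrees about the vertical (y) axis.
theta = np.pi / 2
R_yaw = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0,           1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
print(pupil_translation(R_yaw))  # roughly [0.10, 0.0, -0.10] meters
```

So even though only orientation is sensed, the model predicts several centimeters of eye translation for a large head turn, which is the compensation the lecture describes.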
So, I can just figure out what your position is from that, right? Are there any difficulties with that? Let me explain some, and this will motivate why one might want to use cameras and other more sophisticated sensors. If we just take this residual acceleration left over after we got rid of the up component, first of all, we might have separated these two incorrectly. Another thing that might happen: remember that we are always trying to estimate the orientation, and so if we have some error in the orientation estimate, then we will not know exactly which way up is when we subtract up to get this residual component. So, that will lead to additional error that does not look like noise; it looks like an offset, because we do not truly know which way up is; we cannot estimate the global frame perfectly. So, that is going to cause additional errors. Now suppose you have a small error, say some kind of bias like 0.2 meters per second squared left over here, and you integrate it once, integrate it twice, and take a look at how far your estimate is from reality after 5 or 10 seconds. It turns out this grows very fast. You may remember that when you integrate a constant once you get a linear rate, and that is what happens: you get a linear drift rate that looks very much like the gyro drift rate. But if you integrate twice to get up to position, since you are two integrals away from position, you end up with a fast quadratic growth rate; a simple fixed offset like this will cause quadratic growth.
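The quadratic growth is easy to see numerically. A minimal sketch, using the lecture's example bias of 0.2 m/s²: a constant bias b integrates once to a velocity error of b·t and twice to a position error of ½·b·t².

```python
# Drift from double-integrating a constant accelerometer bias:
# velocity error = b*t (linear), position error = 0.5*b*t**2 (quadratic).

def position_drift(bias, t):
    """Position error in meters after double-integrating a constant bias."""
    return 0.5 * bias * t**2

bias = 0.2  # m/s^2, the small leftover bias from the lecture
for t in [1.0, 5.0, 10.0]:
    print(f"t = {t:4.1f} s  drift = {position_drift(bias, t):6.2f} m")
# After only 10 seconds, a 0.2 m/s^2 bias has drifted the estimate 10 meters.
```

This is why even a tiny, well-calibrated bias makes pure inertial position tracking hopeless over more than a fraction of a second.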
So, it is very much as if your head has a fictitious rocket thruster attached to it that is thrusting at this rate; it makes your estimate shoot away, and that is what gets very bad about this. Even if you think you can compensate for this perfectly, there is still the problem of double integrating noise, which also produces a very fast drift rate, but this quadratic term tends to dominate, because usually we have a simple kind of bias or offset here: we cannot accurately estimate the orientation, and we cannot perfectly calibrate the accelerometer. So, these are the difficulties we get into. We did a lot of experiments trying to do whatever we could, also adding kinematic constraints based on the human body, trying to rule out absurd possibilities that would cause far-away drift, because we know that your head is not detaching and floating away into space. However, it does not take long, maybe half a second or less, for a very large amount of drift error to occur, and then you cannot correct it very naturally; you have to detect it and correct it without making the person nauseated. So, it ends up being very difficult, I think essentially impossible with this generation of sensing technology, and for the reasons I have explained there are some fundamental limitations: no matter what, there is going to be some kind of bad drift unless you have some additional global reference for position, like a camera. So, that is what we are going to get to next. That leads us to estimating position and orientation, in other words the full 6 degrees of freedom. Now, this linear acceleration information may very well be useful; we can use it in a system to perform position and orientation tracking. However, we also want an extra signal that provides positional drift correction.
A couple of general approaches are possible. One is that you could generate a non-constant magnetic or electromagnetic field; in other words, you may want field lines that are intentionally bending, or you may want the amplitude of the field to be intentionally changing in some way, and you engineer a technology based on base stations that you put around. So, you may put down a base station, and there is some kind of field that you are generating from it, and you may have a second base station as well if one does not provide enough information. So, the information comes from one or more base stations; as an EM example, they could use ultra-wideband radio, or you could be generating magnetic fields. Based on the information obtained when you place in here a sensor that is moving around and designed exactly to sense your non-constant vector fields, it can recover the position. With this type of technology it is still very hard to get a high degree of accuracy: it is very easy to warp magnetic fields by moving metal through the field, and there are other problems like that, and I think the radio-based methods still cannot get down to the level of accuracy that we would like, which, I should point out, is sub-millimeter accuracy. Why is the goal sub-millimeter accuracy? Well, large deviations are quite perceptible: if I have put on a virtual reality headset and I start seeing the world shaking and jittering, that is very perceptible.
So, for a reasonable tracking solution for looking at your hands, or for tracking a viewpoint on a scene that you are watching on a screen, maybe up to 1 centimeter of error is fine. But we need sub-millimeter accuracy here, because once you put on a virtual reality headset and look at tracking errors that are a few millimeters or greater, you start to see the problems very clearly. So, that is one thing I need to point out: we need to be extremely accurate. This is one technology that has some hope, and there are certainly several companies developing methods and products in this arena, and there have been for many years. The other category, which I will spend more time on, is visibility methods, or, another name for them, line-of-sight methods, and one of the most common implementations is based on cameras. So, I can talk about various camera arrangements, and I believe I mentioned this last time. You could have a human head wearing a headset with the camera looking outward; this is where the camera is on the headset, and it is often referred to in the industry as inside-out tracking. In the other case, the person is again wearing the headset and the camera is out here facing the user; this is what you have in the lab on the Oculus Rift DK2s, and it is referred to as outside-in, or I would write it over here as a world camera. So, either the camera is on the headset or the camera is in the world. Questions about that so far? That is a very good question; that is exactly what I am covering next, and that is the thing to be asking: what is the camera going to be looking at? So, let me go over that. What does the camera need to see for this to work? We need to have line of sight, or visibility, towards something to make these methods work, and it also helps to have an accurate model of how the camera projection works.
How does the camera transform points in the world into image points? The first thing I will mention, before completely answering your question, is that we make a simple pinhole camera model, which uses perspective projection. So, I have some features out in space, call them features or markers, whatever you like; I draw three of them here. Then imagine I have a camera pointed at these, so I have an image plane for the camera like this, and, based on the perspective projection model, all of these features are treated as rays that meet at a single focal point, and they pierce the image plane somewhere. I can try to get the perspective right here, probably not perfectly, but I have some simple model of the imaging that looks like this. So, I imagine there is an image plane, and the particular coordinates of the points in this image plane depend on where I stick the plane; if I move the plane all the way past this focal point of the lines here, then the image will turn upside down. So, let us suppose I have some model of the camera parameters so that I know where this plane is. Then, based on this focal point and where the points are, for each one of these feature points that I see in the image, I know there is a corresponding ray in three-dimensional space. That is the assumption I am going to end up using: I see features in the image plane, and I know that each one lies on some particular ray, based on how I have calibrated and set up the camera. So, that is the model, and that is the information I am going to be using. What exactly are these yellow stars I have drawn? That is what I want to talk about next, which I believe was the question that was just asked. So, features: what are our choices? Well, we
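The pinhole model just described can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the camera sits at the origin looking down the +z axis, and the focal length f = 1.0 is an arbitrary illustrative value, not a calibrated one.

```python
import numpy as np

# Minimal pinhole camera: perspective projection of a 3D point onto the
# image plane, and back-projection of an image point to its 3D ray.

F = 1.0  # focal length: distance from the focal point to the image plane

def project(p):
    """Perspective projection: (x, y, z) -> (f*x/z, f*y/z) on the image plane."""
    x, y, z = p
    return (F * x / z, F * y / z)

def back_project(u, v):
    """An image point (u, v) determines a ray of 3D points t*(u, v, f), t > 0."""
    d = np.array([u, v, F])
    return d / np.linalg.norm(d)  # unit direction of the ray

# Every 3D point along the back-projected ray lands on the same image point,
# which is exactly the "feature lies on a ray" assumption from the lecture.
u, v = project((2.0, 1.0, 4.0))
ray = back_project(u, v)
print(np.allclose(project(3.0 * ray), (u, v)))  # -> True
```

Depth is lost in projection, which is why a single camera gives you a ray per feature rather than a point, and why you need several features (or more cameras) to pin down the pose.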
could use natural features, and this is what I would call hard computer vision. Imagine, for example, the inside-out case: the camera is on the headset, and it just looks for whatever features it can find and works with those. As the camera turns, it may grab some new features, while some that were previously in the field of view fall out of it and disappear; that is fine, it just robustly and automatically tries to use features however it can. So, extract and maintain features from natural scenes. This might not be such a bad room for that: as I look around, I see the walls have nice grid patterns on them. But if I were to take the headset and point the camera directly at a perfectly white wall, there may be no features that I can use, based on whatever feature detection algorithm I have made. We have to be careful about that: for whatever automatic feature extraction and maintenance algorithm you have made, we may be able to find a natural scene in which it does not work so well. So, if you are engineering a product, you have to think about whether it is going to cover all the cases, or at least enough of the cases that users will not complain. Because if you get it wrong and make gross errors in your position estimate, this will have a dramatic effect in virtual reality; it may make people nauseated very quickly, since all of a sudden you will feel like you are moving when you are not. So, a very bad problem. One thing you have to pay attention to here, and there are some tricks for doing this, though it may be possible to fool them as well: you have to remove any moving objects from the scene. The headset may be moving, with the camera on it, so you have to keep only the features whose apparent motion is explained by the headset moving and remove the rest. Now, you have a gyro and accelerometer on the headset.
So, you may be able to figure out which motions in the image plane correspond to your own motion and which motions are not explained by that, and then remove the ones that are not explained. So, it is not bad; there is some information you can use there, but generally it is difficult. I would say that, overall, using current technology, the reliability is low. You will see good demonstrations, but I have not seen anything yet that makes this work very robustly across a very large number of settings with sub-millimeter accuracy. So, this is the state of the art for natural features. This then motivates people to go to artificial features, and instead of hard computer vision I would call this trivial computer vision, because, as far as the vision part goes, most of it boils down to something called blob detection. You are not doing very much work other than trying to identify markers that you have put into the environment. So, it boils down to some kind of blob detection, or variations of it: whatever patterns you have decided to put into the environment, you just have to quickly scan them back out of an image. So, you are not really doing the type of computer vision research that would make your computer vision professors proud, let us say; this is more like computer vision in 1971, the challenges of that era. So, it is just pulling blobs out of an image, but you have designed it so that you have easy-to-find features.
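To make "trivial computer vision" concrete, here is a toy version of the blob detection that marker-based tracking reduces to: threshold a grayscale image, group bright pixels into connected components with a flood fill, and report each blob's centroid. The 8x8 test image with two bright "markers" is made up for illustration; real systems add subpixel refinement and filtering on top of this.

```python
import numpy as np

def detect_blobs(img, thresh=128):
    """Return (row, col) centroids of bright connected regions in img."""
    bright = img > thresh
    seen = np.zeros_like(bright, dtype=bool)
    blobs = []
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if bright[r, c] and not seen[r, c]:
                stack, pixels = [(r, c)], []
                seen[r, c] = True
                while stack:  # flood fill, 4-connectivity
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in [(y-1, x), (y+1, x), (y, x-1), (y, x+1)]:
                        if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                                and bright[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                ys, xs = zip(*pixels)
                blobs.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return blobs

img = np.zeros((8, 8), dtype=np.uint8)
img[1:3, 1:3] = 255   # marker 1: 2x2 square, centroid (1.5, 1.5)
img[5:7, 4:7] = 255   # marker 2: 2x3 rectangle, centroid (5.5, 5.0)
print(sorted(detect_blobs(img)))  # -> [(1.5, 1.5), (5.5, 5.0)]
```

Each centroid then becomes one of the image features that gets back-projected to a ray for the pose estimation.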
So, some examples: you could use what are called QR tags, you could use retro-reflective markers, you could use LEDs or laser projections. Some of these could be designed to work in the infrared spectrum so that they do not interfere with visible light, so that humans tend not to notice them and there is not a kind of interference that may cause other troubles. So, one choice you have is IR versus visible spectrum. That is the idea: either you go the hard way or you go the easy way. When you go the easy way, with artificial features, you end up with a more controlled environment and you can guarantee much more robust performance; it works in a larger variety of settings and with much higher accuracy. That is what we went with at Oculus at that stage, for DK2, because of this high level of reliability and accuracy. It was better than trying to make the most impressive computer vision ever, for example. It gave a high level of robustness, which was very hard to achieve. If any of you can find a solution that works with natural features and has the same level of accuracy and robustness as the current DK2 tracker, for example, then I can assure you many companies would be very interested. So, it is quite a difficult challenge. Let us take a look at kinds of markers; let me say something about retro-reflective markers. This is a very common technique used in motion capture systems; they have been around for quite a while, and the motion picture industry has used them for capturing dancers and digital actors, capturing real actors and turning them into digital actors, let us say. When you use retro-reflective markers, the idea is that you take the camera, and around the camera you have a ring of bright LEDs. You know, there are lights facing me today, right?
So, you are trying to get a good image of me with the camera. It is the same thing if you are trying to image something out here: shine some bright LED lights on it, except we can move into the infrared spectrum, so that the actor or artist moving around in these motion capture studios does not feel the interference and is not blinded by the lights. So, you can make these infrared LEDs and just exploit the fact that we cannot see in that range. Then, out here in the environment, you put the features, which are called retro-reflective markers. You can buy special paint and paint tiny objects; you may have seen what look like deer antlers sticking out of objects in these kinds of systems, with tiny gray painted spheres; this is retro-reflective paint. What happens is that if I shine IR light onto the marker and then look at the reflected light coming back, the amount of light returning to the source will be independent of the angle at which the light hits the object. So, there will tend to be a very bright spot coming back, which is very nice. That is retro-reflection: it is as if you are shining into a mirror regardless of the orientation, which is a very unusual property. Normally, if you want the light to bounce straight back, you have to have the mirror perfectly aligned; this is as if you have all orientations of mirrors when you shine the light, so it is guaranteed to come back and hit the source very brightly. So, this is one possibility. One of the things you have to pay attention to, though, if you would like to extend the range of these sensors as much as possible, is that the power of your light dissipates at a rate of one over the distance squared.
So, that is just the usual power dissipation of a propagating wave in three dimensions: the power falls off as one over the distance squared. You get a lot of power up very close, and the further away you move these markers, the lower the received power gets. And with retro-reflection, the light has to essentially travel twice the distance: it has to travel from the ring to the marker and then from the marker back to the camera. So, you have double the distance. One possible improvement is to put the light directly on the markers: if I can directly light the markers, I have cut the distance in half, which means I should be able to move the markers further away and get the same amount of brightness in the camera. If I do not mind putting on batteries, or whatever I need, or plugging into a separate source; if it is on a headset, then I have to power these lights. So, that is another alternative, and it is what is used in the lab: I take the camera, and instead of retro-reflective markers I have these special markers that I have designed, which are just infrared LEDs. We are also taking advantage of the fact that LEDs are becoming increasingly powerful, low cost, and very compact. Earlier LEDs were larger and had a huge lens across them so that you could see them looking bright from different angles; now they emit a much greater amount of light, infrared light in this case, at very low cost and with a very small footprint. So, if you set up this visibility situation and think about the amount of power the LEDs have to have in order to be seen in the image in the same way, you have cut the path length in half; and there is also the efficiency of the retro-reflective material, which I have not covered. So, this ends up being more efficient, if you have the ability to power these markers.
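The range argument above can be put in numbers. A back-of-the-envelope sketch, using the lecture's simplified model: received power falls off as one over the path length squared, a retro-reflective marker makes the light travel the camera-to-marker distance twice, and an LED mounted on the marker travels it once. (This deliberately ignores reflector efficiency and beam patterns, as the lecture does.)

```python
# Inverse-square falloff of a propagating wave in 3D, applied to the
# retro-reflective (path 2d) versus active-LED (path d) comparison.

def received_power(path_length, source_power=1.0):
    """Received power under 1/d^2 dissipation (arbitrary units)."""
    return source_power / path_length**2

d = 2.0  # camera-to-marker distance in meters (illustrative value)
retro = received_power(2 * d)   # ring light -> marker -> camera: path 2d
active = received_power(d)      # LED on the marker -> camera: path d
print(active / retro)  # -> 4.0
```

Equivalently, at the same received brightness in the camera, the active marker can sit twice as far away, which is the improvement claimed in the lecture.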
It is cheating even more, right? In terms of real computer vision, if you want to make something powerful, you should be using natural features. If you put the features into the environment yourself, as special markers that you have painted, you have cheated; you are helping the vision out. That is fine; it gives much better performance, but it is not as impressive from a pure computer vision standpoint. If you want to go even further, put some bright lights in the environment, bright in the infrared in this case, and then just track those; if they are small and well separated, you get these points, these features, that we need to get to the next level.