Again, my name is Tristan Gould and I'm a remote sensing scientist in the AOP group, and I'm going to give a talk this morning on discrete lidar uncertainty. Generally we talk about two major sources of uncertainty: geolocation uncertainty and processing uncertainty. Geolocation uncertainty deals with the uncertainty associated with each of the instrumented subsystems within the lidar (the GPS and IMU, laser ranger, and laser scanner), the measurements they make, and how the error in each of those measurements combines into geolocation error for the actual point cloud. Generally, horizontal uncertainty for lidar is greater than vertical uncertainty. What we've seen is that the instrument specifications for lidar don't give you a very good impression of what the uncertainty is: they quote uncertainties for very optimistic conditions that, for the most part, you're not going to see in the real world. Vegetation and terrain conditions will also affect the uncertainty in the point cloud. Then we also have processing uncertainty, which is really one of the larger sources of error that we have, and it's much more difficult to quantify than the geolocation error. We'll talk a little bit about that. So I just wanted to go through the different processing steps and how uncertainty is introduced into the lidar system at each of those steps. The first is the airborne trajectory, which we talked about yesterday. Here we've got a picture of an airborne trajectory colored by the predicted uncertainty given by the commercial software we used to produce it. The red areas are high uncertainty, yellow is in the middle, and the blue areas are a little bit better.
So the uncertainty in the trajectory is a combination of the distance you are from your GPS base station, the number and distribution of satellites, the lever arms inside the system (the linear distances from the GPS antenna down to the IMU and from the IMU to the laser sensor), and, of course, the accuracy of the IMU. We have to measure those lever arms so that when we get a position at the GPS antenna we can translate it down to the laser and then down to the ground. Now, from some really nice statistics that Bridget worked up this past year, when you look at the simulated uncertainty from the software, it tells us that the distance from the base station is actually the most important factor in the uncertainty of the trajectory. This is an average of the predicted uncertainty for all our flights across the entire season against distance from the base station, and you can see that around 20 kilometers you get this jump and the uncertainty starts increasing. This is one of the reasons we always try to keep our base stations within 20 kilometers of the flight: we know that beyond that, the uncertainty in the trajectory really starts to rise. And the trajectory is the base of all of our geolocation, so it's really important that we maintain a really accurate trajectory. We also get statistics at the end of each flight that tell us what the uncertainty in the easting, northing, and elevation was for the flight. To look further at this idea of distance from the base station, we had some flights in D8 at three different sites. We had base stations located at each site, and we processed the trajectory with the base station and without it, and then compared the difference between those trajectories at the sites. And in some cases, this didn't turn out very well.
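As an aside, the 20-kilometer rule of thumb is simple to monitor in software. Here is a minimal sketch (the function names and the flagging logic are mine, for illustration, not NEON's operational code):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two WGS-84 lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_far_epochs(trajectory, base, max_km=20.0):
    """Return indices of trajectory epochs farther than max_km from the base station.

    trajectory: list of (lat, lon) tuples; base: (lat, lon) of the base station.
    """
    return [i for i, (lat, lon) in enumerate(trajectory)
            if haversine_km(lat, lon, base[0], base[1]) > max_km]
```

Running a check like this over the planned flight lines before takeoff makes it obvious where a second base station would be needed.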
In fact, we got upwards of half a meter of difference in those trajectories when we weren't using the base station. So this is a huge deal for us: we're trying to meet 15 centimeters of accuracy in the lidar, so if we're getting these types of errors in the trajectory, we're completely gone. In this particular trajectory, the area of high uncertainty was when we were transiting and far from other base stations, and you can get situations like that. Another site we looked at was a little better, not quite as bad, with about 15 centimeters of difference between the two trajectories, but still a big deal to us. So it's obvious that having that base station really close to the trajectory is really important to maintain the error budget we want. PDOP is a descriptor of uncertainty in the GPS satellite constellation, so it's one of the quantities that gives you an idea of what the uncertainty in the trajectory is going to be: if you have a high PDOP, you're going to have a high uncertainty in the trajectory. But what we found is that keeping the distance from the base station low is far more important than keeping PDOP low. Since we're doing flights just in the United States, the GPS satellite constellation is dense most of the time around here, so we usually get enough satellites with a good distribution, and the PDOP is generally low. After the trajectory, we have the LMS processing. This is the processing that we do in the commercial software provided by Optech. A couple of things here. At the beginning of the season, we do a flight to measure the boresights. The boresights are the angular differences between how the lidar sits and how the IMU sits. Basically, the IMU is giving us our orientation in the sky.
And then for the lidar head, we need to know how it sits relative to the IMU to properly geolocate all the observations on the ground. The small angular differences between the IMU and the laser head, usually sub-degree, are called boresight misalignments, and we do a dedicated flight over Greeley each year to measure them. Of course, those are calculated values, so there's always potentially a little bit of uncertainty. After we do a flight, what we can do is look at how the data in the overlapping strips match. Like I mentioned before, we have 30% overlap between strips, so we can check how well that overlapping data compares. The software plots the vertical differences as a function of scan angle, and if this is a nice flat line, it tells us that the system is in really good alignment. But it's also possible to get situations like this, where we get an angled distribution with some bias with scan angle. If that happens, it tells us that the boresight alignments need to be redone or checked again, and often we'll then do a mid-season boresight alignment to get these graphs to go back flat. There are also what are called intensity table corrections. These are factory calibrations provided by Optech: adjustments applied to the range based on the PRF and the returned intensity. We really have no control over these; they're corrections that are done in the lab back at Optech. So after we fly, we get our boresight misalignments and process the data through the Optech software. What we're then able to do is check the vertical accuracy of the lidar, and we do that over a runway here in Boulder.
A couple of years ago we went out and took about 200 to 300 really high accuracy GPS points, with errors of about 1 centimeter or so, across the entire runway. What we do is interpolate between all of those GPS points to get a validation surface for the entire runway, so we know what the elevation is everywhere on the runway. Then when we fly over it, we take all the lidar points that land on the runway and compute the vertical difference between each of those points and the validation surface. Since the lidar is collecting hundreds of thousands of points per second, we get a really great distribution with a really high sample count that gives us an impression of what the error is. And since we try to fly over the runway with the laser at nadir, with the plane directly above the runway, the primary error sources contributing to these statistics are the errors in the laser ranger and the vertical error in the GPS. Other error sources, like the IMU or the scan angle, propagate more heavily at large scan angles, not so much at nadir. So these statistics give us an idea of how well the laser ranger and the GPS are operating. These are results for several lines that we flew over the runway, separated by PRF, the pulse repetition frequency, which is how fast the laser is pulsing. As I mentioned yesterday, we only fly at 100 kilohertz or less, and this chart shows why: at 100 kilohertz and below we get very low means and standard deviations, but at some of the higher PRFs, 125 and 142 kilohertz, the errors are above our limit of 15 centimeters. So this is why we fly only at 100 kilohertz and below. We also want to test the horizontal accuracy of the lidar system in addition to the vertical.
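Before moving on, the runway comparison itself boils down to a surface fit and a difference. A minimal sketch follows; it uses a least-squares plane since a runway is near-planar, whereas the real workflow interpolates between the survey points, and the function names are mine:

```python
import numpy as np

def fit_plane(gps_xyz):
    """Least-squares plane z = a*x + b*y + c through the survey points.

    gps_xyz: array of shape (n, 3) with columns x, y, z.
    """
    x, y, z = gps_xyz.T
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def runway_stats(lidar_xyz, coeffs):
    """Mean and standard deviation of lidar-minus-surface elevation differences."""
    a, b, c = coeffs
    x, y, z = lidar_xyz.T
    dz = z - (a * x + b * y + c)
    return dz.mean(), dz.std()
```

The mean of `dz` exposes a vertical bias (ranger or GPS height error), while the standard deviation reflects per-pulse ranging noise.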
And the main source of error in the horizontal component of the lidar points is the beam divergence of the laser pulse. You think of a laser as a very thin, tight beam of energy as it comes out, but the instantaneous field of view of the laser, the beam divergence, is 0.8 milliradians. That means when we're flying at 1,000 meters, by the time the laser pulse hits the ground, its diameter is 80 centimeters. And the energy distribution of that pulse is actually Gaussian-shaped: most of the energy is contained in the center, and out toward the edge, at the 1/e level, is our 80-centimeter diameter. So there's still plenty of energy even further out than that, and it only takes about 1 to 2% of the energy returned to the lidar system to trigger a return pulse. What can happen with this really wide beam is, say we were flying over here toward the table, which is a very hard, flat surface. If our beam comes down and it's 80 centimeters across, the edge of the beam can hit the table. That return goes back from the edge of the table, but the coordinate gets associated with the center of the beam. So it looks like the table is over here, because the center of the beam was over here while the edge of it hit the table: the coordinate is associated with the beam center, but we've got the elevation from the edge of the table, so the point ends up displaced. So what we can do is fly several flights over the headquarters buildings. We went out and used traditional surveying with a total station to survey all the corners of the headquarters buildings, and then we fly over them and watch the pulses as we scan toward a building edge: where do they first jump from the ground up to the building edge?
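(A quick aside on the footprint size: the 80-centimeter figure is just the small-angle relation, footprint diameter roughly equal to altitude times divergence. A one-line sketch, with the function name mine:)

```python
def footprint_diameter_m(altitude_m_agl, divergence_mrad=0.8):
    """Nadir laser footprint diameter on flat ground, small-angle approximation.

    Divergence is taken at the 1/e energy level, per the figures quoted above.
    """
    return altitude_m_agl * divergence_mrad * 1e-3
```

At the 1,000-meter flying height this gives the 0.8-meter footprint, and half of that, 0.4 meters, is the edge offset discussed next.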
And what we find is that it's usually some distance away from the building edge where we see that first jump up. We can calculate that perpendicular distance, and it gives us an impression of what the horizontal error is going to be. When we do that, we see that it's pretty close to half of our beam diameter, which is 40 centimeters: we have that 80-centimeter full diameter, but as we're coming up to the building we're only 40 centimeters away from the edge when we see the jump. So that shows us that the primary source of error in the horizontal component is the beam divergence. There's going to be some GPS error and some other types of error, but they're pretty much dwarfed by the beam divergence error. Then, when we can, we also try to validate our digital terrain models when we're going out to sites. We visit a couple of sites per year to do ASD measurements to support the spectrometer, and when we do that, we also collect lidar validation points using rapid-static GPS techniques. Basically, we take a high accuracy GPS receiver, set it out for about 20 minutes to collect observations, and get elevations throughout the site. Then we take each of those elevations and compare it to the elevation we get from the digital terrain model. This is an example of doing that at Oak Ridge. All these circles show the different GPS points that we collected, and the chart down here shows the vertical difference between the GPS points and the DTM. You can see that we're doing pretty well: a mean of about 4 centimeters and a standard deviation of about 6 centimeters. That's pretty consistent with what you can expect from most commercial lidar providers.
So then, to give people an idea of what the instrument-related errors are across an entire site, what we do is simulate the error in every single point the lidar has acquired, based on errors that we know for the GPS, IMU, laser ranger, and laser scanner. We propagate the errors through each of those instrument components into every single point, so we get horizontal and vertical errors for every point. Then we create LAZ or LAS files where we take out the elevation and insert the vertical uncertainty instead, so when you plot these files, you see the vertical uncertainty rather than the elevation. We use the algorithm that I published in 2010, so if anyone wants to know more about that, feel free to ask. Generally, what you find is that at the edges of the lines (these are all the different lines that we flew) the uncertainty is a little bit higher. That's because at nadir you don't have many of the errors propagating in from the scan angle; as the scan angle gets larger, errors in beam divergence, in the scan angle, and in roll, pitch, and yaw propagate more strongly into the vertical coordinate. So generally the edges of scans have higher uncertainty than the center. It can also be good to fly with 50% overlap, where your edge hits the center of the adjacent line, because then your highest error is paired with your lowest error. But it's always a trade-off with flying time and things like that. I think I mentioned yesterday that we use a triangulated irregular network (TIN) to create our DTMs, and from those DTMs we create our slope and aspect.
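Backing up to the simulated per-point errors for a moment, the propagation can be sketched with a deliberately simplified model: vertical coordinate z = R cos(theta), with range and scan-angle errors only. The real model in the 2010 paper carries full GPS, IMU, lever-arm, and divergence terms, and the sigma values below are illustrative placeholders, not the instrument's actual specifications:

```python
import math

def vertical_sigma(range_m, scan_angle_deg, sigma_range_m=0.05, sigma_angle_rad=1e-4):
    """Simplified per-return vertical 1-sigma from z = R*cos(theta).

    Propagates only range and scan-angle errors; sigmas are placeholders.
    """
    th = math.radians(scan_angle_deg)
    dz_dr = math.cos(th)             # sensitivity of z to a range error
    dz_dth = range_m * math.sin(th)  # magnitude of sensitivity to an angle error
    return math.sqrt((dz_dr * sigma_range_m) ** 2 + (dz_dth * sigma_angle_rad) ** 2)
```

Even this toy version reproduces the qualitative behavior described above: at nadir the vertical sigma collapses to the range error alone, and it grows toward the swath edges as the angle term takes over.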
I mentioned that one of the drawbacks of the TIN interpolation method is that we don't get any filtering from redundancy within each individual grid cell. We create the DTM natively with the TIN interpolation routine, but as we create the slope and aspect, I run a 3 by 3 moving average across the DTM before calculating them. This slide demonstrates why. Over here is the raw DTM over the runway, and if you look at the slope, it's really variable across the runway. The runway is a really flat surface; it doesn't have slopes ranging from 0 to 5 degrees. The reason we see that is that there's a lot of noise in the lidar points, and we're just computing slope between those noisy points. If we run a 3 by 3 moving average across the DTM and then calculate the slope, we get this blue line here, and you can see the slope is much lower over the runway after we do that. Next, I want to talk about canopy height model uncertainty. This is an analysis I did at the San Joaquin Experimental Range. I was able to get field-measured tree heights for a lot of the trees throughout the site and compare those directly to grid cells in the canopy height model. I had to remove some outliers first; for example, sometimes the field crews measure trees that sit lower than the upper canopy, while the lidar only sees the top of the canopy, so those need to be removed. I got this regression line. We should get a one-to-one regression, and this is pretty close: the slope is actually not statistically different from one, with no trend in the residuals. But the important part here is that the intercept value is negative 0.493, which means that generally we're underestimating tree height with the lidar.
This is a fairly common finding in the literature: tree heights are generally underestimated by lidar. That's because the pulse actually penetrates partially into the tree crown before enough energy is returned to trigger the return pulse. You get some infiltration down into the crown before enough energy goes back to register a return. So seeing about half a meter of underestimation on these trees is pretty consistent with what most people have reported. Something else we've done is a more in-depth analysis of the canopy height model uncertainty, leveraging BRDF flights that we flew primarily for the spectrometer. These flights are designed so that we can see how the spectrometer's observations change with different flight tracks, angles, and orientations. The nice thing about these flights is that we're able to leverage the center portion, where we get 20 lines overlapping. So I can make 20 canopy height models and then, in that overlapping portion, look at every cell and see how it varies between all of those different canopy height models. That lets us empirically derive the precision of the canopy height model. I did this analysis on canopy height models; Amanda is continuing it this summer, applying the same algorithm to all of our other data products in addition to the canopy height model, and that's what she'll talk about this afternoon. These images show that when we overlap all those flight lines, you get this nice area in the center where all 20 flight lines overlap, so we're able to create all those different rasters. There's an example. Then we can look at the center portion and actually produce these rasters of uncertainty.
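Concretely, each uncertainty raster is just a per-cell sample standard deviation across the stack of co-registered canopy height models. A minimal sketch, with the function name mine:

```python
import numpy as np

def chm_uncertainty_raster(chm_stack):
    """Per-cell sample standard deviation across co-registered canopy height models.

    chm_stack: array of shape (n_lines, rows, cols), NaN where a line has no data.
    """
    return np.nanstd(chm_stack, axis=0, ddof=1)
```

Using `nanstd` lets cells that are missing from some flight lines still contribute from the lines that do cover them.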
Each cell represents the standard deviation of the canopy height model across all those different lines. The take-home message is the average uncertainty that we saw in the canopy height model at each of these sites: at San Joaquin, 1.9 meters; at Soaproot, 2.2 meters; and at Oak Ridge, 1.1 meters. I have a more in-depth presentation on this, which I'd be happy to give people. The basic idea is that each of these sites represents a really different forest type, and different factors at each forest type contribute to the overall uncertainty. What this also tells us is that if you're looking at an individual cell in a canopy height model, you could be looking at 1 to 2 meters of error in that cell. San Joaquin is a savanna-type landscape with shorter blue oak trees, where each oak tree stands individually with some space around it. What we saw at San Joaquin was that, due to the beam divergence issue I mentioned before, the edges of the individual trees had a lot of uncertainty: some points would hit the edge of a tree and some would hit the ground, so you get a lot of variation at the edges of those trees. At Soaproot, you have really tall, thin ponderosa pines. What happened there is that as we flew the different orientations of the flight lines, sometimes a lidar point would hit mid-tree on those really tall, thin trees and sometimes it would hit the top. So on those trees you'd get really high standard deviations, sometimes 18 to 20 meters, just based on where the lidar point happened to hit the tree. And then at Oak Ridge, where we have a really heavy canopy, we got these segments of high uncertainty throughout the canopy height model.
And when you look into that, what you find is that in these areas we also had really poor ground penetration underneath the heavy canopy. So there was a lot of interpolation occurring across the ground surface, and in the different flight lines this interpolation produced really different ground surfaces. Since we're subtracting the ground from the top of the canopy, that resulted in really different canopy height estimates. This problem at Oak Ridge is really important because it's common across a lot of our sites that we don't get good ground penetration. This is an example from the Great Smoky Mountains flight that we flew in 2015, colored by the longest edge of any of the TIN triangles across the entire site. Using all the points, these generally range between 0 and 3 meters, 3 at the most, so at worst we're interpolating across 3 meters anywhere. We can look at that distribution: the maximum was 3 meters, but generally it was below 1.5. That's because, like I told you, we're generally getting between 2 and 4 pulses per square meter, so we generally don't have to interpolate much more than 1.5 meters. But this is what happens when we look at that same plot using the ground points only: we go from 0 to 25 meters. There are particular areas under this really heavy canopy where we're interpolating the ground surface across 25 meters, and that adds a lot of uncertainty to the canopy height model, because if we miss a dip or a hill in the ground surface, it's really going to affect the canopy height. This is that same distribution for the ground points only: you can see a little bump aligned with the previous histogram, which is the open areas, and then this larger distribution from underneath the canopy.
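The longest-TIN-edge metric is straightforward to reproduce from a set of ground returns with a Delaunay triangulation. A sketch, with the function name mine:

```python
import numpy as np
from scipy.spatial import Delaunay

def longest_tin_edges(points_xy):
    """For each triangle in the Delaunay TIN of the (x, y) points, return the
    length of its longest edge, a proxy for how far the surface is being
    interpolated across a data gap.
    """
    tri = Delaunay(points_xy)
    p = points_xy[tri.simplices]            # shape (n_triangles, 3, 2)
    edges = np.stack([p[:, 0] - p[:, 1],
                      p[:, 1] - p[:, 2],
                      p[:, 2] - p[:, 0]], axis=1)
    return np.linalg.norm(edges, axis=2).max(axis=1)
```

A histogram of this array over the ground-classified points is exactly the kind of distribution described above: a tight peak in open areas and a long tail under heavy canopy.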
And I will say that Great Smoky Mountains is probably one of the worst sites that we fly for this, so it is a worst-case example. Something else that we've done, and that you will do directly after this, is look at differences at Pringle Creek. This is a really nice site for analyzing uncertainty because last year we flew the whole site in bad weather just to get lidar coverage, because we didn't think the weather was going to improve. Then the very next day the weather improved, so we flew it again to get good-weather spectrometer data. So we have two lidar collections one day apart, and we can assume that nothing changed at the site from one day to the next. That lets us ask: how did the acquisitions change between these two days? That's the lesson we're going to do directly after this. Then there are also some larger processing uncertainty errors that mostly have to do with misclassification of the point cloud. I mentioned yesterday how we classify the point cloud into ground points, vegetation points, buildings, and unclassified. This is a good example from the Flatirons, local to here. Originally, when we did our ground classification, the algorithm decided that because the Flatirons are so steep, there's no way the ground can rise that fast, so it assumed the points on top of the Flatirons were not ground points. It actually cut the whole top of the Flatirons off because it assumed that was vegetation. We talked with Martin, who created LAStools, which did the classification here, and he made an improvement to the algorithm that allowed us to correct that error. You can see this is the original profile across the Flatirons, where we were cutting off a lot of the tops, and this is the result after the improvement was made to the algorithm.
Unfortunately, this improvement works well in these cases but doesn't work as well in some others, so I still generally use the old method. And I just got an email three weeks ago from the Park Service at Great Smoky Mountains saying, hey, you cut off a whole bunch of the mountain tops in Great Smoky Mountains. So I reprocessed it with the new method to correct that for them. This can also happen with vegetation. Up here is an RGB image of an area at Dead Lake, which is one of our D8 sites. In this area there's a lot of really low vegetation close to the ground, and there's actually one taller tree right here; you can see its shadow. When we look at the canopy height model, all we see is that one tree; everything else is zero. When the algorithm went through, it classified all this short vegetation as ground points, so those points are included in the digital terrain model, and you can see them here in the hillshade. This misclassification of vegetation points has added a lot of error to the digital terrain model. If we look at a profile across the DTM there, I'm assuming the ground probably doesn't look like this: what we've got is a lot of that vegetation incorrectly classified as ground points. So this can occur within our data, and it's more likely to occur with short vegetation. In Keith's presentation yesterday, he mentioned the range resolution of the laser pulses in his waveform talk. The outgoing pulse width of the Optech system is 10 nanoseconds, and based on that, we're only able to get about a two-meter range resolution: we can't distinguish between two objects that are less than two meters apart. So when we get short vegetation that's lower than two meters, we're not going to get a ground point beneath that vegetation.
What happens then is that the algorithm sees the vegetation return as the last point and assumes it must be the ground, and we get situations like this. So beware of short vegetation, because it can definitely affect the digital terrain models. Obviously, we would like to correct these things, but we're a small group here at NEON collecting a lot of data, so we rely on our classification algorithms to get us 85 to 90% of the way there. Commercial providers will usually then have employees who get them the last 10%, which takes 90% of the time. We're getting 90% of the way there and then delivering the data. So if you're using the NEON data, it's good to be aware that the classifications get you almost there, but not completely. All this classification is done from the LAS files, which are available as the L1 product; we deliver those files by flight line with no classifications, so you can definitely reclassify the points yourself. The other thing is that the classification routine takes several parameters, and right now we use a standard set of parameters for all sites. It would probably be best to tweak those parameters slightly for each individual site, but if we're ever going to do that, we'll need a dynamic way to calculate what those parameters should be, as opposed to going in and changing them every time, because our process is so automated at this point. I have seen research starting to come out on dynamically calculating the classification parameters, so hopefully that will happen and we'll be able to apply something that does a little bit better. Also, the RIEGL system that we're going to start flying at the end of this year and into next year has an outgoing pulse width of 3 nanoseconds, as opposed to 10, which brings the range resolution of that system down to 60 centimeters, as opposed to 2 meters.
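For reference, the textbook pulse-length-limited separation is c times tau over 2, which gives about 1.5 meters for a 10-nanosecond pulse and about 0.45 meters for 3 nanoseconds; the somewhat larger figures quoted in the talk (2 meters and 60 centimeters) presumably build in the extra separation a real detector needs to resolve two partially overlapping returns. A sketch of the lower bound:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def range_resolution_m(pulse_width_s):
    """Classic pulse-length-limited two-target separation: c * tau / 2.

    Quoted system figures may be larger once detector behavior is included.
    """
    return C * pulse_width_s / 2.0
```

Either way, the scaling is the point: a 3 ns pulse separates targets roughly three times closer together than a 10 ns pulse, which is why the new system can find ground under much shorter vegetation.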
So our take-home messages on uncertainty: we try to keep base stations within 20 kilometers to make sure our trajectory is high fidelity. We test the sensor twice a year, basically when it goes out and when it comes back, over the runway for vertical accuracy and here at headquarters for horizontal accuracy, and we monitor the boresight misalignments throughout the season. The simulated errors in the point clouds are available, but remember, these are based only on the errors in the individual sensor components. They have nothing to do with any classification error that may be introduced into the point cloud, because that's something that's really difficult to quantify. So those errors only tell you how well the sensor was operating, not how it interacted with the land cover. The ground point density under heavy canopy can be sparse, which can lead to errors in the DTM and the CHM. And these misclassifications are probably our largest source of error right now, so just be aware of those. [Audience question] Yeah, so what we do is relate everything back to the IMU. The IMU is our base orientation system inside the plane, and when the IMU is tipping back and forth while the laser scanner is scanning out, we need to know what that difference is so that when we apply the roll, pitch, and yaw, they're applied correctly. The NIS (the imaging spectrometer) also sits slightly differently, so we relate that back to the IMU as well. Since both instruments are related to the IMU, we get really high relative geolocation between the two.