I hope I'm not mispronouncing it. He will be talking for the next 40 minutes on Boost.Geometry. At the end there will be some opportunity for questions and answers. Please have a seat.

So today I would like to go briefly over some of the latest developments in Boost.Geometry, mostly through examples; I will try to hide the technical details, so I hope it makes sense. First, some basics: Boost.Geometry is part of the Boost C++ libraries. It is header only, follows the C++03 standard, and makes heavy use of metaprogramming and tag dispatching. Inside there are primitives, algorithms and a spatial index. It follows the well-known standards in the field and is used by MySQL to provide GIS functionality. There is documentation, there is a mailing list, and the code is on GitHub, where you can search it.

This is a basic example of computing a simple distance between two points. First some definitions: we define a point type that uses double arithmetic, is two-dimensional, and has a geographic coordinate system, which means we use an ellipsoid to model the Earth; I will give more details about this later. Then we ask for the distance between two points, one in Athens and one here at ULB, and we get this result. That is the simplest way to use the distance algorithm. We can also parametrize it with a strategy, which basically specifies how exactly we compute the distance between the two points, and then we get another result. If we don't pass a specific strategy, the algorithm uses a default one; that is why we get two different results. The main idea is that an algorithm in Boost.Geometry has two parts: a coordinate-system-specific part and a coordinate-system-agnostic part. Strategies are the algorithmic part that handles a specific coordinate system.

These are the coordinate systems that Boost.Geometry supports. We can model the Earth as a flat surface, using Cartesian coordinates. We can also model it as a sphere.
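To give an idea of what a spherical distance strategy computes under the hood, here is a minimal sketch using only the standard library: the standard haversine great-circle formula on a sphere of mean Earth radius. The function name and the (approximate) coordinates for Athens and ULB are mine, for illustration; the library's ellipsoidal strategies replace this math with Vincenty, Thomas, Andoyer or a series formula.

```cpp
#include <cmath>

// Great-circle (haversine) distance between two points given in degrees,
// on a spherical Earth model of fixed mean radius. This is the kind of
// computation a spherical strategy performs; ellipsoidal (geographic)
// strategies solve the harder problem on an ellipsoid of revolution.
double haversine_m(double lat1, double lon1, double lat2, double lon2)
{
    const double R = 6371000.0;                 // mean Earth radius, meters
    const double rad = std::acos(-1.0) / 180.0; // degrees -> radians
    const double dlat = (lat2 - lat1) * rad;
    const double dlon = (lon2 - lon1) * rad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2)
                   + std::cos(lat1 * rad) * std::cos(lat2 * rad)
                   * std::sin(dlon / 2) * std::sin(dlon / 2);
    // atan2 form is numerically robust near antipodal points
    return 2.0 * R * std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
}
```

For Athens (roughly 37.98 N, 23.73 E) to ULB in Brussels (roughly 50.81 N, 4.38 E) this gives a bit under 2100 km; an ellipsoidal strategy shifts the result by a few kilometers, which is exactly the kind of difference the two results in the example show.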
That is the spherical equatorial coordinate system. And we can also model it as an ellipsoid, which is the most accurate of the three. These can also be parametrized with degrees or radians. There is an even more accurate model that is not supported and is meant for special applications: the geoid, which also has valleys and maybe mountains. Today I will mostly talk about the geographic coordinate system, but everything I discuss also works for the other coordinate systems; the interface is generic.

This is the main problem we want to solve when dealing with geographic algorithms. There are two main problems, called the direct and the inverse geodesic problem. In the first one, we are given a point, an azimuth and a distance, and we want to compute the new point reached by following this azimuth for that distance, just like navigating with a certain angle and a certain distance. In the inverse problem, we are given two points and we want to compute the distance between them. Boost.Geometry has several implementations, mathematical formulas, that solve these problems. These are the strategies that we saw in the first example. About the names: there is Vincenty, the state of the art, the most common iterative method used in GIS. There are also two approximate methods, Thomas and Andoyer. We name them after their authors, which may not be very convenient, but it is one way to distinguish them. You can also solve the problem on a sphere, but that is an approximation. And there is another method, currently in a pull request, that a Google Summer of Code student worked on: a series approximation that is supposed to be the most accurate of all of these. I will discuss it a bit. Finally, there is an extra way to solve these problems: project to the plane and use Euclidean geometry.
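The direct problem has a simple closed form on the sphere, which makes the problem statement concrete (on the ellipsoid, Vincenty's direct formula solves the same problem iteratively). A minimal sketch, with struct and function names of my own choosing:

```cpp
#include <cmath>

struct GeoPoint { double lat_deg, lon_deg; };

// Direct geodesic problem on a sphere: starting from `start`, follow the
// initial azimuth (degrees, clockwise from north) for `distance_m` meters
// and return the destination. Standard spherical navigation formula.
GeoPoint spherical_direct(GeoPoint start, double azimuth_deg, double distance_m)
{
    const double R = 6371000.0;          // mean Earth radius, meters
    const double rad = std::acos(-1.0) / 180.0;
    const double delta = distance_m / R; // angular distance traveled
    const double phi1 = start.lat_deg * rad;
    const double theta = azimuth_deg * rad;

    const double phi2 = std::asin(std::sin(phi1) * std::cos(delta)
                      + std::cos(phi1) * std::sin(delta) * std::cos(theta));
    const double dlon = std::atan2(
        std::sin(theta) * std::sin(delta) * std::cos(phi1),
        std::cos(delta) - std::sin(phi1) * std::sin(phi2));
    return { phi2 / rad, start.lon_deg + dlon / rad };
}
```

As a sanity check: starting on the equator and heading due east (azimuth 90) for a quarter of the circumference lands you on the equator at longitude 90. The inverse problem runs the same geometry backwards: given the two endpoints, recover the distance and azimuth.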
But there will be a talk this afternoon with more details about projections, so I will not talk about that here.

Here is an example of computing the distance between a polygon in Brussels and ULB. I had the same example last year, but now it is included in the library, so you can actually compute it. You define the point, which is at ULB, and you define the polygon. Then, ignoring the yellow part for now, you just ask for the distance between the polygon and the point, and you get this result. If you want to use a specific strategy, that is, to compute the point-to-point distances in a specific way using one of the algorithms I showed before, you can use for example Vincenty, the iterative method, and you get a slightly different result. In this particular example, and I think in most applications, you don't care about this difference in distance. And you can use any two geometries here: you can compute the distance between a polygon and a multipolygon in a geographic coordinate system and again get a result.

Now let me go to a different example: the distance between two points that are nearly antipodal on the Earth. So we have the globe, one point is here, and the other is antipodal to it; it is the longest distance you can get. In this special case, most of the algorithms I showed, apart from the high-order series approximation, do not give very good results. The default algorithm is Andoyer: if you call distance with the defaults, you get this result. Using the more accurate series approximation strategy, you get a difference of 25 kilometers. So in this case it makes a big difference to change the strategy, because some algorithms perform really badly in this specific case; they are very inaccurate. But this is a real corner case, the two points being antipodal.
Now I would like to show another algorithm that we have. It is still in a pull request, under development. It is called line interpolate point. We have a linestring and we would like to interpolate points at a given spacing along it, so you get something like this: on a given linestring, we compute a point every 500,000 meters. These are the locations of some of my colleagues: Athens, somewhere in Poland, Norway, Amsterdam. I create a linestring with the coordinates of the cities, ask for interpolated points every 500 kilometers, and store them in a multipoint. I get these results, the list of all the points you saw visually. By putting a smaller number here, I would get as many points as needed. So that is this algorithm.

Another algorithm is area: we want to compute the area of a polygon. Here I give a very rough approximation of the borders of Belgium, just so it fits on one screen. You can construct a polygon like this and then see whether Boost.Geometry can compute the area. You can just say area of this polygon, omitting all the strategies, and you get the first number. Then you can parametrize it with the different strategies that we saw for distance, and you get slightly different results. So one question here is: which is the most accurate result of all these? We would expect it to be the last one, but I will show some benchmarks with different cases indicating which algorithm is the most accurate, if you want to use one. In general, it is not easy to know the most accurate strategy for a specific problem.
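The walking logic of line interpolate point is easiest to see stripped of the geodesic details, in the Cartesian plane: accumulate segment lengths and emit a point every `step` units of arc length. A minimal sketch (the struct and function names are mine, not the library's; the geographic version replaces the linear interpolation with a direct-geodesic step on the ellipsoid):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Emit a point every `step` units of length along the linestring.
std::vector<Pt> line_interpolate_points(const std::vector<Pt>& line, double step)
{
    std::vector<Pt> out;
    double walked = 0.0;   // length of segments fully consumed so far
    double target = step;  // arc-length position of the next output point
    for (std::size_t i = 0; i + 1 < line.size(); ++i) {
        const Pt a = line[i], b = line[i + 1];
        const double len = std::hypot(b.x - a.x, b.y - a.y);
        while (target <= walked + len) {
            const double t = (target - walked) / len; // fraction along segment
            out.push_back({ a.x + t * (b.x - a.x), a.y + t * (b.y - a.y) });
            target += step;
        }
        walked += len;
    }
    return out;
}
```

For instance, interpolating every 3 units along the segment from (0,0) to (10,0) yields points at x = 3, 6 and 9; with the 500 km spacing of the example above, the same loop walks the linestring of cities.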
Here we compute this area, and you see the divergence is not that big. Let me go to a smaller polygon, a smaller example: the ULB campus, not this one but the other campus; it has a nicer shape, that's why I chose it. Again, I put the coordinates into a polygon and compute the area, parametrized with all the strategies I have. The results, again, are not very different. But if I compute the area of just one building inside the campus, I get a large divergence in the results. So it makes sense to have all these different strategies, and it also has to do with the distances involved: we saw that when the distances cover the whole globe you get large inaccuracies, but at very small distances you may also get large inaccuracies.

Let me put this in a more formal setting. There is a nice dataset available that contains one million geodesics on the WGS84 ellipsoid, computed with very high precision, so the results in this dataset can be considered exact, or very accurate. We can use it to benchmark how accurate and how fast the whole set of strategies we have is on the WGS84 ellipsoid. The results are collected on a page that I think I will continue to update, if you want to see more details; here I will show some graphs.

This one is about performance, for the whole set of strategies I presented before: Vincenty, Thomas and Andoyer, and the series approximation, where I took orders one, four and eight. You can see that Vincenty, Thomas and Andoyer are much, much faster than the series approximation. Comparing the three of them, roughly, Andoyer is two times faster than Thomas, which is two times faster than Vincenty. And in most cases this also follows the accuracy.
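Coming back to the area examples for a moment, here is some intuition for what a spherical area computation looks like: a Chamberlain–Duquette-style signed-area sum over the ring's edges. This is a rough approximation that is only good for polygons small compared to the globe (the ULB-campus regime rather than the Belgium regime), and it is my own illustrative sketch, not one of the library's strategies:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct LonLat { double lon_deg, lat_deg; };

// Approximate area of a polygon on a sphere, vertices in degrees,
// ring not closed (the last vertex connects back to the first).
// Signed-area sum in the style of Chamberlain & Duquette; accurate
// only for polygons that are small relative to the globe.
double spherical_area_m2(const std::vector<LonLat>& ring)
{
    const double R = 6371000.0;                 // mean Earth radius, meters
    const double rad = std::acos(-1.0) / 180.0; // degrees -> radians
    double sum = 0.0;
    for (std::size_t i = 0; i < ring.size(); ++i) {
        const LonLat& p1 = ring[i];
        const LonLat& p2 = ring[(i + 1) % ring.size()];
        sum += (p2.lon_deg - p1.lon_deg) * rad
             * (2.0 + std::sin(p1.lat_deg * rad) + std::sin(p2.lat_deg * rad));
    }
    return std::fabs(sum) * R * R / 2.0;
}
```

A 0.001-degree square at the equator (roughly 111 m on a side) comes out near 12,400 square meters, matching the planar value; the ellipsoidal strategies discussed here refine exactly this kind of computation.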
So Andoyer is less accurate than Thomas, which is less accurate than Vincenty. But we can see a more detailed view of the accuracy, say for distance. This plot has a logarithmic scale on both axes; on the X axis we have the distance between the two points. This is a dataset of 100K pairs of points, and I compute the distance with all the strategies I have. Since I have the exact, or very accurate, result, I can also compute the error, which is on the Y axis. We can see that for small distances the series approximation, which is the least performant, is the most accurate; then comes Vincenty, followed by Thomas and Andoyer, which start out very inaccurate for small distances. Looking at this graph you can understand a bit why we get all these inaccuracies when we compute the area of the small building inside the ULB campus. When the distances become larger, for example when computing the area of Belgium, these formulas have somewhat better accuracy. Note that this is the absolute error in meters, not a relative one. At some point it changes and Vincenty becomes more accurate than the series approximation, and then we have Thomas and the low-order series approximation, and then Andoyer. And at the top right, because the distance is very large, we have nearly antipodal points. There, all of the methods apart from the series approximations, which you can see converge to a smaller error, give a very, very large error; they become highly inaccurate. For Vincenty this appears as a discontinuity, but for Andoyer and Thomas it also happens at very large distances that are not nearly antipodal. For example, Thomas here has a very large error even though the points are not nearly antipodal; you can simply have a large distance on the Earth. So what is the point of this graph?
I think it is useful if you have a specific application or problem and you know more or less what area, what distances, you expect to compute: then you can just choose the right strategy with the right error. By looking at this you can also make a trade-off between accuracy and performance.

I have one more graph and then I will finish; I am okay with the time. This one is about computing the azimuth: I take two points and I compute the azimuth in degrees. Again, the picture is similar: for small distances I get high inaccuracies, but at some point, around ten meters, all the methods somehow stabilize and you get the expected accuracy. Andoyer, which corresponds to a first-order series approximation, comes first; then Thomas, with a second-order series, is a bit more accurate; and, interestingly, the Vincenty iterative method becomes very accurate for large distances. The series approximation is a bit less accurate there, but it is robust: you get the same small error even for nearly antipodal points, where Vincenty gives a very inaccurate result.

And this is the last graph, with area. Again I take the points, I know the exact area, and I compute it with all these methods. It is a similar situation: at small distances you have large errors with the approximate methods. There is an extra method here, with projection. I don't know if it is the best projection; I used an equal-area projection, which, as expected, is very accurate at small distances: you project and compute the area in Cartesian coordinates. If you have a small area, the projection is accurate, so you get a good result. But at larger distances the projection is expected to be very inaccurate, and the rest behaves as in the previous cases.
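The azimuth being benchmarked here is the initial bearing of the geodesic, which on the sphere has a standard closed form; the ellipsoidal strategies compute a refined version of the same quantity. A self-contained sketch (function name mine):

```cpp
#include <cmath>

// Initial bearing (forward azimuth) from point 1 to point 2 on a sphere,
// inputs in degrees, result in degrees clockwise from north in [0, 360).
double initial_azimuth_deg(double lat1, double lon1, double lat2, double lon2)
{
    const double rad = std::acos(-1.0) / 180.0;
    const double phi1 = lat1 * rad, phi2 = lat2 * rad;
    const double dlon = (lon2 - lon1) * rad;
    const double y = std::sin(dlon) * std::cos(phi2);
    const double x = std::cos(phi1) * std::sin(phi2)
                   - std::sin(phi1) * std::cos(phi2) * std::cos(dlon);
    double az = std::atan2(y, x) / rad;  // in (-180, 180]
    return az < 0.0 ? az + 360.0 : az;   // normalize to [0, 360)
}
```

For example, from (0, 0) to a point due east on the equator this gives 90 degrees, and to a point due north on the same meridian it gives 0. Small input perturbations move the result a lot when the two points are very close, which is one way to see why all the methods are shaky below ten meters.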
So for large distances you have a sequence of approximations, starting with the first-order series approximation and Andoyer, then Thomas, then the higher-order series approximation, and then Vincenty, which works really well between these two regimes. These are large distances of many kilometers, and again for antipodal points you will get high inaccuracies. I think that is the message: for me at least this is useful when you want some prediction of what inaccuracies to expect. Distance is the core problem underlying all the algorithms I showed, line interpolation, azimuth computation, area computation, so this can be used as an indication of what errors to expect in your computation, and also of the performance. Okay, I think I'm done.

[Question] Ten meters times ten meters gives 100 square meters, and with some methods I get an error of more than 1000?

So the question is about the error at around 100 meters of distance. Yes, this means that in this regime these methods basically compute rubbish. These are approximate formulas; Andoyer and Thomas are formulas from the '70s. But they are interesting because they are very fast: four times faster than the iterative method that most people use in GIS, and more than 30 times faster than the series approximation, which is the GeographicLib implementation used, for example, by PostGIS. If you use these methods in these cases, yes, of course you will get very bad results, essentially random. What does make sense is that if your distances are large enough, but not covering the whole Earth, you get a good approximation.

[Question] Two of those curves drop, improve in accuracy, at the same point. Is that a coincidence, or are the methods related?

Yes, around there, there is an issue with the dataset; I hid this. The dataset has one million points in total. It has 100K randomly, uniformly distributed points on the globe.
There are 100K with small distances, and then there are corner cases. I took the 100K small distances, which are less than one kilometer, and the rest have more than 100 kilometers, so there is a gap. Of course it would be interesting to extend this dataset with more data, but I think it is a useful dataset: it is very accurate, though yes, you could improve it. I don't know.

[Question about 3D] You mean 3D geographic? I don't know; I don't know of any implementation that does 3D geographic. Maybe GeographicLib has something, but I think not. And it is a matter of definition: how do you define it? These are geodesics, the shortest path between two points, which follows a differential equation on the spheroid, the ellipsoid of revolution. For 3D you would need a different model: you would have altitude, but you would also have mountains, and then you would have to define some minimal curve that minimizes something; it sounds very exotic. Maybe you could do it in plain 3D Cartesian, not geographic, as some approximation.

[Question about the series orders] Yes, the series order goes from 1 to 8, so you can take 1, 2, 3, 4 and so on; in practice order 4 is enough. Beyond that you gain only very small accuracy improvements that are not visible there; nanometers. This implementation follows GeographicLib: the student implemented in Boost.Geometry the algorithm that was implemented in GeographicLib. It is very accurate but also very slow, so yes, it makes sense to just use the fourth order. I'm sure one could play with the math and create some hybrid method; we can work on this. Do you know of any other algorithms? Does that answer your question?

[Question] For two polygons in geographic coordinates, can you compute the union?
You mean as the implementation is now, or whether it is possible? As it is now, union I don't know about; intersection you can do.

[Question] Given that graph, I would like to select the strategy that gives me a good enough result. Say I want at most a 1%, 2% or 10% error: I want the area of this polygon, whatever size it is, big or small, with a manageable error. Is that something you could select automatically, algorithmically?

Yes, I guess. First of all, you would have to redo the graph with relative error, since this one shows absolute error, but that is easy. Then you just pick some thresholds and inline them in the algorithm; that would be an easy way to do it. There may be an issue, though, because these are randomly distributed points on the Earth; maybe for specific places, close to the poles or close to the equator, you would have to do some case analysis. You do get a guarantee, because the graph shows the maximum error, but you might get a better result if you are, for example, in Europe, because the maximum error shown here maybe occurs at the poles. Still, you get the guarantee, as far as the 100K points distributed on the Earth are representative. Yes, this needs more investigation.

[Question about the exact values] So the question is about the exact values and how they model the Earth. The exact values are given in this dataset, and there is a paper that describes exactly how they were computed: they use GeographicLib with high-precision arithmetic, so the results are expected to be very, very accurate. They ran the code on this dataset with very accurate arithmetic and published the results.
There is also some analysis of the theoretical part, but in practice they use high-precision arithmetic, for which the performance is very slow; it was a large experiment. They computed values very close to exact, and that gives you the dataset.

[Question] Can I have a different ellipsoid, a user-defined ellipsoid? Then you have to do the experiment again, with different thresholds?

Yes, with a different ellipsoid you would have to rerun the whole experiment, effectively creating a new dataset. The input points could stay the same, the longitudes and latitudes, but the values would be different because you changed the constants of the ellipsoid, so you would have to compute a new dataset. Automating this sounds like a nightmare, but for very well-known, highly used ellipsoids you could do it, derive some thresholds, and that could make sense.
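The automatic selection discussed in the questions could look something like the following sketch. Everything here is hypothetical: the numeric thresholds are invented placeholders (a real implementation would read them off the relative-error graphs for the chosen ellipsoid), and the returned names are just labels for the strategies discussed in the talk.

```cpp
#include <string>

// Hypothetical strategy picker based on the benchmark discussion above.
// The thresholds are made-up placeholders, NOT values derived from the
// actual error graphs; a real picker would be calibrated per ellipsoid.
std::string pick_distance_strategy(double expected_distance_m,
                                   double max_abs_error_m)
{
    if (expected_distance_m < 10.0)
        return "series";    // very short range: series approximation dominates
    if (max_abs_error_m >= 1.0)
        return "andoyer";   // fastest; meter-level errors acceptable
    if (max_abs_error_m >= 1e-3)
        return "thomas";    // middle ground in speed and accuracy
    return "vincenty";      // tightest tolerance among the fast methods
}
```

The point of the sketch is the shape of the decision, not the numbers: once the relative-error graphs exist, the thresholds become data, and the guarantee is only as good as the dataset the graphs were computed from.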