Good morning, I'm Adam and I work at Oracle. Just a second. I love biology. I can't start yet, but... that is basically my introduction. So, I love biology, and I love algorithms and data structures, which are then used by MySQL to implement geometrical computations, when you call the SQL/MM or SFA features, ST_Intersects or something like that. Internally, MySQL calls Boost.Geometry functions. Just that. Boxes and points. Polygons are 2D. I think the library's interface would allow it to be extended that way, but we're not there yet. Good morning, I'm Adam. Good morning. This is the Boost.Geometry R-tree.

For those of you who don't know, Boost.Geometry is a C++ library, part of the Boost C++ Libraries. We provide primitives, algorithms, and a spatial index for you to use in your C++ code, and the library is used by MySQL for its GIS features.

The R-tree is a self-balancing tree which is used for spatial searching and k-nearest-neighbour (kNN) searching. There are various balancing algorithms, that is, various ways the R-tree can be created. When you insert elements one by one, the tree balances itself. There is also a class of algorithms used to create the R-tree from many elements at once, called packing or bulk-insertion algorithms.

The specific features of the Boost.Geometry implementation are listed on the slide. The class is in the boost::geometry::index namespace. It allows you to store your own user-defined types. By default it supports any type adapted to the point, box or segment concept of Boost.Geometry, and it also supports pairs and tuples. There are three balancing algorithms; I'll show them later. The size of a node is defined by the number of objects stored in a node, as opposed to, say, database implementations of the R-tree, where the size of a node is defined by the page size. And it allows you to perform advanced queries.
I'll also show later what that means, as well as iterative queries.

Here are examples of the internal structures of R-trees created in different ways. Linear, quadratic and R*-tree are the versions created using the balancing algorithms, and packing is the packing algorithm we use, a variant of the STR (Sort-Tile-Recursive) algorithm. As you can see, the internal structures differ, and some of them will be better for searching than others: in some there is more overlap of the nodes, some have big nodes, and some have smaller nodes. Less overlap, fewer nodes and smaller nodes are better for searching later. But, as you can expect, it takes more time to create a better structure.

So these are the times. In general, the more time you spend on creation, the less time you spend on searching. So there's a trade-off, and depending on your application you have to choose one or the other. The tree created with the packing algorithm is the best in all situations, but on the other hand you have to know all of the elements up front, before creation.

There is also one more decision you have to make before creating the R-tree: the size of the nodes, that is, the number of elements stored in a node. That choice will be influenced by your knowledge of the domain you're working in, and specifically of the kind of data you're working with, because the tree behaves differently for overlapping and non-overlapping elements. Here are the creation times for different numbers of elements stored in a node. Nothing very interesting here, except maybe that creation takes longer for all algorithms when there is overlap. But if you look at the search times, there is something interesting: for overlapping elements, if you store too few elements in a node, the search time increases.
So the more overlap you have in your data, the bigger the nodes should be, basically. We can talk after the talk about why that is. By overlapping I mean physically overlapping: in 2D, for instance, if you have big objects overlapping each other, or if you have objects in higher dimensions, where there is a higher probability that they will overlap.

OK, now a few examples. I'm using the data from this website. Here are the includes and some namespaces I'll be using in the code. The second include is needed only if you clone the repository from GitHub, because it's part of the code we call extensions, and it's used only for loading a shapefile.

These are the basic definitions of the types I'll be using. In Boost.Geometry, the coordinate system of the data is, by default, part of the point type. So here we can create types capable of storing Cartesian points, spherical points and geographic points, and I'll be using the last one in the examples. There are other ways to affect the algorithms, and also to use different spheroids and so on, but for the time being let's assume that when I'm using geographic points it's WGS84 and I'm calculating on the surface of the ellipsoid. So we are not doing any projections; we stay on the surface of the ellipsoid, which is good for a class of global problems where you cannot choose the best projection. You could approximate the globe with a sphere, but here I'm showing the geographic case, how to deal with geographic data.

For loading the data I'm showing, you will need the extensions, but you can do it however you like: you can write your own shapefile importer, fill in the data yourself, whatever you like.

The first example is to find points which are near another set of points. This is the data from this website.
One set is populated places, each defined as a point representing a place where at least 1,000 people live, or something like that, and the green dots are airports. So I'm searching for airports that are near the populated places, within a small area around each place, and by small area I mean a box.

Here is a naive example, which has quadratic complexity. I'm iterating over all of the places, then over all of the airports, checking whether an airport is covered by the small box I created, and then printing the result. I'll show the timings for the other algorithm later, but as you may expect with quadratic complexity, we can do better.

For this purpose we create an R-tree storing points. This will be the R* variant storing four elements per node, and I'm calling insert for all of the airports, which forces the R-tree balancing algorithm instead of packing, because I'm inserting elements one by one. Then I'm iterating over all of the places and calling the query function of the R-tree with a covered_by predicate, taking the result through an output iterator, which in this case is a back insert iterator.

But we can do better, because we know the data up front. So I'm creating the same R-tree but using the packing algorithm, and for that I pass the iterators of the range into the constructor of the R-tree. Now we have an R-tree created with the packing algorithm, which should be faster. Then I'm using query iterators instead of the query function; they lazily iterate over the results, so we can, for instance, stop at some point if we like.
And if you prefer working with ranges instead of iterators, the R-tree also supports that. I'm just passing a range of elements, which are then stored in the R-tree using the packing algorithm, and instead of a standard for loop with iterators I'm using a range-based for loop. For that I'm creating a queried range using a range adaptor, similar to the Boost.Range adaptors. So what we have now is basically a range representing a pair of iterators, just as on the previous slide. And here is my result: pairs of points which are close to each other.

The next example may be more interesting. Now I'm a travelling salesman and I want to go through all of the airports in the world as fast as possible. A classic problem again. This is my initial preparation: I have the airports, I'll be storing the result in a linestring containing geographic points, and I also need a helper structure to store flags saying whether or not I have visited an airport.

Here is one of the possible solutions, the classic one, called the greedy heuristic. Basically, I choose the airport closest to a starting point, from that one I choose the closest airport again, and again, and again, until I have visited all of the airports. So, until I have all of the airports, I check all of the airports, take the ones which are not visited, calculate the distances, and pick the smallest one; when I have the closest one, I add it to the solution.

But again, this algorithm has quadratic complexity, so we can do better, and here I'm using the R-tree for that. I'm storing pairs of my point and an index; I need the index in order to access the visited flags. Storing a pair works by default. I could create a vector of pairs first, but I decided to do it on the fly using Boost.Range adaptors: first indexing the airports with an index starting from zero, then transforming the range of indexed airports into a range of pairs. This is what you can do with Boost.Range adaptors.

So I have all of the data I need in my R-tree, and then I do the same as before: until I have all of the points, I perform a query, but this time I'm not traversing all of the airports to search for the closest point; I'm using the R-tree for that. I pass the nearest predicate, because I need one nearest point, which also satisfies the condition that it has not been visited. I expect one result, so as the output iterator I simply pass a pointer. Then I put the result into the route linestring and set the flag, and that's it.

And this is my result. The travelling salesman went through all of the airports, at some point crossing the antimeridian, the date line, so this looks good, and the whole route is about 400,000 km long. And this is a benchmark of both algorithms: the one with quadratic complexity is gray, and as you can see, the solution using the R-tree performs better. That's it from me, thank you.

We have some time left, let's say about three minutes for questions. Any questions that you want to throw at him? Do you have anything you want to ask the crowd? What do you want from them, how can they help you?

Probably to check out Boost.Geometry and try to do something interesting with it. If you have a problem to solve, just consider trying Boost.Geometry, and then maybe share feedback.

There is one question, or two questions. Could you speak up a little bit, please? So we are talking about this one. OK. Yes, so the objects representing the internal nodes of the tree are in fact boxes, or rectangles. However, in the leaf nodes, at the lowest level, I can store various things, and I've chosen to allow storing simple geometries like points, boxes or segments, because otherwise you'd be forced to, for instance, represent a point as a box, and then you basically have a duplication of data. This is why it is possible in Boost.Geometry to store just points, and the tree is parameterized: it allows you to pass the value type at compile time as a template argument. Sorry? No, no, no, it's only a matter of storing data in the leaf nodes. We can talk about it more, because unfortunately