While the presentation is being set up, I can already tell you what this is about. Unlike the previous presentations today, I'm not going to present an application. I'm presenting a library that can underpin applications. It already underpins my own work, but so far I am the only user, and what I'm looking for here is more users. Ah, that's perfect. My name is Alex Hagen-Zanker. I'm a lecturer at the University of Surrey, and I do research on land-use and transport systems. Part of my research means working with rasters and doing creative things with them, and to do my creative things, I need a library. It is an OSGeo community project. That means it is endorsed by OSGeo and listed on their website, which is nice publicity for me, but it also means they judged it good enough to have on their website. It is a C++ library for map algebra, so let me first say what map algebra is, or what I mean by it. It is a conceptual model used in geographic information science to classify the different operations that we perform on spatial data in general, and on raster data in particular. There are three or four main types of operations. First, local operations: we have two layers and combine them into a new layer, and all we do to get that new layer is look at the same location in each input layer and apply some operation. So you combine the two layers location by location, or cell by cell. Then there are focal operations, where the new layer you calculate is not just a function of the other layers at the same location.
It is also a function of the cells, or pixels, that you find in the neighborhood, the proximity, of the cell you are calculating. So here I calculate the sum over this neighborhood of five cells, with values 1, 2, 3, 5 and 2. I add them all up, and my outcome is the value 13. That is a focal operation: there is the window and there is the focus. It is also called a kernel operation. And then there are zonal operations, which are basic: if you have a zone map, you calculate statistics for the different zones. It is the focal operations that are computationally the most interesting; for the other types you simply go over your data cell by cell. This may sound pretty basic, and it actually is pretty basic. When I started working with this, I thought there would be many libraries that simply do what you want to do. And there are many libraries, and they all support map algebra in some way; besides libraries, there are also tools, scripting languages and so on. There are libraries for dealing with raster data. There are image processing libraries. There are linear algebra libraries, because what is a raster other than a matrix? There are scripting languages, such as R with its raster package, and so on. And of course there are your QGIS and your SAGA GIS and so on. All of these do something, but none of them do everything. So what do I have in mind when I say I want one that does everything? Well, I'm thinking about users like me, people working in geocomputation who do not like to be constrained by the specific tools and methods that are available: if they have something in mind related to map algebra, they want to be able to actually implement it. Therefore, my most important requirement for the library is that it is customizable, and that is the main thing lacking in practically all existing tools. I want to be able to have a local operation that is exactly the function I want it to be, not just the sum or the sine or the cosine.
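The three families of operations described above can be made concrete with a minimal sketch on a plain in-memory grid. The `Grid` type and the functions `local_op`, `focal_sum5` and `zonal_sum` are hypothetical names for illustration only, not the library's API. The five-cell neighborhood is assumed to be the cell itself plus its four direct neighbors, which matches the example with values 1, 2, 3, 5 and 2.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical in-memory grid (row-major); not the library's raster type.
struct Grid {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> cells;  // rows * cols values, stored row by row

    int at(std::size_t r, std::size_t c) const { return cells[r * cols + c]; }
};

// Local operation: each output cell depends only on the input cells at the
// same location. Any binary function can be supplied, not just + or sin.
template <class F>
Grid local_op(const Grid& a, const Grid& b, F f) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Grid out{a.rows, a.cols, std::vector<int>(a.cells.size())};
    for (std::size_t i = 0; i < a.cells.size(); ++i) {
        out.cells[i] = f(a.cells[i], b.cells[i]);
    }
    return out;
}

// Focal operation: sum over the five-cell neighborhood (the cell plus its
// four direct neighbors). Neighbors outside the grid are ignored.
int focal_sum5(const Grid& g, std::size_t r, std::size_t c) {
    int sum = g.at(r, c);
    if (r > 0) sum += g.at(r - 1, c);
    if (r + 1 < g.rows) sum += g.at(r + 1, c);
    if (c > 0) sum += g.at(r, c - 1);
    if (c + 1 < g.cols) sum += g.at(r, c + 1);
    return sum;
}

// Zonal operation: one statistic (here the sum) per zone id in a zone map.
std::map<int, int> zonal_sum(const Grid& values, const Grid& zones) {
    assert(values.cells.size() == zones.cells.size());
    std::map<int, int> totals;
    for (std::size_t i = 0; i < values.cells.size(); ++i) {
        totals[zones.cells[i]] += values.cells[i];
    }
    return totals;
}
```

Consider the 3 by 3 grid `{0, 2, 0, 1, 3, 5, 0, 2, 0}`. Its centre cell holds 3 and its neighbors hold 2, 1, 5 and 2, so `focal_sum5(g, 1, 1)` gives 13. Because `local_op` takes any callable, the operation is whatever function you pass in. That is the customizability asked for above.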
I want to write my own function and then apply it to a bunch of layers. If I use a focal window, I don't just want to calculate the mean or the variability; I want the focal window to calculate exactly the statistic that I want. Again, that is not easily possible at the moment. Of course it is possible, otherwise I couldn't have written the library, but it takes a lot of work. It needs to be scalable. It shouldn't just work with small datasets, because our raster datasets are often very large: we might have a raster dataset of Europe on a 200-meter grid, and it still needs to work. That requires some kind of buffering, because you cannot keep all of the cell values in memory. It needs to be efficient, of course. It needs to be flexible: there are many different types of raster data, different formats and so on, and I want to be able to work with all of them. It needs to be usable. For me, being usable means that if I use it from a programming language like C++, I want to work with it using the C++ language itself, not some kind of invented syntax; I want to use the features of the language to work with my data, so it needs to feel natural. And it needs to be compatible. I don't want a tool that I can only use standalone; I want to use my map algebra operations in conjunction with other analyses, because they might be part of a model, an optimization routine or anything else. Those are my requirements. The first step is to come up with the right concepts to use in the library. I base my concepts on one that is currently very popular in C++: the range concept. There is already the Boost.Range library, and there is an official ranges proposal, which is going to be part of C++20. You can find it online at that website. A range is a very simple thing. It's a bit like a container, where you have a bunch of values.
You can go from the first one to the last one. The only difference between a container and a range is that a range says: actually, you don't need to have all those values in memory. All you need is a mechanism to go from the first value to the last. You might go in incremental steps, you might generate random numbers, or you might be a container and have all the values stored on disk or in memory. So that is the difference between a range and a container. Then there is the concept of the range view, a special range that you can copy at constant cost. So when you copy it, you don't need to copy all of the data values: either it holds a reference to the data values, or there are no data values to begin with. My variation on the range is the raster. It is a range with some additional information: you can go from the first value all the way to the last value, but it is organized in rows, and every row has columns. So a raster is just a range of which you know how many rows and how many columns there are. Then I have one additional requirement: if I have such a raster, I should also be able to get a sub-raster of it. So if I have a raster of 100 by 100 cells, I need to be able to access just a block of 20 by 15 cells within that big raster. That is the only additional requirement of my concept. And then there is the raster view, which can be copied in constant time. So in terms of C++ requirements, this is what we need. To be a range, our objects need two member functions: begin, to find the beginning of the range, and end, to find its end, and for all of the values in between you have an iterator that goes over them. If it is a sized range, it also needs a member function size. If it is a view, you need to be able to copy it in constant time.
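The distinction between a container, a range and a view can be sketched in a few lines of C++. The names `counting_range`, `vector_view` and `sum` are hypothetical and not taken from the library. The first range stores no values at all; it only knows how to produce the next one. The view refers to values owned by a `std::vector`, so copying it costs a pointer and a length, however many values there are. Both provide begin, end and size, so one generic function can iterate over a container, a generated range or a view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; not the library's API.

// A range that stores no values: it produces first, first+1, ..., last-1.
class counting_range {
public:
    class iterator {
    public:
        explicit iterator(int v) : v_(v) {}
        int operator*() const { return v_; }
        iterator& operator++() { ++v_; return *this; }
        bool operator!=(const iterator& o) const { return v_ != o.v_; }
    private:
        int v_;
    };

    counting_range(int first, int last) : first_(first), last_(last) { assert(first <= last); }
    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }

private:
    int first_;
    int last_;
};

// A view: refers to values owned elsewhere, so a copy is a pointer plus a
// length (constant time), independent of how many values there are.
template <class T>
class vector_view {
public:
    explicit vector_view(const std::vector<T>& v) : data_(v.data()), n_(v.size()) {}
    const T* begin() const { return data_; }
    const T* end() const { return data_ + n_; }
    std::size_t size() const { return n_; }
private:
    const T* data_;
    std::size_t n_;
};

// Works for anything with begin()/end(): container, generated range or view.
template <class Range>
long long sum(const Range& r) {
    long long total = 0;
    for (auto v : r) total += v;
    return total;
}
```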
If you want it to be a raster, you additionally need to be able to ask it how many rows it has and how many columns it has, and to ask it for a sub-raster of itself. Those are all the requirements. If I write a class that needs to act like a raster, that is, it needs to conform to the raster concept, these are my main requirements; if I implement them, then whatever the library does with rasters, I can do with that class. Oh boy, it doesn't all look great, but I think the main thing is there. So this is how it works. You can simply open a raster file. Opening the raster file is not something my library does; another library, GDAL, does that for me. Once the file is open, my library adds the raster interface to it, and then I can use it like a raster. So I can write a very simple range-based for loop that goes over all of the elements in my raster and adds them together. Simple as that. Likewise, I can open a file and then say: go over only these 10 rows and 20 columns. The cost of this little program, which opens a big dataset and then iterates over only a small part of it, is proportional only to that small part, so you don't pay the cost of the big dataset. Okay, so that's a raster; now let's look at local operations. With local operations, you've got a bunch of rasters, layers on top of each other, and you apply operations to them cell by cell. When I add two layers together, rather than immediately calculating the sum for all of the cells, all of the pixels, the library produces an expression template that postpones calculating the sum and only starts calculating at the moment you iterate over the values. So the expression template that results from an operation on a raster is very cheap to produce, because it is just the instructions to calculate something; it is not actually calculating anything.
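The raster additions to the range requirements (rows, cols and sub_raster) can be illustrated with a small in-memory view. This is a sketch, not the library's raster class. The real library gets its data through GDAL, whereas `raster_view` here is a hypothetical type that refers to row-major values already in memory. Taking a sub-raster only moves the origin and shrinks the dimensions, so no cells are copied. Iterating over it touches only the cells inside it, which mirrors the cost argument above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical raster view over row-major data owned elsewhere (not the
// library's class). Range part: begin/end/size. Raster part: rows/cols/
// sub_raster. Copying it copies a pointer and three integers.
template <class T>
class raster_view {
public:
    raster_view(const T* origin, std::size_t rows, std::size_t cols, std::size_t stride)
        : origin_(origin), rows_(rows), cols_(cols), stride_(stride) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return rows_ * cols_; }

    // Constant time: a new origin and new dimensions, no cells are copied.
    raster_view sub_raster(std::size_t first_row, std::size_t first_col,
                           std::size_t rows, std::size_t cols) const {
        assert(first_row + rows <= rows_ && first_col + cols <= cols_);
        return raster_view(origin_ + first_row * stride_ + first_col, rows, cols, stride_);
    }

    // Visits the cells of this view row by row, skipping cells outside it.
    class iterator {
    public:
        iterator(const raster_view* v, std::size_t index) : v_(v), index_(index) {}
        const T& operator*() const {
            return v_->origin_[(index_ / v_->cols_) * v_->stride_ + index_ % v_->cols_];
        }
        iterator& operator++() { ++index_; return *this; }
        bool operator!=(const iterator& o) const { return index_ != o.index_; }
    private:
        const raster_view* v_;
        std::size_t index_;  // position within this view, counted row by row
    };

    iterator begin() const { return iterator(this, 0); }
    iterator end() const { return iterator(this, size()); }

private:
    const T* origin_;
    std::size_t rows_;
    std::size_t cols_;
    std::size_t stride_;  // number of cells per row in the underlying data
};
```

Take a 4 by 5 raster holding the values 0 to 19. The call `r.sub_raster(1, 2, 2, 3)` covers the values 7, 8, 9, 12, 13 and 14, and a range-based for loop over it visits only those six cells.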
Only at the point where you iterate over the values are they calculated. That has benefits: you can add two rasters together, apply another function to some other rasters and yet another to some more, and all of that is very cheap. Then you can combine them with another function and iterate over the result, so you have a whole compound raster algebra expression for which you iterate over your values only once, without producing any intermediate values. So you don't need to create raster dataset after raster dataset just to store intermediate values. There are no intermediate values; everything is resolved at the moment you iterate over the values. That makes it more efficient, and also more robust, because you don't need to write to disk or anything like that. The library has a very generic transform function, to which you can give any function and apply it to any number of rasters. Here is an example: open two datasets, A and B, and apply the transform function with the function plus to A and B. That line is cheap; it costs practically nothing. Then iterate over the sum. Only at the moment you iterate over it do you incur computational cost, and because the example uses a sub-raster, you only iterate over part of the raster, so you only pay for adding the pixels that you are actually looking at, not the whole raster. Sorry, I touched something and the slide has flipped; rather than lose too much time, I'll leave it like this and just talk. What this example shows is that you can combine two transforms.
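A stripped-down version of the lazy-evaluation idea looks like the sketch below. The names `lazy_transform`, `transform1`, `transform2` and `span_view` are hypothetical. Unlike the library's generic transform, which accepts any number of rasters, this sketch handles only one or two inputs. Creating a transform only stores the inputs and the function. The function runs only when an iterator is dereferenced, so the minus and the squaring are fused into one pass with no intermediate array.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical names; a sketch of expression templates, not the library's API.

// Non-owning view over contiguous values: copying it is constant time.
template <class T>
struct span_view {
    const T* first;
    const T* last;
    const T* begin() const { return first; }
    const T* end() const { return last; }
};

template <class T>
span_view<T> view(const std::vector<T>& v) { return {v.data(), v.data() + v.size()}; }

// Lazy unary transform: f is only called when an iterator is dereferenced.
template <class F, class R>
class transform1 {
public:
    transform1(F f, R r) : f_(std::move(f)), r_(std::move(r)) {}
    class iterator {
    public:
        using base = decltype(std::declval<const R&>().begin());
        iterator(const F* f, base it) : f_(f), it_(it) {}
        auto operator*() const { return (*f_)(*it_); }
        iterator& operator++() { ++it_; return *this; }
        bool operator!=(const iterator& o) const { return it_ != o.it_; }
    private:
        const F* f_;
        base it_;
    };
    iterator begin() const { return iterator(&f_, r_.begin()); }
    iterator end() const { return iterator(&f_, r_.end()); }
private:
    F f_;
    R r_;
};

// Lazy binary transform over two ranges of equal length.
template <class F, class R1, class R2>
class transform2 {
public:
    transform2(F f, R1 a, R2 b) : f_(std::move(f)), a_(std::move(a)), b_(std::move(b)) {}
    class iterator {
    public:
        using base1 = decltype(std::declval<const R1&>().begin());
        using base2 = decltype(std::declval<const R2&>().begin());
        iterator(const F* f, base1 a, base2 b) : f_(f), a_(a), b_(b) {}
        auto operator*() const { return (*f_)(*a_, *b_); }
        iterator& operator++() { ++a_; ++b_; return *this; }
        // Equal lengths are assumed, so comparing the first input suffices.
        bool operator!=(const iterator& o) const { return a_ != o.a_; }
    private:
        const F* f_;
        base1 a_;
        base2 b_;
    };
    iterator begin() const { return iterator(&f_, a_.begin(), b_.begin()); }
    iterator end() const { return iterator(&f_, a_.end(), b_.end()); }
private:
    F f_;
    R1 a_;
    R2 b_;
};

template <class F, class R>
transform1<F, R> lazy_transform(F f, R r) {
    return transform1<F, R>(std::move(f), std::move(r));
}

template <class F, class R1, class R2>
transform2<F, R1, R2> lazy_transform(F f, R1 a, R2 b) {
    return transform2<F, R1, R2>(std::move(f), std::move(a), std::move(b));
}
```

Building a difference transform and then a squaring transform on top of it calls neither function. Only the loop over the outer transform does the work, once per cell. Each transform stores its inputs by value, which is why the inputs should be cheap-to-copy views rather than containers.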
The first transform does the minus, the second does the squaring, and you don't need intermediate results to combine those two operations. Writing out this transform operation can be quite cumbersome, but you can use it as a building block for other operations: you can overload operator minus and operator times, and then you can do arithmetic on whole rasters, cell by cell, with ordinary operators. This part I will skip. I just want to tell you that when working with raster data we often have missing values, and working with them can be cumbersome. In C++ we have the optional data type, and this library uses optional for missing values, so dealing with missing data becomes very logical: every function that produces missing data knows how to do so, and every function that can account for missing data knows how to recognize it. There are a number of functions that do some kind of spatial transformation and that do not copy data either, so again they are efficient by referring to data rather than copying it. One function is pad, which adds leading and trailing rows and columns to a dataset: you put a frame around it and make it a bit bigger. We've got sub-raster, which I already mentioned. We've got offset, which for every cell gives you the value of the cell a number of columns to the left or to the right. And you can iterate over pairs of adjacent cells. All of that is used when you do focal analysis, and one very popular form of focal analysis is the moving window, where a window moves over the map and you calculate a summary statistic for each position of the window. If you want to do that efficiently, you do it as a big combination of all of these expression templates, so that you don't need to create any intermediate values. That is possible because there are moving-window schemes where you say: if I've got my summary for one pixel, then I can get the summary for the next pixel by adding the cells to
the right and subtracting the cells to the left, and then I've got my next window and can move it again. So if you've got some kind of online statistic, you can incrementally move a window over the map. The plusses, the cells being added to the statistic, and the minuses, the cells being removed from it, move in exactly the same pattern as you move over the map, so all you need is a range for each of those offsets that does all of the adding. So you can build these moving windows from the simple offset function and the range view concept, and it will be relatively cheap. And if you separate the logic of the window moving over the raster from the statistic being calculated, you reach the ideal of being customizable: if I express my statistics in terms of adding and subtracting elements, or adding and subtracting groups of elements, then I can implement any statistic I'm interested in within this moving-window framework. Again, this is how it works: you open a map, you specify the window, you specify the indicator you want to use, and then you have a function that says, this is my moving window. All of that is cheap; it only gets expensive at the point where you iterate over that window, and again you can do that for a subset of the raster. I will skip this, and this too. One problem with all these concepts and templates in C++ is that you get very complex classes, composed of all kinds of things. You can use type erasure to make those complex classes simple again, so that the concrete classes only need to be resolved at run time. One big advantage that I am not exploiting yet, but plan to in the future, is the sub-raster idea: you can set up your expression for the whole raster and then evaluate it for only part of it. That is of course ideal for
parallel processing, because I can have my expression for my big raster and then say: do this block, then this block, then this block, and so on, all in parallel, so that it goes more quickly. Actually, it is already there; the only thing that still needs to be done is to make the rest of the library thread-safe. In particular, the part that accesses the raster data is not thread-safe at the moment, so it is not parallelized yet, but the architecture is there to do it. So what kind of applications do we have? I use it for cellular automata land-use modeling, for scale-dependent landscape analysis and for fuzzy set map comparison, and all of these can be done very efficiently thanks to, I would say, the clever design of the library. For the conclusions: I wanted it to be customizable, and it is, because you can use any function for local operations and any function for moving windows. I wanted it to be scalable, and it is, because it is built on GDAL and GDAL is scalable. I wanted it to be efficient, without temporary files, and it is, because of the expression templates. I wanted it to be flexible and usable, and I think all of that has been achieved. It is open source, as it should be, and you can find it on this website. As I said, I'm looking for users, so if you need any help getting it to work for your project, I'm happy to help. So if there are any questions, I'm happy to answer them. No questions? No, no, I didn't do that. There is potential for parallelizing, but I haven't actually done it.
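The incremental moving-window scheme described earlier, combined with optional values for missing data, can be sketched as follows. Several details are my own assumptions: the statistic interface (`add`, `remove`, `result`), the names `mean_statistic` and `moving_window`, and the square window clipped at the raster edges. The library itself builds the window from offset views and range views rather than the explicit loops used here. Each step to the right adds the column entering the window and removes the column leaving it. The cost per cell therefore grows with the window height rather than its area, and any statistic expressible as add and remove operations can be plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A cell value; std::nullopt marks a missing value.
using cell = std::optional<double>;

// Hypothetical online statistic: anything with add/remove/result can be
// plugged into moving_window. This one is the mean of the non-missing cells.
struct mean_statistic {
    double sum = 0.0;
    int count = 0;
    void add(const cell& v) { if (v) { sum += *v; ++count; } }
    void remove(const cell& v) { if (v) { sum -= *v; --count; } }
    cell result() const { return count > 0 ? cell(sum / count) : cell(); }
};

// Square window of half-width `radius`, clipped at the raster edges, over a
// row-major raster. Within each row the window slides to the right: the column
// entering on the right is added and the column leaving on the left is removed.
template <class Statistic>
std::vector<cell> moving_window(const std::vector<cell>& values, std::size_t rows,
                                std::size_t cols, std::size_t radius) {
    assert(values.size() == rows * cols);
    std::vector<cell> out(values.size());
    for (std::size_t r = 0; r < rows; ++r) {
        // Rows covered by the window for this output row (r1 is exclusive).
        const std::size_t r0 = r >= radius ? r - radius : 0;
        const std::size_t r1 = r + radius + 1 < rows ? r + radius + 1 : rows;
        Statistic stat;
        auto add_column = [&](std::size_t c) {
            for (std::size_t i = r0; i < r1; ++i) stat.add(values[i * cols + c]);
        };
        auto remove_column = [&](std::size_t c) {
            for (std::size_t i = r0; i < r1; ++i) stat.remove(values[i * cols + c]);
        };
        // Initial window for column 0 covers columns 0..radius.
        for (std::size_t c = 0; c <= radius && c < cols; ++c) add_column(c);
        for (std::size_t c = 0; c < cols; ++c) {
            if (c > 0) {
                if (c + radius < cols) add_column(c + radius);  // entering column
                if (c > radius) remove_column(c - radius - 1);  // leaving column
            }
            out[r * cols + c] = stat.result();
        }
    }
    return out;
}
```

Take the 3 by 3 grid `1 2 3 / 4 (missing) 6 / 7 8 9` with radius 1. The centre result is 40 / 8 = 5, because the missing cell is simply not counted. With radius 0, the missing cell yields an empty optional. For general floating-point data, repeatedly subtracting values can accumulate rounding error, which is a known trade-off of this add-and-remove scheme.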