things, but I'm the quick-and-dirty visualizer. I just want to see quick results; my time goes more into the data handling than into the proper visualization things. As Verenice mentioned in the information she sent around, the slides are all on GitHub. I'm going through an HTML version because I think that renders a little better; if you want me to do things live and change things, I can switch to RStudio and try it there, but I'm not really a fluent RStudio user. From the slides you can also go to the GitHub site, and on that site you will find stars.Rmd. You can load that into RStudio and click Knit, and then the whole thing should run, or you can run individual sections, and so on. So there's an R Markdown file that has everything in it. Right, so this is the way you could set it up. It might be best to use the development version of stars, which is easy to install: it's pure R and has no dependencies except for the sf package, and you can install it from GitHub. There is also a larger data package. You don't have to install it, but you can. It contains a couple of real-sized satellite images, including a Sentinel-2 image. The whole package is about a gigabyte, which is why I put it on a different site and not on CRAN. It's the kind of material we can use here, and you can throw it away after we're done. So what are data cubes? Data cubes are, in a way, more complex tables, and they have a long history. Let me just switch to this view. I think they go back to the so-called OLAP cubes, online analytical processing cubes, which appeared somewhere in the 80s or 90s, when people looked at tables and said: tables are fine, but quite often we have data arranged along multiple dimensions. The classical example is warehouse data.
You have different warehouses in different cities, a large number of products, and then you have sales numbers over time. That gives you three keys to index the data, and the data can be thought of as a cube. That's a literal cube of three dimensions, but we typically extend the idea to multidimensional array data in general, so n-dimensional cubes. Cubes can be two-dimensional or one-dimensional, but then, you know, why call it a cube? And they can be five- or ten-dimensional, although it will be hard to find those. So they are not strictly three-dimensional; they can be n-dimensional. Standard tables, as we know them, are basically one-dimensional cubes: they are arranged by row number or something like that, but they are not multidimensional. A lot of multidimensional data is, however, stored in one-dimensional tables, so we will look at how data frames relate to data cubes and go through examples. Raster datasets are specific cases of data cubes, but there are also vector data cubes, which we will run into. The stars package, which I started writing three or four years ago, was meant to provide infrastructure for data cubes, including raster data, that works well together with the sf package that nowadays a lot of people use for feature data, for vector geometries: points, lines, and polygons. So let's go to a very simple example, a socioeconomic dataset called Produc that is found in the package plm, for panel linear models. You see that these are panel data. Panel data are time series data collected on a set of subjects, which could be persons, companies, countries, states, and so on; that makes them multidimensional, and spatiotemporal, because we have several subjects, which are items in space, and a sequence of times at which data were collected. So here are the first few entries of these data.
You see that we have states, the 48 contiguous states of the United States, and we have years; these are the first six records, and the series runs, I think, over 17 years. Then we have a number of variables, including public capital, highways, water facilities, utilities, and things related to employment, so all kinds of socioeconomic indicators. We can, for instance, use ggplot2 to plot this public capital variable as a function of state and year. What we then get is a kind of raster image, an image in any case, with year on one axis and state on the other. So this is a spatiotemporal plot where we just have labels for states and numbers for years, but you know what I mean. What we then see is extremely boring, actually: we just see the differences between states. New York is very large, California too; this is basically the effect that larger states have more capital. We don't see much happening along the temporal axis; there might be a signal, but we don't see it, because the picture is dominated by the differences in size between states. So if you want to look at signals in these data, like where these states are going, you would want to normalize them: take out the size effect of the state, for instance by dividing each state's values by the average of its time series. We could do that; that is one way. Or we could subtract it; that's another. So what we do is convert it into a stars object, and that is basically a two-dimensional array with nothing special about it. You could also do it with a regular matrix; this adds very little compared to a matrix.
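Not the actual Produc plot, but a minimal base-R sketch of the same idea: toy long-form panel data (made-up state names and numbers), laid out as a state × year matrix with `xtabs` and displayed as an image, year on one axis and state on the other.

```r
# Toy panel data in long form: one row per (state, year) combination
df <- expand.grid(state = c("CA", "NY", "TX"), year = 1970:1986)
set.seed(1)
# Made-up "public capital" values with a strong per-state size effect
df$pcap <- rep(c(60, 90, 70), times = 17) + rnorm(nrow(df), sd = 2)

# Lay the long table out as a 3 x 17 matrix: states index rows, years columns
m <- xtabs(pcap ~ state + year, data = df)

# The spatiotemporal "raster" plot: the size effect dominates the colors
image(1970:1986, seq_len(nrow(m)), t(m), xlab = "year", ylab = "state")
```

Because each (state, year) pair occurs once, `xtabs` just reshapes rather than aggregates.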
It basically just adds the state names for the state dimension, and it says: I have 17 years, starting in 1970 and ending in 1986. So we have the different variables listed as attributes, and we have laid them out in a two-dimensional structure. The dimensions table at the bottom tells us that we have one dimension state and one dimension year, so we have two-dimensional data here. Next, we're going to apply a function over the levels of the first dimension; that means that we work along the second dimension. What we do here is take the values and divide them by their mean: we divide by the mean of each time series, and by that we essentially filter out the size effect of the state. We normalize by state, and we get an object that looks very similar, just with switched names, which doesn't really matter much. We can write that out by moving it into a data frame again, and then we see that all these variables are now normalized; they have different values. If we do the same plot again on this new data frame, we see that the overall colors of the states are similar, but some states go from light to dark, meaning that they have decreasing values over the 17-year period, and other states have increasing values, because they go from dark to light. So we see that there's a very different signal: different states follow different paths over the 17-year period, and we got there by essentially sweeping over one dimension of this dataset. Now, this still has very little to do with space and time, really.
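The divide-by-the-series-mean step that `st_apply` performs here can be sketched in base R on a toy state × year matrix (the numbers are made up): compute each row's mean and sweep it out, so the size effect disappears and only each state's relative path over time remains.

```r
# Toy state x year matrix: two "large" states, one "small" one
m <- rbind(NY = c(100, 110, 121),
           CA = c( 90,  99, 109),
           RI = c( 10,  11,  12))

# Mean of each state's time series (one value per row)
row_means <- apply(m, 1, mean)

# Divide every row by its own mean: the size effect is gone,
# what remains is each state's relative development over time
m_norm <- sweep(m, 1, row_means, "/")
round(m_norm, 2)
```

After normalization every row averages to 1, so the plot colors now reflect trends rather than state size.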
These are just numbers and labels; there's nothing geographic in it in the sense of state geometries. But instead of the state names, we can actually glue the state geometries onto it. I did that just half an hour ago, so this might not be in the slides you looked at earlier, but if you refresh them, you will see it. We can take the states from the package maps, which has a map("state") command. We have to throw out the District of Columbia because it's not in the Produc dataset; then we have to do a match, and we have to take care that in one dataset New York is written with an underscore and in the other with a space between the two words. Then we see that we have essentially an alphabetic ordering in both, which is very convenient; a lot of datasets do that, so we don't have to reorder one according to the other. We can just plug in the state geometries, and we get the same dataset, but instead of the state names we now have 48 things of class sfc, which are geometries: a sequence of 48 multi-polygon geometries, with coordinates referenced to WGS 84. We can then, for instance, take the first year as a slice; we get a dimension that runs from one to one, which we can drop, and then we just have all the variables for the first year and for all states. If we convert that into a simple features object and plot it, we get a set of maps showing the spatial variability for the first year. So this is a kind of time slice that we have now done.
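In stars terms, replacing a character dimension by geometries is done with `st_set_dimensions()`. A minimal sketch, assuming stars and sf are installed; the two square "state" polygons and the 2 × 3 array are made up:

```r
library(sf)
library(stars)

# Two made-up square "state" polygons
p1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
p2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
geoms <- st_sfc(p1, p2, crs = 4326)

# A 2 x 3 (state x year) array wrapped as a stars object
a <- array(1:6, c(2, 3), dimnames = list(state = c("A", "B"), year = 2000:2002))
s <- st_as_stars(a)

# Glue the geometries onto the state dimension: a vector data cube
s <- st_set_dimensions(s, "state", values = geoms)

# Slice the first year and drop the resulting length-one dimension
s1 <- adrop(s[, , 1])
```

Converting `s1` with `st_as_sf()` then gives one polygon per state, ready for plotting.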
Let me just see whether there are any questions in the chat; if there are, I don't see them, so if anyone sees a question, I need to be informed. I don't think there is any question yet. Good, okay, but please tell me when there is, because I don't get notified; I don't see them. Yeah, of course. So: region has no variability, the other variables have, and this is a time slice through this dataset where we see the variability of the different variables, the different attributes. Alternatively, we could pick one attribute and show the 17 maps of that attribute, how it develops over time; that would have been an alternative, possibly more exciting. In this view, all the variables have essentially been stretched individually, and they don't have a common legend. So this is already what I call a vector data cube, because it is vector data, and it has time, and we have variables that vary along both. An alternative is the dataset that we use a lot in the sf package, the North Carolina SIDS dataset, SIDS for sudden infant death syndrome. It is loaded like this, and we get a table with a geometry column that holds, for every record, the geometry of a county. It has a couple of variables: one dimension is the counties; another dimension is the variable, so which of the three variables is this about; and it has a third dimension, the years, of which only two are present in this dataset.
Yeah, so we can get that, and in this case, for the counties we again only have names, just like in the previous example. What we're going to do again is use the geometry from the original dataset to replace these hundred names with their respective geometries, and then we have a real spatial vector data cube, with the hundred multi-polygon geometries as the first dimension, and variable and year as the second and third dimensions. So we can do this plot: here we're applying the function sum over the first and second dimensions, which means we sum over the years. We iterate over all instances of the first and second dimensions, over every county and every variable, and sum the two years that we have; by that we reduce the year dimension to the sum over it. Then we can plot that, and we get the three variables, summed over the years, plotted by county. Of course this is squeezed into one common legend, and we see that the births are large while the SID values, the infant death counts, are very low, because this is, luckily, a very rare disease. But we still see some variability, because the color breaks are chosen by quantiles of the data, so even in these low numbers we have a little bit of variability. Now, these are not really very meaningful data to look at in these terms if you think of epidemiological effects, because when more people live in a particular county, we also expect more disease cases. So the pattern that we see is essentially a strong correlation between births, which are essentially a proxy for the population of a county, and the disease cases, and that is something that we want to get rid of, because it is
obvious. So we want to go from absolute numbers, which basically have population in them, to incidence rates. That is a standard thing to do in epidemiology: you take the counts of a disease, divide them by the population, and normalize that by the total counts over the total population. That last quantity is a fraction, the global fraction that we expect, and so we get a standardized incidence ratio that is larger than one when a county has more than the global average of incidences, and smaller than one when it has fewer. So the value one is, for this standardized incidence ratio, the reference of interest. We can get there by lowering the dimensionality: what we do is split, basically distributing the second dimension, which was the variable (birth, SID, non-white birth), over attributes. We then get a two-dimensional array that only has the counties and the years, and which has three attributes. So we have lowered the dimensionality of the array to these three attributes, and then we sum by category and get the totals of the births, the SIDs, and the non-white births. We can get the global incidence rate by taking the second of these, the total SIDs, divided by the total births; that gives us the denominator of the ratio. Now, for each of the counties, we can compute the other fraction: we apply over all dimensions, over each of the counties and each of the years, and take the ratio of the SIDs over the births, for every county and year. And then we divide that by
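The standardized incidence ratio arithmetic can be sketched in base R with made-up counts for three counties and two years: compute the global rate as total cases over total births, then divide each county-and-year rate by it.

```r
# Made-up counts: rows = counties, columns = years
births <- rbind(A = c(1000, 1100), B = c(500, 520), C = c(2000, 2100))
sids   <- rbind(A = c(   2,    3), B = c(  1,   1), C = c(   9,   8))

# Global (reference) rate: total cases over total births
global_rate <- sum(sids) / sum(births)

# Per county-and-year rate, divided by the global rate:
# SIR > 1 means above the state average, SIR < 1 below it
sir <- (sids / births) / global_rate
round(sir, 2)
```

A diverging color scale centered on 1, as in the talk, then shows at a glance which counties sit above or below the state average.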
the global mean, which is the reference value for the whole state of North Carolina, and then we plot that for the two years that come out. Here you see that the effect of population size has been largely reduced, if not completely cancelled out. For the color scale, I took a bit of effort to pick particular color breaks, because I wanted to have the value one as the middle break, and I wanted to use a diverging color scale: we get white for values near one, increasingly blue values as it gets lower than one, and increasingly red values as it gets larger than one. This is a case where you want to show two things: is it smaller or larger than one, and by how much. The red values are higher incidence ratios, the blue values are lower incidence ratios than the state average. So this is a case where we had a three-dimensional vector data cube and standardized it to standardized incidence ratios.
There is more to do with this, and I will briefly go there now. This link may stop working in a short while, because we are rewriting this book entirely; it's a book about spatial data science that I'm writing together with Roger Bivand, and it has a couple of other examples with vector data cubes. One is air quality stations: here you see the station number, here the time, and you see the distribution and intensity of the values measured over the different stations. We can compute means for stations, means by area, and so on, and we can take spatial slices, meaning time series of them. But the more interesting case is the one where we look at origin-destination data. For a set of regions (this is data from the UK, actually open data), we have, for every region, how many people live there and work in every other region. So this is an n-by-n relationship: for every pair of regions we know how many people commute between them, for instance how many go to the city center, and also their mode, whether they go by car, by bus, by train, by bike, or on foot. Here it is bicycle, foot, car driver, and train; those are the four options. So this is about movement data: where people move and how they move. We can do a couple of tricks, as you see here, with tidyverse machinery, and move this into an array, which is what I'm going to do: one dimension the origins, the other the destinations. So I'm going to make a full crossing, which gives a three-dimensional array with origin, destination, and mode, and you see here that we have the origins, which are geometries, the destinations, which are also geometries, and the modes,
so for bicycle, foot, car, and train, for every combination of origin and destination, we have the number of people travelling. Then there are a couple of manipulations: for instance, here I take a slice for a particular destination, number 33, which is the city center, and then we can see where people come from and what transportation they take. You can see that the people taking the bike don't come from so far, those who come by foot are really close, and the people who come by car come from much farther away. This is done by slicing: looking at the city center, destination 33, and you could do that for every destination. You can also compute all kinds of sums, and here we have different ways of looking at where people come from and where they go to. So this is a way of dealing with that kind of data. Now let me just go back to where I was. Right, so why do we do this? Aren't data frames, aren't tables good enough? Well, actually, you can do everything with data frames and with tables. The thing is that it takes a lot more organization, because you have to deal with these dimensions as variables in your data frame, and there are a couple of things that you then simply don't know: for instance, whether all combinations of the dimension levels are actually filled, because they might be empty, and when cells are empty, that might mean there's a zero, like in the origin-destination case, or an NA. If those records are simply not present in your dataset, you have to make implicit assumptions about them. So that is, in a certain respect, a disadvantage. Also, if you have very large data, the array approach may be more efficient, because you don't have to search: your data are automatically indexed, your
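The origin-destination cube and the destination-33 slice can be mimicked with a plain base-R array; the region count and commuter numbers below are made up, and the "city center" is simply taken to be region 3 of the toy example.

```r
set.seed(42)
n <- 5                                   # number of regions (made up)
modes <- c("bicycle", "foot", "car", "train")

# 3-D cube: origin x destination x mode, counts of commuters
od <- array(rpois(n * n * length(modes), lambda = 20),
            dim = c(n, n, length(modes)),
            dimnames = list(origin = 1:n, destination = 1:n, mode = modes))

# Slice one destination (region 3, playing the "city center"):
# an origin x mode matrix of where its workers come from, and how
center <- od[, 3, ]

colSums(center)      # commuters into region 3, by mode
apply(od, 3, sum)    # total trips by mode, over all destinations
```

The empty-cell question from the text is visible here too: a missing (origin, destination, mode) record in long form must become either a 0 or an NA when the array is filled.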
dimensions are your indexes, and you have more than one index on your data, basically space and time and other things that index your data. That is not the case when you have a table or a data frame; with a data.table you can of course create such indexes. So there might be reasons to do this, or not to do it. A lot of people nowadays say: if we have petabytes of data, we just throw them into Google BigQuery GIS and so on, we don't look back, and we just wait until the result is there, which might also work. Now we're moving to raster data cubes. The simplest example of a raster data cube is that of a single raster layer, a raster map. Anyone is probably familiar with the dichotomy in spatial data between vector data, where points, lines, and polygons are completely free in two-dimensional space and you can have infinitely exact coordinates for pointing out where something is, and, on the other hand, raster data, where we have values in a regular lattice, in image form: for all pixels we have values, but the pixels are laid out regularly. That is useful for certain things, for instance continuously varying phenomena like interpolated surfaces, or simply for observed images: if you take an image with your camera or your phone, or if you retrieve a satellite image, it is created on, or available on, a raster. The same holds for a lot of modelling output: if you look at weather data, you could look at weather stations, but a lot of weather data, like reanalysis data, so output of weather models, typically comes as raster data. Now, raster data is not a very simple thing. Here we read a raster file that ships with the package raster, actually with the command read_stars, and it's reading it
from the file system; it's a demo dataset that is available in the raster package. We get a summary of the attribute, with a lot of NAs, and we see that it's an 80 by 115 raster with a description of its coordinate reference system and a couple of other things: it tells you where the raster map starts, where the left edge of the raster is, what the cell size is, and, in this case, what the top y-coordinate of the map is. You see a negative delta, a negative cell size, which means that increasing rows means the y-coordinate decreases: if you go down, rows increase; this is how images are typically organized, but the y-coordinate goes up when you go north, so this minus sign indicates that a different direction is going on there. Here I plotted this map; it's an old dataset with a long history. Raster files are actually data cubes, because we have two dimensions, and it happens that both dimensions are regularly discretized. Both dimensions cut up space, but in a very different way than the origin-destination matrix did: there, origin and destination were both geometries that each span two dimensions; every polygon in the origin and in the destination dimension was two-dimensional. Here each dimension is one-dimensional: one says these are my columns, it goes over the x-coordinate; the other says these are my rows, they go over the y-coordinate. So for every combination of row and column, or x and y, we get a value. Vector data cubes, by contrast, have x-and-y space in every single dimension that is spatial, so they are very different. In this example we have a regular grid, meaning we have an offset and a delta: raster cells with a constant cell size along each of the dimensions, and it happens
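The regular-grid bookkeeping, going from an index to a cell-center coordinate with offset and delta, and back, is just arithmetic. A sketch in base R; the offset values are hypothetical, while the 40 m cell size and the negative y delta follow the example:

```r
# Hypothetical offsets; deltas as in the example: 40 m cells, negative for y
x_offset <- 178400; dx <-  40
y_offset <- 334000; dy <- -40     # negative: row index increases southwards

# Center coordinate of column i / row j (1-based indices)
cell_x <- function(i) x_offset + (i - 0.5) * dx
cell_y <- function(j) y_offset + (j - 0.5) * dy

# And the inverse: which column/row contains a given coordinate
col_of <- function(x) floor((x - x_offset) / dx) + 1
row_of <- function(y) floor((y - y_offset) / dy) + 1

cell_x(1)            # center of the first column: 178420
row_of(cell_y(10))   # round-trips back to row 10
```

The negative `dy` makes both formulas work unchanged for the top-down row ordering of images.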
even that the cell sizes are identical: they are both 40, so we have square cells, but that doesn't always have to be the case. If the grid is regular, then with offset and delta we can compute coordinates from indexes, and also go the other way around if we want to. As I discussed, the delta for y is often negative. And we have a coordinate reference system, which basically tells you how to understand these coordinate values when we want to combine them with other spatial data. I can convert this into an sf object simply by saying so, and I'm not going to remove the missing values that are here; you see missing values that are just not colored, but this is just to show you that this is a dense grid, that everything is there. I'm basically creating little square polygons: here you see them; I colored the border of each polygon gray and filled them with the attribute values, so I get a similar map, but now every cell is a little square polygon. You can do that with data of this size, when you have a couple of thousand cells; when you have a couple of million cells, you're not going to do this, unless you want to burn down your own computer. Quite often, raster data cubes have more than one layer. Layer is a concept that comes a little bit from the raster package, and it's questionable what it really means: whether it means something in a dimension, or different attributes; the raster package is not very explicit about that. In any case, if we read things with read_stars, if we read for instance a JPEG image, then we all know that JPEG images are, in the standard case, color images that contain three layers, in R, G, and B, and
we get that here; it's called band, and we see that we have three different bands and we can plot them. So we plot this 200 by 175 by 3 data cube and we get three images, one for each band: the x-y rasters for the red, the green, and the blue intensities, which, not surprisingly, pretty much run from 0 to 255, so this is byte data. You can then, for instance, compute color values from these RGB values with the command st_rgb, which creates color values, as you can see here; for some reason it's not shown here, but we know the R logo. This is a very inefficient way to handle larger datasets, we just discovered: if you have millions of pixels and you try to do this with ggplot, it's going to take a long, long time. So it's not very efficient, but anyway, it was an experiment. Now some more serious data: here is a dataset that comes with the package stars, a small subset of a Landsat 7 ETM satellite image that was collected near Olinda in Brazil, so relatively not so far from Latin America, from Monterrey, as I am. We read this: it just takes the file name, reads the thing, and then it tells me: you get a single attribute, and you get x and y, so this is a 350 by 350 image, and you get six bands here. This is typical for remote sensing data, that we have multiple bands, and these are six bands with nearly 30-meter pixels; we see that it's referenced in UTM zone 25 south. These bands are ordered by wavelength: it starts with the blue, then the green, then the red, and then we get some near
infrared and mid-infrared bands, and so on. We get all six bands plotted with a common color scale whose breaks are chosen as quantiles over all six bands, so we get a maximum stretch: every color is used across these six images. You can compare this a little with the high dynamic range processing that digital cameras, or even phones today, do; it maximizes the contrast we see in the image, compared to using an equal color scale. We saw an equal color scale, for instance, here: these are equal color breaks, and that is easy, you say breaks = "equal" and you get equally spaced breaks; by default you get quantile color breaks. We can also do this without joining the z-limits, without joining the color values across sub-images: we say don't join them, and then each sub-image gets its own stretch, what in remote sensing is called histogram stretching. Per image we then see more, but we can no longer see which image has higher or lower intensities; that information is lost. We can then also do RGB composites: this is red-green-blue, so true color, as we say, and this is false color from bands 4-3-2, which is often used because in red we then see where the vegetated areas are; false color images typically show you where things are vegetated more easily than true-color RGB composites do. So we can make these color composites easily. Transforming and warping rasters: that is another topic. Rasters are regular, but quite often they have to be combined with other data, and that other data might be in a different reference system, or might
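The difference between equal breaks and quantile breaks (the "histogram stretch") can be shown in a few lines of base R on made-up, skewed band intensities: with quantile breaks every color class holds the same number of pixels, while with equal breaks most pixels pile up in the first few classes.

```r
set.seed(7)
band <- rexp(1e4, rate = 1 / 50)       # made-up, skewed pixel intensities

n_breaks <- 11                          # 10 color classes

# Equal breaks: evenly spaced over the value range
eq <- seq(min(band), max(band), length.out = n_breaks)

# Quantile breaks: equal numbers of pixels per color class
qt <- quantile(band, probs = seq(0, 1, length.out = n_breaks))

table(cut(band, qt, include.lowest = TRUE))   # ~1000 pixels per class
table(cut(band, eq, include.lowest = TRUE))   # heavily front-loaded
```

This is why the quantile default uses the full color range even on skewed imagery.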
be another raster that doesn't line up with the raster that you have, because it was a raster in latitude-longitude and not in UTM, or something like that. So that is what we're looking at now. We take a bounding box over Europe, computed in WGS 84, so in geographical coordinates, as you can see here; it spans a certain region, and then we create a regular grid with a cell size of one degree, and that covers Europe, or part of Europe. Then I looked at the Natural Earth countries dataset, selected one value, pop_est, the estimated population of each country, divided it by the area of the country, and rescaled it to have density in units of one per square kilometer, which gives nicer values than the default of one per square meter, which is very low. Then I plot this population density map over the original data: in the background we see the Natural Earth dataset as it was, and on top of it I plotted the grid that I just created. You can see connections: these colors continue here. So this is really a rasterization of my Natural Earth dataset, as you can see, where I take the population densities out of the polygons and assign them to the raster cells. Here we also see that the raster cells are not plotted square: they are square in the sense that they are one degree by one degree, but the plotting routine takes a scaling such that one kilometer north equals one kilometer east in the center of the area, which means that the raster cells are not square anymore, because one degree of longitude
stays about 111 kilometres everywhere. So the aspect ratio is taken such that the map is scaled true in the centre; that is the default this plot routine takes, and if we then add something, it uses the same plotting parameters. Right, so what I did next is transform these data to another coordinate reference system. This is basically in what we call the equirectangular projection — the projection where you just say: this is latitude, this is longitude, and I scale them in a simple way with an aspect ratio. Equirectangular, I think it's called; it is the plate carrée of the earth, when you do it for the globe. I'm now going to transform that into a Lambert azimuthal equal-area projection, which has the equal-area property. That is often used so that countries, relative to each other, have realistic sizes — it deforms things, but the size of countries corresponds to their actual area — and it is often used for political maps. So I'm transforming this grid, the population grid that was rasterized from the Natural Earth vector data set. If I do that and plot it in the new coordinates, you see it looks like this: it no longer looks like a regular raster, because the cells are no longer lined up horizontally and vertically — well, somewhere they are, but in most areas they are not; they're kind of skewed and their directions differ. So this is what st_transform does with a raster: if you just say move it somewhere else, then it doesn't re-compute grid cells; it doesn't say, I'm going to
re-compute a raster in a new coordinate reference system. No, it just says: this was regular; now it is no longer regular, it is what I call curvilinear. Curvilinear grids have x and y coordinates registered for every raster cell — for every raster cell centre — and I compute little local quadrilaterals around them to plot them. So this is now transformed, but none of the raster cell values have been changed by the transformation; the cell values are still identical to what they were. That is also a type of raster, but it is not the type of raster you can handle with the raster package, or that you typically get. But sometimes you actually have to download data that comes in this curvilinear form — a lot of modelling data comes like this, and the raw Sentinel-5P air quality data comes as curvilinear grids, and so on. So you have this, and you can handle it. What you then can do, if you want to get this into a regular grid in the new coordinate reference system, the Lambert equal-area one, is use st_warp to warp this curvilinear population grid to a target grid which is regular: st_as_stars takes a bounding box in the new coordinate reference system and makes a regular grid, and then we warp things into this new grid. Here we see the warped values, now in a regular grid, and you still see the blockiness: we warp from a very coarse grid into a much finer grid. We haven't set the cell sizes here, but we basically have a 250 by 250 grid where before we had 30 by 20, so we get roughly a factor-ten increase in cell density, and that's why we still see this blockiness from
the old map that is burned in. I wouldn't say this is a good way to do it — it looks terrible, of course — but it demonstrates that we have grids that we can transform, which then become curvilinear grids, and that we can burn into new grids with this st_warp thing. Now, this was all about rasters and coordinate reference systems and moving rasters around. What we also have are raster data cubes, where we have time and multiple attributes. Multiple attributes: here is a little NetCDF data set that also comes with the stars package, which we now want to read in, and it gives you two variables, pr and tas — I think pr stands for precipitation and tas for temperature at surface, or something like that. They come with measurement units, because they're NetCDF files, and the NetCDF people believe it is useful, if you communicate a variable to somebody else, to tell in which measurement unit it was recorded. We read that and we propagate these measurement units — we try to. This one is wrong: it has to be degrees Celsius, but it says C; a lot of the measurement units you find are actually wrong. This is a rather small grid of 81 columns, 33 rows and 12 time instances; you can see the time instances are POSIXct values, and they run over the whole year of 1999, it seems. Here we can plot this thing, and then we basically get the first attribute plotted — it ignores the second attribute for now, it just plots the first — as a time series, and again it uses quantiles to stretch the colours over the 12 different instances. So this gives a quick overview of variability over time. I could also select the second attribute here, to plot the temperature instead of the precipitation. Another case that we load here is a data set that is
read from the starsdata package. You know, if you have large example data, put it in your package and submit it to CRAN, then within no time you get a lot of comments from the extremely good CRAN people saying: hey, listen, your package is larger than five megabytes, bring it down under five megabytes — and no amount of talking changes that. That is why I have this separate package: just to have somewhat larger data around to try things out with. This is actually AVHRR data — I think that is a measurement instrument run by NOAA, or something like that, distributed globally and daily; so I think it is NOAA that creates these daily data sets. Here are nine instances for nine days, downloaded, and they basically publish a new NetCDF file every day, so if you want to download these data you end up with daily NetCDF files that you have to put together. I'm here just making this file list, which is a concatenation of the directory name and the file names, and I'm going to read this file list. What read_stars then tries to figure out is: aha, all these files have the same attributes — they all have units, which are again wrong; there is an asterisk that shouldn't be there — and you can see it's a four-dimensional data cube: by reading this we have merged these nine files along a single dimension, the time dimension, with nine instances running with a delta of one day, as POSIXct values. So they are read and joined over their time dimension, because it's a common thing that a time-varying variable is spread over different files. The alternative would be that NOAA rewrites the same NetCDF file with the entire array, but then the file gets bigger and bigger
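The file-list step described above might be sketched like this in R (a sketch only: the directory name and file pattern are placeholders, not the exact paths from the slides):

```r
# Read a stack of daily NetCDF files as one data cube.
# "avhrr_daily" is a hypothetical directory standing in for the
# AVHRR files shipped in the starsdata package.
library(stars)
file_list <- list.files("avhrr_daily", pattern = "\\.nc$", full.names = TRUE)
# files with identical attributes are merged; here they join along
# their (length-one) time dimensions into a single time dimension
x <- read_stars(file_list, quiet = TRUE)
```

After reading, st_dimensions(x) should show a time dimension whose length equals the number of files.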
and you're never going to download a file that big, and you're also never going to be able to rewrite it properly in realistic time. So this is a way of handling growing data sets. You also see that there's a z level, the elevation of these observations, and that is zero metres, which means sea level — and it's a degenerate dimension, because we have no variability; it goes from one to one. So this is a four-dimensional data cube, but the third dimension has only cardinality one, which doesn't help much. We can drop that with adrop, which works like drop on arrays: it throws out degenerate dimensions, and then we see that we got rid of it. We also lost the information that this was taken at zero elevation, but that is okay, because we just want to plot the time series of these nine images. We do this plot with ggplot2, just as a demonstration: it uses geom_stars, and geom_stars is just a thin wrapper around geom_raster, I think, so it doesn't do much. What it does do is down-sample the data, so that we get results relatively quickly: we down-sample with a factor of two, because we have 1440 by 720 pixels in a single image, and we're going to plot nine of those. If I plotted one at full screen, this would work as a resolution, but I'm doing it in a small part of my screen, as you see here, with nine sub-maps, so each sub-map is maybe 200 by 400 pixels or something like that. So I down-sample pretty heavily and basically make a ggplot of a reduced data set at a lower resolution, as you can see here — but you don't really see that it is at a lower resolution; it is still fine, because you see the patterns. Otherwise you would plot all the pixels, and it would just take a long
time without giving you more information than the pixels you can see on your screen. So that was a bit of optimization that I did: the down-sampling. Other operations that you can do on data cubes include subsetting, which we can do with the square bracket, for instance. The first argument of the square bracket takes the attribute: we here select the second attribute — we had four attributes, like sea surface temperature, anomaly, error and ice coverage percentage, and we take the second one. The stars object is a very simple object: it's basically a list of arrays, plus an attribute that holds a dimensions table, so it is very transparent how it works. This selection takes the second list element but keeps the dimensions, because they don't change; we now have a single array. We can also use the second and third arguments of the square bracket selector, which select x values and y values: here we have 10 rows selected — sorry, 10 columns and 12 rows selected. And we can also take the one elevation, which we already had, or take three time instances, three to five, and we see that we keep only time instances three to five; the offset doesn't change. Another way — I don't know why this is not plotted — if you have a global one-degree grid, is to subset the values that correspond to some geometry, which is given here: it basically sets to NA all values that do not intersect with this Natural Earth geometry. So that is another way; cropping can be done — you can crop areas, meaning you make them smaller — and things outside of a mask get set to NA. Then there are a couple of tidyverse verbs that you can use: slice and filter both work on the
dimensions: slice uses the index, just as we did here — this is also slicing — and filter looks at the dimension values, so you would give coordinates or dates in the filter argument. pull pulls out an attribute, select selects one or more variables, and mutate creates new ones. There is an aggregate, and an extract of values: aggregate can aggregate over areas or over time periods, and extract extracts data cube values at point locations. There are ways to go from raster to vector and from vector to raster: st_rasterize, as we did above, rasterizes values, and st_as_sf vectorizes them — we have also done that — and it can also merge raster cells that have the same attribute value into polygons that include more than one raster cell. The final thing that I wanted to make you aware of is the idea of proxy objects and lazy reading and lazy computing, and that cost me a lot of headaches, how to do that. The thing is that with image data you pretty soon end up looking at amounts of data that no longer fit in the main memory of your computer, and they might not even fit on your hard drive. If you would look at all the Sentinel-2 satellite imagery collected over Mexico over the last four years, you are talking about data sets that probably won't fit on the hard drive you have. So that is a problem, and even one of these tiles — we have a tile available in the starsdata package — is something for which you need a beefy laptop to load the whole thing into main memory. And loading things into main memory is what R tries to do if you don't tell it otherwise: R has not been written to transparently use your hard drive as a buffer when things don't fit
in memory; no, it will try to read things into memory. So we had to work around that, to basically make things workable for larger raster files — just like the raster package does that for you, but here we took a very different approach, basically by using proxy objects and lazy evaluation. So here I'm going to define a data set that has a very long name, if you look at it, and that is because a lot of metadata is typically encoded in the file names of satellite imagery, which is absolutely absurd, but people do that. Anyway, with this long file name we tell which driver it is, that it's a zip file, where it comes from, what the tile name is — basically this one — and it says you need to read this little metadata file to find out what's what in this zip file. What stars does here is really what GDAL does, but it is not going to read the whole thing. No: it gives you back a stars proxy object with one attribute, held in this file — it doesn't even show the full file name, because it is way too long — but it reads the dimensions table. And you see here that this is an 11k by 11k image: if I wanted to look at this entire image, I would need a hundred screens here in my room just to show every image pixel on a pixel of my screen. That is the kind of detail it has: this is 10 metre resolution, which is very high, but also a very large data set — it's 100 million pixels — and it has four bands. These are the four 10 metre bands; Sentinel-2 has 13 bands, but only four of them are at 10 metre resolution. And they are in UTM zone 23; this is an area nearby that I selected. Now I want to look at this data, and you can see the bands have names — band four, three, two and eight, I think.
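The lazy-reading step just described might look roughly like this (the file path is a placeholder — the real Sentinel-2 zip in the starsdata package has the long name discussed above):

```r
# Open a large image lazily: proxy = TRUE returns a stars_proxy
# object holding only the metadata, not the pixel values.
library(stars)
granule <- "/path/to/sentinel2_granule.zip"  # hypothetical path
p <- read_stars(granule, proxy = TRUE)
p                 # prints the dimensions table; no pixels read yet
st_dimensions(p)  # dimensions as read from the file metadata
```

Actually fetching pixels only happens when they are needed, for instance via st_as_stars(p), possibly at a reduced (down-sampled) resolution.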
And what I want to do now is look at NDVI values. NDVI is the normalized difference vegetation index, which divides the difference of near-infrared and red by their sum — it's a normalization — and by that it gives you a good indicator of how much vegetation there is somewhere. So I'm going to apply this, and when I apply this, with st_apply, to this proxy data set, which is here called p, I say: do this for all pixels, reducing over the band dimension — because that is where the red band and the near-infrared band are. So: do this for every pixel, over the bands. It gives me a new data set back that has done nothing. You would expect that the band dimension had been reduced, but it has done nothing; it has just glued the homework that is still to be done onto the object. So this is really cheating: you want an object that can do this, but it doesn't actually do it. If it would do it, it would take ten minutes to compute, and then probably another ten minutes to write the result to your hard drive; this takes a fraction of a second, because it doesn't do anything — it just records what it has to do. If I then plot it, like here, it takes, I don't know, a few seconds or something like that — I don't know if any of you have tried this. What it does when I ask the software to plot this: it says, okay, I have a data set and I have something to do. How many pixels do I have? Well, this is my plotting region, so that is like 500 by 500 pixels. And how many pixels does the data have? 11,000 by 11,000 — so I cannot plot all of them. So what it then does is say: let's look at what we are going to do. We have an st_apply that runs over a certain margin; well, if this margin includes my x and y coordinates, then I'm doing
something only along the band dimension, so whatever I do for one pixel is not going to affect the next pixel; it is just running along the band dimension. So I could compute all these 11,000 by 11,000 — 100 million — pixels, but then I would throw them all away and just plot these 300 by 300, because that is all you can see; that is all it is going to plot. So what it does is really reorder things: it analyses the homework, then says, okay, I am going to plot a stars proxy; is this operation pixel-wise? Yes — then it is more efficient to first read pixels at screen resolution, say 300 by 300, then apply this NDVI function to those pixels, compute NDVI for the pixels read, and then plot. And instead of the ten minutes it would take to do this plot for the entire data set, it does it in a matter of seconds, essentially because it doesn't have to go through the whole data set, but uses a couple of overviews — lower-resolution, pyramid representations of the image — which works much, much faster. This is a pattern that anyone who has ever worked with Google Earth Engine knows: you have the feeling that you are working with the full data set, but Google Earth Engine knows that you are only looking at a screen with so many pixels, and it only computes the pixel values that you can see; if you zoom in, it recomputes and redoes that for the zoomed-in area — you can see it doing that while it builds up the tiles. Now, we cannot zoom in into our plots, but we are actually working together with Tim Appelhans on the mapview package to make that same pattern work, basically by having the recomputation happen when you interact in the browser: it only computes at the screen resolution, and if you zoom in it redoes it for
the zoomed resolution — you have certain functions, like this NDVI band calculation, for that. So that brings me to the end of this presentation. If there are certain things that anyone wants me to try out with this data, or if you want to try things out yourself and you have questions, or get stuck, or things don't work as expected, then don't hesitate to contact me — to contact us — either on the r-sig-geo mailing list, or as a GitHub issue, or you can try direct email; I usually answer them, but try to make it the kind of thing where other people in the community could also benefit from the answer, or where other people could also look at whether they have answers, so that not all the questions come to me, because that doesn't scale very well. But please don't hesitate to get involved in discussing the development of the software, the properties it has, the documentation, these kinds of things. That was actually all I wanted to present to you about what data cubes actually are, how they relate to vector and raster data, the ways we can analyse them, and a number of very simple, somewhat effective, but rather primitive visualizations that I've been working on. I can see there are a couple of questions in the chat. There's one from Adrian Cajeros: am I able to run something like zonal statistics? Yes, you can do that. Zonal statistics is a raster operation: it's the operation where you want to compute properties of a set of raster cells in a certain area, where the area is given. So it is basically what I call going from raster to vector.
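A minimal zonal-statistics sketch with aggregate(), where cube (a stars raster cube) and zones (an sf polygon layer) are placeholder names, not objects from the talk:

```r
# Zonal statistics: summarize the raster cells under each polygon.
library(stars)
library(sf)
z <- aggregate(cube, by = zones, FUN = max, na.rm = TRUE)
# z is a vector data cube: the x/y raster dimensions are replaced by
# a geometry dimension (one value per polygon); a time dimension, if
# present, is kept unchanged
```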
right: from a raster image, a raster stack, or a raster data cube, you want, for each of a set of regions, some summary, for instance the mean or the maximum. In the stars package that is available in the aggregate command. So there's an aggregate method — let me just share my screen... here, stars, there's an aggregate.stars — that spatially or temporally aggregates a stars object: you throw in a stars object and a set of polygons, and you say what kind of function to apply to summarize the raster cells under the polygon, for each of the polygons. It returns a data cube that, instead of the raster dimensions, now has a simple feature geometry — a polygon dimension — and the rest stays the same; so if it was a temporal cube, it is still a temporal cube. So here is a set of examples. This is an example — it gives some warnings here — where I basically accumulated precipitation over an area; the input data set, I think, was a set of... let me just try this. Now I'm moving over — this is going to get embarrassing, you'll see my primitive abilities in RStudio. So I just did this, and I read these precipitation files, so we have this precipitation data cube — I hate all this auto-completion — appearing here: it's a time series of curvilinear grids with the precipitation. I think it is much easier if I show you this, because it is also on the landing page of the stars package: here you see the same time series that I was just trying to plot, curvilinear grids over North Carolina, and that is basically burned into
these county values with some function that takes the maximum precipitation for each county, and I get a time series as a vector data cube; and here it is aggregated over the time dimension, to get back the time of the maximum precipitation. So I asked, for each county: at which time was the precipitation at its maximum — where it is maximized over the raster cells in each county. So yes, you can do that. There's another question, from Mark Weaver: you ended with the stars proxy objects and made a comparison with Earth Engine — do you have any tips or thoughts on limits to the size of data analysis with stars, in comparison to tools like Earth Engine? Yeah, so we are looking at that. Earth Engine has a data set that, last time I heard, a couple of years back, was in the order of 30 petabytes or so, and it has probably already doubled or quadrupled. That is an astronomical size: for that amount of data you really have to think of a building full of racks and hard drives, or something like that. So that is an amount of data for which it is relatively difficult to grasp how to handle it. For those kinds of data you can forget the whole idea of downloading, because you don't have the drives here, and even if you had the drives available, the network bandwidth it would take to download such amounts of data would take possibly years. So these are things that you can only realistically do in a central place, if a lot of people want to do that. So for instance the Google cloud and the Amazon cloud have these complete data archives, and there are a number of other clouds that
people in Europe are also trying to set up. This is of course partly a branding thing. We are looking at this because Earth Engine is a fantastic product; the thing is that it is not open source, so you don't know exactly what is going on, and it is not extensible, in the sense that you cannot run your own R script with your own time series model on Google Earth Engine data — it doesn't allow you to do that. So if you wanted to do something like that, but at the scale of the data that is available in Google Earth Engine, you couldn't, and you would have to look at other approaches, and probably work with clouds where you have more control over access to and processing of the data. Am I still sharing this? We have been running a project called openEO for a while now, where we are slowly building infrastructure — infrastructure meaning compute backends, a proper API for abstracting the functions and abstracting the data sets, in the sense of working with image collections, and an ecosystem of clients: notebooks, Python and R clients, JavaScript, QGIS plugins, and so on — a whole ecosystem of clients for doing something that Earth Engine probably does much better, because we started only a couple of years ago, and Earth Engine started ten years ago with a budget of, let's say, twenty times ours. So we are working in that direction; it is just that we haven't finished it, but it is certainly something worth exploring. And also in the context of stars: if you want to have an infrastructure like that, where your big data archive is in the cloud, and you want to run your R functions on it, and you want to do something spatiotemporal, then
you probably want to work with stars objects, because they are very simple and comprehensible, and you could do that once some kind of cloud infrastructure backend basically does the looping: it creates all these smaller spatiotemporal data cubes and throws the stars infrastructure, or whatever it is, at them. That would give an enormous amount of flexibility that Earth Engine right now doesn't have. Earth Engine is, I think, relatively poor for time series analysis, compared to what you could do with R in terms of time series analysis. I think we have a last question, from Alexander Quevedo: if I'm interested in parallelizing a function, for example the calculation of NDVI, what is the best strategy using stars? I think I have to look it up — I'm still sharing the screen, right? So, where were we... here: we did that with st_apply, and st_apply has a couple of arguments that point you to parallelized approaches, because apply is embarrassingly parallel, and st_apply is too. So if you pass it a cluster, it will use a parallel apply; if you say use future, it will call future_apply. So it connects to the obvious R ways of parallelizing things. Whether this helps much also depends a little on the complexity of the function: if you have an expensive time series function that really takes some time, then it will make sense to parallelize it this way; if you have something as trivial as NDVI, I'm not sure how much it will bring, because parallelization always has a cost and a benefit, and the bigger and more independent the pieces of work, the larger the benefit. But we quite
often see that things are relatively small, and then you suddenly use ten cores but also incur a lot of overhead, because of all these tiny things you move around. So it's not as if these simple approaches will necessarily give you an enormous benefit; the benefit might also be surprisingly small. It is hard to predict in advance how much it will bring you, but such simple approaches are definitely there.

Okay, thank you so much. I just want to thank Edzer for his time, his amazing presentation, and all the materials, which are available at the links that I sent you yesterday. This finishes the first part, and the second part will be presented by Martijn, so I leave you with Martijn.

Okay, hello everyone. My name is Martijn — in Dutch we pronounce it Martijn, but Martin is also fine. I work at Statistics Netherlands, the official statistical institute, where we make statistics from survey data, admin data, and also big data; I work at the methods department, on big data, visualization, spatial data, all those kinds of things. So what I want to do — let me share my screen — is go through a couple of the examples that Edzer showed and go a little bit more into depth on their visualization. You probably know that I have written a package called tmap, and I will of course use that. tmap has been a project for a couple of years now: it started from basic plotting functions and then kept expanding, and currently we're in a phase where I released version 3.2, just this Wednesday. There are two big things that will be happening within the next year. The first one is that we are currently writing a book on tmap — I think I've finished one and a
half chapters — but it's a side project next to my current job and all the other things I do, so it can take a while. And I'm also thinking about making it a little bit more generic and going towards a 4.0, but that is just future stuff. So I'll start with a very short introduction to tmap, using some spatial objects, and then I'll focus on stars objects and how you can visualize them with tmap. Okay — if you want to run this code yourself, make sure you run the latest version, 3.2; it's the same as the current GitHub version, as I didn't make any commits in the last two days. There are some data sets that you can use that come with the tmap package: there is, for instance, World, with country polygons, and also land — yeah, I think I'll skip ahead — and metro. So let me focus on these three objects. When we want to plot them — let me load sf and stars as well — we can do plot(World), and what you'll see is all the attributes, all the columns in the data: World is an sf object with all these columns, and each of these columns gets its own map. With tmap, when you do a quick plot — that's qtm, which stands for quick thematic map — by default it doesn't show all the variables; it just plots the map as it is. In the tab completion you see the shape argument: in tmap's vocabulary, shape (shp) is what I use for spatial objects in general, so that can be sf objects, raster objects, stars objects as well. And then you see all the aesthetics: if you're familiar with ggplot, these are like the aesthetics of the aes function. For instance, if I fill with a variable — you just specify the variable name within quotes — you get a choropleth, and that's a way to show it. And if you want small multiples — for instance, you have
the Happy Planet Index, you can do this. Okay, so first I'll just introduce the spatial objects. land is a stars object with four attributes and two dimensions, x and y, so you can think of these as layers. In tmap 2.x I was using the raster package, and this was a RasterBrick containing these four different layers. A really nice thing about stars is that, like NetCDF, it distinguishes between attributes and dimensions. If you have time, it's typically a dimension; you could have time as attributes, like 1920, 1930, 1940, et cetera, but it makes much more sense to model it as a time dimension. These four, on the other hand (land cover, land cover class, tree cover, elevation) are very different things, so it makes much more sense to model them as attributes.

If I do qtm(land), I get some things that I'll explain later on. What you already see is that it's downsampled; I think that's because I was working on small multiples with a very large stars dataset and changed the downsample settings. I'll come back to that later. There is a function tmap_options(), which gives a list of all the options within tmap; you can set them, and you can reset them. So I just reset them, and if I rerun the same example it isn't downsampled, because land, as you've seen, is not very large, about 1,500 pixels.

The other message that you've seen: elevation also contains negative values, so it's interpreted with a diverging color scheme. I'll come back to that later as well. Returning to the current focus, which is to show you the spatial objects: for polygons the default is just to show the
polygons in gray, and for stars the default is to show all the attributes, or, if there is one attribute and a third dimension, for instance time, to show small multiples for each time step. The other dataset is called metro; it's spatial points of metropolitan areas, a very simple sf object: city names, the ISO3 code of the country, and population per year. This dataset could actually be turned into a stars object, because it has a time dimension.

Okay, so before I dive into the plotting mechanism itself: there are two modes, plot and view. By default tmap starts in plot mode, but you can change it with tmap_mode("view"), and likewise back with tmap_mode("plot"). I'm a lazy person myself, so I invented ttm(); that name isn't used anywhere else, so I decided to use it to toggle between the two modes. If I do qtm(World), I get an interactive map of the world, and if I do qtm(land), I get an interactive map of all the rasters; let me switch to the browser. There you see a stars object. This looks very fancy, but I didn't have to do very hard work to implement it; it's basically made possible by leaflet and mapview, so big thanks to them. Again, you can always do ttm() to switch back, and then it plots statically. The same for the world cities: qtm(metro). It's very handy: if you click on one spatial unit, you see all the variables, and by default you also see the coordinates, because with interactive mapping you often want to see the coordinates. If you run tmap without any data, you just see a basemap, and you can use the layers control as well. By default it shows three base layers, and there's the tmap option basemaps, a character vector of those layers, so I
can also set it, like this, and if I now run qtm() I see just that OpenStreetMap basemap. One more thing about qtm: you can do something like this. Hopefully I'm in view mode... oh, sorry, let me see. Yes, so we can combine layers. I now stacked two plots: the first one is the stars object with land cover, tree cover, and elevation, and the second one is the country polygons, where I only show the borders, so no fill aesthetic. That's of course very useful.

Under the hood, tmap transforms or warps objects that are in a different map projection, a different CRS. In this case, st_crs(land) is just WGS 84, plain lat/long, so it plots directly, and with st_crs(World) you see a different projection, the Eckert IV projection. Why do I use the Eckert IV projection? Because, as a statistician (yes, I am a statistician), the equal-area property is important: this map correctly shows that Australia is three times as large as Greenland, and not the other way around. If you use different CRSs in the same plot, it uses the CRS of the first object by default. You can change that; I'll show it later. You can change it with qtm as well, but that's not recommended. By default it simply reprojects, transforms, the polygons into WGS 84.

So qtm is very handy for quick plots, like the base plot function, or qplot in ggplot2, which is kind of similar. But the main plotting methods go like this: tm_shape(World) plus layer functions. Again, it's very similar to ggplot2. Why did I do it this way? The things I liked about ggplot2 I kept, and the things I didn't like I did a little differently, in a sense that I hope works better for maps; if not, let me know. So for the polygons, this is like a basic
choropleth method: you specify the shape, tm_shape(World), and then tm_polygons() to draw the polygons. If I leave the variable out, you get gray polygons, the default color, but I can specify the aesthetic variable; there's only one for polygons, and it's called col. If you look at the documentation, you see a whole bunch of parameters you can set. For instance, let me show you style: by default it's the "pretty" style, but you can also use k-means clustering, basically one-dimensional k-means, which in this case creates five groups of similar values in the data. We can tweak it, like this: eight groups. Let me also quickly show you the different parameters, such as the palette, for instance "Blues". Okay.

Of course, if you want to follow along and run this code, you're free to type it in yourself; if you're a little bit lazy, like me, you can wait until I share the code. I'll put it on my workshop repo, at this address. I've created two more R Markdown files that I'll show later on, and I'll upload this script as well, because what I often like to do in this kind of workshop is to show different things, and every time it's a little bit different, so I'll just upload this one.

Okay, so if I want to reproduce this map: first tm_shape(land), that's the raster. One side note: to plot stars objects I use raster images, basically pixel images, for regular grids; irregular grids I convert under the hood to an sf object. Then I just show the borders: that's the function tm_borders(), which draws the borders without fill, so we have the same map. Now, about the projection that's used: in interactive maps the
projection is always WGS 84 (EPSG 4326), simply because that's what basemaps are available for. It is currently possible to use other projections in leaflet for polygons; for raster objects, stars objects, it's not possible yet, so I have to wait for leaflet to make that possible, and then I'll implement it in tmap. If I go to static mode, then, because land is the first shape object, the first layer, the plot takes its WGS 84 projection. If I want to use the World projection, the Eckert IV projection that I used for the World shape object, then, like I said, I set is.master = TRUE, and it automatically warps the raster object into that projection. By the way, I'm not sure why this edge looks like that; it could be a bug, so I have to look into it, but everywhere else it looks okay, the warping was successful. Apologies for that.

Okay, any questions so far? I think there are no questions. Okay. Again, everything is on GitHub, so if you have any questions, please ask me, but preferably as GitHub issues, because I get a lot of emails and then those get lost.

Now, stars for data cubes. Within the workshop page I rendered the Rmd files to markdown files, so if you click on the vector data cubes Rmd file you see the markdown; I'll show it in RStudio. The first thing that I want to show is the North Carolina example that Edzer showed: a stars object with one attribute, population; one dimension contains the polygons and the other dimension contains years. This was one of the things that I thought should be supported in tmap directly. Unfortunately it can't be yet; ideally you would do it like this: tm_shape() with the stars object, plus tm_polygons("population"), plus tm_facets() by year. tm_facets() works the same way as for other data; let me show it with the continents.
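A minimal sketch of that facet call, assuming tmap 3.x with its built-in World dataset (HPI is the Happy Planet Index column mentioned earlier):

```r
library(tmap)
data(World)

# one small multiple (facet) per continent
tm_shape(World) +
  tm_polygons("HPI") +
  tm_facets(by = "continent")
```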
So this shows the facets of HPI per continent. I would ideally want a similar function for the data cube: tm_shape() with the stars object, plus tm_polygons("population"), faceted by year. That will be implemented in the next non-minor version. The workaround for now is to convert to an sf object. First I extract the years. If you want the values of one of the dimensions, there is the function st_get_dimension_values(): all you have to do is specify the stars object and the dimension. For the first dimension you get a geometry set, the geometries; the second dimension is year, so you get the years. You can then convert the stars object into an sf object with st_as_sf(), which gives one column per year plus a geometry column, and you can plot it as follows: tm_shape() with the sf object, and then, for what you want to plot, tm_polygons() with the year columns as variables. This is the aesthetic for fill, and in this case there are two values, so you get two small multiples.

I have to explain something here. I think it relates to the distinction between attributes and dimensions: given this sf object, tmap doesn't know whether these columns should be treated as attributes, totally different things with their own levels and origins, or as values along one dimension. By default, to be safe, it assumes they are totally different things. So if you run just this, the legend is calculated differently for each facet: it takes the minimum and maximum value and determines what kind of color scale should be used, in this case a sequential color scale, but you see that the value ranges differ; one maximum value is below 3.5. Because we want to compare these images, we want fixed scales, so we set free.scales to FALSE. With just these two lines you're almost there, but then you don't have panel titles, so we have to
assign the panel labels here; so that's okay. All right, now about the color palettes. By default tmap uses the ColorBrewer palettes. There are a couple of packages that provide nice color palettes. tmaptools is a package with handy helper functions, and I think the handiest of all is palette_explorer(), which I use a lot. It's similar to what the ColorBrewer website offers, but as a Shiny app. What you see are the Brewer sequential palettes, the Brewer categorical palettes, and the Brewer diverging palettes, plus a couple of other palettes. Sequential palettes you use for a value range that's either all positive or all negative; diverging palettes you typically use when values can be positive or negative, or when there is a natural midpoint. Say you care about values below one and above one: you can use one as the midpoint, shown in the neutral color. You can set the number of colors you want, and you can simulate colorblindness: I'm not a medical expert, but there are several types (I think this one, the red-green deficiency, is the most common), and you then see how people with colorblindness perceive the colors; the bold-printed names are the palettes that should be okay, so it's preferred to use those. There's also a color code generator, so you can use this function to get the colors. You can also get them by using the RColorBrewer package, and there are other packages as well; pals is one package that also contains all these colors. For instance, if you call brewer.pal() with "Greens" and a specified number, it gives you that many colors. I think on the internet there are ways to
show all these colors, but in tmap you can simply enter the name of the palette (all these palettes are embedded) together with the number of colors; in this case, for one palette, say ten color classes of Purples. And if you add a minus sign, you get the reversed palette.

Questions so far? No, I don't see anyone. Does anyone have a question? Okay, no. Okay.

Now, raster cubes. This is the Landsat 7 image, L7, which has six bands, sorted by wavelength. If I want to show it in tmap, I can use tm_shape() plus tm_raster(), and again you get all these options: the palette here is "magma" from the viridis package. Because I didn't specify any aesthetic, and L7 is a stars object with one attribute and a third dimension, by default it shows all the bands in small multiples, like this. What I noticed is that most values lie in a narrow range, so you want to improve the contrast a little. You can do that with style: by default it's "pretty" (those class breaks are chosen under the hood), but if you enter style = "kmeans", you already see much more contrast. So this was the old one, and this is the new one.

tm_raster() plots the values as if they were data. If you want to let tmap know that these are actually red, green, and blue bands, you say tm_rgb(), and then you get realistic colors. But it's still too dark, because most values are between 50 and 100, while the maximum possible value is 255. You can also see that in a histogram: L7 is a stars object, and L7[[1]], the first attribute, gives you the data values, a three-dimensional array. If you flatten that into one single vector and make a histogram of it, you see an almost Gaussian distribution with
one peak here. So what I did: tm_rgb() plots the data as is, and you can only set the maximum value (I think there was a feature request to specify the full dynamic range, but I don't know how yet), so I simply truncated the values to between 25 and 150. Basically I did an st_apply() over the third dimension, so per band I used pmax() with 25 as the second argument, which gives me a truncated value that is at least 25, and likewise pmin() with 150, which changes all values above 150 to 150. So it goes from 25 to 150, and then I subtracted 25, shifting the range to 0 to 125. If I show it now: this is the new image, and this is the old image.

By the way, I'm correcting myself here: if you have just two plots and want to show them next to each other, not as small multiples (because they are simply different plots), there is this function, let me write it in: tmap_arrange(), to which you pass the plot objects. So this one is the old one and this one is the new one.

It's also possible to make a false color image, the one that Edzer showed: you shift the bands, so that the near-infrared band is shown as red. This is often used with earth observation data to show differences in vegetation a little better.

Now let's look at weather data. This dataset has precipitation and temperature, so you see two attributes and three dimensions: x, y, and time. Here's a short description of the data. To apply what I already explained, I'll just show one attribute, precipitation. If I use double square brackets, I get the values, the array; if I use single square brackets, I get a stars object with only that attribute. I think there should be
dplyr-like methods as well, such as slice and filter, but I don't use those. Okay, so if I want to show it, I specify the shape, then tm_raster() with a title (because if I don't, you just see the variable name "pr"), the palette, and the style, in this case "cont" for continuous, and I plot the North Carolina borders on top. You can also do it interactively: again, ttm() toggles tmap into view mode; rerun this code and you see small multiples of precipitation. The legend is there; you can disable it with legend.show = FALSE. Hmm, the panel labels: in the original map there are panel labels, and somehow they got lost in the interactive version, but anyway, if you look at the static map, you see them.

Again about style: with the k-means style that I used earlier, the advantage of a discrete color palette is that you can read values from the map (you can say, okay, this middle blue corresponds to this range), but the disadvantage is that you cannot see how the distribution varies smoothly. If you use a continuous color scale, you do see how the value varies across the state.

For temperature I used a palette from the pals package, a rainbow palette with ten colors; I'm never quite sure how to pronounce its full name. I chose it because blue is associated with cold and orange to red with very hot. For the midpoint, green, I set 15 degrees (sorry, not 10, 15), which is roughly the distinction between cold and not so cold.

So again, tm_shape(): the first shape object is the stars object, whose layer is a raster layer; the second shape object is the North Carolina polygons, shown with borders. You can also do this; it doesn't make sense, but just to show
you that you can have multiple layers per shape object: now you see a dot in the center of each polygon, which for this case is not very useful. Where it is useful, I think, is this: a polygon here doesn't have a name, it only has an identifier, so what I can do is add labels with the name variable. Okay, now it's included with names. If you view this interactively, you see the labels as well, but it's kind of slow, because there are a few hundred polygons times the number of small multiples in total. But at least you can read them. Any questions so far? If not, I think the last part of my talk will be about large data, and projecting data, that kind of thing.

Let's look at the large Sentinel-2 image from the starsdata package: it is roughly 10,000 by 10,000 cells, with four bands. As Edzer told you, this is a stars proxy object, so it's not read into memory; only when you use it are the data actually read, and often also downsampled. If I plot it (I think I'm still in view mode), you see those four bands, and if you zoom out you see it's part of the Netherlands. You can also show it in static mode like this. Again, qtm() is handy for really quick mapping; this is the main title, and you can have multiple legend blocks. Anyway, I'll just show you the main plotting method. There's an argument in tm_shape(), raster.downsample, for whether or not to downsample the raster. You can turn downsampling off, but it's not recommended, because it's very slow and the number of pixels on your screen is limited anyway; even on HD screens, if the image is shown at 1000 by 1000 you cannot see a higher resolution, unless of course you print it in high dpi. For this image I used tm_rgb(), where I now set the max value to 14,000. For the downsampling there is the tmap option max.raster; it's a named vector with two
values, plot and view, corresponding to the two modes; both are set to one million, and one million means roughly 1000 by 1000. If you do small multiples (I could show an example), it's downsampled further. By default it's one million per plot, and the aspect ratio is taken into account: here the number of x values is larger than the number of y values. Now it downsamples, which here, by the way, is not very useful, because the original size is already close to these values, but it still takes a few seconds. If you do this, max.raster = 1000, meaning a thousand cells for the whole plot, then you see that it's downsampled to 142 by 71, and it's much quicker. It's the same information, but the value range differs a little: probably there are some pixels, some raster cells, with a value of 40 over here, and when they are aggregated during downsampling, the maxima come out a little lower, so you get slightly different scales, et cetera.

Again, because there are also negative values, you can set the midpoint, as I did with the precipitation. If I set the midpoint to 15, then 15 is yellow. I would also recommend setting n to an odd number; here you still see eight classes because of the pretty style: even if I set n to nine, I still get eight classes, that's how pretty works. If you look at the documentation and go to the style argument, you see there are many more options than the "pretty" and "kmeans" that I showed, made possible by the classInt package. Okay, now I have nine groups, equal-size classes, and the middle color is yellow. This is how a diverging color palette should work: one neutral color and two sequential palettes in both directions, and whether you use five, seven, or nine classes depends on the situation.
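A minimal sketch of the diverging setup just described, assuming a stars raster named `r` with values on both sides of 15 (the object name and the palette choice are illustrative, not from the talk):

```r
library(tmap)

# diverging scale: the midpoint 15 gets the neutral middle color;
# an odd n keeps that neutral class centered
tm_shape(r) +
  tm_raster(midpoint = 15, n = 9, palette = "RdYlGn")
```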
Normally seven is what I would recommend, because then you can still distinguish the colors and read the corresponding classes from the screen.

I think the last thing I want to show: Edzer already explained how to transform and warp rasters, so I'll briefly explain how tmap deals with that. What you already saw is that tmap converts a stars object into a different CRS when needed. If I look at the CRS of B (B was the object from the Sentinel-2 data), it's UTM zone 32. When I plot it in an interactive map, because the basemap is in the Web Mercator projection, it has to be converted, and a conversion is either a warp or a transformation. This is a typical warp: you still see the rectangles. And this is a transformation: every pixel has to be converted to a polygon in order to plot it, so it is slower, and the warp is much quicker. For most uses the warp will do. There is an option in tm_shape(), raster.warp, which is TRUE by default; if you set it to FALSE (let's try it; it's only 100 by 100 cells thanks to the setting I set, so it's not too slow), you now see that the cells are little polygons, like a mosaic. This one is nicer, but the other one is faster, and when you zoom out the difference hardly shows. So, I think that's it for my part. If you have any questions, or if you want me to show anything, let me know.

Thank you so much, Martijn. Any questions? Well, while you're thinking: if you have any questions, I will also share with you the next meetup we have. I don't know if you can see my screen; let me know. We can see it, okay. So this is the next meetup we have in the R-Ladies chapter. Thank you all for joining us today with Edzer and Martijn; I also want to thank them for
your time and for sharing your knowledge with the R-Ladies community in Mexico. Does anyone have another question, or something to share? Well, the only thing I'm going to say is thank you, all of you, and we hope to see you again. Okay, thank you. Very nice, thank you.