Hello. So, as she said, my name is Joe Conway. I work for Crunchy Data. I've been with the Postgres community for quite a long time; I'm one of the committers, I'm on the PG infrastructure team, and I'm also a board member of PgUS.

Just a disclaimer: this talk is very code heavy, source code heavy, and a lot of the source code is actually R. It's obviously integrated with Postgres and PostGIS, as the abstract says, but if you're looking for something a little higher level, this is probably not it. So hopefully I won't bore anyone with the details; I will have to go fairly quickly through the slides.

Another point I'll make is that the slides are very complete, in the sense that after the fact (I'll post the slides) you should be able to cut and paste and basically follow along with what I did, if you wanted to try to reproduce some of this. That's one of the reasons I make the slides so detailed. The code is a combination of SQL and R. The other point is that there are more slides in this deck than I'm even going to try to deliver; there are 17 or 18 slides in an appendix, so some of the details are actually at the end, and I'll make note of where that is.

So I'm going to do some spatial analytics. I'll start out by covering an overview, then go into a section on how to ingest data, at least certain types of data, and then go through some basic analytics. There's really not enough time in one hour to get anything near complete coverage of the possibilities here, so what I intend to do is show you how R might be integrated with these other things and be useful to you, and how, by using R, you actually gain a lot in terms of expressiveness, because the ecosystem around R is so rich. There are a lot of things you can do with it.

Now, the background on the name of this talk, "How Green Was My Valley":
I was reading a book on R spatial analytics (not coincidentally), and part of it was talking about something called NDVI, the Normalized Difference Vegetation Index. Basically, a vegetation index is one where you take satellite images and break out the color spectrum. It turns out that with natural vegetation, light in the visible range tends to be absorbed pretty completely, but in the near-infrared range it's mostly reflected. So you can take a combination of the visible and the near infrared and use that as a reliable index for how much vegetation is in the image you've gotten from the satellite.

NDVI itself is a specific calculation; it's right here on the slide. It's a normalized value, really supposed to be between negative one and one, although in this presentation you'll see the values show up between minus ten thousand and ten thousand, because everything's been scaled up.

Specifically, the product I'm using in this talk is something called MOD13A1, one of the image products that comes off the MODIS satellites. I'll show you a little bit more later on about where you can get this data and how. There are other types of vegetation indexes, and there are other products from the same satellites.
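To make the NDVI arithmetic concrete, here is a minimal Python sketch of the calculation just described; the reflectance values are invented purely for illustration:

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    Healthy vegetation absorbs red light and strongly reflects
    near-infrared, so dense vegetation pushes the index toward +1."""
    return (nir - red) / (nir + red)

# Illustrative (made-up) reflectances:
dense_vegetation = ndvi(red=0.05, nir=0.50)   # ~0.818
bare_soil        = ndvi(red=0.30, nir=0.40)   # ~0.143

# MOD13A1 stores NDVI scaled by 10,000, which is why the values in
# this talk range over [-10000, 10000] instead of [-1, 1].
scaled = round(dense_vegetation * 10000)      # 8182
```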
There are actually multiple satellites that produce the same data. In this case I took the data from the MODIS Terra satellite; there's also an Aqua satellite that does the same thing, just at different times of the day, basically.

Within this data there are different resolutions; I'm using the 500 meter resolution. This is raster data, which basically means you can think of it as a grid on the ground, every 500 meters, with one point that represents each square. There is a 250 meter product, which is a finer resolution, but of course that means the data is four times the size. I chose 500 meters because, one, they use that in the book, and two, I didn't really have enough disk space, or patience really, to mess around with that. But if you wanted higher resolution, it is available. As I said, there are other satellite products that do vegetation indexes that are probably more detailed as well. And yes, I'm going to show you where you can get it.

There are lots of applications for this kind of data. As you'll see as I go through this, I'm doing it for my county in Southern California. Probably most people are aware there's been severe drought there for the last several years, so I thought it would be interesting to look at the vegetation over a period of time. I'm also going to cover a little bit about administrative shape data and some geocoded data.

The components I'm going to use are Postgres, PostGIS, R, and PL/R. Everyone here, I'm sure, knows what Postgres is, and probably everyone in here knows what PostGIS is. How many people here are familiar with R? A good number, okay, that's good. So R is specifically a statistical and math processing language, but one of the reasons it's very interesting to me is the ecosystem of packages around it. If you look at CRAN, which is the R equivalent of CPAN for Perl, let's say, right now
there are over 10,000 packages available up there. Pretty much all of the cutting-edge research in math and statistics is being developed for R before almost anywhere else. I mean, even SAS has R connectors these days, and Oracle has an R connector, because most of the newest analytics is being developed in R first.

The other interesting thing worth mentioning is that on their site they organize things by what they call task views, because there are so many packages that it's hard just to find which ones you might want for different things. If you go down the list there are things like machine learning, and there is of course Spatial. So there are all of these different packages available just for spatial work: analysis, visualization, reading and writing of spatial data, point pattern analysis, geostatistics, and so on. You can see it goes on and on. To me, this is why I'm talking about R today. It's also why, years ago,
I developed PL/R, which is a plug-in extension for Postgres that allows you to use R from a function within Postgres.

In terms of getting all this set up and ready to use: first you'd install your desired versions of Postgres, PostGIS, and PL/R, create a database, install the extensions, and then there's a variety of required R packages just to do what we're doing in this presentation. As I just pointed out, there's a whole lot more you might potentially want to use.

Just a note here: if you're going to try to follow along, install those R packages while logged in as root, or at least as the postgres user. The reason is that when PL/R runs the R engine, it's doing so as the Postgres process, which means the postgres user needs to be able to see those packages in order to use them. If you install as root, then any user on the system will be able to see them, because they'll go into the standard locations for the system.

So we're just going to create a database called gis, then CREATE EXTENSION postgis, CREATE EXTENSION plr. This is assuming the binaries have already been installed on your system, either from source or from packages; both are available in the PGDG (PostgreSQL Development Group) repository, for instance, so you can get debs, you can get RPMs.

On the R side, some of these packages require Java, and I found it necessary to run `R CMD javareconf`, which basically just tells R how to find the JDK. And then this is kind of a neat thing about R, too: with one command, install.packages(), you can install all of these packages. It'll prompt you for which mirror you want, go out to that mirror, download all the source code, and compile all of those libraries right on your system. Maybe you like that, maybe you don't; you can also get most of these packages through your normal vendor, wherever you get R.
If it's RPMs, I think they're in EPEL, probably, and on Debian they have quite a few of them as well. But you might need to do this for some libraries that are not available through your vendor.

All right, so we're going to start with ingesting data. Any questions before I move on?

The first thing we're going to do is talk about how to ingest some data for administrative boundaries. This is going to be vector data, so the geometry type in PostGIS. There's a package called rgdal, with a function called getData, and what it will do is actually go out to well-known repositories of this kind of information on the internet and download it for you right into your R session, which is pretty convenient and pretty cool. In this case we're going to use something called GADM, which is a global database of administrative boundaries; there's a link there in case you want to go look at it. We're going to use that data from the R session to actually create a PostGIS table. Just a note that this getData function also supports a database of climate data, a database of topographical data, and some other things, so there are links for those as well down here.

So now, here's our first example. This is a mixture of SQL and R, because it's using PL/R. You can see this is the normal CREATE OR REPLACE FUNCTION that you would see for any function you're creating in Postgres.
I'm going to use some arguments: I want a particular country and a particular level, because this administrative shapes database is broken down by country and by level. You'll see down here that I'm selecting country USA and level 2. What that means is that for the United States I want all of the shapes of the states, and of the counties within the states, because level 2 is the counties (in some states they call them other things, like parishes, but the equivalent is what gets pulled down). So I get all of those shapes in one go.

In order to do that, I have to load the rgdal library. I'm creating some connection strings, which get used by these other functions. But this is where the rubber meets the road: this one function will pull all that data off that website and put it into this variable, shapes, and then this writeOGR function will actually take that spatial data that's in the R session and write it out to a table in Postgres with a PostGIS geometry column. The table this creates will look exactly like a table would look if you used the shp2pgsql utility that comes along with PostGIS: basically one geometry column and a bunch of attribute columns. Among the attribute columns there's going to be a name column for the state and another for the county name, among other things.

This function is using the OGR library. Let me step back for a second. As I went through writing this presentation, one of the things I realized, and I've kind of known this for a while, is that PL/R knows how to translate standard data types between Postgres and R.
So things like integers and floats, and even arrays, are handled pretty transparently, at a binary level. But PostGIS is an extension, so PL/R does not natively know how to translate a PostGIS geometry into a spatial object in R. So there are some workarounds in here that are probably not ideal, and that only exist because I've not gotten enough round tuits to go make the modifications to PL/R to make that a little more transparent.

This is one of those workarounds: it's actually going to go out to Postgres using the OGR library, directly create the table for me, and load all that data into it. Since it's spatial data, you can use the same function to write it out into a shapefile, which is a specific format; later on I'm going to show you another example of a function that uses that, because I want to present some timing and performance information. So what this does is connect back to Postgres using a regular connection. Doing it directly would require me to modify PL/R, which is what I just talked about; I would have to make modifications at the C level to use an interface called SPI, and I have not done that. I would like to at some point, I just haven't yet. For the purposes of creating this presentation it was a lot easier this way, and it doesn't take long: running this function takes a couple of seconds to create the table.

Basically, it creates the table, and then this query here, which creates an additional index on state name and county name, is executed by this dbGetQuery call. That part actually is happening through SPI, because PL/R knows how to do that. So calling this function like this creates a table called counties, using the data for the US down to the county level.

So now let's see what it looks like to actually do something.
Let's plot this thing. There are going to be a couple of different methods that I use for creating plots and saving them in a way that I can use them; that's intentional, because I wanted to show you alternative ways you might do it.

The first way: if you return bytea from a PL/R function, whatever object was returned from the R side gets serialized and sent out as binary. So in this case there's some code here, and basically these two lines are setting up a buffer in memory, so that when the plot gets rendered, it gets rendered to that buffer, and then that entire buffer gets returned. Basically, the image just gets streamed back to the client. So this is one method you can use to generate a graphic with PL/R and then capture it for later use, whether that's displaying it on a website, writing it out to a file, or storing it in a table somewhere; you can do any of those things. (Yes, whatever standard formats R supports.)

In this case, what I'm doing here first is setting up a projection string that represents the data that I got. Is anyone here not familiar with projection strings, the idea of projections? Everyone's pretty much familiar with it.
Okay, great. Again, what I'm doing here is selecting out of that geometry column, out of the table I created just a minute ago, and, again this is the weakness, the part I haven't fixed yet in PL/R, I'm converting that geometry to well-known text format. Then I basically take that well-known text and read it back into R right here, and then just plot. It's a single command, plot, and that generates the plot. So this is what the outline of San Diego County looks like.

This other bit here is a function that I provided, because when you get the serialized R object back out as bytea, there's actually a little bit of wrapper that R puts around the image itself when it stores it internally, and this plr_get_raw strips that off. So with the output of that function, you wind up with exactly the image.

Later on I'm going to need to make use of what's called a reprojection. It sounds like everyone here is kind of familiar with that idea, but basically, a projection is how you render what's really a three-dimensional object, something laid out on the globe, as a two-dimensional object. It turns out that this MODIS data I was talking about earlier has a different projection than this administrative data. The MODIS satellite data is raster data, as I said, which means it's a bunch of data points in an equally spaced grid, whereas the administrative data is vector data. You could actually reproject the raster data to match the administrative data, but if you do that, because of the equal spacing, when the spacing changes you'd have to extrapolate, so you'd end up with some loss of fidelity. You're better off reprojecting the vector data into the projection of the raster. So that's what this is doing here, and again, as you can see, it's just a single R command, which
is again very similar; it should look familiar to you if you've ever done this in PostGIS. Just spTransform over that data, and then plot it. Now it looks a little different, right? And that just has to do with the perspective of the satellite.

There are a couple of other alternative methods I could use to get this data. I wrote out the exact same data as shapefiles, and in this case this will just read from the shapefile, instead of from Postgres, into the shapes object, and then I can subset that to get just the state and the county that I want, and plot that. Or, another way, I could read the whole table back out of Postgres using this readOGR, essentially pulling the entire table in. But you'll see it doesn't make sense to do that. The original method we used was quite efficient, because all we did was grab just the data we wanted for San Diego County and pass that from Postgres to R, whereas in the other two cases we basically had to read the entire layer and then subset it and pick out the piece we wanted. That was actually one of the reasons I wanted to do this presentation in the first place: as I was reading through people doing this in pure R, that's what they tend to do. They read an entire layer into an object and then subset it in R, and there's no index to help.

Okay, so now I'm going to move on from the administrative layer stuff to geocoding. Any more questions about the administrative layer? (The third data type I'm going to ingest, after this, is actually raster data.)

Who here does not know what geocoding means? Anyone? The idea of geocoding is that you start out with something like a postal address, and you want to convert it into spatial data, which is more of a lat/long, right? So there are some services available.
And again, there's an R package, called ggmap, which has the option to use either the Google Maps API or something called the Data Science Toolkit. Just to note: these are both things you can use for your own purposes; they're not intended for commercial use, and they're limited in the number of queries you can run per day. So for commercial use it may not be the best method, but you can do this something like 2,500 times a day with Google Maps, and I think 10,000 a day with the Data Science Toolkit.

The process I'm going to use here is: create a list of addresses for some points of interest, use that library to geocode them, add some names to them, set a coordinate reference system so I know what projection it's in, and then create another one of these PostGIS tables using that writeOGR function.

So this is what it looks like. Again, what you're doing in R is fairly straightforward. This geocode function: you pass the array of addresses you want geocoded, as an array, to the Postgres function, that just becomes a vector that gets passed to geocode, and that creates my layer with lat/longs for each of those. Then I'm going to add to it an array, which becomes a vector, of the names of those points. I combine those, and now essentially I've got what's called a data frame in R. And if I now tell R that the long and lat attributes of this data frame are actually the coordinates, that converts it into a special data type in R called a spatial data frame, so that all of these other functions within R know how to work with it.

How many have I tried in here? I tried three; I didn't do a ton of them. You mean, how fast is it? I've not tried to do any kind of performance testing. Like I said, it's limited to 2,500 a day if it's using the Google API. No, I didn't try 2,500.
I tried three, and it was very fast. I mean, if I'd tried three and it was very slow, that would tell you something, anyway. I'd be interested to try; maybe after this talk I'll go try it.

So you set the projection string again, and then use writeOGR again to create a table with the data, and then I also create an index on it, same as before. Then when I call the function, I'm just passing in these two arrays. These are three addresses, three airports that happen to be in San Diego County, and the layer name ends up being the name of the table that it creates.

So now, when I want to plot those, I've switched here: before, I was doing PL/R functions; now I'm showing you how you might do this in just straight R. The thing to note is that if you take everything below this function declaration part here, you could just cut and paste it into a PL/R function, and it would become something you run through PL/R. When you're first doing this kind of thing, you probably want to do it interactively, until you figure out what it is you want to do. Then, once you're happy with what you're doing, you might want to bottle it up into PL/R so that it's more controlled, once it's been QA'd, vetted, whatever.

The other change I made here is that I'm passing in the name of a file for my image, and I'm basically saying that if that file name is not empty, I'll set things up so that R just creates the file on disk. So instead of streaming the image back, it just gets created on disk; it's just another method to do the same thing.

So this is similar to what we were doing before. I'm just selecting the data out of the area-of-interest table, and that's my county outline, and then, on top of that,
I'm going to read the data for those points of interest. Again, the nice thing about R is that I can plot the San Diego County outline, then go grab the other data and just call plot again; I say add = TRUE, and it just adds it to the plot. So now I'm just adding another layer to the plot. This is a fairly small function, and now I'm plotting San Diego County, and it's maybe a little hard to see, but there are three red dots here, and those are my three points of interest in my area of interest.

So now we'll talk about the NDVI download. There's a lot of text on here; I'm not going to try to go through it in great detail. But this is where you go to get this data: the USGS EarthExplorer site, at usgs.gov. There's an interface here; basically, all this area on the left is where you enter all the criteria, and then it'll select, based on your criteria, which things you want to download, and then you go through and check them off and say, yes, I want to download all of these. So that's what this is walking you through, what I did. You do need to get a login on their site if you want to be able to download the information.
It's free. I set Alameda County as my search criterion, and what it does is, you pick a point somewhere, and it will give you the satellite images that cover that point. When I originally put San Diego County in there, it turned out that it was far enough south in the US that there were actually two different passes of the satellite that covered San Diego County, so I ended up with twice as much data, and it was redundant. I didn't really want that, so I ended up using a Northern California county just so it would get me only the continental US.

I ended up downloading all of the data from 2000 through the end of 2016. Basically, these satellite passes are constantly being done, but then the data gets aggregated into, they say, 16-day intervals, although it's actually more frequent than that, more like every six days. Over the course of those days there will be multiple passes, even multiple passes per day, and what they try to do is boil the data down, because, as you can imagine, with satellite data sometimes you're looking at clouds and not getting good data. So they keep taking the images, and then they take several days' worth of them, and before they even give the data to you, they figure out the best pixel out of that time frame to represent each area for that time period. So you end up getting a raster where each pixel may come from a different satellite image, and they keep track of all of that; I'll tell you a little bit more about that in a minute.

The bottom line is that this ends up being 104 GB of data to download, just at the 500 meter resolution. If I'd done the 250 meter, it would have been 400-something gigabytes. And it took about 12 hours to download. They provide you with an application to do the downloading; it actually retries and does all kinds of fancy things, so it was kind of convenient, but it took a long time.

This is the area of interest. There's a spot on here where you can click, and it says "show footprint" for the image you're currently looking at; this is the area the satellite is collecting, basically. And remember, we just want one little patch out of there. This is a screenshot of the downloading app. Like I said, it's just a small app, lightweight; it worked pretty well. It probably wouldn't be very easy to get this data without something like it. I ended up with 1,166 individual zip files to download to cover that 16-year period.

So, in terms of processing the data: one of the things I realized pretty quickly, and this is real life, is that not all the data was good.
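As an aside: one cheap way to spot corrupt downloads in a batch like this is to flag files whose size is far from the typical size, since the good archives are all roughly the same size. A minimal Python sketch (the file names and sizes here are made up for illustration):

```python
from statistics import median

def suspicious_files(listing, tolerance=0.5):
    """Flag files whose size deviates from the median size by more than
    `tolerance` (a fraction of the median). This works when the good
    archives are all roughly the same size, so a truncated download
    stands out. `listing` is a list of (name, size_in_bytes) pairs,
    e.g. built with os.scandir()."""
    typical = median(size for _, size in listing)
    return [name for name, size in listing
            if abs(size - typical) > tolerance * typical]

# Hypothetical directory listing: three ~90 MB zips and one truncated one.
listing = [("a.zip", 90_000_000), ("b.zip", 91_000_000),
           ("c.zip", 1_024), ("d.zip", 89_500_000)]
print(suspicious_files(listing))  # ['c.zip']
```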
Out of 1,166 files, nine of them were actually bad. I first discovered that as I was looping through to unzip them and getting errors, but then I quickly realized that if you just looked at the directory and looked for the files that were an anomalous size, those were the bad ones, because all the rest were pretty much the same size. So the first thing I had to do was clean out the bad ones.

This pre-processing function, which I'm definitely not going to try to go through, is probably 80 or 100 lines of R code. It's not a huge bunch of code, but it's too much to cover in detail here. This is the basic process: load that area of interest, the San Diego County boundary, again; reproject it to the raster data's projection; and then loop through the downloaded files. You unzip each one and get three different rasters out of it: the NDVI data I've been talking about, but also a QA raster and an acquisition raster.

The QA raster, on a pixel-by-pixel basis, tells you whether it thinks the data was good. Basically, if a QA raster pixel equals zero, the data in the corresponding NDVI raster pixel was considered good. There are various levels of degradation, but basically anything that wasn't a zero, I marked as NA, which in R terms is basically like making it a null.

The acquisition raster has to do with the exact date each pixel was acquired, because, as I said, one file may cover six different days, with multiple acquisitions per day. So I took the acquisition dates, converted them into year-dot-fraction-of-year based on the number of days, and then took an average for the entire raster, saying: for this raster layer that I'm going to save, this was the average date, represented as a float. And, as I said, I NA'd all of the pixels that were not good according to the QA data.

I also used the San Diego boundary to crop the raster data. Cropping basically means you take the minimum square that fits around that shape and throw everything else away completely. But in addition to that, I also did something called a mask operation, where everything still outside the boundary of San Diego County got marked NA. The reason I did that is that now, when I do any calculations on that raster, I'm only looking at data inside the county boundary.

After doing all that looping, what I end up with, on each iteration, is basically San Diego County NDVI for that NDVI file, and I stacked those into what's called a raster brick, a bunch of layers in one raster, which I end up writing out to a file. I could have used PostGIS to write it out to a table, but it didn't feel worth it given the way I was using it; that's probably something for some future version of this.

The other thing this function does is return all of those calculated dates, and the file names associated with them, because that turned out to be very useful for debugging. It sends them back to Postgres as bytea, binary: again, it's a data frame in R that gets converted to binary, and I just store it in a table for later use. This is what the header and footer of that function look like; as I said, the full thing is in the appendix. And this is what it looks like in terms of calling it.
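Before moving on: the per-pixel QA masking and the year-plus-fraction date arithmetic described above can be sketched in plain Python, using None as the analogue of R's NA (all the names here are mine, not from the talk's R code):

```python
def apply_qa(ndvi_pixels, qa_pixels):
    """Keep an NDVI pixel only where the QA raster says 0 ('good');
    everything else becomes None (the analogue of R's NA)."""
    return [value if qa == 0 else None
            for value, qa in zip(ndvi_pixels, qa_pixels)]

def year_fraction(year, day_of_year, days_in_year=365):
    """Represent an acquisition date as year + fraction-of-year,
    e.g. day 183 of 2011 comes out to roughly 2011.5."""
    return year + day_of_year / days_in_year

masked = apply_qa([4200, 3100, 5000], [0, 1, 0])
# masked == [4200, None, 5000] -- the degraded middle pixel is NA'd out
stamp = year_fraction(2011, 183)   # ~2011.501
```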
I create this r_objects table, which is where I'm going to store those datetimes associated with the raster layers, I call the function, and I insert the result back into this table. That took a little over two hours to run on those 1,157 files.

Now, this is basically the plotting we saw earlier, with some additional information overlaid that includes the NDVI. So now I reproject everything to the NDVI projection, I acquire my layer from the NDVI, and when I plot it, this is what I get. You saw this pretty much earlier, in the reprojected form; there are my three points of interest for the airports, but now it's overlaid with the NDVI area, and you can see the green areas are the ones with more vegetation.

Okay, so as I said earlier, when I did the processing of all that NDVI data, I returned the dates and the file names to Postgres, but I also wrote out that raster brick as a GeoTIFF. So rather than storing the raster data in Postgres, I kind of cheated: I just put it into a file. That was easy to do and actually pretty quick. So this is reading in that TIFF file in order to get the NDVI data. This is frame 42 out of 1,157, just one example of it.

Now, this is kind of a helper function: I give it the name of the object I want to pull out of the table where I stored the R objects, and it returns me that object. As an aside, PL/R supports this plr_modules table; there's actually been some talk on the list lately about alternative ways to do this, but for now,
this is what PL/R does. Basically, you create this table and insert an R function directly into it, and when R gets initialized, PL/R will compile that function and put it into the environment. So this becomes a helper function that I can call from my PL/R functions, and what this one does is take that floating-point representation of the date and turn it into an actual date. Then this function takes the bytea input, which, once it's inside R, becomes a data frame again, and it uses format() and my little helper in the plr_modules table to output all of this data with a proper date. So now I have access to all the data that was in that binary object I returned from R earlier.

So let's take a look: what does NDVI look like as an average over time? In this case I'm passing in, again, the name of that raster file with all my layers in it. The cellStats function will take this function here, mean, and apply it to each of the layers. So I take the mean of each entire layer, all of San Diego County: the mean NDVI value for the county, over time. And I'm going to plot it; well, in this case I'm actually just going to return it. These are just the first three, as an example.

In this next function I take it one step further and actually plot that data based on time. So now you can start seeing, all right, this is what my
So now you can start to see it: this is what my vegetation, or dryness, in San Diego County looks like from 2000 through 2016, and you can get a sense that there are some upward trends and some downward trends. That kind of makes sense.

But there's another way you can do this in R, using a different plotting package called ggplot2. In particular, it supports geom_smooth: when I plot with that, it automatically does the smoothing for me, a moving-average-style smoothing, and now you can definitely see a trend in the downward direction in terms of the vegetation index.

There are just a couple of other visualizations I'm going to do here. First, average by month. A lot of this is the same stuff we've been seeing: I'm looping through the twelve months of one particular year, given as an argument, and for each month I take all of the data for that month and average it together pixel by pixel. So now, instead of averaging all the pixels within one raster, every pixel gets averaged with the corresponding pixel in every other raster from the same month.

Again, I'm just calling plot() once, but since I built a raster brick with twelve layers, when I plot it I get twelve plots. This is January through December, and you can get a sense that it's greenest in the winter, and as you move toward August it gets considerably drier. This happens to be 2011, which was the peak of the wet season we saw in that trend data.

Now, if I do the same thing except with a particular month across all of the years, I get one plot per year. This is January from 2001 through 2016, and you can see the progression in January of how it gets drier, then wetter again, then drier. And it's a nice contrast if you go from January to August; you can see how much that changes.

So that's it. Any questions?
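The month-by-month compositing just described (group every layer by its calendar month, then average cell by cell across the group) can be sketched in pure Python, again as an illustration of the logic rather than the talk's R code; the layer structure below is invented for the sketch.

```python
from collections import defaultdict
from datetime import date

def monthly_pixel_means(layers):
    """Group (date, grid) layers by calendar month and average them
    cell by cell, yielding one composite grid per month -- the
    pixel-by-pixel monthly averaging described above, in miniature."""
    by_month = defaultdict(list)
    for d, grid in layers:
        by_month[d.month].append(grid)
    result = {}
    for month, grids in by_month.items():
        rows, cols = len(grids[0]), len(grids[0][0])
        # Average corresponding cells across every grid in this month.
        result[month] = [
            [sum(g[r][c] for g in grids) / len(grids) for c in range(cols)]
            for r in range(rows)
        ]
    return result

# Two Januaries and one August, each a tiny 1x2 grid:
layers = [
    (date(2011, 1, 1), [[4000, 6000]]),
    (date(2012, 1, 1), [[2000, 2000]]),
    (date(2011, 8, 1), [[500, 500]]),
]
print(monthly_pixel_means(layers))
```

Stacking the twelve composite grids back into one brick is what lets a single plot call produce the twelve-panel January-through-December figure.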
I think I have a few minutes left. Eight minutes for questions.

So, if you go back to this function, these dots here just mean it's continuing on to this. So that's a complete function; it just won't fit on one slide. In this one, what I'm showing is that everything above and below this part is the same as it was in that function; all I really did was replace this looping part with different looping. And this one is fairly complete, but it's pure R; it's not a PL/R function. This function is complete. This part you saw earlier: we had another plot-points-of-interest function earlier in the slide deck, and again, everything shown as "..." is the same as before; these bits were added to it. And finally, the loop processing of the actual raster data that I was talking about is all the way at the back, after the appendix, in this pre-processing function. So if you wanted to look through that, there's some additional information there about how that data all works.

I'm not sure I'm following that question. The San Diego County boundary? I actually created that raster brick as a file, but I could have done it differently: PostGIS supports raster layers, so I could have stored it in a table in Postgres. In fact, you could have skipped all of this subsetting, stored the entire original image in PostGIS, and then done the subsetting at runtime, so that you could do San Diego County now and some other county on the east coast tomorrow. I mean, there are a lot of ways to do all of this stuff, and I could have stored a lot more data in Postgres.
I just didn't really want to consume the room on my hard disk, which didn't have much to spare. I downloaded a hundred gigs of files, and if I had loaded all of that into a Postgres database, it would have been at least another hundred gigs, maybe more. But the nice thing about being able to go back and forth between PostGIS and R is that, as I said, R has tons of different libraries that you could potentially apply to this data.

I haven't really thought about it. I mean, I tried to write these functions in a generic way, so they aren't hard-coded to return San Diego County. So you could take what's here and extend it, to a degree.

Well, I don't know if you're familiar with GRASS, but there's an open source package called GRASS GIS that also ties into this; it uses a lot of the same underlying libraries, it's open source, and R can tie into GRASS as well. So that's kind of a GUI layer for this kind of stuff. There are others too: QGIS is a GUI for doing geographic editing, and there are of course commercial products like ArcGIS Server from Esri. But this is all about trying to automate the processing, because there are certain things that lend themselves to hand operations, and then there are other things where you don't want to do it by hand.

Anything else? All right, thanks.