Of course, it is just a two-hour tutorial, we cannot cover everything, so I will not cover much of the mathematical-statistics theory; I have a bunch of links in this tutorial, but that's about it. Also, there are many other packages available for modeling spatial or spatio-temporal data. Here we are really boiling everything, even a huge dataset, down to the core basic ideas of spatial modeling, and along those lines we are not going to talk about, for example, coordinate reference systems, how to best handle polygons and fancy display images, how to pull in lots of Google Earth Engine data, and so forth. Essentially we have a huge matrix of x, y coordinates and some observations, and we work on that. Of course, ultimately you would also be interested in working with coordinate reference systems, layers and rasters, but for these two hours we break everything down to a triplet representation of the data and do very simple, basic analysis steps in our spatial modeling approach.

I have a very short introductory presentation, as I said, just so we speak the same language; then the first exercise, a very short one, for which we don't distribute you into breakout rooms; then the second one, a little bit longer; then a short bio break, so that we and our laptops can recover; after the break another short presentation from my side, then the third exercise, and then a somewhat longer open problem that may take half an hour or so; we will be on time, that's for sure.

For the breakout rooms we plan to put you into groups of five; please profit from this small cohort: ideally one of you shares the screen with RStudio running, and all the others help coding. We provide you with good starting chunks of code, so that you only need to add a few lines or understand some parameters, and that is best done when you discuss with each other. We will visit you during the exercise times, room by room, and maybe we won't be in the right room exactly when you have a question, so please also put your questions into the Slack channel; Roman has already put the link in the chat, it's a clickable link. If you would like us to visit your room, please state the room number and, ideally, also the question.

The material is all on this webpage, and what's probably important is that at the bottom of the page there are already four files: the first one is the PDF that I'm showing you right now, then there is the dataset that we'll use, and a markdown and HTML file that we will encounter shortly as we move on. During the tutorial we will upload more markdown and HTML files there, i.e. all the solutions to the corresponding exercises.
So if you struggle with one sub-problem, or if a sub-problem is a bit longer or more extensive, don't worry, the solution will be there right after the problem.

Okay, about spatial statistics. When we talk about spatial statistics, we envision, indicated here in blue, a spatial domain, and within that domain a bunch of spatial locations, say n of them, s_1 up to s_n, with a measurement associated with each location. The idea is that we apply the first law of geography, from the Swiss-American geographer Waldo Tobler, who stated that everything is related to everything else, but near things are more related than distant things. So we will use and exploit this proximity, this dependency in space, here in geographical space.

There are different types of spatial data, and quite often we classify them into the following three possibilities. First there is point-referenced data; that is what we will use and what I explained before. Quite often, though, we have observations aggregated over a certain area, say one measurement per community, municipality, district, state, canton or whatever; then we do not have a measurement for every precise geographic location but just one value per area. We also call this lattice data. We will not really cover that in this tutorial, although to a certain degree it is very similar to point-referenced data, and certain aspects are almost easier, so you should be able to handle it once we've covered the classical geostatistical approach. The third type, somewhat different from the first two, is when a stochastic mechanism drives the locations themselves; that is called point-process data. Its handling is quite different from the first two, so we put it entirely aside.

Of course there are many, many packages for the analysis of the three types of spatial data we've just seen. A few of them are quite "old" but well established: RandomFields or geoR, for example; in class, when I teach, I typically use gstat and spacetime, which are really nicely elaborated, and the main developer gave a good keynote talk yesterday; spatialreg and spdep for lattice data; spatstat for point-process data; and so forth. Then there are a couple of packages to handle spatial data per se, say sf (simple features), stars or sp. Here we will use, as I mentioned, spam and spam64. spam stands for sparse matrices, and yes, spam is quite old: I started writing it shortly after my PhD, Google searchability was not really an issue when I named the package that way, and these days I probably wouldn't call it that anymore.

When we talk about spatial statistics for point-referenced data, we have essentially three different problems. The first, probably the most intuitive one, is prediction: we would like to predict a quantity of interest at an arbitrary location within our spatial domain. That means prediction is equivalent to filling in missing data; sometimes we need to force the data onto a regular grid, maybe for visualization, maybe because we need it as input for further processing, or we want to smooth out the measurement error, i.e. separate the noise from the signal. All of that will of course be done in view of the first law of geography, by taking into account the spatial dependency.
Here I have a prototype equation for spatial statistics, an additive decomposition. The first term on the right-hand side holds possible predictors, the mean term, so the large-scale variability; my Z here is the small-scale variability, the spatial process so to speak; and at the end there is a possible nugget effect, possible noise, measurement error or something else, with no spatial dependency, typically iid. To keep things a little simple we are going to avoid the notation for the first moment, so quite often, for simplicity, in the formulas presented here and also in our R session later on, we just assume a constant mean or no mean at all. In reality we would of course have a mean, but typically the mean causes much less of a headache than the second-order structure of the small-scale process.

Prediction, and I'm sure you have all seen this in some way or another, boils down to minimizing the mean squared prediction error. If I'm in a Gaussian setting, things are a bit easier; otherwise we restrict ourselves to linear predictors. At the end of the day my prediction is based on the best linear unbiased predictor, which is nothing else than this: if I want to predict at a location, I take the variability into account, so observations that are more uncertain receive less weight than those with less uncertainty, and observations that are nearby receive more weight than those far away. That is essentially this formula: I weight by uncertainty and I weight by proximity. The big message is that we need to solve a huge linear system, and that is what spatial statistics for large or huge spatial datasets is all about: how can we solve a huge linear system? Yes, it is presented here just for one spatial process, with no trend and no covariate structure, but if we relaxed all that, we would essentially still have to solve one huge system. If you use other, slightly more sophisticated spatial prediction routines and tools, at the end of the day they still have to solve linear systems, maybe not as large as the ones we discuss here, but maybe many or several of them. So it all boils down to how we can manage this best linear unbiased predictor for large datasets. And look again at this covariance matrix: we need to solve with it, it contains the dependency information, and it grows as n squared in the number of observations; now you directly see why we are here, because as n increases it gets more and more difficult to solve this linear system.

For estimation it is unfortunately not much simpler. For estimation we typically have a parametrized covariance function; this covariance function then drives the covariance matrix we had before, which describes the uncertainty of my observations and their relation to each other. What we typically have is a function like the green one here, maybe an exponentially decaying covariance function or one with another shape, either asymptotically converging to zero or exactly zero after a certain threshold, a certain so-called range. We know the shape of the curve but not the exact parameters. Typically we call those parameters the partial sill, which would be the height of this green curve, and the practical range, which describes how slowly or how fast the curve decays.
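For reference, a hedged reconstruction of the decomposition and the predictor just described verbally (the notation is mine, not necessarily the slide's):

```latex
% Additive decomposition: large-scale trend + spatial process + nugget/measurement error
Y(s_i) \;=\; x(s_i)^\top \beta \;+\; Z(s_i) \;+\; \varepsilon(s_i),
\qquad i = 1,\dots,n, \qquad
\varepsilon(s_i) \overset{iid}{\sim} \mathcal{N}(0,\tau^2).

% Best linear unbiased prediction at a new location s_0 (zero or known mean),
% with y the vector of observations, \Sigma = \mathrm{Cov}(y), and c_0 the vector
% of covariances between Z(s_0) and the observations:
\hat{y}(s_0) \;=\; c_0^\top \Sigma^{-1} y,
% i.e. we have to solve the n-by-n linear system \Sigma a = c_0:
% the "huge linear system" referred to in the text.
```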
If we have an additional spatial noise component, the nugget effect, then we also have a nugget parameter, indicated here by this small offset, this discontinuity at the origin. In our framework today we are going to estimate the range, the sill and the nugget with a likelihood-based approach in a Gaussian setting; depending on what software you use, you sometimes have to estimate the nugget, sill and range, or just the nugget and sill, or some additional parameter, that depends a little bit.

The last point, the third problem that typically needs to be addressed, is simulation. Here we have to differentiate between unconditional simulation and conditional simulation; both of them rely again on those huge covariance matrices that we have already encountered a couple of times. This will pop up again later, and I only put it up here so that we don't forget to mention that numerical instabilities can always appear when we work with huge covariance matrices.

Before we start with the actual coding part, let me summarize what we have. We essentially work in a two-dimensional space, x and y coordinates, very simple; there is one example in one dimension for illustration, but that's it. We typically use Euclidean distances; you may use great-circle distances, there is not much of a difference between the two from a computational point of view. We place ourselves in the simplest setting, namely Gaussian stationary distributions, for which we have a very simple form of the likelihood. In practice you can relax that, and in many settings you can still write down the likelihood, it's just a bit more cumbersome, so for a tutorial it's better to stay in the stationary setting. I mainly talk about prediction and estimation, and a little bit about simulation. What is important: even though the title says "huge", we don't have the time to really work with huge spatial datasets, but all the code that we give you and all the ideas and tricks do scale. So even though we only work with large datasets here, they do scale; that was our purpose, so that if you have time later on, tonight or tomorrow, you can run it with a much larger dataset, and for certain computations we will also give you the approximate time you would have to budget for the full-fledged analysis. For this tutorial we use a single core and maybe one or two gigabytes of RAM, and even if you don't have that much, no worries, you can still run everything.

Good, so the focus is on spam, the package that I started to write a very long time ago; at that time there wasn't really another package for sparse matrices on the market that allowed subsetting and similar things. It has mostly been a one-man show, so the capabilities of the package are tailored to spatial statistics rather than something else. Recently we have extended it with 64-bit capabilities, and we'll talk a little bit about that: with spam64 you can really work with huge spatial datasets; all you need then is a bit of time, because working with those large datasets of course takes time, but at least on the computational and algorithmic side there is no limitation anymore.

Here is the Gaussian likelihood; this is essentially what we need to address pretty much everywhere.
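A hedged reconstruction of the likelihood on the slide (zero-mean case; this is the quantity evaluated at every optimization step):

```latex
% Gaussian negative-2-log-likelihood with parameterized covariance \Sigma(\theta):
-2\,\ell(\theta) \;=\; n \log(2\pi) \;+\; \log\det\Sigma(\theta) \;+\; y^\top \Sigma(\theta)^{-1} y.

% With the Cholesky factorization \Sigma = U^\top U (U upper triangular):
\log\det\Sigma \;=\; 2 \sum_{i=1}^{n} \log U_{ii},
\qquad
y^\top \Sigma^{-1} y \;=\; \lVert U^{-\top} y \rVert^2 \quad \text{(one triangular solve)}.
```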
When we estimate, I need to evaluate the likelihood; when I predict, I don't have to evaluate the full-fledged likelihood, but I essentially have the same part as at the end here, a linear system that I need to solve. So if I can evaluate the likelihood, I can essentially solve all the problems: estimation, prediction and simulation. We evaluate the likelihood by decomposing the covariance matrix into a lower and an upper triangular matrix via a Cholesky factorization; for prediction it is then just a couple of backsolves, and for the likelihood, since I already have this triangular matrix, the log-determinant is straightforward. And I think there is a log missing in this equation here: it should be two times the sum of the log of the diagonal of U, sorry about that.

Good, okay. I'm going to stop sharing this PDF and quickly pause to see whether there are any questions so far. There is a question in the chat: Artemis asks whether I could explain how "stars" works, a dimensionality-reduction method named STARS. I'm not aware of dimensionality reduction in this context; that would be a completely different aspect of handling the data. Here we typically have just two dimensions for our locations, possibly three, and then one response surface, not really a multivariate setting, so dimensionality reduction has a slightly different meaning here. Maybe we can discuss that later or offline, because it goes a little outside the scope.

I then propose that you open your RStudio, go to the page we showed you before, this one here, and download the markdown or the HTML file; that's what I'm doing now, I've just opened it in RStudio. Oh, you don't see what I'm talking about, sorry about that; one more time, next try. So what I propose is that you go to this webpage, download the markdown or the HTML file, which is essentially what I do, and then we can go through the document in parallel. I presume you can all see what I have here; I need some nodding from Roman. Can you read it? Perfect, thanks a lot.

At the beginning I have just copy-pasted the Gaussian likelihood again, in case we need it; it's in the HTML because I like to have it always present in my mind. We start with a very simple exercise, essentially a likelihood estimation for a small dataset, and I propose we go through the code together. First we load some packages; and "small" here literally means just 100 data points, so this is really, really tiny, not really worth talking about timing. I just generate some locations here; let me quickly check the locs object, whether that works; okay, here are the locations in the unit square. Now I generate a covariance matrix, and that covariance matrix is based on the distances: rdist calculates the distances, and then I use this green function that we see here, cov.wend1. (Yes, there was a question whether I could increase the font a little; let me edit that, font 12, apply; better, like that?) And I need the package fields here first, okay. So rdist calculates the distances, and cov.wend1 is the green covariance function that we had before; maybe I can just show it with curve. This is the green function we saw before, nothing spectacular: at distance 0.5, the range, it reaches zero and remains zero.
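A minimal sketch of the setup just walked through; the seed and parameter values are illustrative, and the function names (rdist from fields, cov.wend1 from spam) follow the walkthrough, so double-check them against the tutorial files.

```r
library(spam)    # sparse matrices, covariance functions, mle helpers
library(fields)  # rdist(), quilt.plot()

set.seed(14)                               # illustrative seed
n    <- 100                                # "really tiny": 100 locations
locs <- cbind(runif(n), runif(n))          # locations in the unit square

h     <- rdist(locs)                       # dense n x n distance matrix
Sigma <- cov.wend1(h, theta = c(0.5, 1))   # theta = (range, sill); check ?covmat for the parameterization

# The covariance function itself, as plotted with curve() in the walkthrough:
curve(cov.wend1(x, theta = c(0.5, 1)), from = 0, to = 1,
      xlab = "distance", ylab = "covariance")
```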
Then we use the function rmvnorm to simulate the observations. This is a newer function in spam; if you don't have the newest version of spam installed, you would have to use the older variant with a slightly different argument. Then quilt.plot, which we saw before: here we have our locations again with color-coded observations, and we clearly see spatial dependency, bluish points next to bluish points here, reddish ones here, yellowish ones there.

To estimate I just use the function mle.nomean, "nomean" because we don't have a mean here. I specify my observations, the distance matrix, the covariance function, an initial value, a lower bound and an upper bound, very similar to an optim call; in fact mle.nomean calls optim internally, so its output is very similar to an optim output. We have the value and the parameter estimates, roughly 0.48 and 0.78 instead of the true 0.5 and 1, so not too bad with one hundred observations; the value, which would be the negative-2-log-likelihood, is around 40, and it took 23 function evaluations. So we have our parameter estimates.

Now, based on this code, we have prepared the following tasks for you. First, reproduce it with a specific set.seed; while showing the code I expected a bunch of interventions from the audience along the lines of "where is the set.seed?", and indeed a set.seed is really required here. Second, maybe play a little with the number of locations and see what happens with the estimates. Third, possibly add a mean; you would have to modify the simulation of the y vector, but also the estimation. Fourth, those that are already done may quickly check what happens with a different covariance function: this green function, which is actually drawn in black in this panel, what if I chose not cov.wend1 but cov.sph? These are the tasks I would like you to work on a little; we reconvene after a couple of minutes and I will discuss the solutions. Is the task clear? Please go ahead. There was a question about part three, the additional mean: I propose to simply use the function mle, which is essentially like mle.nomean but with a mean, and for that, for this X-beta construct, you can choose X to be just a matrix filled with ones, of length n.
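Before the solutions, a compact sketch of the simulate-and-estimate steps the tasks start from, continuing the setup chunk above; the argument names of mle.nomean follow the verbal description (observations, distance matrix, covariance function, initial value, lower and upper bounds), so treat them as assumptions and check ?mle.nomean.

```r
# Simulate one realization of the zero-mean Gaussian field and plot it
y <- c(rmvnorm(1, Sigma = Sigma))     # spam's rmvnorm; older spam versions differ slightly
quilt.plot(locs, y)                   # color-coded observations over the unit square

# Likelihood-based estimation of theta = (range, sill), no mean term
fit <- mle.nomean(y, h, cov.wend1,
                  theta0     = c(0.3, 0.8),    # initial values (illustrative)
                  thetalower = c(0.01, 0.01),  # lower bounds
                  thetaupper = c(1, 5))        # upper bounds
fit$par          # estimates, close to the true c(0.5, 1)
fit$convergence  # always check convergence (see the discussion below)
```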
So let me quickly discuss the solutions; I'm just going to work on this code here. Yes, indeed we should first set.seed; I'll just pick 12, and of course not much changes: the image changes a little, but the estimates remain pretty much the same. If I increase n slightly, the points in the image get a little denser, the function count is a little higher, and the estimates are in the same ballpark. Maybe I'll wrap the output in unlist, and then I'm going to pick, say, 1234. What is the name of the object that contains this information? The one I'm showing here is the output of mle.nomean, which is essentially the output of optim. If we look at the function mle.nomean, what does it do? It essentially defines internally the negative-2-log-likelihood, similar to what we displayed before, in a maybe slightly more cryptic way with backsolve, forwardsolve and so on, and then returns the optim call. So it is essentially a wrapper, nothing more than a handy wrapper; that's why I included it, so that I don't have to worry about writing this log-likelihood function every time. If I got a couple of bucks for every error I've made in a hand-written likelihood function, I could donate a lot of money to charity.

Okay, so: unlist, let me check whether that works on it, let's see. Here we now have the parameters; they still vary depending on the realization, of course, but the value decreases because we increased n, and the function count went down. And here is also the message; so first of all, always check whether you have convergence or not. Function counts may go up or down; they are not directly linked to the dataset, but this will be one of the culprits when you brute-force estimate with optim. You may also have noticed that as I increase n, each run takes one or two seconds longer. For what we are going to use later on, let me just rerun it with n = 100.

To add a mean, I would add a mu here in the simulation, and down here, as mentioned, I would drop the "nomean", i.e. use mle, and specify X, my matrix of ones, plus an initial value for the beta. What to choose as initial value? The plain sample mean is fine (a sketch follows after this walkthrough).

Good. For task four: we had the curve from before with parameters 0.5 and 1; now we draw it in green, and then we add the spherical one. Here we have a different behavior at the origin, but both decay to zero and remain at zero. It is important to realize, and the solution indicates this, that because of the more quadratic behavior at the origin, the resulting surface appears much smoother; strictly speaking, in terms of mean-square differentiability, it is not. Maybe I can just copy-paste the figures, otherwise I would have to rerun everything: Wendland on the left, spherical on the right, and the Wendland surface appears much smoother. The code I just used to generate those figures has been uploaded just now; Roman, can you quickly confirm whether the files are up there?

Then there was a question in Zoom about the y values: the y values are the observations, the response that we measure; that could be, for example, the precipitation amount at a particular location, or the temperature, or the heavy-metal content in sea sediments, things like that. (Sorry Reinhard, I think Steffi was referring to the covariance function you just showed. - Can I clarify? I was actually referring to the y that we are simulating with rmvnorm, so I think you did answer my question.) Right, the y that we simulate with rmvnorm is essentially a simulated measurement value, say precipitation and so on. Okay, thank you. Can you please say again where this code is posted? Roman, can you put the link in the chat again? The solutions are online now. I'm going to click on this HTML page, and if you go pretty much all the way down, you see exactly the images we've just generated.
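Picking up task three from above (adding a mean), a hedged sketch: the argument layout of mle() again just mirrors the verbal description (observations, design matrix, distance matrix, covariance function, initial beta, initial theta, bounds), so the names and the layout of the output are assumptions to be checked against ?mle.

```r
mu <- 2                                              # illustrative true mean
y  <- c(rmvnorm(1, mu = rep(mu, n), Sigma = Sigma))  # simulate with a constant mean

X   <- matrix(1, nrow = n, ncol = 1)                 # design matrix: intercept only
fit <- mle(y, X, h, cov.wend1,
           beta0      = mean(y),                     # plain sample mean as starting value
           theta0     = c(0.3, 0.8),
           thetalower = c(0.01, 0.01),
           thetaupper = c(1, 5))
fit$par                                              # estimates (beta and covariance parameters; exact layout per ?mle)
```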
Okay, so let's see what we have now. Of course, what we've done until now is very, very small; even if we go up to 1000 points, that is not really large. But what you have noticed is that with this standard approach, memory grows at least quadratically and computation time quadratically or even faster, so that for 1000 locations we already have pretty much eight seconds of computing time, and 1000 is not really large. Time is growing, and everything is tedious because of this huge covariance matrix. So the question I have now is: is it really just time? If we had more time, would we then be safe? I mean, I could be a coffee addict and just launch my simulation, get a coffee, come back and have my result. But no, it is not just the time.

Here is a little excursion, a very shallow one I admit, into memory and R. In R you have to imagine that everything is stored in some sort of vector, or possibly a collection of vectors; our covariance matrix is essentially a single vector with one attribute that gives the dimension. Whenever I work with this matrix I have to access elements, so I go into my memory, jump to a particular location and get that element; that is subsetting, like what we have here: if I have a vector v, then v[i] gives me the element, and for a matrix the index is just a bit longer, composed of row and column.

Now, if I look at how much memory is actually required: if I have a vector of length 2 to the power of 10, so slightly more than 1000 elements, I already have roughly 4000 bytes; essentially every integer element in my vector takes four bytes, plus a short header. So the object size is slightly more than four times 2^10; whatever is on top is the header, which contains just some information about the object, and the header stays essentially the same: if I go to 2^11, the size almost exactly doubles, with the same delta for the header.

This accessing is typically done with an integer. In R, traditionally, an index was an integer, and an integer is not arbitrarily large in this computing environment: there is an upper limit, shown here, which is the largest position I can address in a vector in a standard setting with an integer. That is what works really well in R, but it is quite a limitation, quite a restriction; it is essentially based on the old 32-bit architecture of our processors. By now we have 64 bits, so the R core team started to introduce so-called long vectors, which are indexed with a double, and with that we can construct larger objects.

So we can construct larger objects; the question is how much memory I actually need on my laptop to work with them. Please don't execute this line here: with this line I construct, so to speak, a vector or matrix of doubles of pretty much maximal size, that is, the largest covariance matrix that I can still address and work with comfortably in R, and that is 16 gigabytes.
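A few base-R lines reproducing the memory arithmetic just described (the exact byte counts on your machine may differ by the small header size):

```r
object.size(integer(2^10))   # ~ 4 bytes per integer * 2^10, plus a small header
object.size(integer(2^11))   # roughly double; the header overhead stays the same

.Machine$integer.max         # 2147483647 = 2^31 - 1: largest standard (32-bit) index

# A dense double matrix with ~2^31 elements needs about 16 GiB ...
2^31 * 8 / 2^30              # bytes -> GiB
# ... which caps a classically indexed dense n x n covariance matrix at roughly n = 46000:
floor(sqrt(2^31 - 1))        # 46340
```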
Sixteen gigabytes is not nothing. If you work not on a laptop but on a real workstation or a cluster, you typically have way more than 16 gigabytes, but still, 16 gigabytes is not nothing, because if we check: the largest number of data points you could work with in this classical way is essentially 46,000. So there is this natural upper limit on the dataset sizes you can handle classically; everything beyond that needs to be addressed in a particular way. That is the memory limitation, and we still have the time limitation of course: if you really work with matrices of gigabyte sizes, standard tasks may take a couple of hours or even many hours, so don't forget that. The message here: when you work with such huge datasets, what we typically do is subsample, so that we have something that runs well for a smaller dataset, but with code that is scalable, like what we do here; that is pretty much independent of the data size, it is just a matter of time.

Good. To address this issue with large datasets, we propose to use sparse matrices. Let me depict the structure of the covariance matrix that I had before: this covariance matrix is sparse. Why sparse? We have a covariance function, my green function, that decays to zero for large distances; at those distances I have zero covariance, zero correlation, so there is a zero in my matrix. And the idea is: if I have zeros in my matrix, why should I keep them? I may have matrices where the majority, or far more than the majority, of the entries are zero, and all the manipulations with zeros should not really take place, because something plus zero remains something, and something times zero is zero, so I don't need to take those into account. That is essentially the idea of sparse matrices: we only store and operate on the elements that are non-zero. In the example matrix here we have roughly 50 percent zeros.

There are essentially two rough classes of how we store such matrices, because this object cannot be stored as a matrix per se: I need to know where my non-zeros are, so a sparse matrix is essentially an object that contains the non-zero values and their locations, and the question is how those locations are ordered. Intuitively you might say: I have an i and a j, a row and a column index, and then the value. Computationally it is much better to use a slightly more complex scheme, with column indices and row pointers; that is not really important here, and if there are questions later on I'd be happy to discuss it, but it is only slightly more complex than the triplet representation and a hell of a lot faster when you work with huge matrices.

Good. If you load the package spam, you get many handy functions that construct sparse matrices directly; we will encounter a few of them, say spam_random, and we have already encountered the mle functions and the covariance functions. So under the hood you virtually do not need to take into account that you are working with sparse matrices. The important part is: never pass via a full matrix. Once you have constructed the sparse matrix directly, everything else works, up to certain particularities, as it should: you can add two sparse matrices as if nothing had happened, you can calculate the Cholesky decomposition of a sparse matrix, and so forth.
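To make the storage scheme concrete, a small sketch with spam (the slot names follow the spam documentation; in real applications you would construct the sparse matrix directly, e.g. with nearest.dist, rather than converting a dense one as done here for illustration):

```r
library(spam)

m <- matrix(c(1, 0, 0, 2,
              0, 3, 0, 0,
              0, 0, 0, 4), nrow = 3, byrow = TRUE)
S <- as.spam(m)      # only the four non-zero entries are kept

# Compressed sparse row storage: values, their column indices, and row pointers
# (slightly more complex than an (i, j, value) triplet list, but much faster):
S@entries            # 1 2 3 4
S@colindices         # 1 4 2 4
S@rowpointers        # 1 3 4 5  (where each row starts in 'entries')
S@dimension          # 3 4
```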
Because we have sparse matrices, we also get tools particularly well suited for them, Cholesky decompositions adapted to sparse matrices; I'm not going to comment much on that here.

So let's try to work on this. In the interest of time we are not going to distribute you to breakout rooms; we just stay in the main room, and we'll use the breakout rooms for the next problem. Exercise two is very similar to task one, except that we now work with sparse matrices: instead of rdist we work with nearest.dist. So what I propose is: please take the code chunk from above and try to use nearest.dist instead of rdist, but check the output of nearest.dist and take into account that you may have to change some arguments. After four or five minutes I will comment on task two.

(Can I please ask something? - Yes, give me a second, please. - This part comes after you already have the sparse matrix; it does not contain the process of making a matrix sparse, i.e. reducing the dimensionality of the matrix by setting some correlations to zero?) I think I understood your question. You have to be careful when you talk about dimensionality here, because it is a bit ambiguous. Say I have n observation points, so my matrix is n by n; this covariance matrix has to be positive definite so that I can work with it. I can force certain elements to zero with a certain technique, we'll see that in the second part, but this is not really a dimensionality reduction, because the matrix is still full rank; it has to be full rank. Here we work with nearest.dist, so for the moment we simply specify which distances should be considered and which not, and we choose this distance equal to, or at least as large as, the range of the covariance function. (Okay, thank you.)

I have uploaded the solutions again on the webpage; they should appear there. Task three... no, it is task two, sorry; everything is fine, you were just testing me, if I look at the chat, correct? Yes. Okay, let's get back; let me quickly discuss and show you what the goal was. I have uploaded the HTML and the R markdown file on the webpage, so you can download it from there or just follow on the shared screen, which shows the HTML file. I am highlighting the line that was rdist before: now we use nearest.dist. nearest.dist only calculates the distances shorter than delta; delta is set here to 0.6. I need to set the argument upper accordingly (here NULL) if I really want both the lower and the upper triangular distances; if you look at dist() from base R, it does something similar: the standard call just gives you half of the matrix, and the other half is of course determined by symmetry. Everything else is as before; you don't have to specify anything, because the covariance function takes into account that we have a sparse matrix, the mle function takes it into account, everything works like a charm. Then of course you get pretty much the same result as before, depending on the set.seed; I didn't use exactly the same seed here as before, hence some numerical differences. And of course we generated the y vector here, the response vector we discussed before, with a range of 0.5.
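A sketch of the exercise-two modification: the dense rdist() call is replaced by nearest.dist(), so only distances up to delta are stored, and the rest of the pipeline stays as in exercise one. The value delta = 0.6 and the mle.nomean argument names mirror the walkthrough and should be treated as assumptions.

```r
h.sparse <- nearest.dist(locs, delta = 0.6, upper = NULL)  # sparse distance matrix, full pattern
Sigma    <- cov.wend1(h.sparse, theta = c(0.5, 1))         # stays sparse: only stored distances are used

fit <- mle.nomean(y, h.sparse, cov.wend1,
                  theta0     = c(0.3, 0.8),
                  thetalower = c(0.01, 0.01),
                  thetaupper = c(0.6, 5))   # keep the range bound <= the delta used for the distances
fit$par
```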
So with this range: if I now work with a very sparse matrix that, for the estimation, only allows an upper limit of 0.3 for the range, my estimation will return that maximum as the optimum. So to speak, I have estimated under a model that is not sufficiently flexible: the model I estimate with does not really contain the model behind the data, so I have a kind of misspecification. But maybe I know that already beforehand, and then I don't have to estimate the range at all: if I know that my specified range will anyway be too small, I can directly set it to the maximum possible value, and then I really gain on speed because I don't have to estimate it. That is the fourth part, "can you comment on the speed": well, if I know that certain parameters will end up on the bound, I don't have to estimate them.

Questions about this part of the tutorial? Then I propose a short break, three or four minutes; I will be right back on the screen.

(Can I actually ask a question? - Sure.) I was wondering: when doing this exercise, why didn't we turn our original locs matrix into a sparse matrix; why did we wait until we had the Sigma matrix? Very good question. I did not want to give the impression that you first construct a normal matrix and then transform it into a sparse matrix, because it typically takes far too much time to construct the full matrix and then convert it; and there are several situations where you could not even build the full matrix, but you can build the sparse one directly. (Okay, thank you.)

(I also have a question, if you don't mind. I was trying to replicate the same exercise with more data points, and I get a message that the covariance matrix is singular, in the sense that it cannot compute the Cholesky factorization; how do we specify the nugget in that case?) Everything has been set up to be independent of the data size, so if you have a proper covariance matrix with the proper estimation setup, everything should be fine. All you need to make sure is that the lower bounds really keep the parameters away from negative values. Not having a positive definite covariance matrix may also result from co-located observation points, where two observation points lie exactly on top of each other; or maybe your delta, the upper bound for the range here, is not in sync with the delta you used to construct the distance matrix; those need to be in sync. (Okay, thank you.)

Okay, I think I said three minutes ago that we have a three-minute break, so let me continue. The next step is a couple of slides on the PDF that address the issue of how I actually get this sparse covariance matrix from a modeling perspective. I do acknowledge that there are many, many other tools and approaches to deal with large spatial datasets; what we discuss in this tutorial is essentially just the first item here, sparse covariance matrices, and to some degree also sparse precision matrices, but of course there are many other approaches, and I've given you a link at the bottom that lists, I think, pretty much all of them, with advantages, disadvantages, examples and so forth.
So this is just the reassurance that we are going to zoom in on one approach only. We do need to get a sparse covariance matrix, and here we have again our green covariance function, the curve that decays to zero asymptotically. If it only decays to zero asymptotically, then it does not have any explicit zeros, and it is not possible to just truncate values to zero: if I truncate them, I may end up with what you saw before, a covariance matrix that is not positive definite. So I cannot simply force zeros into my covariance matrix; I have to be very careful about how I do that, and there are two approaches. On one hand I can use a covariance function that has compact support, like the Wendland or the spherical one; on the other hand I can impose a compact support artificially, through so-called tapering. Both ideas are sketched schematically here.

For the first one: even if I have a compactly supported covariance function, like the green Wendland here, in practice this compact support is nevertheless too large to end up with sparse matrices I can work with comfortably; typically I need to reduce the support further, and I could then just model with the red covariance function here, which is a direct misspecification. I know this is not the optimal covariance function, but this covariance function is better than no covariance function, or than any other covariance function with an even smaller range, and the price buys me the ability to actually compute my likelihood. That is direct misspecification.

On the right we have so-called tapering. I take my green, original covariance function and multiply it, a direct element-wise multiplication, with a correlation function that has compact support, here the blue Wendland; the red curve is then the product, original times Wendland, which again has compact support. This is not exactly like truncation, as the figures illustrate, but it guarantees that we have a positive definite covariance function, and it has quite a few advantages, because tapering honors the true model: we have an underlying model that we believe is true, and the tapering is just a computational tool to generate sparse matrices, a purely pragmatic approach; this year I need to taper, next year I get a much better laptop and I don't need to taper. Prediction is good, under the same model, so there is very little loss; I do get biased parameter estimates, that's for sure, so if you are interested in the actual estimated values, tapering is not ideal. Direct misspecification has similar advantages and similar disadvantages: it is to a certain degree a bit easier to sell, because you don't have this multiplication, but you do not honor the model, and it is not as flexible as tapering with respect to the families of covariance functions you can use; prediction is similarly good, and the estimates are similarly biased. So this is what I wanted to say with respect to sparse covariance matrices.
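A hedged sketch of the tapering construction just described, reusing the simulated locations from before; the exponential model, the taper range and the element-wise product on spam matrices are illustrative assumptions (in spam the covariance functions only operate on the stored distances, mirroring the exercise-two pattern, which is exactly what keeps this cheap).

```r
delta     <- 0.3                                        # taper range (illustrative)
h.sparse  <- nearest.dist(locs, delta = delta)          # keep only the short distances

Sigma.mod <- cov.exp(h.sparse,  theta = c(0.5, 1))      # "true" model, evaluated only at stored distances
taper     <- cov.wend1(h.sparse, theta = c(delta, 1))   # compactly supported correlation
Sigma.tap <- Sigma.mod * taper                          # element-wise product: sparse and positive definite
```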
So let me put that aside and go back to our R part; I have a couple of points that are relevant here.

We are now in a setting where I have a sparse covariance matrix: a huge matrix, several thousands of rows and columns, but sparse, and in practice highly sparse, maybe 10 percent or even fewer non-zero entries. That is typically enough if you think in terms of: to predict at a location I only use nearby information, maybe 30, 50, 100 data points nearby; and if you have tens of thousands of observations, then your matrix becomes very, very sparse. With this sparse matrix we now want to calculate the likelihood, or this BLUP, and for that we need to calculate a Cholesky decomposition. Here is really the advantage, the whole game, the secret so to speak: if I have a sparse covariance matrix, the Cholesky decomposition is (a) computationally very fast and (b) memory-wise very efficient.

Calculating a Cholesky decomposition consists essentially of three steps; we cannot go into all the details here, but it is important to see the underlying idea. First, I may reorder my nodes to better exploit the sparsity. Then comes the factorization: the next step is the so-called symbolic factorization, where I essentially work out what the result will be, how the structure of the Cholesky factor looks; and then, in a second step, the numeric factorization, where I calculate the actual numbers to fill into this structure. This has the advantage that if I already have a factorized covariance matrix and I only change the numbers, I do not need to refactorize everything: if I have a very sparse covariance matrix and, say, I just add a diagonal part, I can simply update my existing structure; similarly, to solve a linear system I can pass the structure from before and everything is fine. And this has a big, big advantage: think in terms of our optim call. optim evaluates the likelihood many times, and for all of those evaluations I only need to do the numeric factorization; the reordering and the symbolic factorization do not have to be redone.

A couple of words on spam64... let me check the time; Roman, should we keep track? Okay, I'm going to skip this part; if time permits I'll come back to it at the end, for the tutorial it is not really necessary. This is what I showed before, this is the tapering, no problem whatsoever; we have already discussed most of that.

One part of the tutorial has a star, so it is a little more advanced: using optimParallel to help speed up the optim call. What I've done here is pretty much the same lines as above, but I've pulled out my negative-2-log-likelihood function, as we've seen above; I calculate my covariance with the Wendland function, and my distance matrix is called h. With these few lines I then call optimParallel, which is a wrapper for optim that really uses the cluster capabilities of your laptop; it launches the optimization in parallel. The advantage is also that it gives us information about the optimization: just as an illustration I have plotted the likelihood evaluations, and maybe we don't see much in this view, so let me plot it again in a different fashion. We see the banana shape of the likelihood, in range and sill, still very typical; the optimization jumps back and forth a couple of times to finally end up here at the optimum. These individual steps of the optimization procedure can be extracted from optimParallel, so even if you don't use the parallel optimization itself, you still see a little bit of what is going on, where all those evaluations go, and that may be helpful in optimizing your procedure.
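A sketch of the structure-reuse idea described above: factorize once, then only update the numeric part when the covariance parameters change, which is what saves time inside an optim() or optimParallel() loop. chol() and update() on spam objects are documented in spam; the covariance calls and parameter values here are illustrative.

```r
Sigma1 <- cov.wend1(h.sparse, theta = c(0.3, 1))    # first candidate parameters
ch     <- chol(Sigma1)                              # reordering + symbolic + numeric factorization

Sigma2 <- cov.wend1(h.sparse, theta = c(0.25, 0.9)) # new values, same sparsity pattern
ch     <- update(ch, Sigma2)                        # numeric factorization only: much cheaper

# Inside the likelihood that optim()/optimParallel() calls many times, this
# update() step is exactly what avoids redoing the symbolic factorization.
```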
So I propose that we join exercise three and the final open problem together: we will not reconvene here in between; you go out into one breakout room and remain there. We will have six breakout rooms, and the two of us will circulate among them. The first task uses artificial data that I generated; try to estimate the parameters. It is a large dataset, so maybe first look at whether you would like to evaluate wd, wd1 or wd2, so do a careful EDA, and then do a prediction; those that are advanced may estimate, or may use the optimParallel part; I'm simply going to leave it at that. The second part of the problem uses a dataset provided with spam: here we have a real, quite large, I would probably call it huge, spatial dataset, essentially eight thousand data points, so you really need to be careful when you estimate, and we would like you to predict on a fine grid; we give you the code to construct the fine grid. That would be the task. In case you would like a shortcut, estimating the parameters is not necessary; you can download the estimated parameters from the webpage. I will send broadcast messages to the breakout rooms if we have uploaded additional material or if there are further comments; otherwise I'll ask Roman to launch the breakout rooms, and we will circulate, visit you, and give hints and feedback.

The time is running out, always way too fast, so in the last five minutes I would just like to wrap up with one or two elements on the slides. If you need more details, there are some additional links on the slides; otherwise please just write an email, and I will also watch the Slack channel for the next couple of days, so if there is something, we can add it there as well.

We've seen estimation, prediction and simulation, the last one hardly at all, but essentially everything boils down to solving a huge linear system. This huge linear system has to be such that there are many, many zeros in it; to get there, one approach is direct misspecification, with a compactly supported covariance whose range is very small, the other is a tapering approach. How do you get the parameters of your spatial data? Evaluating a likelihood is very costly, so you need to make careful choices about what you want to evaluate and how often. Quite often I am happy with fixing a few parameters and doing a grid search over some others, or with reducing the estimation or optimization precision via the control arguments of optim. For prediction, we haven't really touched much on the mean part, but the mean for large spatial datasets you could really estimate with an ordinary least squares approach; that is an unbiased estimate, the variance is slightly biased, but with such huge datasets you are probably not really interested in a high-precision estimate of the variance, and otherwise you could again subsample and get variance estimates that way. For prediction you also need to think about whether you have a thinned dataset, like the one we provided here, or a dataset with huge holes; I have provided one in the Rmd file, so you will see there the effect of predicting into holes: the further you move from your nearest observation, the closer you get to the mean, because there is of course no dependency carrying over. Quite often I assess prediction uncertainty by simulation, because that is faster than a full-fledged evaluation of the conditional variance expression.
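Since exercise three asks for prediction on a fine grid, here is a hedged sketch of how such a sparse simple-kriging prediction can look in the zero-mean setting; the grid size, delta and covariance parameters are illustrative, Sigma, locs and y are the objects from the earlier chunks, and solve() on a spam matrix uses the sparse Cholesky internally.

```r
# Fine prediction grid over the unit square
xg   <- seq(0, 1, length.out = 100)
grid <- as.matrix(expand.grid(xg, xg))

# Sparse cross-covariances between grid points and observation locations
D0 <- nearest.dist(grid, locs, delta = 0.6)
c0 <- cov.wend1(D0, theta = c(0.5, 1))

# Simple kriging: one sparse solve, then apply the weights to the observations;
# grid points with no observation within delta are predicted at the (zero) mean.
pred <- c(c0 %*% solve(Sigma, y))
fields::quilt.plot(grid, pred)
```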
We haven't really talked about good practice with simulation here; you need to be very careful about numerical issues when you work with huge spatial matrices: lots of machine-precision errors can add up and render your matrices no longer symmetric, and then you may have to prune or symmetrize your matrices. Those ideas are also in the full-fledged script.

What we have done here is essentially a manual approach to spatial modeling. When should I use a manual approach? With the manual approach I have a full-fledged handle on my likelihood: I can really specify my likelihood, and everything works out the way it should. With this manual approach I can handle datasets that many other tools are not made for and cannot be used for, and additionally I have full control over all the parameters. I recently had a package in my hands that did some simple kriging, but it was nested in a couple of functions, and after four levels of function calls I still did not know which covariance model it used. Here, because of this purity, because we really need to be careful about memory, about allocation and so forth, it is much more transparent what we are doing. Of course, in certain settings our approach may not be really useful: if I need precise estimates, or if I cannot use any approximation tools, then maybe I cannot just use a tapering or misspecification approach; and when we work on the mean instead of the covariance structure, then maybe there are other tools that are better. And one last thing I want to say: sometimes I don't have to look at the entire dataset to get some answers; subsampling, or zooming in on certain sub-windows, is sufficient, and I don't need all the data points.

With that I would like to close. Thanks for your attention, it was a pleasure talking to you and seeing you in the breakout rooms; I hope you could profit from the tutorial. Thanks also to Roman and Federico for helping me set everything up and for helping here today, and to Rita and the Zurich R user group for giving me the opportunity to talk to you and for the support. Thanks a lot, enjoy the rest of the day and the evening, and enjoy the rest of the conference. Thanks.