about spatial data and how to handle it. So for a fair part of the talk I will talk quite a bit about how maps are plotted, and about the question of what straight lines mean when you are on the Earth's sphere. I will say something about handling large spatial datasets with R, a little bit about the life cycle of R spatial packages, and something about the R spatial community and how we interact with our neighbouring communities. Here is a link to the slides; I will try to copy it into the chat. I don't think I have the option to share it with everyone, only with the panelists, but maybe they can forward it. From there you will find the slides themselves and also the R Markdown and the R code that was used to generate most of the pictures, although now and then I cheated.

Yeah, so how do we plot maps? This is an interesting question. The problem is that unless you are actually in the business of creating globes, basically everything you do is a two-dimensional plot of Earth-bound data, and that involves projection. The Earth is round, it's a sphere, and anything you look at, even the image you are looking at, the little animation on the bottom left, is a projection: a 2D rendering of something that is not 2D. So that is a problem, and you quickly run into it. I had an example here of a fairly arbitrary but well-known and respected data scientist active on Twitter trying to make a map, running into the problem of having spatial data and having to make a map, and wondering: okay, what am I going to do here? There was a nice thread with follow-up advice that was very helpful. Anyone doing this basically runs into it: you create something, and then you ask what to do, and you think, oh well, there might be a plot command. So let's try it, let's plot this world map. And we plot it, and we see this, and then you think: oh, I'm done, right? Here's a world map, and everyone recognizes it as a world map.

Of course, there are a couple of issues with it, besides that it is flat. One thing I learned in primary school, looking at world maps, is that this island here, the island of New Guinea, is about half the size of Greenland; and if the map does not show that, then something is wrong with the map, right? So you see there is a strong distortion here, and you also see it if you look at Antarctica, which is the largest continent on this map, which it isn't — everyone knows that. So there are some weird distortions going on, and I tried to dig up where this came from, why we do it this way. You know, I always think I came up with it myself, but of course I didn't. There is this package maps, and maps is really an ancient R package. It says something like 2003 for the first version, but it is probably older; that is probably just where the CRAN archives start. And it says it is based on older work by, I think, Richard Becker and Allan Wilks in S-PLUS. So this really comes from the old S-PLUS days. They made maps like this: you would say map("world") or map("usa"), you got this map, and it had a projection argument with the possibility to use the mapproj library to do projections.
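As an aside, a minimal sketch of what that still looks like today (the maps package is on CRAN, and the projection argument hands off to mapproj):

```r
library(maps)    # descendant of the old S-PLUS map() functions
map("world")     # world map in the default rectangular projection
map("usa")       # or just the United States
# projections are delegated to the mapproj package:
map("world", projection = "mollweide")
```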
But the documentation said this: the default is to use a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the picture. And that is what we see here: latitude and longitude are simply mapped to x and y. So on the equator, one kilometer east is one kilometer north; off the equator this is not true — things are stretched, really stretched in the east-west direction. You see that the United States, for instance, has a much more elongated shape here than here, because for this map the aspect ratio is again chosen such that in the center of the map one mile north equals one mile east. So we have at least some kind of useful scale for limited areas; for shape, we have to see whether this is a good way of displaying things. In any case, this is where it came from, and this is what we still do. And it is just one of the many things you could do. There is a very nice XKCD where Randall Munroe comments on this particular projection, which is called plate carrée: "You think this one is fine. You like how x and y map to latitude and longitude. The other projections overcomplicate things. You want me to stop asking about maps so you can enjoy dinner." And this was probably the authors of the maps package 40 years ago, and me almost 20 years ago, thinking: oh yeah, this is easy, this is cool. And of course you could go for a globe instead, but Munroe was not so positive about that either — he is being sarcastic there.

So there are a couple of things with projections. For small areas it is usually not such a big issue, because there you have pretty much flat space anyway; you don't notice that the Earth is round. And projections cannot preserve distances; you lose those in any case. What they can preserve is either area, or shape, or directions, or some compromise of these. The other aspect is that there are very many projections — even nowadays people create new ones. There is also no need for north to be upward, or for Europe to be in the middle; these are arbitrary choices. But a random rotation of the Earth unfolded into a projection is often hard to read. It is a nice exercise, but it is not easy to communicate things like that.

So here are a couple of alternatives. This is the one we looked at, the default you get from sf, from sp, and from geom_sf in ggplot2. This is also one that everyone knows: Web Mercator, the one used in leaflet and mapview, so any web interface that uses these tiled backgrounds; Google Maps has it as one of its options, and that is basically where it came from. And you see there is much more distortion here, if you look at the relative sizes of this island and Greenland. But if you zoom in and look at local areas, it is pretty good, because it preserves shape; things do not get flattened like here. So for doing local analysis, Web Mercator is not even such a bad idea. But for creating global maps, political maps for instance, it is of course terrible, because Greenland looks like the largest thing on the map, and Antarctica cannot even be shown at all: it disappears because it blows up too large.
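To see this for yourself, a minimal sketch with sf (the world polygons from the maps package are just a convenient example source):

```r
library(sf)
w <- st_as_sf(maps::map("world", plot = FALSE, fill = TRUE))
plot(st_geometry(w))   # default plot of unprojected data: plate carrée
# Web Mercator is only defined to roughly +/-85 degrees, so crop off the poles first
# (if spherical geometry complains about invalid polygons in this dataset,
#  sf_use_s2(FALSE) before cropping)
w85 <- st_crop(w, st_bbox(c(xmin = -180, ymin = -85, xmax = 180, ymax = 85),
                          crs = st_crs(w)))
plot(st_geometry(st_transform(w85, "EPSG:3857")))   # Web Mercator
```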
So this does not work for global maps. Alternatives are there, for instance the Equal Earth projection and Eckert IV, which is used by the tmap package. These are very good alternatives that preserve area, as you can see here if you add Tissot's indicatrices, which are circles that would have equal shape and size if you drew them on the globe, and which are projected here so that you can see the deformation. You can see that they get elongated on the plate carrée, and that they keep the same shape but become much, much larger on Web Mercator. On Equal Earth you see that they get different shapes, but they remain the same area, and pretty much the same holds for Eckert IV. So these would be good alternatives.

Looking at more regional maps: we don't always make global maps, but if projections are a problem for large regions, for instance a continent, then the most extreme example would be Antarctica. If we plot it in the equidistant cylindrical projection, we get something like this, where this entire line is essentially the South Pole — a single point stretched out. Anything else, Lambert azimuthal equal-area or orthographic, which is basically the globe view centered on Antarctica, would give something like this. And for North America we see something similar: Greenland is again incredibly exaggerated, while on Lambert azimuthal equal-area or orthographic projections it actually looks much nicer, in the sense that the area proportions are more realistic. So they give much better views of this.
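A sketch of those polar views, reusing the w object from above; the PROJ strings select an orthographic and a Lambert azimuthal equal-area projection centred on the South Pole:

```r
ant <- w[w$ID == "Antarctica", ]
plot(st_geometry(st_transform(ant, "+proj=ortho +lat_0=-90")))  # globe view from above the pole
plot(st_geometry(st_transform(ant, "+proj=laea +lat_0=-90")))   # Lambert azimuthal equal-area
```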
So then the question comes: what is a straight line? How do we deal with straight lines in spatial data? For simple features, which is the way we nowadays handle vector data — points, lines and polygons — a feature is an abstraction of a real-world phenomenon. It can be anything: a house, a parcel, a country, a region. It has geometrical properties, and for simple features the "simple" really means that we describe the geometries piecewise, by straight lines, by planar interpolation between sets of points. So we have a curve, essentially, that we approximate piecewise by sections of straight lines; otherwise we cannot handle it. That is essentially what it comes down to.

So why is this such a big deal? Well, it is a big deal because straight lines after reprojection are no longer straight lines. So you have to ask, you have to wonder: in which projection are they straight? Because in any other projection they are no longer straight. There is, for instance, the GeoJSON IETF standard, which prescribes how GeoJSON, a JSON format popular with web developers, should handle spatial data. It says: a line between two positions is a straight Cartesian line, the shortest line between those two points in the coordinate reference system. And in the next section or so it says: the coordinate reference system is the geographic coordinate reference system, degrees longitude-latitude, using WGS84 as the datum. So these are longitude-latitude degrees, but assuming straight Cartesian lines — straight in the space of this projection, essentially. And that is an interesting finding. The question is whether GeoJSON users realize it, because you often have a database, you take data out of it, you use GeoJSON as an intermediate format, and you push it into some web application. The question is whether everyone realizes that the standard assumes that.

So, looking at a fairly contrived example: if we have a straight line between two points on this plate carrée map and we project that line, then that line should actually run like this. It is more than a half circle around the South Pole, which looks like a straight line here and like a curved line there. And the other way around: if we have a straight line on this Lambert equal-area projection, or a great circle, the shortest path over the sphere between these two points, and we project it back to here, it looks like this. You see that it crosses the antimeridian, so it runs from one half of the map to the other half and continues here. So how do we do this kind of operation? If we just take these two points, say they are a line, and project that line, we basically project these two points and get this line out, which is an entirely different thing; we want to get this one out. We do that by adding points on that straight line, assuming short straight lines between these points, and then transforming all those points. We then have a curved line here, which is a sequence of small segments — they should really be curves, but if we take them as straight, this still works out. The other way around, we add nodes here and we get the right thing out there. So we can always add nodes, and we can also remove them: we could simplify this line to its two end points, but if we then transform it again, that of course leads to confusion; you do not end up with the line you had in mind, because it is a different line in another projection. So these are things to be aware of and to take care of.
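In sf this densification is available as st_segmentize(); a small sketch with two arbitrary example points in the far south:

```r
library(sf)
# a "straight" line between two arbitrary points at 65 degrees south
l <- st_sfc(st_linestring(rbind(c(-150, -65), c(120, -65))), crs = "EPSG:4326")
# add nodes roughly every 50 km before reprojecting
l_dense <- st_segmentize(l, units::set_units(50, km))
# after transformation, the added nodes trace the curve correctly
plot(st_transform(l_dense, "+proj=laea +lat_0=-90"))
plot(st_transform(l, "+proj=laea +lat_0=-90"), col = "red", add = TRUE)  # unnoded: just a chord
```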
One of the bigger things I have been involved with in r-spatial over the last one and a half years is spherical geometry. Since about a month ago, the sf package, which is used by a lot of other packages to represent spatial data — points, lines and polygons, vector data with coordinate reference systems — works as follows: if the data are represented by ellipsoidal coordinates, expressed as degrees longitude and latitude, we use spherical geometry. We assume they lie on a sphere rather than in a flat plane. And that may sound like a crazy idea — or rather, people would say: obviously you would do that, why wouldn't you? Well, the thing is that for 50 or more years we have not done that. We have basically assumed that these data live in plate carrée space, in a flat plane, just like GeoJSON assumes — it literally writes that down. We don't do that anymore, and a lot of things now actually work much better, in the sense that we can do buffering on the sphere, geometric predicates on the sphere, distances on the sphere, and so on. By doing that you essentially no longer have to worry about going to a particular projection, and about the fact that choosing that projection has an effect on what you do; you just do things on the sphere. Of course, ideally you would do things on an ellipsoid, which is an even better approximation of the Earth, but the difference between a sphere and an ellipsoid is really very small, not comparable to the difference between something flat and an ellipsoid. One can go back to the pre-sf-1.0 behaviour by setting a couple of flags, and you get the old behaviour back. There is more discussion of this in the upcoming book on spatial data science that Roger Bivand and I are finishing up. You should also look at the work that Dewey Dunnington did: Dewey mostly wrote the s2 package, which underlies the sf package for all the spherical geometry operations. So we now have two engines: one spherical engine and one flat engine, used depending on whether you have unprojected data, ellipsoidal coordinates, or projected data, which are handled by the flat-space geometry libraries.
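A minimal sketch of the new behaviour, using the North Carolina dataset that ships with sf (geographic coordinates):

```r
library(sf)
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"))
sf_use_s2()               # TRUE since sf 1.0: geometry is done on the sphere
st_area(nc[1, ])          # spherical area, in m^2
st_buffer(nc[1, ], 10000) # a 10 km buffer computed on the sphere, no projection needed
sf_use_s2(FALSE)          # the flag that reverts to the old flat, planar behaviour
sf_use_s2(TRUE)
```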
Right, another issue that comes up and is worth discussing is handling large spatial datasets with R. Handling large datasets with R in general is an interesting topic, and different groups or lines of thought have approached it in different ways. If you look at the tidyverse, it is much more an interface to databases where the data might live: your data might sit in a Google BigQuery database, you write your code with tidyverse verbs, and the interface to the database carries out the hard work on the large data, while you basically look at the reduced results of operations on it. With spatial data it is a little different, in the sense that a lot of spatial data does not live in databases, does not live in tables, or the tables do not work so well — although for vector data you could use Google BigQuery GIS, which is one way to do that, or other spatial databases that do similar things. And there have been reports of successfully doing this with the dbplyr interface, which is very interesting. Then there is the case that all your data can be held in memory. I always buy laptops with the maximum amount of memory that I, or my institute, can afford, so I can do experiments with up to whatever, 48 gigabytes of RAM or so. Most spatial packages hold everything in memory. Some go further and say: okay, I assume my data are on local storage, on hard drive, and they will not load everything into memory. raster and terra, and to some extent stars, are packages that work that way, mostly with raster data, so image data, which tends to be larger. They basically let you write expressions, and when you want to compute something they iterate through all the imagery without trying to load everything into memory, because that would not work anyway.

And then there is a third category, and that is data that you are not going to download at all. There is a lot of data now available for free: weather data, the ERA5 weather reanalysis, the Coupled Model Intercomparison Project CMIP6, and an enormous amount of Earth observation data, all in principle free. You can usually download sections of it, but you are not going to download everything, simply because the network is not going to allow it. We think we have fast networks, but if the problem is suddenly that you have to download three petabytes, then even if you had the local storage to hold it, it would take years or so to move that data. So this is really a case where network bandwidth has not kept up with the volumes of the data that we collect. Nevertheless, all these data are very relevant for questions related to sustainability, for the effects of climate impact or weather extremes and so on, or, with satellite data, for emergencies. So this is data that we would like to be able to use much more easily than we can. A platform that can do that is, for instance, Google Earth Engine, and there are a couple of other platforms that in a similar fashion allow you to work with large datasets in the cloud through user-friendly interfaces. But they make it very hard to reproduce analyses independently, to scrutinize the computations, and also to run your own R scripts, your own R time series model or the like. There are a couple of projects, openEO, funded by the European Commission, and now a second one, openEO Platform, funded by the European Space Agency, that are part of a larger initiative to allow reproducible, open source and vendor-independent computing on large cloud-based data archives. And we are also involving R there, in the sense that we want to be able to run R scripts, R code, at the pixel level of these datasets. These projects have all contributed to something else that is of interest: STAC, the SpatioTemporal Asset Catalog, which is basically a formal description of a catalog that allows you to find imagery. You might ask: finding data — how hard can it be? I go to the cloud and ask for a directory listing. But if there are 50 million images in that directory, because it is an object store, that is not going to work: you wait endlessly before you get the directory listing, and then the listing itself is 50 gigabytes, and you think, what am I going to do with that? So STAC is a very lightweight and simple, but modern, approach to finding images and image collections.
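For R, one STAC client is the rstac package; a sketch of a search (the endpoint, collection name and bounding box are just examples):

```r
library(rstac)
s <- stac("https://earth-search.aws.element84.com/v0")   # example STAC API endpoint
items <- s |>
  stac_search(collections = "sentinel-s2-l2a-cogs",      # example collection
              bbox = c(7.0, 51.8, 7.7, 52.1),            # xmin, ymin, xmax, ymax
              datetime = "2020-06-01/2020-06-30",
              limit = 10) |>
  get_request()
items   # a small JSON document describing matching images, not the images themselves
```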
The idea of openEO is basically this: we have all these cloud platforms and cloud storage, and on top of them software layers — Open Data Cube is a popular, recent, modern one — each with their own interface for working with large imagery. The openEO API gives you a uniform front end, so that from different clients — QGIS, R, Python, or web interfaces — you can access any back end, carry out the same analysis, use the same script on this cloud or on that cloud, and then see whether they give you the same answer. This is a project where I was one of the initiators, but it is much larger: it involves some ten programmers, ten software engineers, and a lot of institutional support from organizations that actually run these clouds and are trying to make the data in these clouds available to a wider user group. The nice thing is that these organizations actually see the need and the benefit of this, and are, while we are still developing it, already using it in production. So it is not a proof of concept or a prototype; it is something that they think might be viable, if it really starts to work. Right now we are at the stage — probably in one month or a couple of months — where we will be able to offer public access to these kinds of systems. And then of course you need to think about the fact that when people start to do massive computations, there is a cost to that: cloud computing is not free.

The next thing I want to talk about is the life cycle of spatial packages. R packages have a life cycle. I think it may have been the RStudio community that came up with these stages: this is experimental, this is mature, this is retired. If you are somewhat familiar with the r-spatial community and you follow the R-sig-Geo mailing list — not everyone follows mailing lists these days, but this is a mailing list that has been around for nearly 20 years, and that Roger Bivand has managed all that time — you will have seen in his email signature that he is now an emeritus professor. That means Roger has retired from his job. And people who retire deserve to enjoy their retirement and to take on the good things in life. That might include answering questions on an R mailing list; but the harder things in life they are entitled to drop. So, reacting to the announcement of my keynote, when the abstract came out: in my abstract it says "when rgdal and rgeos retire in 2024". That was a bit of an announcement that I made. I coordinated it with Roger, and we actually started talking about this five years ago. rgdal and rgeos, together with sp, basically formed the first foundations of the r-spatial packages.
rgdal did the I/O, the reading of vector and raster data, and rgeos did all the geometry, of course in two dimensions. With that, you had the components that would give you a GIS. And I said: anyone volunteering to take over maintenance should contact Roger. And Roger answered: "I'm not sure that taking over maintenance is a sensible use of effort", and "add maptools to that list". So that is a clear sign. We have been working hard on replacements, on more modern incarnations of the same ideas as sp and rgdal, and that is basically sf and stars. And there is the terra package from Robert Hijmans, which is a replacement for the raster package: raster really uses rgdal for reading and writing, while terra links to the GDAL library directly, so it does not use the rgdal R package for that — which is the same thing the sf package does. So this is a signal that everyone should take seriously.

r-spatial is a very open ecosystem that is relatively complex, in the sense that for the geometrical operations I already mentioned — intersections, unions, buffers and so on — but also for reading and writing data, and for handling coordinate reference systems, we use a lot of tools, a lot of infrastructure, that is used by a much wider community: the open source geospatial community, I would say. And we do that on purpose. You could write your own projection library — I mentioned the mapproj library earlier, and there was another library that had projections in it. You can do it, of course: some projections are simple, and you can write a five- or ten-line R package that does one. The question is whether it keeps up with all the other changes in the world. It is very convenient that all the people working on open source software for spatial data look at a limited number of libraries and focus their effort on making those good, agreeing on what they do, what they should do, and how things can be improved. That is depicted in this image, which sketches the dependencies of the sf package; for the terra package this would be similar — it also links to GEOS, PROJ and GDAL. So these three: GDAL for I/O, for reading and writing vector and raster data; PROJ for handling coordinate reference systems and for computing transformations — projections, but also datum transformations, which are one level more complicated: going from one ellipsoid model of the Earth to another, which is an approximate operation, as opposed to a projection, which is basically a mathematical formula; and the GEOS library, which does two-dimensional geometry. Those are the main workhorses.
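You can see which versions of these libraries your sf installation is linked against:

```r
library(sf)
sf_extSoftVersion()   # reports the GEOS, GDAL and PROJ versions sf was built with
```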
If you use PostGIS or QGIS, or a Python package that does anything spatial — GeoPandas, pyproj or rasterio — all of these use exactly the same libraries, so they all look at the same mailing lists and communicate with the same set of developers. Recently the GDAL project has actually secured a lot of structural funding, I think through NumFOCUS, the organization that also supports NumPy. They managed to secure that because there is so much infrastructure in this world — all these cloud platforms like Earth Engine and the Microsoft variety — that basically leans on GDAL for reading and writing cloud-optimized GeoTIFFs and so on; nobody is going to duplicate those efforts. Other libraries are NetCDF for array data, udunits for handling measurement units, and s2geometry, the spherical geometry library that is an open source contribution by Google: it essentially powers the geometry engine behind Google Maps, Google Earth, Google Earth Engine and Google BigQuery GIS. GDAL is a very complicated dependency, in the sense that it is like a meta-library that itself uses on the order of a hundred other libraries for actually reading things — libtiff and libgeotiff and so on, SQLite3 for reading and writing GeoPackages, et cetera. So it is a complicated thing for package managers to link to directly, and there is very valuable work from, for instance, Simon Urbanek, realizing this in the macOS binaries; Brian Ripley also helps a lot in looking at new versions of these libraries; and Jeroen Ooms does great work on the Windows builds, making it easier for packages to link to very complicated dependencies like GDAL.

That essentially brings me to the end of my talk, to the conclusions. Summing up: many data scientists will some day run into challenges with spatial data, and one of the earliest challenges is that of projections and how to deal with them. r-spatial is an open and friendly community of people using an R package ecosystem for handling and analyzing spatial data, and there are a number of people in the R community, some of whom I just mentioned — in particular also the CRAN team — who have been very much instrumental in making this succeed. It has taken a lot of effort from them, and it still takes a lot of effort, to keep these packages running with new versions of everything. And we are successful, I think, because we use and interface a lot of software that is used by a much larger community, so we reuse it and we can talk about the same things. A large part of that community is the OSGeo Foundation, the Open Source Geospatial Foundation, and with r-spatial we are now trying to become a community project — with a number of key spatial packages — within the OSGeo organization, so that we develop closer contact with them. Robin Lovelace has been very instrumental in setting that up, and we are also having an r-spatial panel session at FOSS4G, the Free and Open Source Software for Geospatial world conference, which is held online this fall from Argentina. As I mentioned, the sf package, which is the new central reader and writer of vector data geometries, now uses spherical geometry, which is a new thing.
So we need to think about straight lines: they may need nodes added at some stage, and we may want to automate that at some point, or not — we have to figure that out. You can simplify, but only after projecting into your target projection, and again we may want to automate this densification at some stage. As I already implied a little, I think we should really reconsider the way we plot data now: if data are unprojected, in degrees longitude-latitude, we still choose some projection — we choose plate carrée, which is a bad thing, I think. So we should get rid of it and do other things, and also do different things for smaller regions, probably an orthographic projection, as the s2plot package by Dewey Dunnington already does. After all, stringsAsFactors is also no longer TRUE; that took 25 years, but it is never too late to reconsider, and in any case the spherical geometry was a big step and, I think, a large improvement. Analyzing large spatial datasets is and will remain a challenge, because there is the whole administrative cloud side involved, and there are datasets that keep getting larger. And we have this retirement, not only of Roger Bivand but also of the rgdal and rgeos packages, which will happen in 2024, and it has strong consequences. We have been working on good alternatives; they are there, these three packages, and there may be others, and users and developers will have to migrate to these new packages. There is still a large number of packages at this moment depending on rgdal or rgeos, so there is a lot of work to do, and we will be happy to help with it. So that brings me to the end of my talk.

Thank you, Edzer, for the very interesting keynote taking us through projections, large datasets and the life cycle of spatial packages. We have a lot of interesting questions coming up, and I would encourage attendees to keep posting their questions in the Q&A, or upvoting any questions that they want asked. Probably the first question that I have: you have given an interesting discussion on projections. I wonder whether the R spatial community takes great care in choosing which projection to use for a specific type of analysis — maybe analysis A is good with this projection and analysis B with that one — and if not, why that might not be the case.

Yeah, that is a good question, Peter. I am not the projection expert either; I have just been hiding all the time and saying, well, we do this because we always did it, right? And that is basically the situation now. So I think the situation we have now, the default projection for plotting unprojected data, is very unlucky, and we can improve there. I think anything equal-area is a much better idea than what we do now, which is very much non-equal-area, because for a lot of larger-area plots — even if you are making global predictions of above-ground biomass, or maps of forest cover or something like that — equal-area is always better, because it represents equal areas as equal. You are not blowing up one part of the world so that it looks like you have very low values there, and shrinking other parts where you have very high ones. So even if it is not political data, even if it is land mass or ocean coverage or the like, equal-area is just a much better idea. Thank you.
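To illustrate that answer, a minimal sketch of plotting with an equal-area projection instead of the plate carrée default (the maps-package polygons are just an example source; Equal Earth and Mollweide are both equal-area):

```r
library(sf)
w <- st_as_sf(maps::map("world", plot = FALSE, fill = TRUE))
plot(st_geometry(st_transform(w, "+proj=eqearth")))  # Equal Earth
plot(st_geometry(st_transform(w, "+proj=moll")))     # Mollweide
```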
And I'll go directly to the Q&A. There is a question that has been upvoted by many attendees: learning R spatial is a huge challenge for most students. Could you suggest two to three skills that are most important to R spatial analysis, later in your career or job?

Yeah, so let me think about that. I think that understanding geometries and what you can do with them is very useful. We do that in the introductory chapters of the upcoming book: basically thinking about measures, like area, length, distance between objects, and what they mean. What does a polygon mean, or a set of polygons? What is a hole in a polygon, and how do we represent that? The representation is not so very important, but what are the implications — and then: how do two geometries relate? What are the possible relations? Do they touch? Do they overlap? What does intersect mean? Are they disjoint? Which words do you use for these kinds of concepts? And the next step is how you use these things in analysis. I think those skills are useful; that is one angle. And the other angle is obviously raster analysis: handling raster data in a sensible way and doing operations on it.

Thanks for the insights. More questions are coming. A very important question here: what is the relationship between stars and terra, now and in the future, since both seem to be replacements for raster?

Yeah, they are not entirely replacements for raster. I think that terra really is written as a replacement for raster, because it has the same author, and he also moved parts of the code base from raster to terra, for obvious reasons. stars has a somewhat different idea: more the idea that we have array data, and array data is a more generic concept than raster data, because we could also have time series of polygons, or time series associated with points. You cannot put those in raster data, but you can put them in stars objects: we basically have a spatial dimension and a temporal dimension. And how would you otherwise handle that — columns next to each other, the wide form or the long form? Both are very inconvenient; the logical structure is an array with one dimension time and one dimension spatial features. So the stars data model is that of arrays, including high-dimensional arrays, more than three dimensions. If you have a time series of multi-spectral images, you have the x and y dimensions of your images, of your layers, a spectral (band) dimension, and a time dimension. So it is meant to do those kinds of things.
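A small sketch of that array model, using the six-band Landsat subset that ships with stars:

```r
library(stars)
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
r <- read_stars(tif)
r                                 # three dimensions: x, y, band
st_apply(r, c("x", "y"), mean)    # reduce over the band dimension, keeping x and y
```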
And terra is aiming more directly at raster stacks, although it now also includes its own classes for vector data, so it is basically a one-size-fits-all package. Robert and I have somewhat different views on that. stars probably fits closer to sf and to the tidyverse verbs that I like to work with and implement, and what he does in terra more closely matches what he did in raster; it is a new and more performant iteration of that. So there are overlaps and there are differences. You can look at the number of lines of code: terra has maybe five times as many as stars. So there is much more there, but stars reuses a couple of things very cleverly, and it is just different. You need to see what is best for your purpose, I think.

And a slight follow-up to that question: in which life cycle stage is the current stars package?

Oh, that is a very good question, because it has a zero-point-something version, 0.5 or so. I think it can be used for serious work; there are just some things that do not work that easily. Things work really well for smaller rasters that can be kept in memory; I think it works great for those. For rasters that really have to be kept on disk, because they are way too large to handle in memory, a lot of things work, but not everything. There are cases where the idea is right, but then you run into the implementation and it does not always work. That is sometimes the case with terra as well, maybe — I don't know. So if people run into problems, please report them as issues on GitHub or on the mailing list; that really works well, and it helps us progress both packages.

And a lot of interesting questions keep coming. You mentioned the spatial data science book, and a lot of people are wondering when it will be released.

Right, so what we have written so far — it is slightly late — is already available online, and it will remain available online. But we are basically finishing up the first complete text, which we are going to submit in the next weeks. Then it needs to go to review, it needs to go into editing again, and it needs to go to print. So print versions will not be there within six months, I expect; it always takes a lot of time.

And thanks for making the book readable online, for people who would like to see the soft copy before the hard copy. Another question: could you elaborate a bit on the challenge of linking R with QGIS and GEE? Are these very doable?

Linking R with QGIS: yes, there have been different attempts to do this. One way would be to use the GIS and run R as a processing engine. The other way around, you use R as your client and call QGIS processing algorithms, which might in turn involve other GIS software, like SAGA GIS. There are a number of packages, and I do not know how stable they are; there is a new one, again by Dewey Dunnington I think, called qgisprocess. I am not a QGIS user myself, so I try to do everything with R and see where things break down. For GEE, there is a package called rgee, which I think is an R interface to the Python interface to Google Earth Engine; I think it uses reticulate to translate instructions and to obtain objects back. And I hear good stories about it — that it is very useful.

Thanks. I think that question by Nita is an interesting one, given that, like me, many started from ArcMap or QGIS, and when we migrate to R we always wonder whether there is something that can link us up, instead of having a baptism by fire and going directly from the graphical-user-interface kind of software to the other.
Right, yeah, those links are important, and there are a number of things that I can imagine you really want to do with QGIS and want to keep doing with QGIS. It is good software. It is also complicated software: it pulls together a lot of things, and combining it with R is a challenge, yeah.

Yeah, and Luca is asking: are there any thoughts about letting sf work with data.table?

Yeah, that is a good question. There are one or two issues, some of which might still be open, on the sf GitHub site and also on the data.table side. I heard that Tim Appelhans, the author of mapview, was successful in doing these things. In data.table you can handle sfc objects, geometry list-columns. That basically means you can also work with data.frames that have a geometry list-column; they are not sf objects, but of course they carry their geometry along, and you can work with that, in tables and so on. Where it breaks down is that sf tries to take over a number of methods from data.table, and does not do it entirely the way data.table likes to do it. So I think there are some conflicts if you want to work with sf objects that are also data.table objects. I am not entirely sure whether we can resolve that. I also have not seen people put much effort into it, and I did not do it either; somebody simply has to sort it out, I have the feeling. I think it can be done — because it is R, anything can be done. Right, that is the question: who will do it?

And we have a few minutes to the end of the session, about 10 minutes to go. I see some concerns from people asking about access to Slack, where the conversation can go on; the organizers will probably mention towards the end how people can access it and whether the email has been sent out. A further question here by Gabriel, who has also been wondering. He says: I have played with spatial data in R for a few years, not many, and have found interesting packages of great value that grow on their own. Is it on the roadmap for r-spatial to integrate all of them into a broader ecosystem?

Yeah, that is an interesting question. And the question then is: what is an ecosystem? In an ecosystem you have collaboration, but you also have competition, and competition can also be very healthy — it can be a very good thing. Things in ecosystems grow, have success, get big, and then at some stage they retire, or they die, or they get killed, or something happens: somebody finds something on CRAN that you never could solve, and it falls apart. So, if I think of the RStudio tidyverse packages as a successful ecosystem, you have to realize there is something like a hundred software engineers — a large number of very highly skilled people — who can put all their effort into doing that. If you look at who we are in r-spatial, and the amount of time that we have for package development, we are much smaller. So it is actually a miracle, I think, that we are now where we are.
And in that sense you have to think about capacity. Of course, we occasionally get some seed money from the R Consortium for things, but we do not have capacity in the sense of software engineers working consistently and constructively on things. As for competition: I already mentioned the stars and terra situation, where there is a certain overlap, and I think that is good. The thing is that Robert and I work very differently. Robert is really somebody who focuses on the software and then makes something that is brilliant and does everything. But he is not constantly communicating with everyone about how we shall do this or that; it is a different way of doing things, and it creates brilliant products that are simply there alongside alternatives. That maybe makes it easier for users. It would be different if all the developers were clearly assigned who should do what with what, but that is not the case, right? I think that is in general a characteristic of our community, of many R communities, because a lot of packages are very much individual contributions: you do what you burn for, the way you think it should be done, and then it is take it or leave it. So that is one of the struggles you have to deal with. I think it is fairly similar in the Python world, where things are even much less coherent, I think, than in the R world.

Thanks for the reflection and insights. Also, somebody here is reflecting and saying it is humbling to be reminded how difficult it is to define a straight line on a sphere. Then he asks: given all the existing knowledge of geometry and calculus we have, does it fascinate you that we are still grappling with the problem of figuring out how to code and represent this on a sphere?

No, I am actually very optimistic; I look at it from a different side. The reason that I struggle so much in explaining this is that it is hard to comprehend, at first, thinking on spheres — unless you are a mathematician who grew up with it and studied geometry or something like that. But I think it is the legacy: it is the 50 years that we have worked with global data essentially on a flat screen, treating it as flat. And now we see — well, Google started 15 years ago not doing that. They did not look at what the GIS community had done; they just said, we are going to solve this problem, and then they said: here, we have this library anyone can use. So I think it is a matter of catching up, and the difficulties really come from the legacy, from how we have done it all this time, and from things like what is written in the GeoJSON standard, where you think: why on earth would somebody write that down that way? Maybe it is a good idea; we will see. We just have to see how things develop, and I think they are getting better.

And we are probably taking the last question, before I hand over to hear whether Slack and everything are working now. Colin is asking: any thinking around support for discrete global grid systems in R?
Yeah, the S2 library that we are using essentially has a grid index: it has a cube with six faces, and then quadtrees, space-filling curves, on those cube faces, which gives an indexing structure. So it has an index that works effectively on a sphere. Other systems are H3; there are several H3 packages, R packages linking to the JavaScript or to the C libraries, and those do hexagonal grids. There is also the dggridR package, which has now been archived because there were problems in the C++ library it used, written by Richard Barnes. It is archived, but you can still get it from the archive and install it on your computer. So there are several packages doing this, for various purposes and with various interfaces. Yeah.

Thank you. I will stop taking questions at this moment, and I would like to take this opportunity to thank you for taking us through this and answering all the questions that we had. For those that have not been answered, I hope people can continue on Slack. And at this moment I would like to welcome Bruce here to tell us about Slack, whether people have been able to access it and how to go about it. And again, thank you to our sponsors for the day, Appsilon. You can see the upcoming sessions for your information. Over to you. Thank you.

Thank you very much, Edzer, for this great talk, and Pierre for being such an amazing chair. Just for the people who missed the messages at the beginning: we have sent all participants an email invitation to Slack. Please check your spam folder in case you have not seen it. And if you have missed it anyway, for any reason, please send us an email through ConfTool, or any address through which you can reach us. Many people have already entered the new Slack space. It is very similar to what we had at the lounge: there is one channel per session. At the beginning you are not going to see the whole list of channels, but you can click on the plus sign and then see the whole list there. So see you there. Thank you for your patience, and see you soon.