Hi, all. Welcome to our session on databases and spatial applications. We'll have three great talks today, and I'll be the chair. My name is Natalia Morandeira; I'm an ecologist in Buenos Aires, Argentina, and our Zoom facilitator is Rachel. Please remember that this session, like all spaces in the conference, is governed by our code of conduct. If you want to ask the presenters questions, you can do so with the Q&A button at the bottom of the screen, and you can vote for questions by liking them. You can also ask questions in our Slack channel, which is #talk-database-spatial.

Our first speaker is Sherry. Sherry is a PhD student at Monash University in Australia, supervised by Dianne Cook, Patricia Menéndez, Nicolas Langrené and Ursula Laa, and her talk is called Visual Diagnostics for Constrained Optimization with Application to Guided Tours. Thanks, Sherry.

Hi, everyone. My name is Sherry, and I'm a PhD student at Monash University in Australia. I will be talking about visual diagnostics for constrained optimization with application to guided tours. This is work supervised by Dianne, Ursula, Nicolas and Patricia.

The project is motivated by Ursula and Dianne's work applying the guided tour to a physics problem. This is what they write in the paper: there are remaining inadequacies in the tour's optimization algorithms that may benefit from newly developed techniques and software tools. They found that for simulated data, the optimizers often failed to find the expected maximum, or would only get close but not reach it; for noisy index functions, they failed completely. On the right is an example where the optimizer could have finished at a better place but doesn't. To understand why the optimizers were failing, ideally we need to visualize the space and the path that the optimizer takes through it. This is an interesting visualization problem because the space is the set of all d-dimensional projections of a p-dimensional space.

Now I will introduce some background on what a guided tour is, what the optimization problem is, and which optimizers we have. To understand the guided tour, we first need to talk about projection pursuit. We denote the data as a matrix X of dimension n by p, and the projection basis A, of dimension p by d, characterizes the direction onto which the data get projected. The projection Y is the multiplication of X and A. An index function maps the projection from an n by d space to a scalar. Throughout this presentation, until the case study at the end, we will be using an index function called the holes index. I will not show the full formula of the holes index here, but it is proportional to one minus something that looks like a standardized normal density. On the right, I have calculated the holes index value for different 1D projections, and we can observe that for those projections that have high-density regions — for example x2, x3 and the ones on the second row — the index value is higher.

The optimization problem we have is to maximize this index function subject to the orthonormality constraint. To solve it, we have three optimizers, all of which are random search algorithms. The creeping random search randomly samples a basis that satisfies the orthonormality constraint and evaluates its index value; a basis is accepted if it has a higher index value and discarded if lower.
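To put the setup just described in symbols — this is my own restatement of the quantities named above, not the formulas shown on the slides:

```latex
% X : n x p data matrix,  A : p x d projection basis,  Y = XA : n x d projection
% f : index function mapping an n x d projection to a scalar
\[
  \max_{A \in \mathbb{R}^{p \times d}} \; f(XA)
  \qquad \text{subject to } A^{\top} A = I_d .
\]
% Creeping random search: sample a new orthonormal basis A_new and accept it
% only if it improves on the current basis,
\[
  f(X A_{\text{new}}) > f(X A_{\text{cur}}) .
\]
```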
Simulated annealing follows the creeping random search in how it samples and accepts better bases, but it has a different rule for bases with a lower index value: these bases get a second chance to be accepted, with the probability shown here. In this equation, I is the index value, T is called the temperature in simulated annealing, and l is the number of evaluations in the iteration. As l increases, the temperature decreases, and the probability of accepting an inferior basis decreases, because T sits in the denominator and there is a negative sign. That is to say, it is less likely to accept a worse basis after more bases have been evaluated.

Lastly, we have the pseudo-derivative search. It has a flavor of gradient ascent: it first finds a promising direction and then computes a step size. We don't use derivatives here because they are hard to compute on a matrix. Instead, we randomly sample five directions close to the current basis, pick the most promising one as the direction, and search along a straight line on the sphere to find the best candidate. The pseudo-derivative search uses the same acceptance rule as the creeping random search.

This illustration shows how projection pursuit works with the guided tour. Projection pursuit maximizes the index function to iteratively find better projection bases — the blue frames — and the guided tour chains these projections together through interpolation (the in-between frames) to produce a smooth animation, which I will show on the next slide.

Here is how the animation looks. We use a histogram to display 1D projections, and the data here include five simulated variables: X2 is a mixture normal and the other four are random normals. We expect X2 to have a weight close to one and the others close to zero, because X2 is the only informative variable here. In this simple example you can see the optimizer works well and finds X2, but this is not the case for all problems, as we have seen in the literature, so we need some visual tools to help us diagnose where things go wrong. As a side note, this is the data we will use in the later examples, and we are always aiming to find X2 in this dataset.

This work leads to an R package called ferrn, where we create four diagnostic plots: two for exploring the trace and two for exploring the space. There is also a botanical theme for the color scales. To produce a plot, you pipe the data collected from the optimization into one of the explore_*() functions and make the color adjustment using the botanical palette. Due to the time limit of this talk, I will go through the two space plots here, and later you will have the chance to see the trace plots.
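As a rough sketch of that workflow — the function and dataset names below are from my reading of the ferrn documentation (the package ships demo optimization records such as holes_1d_better and holes_1d_geo), so treat them as assumptions rather than exact slide code:

```r
library(ferrn)
library(dplyr)

# Demo data shipped with ferrn: bases recorded while optimizing the holes index
# on 1D projections with two different optimizers (object names assumed).
holes_1d_better %>%                    # creeping random search records
  bind_rows(holes_1d_geo) %>%          # pseudo-derivative (geodesic) records
  explore_space_pca(group = method) +  # PCA view of the evaluated bases
  scale_color_discrete_botanical()     # the package's botanical palette
```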
This is the first space plot. First I'll talk about how this plot is made and what we can learn from it. We take all the bases evaluated in the optimization — these are the colored dots in the plot — together with random bases sampled from the 5D space, which are used to draw the space. We then perform PCA on the two sets of bases and take the first two PCs to create this plot. The star is the theoretical best basis, corresponding to a weight of one on X2 and zero on the others. We have two paths: the creeping random search in brown and the pseudo-derivative search in green. The creeping random search evaluates random points in the space, so you can see dots everywhere, while the pseudo-derivative search evaluates five directions locally, so you see the dots clustered around the accepted bases.

The paths here are the interpolation between the accepted bases in projection pursuit. This plot tells us that both optimizers find the optimum, but the pseudo-derivative search gets closer, and it also helps us visually understand how the optimizers work. We can also make an animated version of this plot. Here we can see more clearly where each optimizer starts, and the additional information we learn is that the creeping random search finds the optimum faster.

What I'm showing now is the same two paths, but on the full space. Let's watch this animation together. The full space is a 5D unit sphere, since the orthonormality constraint requires the squares of all the entries to add up to one. Here we have a frame in the animation that is similar to the PCA view. What we learn is that PCA maximizes the variance, so it gives the most spread-out view, while the full-space plot allows you to see the same paths from different angles. Similarly, we can generate random 2D bases and animate them with the 2D paths embedded. Here I generate the 2D basis space and capture some frames from the animation; this time we can see that the basis space is no longer a sphere but a torus.

Now I'm presenting a case study on how visual diagnostics inform us about optimizing a noisy index. The noisy index we use here is based on the Kolmogorov test, for which I have the equation here. The plot on the left shows how the index value changes when interpolating between two random frames: I generate two random bases in the 5D space and ask the guided tour to interpolate 100 frames in between. The index values of all these frames are calculated and arranged in their natural order to make this plot. What we see is that, compared to the holes index, the noisy index has a different range — as you can see on the y-axis — and it is also non-smooth. These are challenges for the optimization.

On the right are the trace and space plots of the three optimizers I introduced earlier. In the trace plot, the x-axis is the natural order of time and the y-axis is the index value; the horizontal dashed line is the index value of the theoretical best basis. What we see is that the first optimizer, the pseudo-derivative search, fails to optimize the noisy index, while the creeping random search and simulated annealing get close to the theoretical best. From the space plot we can also see the success of the creeping random search and simulated annealing, but they take different approaches to find the final basis. The creeping random search extensively evaluates points in the space before accepting a new basis to interpolate to, so we see a lot of dots in the space but a short interpolation path, while simulated annealing widely accepts new bases and performs interpolation at first, and only starts to reject points towards the end, so we see a long interpolation path but few points being evaluated.

In summary, we have developed specialized visual tools to diagnose optimization in the projection pursuit guided tour. This allows us to understand behaviors of the algorithms that are otherwise not obvious to end users. For tour developers, this provides tools to test new optimizers and the optimization of new indices, and for algorithm developers, this can be seen as work to enhance the interpretability of black-box optimization algorithms. And that's it for my talk, thanks.

Thanks, Sherry, for your talk.
It was very interesting work, and I think the guided tour optimization is great for end users. I have a question for you: are your diagnostic plots accessible to blind people, and if not, do you think that could be implemented in the future? — You mean the diagnostics for blind people? — Yes. Can a blind user read your plots? — I don't think they currently can, but we can see if that's a possibility for future work. — Nice to hear. I was also curious about which type of data you have used your guided tours for. Have you applied your diagnostic tools to sampling data? — In this presentation, the example I use is a single five-variable dataset; I can share the screen again. Here we pick X1, X2, X8, X9 and X10, and only X2 is informative because it's not normal, which can be detected by the holes index. We also experimented with other datasets and with different combinations of these variables. — Thanks for sharing. I want to remind everyone that we have a Slack channel called #talk-database-spatial, where you can interact during the session and also afterwards to keep the conversation going.

Our next presenter is Markus Kainu. He is a senior researcher at the Research Unit of the Finnish Social Insurance Institution, Kela, and he has a special interest in the spatial dynamics of social inclusion and social security. Markus is presenting his talk, "The geofi package: facilitating access to key spatial datasets in Finland". Thanks, Markus.

Hi, my name is Markus Kainu, and this is my video presentation for the useR! 2021 conference. The title of my presentation is "geofi package: facilitating access to key spatial datasets in Finland". I work as a senior researcher at the Research Unit of the Finnish Social Insurance Institution, Kela. Kela is a government agency that provides basic economic security for everyone living in Finland. Kela carries out research to analyze and provide information on the Finnish social security system and the way it functions. Among other topics, we look at family benefits, pensions, the labor market and unemployment benefits, health insurance, housing benefits and rehabilitation.

The geofi package is a cooperative effort developed and built within the rOpenGov project. rOpenGov is a community of R package developers working on open government data and analytics. The network was initiated in 2010 and has since led to many R packages, such as eurostat and pxweb. Several independent authors have contributed on GitHub and written for the rOpenGov blog. Please check out our projects and come on board.

So, why does the world need the geofi package? There is great demand for solutions to present and analyze data spatially, and COVID-19 has boosted it even further. Countries vary in the availability of open statistical data, but many have government agencies providing data on national accounts, labor markets, health or the environment. Researchers and analysts are well equipped to utilize statistical data, but geospatial data not so much. Even with GIS software skills, you are often trapped between two separate environments: one for statistical data and one for GIS. The geofi package is a tool to access geospatial data on municipalities, zip codes, and population and statistical grids from the Statistics Finland WFS (Web Feature Service) API within R. As we all know, R is great for analyzing statistical data and for interacting with web services, but it has also become a solid GIS tool.
With a little help from the geofi package, R becomes a great one-stop shop for geospatial analysis of open government data in Finland.

Okay, let's move on to features and examples. geofi was published on CRAN in February this year; the current version at the time of recording this presentation is 1.0.2, but the package had been in development for many years before that in the form of various GitHub and in-house packages. geofi is primarily a WFS client for the Statistics Finland API, an API that disseminates official administrative borders and various grid data as open data, with annual additions since 2013. The package ships with a time series of municipality key tables that can be used to aggregate spatial or statistical data to higher-level regional breakdowns. The package also contains point data with the locations of the central localities of each municipality, from the Topographic Database of the National Land Survey of Finland. Finally, we are fans of the geofacet package, and therefore we have included a geofacet grid of regions and geofacet grids for each region at the municipality level. The image on the right shows all the spatial datasets you can obtain using the geofi package: on the top row, municipalities, municipalities aggregated to regions (one of many possible aggregations) and zip codes; on the bottom row, statistical and population grids and the central localities of each municipality.

Let's first have a look at the municipalities. I will walk you through the code on each slide. First, we install the package from CRAN using install.packages(). Once it is successfully installed, we load the package and obtain the municipality breakdown for the year 2019 at the 1:4,500,000 scale. Next, we assign the on-board data on central municipality localities to a new object called points. Finally, we create a plot using ggplot2 in the following manner: first we create the base layer, then we add the first sf layer with polygon data on municipality borders, and on top of that we add another sf layer with point data on the central localities. Finally, we abbreviate the fill label to "Code". By the way, in this presentation I will only use ggplot2, although there are a number of great alternatives for visualizing spatial data in R.

Next we obtain some attribute data, still at the level of municipalities, to be joined with our spatial data. To keep things simple, we pull data from the Statistics Finland PxWeb API using the pxweb package. We choose the municipal key figures, which contain several indicators for each municipality. Here we get the data and use janitor's clean_names() to get rid of the Scandinavian characters and other troublemakers in the column names, then print the top six rows using the head() function. In order to match the municipality spatial data with the new attribute data, we use left_join() from dplyr. Then we pick a variable — the share of persons aged over 64 in the population — and map it as the fill variable for the only data layer in our ggplot. So that is how you operate at the municipality level to join data and plot maps.
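A rough code sketch of the municipality walkthrough above; the argument names and the on-board point dataset name follow my reading of the geofi documentation, so check the package help pages for the exact spellings:

```r
# install.packages("geofi")   # geofi is on CRAN
library(geofi)
library(ggplot2)

# Municipality polygons for 2019 at the 1:4,500,000 generalization level
# (argument names assumed from ?get_municipalities).
muni <- get_municipalities(year = 2019, scale = 4500)

# On-board point data with the central locality of each municipality
# (dataset name assumed from the package docs).
points <- municipality_central_localities

ggplot() +
  geom_sf(data = muni) +                 # polygon layer: municipality borders
  geom_sf(data = points, size = 0.5) +   # point layer: central localities
  labs(fill = "Code")

# Attribute data (e.g. the municipal key figures pulled from the PxWeb API with
# the pxweb package and cleaned with janitor::clean_names()) can then be joined
# by municipality name or code, for example:
# dplyr::left_join(muni, key_figures, by = "municipality_name")  # hypothetical names
```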
Let's then have a look at the aggregates. Here we are just taking the municipality data we pulled from the API — the spatial municipality data, not the attribute statistical data. The get_municipalities() function we used adds the aggregate columns for each municipality, so from those columns we select the ones that end with "_en" to have only the ones with English names. For the year 2019 we have 18 regional breakdowns with English names that we can use to aggregate statistical or spatial data upwards from the municipality level. On this slide we pick two of those regional breakdowns — a rural–urban classification and the constituencies of Finland — and aggregate the spatial data at those levels. So we are not working on the attribute data, the statistics, here; we are just aggregating the spatial data. And here we use the patchwork package to combine the two maps into a single plot.

So that was all on spatial data. Next, let's take a look at the facets. On this slide we are working at the municipality level and first filtering only those municipalities that belong to the Uusimaa region, the southernmost region, whose capital is Helsinki. We use the pull() function from dplyr to get the vector of municipality names and then filter the statistical data we obtained a while ago to keep the data on the employment rate only for those municipalities. Then we create a regular ggplot using the facet_wrap() function to make a faceted plot with panels arranged in alphabetical order. As already mentioned, we like the geofacet package, and therefore the geofi package contains a grid to rearrange the municipalities closer to how they relate to each other in space. Geofacets are never perfect, but this puts the eastern municipalities on the right side of the plot and brings neighboring municipalities closer to each other. As we switch to the geofacet version, the panels are rearranged and, to some extent, match the geographical reference map at the bottom left. We can see that Raasepori and Hanko are here on the left, and Mäntsälä and Pukkila are at the top, meaning they are the northernmost municipalities.

The package includes a vignette titled "Spatial data manipulation and analysis" that demonstrates some basic spatial data manipulation steps using geofi data and the magnificent sf package. On this slide, just for the sake of it, we have picked the central localities of the four metropolitan-area municipalities, created a 15 km buffer around them, and cut a piece of that size from the 5 km by 5 km statistical grid data. For that data we have then computed the centroid of each grid cell, created Voronoi polygons around those points, and finally added a random fill to each polygon.

Finally, let's have a look at geofi from a comparative perspective. I picked a few packages for this slide that I went through, and I encourage you to do the same. First of all, there is geobr, which downloads official spatial datasets of Brazil. It's a nice one with a great variety of datasets, and it also has a Python package. Then there's geouy, geographic information of Uruguay; that is a WFS client, but it makes nice use of sf::st_read() to interact with the API. Then there's tigris, for downloading and using TIGER/Line shapefiles in R; to work with US Census data, you can use tigris with the tidycensus package. Then there's giscoR, which downloads map data from Eurostat's GISCO API; this is a European-Union-wide service, a very nice API and a very nice implementation in this package. Then there's mapSpain, which overlaps somewhat with giscoR but also has quite a lot of unique Spanish datasets, pre-processed as GeoPackages.

Lessons learned: all the packages return data as simple features — sp is hardly even an option anymore. The backend technologies vary greatly, and at the moment it's hard to see any potential for developing more generic methods for these download tasks.
There are zipped shapefiles, REST APIs and WFS APIs in many, many flavors. All the packages return the data transformed into a country-specific CRS. geobr also offers a Python version, which I think we should all consider. Only the Uruguay package has a vignette in the national language, and I think that would be a nice extra to have in these country-specific packages. Most countries do not have their own R package for geospatial data, so please hurry up and fix it. And, to be honest, the package name geofi was shamelessly taken from these two packages, the Uruguayan and the Brazilian one.

Okay, thank you to my co-authors and contributors: Joona Lehtomäki, Juuso Parkkinen, Jani Miettinen, Pyry Kantanen, Sampo Vesanen and Leo Lahti. And thank you for your attention, whoever you are, wherever you are. My name is Markus Kainu and that's my email address. Thanks also to the creators of the xaringan and xaringanthemer packages. I'll see you on the 7th of July at the useR! 2021 conference. Thank you.

Thank you, Markus, for your talk. It was a very nice talk — thank you very much, also for your comparison with the packages from Uruguay, Brazil and the other countries. We have a question from the audience, from Rainer Walk: how do you deal with the change of municipality borders over time? — Okay, that is a good question. It, let's say, emphasizes how privileged I am as a software developer here in Finland, because I don't have to do that. Statistics Finland is doing all that work and providing the time series of municipality borders: every year in January there is a new one available, and there are quite a few mergers and changes happening. So we don't do that — the data provider does it. — Okay, thanks for your response. You also mentioned that you encourage other countries to have their own R packages. What general recommendation would you give them? — Well, I think this depends heavily on, let's say, the institutional capacity or the legal prerequisites of a country — whether it is even able to provide such data, not to mention maintain it and provide it openly through an API as we can do here in Finland. If that institutional and technical backend doesn't exist, then it is a lot harder, especially to maintain it over time. Here we don't have to maintain any of the data that this package provides access to, so we can just focus on making it work fast and be compatible with the R ecosystem. So I know it is a lot harder elsewhere. I think there's a tutorial from afrimapr, which is a wider project on African countries trying to tackle these issues, and maybe one you should also have a closer look at; I will put it in the Slack chat. — And can the development of your package help facilitate the work of other countries with your code? — I think there are a few good design choices that we have made. Of course, it's mainly a client on top of Web Feature Service APIs, so if you don't have that, then a lot of the package is maybe only useful from a technical point of view. But we also have these routines for aggregating data, which you often have to do, and we provide some on-board additional data; those ideas could perhaps be copied into other packages. — Yeah, there's one question in Slack. — Yeah, the time series: from 2013 onwards they provide the archives for the spatial data.
We do have archives for a longer period, but they are in different resources, and so far we haven't included them because they are not that much in demand. — Yeah, the question was how far back in time the municipality border data is available from Statistics Finland. — Yeah, so it's 2013, so we have almost ten years now. — Okay, thank you very much. We will have additional time for questions at the end of the session. — Okay, I will be around here and in Slack.

So now, thank you, Markus. We'll continue with our last presenter. Michael Mahoney is a PhD student in the graduate program in environmental science at the State University of New York College of Environmental Science and Forestry, and he's presenting the terrainr package. His talk is called "Virtual environments: using R as a frontend for 3D rendering of digital landscapes". Welcome, Michael.

Hi, everyone. My name is Michael Mahoney. I'm a PhD student in environmental science at the State University of New York College of Environmental Science and Forestry, and I'm excited to be talking to you today about using R as a frontend for producing virtual environments and landscape visualizations inside of video game engines. I'm going to focus today on the terrainr package as a way to retrieve spatial data for the United States and to visualize spatial data inside of game engines. I want to talk a little bit about the challenges of using game engines for data visualization, and then about why we think it's worthwhile to deal with those challenges and why we think R is the right language for the job.

So first things first: what is terrainr? terrainr is a new package focused on the retrieval and visualization of spatial data. It can be used to download data for use in analysis, to visualize the outputs of other spatial processing, or the two focuses can be combined to download and visualize data for areas within the United States. Importantly, those two halves are designed to integrate well with the rest of the R spatial ecosystem, so you can feed the data that terrainr downloads to other R functions, or visualize the outputs from non-terrainr workflows. The two halves can be used separately or as a cohesive whole.

To start with the data retrieval side of things, terrainr provides a standard API to access public-domain spatial data for the entire United States, with data provided by the United States Geological Survey's National Map program. The National Map provides access to a number of data products for the entire United States, including elevation data that's accurate down to three meters, or even one meter in some areas, as well as aerial orthoimagery and other base-map layers like contour lines and hydrography. Each of those data products exists as a separate service within the National Map, which requires differently shaped queries, returns data in different formats, and might produce different types of errors, so a lot of the code in terrainr is focused on dealing with each of those different services. But from the user's perspective, terrainr aims to provide a single, simple interface to download whatever data you want: the user just specifies their area of interest and the type of data they're looking to download.

So say, for instance, you have some sf object. Here I'm going to take a GeoJSON file of campsites in Bryce Canyon National Park out in Utah and read it directly into R from a data portal as an sf object.
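For example — the URL below is a placeholder, not the actual data-portal address used in the talk:

```r
library(sf)

# Read a GeoJSON of campsite locations straight from a web data portal into an
# sf object; the URL is a stand-in for the real Bryce Canyon portal link.
campsites <- read_sf("https://example.org/bryce-canyon-campsites.geojson")
campsites   # an sf data frame of point features
```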
We can plot that with ggplot2 to see how our campsites are arranged. And if we decide that, yes, this is the area I want to download data for, I can pass that sf object directly to the get_tiles() function from terrainr to download data from the National Map. Here I'm specifying, using the services argument, that I want to download elevation data and orthoimagery for my area of interest; if I wanted contour lines as well, I could just add contours to that vector — it's that simple to switch between the services you download data for. I'm also specifying that I want to download the data at a 30-meter resolution. One of the very nice things about the National Map is that it will actually resample its data to whatever resolution you request. get_tiles() then returns a list of where it saved the files. If your request is too large for the API, terrainr will automatically break it into pieces and save multiple images, so this list helps you identify how many tiles were returned and get the paths where they were saved programmatically. And for a lot of users, this is what terrainr is: an easy interface to free, public-domain spatial data for the United States.

But I personally get excited about the other half of the package, which focuses on visualizing spatial data in both 2D and 3D. To start with 2D: it's already pretty easy to plot most spatial data in R. For single-band rasters, where each pixel stores only one value — like our elevation data — we can use the plot method from raster. If we have multi-band RGB images, where each pixel has a red, green and blue value — like our orthoimagery does — we can use the plotRGB() function. And we can bring single-band rasters into ggplot2 using geom_raster(). But it's a bit trickier to add multi-band images to ggplots if we want to do things like use our orthoimagery as a base map, and so terrainr introduces a new geom, geom_spatial_rgb(), to do exactly that. By providing either a data frame, a RasterStack, or a file that can be read in as a RasterStack, you can add arbitrary multi-band rasters to a ggplot. We can then go ahead and, say, add our campsites on top of this map using geom_sf() to give our simple map from earlier a little more context. This function makes it a lot easier to include base maps underneath the maps you're making in ggplot2. As you might guess from the name, geom_spatial_rgb() was designed with spatial data in mind, but there's really no requirement that you use it for spatial data: most image formats can be read in as RasterStack objects, which means this function is surprisingly flexible for adding any image to a ggplot. I think there's a lot of interesting potential for off-label usage here that I expect future versions of terrainr will make a bit easier.

So that's how the package interacts with 2D visualizations. Moving on to 3D: R has a very active ecosystem for 3D spatial visualization, and it's worth mentioning that the parts of terrainr we've talked about so far — the data retrieval components — work pretty well with it. For instance, we can use the fantastic rayshader library to make beautiful visualizations of our 30-meter surface, rendered with rgl entirely within our R session. terrainr offers another way of visualizing these surfaces, by helping to transform data so it can be brought into the Unity 3D game engine. That way we can produce these high-resolution, fully physically rendered terrain surfaces inside of a game engine.
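A sketch in code of the download and base-map steps described above: the services and resolution arguments are as named in the talk, while the aesthetic names of geom_spatial_rgb() and the element names of the returned list are assumptions from my reading of the terrainr documentation.

```r
library(terrainr)
library(ggplot2)

# Download 30 m elevation and orthoimagery tiles covering the campsites object
# (the sf object read in earlier).
tiles <- get_tiles(campsites,
                   services   = c("elevation", "ortho"),
                   resolution = 30)
str(tiles)   # a named list of file paths, one element per service

# Use the orthoimagery as a base map and draw the campsites on top of it.
ggplot() +
  geom_spatial_rgb(data = tiles[["ortho"]][1],
                   mapping = aes(x = x, y = y, r = red, g = green, b = blue)) +
  geom_sf(data = campsites, color = "yellow")
```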
I'm really excited about the potential for visualizing spatial data in these engines, and I want to talk about that in a minute, but I think it's useful to walk through how these visualizations actually get produced first. So first things first, we need to re-download our data at a one-meter resolution, which is the default for get_tiles(). We could use any other data here — any single-band raster can be used as an elevation heightmap and any image can be used as an image overlay — but we're going to stick with the same area for simplicity's sake. Because this is such a large area, get_tiles() has to split our request into multiple files, so we use another terrainr function, merge_rasters(), to merge those tiles into single files, and we're left with one file for our elevation data and one for our orthoimagery. We can then use another terrainr function, raster_to_raw_tiles(), to transform those files into a format we can import into the game engine. The Unity game engine expects our elevation data to be in a binary format, which we save as a .raw file, so we set the raw argument to TRUE there; the orthoimagery doesn't need to be in that format, so we set raw to FALSE. With these few lines of code we're able to download data tiles, merge those tiles into single files, and transform the data to be imported into Unity, ending up with a number of files containing either our elevation data or the images we'll drape on top of the terrain surface those files will create.
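Recapping that Unity-prep pipeline as code — a sketch only: the output file names are illustrative, and the exact argument names of merge_rasters() and raster_to_raw_tiles() are assumptions from my reading of the terrainr documentation.

```r
library(terrainr)

# Re-download at ~1 m resolution (the get_tiles() default); a large area comes
# back as several tiles per service.
tiles <- get_tiles(campsites, services = c("elevation", "ortho"))

# Merge the tiles of each service into a single file.
merge_rasters(tiles$elevation, output_raster = "elevation.tif")
merge_rasters(tiles$ortho,     output_raster = "orthoimagery.tif")

# Convert to what Unity expects: a binary .raw heightmap for the elevation data,
# ordinary image tiles for the orthoimagery overlay.
raster_to_raw_tiles("elevation.tif",    output_prefix = "bryce", raw = TRUE)
raster_to_raw_tiles("orthoimagery.tif", output_prefix = "bryce", raw = FALSE)
```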
The only thing left is to actually import these into Unity. In the very near future this is going to be handled by a function, but at the moment we need to bring in each tile manually. This process is documented in a vignette in the package and takes me about three minutes, so instead of making everyone watch that, I'll just show off the end result. We're left with a surface that looks like this — this is inside the Unity game engine's user interface, and here I'm able to use my mouse to pan across this image or use my keyboard to interactively explore the landscape we have created. I want to stress again that these surfaces can be created with any data: any single-band raster can be used as elevation, and any image file can be used as an overlay.

I think the surface looks great, and it's relatively easy to produce, but the obvious question is: why bring in Unity at all? After all, Unity and most of the other game engines we'd use for this are not capital-F Free software. They don't necessarily cost money, but they're not permissively licensed like R is; they're not open source. And interacting with these game engines requires you and your users to have a completely different skill set from the programming that's required to download and process spatial data. So why bother? I think there are a couple of reasons. First off, Unity is incredibly efficient and can handle massive landscapes with no problem. For instance, this surface is a one-meter-resolution terrain landscape that extends about 13 kilometers in one direction and 10 kilometers in the other, which means it has a total of 130 million individual pixels, each with its own coloration and elevation to render — and I can fly over it without my laptop fans kicking on. Unity progressively renders terrain as you move around at different levels of detail, so areas close up are rendered at higher resolutions and areas further away at lower ones, which lets it handle massive landscapes that can be a challenge for R-based rendering tools.

Secondly, there's a lot of room to expand beyond just these static surfaces inside of Unity. As I mentioned earlier, Unity is primarily used as a video game engine, so it offers a lot of ways for users to interact with this landscape. Our landscape is entirely physically rendered, which means I can add a small script and actually create a character of sorts, which can then, using the keyboard and mouse, interactively explore the landscape we have created. I think this has the potential to be a really powerful way to add interactivity to spatial visualizations. At the moment, adding characters requires a little familiarity with the Unity UI, but we're planning on changing that: we intend, in the next few months, to have a function that will create and add a character on top of this world for you directly from R. Similarly, we're working on ways to plant data-defined forests and buildings on top of this surface from R, so that you're able to visualize more than just terrain surfaces inside these engines. By building out ways to create visualizations in game engines directly from R, we can potentially make these tools more useful for scientific communication and improve the types of visualizations that teams can actually take advantage of.

Which brings us to the last question I mentioned: why R? This is a new project, and we could have chosen any language for our frontend, but there are a few reasons that made R the clear choice for us. First off, one of the big challenges with using game engines for visualization is that these engines don't think of data in the same way we might as scientists. The idea of trying to create data-defined landscapes in these systems is pretty new, and there isn't much tooling to support data import and manipulation in these engines. So having access to R's incredible data manipulation tools to help us actually create these scenes is a massive help, and the R package ecosystem is just unmatched for this type of project. Plus, it's a huge benefit to be able to get ideas and inspiration from the group of people working with spatial data and landscape visualization in R right now. But even more than that, R is where our users are. We're working with terrainr and related projects to help ecologists and environmental scientists visualize the places they work, to try and help promote qualitative understandings of large-scale systems. Those researchers are already broadly familiar with the R ecosystem, given just how widespread R is in environmental research. If our goal is to make these visualizations more approachable, then we need to meet our users where they are. And that's our goal with terrainr and the projects we've got coming down the line: we want to make it easy for users to apply their current R skills to make visualizations inside these game engines, without ever needing to think about why that might have been a tricky thing to do.

With that said, I want to thank you for coming today. I also want to thank the State University of New York for supporting this project via the ESF Pathways to Net Zero Carbon initiative. If you'd like to learn more about terrainr, I've included links to the GitHub repo and the documentation website in the slides, which are themselves available from GitHub.
You can find me at mikemahoney218 on both GitHub and Twitter. Thanks again, and I look forward to seeing you in the Q&A.

Thank you, Mike. I loved your talk, and thanks for the flight over the mountains — very nice to see. I see that you are a PhD student in environmental sciences, and I have a question related to that: can your virtual environments be used to simulate environmental processes at the landscape scale? — Yeah. Speaking extremely long term, the goal for this system is to create a way to visualize these large-scale systems over time, so that people can get a very concrete idea — I work in the context of climate change and forestry — of what our models are telling us these forests are going to experience: which species we can expect to survive if we do a specific management activity, and what we expect to happen afterwards. The idea behind how we're building these systems is to make data-defined visualizations that build off the data that simulation programs would actually be creating. — Right, so you might add, for example, a forest layer or other vegetation layers over your mountains. — Exactly. And right now we're working on that transformation, because when people measure forests, you usually don't go out and measure every single tree; you take a sample. So we're working on ways to get from those samples to realistic forests, so that we have realistic trees to put on top of these landscapes, and so that, by experimentally manipulating your samples, you can ask things like "I think all of this species will be gone — what will the landscape look like after that?" We're trying to make it so that people can quickly visualize these sorts of alternate realities. — That's very nice. I saw that your approach is adaptable to other regions, just by changing the raster layer. For example, in my area I work in wetlands, and I think it would also be very nice to model floods, and to share the work with a non-specialized audience too. — Yeah, and I have met a few people at the United States Geological Survey — which actually runs the place we downloaded the data from — who have used this tool for things very similar to that, trying to simulate floods and other disturbances.

I have a question from Shuri Visashi — I hope I said your name right — who says: speaking as a silvicultural technical officer, what are the limitations you have encountered so far? Would the software be able to be used to measure specific trees in that forest? — So, we're thinking from the perspective of — and I'm sorry if I lose anyone, this might get a little deep into the forestry side of things — you have already gone out and done a fixed-plot inventory, so you have some number of plots in the forest where you've basically put a stake in the ground and measured all the trees within a certain distance of that stake. What we're working on right now is a way to get from those plots to a realistic forest — trying to figure out, for instance, that trees of certain sizes and certain species will only ever exist so close to one another.
And so we're trying to figure out how to position these trees realistically throughout the landscape, so that if we have 3D models of them, we can place them appropriately. We aren't necessarily as interested in using this to measure the trees outside of those plots, or for estimating stand characteristics; we're coming at it much more from the visualization focus than the statistical analysis focus. — Right, thank you. So now we have additional time for questions. Sherry says thank you — amazing software. I have an additional question, from Andrea Carranso: is it, or will it be, possible to import those kinds of data into Blender? — For the landscapes I was showing, the terrain surfaces, we're not planning on it; there are some very nice add-ons for Blender already that handle a very similar feature. They aren't called directly from R, but I think they work better as a result of that. As for the forests and things we're intending to plant on these surfaces, we're looking to actually render those forests inside of Blender, and then the package I'm working on at the moment will export those models for import into the game engine for visualization. — Right, thank you.

So now we have about 25 minutes left in the session for questions for the three presenters. Markus and Sherry, you can turn on your video if you wish. I saw that Markus had additional questions in the Slack channel, and you can also ask questions here with the Q&A button. Do you want to comment on the conversation you were having on Slack, Markus? — Yeah, sure. Tom Gemet was asking in Slack, or pointing me to, the Open Geography Portal of the UK Office for National Statistics, and saying that he has a plan to work on a similar package. I just had a quick look at it, and it seems to provide a lot of different kinds of datasets on various topics and in different technical formats. I think the approaches that were applied in the rOpenSpain packages or the Brazil package could maybe apply: because they had several distinct data sources, those packages had a separate function for each data source. The functions could then be quite simple — of course this makes the whole package API quite complex if you have a lot of sources, but it's maybe easier to maintain. There's one other thing: I didn't look at the data that closely, but they are zipped — I don't know if some of them are shapefiles; some could be just zipped CSV files. So one approach could be to pre-process those datasets into R objects or something similar, provide them, and make the package use those pre-processed data files so it runs a bit faster, because it may take some time to first download the zipped CSV and then convert it into a spatial object. Of course that adds to the maintenance burden, which I especially try to avoid — I prefer to keep things very simple so that I can ensure the package is maintained and always working. These are just my random thoughts on the question. — Thanks for sharing. I have a question for Mike.
Sebastian Raiden — I don't know how to pronounce your name, sorry — says: from what I understood, the package so far covers mainly US data. Do you plan to cover more parts of the world with the package? — Yeah. As far as the visualization end of things goes — the ggplot2 extensions and the game-engine half of the package — it can use any data you provide it. Most of those functions just take raster objects as arguments, so a GeoTIFF or anything you load with the raster package, or possibly sf objects, and they should all be relatively agnostic about what the data actually represents. As far as the data retrieval end goes, I am interested in adding other endpoints to cover more than just US data, particularly for the elevation and orthoimagery layers. But when I've gone asking people who work in spatial visualization, I haven't found any endpoints with the same granular level of data that the US has available. The US Geological Survey is currently re-flying lidar for the entire United States, and I believe most of Canada as well, attempting to get one-meter-resolution data for the entire country, and I just haven't been able to find that freely available anywhere else. So if you know of endpoints, please point me to them. — Thanks for sharing, Mike.

Since we have two early-career students and a senior researcher here, I'd like to ask: what advice would you give to other professionals or students who are starting their careers in data visualization or spatial applications? It's a question for any of you. — To start things off, I think my advice is always: find something that you wish existed, and build it. We talk a lot about this sort of project-driven learning, especially in data science and within R — once you get past the point of understanding R syntax, find something to do with it. I think it's important to find a project that you care about, that you're going to use yourself, because otherwise it's very easy to fall off the wagon and lose some of your progress, whereas if you're building something that is immediately useful to you, you'll have a continuous drive to improve your skills and your product as well. — Sherry, how did you start? — I think for me, one of the biggest realizations is that visualization is actually a larger field than I thought: in the past I thought there was only static visualization, but there is also interactive visualization and dynamic visualization, so there is a lot of ground within visualization itself, and a lot to explore. — Markus here: I followed Mike's advice, and this project, for instance, started because I needed to make maps — and actually to teach my colleagues to make maps so that I wouldn't need to make them. So I made a package, and now they can make their own maps, and I only have to maintain the package every now and then. — Thank you. And which talk or session in the conference are you looking forward to that fits your interests? I saw, Mike, that you were in the previous spatial applications session; I couldn't attend it. — I'm thrilled with how many talks at this useR! are about spatial data and spatial analysis.
I honestly just keep sending links to my lab mates of these packages and approaches that I think are absolutely fascinating. I get that this is a cliché answer — I can't pick one — but the overall amount of spatial content has been really exciting. — And, working for a large government agency, I also try to follow the big data and enterprise presentations, so that I can send the links to my colleagues once they are back from their summer holidays. Very relevant stuff. — And for me, there are talks on visualization tools and time series as well, so I'm quite interested in those. — Thank you for sharing your thoughts. I don't have any more questions in the Q&A or in Slack, but the conversation can continue during the conference in the Slack channel. I would also like to encourage you to fill in our useR! 2021 survey on the diversity of participants in the conference; I have shared the link in the chat.

Up next in our conference, at 1 p.m. UTC, we have the second session of elevator pitches. To participate, you have to join the elevator-pitches channel and then look for the channel of the elevator pitch you want to hear, or the short talk or technical note you want to read, and you can interact with the presenters in their channels, or in a call if the presenter wants to open one. Then, at 2:45 p.m. UTC, we have a keynote from MetaDocencia that will be presented in Spanish. It's called "Enseñando a enseñar sin perder a nadie en el camino" — "Teaching how to teach without leaving anyone behind" — and it will have English captions so that everyone who doesn't speak Spanish can follow the talk.