Good morning, Susan. Hi Brian, nice to meet you. Nice to meet you too. Let's kick things off. So, welcome everybody to Mapping Spatial Health Data. This is a workshop for the R/Medicine conference. Susan Paykin and I will be speaking to you about this material. We're both at the Healthy Regions and Policies Lab, which is housed at the Center for Spatial Data Science. So, a little bit about the lab in general, and then we'll go into intros for each of us. The Healthy Regions and Policies Lab is based at the Center for Spatial Data Science. We integrate GIScience, public health, and statistical approaches to explore, understand, and promote healthy communities. We believe in spatial data science for good, and therefore maintain a commitment to open science and open-source methods and applications. We're a bit unique in using code, like R, to do a lot of the actual GIScience and statistical work that our shop does. I'm Marynia Kolak. I'm a health geographer slash spatial data scientist. I actually come from the physical sciences: I was trained as a geologist and then shifted over to people, continuing my love for databases but extending that to stats and public health. I'm interested in how where you live impacts, interacts with, drives, and magnifies health outcomes. Susan, do you want to introduce yourself? Sure. Hi everyone. I'm a researcher with the lab and also project manager. My background is in social science research and spatial data science, particularly with a lens on public policy and policy analysis. My background is also in sustainable agriculture and farming, so I've worked a lot in food systems, particularly looking at food access and issues around equity in food and nutrition spaces. I'm really glad to be integrating some of that work with broader public health topics.
In a minute I'll also give the TAs a chance to briefly introduce themselves if they're interested. In today's workshop, we're going to be doing a couple of things. This is a crash course: an intro to using R as a GIS, as well as learning about spatial analysis and mapping. So we're going to understand the basics of those things, and a big part of that will be standardizing data at the neighborhood level, which is one of the biggest things our group uses R for, as do many other groups. We'll also integrate data from another source: if you have spatial and non-spatial data, how do you connect those things? We'll also calculate some new variables. And then, of course, because that's usually the reason folks want to join these workshops in the first place, we will also make lots of maps. You'll learn that in our group's focus within geographic information science, maps may be the very first stage or the very last stage. So the maps are not end products for us; they're part of the journey in understanding more complex phenomena that are occurring at the neighborhood level. We're going to be taking about five-minute breaks between sections. We'll have an intro to R spatial section, mapping neighborhoods, adding health resources, calculating spatial metrics, and then a final Q&A and resource-sharing session. That last session also serves as slack in case we run a little over on any one of the other sections, so it's helpful to build that in. Along with the live coding, I created a workbook as a bookdown that includes all the materials, and we can re-paste all the different links that will be useful for you. Okay. At this stage, would the TAs want to quickly introduce themselves, just so that you're familiar with who they are? If you have questions as we're going through, the R/Medicine conference has this great feature where TAs are available to help.
So feel free to unmute and introduce yourselves. Hi everyone, my name is Shelmith. I hope I'm audible. Okay, my name is Shelmith Kariuki, but most people in the community know me as Shel. I am based in Nairobi, Kenya. I am a data analytics consultant who is always very interested in working with spatial data, and I am happy to be here. Thank you. I'm Mara Alexeev. I'm a general pediatrician, and I help organize R/Medicine. I don't know anything about spatial data, but I do know about R, and so I'm happy to be here for stuff like that. I also helped set up the RStudio Cloud workspace, so you can put messages in the chat and I'll be able to help troubleshoot that for you if you need. Hi everyone, my name is Maria Kamenetsky. I'm a PhD candidate in epidemiology at the University of Wisconsin-Madison, and I work on statistical methods for spatial cluster detection. Thanks. Hi everyone, I'm Will. I'm a statistician and R package developer in the life sciences. I am a complete novice at spatial data, but I'll be learning along with everyone, and I'll be helping out with R-related technical issues and anything else that I can be useful with. I think that's it. We had one more TA, but he wasn't able to make it today; he's here in spirit. Awesome. So, again, TAs are here to help. Message one of these awesome TAs if you get stuck or have a question. And again, the TA who wasn't able to be here today is here in spirit. For this short workshop, CME credits are available. At this stage I'll switch over to Susan to explain the process for that and then take us through some of the intro concepts for our first chapter. So, I just dropped a link in the chat if you are interested in and would like to claim CME credits for participation in this session. This session is eligible for, oh, it looks like the slide didn't update, but it's a maximum of three credits.
The link in the chat goes to a PDF with instructions for how to do that through the University of Chicago CME continuing learning office. If you have any questions, feel free to email me or Tom at the CME office; his email is posted there and it's also in that PDF. I think that's it for that. So, with that, we're going to transition to the introductory part of our workshop today. The way the rest of the session will go (I was going to say morning, but it's afternoon or evening in lots of different parts of the world) is that I'll introduce some of the concepts that we'll be working through in our analysis and live coding, and then Marynia will take it from there in walking through exercises and actually applying a lot of these concepts. Again, this is a crash course: we do have three hours, but we've stuffed a lot of information into them, so hopefully this first part with the slides is helpful in framing some of the big-picture concepts, and then you'll actually be able to get your hands dirty doing some of the live coding in the RStudio Cloud setup. So, with that: we're all in this Mapping Spatial Health Data workshop, but at its foundation, what is spatial data? Spatial data refers to data that contains information about specific locations. In other words, both information and location are the two key elements of spatial data. On some occasions, spatial data might only contain information about the location of where something is, but often we're talking about data that has the location plus other information as well. This could be something as simple as the location of a hospital along with the hospital name, the hospital address, and any other information about the hospital. So the key here is that spatial data must contain location data, and we also need to enable that location data in the R ecosystem, which we'll see how to do shortly. Next slide, please.
When we talk about spatial data, we're generally talking about two types: vector data and raster data. We're going to be primarily working with vector data in the workshop today, so I'm not going to spend time discussing raster data; you often see it as satellite data, but know that you can definitely work with raster spatial data in R specifically. There are lots of resources on how to work with raster data in R as well. But back to vector data: vector data represents the world's surface using three main models, so points, lines, and polygons (or areas, as it's described here). For example, a group of individual hospitals or clinics might be represented as point locations on a map, whereas the zip code areas in which the hospitals are located, the boundaries of which are drawn here, would be represented as areas. We can go to the next slide, please. So there are a number of data formats that we can work with when we're talking about spatial data, and specifically spatial data in R. Today we're going to be looking at a few non-spatial and spatial data formats. With CSVs, know that a CSV on its own is not a spatial data format, but it can store spatial data if it has columns with latitude and longitude that represent coordinate locations. Those columns on their own are not spatial, but when they're enabled in the R spatial ecosystem, then they represent actual points in space. Again, we'll see that in action momentarily. Next slide. Two other key spatial data formats that we'll also work with today are shapefiles and GeoJSON files. Shapefiles are a data format that was actually first developed by Esri. They are made up of at least four extension files; you see them listed here. The .shp is the file that you'll generally be working with.
And then GeoJSON files, excuse me, are also a standard format for encoding a variety of geographic and spatial data structures. That's another common format that you'll see a lot when working with spatial data, and it's generally very easy to use. And just know that other spatial data formats include KML and GeoPackage; there's actually a whole long list of other formats. So in R, one of the most commonly used spatial libraries is sf. The sf library uses a few key data structures. There's the data frame, which is a table or two-dimensional array structure; the tibble, which is the tidyverse versioning or reimagining of the data frame; and then the sf structure, which refers to simple features, a formal standard that describes how objects in the real world can be represented, with an emphasis on the spatial geometry of those objects. So, again, simple features is terminology that exists outside of the R spatial ecosystem, but we also refer to the sf library and sf data structure within this world. Another key concept that we wanted to introduce before diving into the coding is the really foundational concept of coordinate reference systems, or CRS. This is one of the key elements that you need to know up front when you're starting to work with spatial data, because it can really trip you up, especially when you get to generating maps or any sort of calculations between points in space. A coordinate reference system, or CRS, communicates what method should be used to flatten or project the Earth's surface onto a two-dimensional map, which is of course what we're looking at when we're creating maps on our devices. Different CRSs apply different ways of projecting information and can generate substantially different visualizations.
So if we go to the next slide, oh, there we go. Okay, so we'll see a few different examples of projections, of different ways of looking at the world. These projections can really alter the way that we foundationally understand, in this context, the world: how big different continents and countries appear to be, how that measures against reality, and how it changes across different projections, affecting both our numerical and quantitative estimates as well as our perceptions. If anyone's seen that West Wing episode about map projections, where C.J.'s mind is blown by the different projections, I think that really drives home what a difference this concept can make in how we visualize spatial data. We'll be working with some different CRSs today during the coding, but we wanted to highlight this here. CRSs can be referred to using what's called an SRID, which stands for Spatial Reference System Identifier. There are different types of SRIDs, but one of the most commonly used systems is EPSG codes. The EPSG database is one of the most comprehensive databases, storing thousands of different projections that are used across industries and in different contexts, particularly in the GIS (geographic information science) context. EPSG:4326 is one of the most commonly used projections today. We'll use that one as sort of the foundational projection in our code. Oftentimes your data will start as 4326, but then you'll transform it to a different CRS that might be better adapted to the local area or region that you're studying or mapping. We'll dive into that more in a minute.
Also, in the sf package, you can use the function st_crs to check the CRS used in the data, or st_transform to reproject the data to a different CRS: if you're missing a CRS, to add one, or to change from 4326 to a different EPSG code, for example. Next slide. All right, so I'm going to hand it over to Marynia. Hopefully that provided some good context as we dive more into how to apply these concepts in R. Excellent, so just bear with me while I shift a few things around here and get ready for this. I'm going to share my whole screen, and we'll see how this works. All right. So, okay, I've got to move the Zoom thing. Okay. First things first: again, the code for what we'll be going through is available here. I believe that's been added to the chat a couple of different times. That will be important, because it turns out that with some of these spatial packages, slightly different versions will have slightly different outcomes. So I'm showing it to you here as the goal, and if one part doesn't work, just bookmark that and come back to it later. Okay, so I just wanted to say that starting out. One of the issues we're going to hit right away is related to CRSs: there are thousands and thousands of different coordinate reference systems out there, and there are entire fields of geophysics dedicated to this sort of work. So it turns out that with the RStudio Cloud version, let's see, this is the link that was shared, so the R Markdown files and data folder that were current as of this morning are here, as well as a couple of different core packages that you need. But long story short, we'll find that there may be some pieces that won't render in completely the same way, and I think that's interesting, and we can talk about it when it comes up. But first let me go to the background section here.
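To recap the concepts from the slides, the CSV-to-spatial and CRS functions just described might look like this in R (the file name and coordinate column names below are hypothetical, chosen only for illustration; this assumes the sf package is installed):

```r
library(sf)

# A plain CSV with coordinate columns is not yet spatial;
# st_as_sf() promotes it to an sf object (hypothetical file and columns)
hospitals    <- read.csv("hospitals.csv")
hospitals_sf <- st_as_sf(hospitals,
                         coords = c("longitude", "latitude"),
                         crs = 4326)   # WGS 84, a common starting CRS

# Check which CRS the data carries
st_crs(hospitals_sf)

# Reproject to a local projected CRS, e.g. EPSG:3435 (Illinois East, US feet)
hospitals_ft <- st_transform(hospitals_sf, 3435)
```

The same two calls, st_crs() to inspect and st_transform() to reproject, work on any sf object, whether points or polygons.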
There are three core libraries that we're going to be working with today: sf, tmap, and tidygeocoder. Those three packages are going to deliver the core goals of everything we're doing. However, I understand and respect that everyone has an R system or ecosystem that works for them. So if you're a tidyverse user, I leave a couple of breadcrumbs throughout, like, this is where this can be further optimized with the tidyverse, see what you can do from there, and so on and so forth. Just keep that in mind. That being said, in this project environment, again, the four R Markdown files as of 9 a.m. this morning are included, as well as the data folder for the data that you need. To work with this you can either, I believe, start typing right away, or save a permanent copy and go from there. I'm using RStudio on my own system, partly because if you're a Zoom host, it's already a lot of bandwidth and I don't want to blow this whole thing up. But it's going to be the same code, and we're working with the same workbook. Okay, let's start with the intro to spatial data part. So far, what Susan went through was: what is spatial data, what is the sf framework within that, that sort of thing. And I should note here in the workbook: normally, in my own day-to-day practice, I'll load all the libraries I need at the very top. However, with this workshop I'm actually loading libraries when we start to need them, just so that you know which library you're working with at any given time. I find that's really helpful for troubleshooting spatial workflows, so that's why I'm doing it that way. All right. I will also note that a lot of the data we're working with is real data from, in this case, the City of Chicago data portal.
A lot of times, the biggest challenge to working with this stuff is knowing: where do I get the data, right? So we're going to be working with COVID data that was downloaded directly, in raw format, from the Chicago data portal. For example, if we want zip code boundaries, those are also available there, and so forth. Large cities will have data portals, while smaller cities will not, so we'll also give you a few hints on where to get data throughout the rest of the workshop. We have another toolkit that gives you a nice census data wrangling option, and then we'll also introduce some massive data repositories that have a lot of different clean data that's customized to a public health and clinical health audience. Okay. All right, that being said, we're going to first load things in here; I'm going to switch over to my RStudio environment. And in this environment, let's clean things up a little bit. I'm going to pretend that we're starting from scratch. So, first things first: we want to load the library sf, which I've already done, but just so that you believe me. We're going to use the st_read function in sf to actually read the shapefile. Here it's really important that whenever you work with a shapefile, all of the extension files are in the folder you're reading from. I've seen some students, and when I say student, it could be a surgeon who's just learning spatial analysis or it could be an undergrad, right? There's a big, big range there. Sometimes you think that you just want to work with the .shp file, but the .dbf extension is where all the attribute data is held, and the .prj extension is where the actual coordinate reference system is held. So it turns out to be pretty important to include all of that. Okay, so I'm going to read that. And when it is read, sf gives me a nice overview of what was just loaded.
We were using an Esri shapefile. It's a polygon type, and that's important to confirm, that it is what we expect. You know, we were loading zip codes, so if they were points, that would be kind of odd. Fortunately they are areas, which is what we expected. Okay. So, as we load something in, and I should note here too, this is an Esri shapefile. How many of you have heard of ArcGIS or Esri? You can use the raise hand function. Okay. Well, that's been the predominant, proprietary way to do GIS for a really, really long time. And it turns out they have a proprietary data format as well, which is the Esri shapefile. They're kind of like, I mean, Microsoft is different now, but they were kind of like the Microsoft of the 90s. Or, to compare to systems today, another analogy might be Oracle: they're the Oracle of GIS, with massive contracts throughout the country. I worked with ArcGIS for a decade before I started to switch to open-source systems, but they're still really visible and predominant in the GIS workspace. So this is one of the ways that we'll start to see them, right, through this spatial format. If you work with GIS analysts in your place of work or through collaborators, you might want to have the data as a shapefile, so that's another way you might encounter it. Okay, but as soon as you load something in, especially when working with spatial analysis, you want to look at it as soon as possible, just to make sure that it is what you think it is. So we see that we have zip code information. There's also some community areas, census tracts, that sort of thing. It's not a super clean dataset, which is pretty common for data portals. But we also have this geometry column here: polygons and a bunch of information.
That's actually the sf data frame working in action. Okay. So, just to look at your data: if you use the tidyverse you might use the function glimpse; I like to use the function head just to look at the top six rows. But we also want to look at the spatial data, so we can use the base R function plot just to look at this. And I might have to move this up. Again, with the base R plot function, the way that it's connected, it gets rerouted essentially with sf, so that every single attribute will be plotted. Okay. I think we had a question, feel free to ask. Oh, Irene, that might have been left over from before. Yeah, all good. Okay, and yeah, feel free to just ask questions in the chat too. So this is a really nice feature of sf; the older framework, sp, doesn't do this, and I just wanted to note that. Here sf is repurposing that plot function from base R to give us a look, and we can see that there's a lot of different data here, not super useful to us, but we're going to work with it. So let's look over here, actually, I'll come to this side, because some of these aren't necessarily worth live coding immediately. Let's look at the structure of the Chicago tracts. I can see that it's both a data frame and an sf object. Okay, so keep that in mind, because that kind of information might be useful when you're troubleshooting. In the geometric object, the geometry type is sfc_POLYGON. If we want to look at the coordinate reference system of the tracts, we can, just by using this function st_crs, and this can be really useful. It tells us that the coordinate reference system is 4326, again the most common one, which Susan already mentioned, and it gives you a little bit of information.
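The load-and-inspect routine just walked through can be sketched as follows (the file path and object name are hypothetical stand-ins for the workshop's data):

```r
library(sf)

# Read the shapefile; the .shp, .dbf, .shx, and .prj siblings must all
# sit in the same folder as the .shp you point at (hypothetical path)
chi_zips <- st_read("data/chicago_zips.shp")

head(chi_zips)    # first six rows, including the geometry column
class(chi_zips)   # both "sf" and "data.frame"
str(chi_zips)     # structure, including the sfc geometry type
plot(chi_zips)    # sf's plot method draws a small map per attribute
st_crs(chi_zips)  # the coordinate reference system, e.g. EPSG:4326
```

Looking at the data both as a table (head) and as a map (plot) right after loading is the habit being demonstrated here: confirm it is what you think it is before doing anything else.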
The most important thing here is this unit field right here. What is the unit? How do we measure distance in this coordinate reference system? Well, we measure it in degrees. When we talk about, say, how far away the nearest hospital is, we don't say it's 10 degrees away. Because the earth is curved, speaking in degrees at small scales doesn't make a lot of sense. Maybe if you're flying somewhere and you're a super nerd, you might talk about distance in degrees, but that's not really common anywhere else. So that's a pretty important part of knowing what coordinate reference system the data is in, because if we want to do any type of analysis later where distance needs to be in feet or meters, we're going to have to change it, right? At this stage, we are shifting to the exploration of coordinate reference systems. And here, some of you may see an error when you're coding along in our cloud instance; I'll explain that in just a second. Okay, so first we're going to try a few. These are not coordinate reference systems you'd know off the top of your head; this is just me showing you a couple of different options, to see how using a different coordinate system can really change your place. First we'll try the Mollweide coordinate reference system, which does a pretty good job preserving area across the globe. That might be important because if you are looking at a global phenomenon, beyond a very small area, preserving area could be pretty important. We use the st_transform function: you essentially just pass the spatial object and then you type in the coordinate reference system that you want to project to. So in my version, and again, I've spent a lot of time trying to figure this out and I can't figure out why, when I run this I get a plot that looks like this: we see this kind of sideways-looking City of Chicago.
This is not the City of Chicago that we're familiar with. However, this is more true to the actual view of the City of Chicago on the actual planet, right? So there's this disjoint between how we think of things and how things actually are, and this can be observed, so on and so forth. However, if you're running this on RStudio Cloud, you might be getting an error. What does the error say? I don't know if anyone's gotten it, or maybe no one's live coding, I'm not sure. Well, when I was trying this on our RStudio Cloud instance, I got an error saying that the CRS is not found, and I thought, what? That shouldn't be right. As mentioned before, there are thousands of different coordinate reference systems, and different utilities store all that information somewhere on your system. For some reason, I'm not sure why, it seems like, at least for some, again, if I'm the only person who experienced it on the cloud, then that's not that bad; maybe that's even more odd. But if you're getting an error here, it might just mean that that specific coordinate reference system, or that library of coordinate reference systems, may not be available in the RStudio Cloud setup. Troubleshooting that is going to take more time than we have today, but just keep that in mind. If you go through the standard way of loading things on your own system, this probably won't come up. It might just be a storage issue or something like that; I'm not quite sure. Okay, so we've got that. So we have this lovely map. Next let's try the Winkel coordinate reference system. This is another compromise projection: it's trying to minimize distortion for area, but it's also trying to minimize distortion for distance and angles. There are three things that you always have to think about: area, distance, and angles. We're going to use the same approach, recycling the code, but with new inputs.
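The recycled transform calls might look like this sketch (the object name chi_tracts is a stand-in for the workshop's tracts data; 54009 is Mollweide and 54019 is Winkel II, which are Esri-defined codes, so depending on your PROJ installation you may need the "ESRI:" prefix, and a missing CRS library is exactly the "CRS not found" error just described):

```r
library(sf)

# Same function, different target projections
tracts_moll <- st_transform(chi_tracts, crs = "ESRI:54009")  # Mollweide, area-preserving
tracts_wink <- st_transform(chi_tracts, "ESRI:54019")        # Winkel II, compromise

# Compare the shapes side by side with base R
plot(st_geometry(tracts_moll), main = "Mollweide (54009)")
plot(st_geometry(tracts_wink), main = "Winkel II (54019)")
```

Note that st_transform accepts the CRS either as a named crs = argument or as the bare second argument, which comes up again in a moment.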
So here, instead of having crs =, another way of inputting it is just typing in the actual EPSG number. I find that to be a little bit easier sometimes. For my version, this is going to output correctly; for your version, maybe not, and if it doesn't, it's because the CRS library isn't present. But fortunately, you're probably not going to use these very often, so I wouldn't worry about it. Here we can see that the map is a little bit less smushed: it's minimizing distortion, but in that minimizing of distortion, it might mean that areas are not preserved as well as in the one we saw before. We can also try a completely different type of projection. A lot of the time, when you're searching for EPSG projections, it's a Google search. So I literally searched "Hawaii EPSG," and a bunch of different projection options showed up from the EPSG website, like the first thing that popped up in the search. I chose Old Hawaiian UTM Zone 4N, and let's see what that looks like. It's clearly the wrong projection, right? We know that Chicago is not bent like this. So again, I just really wanted to emphasize that your coordinate reference system can impact your findings. Furthermore, in at least half of the errors I've seen from beginners, or intermediate, sometimes even advanced users of R spatial, the issue is in the coordinate reference system itself. I just had a grad student who was doing a really complex point pattern analysis and probability surface work with spatial analysis, and he was struggling with an error for over a week. It was a coordinate reference system issue. Okay, so now let's go to something that we feel a little bit better about: an EPSG that is going to preserve distance in units we're familiar with, and that is really focused on our area. Because we're looking at Chicago, I literally Googled "Illinois feet EPSG." Right.
And one of the first things that popped up was EPSG:3435. I know that doesn't sound very scientific, but it turns out that EPSG:3435 is the standard projection used for mapping and survey use in the Chicago area. You have to think about the Google search algorithm: who are the other people who might have typed in the same words? Well, other people who know what EPSG codes are, which is likely a pretty small, again, GIS-nerd group. So Google searching is really your friend when it comes to R spatial stuff. This is more how we're familiar with the city. Here distance is in feet, which we can easily convert to miles for US-based work. And it's maybe not preserving angles and areas as well as the previous maps, but for our audience, this is going to make a lot more sense. Okay, so now we're going to switch gears, and I should also note, I've been extending the use of base R plots in these maps. I usually don't ever use base R for plotting, but I just wanted to show you that you can if you want. You're going to be plotting just the geometry value of the object; you can adjust the border, the thickness of the borders, give it a title and subtitle, and so on and so forth. But what I really like to work with, especially for rapid exploratory spatial data analysis, is a package called tmap. With tmap, there's a lot of customization available. We're going to approach mapping one layer at a time, essentially. So first we're going to load in tmap. And actually, having it rendered on this side might be a little bit easier, since sometimes the plots can get a little bit buggy over here. Okay. So we have loaded tmap. When we're working with tmap, it's always going to require two function calls per layer. First you have to call the shape that you want to map, so in this case, the Chicago tracts: that's what we want to map.
But then we have to add some styling parameter. It has to have something: even if you take out the parameter specification and just have tm_borders() with empty parentheses, that's fine. But here we're going to make the borders a little bit transparent. And this is your standard tmap. Okay. What's really nice with tmap is that you can start to add a lot of other things; you can really customize it a lot. You can add a specific color for the borders, but then you can color the fill, the inside of the polygons, a different color. We can add a scale bar; we can see that down here. And then, if we didn't like that frame around it, we can just add a tm_layout(frame = FALSE). I should note here, I did not have any of this memorized, and when you're working with tmap and things like this, it's going to be pretty rare that you remember exactly how to put the parameter input in the right place. So it's highly recommended that you find suites of existing tmap tutorials, the documentation, all that stuff, just to look at different options. It's really similar to geology work, where the more outcrops you see, the better geologist you get. Here, the more coding snippets you see, the better you get at coding, because you just see more of the different types of options available to you. Okay. So, I really like this specific link that gives you, it's kind of like the full-on recipe book of all the different function parameters. You'll see some nuances, like tm_fill versus tm_polygons: they seem to do a similar thing, but there are some nuances between them. This is part of the journey; it's meant to be fun to explore those different options. And then another thing you can do is arrange multiple maps. Sometimes we want to look at multiple maps at once.
This is what we call a map panel. It turns out there are many ways of doing this — again, the wonderful world of tmap; you can use facet maps, lots of different options. We're going to use one version where we take the map calls, assign them to variables, and then pass those variables to tmap_arrange. So here I'm going to take my tmap of EPSG:4326, which is WGS84, the standard lat/long coordinate system you'll see in GPS data and most web mapping. Then we're also going to look at a very different map — I think 54019 was the Winkel projection. So we assign one to tracts_4326 and the other to tracts_54019, and then we arrange them as such. Here we're creating a different mood: a light gray background, and just the name of the projection code on each map. I'm intentionally doing that because the focus here is not on the tracts themselves; I'm actually using a hack to erase the boundaries and just show the fill, so it looks like I just have the Chicago boundary filled in. I wanted to gently show those two projections side by side to show the difference. So tmap_arrange is pretty nice. At this point I'm going to switch back to this side and move this up a little. Everything we've worked with so far is the static mode of tmap — all the maps we've been seeing are plain, static maps. Another option is interactive maps, and for that we need to switch to a different mode: view. There are only two modes, plot mode and view mode, so this is an easy thing you can memorize after a little bit of time.
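The side-by-side projection panel can be sketched like this; object names and the file path are illustrative.

```r
# Two projections of the same tracts, arranged as a panel with tmap_arrange().
library(sf)
library(tmap)

tracts <- st_read("chi_tracts.shp")  # placeholder path

map_4326 <- tm_shape(st_transform(tracts, 4326)) +
  tm_fill(col = "gray60") +
  tm_layout(title = "EPSG:4326", frame = FALSE)

map_54019 <- tm_shape(st_transform(tracts, "ESRI:54019")) +
  tm_fill(col = "gray60") +
  tm_layout(title = "ESRI:54019", frame = FALSE)

tmap_arrange(map_4326, map_54019)  # render the two maps side by side
```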
So here we're going to take the exact same map we just did — or at least the one from a couple of steps up — and view it, so let's see what happens. Okay, you get a couple of warning messages, which is totally fine. The nice part about working within RStudio is that you can click Zoom right there in your plot window and really see the whole thing. It's taking a second for the tiles to render, probably because I'm also hosting a Zoom call, but as we zoom in we can see that the city of Chicago is being appropriately placed on the Great Lakes. I really like using the interactive mode not just to make cool maps but also to double-check that things are plotting where they're supposed to be. The nice thing here is that we also have the option to try a different basemap. One thing that's maybe not great with this version is that the tract fills are not very transparent, so we can't see the basemap very well — but that's something we can change for sure. Another nice thing is that you can click on each tract and it gives you all its attribute information; clean up the dataset a little, and you're essentially creating your own web map. As you zoom in and out, the scale bar changes too, and you can export this either as an image or save it as a web page. I've made web maps — super rudimentary web maps — this way, saving the HTML page and hosting it on GitHub, so that's definitely an option. All right, in this next chunk — I noted that the fill was maybe not as transparent as it could be, so let's change that a little.
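Switching to interactive view mode and saving the result as a web page, as described above, looks roughly like this; paths and names are illustrative.

```r
# Interactive mode: same layering syntax, rendered as a zoomable web map.
library(sf)
library(tmap)

tracts <- st_read("chi_tracts.shp")  # placeholder path

tmap_mode("view")                    # switch from "plot" (static) to "view"
m <- tm_shape(tracts) +
  tm_polygons(alpha = 0.5)           # semi-transparent so the basemap shows through

m                                    # renders on an interactive basemap
tmap_save(m, "chicago_tracts.html")  # save as a standalone web page
tmap_mode("plot")                    # switch back to static maps
```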
Sometimes I forget which direction the transparency goes, so I'll start with 0.5 and adjust up or down accordingly. We can render that and it does the same thing again. Our last step here is going to be overlaying something on top of this. This is nice so far — we're just plotting a map, so it probably doesn't seem that exciting, but it's going to be the basis of a lot of the work ahead. From here on out we're going to work a lot with zip codes, but I wanted you to get familiar with census tracts, because that is the ideal scale for a lot of analysis in public health. Census tracts tend to roughly approximate neighborhoods where better delineations may not be available. They're much smaller than zip codes, so a lot more variation is possible, and that's really ideal. Sometimes you might be lucky and work with an IRB that lets you preserve tract-level information, but usually that takes a lot of advocating on behalf of the person doing the analysis. All right, so we're going to read in the zip code data. These zip codes come directly from the City of Chicago data portal. They may technically be zip code tabulation areas, but that's a can of worms I won't open right now. Okay, let's look at this; I'm going to pull it out here so we can see it. When we're overlaying things, we're literally just laying layers on top of each other. We started with the census tract layer, we keep some of the cartographic styling, and now we add another layer: I take my census tracts and then add zip codes on top of them, coloring the new zip code boundaries with a slightly thicker border and a different color so it's easier to read. Let me first switch back to plot mode.
So we take the first layer and add the second layer on top, meaning it comes after that first tm_shape. Then I like to add the cartographic styling at the very end, just to keep things simple. I should note that we're going to come back to this same tmap layering pattern a couple of times throughout today's workshop. Here we can see — it's about 10 kilometers from there to there — that there are a lot of census tracts within each zip code. Again, I really wanted to highlight that although we're often stuck working with zip codes, they're probably not the best system, because there's so much variation occurring below that level. Also note that there will be different standards for each type of geographic boundary. For example, here the census tracts extend a little beyond the shoreline of Chicago, whereas the zip code boundary seems to be really nicely cropped. There are ways to deal with this using GIS operations in R that really aren't that bad; for our purposes it's not a huge issue, but I wanted to show you that just because you have a spatial dataset doesn't mean it's 100% accurate in the way you're expecting. So that's essentially it for now; I include a few more resources for those of you who want to dive in a little deeper. I love Geocomputation with R, the geocomputation text by Lovelace et al., hosted on Robin Lovelace's GitHub — that's a really nice intro if you want to learn more about spatial databases. The GeoDa Center's R toolkit also has a pretty nice introduction covering similar ground to what we just talked about, if you need a refresher.
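The overlay pattern just described — tracts first, zip codes on top, styling at the end — can be sketched as follows; paths are illustrative.

```r
# Overlay: each tm_shape() starts a new layer; later layers draw on top.
library(sf)
library(tmap)

tracts <- st_read("chi_tracts.shp")  # placeholder paths
zips   <- st_read("chi_zips.shp")

tmap_mode("plot")

tm_shape(tracts) +
  tm_borders(col = "gray70") +             # first layer: tract outlines
tm_shape(zips) +
  tm_borders(col = "darkred", lwd = 2) +   # second layer: thicker zip outlines
  tm_scale_bar(position = c("left", "bottom"))
```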
For projections I include a couple of great tutorials, and for tmap I give you a couple more. Okay, at this point we're going to go on to our next stage, and I'll hand it back to Susan. — Do we want to take a break, Marynia, or keep going? Any votes from participants? In the workshop message to attendees we did say we'd take a five-minute break every hour, and we're coming up on 11, so why don't we do a five-minute break now. Let's come back in five minutes and get started then. Sounds great. All right, see you all soon; I'm going to pause. — Okay, welcome back. Get comfortable as we prepare to start. All right, I'm going to switch over — and I should say, I can't see the chat at all while presenting, but there have been a lot of really great questions. A lot of questions will come up in this material, and it takes a while to get familiar with the answers, so I really recommend learning the art of Google searching — I actually teach this in my classes. Getting the right search terms is tough, but I've learned that any time someone tells you "this is the better way to do it," I can find three other cases where it might not be. So learn to be flexible: maybe one part will work and another won't, but keep going and see if the other parts will. We'll try to get to as many questions as we can, but definitely also use the phenomenal R spatial community out there. We adapt, and we extend. Okay, so let's kick things off. I'll come back to that in a second. I'll also really quickly note: in the background section of the workbook, if you go to the GitHub page for sf —
— if you do decide to install locally, there are a lot of really great tips there. So for those of you trying to do this on your own machine, feel free to go to the sf GitHub page and work through the common troubleshooting steps for installing it. Installing sf is, nine times out of ten, the hardest part; if there are hiccups, that's probably where they'll be. But for this workshop you're probably working in RStudio Cloud, so that's not an issue. Okay, let's kick things off from there. Thank you, Marynia. So in this next section, we're going to build off what we did in the intro, where we got introduced to some of the basics of spatial data and what we mean when we talk about spatial analysis in R. We created a couple of maps using different projections, to see what a different projection and CRS can generate, and we started building this map of the city of Chicago that we're going to keep building on. But the map we created was really simple, right? It was just in gray; we were really just looking at the borders of, first, the census tracts, and then we overlaid the zip code boundaries. And we saw that one map that was just the gray outline, or the gray polygon, of the city of Chicago. So now we're going to look at how to build off those maps and create maps that thematically map places. Our lab is really interested in neighborhoods as the context of where people actually live, so we look a lot at neighborhood-level data, and that's what we'll look at in this section. The main concept this introduces is thematic mapping, which is also referred to as choropleth mapping. In the GIS and spatial world you see "choropleth" very often, but once you get out of that world the word often throws people off, so "thematic mapping" is just another way of describing the same thing.
Thematic or choropleth maps are maps we're all familiar with, whether or not we use them in work in the medical or health field; we also see them all the time in politics, for example. They're used everywhere to represent quantitative data through colors, patterns, or shading across different geographic areas. The map presented here is an example of a choropleth map; it's a snapshot of the southeastern United States, and the units you see are US counties. I forget exactly what rate it's representing, but that's somewhat irrelevant here: it's showing data classified with an equal-interval classification in five bins. What that refers to is the different ways you can classify and thematically map data: there's the data classification method, and then there's the number of classes, or bins, you use within that method. On the next slide, we'll highlight a few of the many options for classifying data in mapping and visualization contexts, and we're going to generate each of these maps in the tutorial ahead. The different classification methods arrange your data in different ways, using different boundaries to separate the classes. For example, in a quantile map — again, we use the words "class" and "bin" kind of interchangeably — the data is grouped into classes that each have the same number of observations. If you're binning data into three groups, that's technically a tertile map; four classes is a quartile map, five classes a quintile map, and so on. As you can see here, this is a map of the city of Chicago using COVID case rate data.
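The quantile idea — equal counts per class — is easy to see with plain R; the data values below are made up for illustration.

```r
# Quantile classification: each class gets (roughly) the same number of observations.
rates <- c(120, 200, 340, 355, 410, 480, 505, 610, 700, 950)

# Break points for a quintile (5-class) scheme
breaks <- quantile(rates, probs = seq(0, 1, length.out = 6))

# Assign each observation to a class
classes <- cut(rates, breaks = breaks, include.lowest = TRUE)
table(classes)  # 10 observations split into 5 classes of 2 each
```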
We'll be looking at similar data in the tutorial coming up, but this is a quintile map: the data grouped into five bins. Here you can see the same dataset, but the map looks slightly different — this one is a natural breaks map. Natural breaks is also sometimes referred to as a Jenks map, after George Jenks, the cartographer who developed this classification method. In a natural breaks map, the data is grouped so that within-group homogeneity is maximized. This can mean that, in contrast to the quantile map, the number of observations in each class may be quite different. For example, the darkest color — that purple or maroon on the west side of the city of Chicago — has the highest COVID case rate at this snapshot in time, and it looks like only about four zip codes are in that category; but in one of the lighter blues there are ten-plus zip codes. So it's not about the number of observations in each class; it's about maximizing the within-group homogeneity of the data as you classify it, hence the term "natural breaks." And the last method we'll look at today is the standard deviation map. Again, this is just a different way of visualizing your data; standard deviation in particular is very useful for identifying outliers — the same concept as in classic statistics. In a standard deviation map, the variable of interest is transformed into standard deviational units, covering the range from lowest to highest, showing which zip codes, in this case, are two standard deviations above the mean, or one above, or below the mean, and so on.
So we'll look at this data. It doesn't look that different here because we're just looking at the city of Chicago, but this can often yield a radically different map versus the quantile or natural breaks map. On the next slide, I just wanted to highlight that beyond the three maps we'll generate today, there are several other commonly used classification methods you might want to know about or use in your work. There's equal intervals, which is pretty self-explanatory: classifying the data into equal-width intervals. Again, there may be a different number of observations within each interval, but the data might be presented more cleanly that way. There are manual intervals, which you set and predetermine to show the data in a way that might be more appropriate. And there's also the box map, based on the box plot, which is another way of showing the data, somewhat like a quantile map: it identifies the interquartile range of the data, and then above and below that range it identifies the upper and lower outliers. So again, these are all just different maps, and here we see graphs visualizing the data under those different classification methods. Next slide, please. So here are the three city of Chicago maps: quantile, natural breaks, and standard deviation. Choosing the best one for your data is definitely not a precise or exact science. It often takes exploration, some trial and error, and looking at your data in different ways. The best choice is really going to depend on your data and on the questions you're trying to investigate and the findings you're trying to convey. Another factor might be the audience you're sharing this information with.
That's why we really like, as Marynia mentioned before, the process of exploratory spatial data analysis, in which we look at maps at the very beginning. We look at the data and maps in different ways at the start of the research process, and then at the end as well, as an investigative method, making sure we consider the data from different angles before choosing which representation is most appropriate. So with that, hopefully that's some helpful context as we dive in a little more. I think that's it for me; I'll pass it back over to Marynia to dive into the code. — I just want to note here that oftentimes I find the health community goes to quantile maps immediately. But that's often not the most correct choice — if there is such a thing as "correct" — for most of the data that's out there, and there are a lot of limitations that come with it. So that's another really important part of what we're doing here. Okay, so we're going to go back to sharing the screen. For those of you who didn't hear me earlier: if you have more specific questions about sf, check out their GitHub; that will be really useful for a lot of different pieces. But before I get into the details, I also wanted to talk about — extending what Susan already covered — why we would be mapping neighborhoods in the first place. This workbook follows a kind of thought experiment, a case study you might do yourself. We're interested in the health of people, but that necessitates some thinking about the neighborhood environment, right?
When you're looking at a neighborhood, that could mean population-level health outcomes at the neighborhood level — premature mortality at the census tract scale, say, or cumulative COVID rates by zip code. We're going to use cumulative COVID rates for a week in September as our example. Sometimes we're interested in what may be associated with that outcome — what might be explaining or driving the disparities. Those could be neighborhood factors like poverty, access to affordable housing, distance to the nearest health provider, or distance to polluting facilities. These are the quote-unquote social determinants of health at the neighborhood scale, which are increasingly urgent in modern public health thinking and are thought to drive, reinforce, or magnify racial, social, and spatial inequities. So oftentimes when we look at the neighborhood-level environment, we're essentially looking for those inequities. If there is no inequity, you'll have a spatially random map — no obvious patterns — but oftentimes we see geographic correlation between these different factors, and that leads us to more sophisticated thinking, hypothesis generation, and so on. So again, we start with choropleth mapping just to get a grounding in what we need to do. I'll go back to my environment, and we're going to switch to our choropleth map. In this case we're going to be pulling in CSV data. This came straight from the data portal, and as you might expect for data portal data, it's not quite in the format we need for our analysis. We have a zip code, a week number, week start, week end — a lot of different fields. Notice how super long the column names are. That's usually a no-no in spatial analysis, because we come from the days when data storage was limited.
For example, the shapefile format requires, I think, attribute names of no more than 10 characters. But again, a lot of this kind of epi work will be done outside of that. You'll also find this dataset has a zip code location field — but if you look at it, you shouldn't trust it. This is zip code-level data being indicated as a point, and look: the points are identical. It could be the same zip code, but zip codes are not represented as points; they're represented as polygons. I'll give you a hint: it's probably the centroid of the zip code. But we're not interested in zip code centroids; we're interested in mapping actual neighborhoods. So again, be cautious and skeptical whenever you're working with data. This data is in what's known as long format, where every single row corresponds to some health outcome at a different time period, which means the same zip code is repeated multiple times. That's useful for some types of analysis, but it doesn't work for ours; spatial analysis generally doesn't make sense with long-format data. We need to shift to wide format for a lot of work — not all, but a lot. That means each zip code will be a separate row, and if you want to retain the temporal data, it would be represented as different columns. So if you do want to do spatiotemporal analysis, in some GIS systems the data would need to be converted that way. For our purposes, we're just trying to get familiar with things, so our goal is to extract one week of data and use that subset for our work. First: how many weeks are in the data? Let's see — actually, let's use the week number column; I think the other might be an error. It seems there are 40 weeks in this data.
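Inspecting the long-format CSV and counting the weeks, as described above, might look like this; the file and column names are illustrative stand-ins for the portal's much longer ones.

```r
# Read the long-format COVID CSV and see what we're working with.
covid <- read.csv("covid_by_zip.csv")  # placeholder path

head(covid)                            # one row per zip code per week (long format)
length(unique(covid$week_number))      # how many distinct weeks are in the data?
```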
Again, if you get a slightly different result there, it's not a huge deal; we're just trying to understand the total number of weeks to get an idea of what we're working with. Our goal is to subset and inspect one week, so I've decided on week 39, which starts September 20 and runs through September 26. We're going to be interested in the cumulative case rate. That can be a really useful measure when looking at spatiotemporal trends for COVID specifically — the "because" is a long story and we can chat about it at the very end. For example, Susan and I helped lead the US COVID Atlas, where we've been doing spatiotemporal mapping of COVID since March 2020, and looking at cumulative rates gives you some idea of what's happening, especially by this point in September. So we're taking it as it is; we're not claiming it's anything more than that. We're going to just subset and inspect. Those of you who are tidyverse users may have audibly gasped at how I'm subsetting. I promised I would present things in a very basic way, but if you want to show off your tidy skills, this is a great place to do it: you could reshape the whole dataset, or subset it a different way. There are a million ways to work with the attribute piece; that's totally up to you. At this point we have data for zip codes — let's check the dimensions of our dataset. We have 60 zip codes. Okay, let's keep going and clean up our data a little. For our purposes, let's say we just want to keep the zip code ID and the variable of interest.
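The base-R subset and cleanup described here can be sketched as follows; the column names are illustrative, not the portal's exact (very long) names.

```r
# Subset one week of data and keep just the ID plus the variable of interest.
covid <- read.csv("covid_by_zip.csv")  # placeholder path

week39 <- covid[covid$week_number == 39, ]  # base-R subset: one week of data
dim(week39)                                 # expect one row per zip code

# Keep only the key and the outcome, and give the outcome a shorter name
covid_sub <- week39[, c("zip_code", "case_rate_cumulative")]
names(covid_sub) <- c("zip_code", "case_rate")
```

Tidyverse users could do the same with `filter()`, `select()`, and `rename()`; either way works.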
Again, the tidyverse folks might be audibly gasping — the name is so long you'd want to rename it, or you could have piped this all together up above, and that's totally fine. I'm just trying to show you the very basics so this stays flexible; I want to respect each of your own ecosystems and give you the chance to adapt as needed. So here we have a slightly cleaned-up dataset: the zip code is listed as a factor, and the case rate is a double. All right, at this point we're ready to merge. We're going to merge this dataset with our master shapefile at the zip code level, so if you need to, reload your zip codes — if you're working in one straight session you obviously don't need to, but it's something to consider. Before we join, we have to make sure we know what we're joining on; this is the trickiest part of merges. On one hand, the COVID data has its key written as zip code with a dot, and on the other, the master shapefile we're merging to has "zip" in lower case. So here we're using merge. I've found that merge can be buggy at times, but that's a discussion for another day. Here we have our spatial file — remember, we always want to merge data to the spatial file. The spatial file is our master file; that's what we merge everything to. Later on we can write that master file out to a CSV if we just want the attribute information, but always, always start with your spatial file and merge non-spatial data onto it. So we take our spatial file and our COVID subset, where the zip file has the key of lowercase "zip" and the COVID data has its own key. The alternative up here that I X-ed out — I would like to do it that way, but it creates issues later on.
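The merge step just described might look like this; the key column names follow the pattern above ("zip" in the shapefile, a zip code field in the CSV) but are illustrative.

```r
# Merge the non-spatial COVID subset onto the zip code polygons.
library(sf)

zips      <- st_read("chi_zips.shp")       # placeholder path (spatial master file)
covid_sub <- read.csv("covid_week39.csv")  # placeholder path (one week of data)

# Always start from the spatial object so the result stays an sf data frame
zips_covid <- merge(zips, covid_sub, by.x = "zip", by.y = "zip_code")

head(zips_covid)  # same geometry as before, plus the new case rate column
```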
Yeah — that's for a later discussion, but for now this version is going to be much more workable for you in the following sections. So here we have each zip code again, and it's the same spatial file — it still has the object ID, area, length, and geometry — but now we've also added this field of cumulative case rate. Okay, starting with the classic epi approach, we're going to look at our data as quantiles. I've already noted this isn't my favorite method, but it's the most common, I think mainly because there should be an equal number of observations in each class. One thing I will warn you about is ties. If you have many, many values that are zeros or ones, or a lot of data sharing the same value, quantile maps don't handle ties well at all, and if you don't check for that, you may end up with wildly different counts of observations in each group. And quantile maps don't really care about the actual distribution or clustering the way Jenks maps do, so be really cautious when working with them. We're going to start there, though, using our favorite initial mapping library, tmap. What we're going to do is add another parameter called style; in style, we can specify what type of map we want to make. If you look up the documentation — it can be under tm_polygons or tm_fill; you'll find it in both — you'll see there are many different options available. We're going to look at just three. So we take our zip codes, plot the COVID case rate, and here I get to use my favorite palette, BuPu — B-u-P-u, blue-purple.
There are a lot of gorgeous color palettes out there in ColorBrewer; it turns out there's a whole art and science to choosing them. A palette has to be pleasing to the eye, and you may need to think about accessibility — pairing reds and greens is a big no-no, for example, and that won't work for a public audience. So thinking about palettes is genuinely complex, and ColorBrewer is a really good resource: if you look up the ColorBrewer site online, there's a little app that lets you explore different palettes, and if you just Google "R ColorBrewer palettes" you'll find an image of all the palettes available to you. Or go the hardcore way: load the RColorBrewer library and use its display function to see them all — but I think just Googling can be a little faster. All right, let's map this. Here we have quantiles. So what does this tell us? I didn't tell it how many bins I wanted, so it automatically gives me five. Oh, and this one zip code didn't have data. But there definitely seem to be anomalously higher COVID case rates on the west side of the city, and also in this one really tiny zip code in the downtown area. Remember, when you're working with areal data there's a visual distortion: smaller areas are usually small because population density is higher — there are more people there — so they look less important on the map. Keep your eye on that. What we can see with this initial map is that the rates are not randomly distributed; that suggests there is clustering behavior happening.
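The quantile choropleth described here can be sketched as follows, assuming `zips_covid` is the merged sf object from the previous step (names are illustrative).

```r
# Quantile choropleth: the "style" parameter picks the classification method.
library(tmap)

tm_shape(zips_covid) +
  tm_polygons("case_rate",
              style = "quantile",  # equal counts per class (defaults to 5 bins)
              palette = "BuPu") +  # ColorBrewer blue-purple palette
  tm_layout(title = "Cumulative COVID-19 Case Rate by Zip Code")
```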
I'm not making a statistical decision yet; I'm just observing that there might be some sort of clustering, which would make sense for an infectious disease. All right, next let's try tertiles — tertiles, or quartiles; I've seen a lot of tertiles and quartiles in academic journals lately. You can see here it doesn't do the best job of capturing — well, it's doing a job, but we're starting to lose the nuance of the places where slightly more intense COVID might be being experienced, so keep that in mind. Then we can try the standard deviation map. I'm not going to walk through all of the tmap documentation — that's for you to check out if you're interested, to really understand which standard deviation measure applies at each break. Our group also uses GeoDa a lot, and GeoDa visualizes the breaks a little differently, so it's clearer what the standard deviation breaks are; here you'd probably need to double- and triple-check to feel solid about it. Okay, but looking at this map, the intensity definitely seems to occur across those four zip codes on the west side. Having this area of slightly lower COVID between the two hotter areas is important, because it suggests that area might be very vulnerable: you have rates increasing on both sides of it. If you just treat all those areas the same way, you miss that the actual vulnerability is probably quite different. And here you can see that tiny little zip code downtown is maybe more vulnerable than the raw numbers suggest. Okay, finally, let's try the Jenks map. One way to think about the Jenks map is that it's almost a univariate clustering algorithm run on the data to look for the groups within it. So we're going to start with that.
And here — the reason we map this multiple times and try different data classification techniques is because real trends persist. If it's a real trend, I'll see it in every map. Otherwise, we're not sure whether the observation is coming from an artifact of the method or from the actual data; we always have to think about and wrestle with that bias. And here we again see those four zip codes show up, so I feel really confident that that's a real trend occurring, and that there is that dip between those two areas. And here again we see that downtown section. But if we look a little further, one thing I'm not super excited about is this lower bin, 385 to 385 — there's not much variation there, and it looks like there's just one group it belongs to, which is not super great. So let's try four bins instead of five to see if that takes care of it. Okay, so now we see that this lone guy down here joins this category right here, and that's pretty helpful. Okay. And here I've also added something extra. In the previous maps, the legend populates automatically, but here I've added a histogram. This is, again, the fun of tmap — when I first found this I made sounds, I was so excited about it. There's just another parameter that goes in your tm_fill or tm_polygons call where you simply say legend.hist = TRUE. It's FALSE by default, but if you switch it to TRUE, you're going to get a gorgeous histogram that shows you how the data is distributed. And you can see from this histogram that this group of four is notably above the next group, so there's something really unique happening there.
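That histogram parameter looks like this — again with `chi_zips` and `case_rate` standing in for the workshop's actual object and column names:

```r
library(tmap)

# Jenks breaks with 4 bins and a legend histogram showing the
# underlying distribution of the mapped variable
tm_shape(chi_zips) +
  tm_polygons("case_rate",
              style = "jenks",
              n = 4,                 # four bins instead of the default five
              legend.hist = TRUE)    # FALSE by default
```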
Here, also just by looking at different tmap examples out there, I have added the scale bar, but I've also pushed the legend outside of the map. This is a really neat trick: instead of cluttering your map — and it will get cluttered in tmap if you have everything falling on top of each other — you can just push the legend out of the map, and that looks really nice. All right, so at this point we're pretty happy with that. But we might actually want to integrate more data, right? This is just COVID data, but there's a lot of other data out there, and even as we're talking we're hypothesizing: is there something uniquely different about this part of the city compared to other parts? What are some of the potential drivers of the patterns we see? So we want to bring in more data. Here I'm going to bring in data that's already cleaned — which makes life a lot easier — from the Opioid Environment Policy Scan (OEPS) database. Maybe I'll ask Susan to talk a little more about that in 30 seconds as she does the next section. But this is something else our group produced — the GitHub.io site is linked in the workbook. Essentially this includes dozens and dozens of variables representing social, economic, and policy dimensions that reflect the social determinants of health that may impact justice-involved populations and persons with opioid use disorder. But that ends up being just about everything — the factors that influence that also influence a lot of other diseases and challenges. So we're going to pull from that. And, yeah, we're starting to fall a little bit behind on time, so I'll maybe share a link or take you to the site a little bit later.
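The scale bar and legend-outside trick are each one call; a sketch with the same placeholder names:

```r
library(tmap)

tm_shape(chi_zips) +
  tm_polygons("case_rate", style = "jenks", n = 4, legend.hist = TRUE) +
  tm_scale_bar(position = c("left", "bottom")) +   # add a scale bar
  tm_layout(legend.outside = TRUE,                 # push legend off the map
            legend.outside.position = "right")
```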
But essentially here we're going to pull in a data set that is just one report from that source — this one is from the 2018 American Community Survey. For every zip code we have a whole breakdown: racial and ethnic composition, age ranges. So here is no high school diploma; here we have percent over 65; and then also younger groups, which might be interesting and useful for different analyses, as well as percent disabled. Okay. So this is a clean data set. Again, it would be a whole other tutorial to go through the data cleaning, but here it is, so we're going to now merge this to our data set. Okay, so we merge — and again we have to be careful about our keys. Our master data set has the lowercase zip code key, and in this example the new data calls it ZCTA, which is ZIP Code Tabulation Area. So we're going to look at that. And now we see we still have the cumulative case data, but we've added a lot of other information. Okay. Sometimes when I merge, it does this thing where it splits duplicated column names into .x and .y suffixes. That doesn't always happen, so I don't know what to tell you — it didn't do it earlier today — but we're going to come back to that in a second. If this happens for you, you might have to just add a .x or .y a little bit later. All right, so here we're going to create a thematic map panel. We're going to take a very simple case rate map using Jenks, the blue-purple (BuPu) palette, four bins, give it a title, take out the frame — and then we're going to replicate the exact same thing for several other variables, because we want to think about the potential drivers, right? What are the potential disparities that exist? For disparities you might think about racial disparities: is the burden the same between Black, Hispanic, and white populations in Chicago? But you might also want to ask, you know, is it race or racism — which is the direction thinking is fortunately moving.
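A hedged sketch of that merge in base R — the key names `zip` and `ZCTA` follow the naming just described, but the toy rows and other column names are invented for illustration:

```r
# Toy stand-ins for the workshop data: a master table keyed by a
# lowercase "zip" column and an ACS table keyed by "ZCTA"
master <- data.frame(zip = c("60601", "60602"), case_rate = c(120, 340))
acs    <- data.frame(ZCTA = c("60601", "60602"), pct_over65 = c(14.2, 9.8))

# Merge on the two differently named keys
merged <- merge(master, acs, by.x = "zip", by.y = "ZCTA")

# If both tables had shared a non-key column name, merge() would
# have renamed the pair with .x / .y suffixes -- that's where the
# mysterious .x columns come from
merged
```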
So I want to think more about some of the contextual, population-based observations we can find. For example, are there parts of the city that have disproportionately more seniors? That happens in Chicago and many other places — either because of high economic disparity, where folks move away and the ones who remain may be very old or very young, and very vulnerable, or it could be more of an Arizona or Florida phenomenon, where seniors intentionally move into places for a different type of retirement experience. In this case, we know seniors were a population disproportionately impacted by COVID, so maybe we want to look at that. But we may also want to look at no high school diploma, just as one of many, many different socioeconomic indicators. So we're going to do this. I actually don't think this will work the first time the way I have it set up, because of those .x's and .y's — I'm going to quickly troubleshoot and just add an .x to each of these. Again, you have to think on your feet: many times in my experience, what works fine one time may not work a little bit later, and just restarting my session can fix a lot of stuff, so keep that in mind. Okay, but it did render correctly, which is wonderful. And here we see our maps. And again, you can continue to refine and update this. Because of that really long COVID variable name, I renamed it COVID rate, and you can use the same coding snippet to rename the other variables accordingly. But here we can see some pretty interesting things — there are some striking geographic correlations, potentially, between COVID rate and where the Latinx/Hispanic community lives, as well as other features. So again, we have to move beyond that — I think I have some words here about it. Right: in modern spatial epidemiology, associations must never be taken at face value.
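The map panel itself can be sketched like this; every name here (`chi_zips`, `case_rate`, `PctSenior.x`, `PctNoHS.x`) is an illustrative stand-in for the merged columns, including the `.x` suffix workaround just mentioned:

```r
library(tmap)

# Helper so each small map is styled identically; the column names
# are placeholders for the workshop's actual merged variables
panel_map <- function(var, label) {
  tm_shape(chi_zips) +
    tm_polygons(var, style = "jenks", n = 4, palette = "BuPu",
                title = label) +
    tm_layout(frame = FALSE)
}

# Arrange the small maps into one panel for side-by-side comparison
tmap_arrange(
  panel_map("case_rate",   "COVID Rate"),
  panel_map("PctSenior.x", "% Over 65"),
  panel_map("PctNoHS.x",   "% No HS Diploma")
)
```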
We know it's not race but racism and other isms that drive multiple health disparities; simply looking at a specific group is not enough. Especially in spatial analysis we want to look at the spatial disparities — we want to explore multiple variables and nurture curiosity to understand these intersections. So keep that in mind. And again, this is just a snapshot of what COVID looked like for one week in September; as you can imagine, that shifted dramatically before and after and since, so this is an ongoing thing. But also, as you're looking at this, hopefully you're intrigued and thinking, huh, I wonder if it's that, or I wonder if it's this. This is essentially where you pivot to think about what other variables you need to bring in to refine your approach. Maybe you want to bring in essential workers. Maybe you want to bring in a different age group — specifically in Chicago, a lot of the Hispanic/Latinx community is very, very young, which is interesting, right, because we know there were different school challenges in different communities. So maybe look at percent families or percent young children. What about internet access — how did access to the internet influence these things? On the other hand, we also know that comorbidities were pretty important here, so you could also look to something like the Chicago Health Atlas, or work with your team to pull in data on asthma, hypertension, and other conditions at the same or a similar scale, to start comparing side by side. And I just want to emphasize again that this is really meant as an exploratory, investigative phase. If you have already decided what you think is important, or if you only care about one variable, then this approach tends to unravel that pretty quickly. And here, you can write to your spatial file of choice.
I will note I have found that the drivers in sf can be a little bit buggy, so those of you who are passionate about package development, you should chat with them about that. But another thing you can do that's pretty foolproof is just write your final merged zip code information to a CSV, which is much less likely to have any bugs or issues, and then you can rejoin that CSV in a different session if needed. But at this point let's start the next chapter, and then we'll take our break. Perfect. Let's see, Marynia, do you want to share yours, or I can share my screen — it actually doesn't matter. Why don't I — oh, she's got it. There we go. Perfect. So in this next section, now that we've generated choropleth maps plotting data by community area, by neighborhood — in this case we're looking at the zip codes in Chicago — we're going to add additional resources and bring additional data variables into our data set and into our maps. So, next slide. I'm just going to give an overview of geocoding, because we're going to be adding some point data in the context of this exploratory investigation we're doing in Chicago. We're looking at the intersection of the COVID pandemic and the opioid crisis in the United States, specifically in Chicago. There have been a lot of intersections of these two public health crises over the past year and a half. And as I mentioned, our lab does a lot of research and thinking about what access to resources for opioid use disorder looks like at the neighborhood or community level. Of course, in the last year and a half of COVID, how has that been impacted and exacerbated by COVID? Most all facets of life have been impacted in some way, but specifically, how does COVID intersect with community access to opioid use disorder treatment and with opioid use disorder rates?
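Writing out can be done either way; a sketch, with `chi_zips_merged` as a placeholder for the final merged sf object:

```r
library(sf)

# Spatial file of choice (GeoJSON here); the sf drivers can
# occasionally be finicky across systems
st_write(chi_zips_merged, "chi_zips_merged.geojson")

# Fallback: drop the geometry and save a plain CSV, then rejoin
# to the zip boundaries in a later session
write.csv(st_drop_geometry(chi_zips_merged),
          "chi_zips_merged.csv", row.names = FALSE)
```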
A lot of this is about access to treatment, so we're going to be geocoding point data using the addresses of methadone clinics in the city of Chicago. This is real data — data we pulled from SAMHSA, the Substance Abuse and Mental Health Services Administration, which is the agency in the US that oversees, at the federal level, a lot of the drug treatment and mental health resources. So we're going to be geocoding these points. At the foundational level, what is the process of geocoding? Geocoding is terminology we use a lot, and we use this process a lot in spatial data science and spatial analysis. It's the process of converting addresses — residential addresses, street addresses — into actual geographic coordinates using a known coordinate reference system. So again, the CRS is really important here in the geocoding process, because we're basically taking a piece of text and translating it into an actual location, a point in space. And we can use those coordinates, which are usually latitude/longitude coordinates, to spatially enable data. So again, the input is usually an address — a piece of text — and in our case it's going to be the addresses of different methadone clinics, and the output is latitude and longitude coordinates in the specific CRS we're using. Next slide. So, geocoding in R is actually pretty easy. I don't want to say it's too easy, because of course we run into issues all the time, but this package really does make it simple to geocode if you have all of your addresses. So we love the tidygeocoder package. It's pretty consistent and usually generates pretty good results in accuracy and precision. With the tidygeocoder package, you start by reading in your CSV or text file with your addresses.
So we'll see here that we have the address column in this data set — something like 4453 North Broadway, Chicago, Illinois, and then the zip code. We're going to read that into R, into a data frame. In the next slide, we'll be doing a little bit of cleaning and preparing of the data, to put it into the format that the tidygeocoder package and the geocode function can read. Then, using the geocode function — here in dark blue — we'll geocode the addresses to generate the latitude and longitude coordinates for the specific CRS we're using; the methadone clinics data set has already been projected into a CRS, and we'll see that in the tutorial. Then, using that geocoded clinics data set, we'll be transforming or converting it into a spatially enabled data set, which we'll call methadone SF. To do that we'll be using the st_as_sf function, transforming it into spatially enabled data that we'll then be able to plot on the map. Next slide, please. So again, after the addresses have been cleaned, geocoded, and spatially enabled, they can be added as a point layer on a map, and here we see the methadone clinics portrayed — this is, again, a very simple map, but shown as points overlaid on the data. We'll be walking through how to do that in the tutorial in the next section. And I think, well, why don't we take our five-minute break now. So set your alarms and come back in five minutes — that'll be about 11:54 am Central Time (US), 12:54 pm Eastern Time (US). Set your clocks, and we'll be back and see you in five. Hi everyone, welcome back. I think we're still recording. Just to sum up, if you're just joining us: we just talked about thematic mapping of neighborhoods, and we did an overview of geocoding data. We had a great discussion in the chat about CRS and projections and where a CRS is needed in the geocoding steps.
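The read → clean → geocode → st_as_sf pipeline from the slides can be sketched end to end; the file name, column names, and the `methadone_sf` object are stand-ins, not the workshop's exact code:

```r
library(tidygeocoder)
library(sf)

# 1. Read the clinic addresses (placeholder file name)
clinics <- read.csv("methadone_clinics.csv", stringsAsFactors = FALSE)

# 2. Build one full-address column for the geocoder
clinics$full_address <- paste(clinics$address, clinics$city,
                              clinics$state, clinics$zip)

# 3. Geocode: appends latitude/longitude columns
clinics_geo <- geocode(clinics, address = full_address,
                       lat = latitude, long = longitude)

# 4. Spatially enable, dropping rows the geocoder couldn't match;
# web geocoders return WGS84 coordinates, hence crs = 4326
methadone_sf <- st_as_sf(na.omit(clinics_geo),
                         coords = c("longitude", "latitude"),
                         crs = 4326)
```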
So I think we'll dive right into that, and I'll hand it over to Marynia to bring us into R. Oh, you're still muted, Marynia. Okay — lots of great questions, for sure; you guys are on top of it. Yes. So, okay, talking more about what Susan was just saying: I think a lot of times clinicians think the only thing they might want to geocode is patients, and that ends up being complicated for a number of different reasons, but I'm really encouraging everyone to also think about all the other things that exist in our world as locations. So how do we use point locations for different types of spatial epi research? We can think about health providers: hospitals, clinics, pharmacies — you know, all that stuff; we'll be talking about medications for OUD here today. This can be translated into distance to nearest, or distance to the hospital, as a way to measure whether someone living closer to or further from the hospital is potentially confounding some of the health outcomes you're seeing, right? Then area resources — if you're interested in, or need to control for, the fact that your patients are coming from different places: some will have more or fewer grocery stores, playgrounds, daycare centers, schools, etc. Being able to digitize those as location information is going to be very useful for a lot of different things. Then also area challenges. Crimes are often included as addresses or lat/longs already, but also things like Superfund sites and pollution-emitting facilities — that's the correct way of saying it; I don't want to just say factories, because a lot of different kinds of places emit pollution. That comes from the National Emissions Inventory. And, you know, points obviously can also represent people — that's what a lot of the chat was on fire about, right? You all want to geocode your patients as soon as possible. So here's the thing.
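The "distance to nearest" idea can be sketched with sf; the two point layers here are toy stand-ins for geocoded patient and clinic locations, and note that distances are only meaningful in a projected CRS (meters here), not raw lat/long degrees:

```r
library(sf)

# Toy point layers in a projected CRS (UTM zone 16N, meters);
# in practice these would be your geocoded patient and clinic layers
patients <- st_as_sf(data.frame(x = c(0, 1000), y = c(0, 0)),
                     coords = c("x", "y"), crs = 32616)
clinics  <- st_as_sf(data.frame(x = c(200, 5000), y = c(0, 0)),
                     coords = c("x", "y"), crs = 32616)

# Index of the nearest clinic for each patient...
nearest <- st_nearest_feature(patients, clinics)

# ...and the distance to that clinic, one value per patient
dist_to_clinic <- st_distance(patients, clinics[nearest, ],
                              by_element = TRUE)
dist_to_clinic   # 200 m and 800 m for the two toy patients
```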
I mean, a good open source toolkit is not going to use the same exact tool for everything, right? You want to broaden your toolkit, so geocoding patients specifically may be something that happens in a different environment that you then bring into something like this — I just wanted to note that. So if you have access to those individual locations, you can work with that patient-level data in a secure environment, under an IRB's approval and that sort of thing. I'll give a few hints on the geocoding piece specifically, but please note that geocoding is separate from working with the actual data. You may need a geocoding service that is outside of R. However, once you get those locations from a geocoding service, you may be able to use R in an offline, secure environment to do some of the additional data wrangling that I'll be showing you for the rest of today. Okay, so in this example we're going to start with the addresses of a few methadone maintenance services in Chicago. For those of you who don't know, methadone is one of the very few evidence-based medications to reduce mortality from opioid overdose, and that's why we're talking about it today. There are all sorts of policy challenges, but that's a different discussion. So we start with addresses, which are character strings; we need to change each string into coordinates. That process of turning it into a coordinate is going to require a coordinate reference system, and that is all going to be supplied to us through a geocoding service. We're using tidygeocoder to get ours done. But because this requires an API, you obviously can't use it for HIPAA-protected data. For offline geocoders, there are a lot of different options out there.
Please be wary of, you know, consultants who promise a lot and don't deliver as much. If you are working with a group that is pretty technologically savvy, from our team we recommend Pelias. It's an open source geocoder that you can actually install locally, so you can create your own geocoding service using Pelias. I believe Pelias came out of Mapzen, which was a really big startup. So this is a really phenomenal service. If you feel that's something your team can do, it has a Docker installation and that sort of thing, so that's doable. They also have a version where you can ping the cloud. Basically, if you're building your own offline geocoder to comply with HIPAA, then you'd have to go that route and do the local installation. And the way that geocoding works is essentially you have a network topology of all the streets in the US, and each street has a lot of information attached to it, including addresses. So it's converting the address, as a character string, to a specific place on that street network topology. Oh, can you all see my screen? I got a weird message that now you're able to, but hopefully you've been able to for a while. You can see the Pelias page, but — okay, that's right. Okay, yeah. So essentially, the way geocoding works is that you're working with a street network topology. Street network topologies vary: the Census offers one for free, and it's not that great, which is why geocoding results are not always the best. Google has spent — I don't know about billions, but they've spent a lot of money on improving their network topology; look up their Ground Truth project from 2013-14. That's why their routing is so good — because they spent a lot. But again, there are other versions. So Pelias is one: they use a street network topology that you can locally install, and then search addresses that way.
And the topology itself has a coordinate reference system in place, which is why you're getting coordinates from it. But then, in addition to that, some of the more conventional options are going to be Esri. For example, our group could use Pelias, but our library offers Esri geocoding, both online and offline. That might be a specific contract, and when you're working with folks in this area, be very specific about what your needs are. Have an idea of how many addresses you'll need to geocode, and how often — is it a couple thousand? A couple million? These are things you have to really think about. Again, that might be a specific investment, and that's okay. But that being said, let's switch over to geocoding in R. And because it's already noon, I'm going to work a little more on the rendered side versus the console side, just because I want to leave some room for questions. So we're loading our library. We're going to read in a new data set that we haven't worked with yet — just a CSV that has name, address, city, state, zip. This is what a lot of data is going to look like. This data came from SAMHSA initially — you can just look up the acronym. It's publicly available information about where some methadone maintenance locations exist. However, whether they're actually open or accepting clients is another story, so that would potentially be a separate analysis: looking into that and updating this data set further. At this point — you never want to throw all of your data at a geocoder at one time. You start with one example, so here we just manually type in this address on Elston Avenue in Chicago, Illinois. I'm not even including the zip code here, which is not great, but it still works. So the cascade method within tidygeocoder —
— that is actually going to ping the US Census first, and if the Census doesn't return a latitude and longitude, then ping OpenStreetMap. More sophisticated geocoders will also give you match percentages or match criteria to tell you the likelihood that it's a real match. That's, again, another discussion, but you can easily find information about it if you search online. In this case we're assuming that if we get a latitude and longitude, we're fine. Next, we really want to do some data cleaning to prepare this data set for geocoding. That's the same thing you have to do for any API service, right? Check the documentation of your geocoding service, or whatever API you're working with, and get your data in the right format. You can see — because I just used a very basic read.csv and not some of the other options out there — that most of the data that should be character is listed as a factor. So I'm going to change all of the address, city, state, and zip columns to character, and I'm also going to paste them together so we just have one line, one attribute — the full address — to push into the geocoding service. That's pretty common for geocoding services. So we're going to do that here, and at that point we're ready to go. So you click on it. There aren't that many clinics, so it takes a couple of seconds. Again, these services are really going to differ depending on your internet connection, the efficiency of the code behind the function that you may not see, and that sort of thing, but we've had pretty good luck with this one for publicly available data. All right, so we have this information. If we look at it, we can see that there are two rows that did not geocode properly — we're getting NA values. We can never convert null values to coordinates. In fact, sf won't let us; sf will give us an error if we try to push something —
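The factor-to-character cleanup and the paste step might look like this; the toy row and column names are assumptions about the workshop CSV:

```r
# Toy row standing in for the SAMHSA clinic CSV; column names are
# assumptions, not the workshop's exact ones
clinics <- data.frame(address = "4453 N Broadway",
                      city = "Chicago", state = "IL", zip = "60640")

# read.csv() with older default settings yields factors, so
# coerce the address pieces back to character
cols <- c("address", "city", "state", "zip")
clinics[cols] <- lapply(clinics[cols], as.character)

# Collapse to one full-address attribute for the geocoder
clinics$full_address <- paste(clinics$address, clinics$city,
                              clinics$state, clinics$zip)
clinics$full_address
# "4453 N Broadway Chicago IL 60640"
```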
— if we push this data as-is to a spatial object, we will get an error saying, hey, sorry, I can't change null values into a coordinate system. So we have to make a decision. This is a workshop, so we're going to take the easy way out and just omit them — even more dramatically, omit anything with null values. But in a proper environment you should inspect further and start to determine: is there a pattern? Are the NAs happening only in some places and not others? I've worked a lot with medical examiner data, and you'll have six or seven different medical examiners with six or seven different ways of recording their data. Sometimes it's worth really trying to understand what's happening there. In an extreme situation you might actually have to search the address — go to Google Maps and search it, because again they have the best, though proprietary, network topology. There's the "What's here?" option — I think if you right-click anywhere on a Google map that option comes up — and then another option is just inputting the latitude and longitude yourself. You can do that because Google Maps also uses EPSG:4326 — the same coordinate reference system as the one here — and that's the only reason you can do it that way. Okay, so we've omitted those. And at this point we're going to convert to spatial data. Here we're not changing anything — we're not changing a coordinate reference system or anything like that. We are actually respecting and acknowledging the coordinate reference system that the longitude and latitude are already in. That coordinate reference system is going to look like this — you know, roughly 40 and minus 80-something for the Chicago area. When it looks like that, in this case we also know it's EPSG:4326, right.
So when we're converting to a spatial object, we do not add a new coordinate reference system; we have to use the coordinate reference system that the latitude and longitude are already in, right — that XY. There are thousands of different ways to represent XY; this is the one used by 4326. So we pass in our CSV, our data frame object, and then we pass in the CRS that was used to generate those coordinates. Okay. Now, note that a lot of times when people talk about coordinates, they say "latitude, longitude" — what's your latitude, longitude? Well, it turns out that XY is actually flipped around: X actually corresponds to longitude and Y to latitude. This error is so egregious that you'll actually find developers mislabeling latitude as longitude and vice versa — this has happened before. So be careful, right? Your first step should be to say it's longitude and then latitude, especially if you're using a geocoding service, because you can be fairly certain they did it correctly. But in the future, if you're taking another CSV that already has address-level data and converting it, you may need to double-check. And what do I mean by double-checking? As soon as you finish this, you're going to want to map it out. So here I'm going to switch to the interactive version and map those dots right on top of Chicago — so now I know without a doubt these geocoded correctly. You have to do this right after you convert. Otherwise, I promise you, there will be much misery later — it worked this time, but many times it won't, and those are classic GIS errors whether you're using R or ArcGIS or whatever. So what could have gone wrong? Did you potentially flip the longitude and latitude values? Maybe that was an error on your part; maybe it was an error by whoever developed and shared the CSV you're working with. Another option: did you input the correct CRS? We knew that the CRS was 4326.
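That map-it-immediately sanity check is quick in tmap's interactive mode; `methadone_sf` is the placeholder object from the slides, and note that st_as_sf's `coords` argument is x-then-y, i.e. longitude first:

```r
library(tmap)

# Interactive mode: drop the points on a basemap and confirm they
# land in Chicago -- a flipped long/lat pair will plot somewhere
# in the ocean instead
tmap_mode("view")
tm_shape(methadone_sf) + tm_dots()

# Back to static plotting afterwards
tmap_mode("plot")
```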
Many times you get data from the internet or from colleagues and that information is not shared with us, so we think it's 4326 but it could actually be something different. That's a really tough error to resolve, but it is something you'll have to do. As for the first problem — for example, specifically for this area in Chicago, if I flip it, the coordinates will plot, I think, off the west side of Africa. So that's how I know: that's like the inverse of the coordinates here. Depending on your location, you might want to get familiar with the place in the world that's your inverse, so that it becomes a familiar error to you. So at this point let's overlay — we're going to bring back our zip codes. And at this point I'm bringing in a zip code layer that I wrote out as a GeoJSON in the previous section. We're going to switch back into plot mode and start plotting. We're going to plot the same piece from last time — the COVID case rates, Jenks, we have all that information — but now we're going to overlay our clinics. When you're working with point data — and there's more description about this in the workbook — we're either going to show points as dots, as individual points, or as bubbles. I'll talk about the bubbles in a second, but first let's talk about the dots. Dots are just points, right? We're going to use a pretty small point size and give them a dark gray color. I like to work with the gray range — I don't know, that's just my style. We'll also keep our cartographic styling: we'll put the legend outside and keep it to the right. Additionally, in tmap right now, symbols don't get the same legend TLC that areas do.
So we have to actually hack in a manual symbology to add the dots to the legend. This is a trick from the depths of some Stack Overflow exchange: tm_add_legend, where the "symbol" type is basically the call for a dot. We're going to give it the same exact color and size, and label it methadone MOUD. So we're going to do that. All right, so here we go. We see that — and again, we didn't map this right away, because we want to go through the steps of ensuring the points are plotting where they're supposed to be. We're going to be using this zip code area data set quite a bit, so we don't have to worry — we know that it's being projected properly and that sort of thing as well. So these two layers are being overlaid on top of each other. For overlays they don't technically have to be in the same CRS — they could be in different ones — but that's a little bit open to potential errors, so keep that in mind; each is projected however tmap thinks it should be projected, so you could get some strange behavior. In the next chapter we'll show you how to project everything to one standard projection to start, though you already have some hints of that from the first part of today. All right, so next step: let's bring in some more data. Oftentimes — and this is another option — more often than not we'll have data that we find that already has an address or coordinates we want to plot. This might be crime data, for example: you'll have crime data that may have the latitude and longitude already, so you don't have to geocode it. For that kind of thing — here we're looking at affordable rental housing developments. We're bringing this in as another way to think about it: it could be useful for an analysis of whether better outcomes are associated with better access to affordable housing options.
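The dot overlay plus the manual legend hack can be sketched like this; layer and column names are placeholders for the workshop's actual objects:

```r
library(tmap)

tm_shape(chi_zips) +
  tm_polygons("case_rate", style = "jenks", palette = "BuPu") +
tm_shape(methadone_sf) +
  tm_dots(size = 0.2, col = "gray20") +          # small dark-gray points
  tm_add_legend(type = "symbol",                 # manual legend entry,
                col = "gray20", size = 0.4,      # matched to the dots
                labels = "Methadone MOUD Provider") +
  tm_layout(legend.outside = TRUE,
            legend.outside.position = "right")
```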
Or, because it's COVID, we could also hypothesize that these more population-dense places could be more vulnerable to airborne disease, so you could do the same thing for nursing homes, for example. But in this case, we're going to hypothesize that access to secure and affordable housing is a plus, at least for our MOUD population of interest. So here again, this is straight from the city data portal. We have the address, and I can tell you now, just looking at it, that this would be difficult to geocode; you can see some really strange ways it's being recorded. This column corresponds to the total number of units per housing development, so that's an additional attribute for our point-level dataset. And then we also see an X and Y coordinate, which is essentially one coordinate reference system, and latitude and longitude, which is a totally different coordinate reference system. Unfortunately, as is very common for data portals, the metadata for this file doesn't tell us what the coordinate reference systems are. So here we essentially have to guess: take a chance, pick one XY pair, and go from there. Okay, so let's do that. I already told you: if it looks like 40-something and negative 80-something, it's very likely EPSG:4326, which is the most common; here it's also labeled latitude/longitude, which is another good sign. Notice that whoever coded this location flipped it around: they said latitude is the X parameter and longitude is the Y parameter, which is incorrect. Just to put that out there. And then the XY coordinates, to me, look like a UTM coordinate reference system.
But there are a lot of different UTM reference systems, so I'm a little more nervous about that one; I'm going to take the latitude/longitude and just try that. First we're going to remove nulls, because again, it turns out there are some, and we're trying to be efficient with our time. We want to look at the structure of what we have just to make sure that latitude and longitude are numeric values; if not, we'd have to convert them. And from that point on we don't have to do any of the geocoding steps; we can jump straight to the `st_as_sf` function. So here again: longitude, latitude, and, fingers crossed, we're hoping it's 4326, so let's see how that works. I'm going to map it out. Does this look like Chicago? To me it does, but I'm from here, so you can also switch this into an interactive map, as we did in the previous step, just to be certain. All right. As I mentioned when we were looking at the data, we also have this attribute of units, which gives the total number of units by development. This effectively gives a weight to the developments: all developments have different capacities, so they're not equal. An area with several developments with many units available is very different from one development with a few units, and vice versa. So here we're going to use what's called a graduated symbology. I'm going to bring this over here on this side, just so we have a little more room.
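The null removal and `st_as_sf` step might look something like this. The data frame name `housing` and the column names `Longitude`/`Latitude` are assumptions, and CRS 4326 is the guess described above, not a confirmed fact about the portal data.

```r
library(sf)
library(dplyr)

housing_sf <- housing %>%
  filter(!is.na(Longitude), !is.na(Latitude)) %>%   # drop rows with no coordinates
  mutate(Longitude = as.numeric(Longitude),
         Latitude  = as.numeric(Latitude)) %>%      # ensure numeric, not character
  st_as_sf(coords = c("Longitude", "Latitude"),     # x (longitude) first, then y
           crs = 4326)                              # our guess: WGS84 lat/long

# Sanity check on the guess: Chicago sits around longitude -87.6, latitude 41.8
summary(st_coordinates(housing_sf))
```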
So we're going to take our zips and map them with a very simple gray background, and then layer over that the individual housing locations using a graduated symbology; in tmap this is called `tm_bubbles`. Essentially, larger unit counts get bigger bubbles and smaller counts get smaller bubbles, and just to make a nice contrast with the gray, we're going to color them purple. So we run that. And here we can see it pretty nicely: the units are binned into different categories, up to 100, 200, 400, 600. We can see there's a lot in this section right here, I don't know if you can see my cursor moving around, but kind of south of this central area; there's quite a lot on the north side, and a scattering through other parts of the city. Okay. So this can be useful, and then in the last step, I'll maybe not render that part, but in the last step we essentially bring all of that together and add some labels. Let's see, I guess maybe just doing this one part won't hurt. Essentially, when we bring all of this together, we can start to see some really interesting trends. By adding zip code labels onto the COVID map, we can now really quickly identify which zip codes were of concern: those areas that were persistently high no matter which classification strategy we took. But then we can also see where the places with access to methadone are. This is a medication that's required daily to weekly for many individuals, so you need to access it regularly; if it's very, very far away, especially during a pandemic, the chances of choosing to stop taking it are very high. So, for example, 60623 seems to have three or four facilities, which is great.
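The graduated symbology call is roughly the following (tmap v3 syntax; `zips`, `housing_sf`, and the `Units` column are placeholder names standing in for the workshop's objects):

```r
library(tmap)

tm_shape(zips) +
  tm_polygons(col = "gray85", border.col = "white") +   # simple gray background
  tm_shape(housing_sf) +
  tm_bubbles(size = "Units",                            # bubble size scales with unit count
             col = "purple4", alpha = 0.7,
             title.size = "Affordable housing units")
```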
And there's also a lot of affordable rental housing units there, which could suggest there are more services in general for people, so 60623 is doing pretty well. In contrast, look at 60639, 60632, and 60629: these three zip codes have no methadone MOUDs, either in the zip code or, for the most part, nearby; maybe a few are kind of close. But there are also very, very few affordable housing units to begin with, so from our perspective this may be a flag of potential vulnerability for these groups. I'm going to stop here, and we'll go back to our talk to bring this whole thing home. Great. Susan? Wonderful, thank you, Maria. So, I know we're running a little bit short on time, but I think we'll be fine; I'm going to move somewhat more quickly through this section just so we have time for Q&A at the end. In this last part of today's workshop, we wanted to highlight how you might think about generating some of your own spatial metrics based on your data and the questions and services you might be working with. And in the world of R-spatial that we work in, we think a lot about spatial access, obviously, so we're going to be calculating some spatial access metrics today. But we also wanted to provide some context that there are a lot of other ways to think about access. Before talking about spatial access as one specific dimension, we need to think a little more broadly and consider: what is accessibility? This is of course really important in the healthcare and medical field, because how accessible a healthcare resource or service is has a direct impact on patients' interactions with it. Accessibility is a very multidimensional concept, and spatial distance is only one component of it.
In the classic research model presented by Penchansky and Thomas in 1981, they define access as a general concept that summarizes a set of more specific dimensions describing the fit between the patient and the healthcare system: the degree of fit, or match, between the two sides, providers and clients, or supply and demand. They outlined five dimensions of accessibility: availability, accessibility, affordability, acceptability, and accommodation. We'll be talking about the spatial accessibility factor today. Next slide, please. Actually, let's go to the next slide; I'm going to skip over this in the interest of time, but this next piece gets at what that last slide covered. There are two key questions that I think are really essential when we consider how to choose the right measure of access. The first question is: how should distance between the user and the facility be characterized? For example, should we use travel time from a residential location to a facility? Would driving distance be more relevant than travel time, or is there another unit of measurement we might think about? Related to that, the second question is: what assumptions about people's travel behaviors across space and place are most appropriate? For example, if we're talking about an urban center such as the city of Chicago, or any other large city with public transportation, or even a smaller city with public transportation, are our users, our patients, members of the community, going to be using public transit? Should our travel estimates feature public transit times, or will they be driving? Those are all very different travel modes and travel times, of course.
Another way to think about it: consider accessing different resources, a trauma center versus an ambulance, or a playground for kids versus a grocery store. These are going to have different travel behaviors; those are all questions related to public health, but folks use very different travel modes to access a grocery store versus a public park, for example. With these questions in mind, we can now think about different measurements of access. I'm going to present a very high-level overview of some density access measures as well as proximity measures, and then we'll actually calculate those as well. This map highlights a couple of different options, mostly proximity access measures; this is the state of Illinois, with Chicago in the top right. These are all different ways of presenting measures of individual communities' access to different medications for opioid use disorder, from some other research that is forthcoming from our lab. Next slide, please. So, the container method. This technique usually focuses on the count of facilities, or a measure of services provided, by geographic unit. In the context of MOUDs, for example, we might ask questions like: how many methadone providers are there in each zip code, or within a one-mile buffer of each zip code, or within a one-mile buffer of the clinics? We might ask what percentage of each zip code is within a reasonable threshold of the methadone providers' service area. So these are just examples of the types of questions we can start to answer with the container method.
And in the next slide, proximity methods are another approach to measuring access. We're going to focus on calculating minimum distance as an access proxy. Minimum distance generally does not include travel time; it's not necessarily the road travel distance, but rather the minimum straight-line distance. This is actually quite commonly used in traditional public health research as a proxy for access, although with the incredible resources in GIS and spatial analysis, we're increasingly able to measure travel time and count resources within a set travel time, really drilling down and calculating more precise access measures. Other proximity methods focus on how close a healthcare resource or service might be. Again, this might be minimum distance, or it could be travel cost, a measure of the total or average distance or travel time between origins and destinations. And as we get to more advanced metrics, we might use gravity or potential measures, where resources are weighted by size and adjusted for the friction of distance. We actually have a lot more resources on these; we're not going to present them here today, but we wanted to highlight that this is all available in the R-spatial ecosystem as well. So, next slide, and I think we're back to the R coding, Maria. All right, so let's bring this home. I really like live coding as much as possible, but we've had so much material, and some great questions I wanted to make sure we weren't losing, so I may do more of just rendering the workbook than I would normally do in a session like this. Okay. So again: for the past few tutorials we've mainly been doing visualizations, and that's usually what a lot of people think of when they think of mapping and spatial analysis.
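Before getting back to the workshop code: the minimum straight-line distance proxy described above can be sketched as follows. This is not the workshop's own code; it assumes projected sf layers `areas_proj` (zip polygons) and `pts_proj` (providers) in a feet-based CRS.

```r
library(sf)

# Straight-line distance from each zip centroid to every provider,
# then keep the nearest one as the access proxy
cents <- st_centroid(areas_proj)
dmat  <- st_distance(cents, pts_proj)          # n_areas x n_points distance matrix
areas_proj$min_dist_ft <- apply(dmat, 1, min)  # nearest provider, in CRS units (feet)
```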
But this, again, from our perspective, is just the start: we really want to then actually quantify some of those insights at the neighborhood level. For example, we just looked at a bunch of resources and made some nice maps, but how can we integrate that into an actual metric at the zip code level? For our purposes, we're going to develop two metrics using a container method approach, which is the intro level, but for a lot of research that's perfectly fine, as long as you're contextualizing it in an environment where it makes sense. First, we're going to look at the total number of providers by zip code. Second, we'll first generate walkable MOUD service areas, and then get a total count of those service areas by zip code. Now, when you're doing this on your own, this is usually where bugs will go wild, so getting really familiar with the first three tutorials is strongly recommended, so that you have the stomach to handle any potential bugs that come up here. Sometimes there will be strange bugs where you think it's an sf issue when really it's a data issue, or a merge that went wrong, that sort of thing. So here, just to be super consistent, I'm going to load the points that I saved at the end of the last tutorial; you didn't see me save it, but if you scroll down in the workbook, that was the last stage. I'm also going to bring in the initial zip code shapefile, because that has been working out pretty well. And this is a good habit to get into: let's look at the dimensions of the points; for troubleshooting that will be very helpful. We have 25 points total. And what are the dimensions of the areas? 61 areas.
I mention that because, for example, in one of the earlier GeoJSONs I was working with, there was a dimension of hundreds and hundreds of zip codes, which didn't make sense. It still plotted correctly and acted correctly, but the dimensions were wrong, or something was corrupted in the file. In this case we're working with two datasets we feel really good about; we checked the dimensions, so we'll keep going. We look at the point data, which we've already examined in the previous section. If we look at the areas: again, for this workbook I recommend just reloading that initial shapefile, to make sure there are no issues at all. This is a very basic shapefile; if we want, we can reattach all the other CSV information that we pulled in from a previous tutorial. Next step: we have to switch everything into a projection that will preserve distance in units we understand and recognize, like feet. Technically I could be more precise, but I find that language can get confusing and it's easy to misunderstand the concept, so let's just say our goal is to get into a projection that uses feet or meters. If you're outside of the US, you'll probably look for one that uses meters. So I literally just searched "EPSG Illinois feet," EPSG:3435 came up, and that's the one we're going to use. We're going to run this to transform both our areas and points. To a question earlier: there are different ways to do this; you can either input the actual EPSG number or input the projection string, and there's another version in chapter one. And anyone who tells you there's a best practice, that's the best practice for them; you have to figure out what the best practice is for you. I feel very strongly about that.
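The transform step itself is a couple of lines. EPSG:3435 is NAD83 / Illinois East in US survey feet; the object names `areas` and `pts` are placeholders for the workshop's zip areas and provider points.

```r
library(sf)

areas_proj <- st_transform(areas, 3435)   # Illinois East, units in feet
pts_proj   <- st_transform(pts, 3435)

# Both layers should now report the same CRS before any overlay or measurement
st_crs(areas_proj) == st_crs(pts_proj)
```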
This may not even be my best practice, but again, I'm trying to make it really simple so you can adapt it into your own work environments. When you're preparing to do spatial variable calculation, inspect the data and make sure everything's in the same projection immediately. Even if you're 100% sure, you still want to overlay the points on the areas just to make sure everything is working correctly. If they're not loading in the same place, you immediately know something went wrong. It could be a projection issue, it could be a bunch of different things, but do that check immediately. So this is just a check to make sure we have what we think we have. Next up, we're going to do a simple spatial join, and here we're doing it with resource data. However, many of you may want to do this with patient data: once you have a specific location, a lat/long that you've converted into points, maybe you want to know how many patients live in each zip code or census tract to develop a clinical prevalence estimate by area. Or maybe you've been working with crime data, and you want the total number of crimes by census tract to then develop a crime rate estimate; you can bring in population and calculate your own rate. So here we're going to use a really awesome function, `st_join`, in the simplest way possible: we take our point data and essentially attach, or stick on, all the information of whatever zip code each point sits on top of. Okay. The Geocomputation with R text, which I think has been linked a couple of times already, has a whole section on this; go to the vector operations section, or just search "spatial join R" anywhere and you'll get a lot of different results.
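The point-in-polygon join itself is one call; `st_join` defaults to the `st_intersects` predicate. Object names here are placeholders for the workshop's projected points and areas.

```r
library(sf)

# Each point picks up the attributes of the zip polygon it falls inside
pts_in_zip <- st_join(pts_proj, areas_proj)

# The row count should still equal the number of points; if it multiplied,
# something is off (duplicated polygons, a bad merge upstream, etc.)
dim(pts_in_zip)
```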
We don't have time to go into all the different topological intersection modes, but this is just the simplest approach for our purposes. So we do that, and we have the same point information, but now we also have the zip code information attached. You might be thinking: why would I do that if the zip code is already in my address? Well, imagine the address isn't there, or imagine you have tract-level data that you want to integrate, that sort of thing. It's the same concept regardless of what areas you're working with. At this stage, before we continue, let's look at the dimensions of this point-in-polygon data frame. Okay, 25; this has to match the total number of points we had, 25, great. If those two are very different, you need to go back and figure out what went wrong; this is a really common point where things break. For some reason, for example, I was working on a different iteration of this, and the point-in-polygon join multiplied the number of rows by a factor of nine. We couldn't figure it out; hours later, after a couple of restarts and just redoing everything, it was working correctly. But again: check your dimensions. And then, just to be sure, check the dimensions for your areas again; we're good. Next we're going to count. There are a lot of different ways to do this; again, sf is tidyverse-compatible. Here we're going to use a really simple, plain way to count the frequency of how many rows show up for each zip code. With this method, we only get zip codes that have nonzero frequencies, because it's only counting rows that showed up; if that doesn't make sense, I'll show you what I mean in a second.
The count comes out with columns named Var1 and Freq, so we're going to rename them to make the join really easy later on: we rename Var1 to lowercase zip, because that matches the master file, and then rename our new spatial variable. We look at that, and we merge it. Having all = TRUE here is going to be really important, so bookmark that for yourself if you want to check it out later: this way we're not losing zip codes that had zero frequency. And then we map it to make sure it did what we wanted. It does; this is success. At this stage, you can decide whether you want to change the NAs to zeros, because we know they actually are zeros and not missing values; that's up to you and your R skills. And this is just one way of doing it; once you have that spatial join file, you can do a lot of different work to get it to look how you want. All right, and then, to bring this on home very quickly, let's create some buffers. Point-in-polygon is helpful, but especially for providers it has limits: these two zip codes right next to this one provider, that zero right there, is actually very different from the ones by O'Hare, by the airport on the left-hand side. They're literally right across from a provider, and these boundary lines often follow street lines, but the zip code is considered to not have a provider, so that doesn't work out very well. It's a very, very conservative way of looking at things. Another approach is to create a buffer. So let's create a walkable buffer of one mile for each provider location, and then count buffers by area to see if that gives us a little better idea. In this map we can see that one zip code on the west side has the most providers located within it.
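The count-and-merge step might be sketched like this, using base R `table()` as the "simple, plain" count. Object and column names are assumptions; the transcript uses `all = TRUE` in the merge, and `all.x = TRUE` shown here likewise keeps every zip polygon on the area side.

```r
# Count rows (providers) per zip; zips with zero providers won't appear yet
counts <- as.data.frame(table(pts_in_zip$zip))
names(counts) <- c("zip", "n_providers")   # rename Var1/Freq for the join

# Keeping all area rows means zero-provider zips survive the merge as NA
areas_proj <- merge(areas_proj, counts, by = "zip", all.x = TRUE)

# Optional: we know the NAs really mean zero here
areas_proj$n_providers[is.na(areas_proj$n_providers)] <- 0
```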
So, for buffers, it's one line of code. For those of you who have used ArcGIS or QGIS, it's shocking how easy this is; that's really the reason I switched over. It's `st_buffer`, passing the points and 5,280 feet, which, since feet are the unit of my CRS here, is equal to one mile. There's also a units package if you want, but you can just Google "one mile equals how many feet" and you've got it. And that's it. This creates an actual new spatial file, and it contains polygons, not points. So let's map this really quickly. We can see there are some concentrations with many walkable MOUD sites, and then other areas without. So here is another technique to count the total buffers by area; I challenge you to find a more optimal way. Again, you can pipe all of this to have it done in one step, but I find that when you're learning, it helps to have it sectioned out step by step. And then we stick the buffer counts back into the master file. If we look at the area file, we now have our zip code area file with these two new variables added. And if I want, I can also merge in the CSV of all the other stuff we did earlier as well. If we map that, we should be able to see some pretty cool stuff. So here is what comes out. I've also gotten a little fancier with the buffers: I filled them with light gray and made them transparent, so it creates this cool overlay effect where, when two buffers overlap, the area gets a little brighter; brighter is the term I'm looking for. What's interesting here is that the place on the west side is no longer as noticeable.
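The buffer step and the buffer count can be sketched as follows; 5,280 ft equals one mile in the feet-based EPSG:3435 CRS. Object names are placeholders, and the count-per-area technique here (a join followed by a tally) is one reasonable reading of the approach described above.

```r
library(sf)

# One-mile "walkable" buffer around each provider; output is polygons, not points
buffers <- st_buffer(pts_proj, dist = 5280)

# Count how many buffers intersect each zip: join areas to buffers, tally per zip
zip_buf    <- st_join(areas_proj, buffers)
buf_counts <- as.data.frame(table(zip_buf$zip))
names(buf_counts) <- c("zip", "n_walkable")
```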
If you look at this place on the south side, it only has two providers in the zip code, but multiple ones surrounding it, so it's actually now considered a little more accessible from that perspective. However, note those three zip codes we were really concerned about, up here to the left and the two on the west side: they are not even within walkable distance of MOUDs. Okay, so that's extending the hypothesis a little further. And from here on out: we commonly develop metrics like this and then use them in statistical models. That's how you start to operationalize the trends you see: by trying different things out, developing a metric that matches what you've learned, something that's backed by the evidence as well, hopefully, and then using it in a statistical model. Another type of model would be a point pattern analysis, but that's another topic. I include some questions here just to think about, but I think this is a good place to stop so that we have at least 15 minutes for Q&A, or we may get booted out, to give everyone some time to chat. Any questions, or resources folks want to share? Or actually, I'm going to switch back to the slides, because Susan had a really nice final wrap-up piece I want to share. I think we've touched on all of these resources at some point over the course of the last few hours, but we'll share these links as additional places to learn more, and of course there's been a ton more shared in the chat, so thank you to everyone who's been asking questions and participating in the chat as well. I'm just going to drop the links from the slide right here. I also wanted to add that the Opioid Environment Toolkit has a tutorial for pulling data from the census and having it ready for mapping immediately.
That's based off of Kyle Walker's documentation, so check that out; and there's also a tutorial about how to create the minimum distance measure that Susan was talking about earlier. Just before opening up to questions, we also wanted to highlight that there's a session today highlighting all the great posters folks have submitted, in this cool spatial chat platform, so definitely check that out later today. I was trying to add a little celebration icon, but it misunderstood what I wanted it to do. Feel free to connect with us; I feel like Twitter is the new Stack Overflow in so many ways. So if you have questions, I think rspatial is the Twitter handle, or the hashtag, I forget; if you just start to follow, and many of you are probably here because of that as well, a lot of the developers themselves are on Twitter, so that's super helpful for resolving questions. For example, would developing a short one-pager about offline geocoders be helpful for folks? Just feel free to let us know what would be helpful going forward. I'm going through the chat now; you all are super kind. Oh, and then a question about pan-US geospatial data. Yes, so OpenStreetMap. Oh, no, no: US-wide zip code polygons. So for US-wide zip code polygons, that's already cleaned and ready to go in the OEPS database, and we literally created that partly because people kept asking us for data; we found an excuse in one of the grants to put it all together. I think that would be a great resource; I believe all the zip codes are available as a GeoJSON there. You can also get it if you Google "TIGER zip," though I will note that the new Census data site is not as intuitive or easy to use as the older one, so note that. And for folks outside of the US, OpenStreetMap is a really phenomenal resource for data. I think one of the TAs is also slightly famous for a package she developed; I don't know if you wanted to talk about that for a minute, or about geospatial data access for places outside the US. I think that was Shell. I also just wanted to note, and someone else put this in the chat already, that the tigris R package has the TIGER/Line shapefiles built in, so that's a great way to integrate accessing those shapefiles with R. Oh, Shell had to step away. Okay, well, look up her name; it'll be in the slides, and I'm sure we'll see her throughout the rest of the conference. Cool. And I think creating more data packages is something of great interest. So if you're interested in shifting to the development side: we've been thinking about turning the OEPS database, which, again, is about opioids for the grant but is really just a repository of data that social science researchers need, into an R data package. If there's interest in helping with that, just reach out; it could be a fun project. I know spData and spDataLarge also have a lot of really great data available for different things. Oh, and it's tigris: T-I-G-R-I-S. Let me see. I think the hardest part of this field is just getting used to the jargon, because unless you know the jargon, it's hard to Google your answers. But all good. All right, well, thank you all, and a huge thank you to the TAs: you've been not only supporting the workshop and answering questions but generating new ideas, so this has been super, super interesting.
Yes, and we are geospatial nerds, so that sounds like a plan. Thank you so much; we'll see you at the rest of the conference. Bye, everyone.