You can hear me? Hi, everyone. Welcome to Map It with Python. In this talk, I'm going to introduce you to maps and GIS. And then we're going to do some exploratory mapping with Python's mapping modules. My name is Christy Heaton. I'm a GIS analyst for an environmental consulting firm in Seattle. So I make maps and figures for our reports and meetings. GIS stands for Geographic Information Systems, and I'm going to talk more about that later. I also organize a group in Seattle called Maptime Seattle. We organize monthly workshops on various mapping technologies, and we also have fun mappy hours. All of our workshops use open source data and software. So if you're ever in Seattle, please check us out. We're on meetup.com.

So I love Jupyter Notebooks. This entire presentation is inside of a Jupyter Notebook. And just to prove that, this is a live Python slide. I'm using a library called RISE to make this notebook into slides, and it allows me to run code right here in my slides. But that's the only cell I'm going to run live, just a live demo kind of thing. I ran the rest of the slides that I'm going to present just before coming up here. That said, all this stuff is in a GitHub repo. So if you have any questions about anything, you can look at my repo, and there are instructions on how to get all set up in there. But just a quick overview: I'm running Windows, I'm using Anaconda, I'm in an Anaconda environment, and I'm using various mapping modules that I'll go through.

So let's talk about maps. Maps are the quickest and easiest way to understand spatial information. So if you get off the plane in Washington, DC, and you go to the metro, there's no better way to understand how to use it than by looking at a map. And maps can also be very beautiful and really, really cool. And maps have been around a long time.
We love looking at historical maps to see what places looked like a long time ago and what was important to the people at the time. They've been used for spatial analysis for a long time as well. Of course, in this day and age, they're all in our mobile devices. And you may have used a map even today just on your mobile device. So we're kind of in a whole new era of mapping now.

So I said I make maps for an environmental consulting firm. And a lot of times when I tell people I make maps, they say, what do you mean? Hasn't everything already been mapped? So in order to explain this better, I'll talk about the difference between base maps and thematic maps. A base map is a map that's used as a reference. It shows you what's there, or a subset of what's there, depending on how zoomed in or out you are. So this is some aerial imagery. My source is Nearmap. I use this at work. It shows you what's there. This is actually Petaluma here. So this could be used as a base map. And base maps can also be based on data layers. Like Google Maps is a data-based base map. But I'm going to be showing you some open source options. So Stamen has created a bunch of maps that I really like. I'll just go to their website real quick. I'm relying on the internet here, so sorry if it's a little slow to load. But they make some really nice, kind of traditional data-based base maps. These are all based on open source data as well, as opposed to Google. They also have this really nice watercolor, really stylized base map. And this is the one I like to use for all my side projects, because it's fun and I can't use it at work.

So not all base maps have been made, because there are always changes in landscapes and infrastructure. Sometimes political boundaries change, or natural disasters will change the environment. So they'll always need updates.
But even if that weren't the case, you can always make a new base map by using different colors, different scales, and styling different features in different ways.

So thematic maps are the kind of map that I make at my job. It's a map with a theme. That theme is usually shown on top of a base map, so you have that reference. And that theme is there, but it's not something you would see from above. So to give you an example, this is a cool thematic map that I just found. The theme is climate destabilization. It's got a base map, like the country boundaries, but it's colored in by, it looks like, temperature variation. And there are all these infographics there. So this isn't my map, but this is an example of a thematic map. You've probably seen maps like this all the time. So that's the kind of thing I do.

So even if all the base maps have been made, and even if all the thematic maps have been made, there are also other planets and galaxies that need mapping. And people have actually even tried to map fictional worlds. I met a professor last year who had some students who were looking at mapping the Lord of the Rings. In Lord of the Rings, there's this hobbit named Frodo who has to carry this magical ring all the way to Mordor, and they were looking at whether he had taken the best path. And I don't know if they figured it out or not.

So what is GIS? GIS stands for geographic information systems. A GIS is a system that allows you to work with spatial data: to visualize it, analyze it, store it, and do different things with different data sets in order to understand spatial relationships, patterns, and trends. GIS is widely used in organizations of all sizes in almost every industry. I use it in environmental consulting now, but it's used by NASA, it's used by every level of government, so it's very widely used. And what I really love about GIS, and what brought me to it, is that it's a really cool mix of data science, analysis, and maps.
So we use GIS to answer questions with some kind of spatial component. So where's the Mystic Theater? How do I get there? You might have looked that up. But it's also used for site selection. So where should we build our next store so that we get the most customers? Where should we build our next wind turbine? It has to be in an area that would allow that kind of thing to be built there, and also a really windy area, and you can do that kind of spatial analysis. Same thing with solar panels. It can be used for disaster relief. Where's the hurricane gonna hit? We can focus on those areas in terms of relief by knowing where we expect it to happen. Something I do at my job is find out where the highest concentrations are of different chemicals at sites we're trying to clean up, so that we can apply the best remedy in that area. And my least favorite spatial question: where should we place advertisements so that people are most likely to buy our product? This is a big use of GIS.

So before we get into a workflow, there are a few things I want you to know. The earth is spherical, and all the data we look at is on a flat computer screen or on a printout. So every data set has been projected somehow. And whenever you do that, if you imagine peeling an orange and trying to flatten it out, you're gonna get something like that. It's not gonna be a perfect rectangle after that. So depending on which projection you choose, you're gonna get some kind of distortion in some or all of these areas.

Another thing to know about is coordinate systems. This is how you locate a spot on the earth using two numbers. It's just like in geometry, where you have your Cartesian coordinates and the origin, and your two numbers tell you where on that plane you are. Coordinate systems can be based on the spherical earth or on a pre-projected area.
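To make the distortion point concrete, here is a minimal sketch using the spherical Web Mercator projection. The talk does not name this projection, so it is used here only as a familiar example, and the function names are hypothetical. It converts longitude/latitude to flat x/y coordinates and reports how much the map stretches distances at a given latitude.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by spherical Web Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair (degrees) to Web Mercator x/y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg):
    """How much Mercator stretches distances at this latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Illustrative latitudes only: the stretch grows as you move away from the equator.
for label, lat in [("equator", 0.0), ("mid-latitude", 45.0), ("far north", 75.0)]:
    x, y = to_web_mercator(0.0, lat)
    print(f"{label:12s} lat={lat:5.1f}  y={y:14.1f} m  stretch={mercator_scale_factor(lat):.2f}x")
```

Other projections trade off distortion differently (area, shape, distance, or direction), which is why the choice of projection matters for any measurement you make.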
So in our workflow today, we're gonna be using the geographic, or unprojected, coordinate system, and this is also known as WGS84 latitude longitude. This is the coordinate system that's used by GPS satellites and by the file format GeoJSON, if you're familiar with that. This coordinate system covers the entire world, and it has an origin at the intersection of the equator and the prime meridian, and then otherwise it has four quadrants. So something you need to watch out for with this coordinate system is that it has an origin, a zero. So whenever you bring in data where maybe some of it doesn't have coordinates, it ends up at Null Island. And over the years so much data has piled up there that there's a whole island, and this is the secret spot where us cartographers go on vacation. Just kidding.

So when you work with projections and coordinate systems as I go forward, it's gonna be stored as an EPSG code. That projection and that coordinate system kind of get mixed together, and you tell it an EPSG code. That's what it's referred to by, and if you need to know the EPSG code of something, you can look that up.

Okay, so my spatial problem today is I wanna know in what cities we will be able to see an upcoming solar eclipse. And this all started last year on August 21st. It was my birthday, and I live in Seattle, so I went down to Oregon to see the totality, and it was so cool. I was telling my sister about it. She wasn't able to go. She was kind of intimidated by people talking about bad traffic, and she didn't have enough time to plan in advance. So she was kind of talking to me about, how do I figure out when the next one is and where I can go? So if you're not familiar with a solar eclipse, it's what happens when the moon comes directly between the sun and a spot on the earth. And when you look at the moon and sun at this time, it looks kind of like this.
It only lasts for a very short time, but it's super, super cool. So my sister wants to know when and where she will be able to see an eclipse. She missed last year, so she needs to know what's gonna happen in the future. And she wants to be part of a big event like last year's one. It covered a lot of people, and it got a lot of hype. So she wants to see which eclipse is passing over the most people. And then when she finds that eclipse, she wants to know what the largest cities are, because it's probably not happening where she lives, so she's gonna have to travel, fly into an airport, stay in a hotel. So she wants to find out what the largest cities are that are going to be able to accommodate her.

So to do this, we're gonna need some Python mapping libraries. The first one we're gonna use is Matplotlib, which is a Python plotting library that produces beautiful publication quality maps, and it can be in either static or interactive formats. So we'll go ahead and import that. And then the other two libraries we're gonna use are pandas and GeoPandas. Pandas is a very well-known Python library for data science. It provides high performance, easy to use data structures and data analysis tools. And then GeoPandas is geographically enabled pandas. So it's like pandas, but it uses this library called Shapely in order to manipulate planar geometric objects. So we'll import both of those with kind of standard naming conventions.

So let's start by making a simple map. When you install GeoPandas, you get some data along with it, and one of the things you get is country outlines, so I didn't need to go and find that data by myself. I can just load it in and assign it to a variable. So spatial data has a spatial component and a non-spatial component. We call it an attribute table, but it's the non-spatial information about your data.
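As a rough sketch of the two components just described, the block below builds a tiny GeoDataFrame by hand rather than loading the bundled country outlines (how sample data is shipped differs between GeoPandas versions). The country names, populations, and shapes are invented for illustration; only the usual `gpd` import alias and the `geometry` column convention come from GeoPandas itself.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up "countries": the attribute columns are the non-spatial part,
# and the geometry column holds the spatial part.
world = gpd.GeoDataFrame(
    {
        "name": ["Squareland", "Triangalia"],   # hypothetical names
        "pop_est": [1_000_000, 250_000],        # hypothetical populations
        "geometry": [
            Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
            Polygon([(20, 0), (30, 0), (25, 10)]),
        ],
    },
    crs="EPSG:4326",  # WGS84 longitude/latitude
)

# The first few rows, attributes and geometry together.
print(world.head())

# Just the attribute table, with the spatial column dropped.
attributes = world.drop(columns="geometry")
print(list(attributes.columns))
```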
So I can just run head on my world data and look at the top few rows. I have names of my countries, I have some population and GDP information, and I also have this geometry column, and that's holding the spatial information.

So before I look at the spatial component, I want to check what my coordinate reference system is for this data. If I'm doing any plotting or calculations, I need to know what my reference system is. I'm just asking for it to tell me what it is, and it's in this EPSG:4326, which is that geographic coordinate system that I told you about earlier.

So now we can plot it. I'm just plotting with defaults, just a simple plot method. And you might notice that I was just telling you that this is in an unprojected coordinate system, and yet it looks flat, so it's been projected, right? It projects it on the fly in order for you to see it, but any spatial calculations you do are gonna be in its source coordinate reference system. I can customize my plot: I make it a little bit bigger for you guys to see and change the colors of things. And just as a reminder, all this code is on my GitHub, so you're free to fork and clone and get it up and running and tinker with anything you want.

Okay. So I have a basic map, but I need to load in some other data. If we're gonna be talking about populations and eclipses, I'm gonna need data for that stuff. So I'm gonna import os, and then I'll have access to my operating system paths. I put some data in a data folder. Actually, GeoPandas comes with some city data, but it turns out it's just country capitals, and that's not gonna be enough data for me, so I went and downloaded a larger city data set that I'm going to load in. And just as I like to do whenever I'm loading in a new data set, I take a look at the first few rows of attributes. So cities here has some columns of interest. We have our geometry column, a population column we're gonna use, and the name of our city.
So those are the three columns that are important to us. And then, as our workflow goes, we'll check the coordinate reference system, and we see that it's in 4326, which was the same one that we were using for our countries. And I can plot it out with a few customizations. So this is our city data that we're gonna be working with.

Now I wanna plot my cities on top of my countries. They need to be in the same coordinate reference system if you're gonna plot them on the same plot, so I'm just doing a quick check that they are. And then I can put them on the same plot by assigning my base plot to a variable, in this case called base. And then for any other data I want on top of it, I pass in this ax parameter and tell it I want it to be that base. And then I can put as many layers on that base as I want to. So now we have cities and countries.

Okay, so now we're ready to tackle my sister's questions. She wants to know where and when she will be able to see an upcoming eclipse. So I'm gonna need some eclipse data. I actually went to a NASA website and downloaded data, and then fiddled with it until it was in a form that was really easy to show. So I'm creating a variable that's gonna load in my eclipses, and I'll look at the first few rows. So this year column, that's what I've got to tell me when this eclipse is gonna happen. We're ignoring day and month for now. And then I've also got my geometry column. Check my coordinate reference system. It's in the same one we've been using. Okay, and I can plot my eclipses. This is my map, so I want my eclipses to look really eclipsy, with shadowy insides and golden outsides. So this is how I think eclipses look, and you can customize it however you want.
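The layering pattern described above (plot a base layer, keep the returned axes, then pass it as `ax` to later plots) can be sketched as follows. The geometries, names, and colors are invented for illustration, and the `Agg` backend line is only there so the sketch runs without a display; in a notebook you would normally leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in a notebook

import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-ins for the country and city layers.
countries = gpd.GeoDataFrame(
    {"name": ["Squareland"]},
    geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])],
    crs="EPSG:4326",
)
cities = gpd.GeoDataFrame(
    {"name": ["Alpha", "Beta"]},
    geometry=[Point(2, 3), Point(7, 8)],
    crs="EPSG:4326",
)

# Layers drawn on the same axes should share a coordinate reference system.
assert countries.crs == cities.crs

# The first plot call returns a matplotlib Axes; keep it as the base layer.
base = countries.plot(color="lightgray", edgecolor="white", figsize=(8, 4))

# Every later layer is drawn onto that same Axes via the ax parameter.
cities.plot(ax=base, color="darkorange", markersize=20)
base.set_title("Cities on top of countries")
```

If the two layers were in different coordinate reference systems, one of them could be converted first with `to_crs` before plotting.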
One thing you can quickly do with GeoPandas is get the minimum bounding rectangle of each of your entities, and this comes in handy when you're trying to zoom to certain things or find the centroid. I'll show you an example of where this comes in handy.

Okay, so we'll create a better map of our eclipses, because now we've got our base. We don't know when these eclipses are gonna happen until we color them in by the year they're going to happen. So in this case, I've created some customizations. I've made my eclipses kind of see-through, and I've said, color them in with some random color depending on the year they're gonna happen. So now we have a legend here. I've actually kept in last year's eclipse, just because there was space for it and it kind of looks the same. But other than the 2017 one, these are all in the future.

Okay, so now she knows where and when she'll be able to see an eclipse. To tackle her next question, we're gonna ask what upcoming eclipse will pass over the most people. So now we're gonna need to plot our base, our countries and our eclipses, as well as our city data, all on the same plot. I'm just doing a quick check that they're all in the same coordinate reference system. And then I can plot them all together. It looks like there are a lot of cities that are gonna overlap with some of these eclipses. So I think we're on the right track. In order to get the cities that intersect with an eclipse, we can do a spatial operation. It's a spatial join, and I'm telling it I want to create a new variable that only has the cities that intersect with any of those eclipses. And I can look at the table of my data. So I have all my cities, and now they have a year column, and that's telling me what year they're intersecting with some eclipse. So I need to summarize the populations of the cities depending on what eclipse they're in.
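A minimal sketch of the spatial join described above, using toy data. The column names `Year`, `NAME`, and `POP_MAX` and all values are assumptions made for illustration, not the columns of the talk's actual data. The `predicate` keyword is used by recent GeoPandas releases; older releases called the same argument `op`.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two hypothetical eclipse paths, each tagged with the year it happens.
eclipses = gpd.GeoDataFrame(
    {"Year": [2024, 2026]},
    geometry=[
        Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
        Polygon([(10, 10), (14, 10), (14, 14), (10, 14)]),
    ],
    crs="EPSG:4326",
)

# Three hypothetical cities: one in each path and one outside both.
cities = gpd.GeoDataFrame(
    {"NAME": ["A", "B", "C"], "POP_MAX": [500, 300, 900]},
    geometry=[Point(1, 1), Point(11, 12), Point(50, 50)],
    crs="EPSG:4326",
)

# Inner spatial join: keep only cities that intersect some eclipse path,
# and copy that eclipse's attributes (here, Year) onto each matching city.
cities_in_path = gpd.sjoin(cities, eclipses, how="inner", predicate="intersects")

print(cities_in_path[["NAME", "POP_MAX", "Year"]])
```

City C falls outside both polygons, so the inner join drops it; a city lying inside two overlapping paths would appear once per path.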
So I'm gonna pull out, as a pandas data frame, just the population and year columns from my new intersected cities data. And that looks like this: it's a pandas data frame with just population and year columns. And now that it's a pandas data frame, I can do a group by on my years, which represent eclipse years, and sum the population of all the cities that are inside of each one. The results of that look like this. So now I know what the population is gonna be of the eclipse in that year. But this is now a non-spatial pandas data frame, and I need to join it back into my eclipse data. I can merge it back in, because my eclipse data has a year column and my pandas data frame has a year column. So I can just stitch them back together based on the year. And now my spatial data, my eclipses, have a population associated with them.

And in order to find out which one is gonna cover the most people, I can just view the attribute table of my new eclipses and sort descending. So I see this top row has the highest population, and that's the eclipse that's happening in 2024. And I can plot this on my map. Now I've customized my plot so that if an eclipse has a higher population it's covering, then it's gonna appear darker on this map. Some of them are really hard to see. There are really just two that are covering quite a lot of people. And it kind of makes sense. They're passing through large land masses. So this kind of passes the sniff test. So it was this eclipse, this brown one. This is a plot I showed you before, but it's this 2024 eclipse, and we can see that it's the one that's going through the United States.

So now she knows which eclipse is gonna pass over the most people: the one in 2024. So what are the largest cities inside of that eclipse? Then she knows where she can buy a ticket to and get a hotel room.
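The summarize-and-merge step above can be sketched with plain pandas. The column names `Year` and `POP_MAX` and every number below are made up for illustration. In the talk the eclipse table is a GeoDataFrame, which offers the same `merge` method, so the pattern carries over.

```python
import pandas as pd

# Hypothetical result of the spatial join: one row per city inside an eclipse path.
cities_in_path = pd.DataFrame(
    {
        "Year": [2024, 2024, 2026, 2027, 2027],
        "POP_MAX": [1_200_000, 800_000, 150_000, 300_000, 50_000],
    }
)

# Total city population under each eclipse, keyed by year.
pop_by_year = cities_in_path.groupby("Year", as_index=False)["POP_MAX"].sum()

# Hypothetical eclipse table; 2028 has no cities under it (say, an ocean path).
eclipses = pd.DataFrame({"Year": [2024, 2026, 2027, 2028]})

# Left merge keeps every eclipse; eclipses with no cities get a population of 0.
eclipses_pop = eclipses.merge(pop_by_year, on="Year", how="left")
eclipses_pop["POP_MAX"] = eclipses_pop["POP_MAX"].fillna(0)

# Sort descending so the eclipse covering the most people is the top row.
ranked = eclipses_pop.sort_values("POP_MAX", ascending=False)
print(ranked)
```

A left merge is a deliberate choice in this sketch: an inner merge would silently drop any eclipse that passes over no cities in the data set.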
So first we're gonna make a new geodata frame that's only gonna have that one eclipse in it, the one in 2024, and plot again. Now our plot is really zoomed out too much. We kind of wanna zoom in on that one eclipse. So we're gonna use the minimum bounds of that eclipse in order to customize our plot to focus on it right there. And I really liked my cities being orange when we were looking at the whole world, but at this scale I'm gonna change them to be little black dots with white outlines.

So there are a lot of cities in that eclipse path, and we wanna find out exactly what cities are in there. So we're gonna do the same kind of spatial join that we used earlier on to get all the cities that intersect with any eclipse, this time to get just the cities that intersect with our 2024 eclipse. And if we plot that, we can see it worked. We see that just the cities that intersected remain. So one way I could see which of these cities has the most people is to size my city dots by their population. So now my dots are bigger if they have more people in them. So this is kind of cool. It's still kind of hard to tell which ones are the largest. So I can just sort descending in a table and have it tell me which are the top five. I just write a little bit of code to tell me some nice information. It turns out there are 73 cities in my eclipse path, and the largest five, sorting descending based on population, are Dallas-Fort Worth, Montreal, Cleveland, San Antonio and Indianapolis.

So let's look at those on a map. We'll turn those cities into their own geodata frame and plot them out, and add some labels. I've also lightened up that eclipse path so you can see the cities a little bit better. So this is the map I can give my sister. Yes. But my sister doesn't write code, and she doesn't know how to use a Jupyter Notebook. She's more used to the Google Maps experience, and I call that kind of map a slippy map.
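A minimal sketch of the single-eclipse steps above: filter to one year, read off its bounding box for zooming, join the cities to it, and rank them by population. All names, columns, and values are toy assumptions, and the sketch takes the top three rather than the top five to keep the data small.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

eclipses = gpd.GeoDataFrame(
    {"Year": [2024, 2026]},
    geometry=[
        Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
        Polygon([(10, 10), (14, 10), (14, 14), (10, 14)]),
    ],
    crs="EPSG:4326",
)
cities = gpd.GeoDataFrame(
    {
        "NAME": ["A", "B", "C", "D", "E", "F"],
        "POP_MAX": [500, 900, 100, 700, 5000, 9999],
    },
    geometry=[Point(1, 1), Point(2, 3), Point(3, 2), Point(3.5, 0.5),
              Point(11, 12), Point(50, 50)],
    crs="EPSG:4326",
)

# Keep only the 2024 eclipse.
eclipse_2024 = eclipses[eclipses["Year"] == 2024]

# Bounding box of that eclipse; these values can be passed to the plot's
# set_xlim(minx, maxx) and set_ylim(miny, maxy) to zoom in on it.
minx, miny, maxx, maxy = eclipse_2024.total_bounds

# Cities that fall inside the 2024 path, largest first.
cities_2024 = gpd.sjoin(cities, eclipse_2024, how="inner", predicate="intersects")
top_cities = cities_2024.sort_values("POP_MAX", ascending=False).head(3)

print(f"{len(cities_2024)} cities in the path; largest: {', '.join(top_cities['NAME'])}")
```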
It's the kind of map you can drag around and zoom in and out. So let's put this data on a slippy map. To do that, we need a library called Folium, which binds the power of Python with Leaflet.js. So I'll go ahead and import that. And now I can quickly pull in web map tiles as the base maps. These are all open source web map tiles, and I showed you a few of them earlier. It's really easy to just pull these in; it's drawing them in from the internet. And I focused on Petaluma. So we have a few options with web map tiles. This is the organization Stamen that I talked about earlier. They have a really nice black and white base map we could use. But as I said, I really like the watercolor map because I can't use it at work. So we're going to go with this one.

And then we're going to calculate the centroid of our eclipse. When you are using Folium and that kind of slippy map, you can drag and zoom as much as you want to, but it wants to know what zoom level and what area you want to start on. So I just want to calculate the centroid of my eclipse so I can say, this is where I want you zoomed in right when I create you, Folium map. Then I can add my eclipse cities to my Folium map. These are just the top five cities in my eclipse path. And that looks like this. So my sister's like, OK, this is something I can use. But from here, we can't really see what cities these are. So I can write a little more code, a for loop that's going to generate pop-up text for me, and then put that all on a new map. And it's going to look like this. So now I can click on these cities and see what their name and population are. So I think my sister will be happy. She'll be able to decide on one of these cities based on maybe some non-spatial attributes, like maybe she has a friend in San Antonio, or she's always wanted to go to Montreal. Or weather.
So yeah, she'll be able to choose from here. So I hope you enjoyed learning about Python's mapping modules. Happy mapping.

Did I miss the end of the talk? Oops. So as a token of our thanks, here is a wonderful made-in-Petaluma water bottle, or rather a designed-in-Petaluma water bottle, made by CamelBak and etched by Andrew on our team. Christy also did a wonderful PSF blog post about last year's conference, which saved us from writing a lot of stuff about last year's event. So we really appreciate that, and thank you very much. Everybody, please give Christy Heaton a huge round of applause.