So, from here, what I want you to do is right-click on this link that says "code for hands-on workshop, fall 2018" and open it in a new window. You should see a GitHub page like this, and if you use GitHub you'll probably know how to pull this repo down. If you don't use GitHub that much, just click on this green button; you'll get a drop-down menu, and then you can click "Download ZIP" and it will download all of the code that we're going to use today. You can put it anywhere you want and open it up as a project in RStudio as you see fit.
The main reason we're going to do that is because once you do, and I'll do it with you, I want to run this patch packages program first. As I do that, let me just note that what it's going to do is ensure that you have all the libraries that we're going to use today. This should not cause any problems on your computer, but if you have some concern about the versions of the packages you're running, maybe you don't want to run it. I would find that really strange, and I can't imagine why that would be the case, but I feel honor bound to at least mention that this possibly updates some packages.
So from this expanded view of what we just downloaded, you can double-click on map-fall 2018, which is the R project file, and whether you're on a Windows PC or a Mac, it should launch you straight into a project in RStudio. I don't know if you use projects in RStudio, but the nice thing about them is that they allow you to reference all of your data and your scripts from a relative position. In other words, you don't have to have these really long working directory paths and you don't have to use the setwd function, which, to be fair, is not particularly reproducible. That's one of the reasons why we promote projects.
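A minimal sketch of the relative-path idea, not taken from the workshop files. The folder and file names (`demo_project_`, `data/example.csv`) are hypothetical, and the `setwd()` call appears only to simulate what opening an `.Rproj` file does for you automatically.

```r
# Hypothetical project layout: <project root>/data/example.csv
project_root <- tempfile("demo_project_")          # stand-in for a real project folder
dir.create(file.path(project_root, "data"), recursive = TRUE)
write.csv(
  data.frame(name = c("a", "b"), value = 1:2),
  file.path(project_root, "data", "example.csv"),
  row.names = FALSE
)

# Opening an RStudio project sets the working directory to the project root.
# We simulate that here; in a real project you would not call setwd() at all.
old_wd <- setwd(project_root)

# With the project open, a short relative path is all a script needs.
example <- read.csv(file.path("data", "example.csv"))

setwd(old_wd)                                      # restore the previous directory
print(example)
```

Because the script only ever says `data/example.csv`, it keeps working when the whole folder is moved or shared, which is the reproducibility point made above.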
So I'm going to open this 00 patch packages .Rmd file, and the easiest thing to do from this point is to just click the green arrow right here in the first chunk, the one that says the following packages enable the code in the workshop. I'm going to run that while we're talking, it's going to do its thing, and then I'm going to run a different one on my machine, because my machine needs some other patches since we're in the lab.
Alright, so the first student who walked in, I'm sorry I didn't catch your name, I said, by the law of probabilities, I'm betting that you're a grad student in the Nicholas School, and he agreed that he was. We're a little smaller group than I was expecting, but I'm still going to bet that there's another grad student in the Nicholas School in this room. Is there? There we go, three of them. And you're either public policy or medicine or engineering. Nope, wrong on all three. Oh well, you can't do so well on probabilities when you only have an n of 4. But grad student? Undergrad, oh great. And what's your project, just out of curiosity? Super. Oh yeah, there's an interesting visualization in our lab; you've probably seen it? Yeah, it's really kind of cool. That visualization, just to bring everybody else up to speed, takes satellite images and shows a before and after of mountaintop removal and how that's changed the landscape, and it's really quite fascinating.
Right, so I'm going to go back to this page and take a look at the slides. I will say up front that I am much more of an R enthusiast than I am a geographer. I'm definitely not a geographer. I know how to make a lot of maps and do some geospatial analysis in R, but if your question gets moderately advanced, that's when Mark's going to show back up. 
Mark is our GIS specialist. There's plenty about GIS I don't know, but that should not be a problem in any way, because this class is mostly introductory. If we need to go further on a particular question of yours, that's when I would encourage you to either make an appointment or come see me in my open office hours. If I didn't say it, I usually work on Wednesdays from one to three in the lab.
This is part of the R fun series. We do workshops on other things in R as well as other topics, but today we're primarily going to cover mapping, and it's really handy to have some sense of how the tidyverse works and how ggplot works in order to take this workshop. If you don't, it's no big deal; you can look in these background resources and catch yourself up later.
One of my other colleagues, one of the other GIS specialists, Drew Keener, was really keen on calling this mapping as opposed to GIS, because he had a strong sense that we weren't actually doing any particular analysis. Well, I think that's debatable, though we're certainly light on the analysis. The real point of this slide, and of Drew's comment, is that to some degree doing your spatial analysis in R is still relatively new. The behemoth application in the mapping and GIS space is ArcGIS, which is a proprietary tool. It's free on campus, but it's not going to be free when you get out into the world. There is an open-source cousin to that called QGIS that a lot of people use, and you may find that more people are using those tools for mapping. If you're really on the lighter side, there's Tableau, or if you're a programmer you could also use Python. But we're going to focus on R, specifically these three packages in bold: leaflet, tidycensus, and sf. There are a couple of other packages listed there that we're going to end up using too. 
There are some more advanced packages for mapping that we are not going to use, but it might be useful to reference these slides and know that those packages exist. The one that jumps out at me is raster. I don't personally do any raster analysis, but if you are one of those people, you're probably going to end up using that package.
So we're going to jump right in. We're going to do three things. We're going to make an XY plot, where we're just plotting latitude and longitude on a grid. We're going to use a tool called tidycensus to create a choropleth, and a choropleth is basically a thematic map. Then we're going to go beyond tidycensus for more fine-grained control of our thematic mapping.
All right, I hope this is not too basic for you, but I never know who's showing up. This particular map is what we're going to create next. In our case it really consists of just two layers. The layer that we're adding on top is all the little blue markers, which are the XY coordinates, the latitude and longitude. Underneath is a base layer, which is really a conglomerate layer. If you look at that base layer closely, you can see that there's a road layer; there's a labels layer, for example where you get Durham and Chapel Hill; there's clearly a water layer where you get waterways; and there appear to be some land characteristics in the green. I don't know what the gray is, but there's clearly a whole conglomeration of layers that's packaged up as one single base layer. That's a real convenience, but you don't always get that base layer; it depends on how you end up visualizing your map. I wanted to point that out because one of the central ideas, not only of GIS visualizations but of visualizations in general, is that you have these layers that you build up on top of each other. So we're going to end up doing that. 
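The two-layer map described above can be sketched roughly as follows. This is not the workshop's script: the `places` data frame and its approximate coordinates are made up for illustration, standing in for the Starbucks data, and the view settings are arbitrary.

```r
library(leaflet)

# Hypothetical points (approximate coordinates for three NC cities),
# standing in for the workshop's Starbucks locations.
places <- data.frame(
  name = c("Durham", "Chapel Hill", "Raleigh"),
  lat  = c(35.994, 35.913, 35.780),
  lng  = c(-78.899, -79.056, -78.639)
)

m <- leaflet(data = places) %>%
  addTiles() %>%                                   # layer 1: the packaged base map
  setView(lng = -78.9, lat = 35.9, zoom = 9) %>%   # center point and zoom level
  addMarkers(lng = ~lng, lat = ~lat,               # layer 2: the XY markers
             popup = ~name)                        # text shown when a marker is clicked

m  # printing the object renders the map in the RStudio viewer
```

Commenting out the `addTiles()` line leaves the markers in place but removes the roads, labels, and water that give them context.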
If anybody here has used ggplot, you're familiar with the idea of layering your images. All right, so jumping right in, let's go to our downloaded files in RStudio. I need to run one more package, and while that's running we're going to open up the 01 file, called 01georeference.Rmd, and as soon as I can I will expand the font on this so it's easier for you to see: Tools, Global Options, Appearance. It's always kind of the luck of the draw on fonts when you're putting them through the projector, so some audience participation will be helpful there. Is that font large enough for you? Okay. I can make it a dark screen if the contrast is not right; just let me know if it's difficult to see and we can change it.
The particular tidyverse coding style that we're using here is called literate coding, and it works really well with this R Markdown method. I'm not leveraging all of that, but there's a workshop on R Markdown and literate coding if you want to learn more. What it allows you to do is have these things called code chunks, which are surrounded by prose where you provide more literate commentary on what you're doing. 
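As a rough illustration of prose wrapped around a code chunk, the sketch below builds a tiny R Markdown document as a character vector and pulls the chunk back out. It is a simplification for teaching purposes, not how knitr or RStudio actually parse a file, and the chunk label `load-libraries` and the document text are made up.

```r
# Three backticks, built programmatically so this example nests cleanly.
fence <- strrep("`", 3)

# A miniature R Markdown document: prose, one executable chunk, more prose.
rmd_lines <- c(
  "## Load libraries",
  "",
  "Prose explaining what the next chunk does.",
  "",
  paste0(fence, "{r load-libraries}"),   # the {r ...} header marks the chunk as executable
  "x <- 1 + 1",
  fence,
  "",
  "More prose after the chunk."
)

# Locate the chunk: an opening fence with an {r} header, then the next bare fence.
open_at  <- which(grepl(paste0("^", fence, "\\{r"), rmd_lines))[1]
close_at <- which(rmd_lines == fence)
close_at <- close_at[close_at > open_at][1]

# Everything between the two fences is code; everything else is commentary.
chunk_code <- rmd_lines[(open_at + 1):(close_at - 1)]

# Run the extracted code in its own environment.
env <- new.env()
eval(parse(text = chunk_code), envir = env)
print(env$x)
```

In RStudio the green arrow on a chunk does the equivalent of the last step: it sends only the lines between the fences to the R console.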
So the first thing we're going to do, if we just execute this first code chunk, is make sure that we have the two libraries we need, tidyverse and leaflet. Then we're going to go down here and load some data from the data directory, and the data we're loading is all the Starbucks locations in the U.S. in 2012. Then down here at line 26 we're going to execute this code chunk, and we can take a quick look at this data frame. I'm just going to pop this out to make it a little bit easier to see; you don't have to do this. Here's our data frame of Starbucks information, and the thing that we're really interested in is the latitude, the longitude, and the name. The name vector is right there, and it's just the name of each store.
If we scroll over, this data set is almost an embarrassment of riches when it comes to being able to geolocate, because under location we have not only a full street address but also latitude and longitude. If you keep going, it still has street and everything broken out into its own fields, then it has coordinates, latitude and longitude together, and then it actually has latitude and longitude broken out separately. This almost never happens in real life; usually your data is far harder to pull together. It just so happens that this works really well for a workshop, and if you need help wrangling your data, we're happy to help you. But the data we really want to grab is these latitude and longitude columns.
So that's starbucks NC, this data frame right here. The first thing we're going to do is send it to leaflet. These pipes say "and then"; that's the tidyverse style. I'm going to stop right here and just run these first three lines, which I'm going to do by highlighting them and hitting Control-Enter. What you see here is a very general base map. This is the leaflet base map, and as I zoom in, it should be no 
surprise that I'm getting more detail. I can also set the view and set the zoom by adding the fourth line, so I'm going to Control-Enter on that. For the view, there are a number of different options if you look at the help. Let me pop this back in for just a second. I'm going to highlight setView and, on my machine, hit F1. I'm sure there is a Mac equivalent to that, but I don't know what it is. Hitting F1 opens the documentation for this particular library, and the setView page will tell you that there are a number of ways you can set the view. So that's how you can learn more.
Back to our code, and I'll just run these four lines again with Control-Enter. We're setting a latitude and longitude that define where the window sits, and we're also setting the zoom. Then the very last thing we're going to do, taking from this data frame up here, is add markers. Again, in the F1 help, if you look over here at addMarkers, there are a ton of different things you can add: tiles, circle markers, circles, rectangles, and the list goes on. It's good to know about all of those. We're just going to add basic markers, where we identify the latitude with the latitude in our data frame and the longitude with the longitude in our data frame, and I'm switching my syntax here just so you can see that there are different ways to do it. I'm also taking my popup and identifying it in my data frame as the vector name. Just to bring that home a little bit, here's the vector name in starbucks NC, and that popup argument does probably what you imagine.
So I'm going to run this one more time, and now I've got my latitude and longitude markers set automatically, and if I click on any one of them, that's where I get my popup. That popup gives you a lot of control over what you put in there, but it's stuff that you would have in your data 
frame to begin with. All right, so that's step one. What I'd like to do now is take five minutes or less, and we can decide as a group: go back to your Files tab and open this file that says exercise one, XY leaflet, .Rmd.
I covered it so briefly that I forgot I didn't say more about it. Where is that? It's over here. The tilde is really just a shortcut; it's the same as saying starbucks NC, dollar sign. You could do either one: instead of the tilde I could put this here, and instead of this I could put the tilde, and that will still work. I just put that there so you're aware that there are different ways. You'd feel like you really don't need the tilde at all, because it already knows it's looking at starbucks NC. Yeah, it should, but I think it won't work without it. I don't know if that's because it's leaflet or what; I just don't know, because all it's doing is referring back to, like you said, the data frame that you're piping in there. Yeah, it should know. So let's blame it on leaflet, because I have an undue amount of respect for the whole tidyverse world; I'm going to say that if the tidyverse developers had built this, you wouldn't have to do that.
Oh, that's a great question. We're going to cover that a little bit more, but it's this addTiles feature, and later in the workshop we'll manipulate that base layer to choose a couple of different ones. It's also worth noting that if we just comment out that one line, we're still plotting our XY points, but the context is harder to understand. Going back to that concept of layering, what if we didn't want that whole conglomerate base layer and we just wanted rivers? We'd have to go out and find a river layer, and the two of those might make sense together. Probably not Starbucks and rivers; maybe highways and rivers. Highways and 
Starbucks would make more sense. There are all kinds of bad scientific assumptions you can make about the locations of Starbucks and the frequency of those locations. You might assume that farmers don't like to drink coffee, if you project it that way. That would be wrong. I think farmers probably do drink a lot of coffee, but they don't drink it at Starbucks, because there are no Starbucks in rural areas; a location has to have a certain population size. Did we do population? We could do that: we could do the choropleth of population and then overlay it with Starbucks, and basically the Starbucks are all in high-population counties, and along the freeways. Yep. Oddly, if you go back up here, I like the fact that the very first Starbucks listed in this data frame is in Elizabeth City. I don't know if anybody's ever been to Elizabeth City. It's a nice town, but it's a small town.
Okay, so I think we've probably spent enough time on this, but I'm happy to let you guys drive. First question: does anybody want more time? All right, we're going to move on. Now, the answer file actually answers everything that we just did, but I have different classes that want to do things differently. Some ask me to go over the answers; some say let's move on. So, show of hands, how many people want to go over the answers? Okay, perfect. So we're going to go on to choropleths.
There are lots of kinds of maps that one can make, and we're only going to cover XY plots and choropleths, but if you want to get into different kinds of maps you can talk with Mark, Drew, and myself later. In case you don't know what a choropleth is, the idea is that you're shading a region or an area according to a variable. There are some problems with choropleths. This particular choropleth shows average wages from the Bureau of Labor Statistics over regions, and the regions in 
this case are states, so the darkest blue is the highest wage. One of the typical problems with choropleths is that the regions are not all the same size, and they don't necessarily represent the same population. So you often have to dig a little deeper to really understand what a choropleth is trying to tell you. As with all visualizations and all statistics, you can lie blatantly with these and create all kinds of misleading things. On the positive side, a choropleth can tell you a lot. In this case, I forget exactly, but I'm pretty certain, come to think of it, that these are wages for a particular occupation, something like overdose clinic nurses. I don't know exactly what one is to interpret about New Jersey here, but it is clear that New Jersey and California seem to have the highest wages. The lowest-wage places are shown in yellow.
Just as another point of explanation, I actually reversed the scale so that the colors would run from yellow to dark blue. My visualization colleague Eric Monson tells me that typically in visualization you want the brightest color to represent the highest number. I reversed it because I thought it was easier to interpret the other way. It doesn't really matter, because we have a scale, so there is some responsibility on the part of the person reading the visualization to figure out what we're trying to say. But that's the standard. So that's a choropleth.
What we're going to do now is use tidycensus, which is an R package for U.S. Census data, to learn a little bit more about choropleths and a little bit more about the U.S. Census. The first thing you need to know is that the census really is at least two things. There's the decennial census, which has been taken every 10 years starting from, I don't know, 1790, maybe 
even earlier, and that's in the U.S. Constitution. And then there's the ACS, the American Community Survey, which is relatively new. It's been going on at best 15 years, maybe not even that long, and it's sample data. The other thing you need to know about working with census data and mapping is this concept of census geography. So let's cover both of those.
This is the decennial census. Like I said, it's done every 10 years, and it's 100 percent, everyone in the country. I believe it's everyone in the country; I don't know how we handle homeless people, but there's an attempt to count everybody. It's used for creating congressional districts, and it's used for other kinds of policy points, but it's just counting things. It's not particularly interesting from a social characteristics point of view. That's where the ACS comes in.
The ACS is sample data. There's just a smattering of people in any given year who are asked, they don't have to but they're asked, to answer the American Community Survey. Then they ask these same questions over a five-year run, do some statistical magic to prevent you from identifying any particular individual, and release some trend information about the social characteristics of people. I'm guessing that nobody other than Mark and I remember, but back in 2000, when they didn't have the American Community Survey, they used to have this thing called the long form. So if anybody's ever filled out a census long form, you have a better sense of how the census works. Is that right? I think we might have gotten one too.
So there are those two different types of census, and it can be confusing, because sometimes people want a lot of social characteristic information and it simply does not exist in the decennial census. You have to know where you're headed. The other thing you need to know is this concept of census geography. Some of these are pretty 
straightforward. You can get the total number of people who live in the nation. There's an idea of region, and there's an idea of division. States are pretty easy: how many people live in each state. Counties are those subdivisions; we are not only in the city of Durham, we're also in Durham County, and our neighbors to the west, Chapel Hill and Hillsborough, are in Orange County. So you can get all that kind of information from counties.
This is where I think it gets a little more confusing, but also interesting. Let's start from the bottom. A census block is a little bit like a city block. I say a little bit like, because that really only holds true in urban areas. There are census blocks in rural areas as well, but there are no city blocks in rural areas, so the comparison doesn't quite make sense there. Think of it this way: the census block is the smallest unit by which the census counts. So if that's a city block, block groups are a little bit like a neighborhood, a collection of blocks, and census tracts are a little bit like a region of the city, a collection of block groups.
So you have to identify a variable, which is either going to be in the ACS or in the decennial census, and then you have to identify the geography, or reporting unit, that you want. You'll notice all these other colors, which we can kind of ignore, but there are different shapes and sizes that you can report on, and the level of detail at which you can report depends a lot on that visualization. Most of the time today we're going to stick to a very high level, like the total number of people in states; it's pretty easy. Just by way of information, this is an example of regions. You see regions up in blue up there. Regions are a nominal notion; they don't have any real importance. But here there are both regions, as the colors, and state boundaries, as the black and white. All right, so I 
mentioned that there's this other issue: after you identify the geography, or before, you also need to identify a census variable. How many people live in region X? You can actually get all kinds of information: how many housing units, or how many householders have an education level beyond high school, all that kind of stuff. To be fair, there are literally thousands of census variables. I'm going to open this bottom one in a new tab to give you an idea. We're not going to go into this in great detail, but there are three different places where you can read more about how to identify your census variables, or you can send us an email. Mark in particular has an encyclopedic knowledge of the census. He doesn't like it when I say that, but he knows more about the census than anybody else in the library, and more than a lot of other people. And it sort of goes without saying that just because you think there's a census variable for something doesn't mean there is, so sometimes you have to find proxies for things.
Just to bring that idea home, where I said there are literally thousands of census variables: this is a list of ACS variables, American Community Survey variables, for 2015. You can see this is an exhaustive list, and so far I've probably not scrolled through 10 percent of it. So it can be a challenge to figure out the census variable you want, but I picked out a few for you, and we're going to do that now. Oops, I went a little bit too far. I wanted to do this ggmap georeference one. Let's go ahead and go to the choropleth and then we'll come back.
So we're going to open this file, 02choropleth.Rmd, and let's first run this first code chunk at line nine. Oh no, I don't have tidycensus in here. I thought I did. Apparently not. I'm not sure what I did wrong. It's right there. Maybe I just needed to reinstall it. I mean, 
what I meant to say was, maybe I needed to restart it. So I'm going to run this again real quickly. Again, this is a problem that only exists in this lab; you probably should not have this problem on your own workstation. I'm going to go over here to Session, Restart R. Just one more time, go over here, run this, and there we go: "was installed by a different version, it needs to be". All right, one more time. What in the world is happening? "Namespace load failed for tidycensus: package rappdirs was installed by a different version, it needs to be reinstalled for use with this version." I just did that. All right, let's try reinstalling tidycensus. Oh my gosh, I don't know what to do now. I've never had this problem before. I'll tell you what I'm going to do: I'm going to shut this down and open it back up. Desktop, map files, run all, with this R version. Okay, I know what to do here.
Murphy's law of live coding examples is that something's going to break. Murphy has struck, but I am not going to let Murphy defeat our workshop.
Is anyone having trouble loading that stuff? You know what's really strange is I'm having a similar problem right here, but with a different library. Sorry, what about it? It says there's no package component? So when it's installing sf, the dependency, units, is not being installed. Yeah, I don't know what that means. You could reinstall; you should have that. Maybe there's a version issue? Could be. The sf package is built on this, so when it's downloading... oh, it's something to do with the version, and one package is dependent on another. Yeah, and units itself cannot be installed. Did you try just install.packages? Yes, and it seemed to install, but when I try to call library on it, it won't do it. It says it's successfully installed and unpacked, so there's 
some version issue. It's working fine on these machines. I'm sure that we're sadly not going to be able to solve all the installation problems at this moment, but I am happy to work with you in open office hours. Just to make sure, let me run this through. Let me go ahead and make this larger. Okay, so run all the libraries, and then the next thing that we're doing, oh right, I need that. This is where you need to run your census key, if you brought it with you. There are a number of ways to identify your census key. I'm going to turn this back into a code chunk by putting in the curly braces. You guys see what I'm seeing? Yeah, but you can't see what I'm seeing, sorry. There we go.
So this block right here a second ago, let me see if I can return it to what it was, looked like this. If you put braces around that, just highlight it and hit the, I always want to say squiggly brackets, and you know what I mean, but technically they're called curly braces. Just put them right there and remove the space, and that'll make it an executable code chunk. Then, in this place where it says your API key goes here, that's the thing that you would have brought with you; it's a whole series of crazy letters and numbers. Go ahead and put that in. The API key is free, by the way, and without it this won't work. The other thing you might want to do, and again this is all in the documentation, is put a comma after this and say install equals TRUE. If you do that, it will install the key into your R environment so you never have to do it again, specifically for the census stuff. I've already done this on this machine, and I don't want my key to be exposed in the recording, so I'm just going to move on. We'll use that in just a second. So the 
first thing we're going to do, actually we're going to use it right now, is use this get_acs function, and this is where some of the stuff we just talked about comes into play. County is the census geography level, so we're identifying county. And because we don't actually want county-level data for the entire U.S., we just want it for North Carolina, we put NC into the function. A lot of times people are interested in smaller chunks of data, and this way you can very easily gather a chunk that's more usable. The variable name I pulled out for you, and at one point I had identified what that variable is; I think it's population.
Then the other thing, and you don't always have to do this, is that if you set geometry equals TRUE, what it pulls down is what's called a shapefile. Shapefiles, if you're used to using Esri products in particular, are those polygon shapes that you can then make choropleths out of, for example the states. If you don't say geometry equals TRUE, all you're getting is the statistical variables, which is just fine, but if you want to make a map out of it you have to say geometry equals TRUE.
So I run that, and the last thing I'm doing, and this is a little trick, is viewing it with this tidyverse function called as_tibble. You can view NC pop without the as_tibble function, but this is just a little bit easier to read, and it throws it into this paged view. What we're looking at then is every county name; the GEOID, which we're not going to use; the variable name, which we already identified, so that's just redundant, that's this right here; and then the estimate, which is the value that came back, so it should be population. If you spend a little more time with tidycensus, you can actually rename that variable as you bring it down, which has some value. The moe is the margin of error. Statistically speaking we should always be working with the margin of error, but 
we're not today. And then this very last thing over here is the geometry. This is our whole collection of shapes, and it shows a very weird value. Basically what it's telling us is that it's an S3 R data object, which, quite honestly, we don't really care about. It's nice that it handles all of this for us, and that's all we need to know. It's just a whole collection of little lines that make a polygon, in this case a county shape.
If you've ever worked with the older way of doing mapping in R, which was called sp, I should mention that one really good reason to use sf instead of sp is that it was developed by the same person. sp kind of got old and wasn't as useful, in his opinion, so he rewrote it and called it sf. There may be some reasons why you run across sp objects and still need them, but you shouldn't really worry about sf being the new kid on the block, because it was designed to work with the tidyverse approach, it was designed to be more modern, and it probably does everything that you need.
All right, moving on, here's what we're going to do. We're going to make a choropleth, and we're going to start by using this colorQuantile function. What we're reading into the colorQuantile function is this estimate vector; just to roll back up to our data frame, that's this vector right here. I want this collection of colors to be broken down into 10 parts, so before I do any mapping at all I need to leverage colorQuantile to know how to create the cuts. All that's going on here in colorQuantile is that I'm saying: for the domain, the estimate that's in the data frame NC pop, cut it into 10 bins, using another library called viridis, which gives me those nice yellows to greens. That's all it does. You can manually set the colors to anything you want, but a lot of people are 
familiar with ColorBrewer, and a lot of people may also be familiar with viridis. They do essentially the same thing. I like viridis a little better, but the value of both viridis and ColorBrewer is that they generally tend to be more friendly for colorblind people, and viridis in particular is also really good for printing straight to black and white. Even though they're very brilliant colors, it turns out that if you print to a black and white printer you can still distinguish the shades very well. Right, so that's all that's going on there: we're setting up this function called MapPalette to be able to print those colors, and that happens down here. We're going to reference that again, so we'll talk about it again in a minute.
So going back to our data frame, nc_pop, the next thing I'm going to do is transform to a coordinate reference system, and we're going to do various examples of setting the CRS projection. In this case, this is the way you set it to this projection. So I'm going to give you my very rudimentary, quick explanation of projections. You don't always have to set them, but you set projections because we're looking at two-dimensional, flat representations of a globe, which is round. Geometrically speaking, the reason why Canada looks so huge is not only that it is huge, but that the round surface narrows as it gets up toward the pole, and when you flatten that out it stretches Canada disproportionately larger than it really is. That's my explanation for why you need projections. It's really quite fascinating, but at the same time I don't think that's really the point of this workshop. So just know that there are different kinds of projections that you may want to choose depending on where you're mapping: the projection you would set for North Carolina would be entirely different from the projection you might use for representing something in Australia.
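A minimal sketch of the two steps just described, the quantile palette and the reprojection. It assumes the leaflet and sf packages are installed, and it uses the North Carolina counties shapefile that ships with sf (and its BIR74 column) as a stand-in for the workshop's nc_pop data frame and estimate vector. The object names and the EPSG:4326 target are illustrative assumptions, not the workshop's own script.

```r
library(sf)
library(leaflet)

# Stand-in data (assumption): the NC counties shapefile bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Build a palette function that cuts the column into 10 quantile bins
# and maps each bin to a viridis color.
map_palette <- colorQuantile("viridis", domain = nc$BIR74, n = 10)

# Calling the palette function on the values returns one hex color per county.
bin_colors <- map_palette(nc$BIR74)
head(bin_colors)

# Reproject the shapes; EPSG:4326 (WGS 84 longitude/latitude) is an assumed target.
nc_wgs84 <- st_transform(nc, crs = 4326)
st_crs(nc_wgs84)$epsg
```

The palette is built before any mapping happens, which is why the talk sets it up first and only references it later when the polygons are filled.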
Okay, you can look up which projection to use anywhere you want. All right, so if we just start with these three lines, we're going to get almost nothing: we have a projection and we have no base map. So the next thing that's going on here, and this answers your question earlier about setting the base map: earlier we were setting a base map with addTiles. So I could just run this function, just add those three, and I'll end up with this huge base map that we're pretty familiar with because we've looked into it already. It has a lot of useful layers, but we might not want that base map. So in this case I'm using a base map called Stamen Toner Lines.
Now, an important question is: how do you know which base map to use? If you get the help on addProviderTiles, I'm going to hit F1 here, and go down here under provider, there are two very nice links that you can follow to find which base maps are easily available to you. I'm going to click on one of them, and you can just scroll your way through each one of these options till you find one that you like. It's a little more complicated than that, because most of the base maps provided through here are free, but some of them are only free if you register for them. It has to do with how APIs keep track of who's using their services. Since they're providing you a free service, they want to know a little bit about who you are, right, and they want to make sure that you don't overwhelm their system. But there are several base maps on here that are completely free and don't require registration, and the largest collection of those that seems to be used broadly is the Stamen versions. So this is Stamen Toner, and there's a nice one called Stamen Toner Background. There's Stamen Watercolor, not sure what you would use that for, but it's pretty. There's Stamen Terrain, you can figure that out. Somewhere down in this list I found one called Stamen Toner Lines, and I kind of like it, so I'm using it.
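Besides the links in the help page, the available base map names can also be browsed from the R console. The sketch below assumes the leaflet package is installed; which names appear in its `providers` list depends on the installed version, so the code searches for toner-style names instead of hard-coding one, and falls back to the default tiles if none is found. The view coordinates are a rough, assumed center for North Carolina.

```r
library(leaflet)

# `providers` is a named list exported by leaflet; its contents vary by version.
all_providers <- names(providers)
length(all_providers)

# Narrow the list to toner-style base maps like the ones browsed in the talk.
toner_like <- grep("Toner", all_providers, value = TRUE)
toner_like

# Start an empty map roughly centered on North Carolina (assumed coordinates).
base_map <- leaflet() %>% setView(lng = -79.4, lat = 35.6, zoom = 7)

# Use the first toner-style provider if one exists, otherwise the default tiles.
base_map <- if (length(toner_like) > 0) {
  addProviderTiles(base_map, provider = toner_like[1])
} else {
  addTiles(base_map)
}
base_map
```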
Then the last thing I'm going to do is fill my polygons. I'm going to use the addPolygons function to fill my polygons with that set of quantiles in the estimate range, which I did, it's not on the screen, but I did it right here at line 55. So I'm calling MapPalette down here and identifying it right here, and again using estimate, just being redundant. The other thing that's going on here, just so you know: when we used popup to begin with, with the NC Starbucks thing, we put in the popup just the name vector, which was very easy to use. The name vector was perfectly easy to read. In this particular data set, called nc_pop, the name vector has too much information in it. To a certain degree I want you to ignore this, but this is just a regular expression, which is defining which part of that vector I'm pulling out. You can process your vectors any way you want, so if that's confusing to you, you can ignore it. But essentially what I'm saying is I don't want my popup to say "Alamance County, North Carolina"; I just want it to say "Alamance County". So that's all that's happening there. Depending on how you process your vector, you can do that any way you want.
Right, so I'm just going to run these first bits here, leaving off the legend for a second. So this is my Stamen Toner Lines background as a base map, and this is my choropleth, and again the colors are set up based on the estimate using the viridis library. Now, we also talked about layers, so you can see that you can add a legend, highly recommended by the way, it's hard to know what you're looking at without a legend. And the nice thing about the leaflet map is it will define this aspect of the legend for you as well. Okay, now let's see if there's anything else. Oh yeah, let's just keep on going. We read in the Starbucks data. I just want you to know that you can do similar stuff as before. So if I run this, what's going on here is I
put my Starbucks into the counties as a layer, the x-y plot over top of the population choropleth. So, Mark, that goes back to that thing that you referenced earlier: it sort of looks like people in rural areas don't drink Starbucks, or it looks like corporate businesses work hard to be in population centers. It depends on how you want to tell the story, and what's more accurate. Right, so that's probably enough for now. Let's go ahead and hit the exercise, I'll show you one thing, and then we're going to move back to the third exercise.
So there's this package called ggmap, which is definitely on the mapping side, not specifically spatial analysis, at least in my opinion. Now, you can extend ggmap a long way. If you know ggplot, it will help you use ggmap without too much effort, in that a lot of the layering and the connector with the plus symbol is the same, and you can definitely use ggmap to make some interesting layers. But what I think it's exceptionally good at is just making quick, simple static maps. Since that is actually a use case that sometimes shows up, because people need to either print them in a publication or make a poster, we're just going to cover it real quickly. I'm going to read in this Starbucks data like I did before, and we're going to limit it down to North Carolina. And what I want you to note, actually, I'm going to go back over to this website, under "plot coordinates static map", to show you. This is what people were doing: basically you could say something like get_map for North Carolina and set the zoom level, and then under maptype the easiest thing to do was to use one of the known values for Google Maps, so terrain, satellite, roadmap, or hybrid. Then you get something like this, and then you could overlay heat maps or x-y plots or whatever on top of that. It was a very good way to get a quick static map, which I think of for use
in some kind of publication or printed thing. A lot of what we've been doing is using leaflet, which has a very nice interactive layer to it, but in order to share that with others you need to send them an HTML file, which is not hard, or mount it on the web, which is moderately more complicated. But this is a different use case. Now sadly, Google completely rewrote their API about eight months ago, so you can still do this, but you have to go through some of Google's documentation on APIs, and while that is not hard, it's probably more tedious than it is trivial. You've just got to read a bunch of documentation, and you have to put in a credit card number that they never end up using. They basically give you this huge amount of free credits; it's all just to make sure that you don't run amok. So you can still do this with a Google map if you want, but the easiest thing to do now is just to switch over to something like Stamen maps.
So since we're talking about publications, I'm going to show you a Stamen map example. The nice thing about Stamen maps, really, in my mind, is that they have exceptionally good contrast for black and white. All right, so what you're doing here with Stamen, and what is different, is that you have to identify a bounding box by latitude and longitude. You can check the documentation under location for get_map. Prior to that, if you did have the Google API set up, you could just put in something like "Burlington, North Carolina" and it would give you back a natural bounding box for Burlington, and you could set the zoom. But in this case you have to be a little more prescriptive and identify your bounding box. The other thing that's happening is we're setting the source equal to stamen, there are several other sources, and the maptype equal to toner. Now once again, I just want to highlight this: if you look in the documentation that's on board for get_map, you go
down here to source, it will tell you which sources you can use, and if you go down to maptype, it will tell you which map types work for which sources. So you can't really make those values up, but they're easy to gather. And in the last example we even saw how you can extend that out with additional base maps. Right, so here's everything I'm going to do. First I'm going to identify the map box, and I'm just going to put that vector in here so it's a little easier to read. I'm going to put all that from get_map into an object, and then the main command is just ggmap, and I'm identifying this object as this object, just to make it easy to see. What I end up with there is that Stamen map that you just saw a minute ago, the toner with high-contrast black and white. And an example of how you can extend it with the ggmap, ggplot-like syntax is I'm adding this geom_point layer on top of that, so I can look at my Starbucks example again, and I have control over what the points look like and how big they are. In this case they're red. And then, just like you can with all ggplot stuff, I can affect the theme a bit. If I run that whole program again, I think I'm taking out, yeah, I'm taking out the latitude and longitude and these x and y labels. If you want to learn more about ggplot, you can. Right there is something that you could imagine would more easily fit into a publication that you're submitting.
All right, but moving along, let's pick up thematic mapping. Before I get too far along on this, let me go back here to my slide. So choropleths are a form of thematic mapping. Is there anything on this slide that I really want you to know? Some of this we've already talked about: sf is a successor to sp, and it's very easy to coerce the data objects back and forth, and this link will actually tell you how to do that. There are some nice vignettes, and another thing
that's actually pretty handy is that you can read and write shapefiles in the raw with these two functions. So if you're getting hold of shapefiles in a way different from what we just did, we just used tidycensus to pull them down and manage all that for us, but people get their shapefiles from a lot of sources. One is to ask Mark, or send a note to askdata. We're going to use a package here in a second called tigris, which is a cousin to tidycensus but doesn't require an API key. It doesn't do as much as tidycensus, but what it does do is pull down shapefiles. If you had to share those shapefiles with somebody else who didn't have the same computational style as you, maybe they're working in Python, maybe they're working in ArcGIS, and you just need to share the shapefiles, you might do something like this, where you write out your shapefiles and share them, and then they can do their own analysis.
But basically, in terms of workflow for thematic mapping, we're going to use tigris to pull a shapefile. We're going to get some data from somewhere else, in this case Bureau of Labor Statistics data, which has nothing to do with the census. We're going to have to join those. The example that I'm using here, actually this slide needs to be updated, is using the function append_data, which is part of the tmap library, but you can actually use the dplyr library; left_join does the same thing, and that's what we're going to do in our example. And then lastly we're going to visualize. We're going to focus on the ggplot visualization with the function geom_sf, which works naturally with the sf library, but there are some others here that are actually far simpler to use, though you have a little less control over the final product. So let me just see if I can show you what I mean by far simpler to use, and then we'll quickly move on. Right, so over here, thematic mapping, tmap. So if you wrangle
all your data and get what you need, you can actually just send your data frame that has polygons in it, has shapefiles in it, to qtm, and you can see that it will automatically develop your color ramp and develop your labels. In this case it's doing the whole of the US, and because the US territories actually span a huge amount of the globe, it's not a particularly attractive map. So of course you have to do a little bit more wrangling. In this case we're filtering: we're saying we don't want region nine, we looked at those census regions earlier, we don't want Alaska, we don't want Hawaii. And here's a good example of a perfectly fine map that's unprojected. So tm_shape, which is a tmap function, did all this with minimal effort. It came up with the breaks and it came up with the color scheme, and it's very readable. The only thing I don't like about this is the projection. This flat line across the top is a dead giveaway that it's not projected very well, because that's not what the United States really looks like, and projection is a big thing in GIS. I just want you to note that there's a way to set your transforms using a different projection. That's the syntax for it, and there you get a more natural shape, which people seem to prefer.
All right, so we're going to move on, close this, and in this 03-2 thematic mapping file run all these libraries. The next function we're going to run is from tigris, that's this right here, and it is using the states function to pull down all of the US state polygon shapes, the shapefiles, and, by default now, we're calling it with class sf. If you don't have sf working, you could put class sp in there and it might work. So now we have this shapefile, us_geo. If we look at that real quickly, these kinds of shapefiles, which in this case come from the census, are full of a whole lot of metadata that we don't particularly care about, like the region and the division and the state
code and all that, even the postal service state code. But what we do care about is this thing right here, the shape, which is referred to as the geometry. So that's the reason why we did this, so we could get the shape. And then over here I pulled down some BLS data, I described above how I grabbed that, and basically end up with a two-variable data frame that has area name, which we're going to have to munge a little bit because we don't want these codes, and it has the annual mean wages. So if we keep wrangling that, we end up renaming some stuff, fixing the names, putting it in a data frame called BLS join, and having a look.
This is where we're doing our left join. What we're joining is us_geo, which has our shapefiles, and we're joining it to BLS wage, and we're joining it by NAME equals State. So I think in us_geo, if we look at it, here's NAME right here. By the way, I always hate this: the worst way to do a join is alphanumerically. If you could join on a code, on the numeric code, you'd be way better off, but in this case we didn't have it. And we're going to join that to BLS wage, and so this is where State equals the NAME of us_geo. What we end up with is a slightly bigger table, and the only reason why we did this at all is so that we could have this geometry in the right spot with this wage number. We filter out those regions that we don't want to show up in our visualization, and then here's the code that actually allows us to visualize it with ggplot. So everything we did up until that point was just data wrangling to get to this stage. Now we're going to ggplot, where we're setting color and fill. Fill is the internal part of the polygon and color is going to be the border around it. We're setting them to the same variable, using geom_sf, setting the projection once again, and in this case creating our color ramp with viridis, and this
will generate, hopefully, a pretty nice map. Wait, and there it is. So since it is in ggplot, there's a lot more you can do: you can get rid of these, or alternately keep the lat lines, and the gray scale and the legend and all those things can be managed. If you look deeper into the documentation I'm sharing, you're welcome to manage that. The nice thing about doing all this in R is that you have a lot of fine-grained control. The downside of doing it in R is that you have to attend to the fine-grained control, otherwise you're not going to get a map. And those are the cases where, if you don't really want to attend to it, use something like tmap and the qtm function that I just briefly went over. There are lots of ways to make choropleths; these are just two examples.
So given that, here's the really free-form part: the last exercise, exercise three, doesn't actually have any answers, so I'm going to invite you to do it while I'm around. If you've had enough, give me some feedback on the Post-its and put them up on the door on your way out. If you have specific questions, I can try and answer them now. What I usually say to people is these workshops are good for giving a broad overview but really horrible for a particular project, and if you're really going to learn this stuff you need to engage with a particular project. So please come see me, Wednesdays one to three, set up an appointment, go see Mark. We are more than happy to make this relevant to your project, but in a workshop setting it's often difficult to do. But I appreciate your attention, I appreciate you coming, and that is the end of the workshop. So thanks so much.
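For reference, the thematic mapping workflow from the last part of the session (shapes, a separate table of values, a left join, then geom_sf) can be sketched as follows. This is a sketch under stated assumptions, not the workshop's script: it assumes sf, dplyr, and ggplot2 are installed, and to stay runnable offline it uses the North Carolina shapefile that ships with sf in place of the tigris state shapes, and that file's own BIR74 column in place of the BLS wage table. The object names, the County key, and the EPSG:32119 projection are all illustrative choices.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Stand-in data (assumption): NC counties bundled with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Shapes only: a name column plus the geometry, like the output of a shape download.
shapes <- nc %>% select(NAME)

# A plain table with no geometry, playing the role of the external data source.
values <- nc %>%
  st_drop_geometry() %>%
  select(County = NAME, BIR74)

# Join the plain table onto the shapes by name; the result is still an sf object.
joined <- left_join(shapes, values, by = c("NAME" = "County"))

# Fill is the inside of each polygon and color is its border; both use the same variable.
p <- ggplot(joined) +
  geom_sf(aes(fill = BIR74, color = BIR74)) +
  coord_sf(crs = st_crs(32119)) +  # assumed projection: NAD83 / North Carolina
  scale_fill_viridis_c() +
  scale_color_viridis_c() +
  theme_minimal()
p
```

Joining on a name works here because the county names are unique, but as noted in the talk, a numeric code is the safer key when one is available.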