Good morning everyone. Welcome to part two of mapping crime data in R. My name is Nadia Kenner, I'm a research associate with the UK Data Service and will be running the live code demonstration today. The code demonstration is split into three main topics, as mentioned yesterday. We'll start by briefly exploring our crime data and making some basic maps. We'll then move on to exploring shapefiles and how we can combine these with our crime data. And then we'll move on to adding another data set, the census data set, where we look at the differences between mapping crime rate and mapping crime count. There are also some extra topics that I'll leave for you to explore, with some really useful information about binning and about mapping using Google's API with ggmap. But given the time, we'll see where we're up to with that.

Just a little rundown first. I'll quickly demonstrate how you can get the project onto your computer, if you haven't yet done so. I won't spend too long on this, maybe about five minutes, just to make sure that everyone has access to it, because the data, the code and some extra information are all available on that link. So I'll go ahead and quickly show how to do this.

This is the repo for the UK Data Service crime data in R workshops. There are three workshops: December, February and June. In order to clone this repo, all you need to do is click this big green Code button and copy the link here, which is an HTTPS link. Then you want to move back over to RStudio. Assuming you've just opened it up, you should have an empty RStudio session, so don't worry about what you see on mine at the moment. All you want to do is click File, then New Project. This will open up an option to create a new project. You then want to click Version Control. Git is a version control system.
It basically allows you to work on code at the same time as other people, which makes for better reproducibility. Then you want to click Git. From here, all you need to do is paste the URL that we copied from GitHub. That will automatically fill in the project directory name, and then all you need to do is save it somewhere on your computer. And that's it. I would advise ticking open in new session, just because if you have anything else open, you don't want to overwrite it or cause a crash, which I love to do. As soon as you create that new project, everything that you've seen on GitHub will be available on your own computer, so you can follow along at the same time as I do. I'm not going to do this myself because I already have it open.

Within this GitHub repo, we have four main files. We have the preliminary tasks, which show how to set up your working directory and list all the packages that we need. So those are the two things you need to make sure you have done: set the working directory and install the packages. From there, we're going to be working through topic one, topic two and topic three.

I'm going to give it two minutes to make sure that everyone has installed these packages, because I realized I forgot to mention this yesterday, and I know that it can take a while to get packages downloaded on your computer. Hopefully everyone knows how to set their working directory; it was set as a prerequisite for this course. Hopefully no one's struggling or lagging too far behind.

Someone has asked: it tells me to download Git, so where should I download it? Yes, you need Git downloaded on your computer in order to clone the repo from GitHub and access the code.
And we're going to start to go through the code. So let's start with topic one, which looks at exploring our crime data. This first line of code, rm(list = ls()), is not necessarily relevant. It's just a bit of code that clears your global environment, so that was more for me. The first thing you want to do is load your packages. As I'm using an R Markdown file, you can click the little green arrows, which run all the code in a chunk at once. Now that's all loaded, we can go ahead and load our crime data set. We're using the crime data set from police.uk. These are open police-recorded statistics, so they're freely available to anyone. We're going to be looking at the period August 2020 to August 2021 for Surrey. No particular reason, it's just a random area that I chose. I've included a Google Doc on how to download the data in the data folder, but you can just use this bit of code here, which will automatically read the data set in for you.

So we are going to read in the month of August 2020. What we're going to do is create a new object called crime, using the assignment operator and the function read_csv, and point it at the data set in the data folder, which can be found here, looking at the 2020 month right there. That will read in that data set, which is a CSV file. I'm also using the function clean_names from the janitor package to tidy up the column names, so they're easier to use for data manipulation. So we can run that code, and as you can see, we have our crime data set loaded into our environment, with 8,912 observations and 12 variables. Our first step is to explore this data set to see what we're actually dealing with and what type of variables we have. To do this we can use the head function. And what this does is list the first,
I want to say six rows of data. Yes, six is right, right there, along with all the columns. We have the crime ID, which is not really relevant to us. We have the month, who it was reported by and which force the crime falls within. We have the longitude and latitude, which will become very important to us, and we have the location, the LSOA code, the LSOA name, the crime type, the last outcome category and context, which just reads as NA. For some context, LSOA stands for lower layer super output area. This is a type of government statistical geography that helps identify smaller areas within, in this instance, Surrey. You can also use the glimpse function to view the data set, and it prints it out like this, which is a bit better. But the choice is up to you.

We're going to move on. As I mentioned yesterday, we talked about points, lines and polygons as types of spatial data. In our instance, the coordinate variables, the latitude and the longitude, are point data. They indicate the one point where a crime has happened. We then have the location variable, which represents the line. This is normally defined by a street or a junction, so here we see On or near Park/Open Space. This is a type of line data. And then lastly, we have the LSOA name, which represents our polygon, and this is what we'll be using as our unit of measurement for this webinar.

Now, we currently have a data frame that is not spatial. R does not know that this is a spatial data frame. Yes, it contains these points, lines and polygons, but currently R just treats it as a table, a very basic data frame. So in order to work with the spatial functions in R, we need to tell R that this is a spatial data frame. And we can do this by converting it to a simple features object.
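The loading and inspection steps described so far can be sketched roughly as follows. This is only a minimal sketch, not the workshop's own chunk: a tiny invented CSV written to a temporary file stands in for the police.uk download, so the path, the rows and the reduced set of columns are all assumptions made for illustration.

```r
library(readr)    # read_csv()
library(janitor)  # clean_names()
library(dplyr)    # glimpse() and the pipe

# Assumption: a small invented file stands in for the real police.uk CSV.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Crime ID,Month,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type",
  "a1,2020-08,-0.74,51.34,On or near Park/Open Space,E01000001,Toy 001A,Anti-social behaviour",
  "a2,2020-08,-0.73,51.33,On or near High Street,E01000001,Toy 001A,Drugs",
  "a3,2020-08,-0.70,51.35,On or near Station Road,E01000002,Toy 001B,Anti-social behaviour"
), toy_path)

# Read the CSV, then convert column names such as "LSOA code" to snake_case
crime <- read_csv(toy_path, show_col_types = FALSE) %>%
  clean_names()

head(crime)     # up to the first six rows, all columns
glimpse(crime)  # one line per column, with its type
```

After clean_names, the columns can be referred to without backticks (for example lsoa_code and crime_type), which is what makes the later wrangling steps easier to write.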
Now, simple features are really common in R, and they basically allow you to handle and manipulate those points, lines and polygons. There is more information here if you'd like to read up by yourself on what exactly simple features are. But what we're going to do is turn your longitude and latitude variables into simple features coordinates, and I'm going to show you exactly how to do that.

Yesterday we talked about coordinate reference systems as a kind of spatial reference, which defines a specific map projection. There are thousands of coordinate reference systems, and yesterday I mentioned that the most common is WGS84. There is also BNG, the British National Grid. Now, this is where it gets even more confusing, there's always more context: each coordinate reference system is assigned an EPSG identifier. I can't quite remember what EPSG stands for. I want to say European, but it's lost me. If anyone knows, you can leave that in the chat. But each CRS has a unique EPSG identifier, which is what you supply when you turn a non-spatial data frame into a spatial one. The BNG EPSG code is 27700. WGS84 uses the EPSG code 4326. These are some of the most common systems, so I've left them in here for your reference. We can go ahead and change our data frame into a simple features object by including an EPSG identifier. I also mentioned yesterday that if your data frame contains longitude and latitude, then WGS84 is the system that you would use, so we'll be using the 4326 identifier.

So let's have a look at how we can check whether a data frame has been assigned a coordinate reference system. We can use the function st_crs to do so. If we run this, we'll see that we have NA.
There is no coordinate reference system currently set, which tells us that this data frame is non-spatial. So in order to assign one, and turn this data frame spatial, we can use st_as_sf, where sf stands for simple features. I am assigning this to a new object called SF just to make things a little clearer, so we're not overwriting the original data set. We call on our crime data set and use the coords argument, and in our instance, our geographical variables are the longitude and latitude. And the CRS is 4326. We also set na.fail to FALSE. If we run this, nothing really happens, but you'll see that we have our new data set, SF, added to the environment, and it has one less variable. This is because it has combined the longitude and latitude into a single geometry column. To check this, we can run the st_crs function again, and now you'll see that an EPSG identifier has been assigned to the data set, which is excellent.

So let's have a look at what's changed in the data set. We can run the head function again. If we scroll over to the end, you'll see that the longitude and latitude variables have been removed, and we now have a geometry variable. This will allow us to make maps with ggplot and the other packages. Just for reference, since this was a question yesterday: if you had northings and eastings, these would typically use the BNG identifier, the British National Grid. So you would change the coords to your northings and eastings, and then put in the CRS, which is 27700. That will turn your northings and eastings into one geometry variable.

Now that we've done that, we're going to start to map the point data, which is where it gets fun. We now have a proper simple features object, a spatial object that contains some point-level data. So how can we go ahead and create a basic point map? How can we plot the crimes in Surrey?
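The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the three rows and their coordinates are invented, and the object is named crime_sf here purely to avoid clashing with the package name.

```r
library(sf)

# Assumption: invented rows; the last one has missing coordinates
crime <- data.frame(
  crime_type = c("Drugs", "Anti-social behaviour", "Burglary"),
  longitude  = c(-0.74, -0.70, NA),
  latitude   = c(51.34, 51.35, NA)
)

st_crs(crime)  # a plain data frame has no CRS, so this prints NA

# coords names the x (longitude) and y (latitude) columns; 4326 is WGS84.
# na.fail = FALSE lets rows with missing coordinates through as empty points
# instead of stopping with an error.
crime_sf <- st_as_sf(crime,
                     coords  = c("longitude", "latitude"),
                     crs     = 4326,
                     na.fail = FALSE)

st_crs(crime_sf)$epsg  # 4326
names(crime_sf)        # longitude and latitude are replaced by geometry
```

For data recorded as eastings and northings, the same call would name those two columns in coords (x first, so eastings before northings) and use crs = 27700.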
For this, we're going to be using ggplot and the geom_sf function. All we do is call on our data set named SF, which we just created. If we run this, let's get rid of that bit, we have this cool looking image, which shows the shape of Surrey and all the points where crimes happened in August 2020. Now this looks all right, but it's not that readable, and there's not much we can do with it. So let's start to add some more context to these maps. One way to do this is to colour the different crime types. We can do this by adding an aesthetic using the col argument. So what we need to do is add another bracket and supply the variable crime type from the SF object to col. If we run that, give it a couple of seconds, because this is an issue with maps, they just take a while. And now we have a bit more of an interesting map, one that shows each crime type in a different colour. Obviously there's a lot of overlap in this, and we're going to talk about how we can limit that a little later on.

Another way to spice up the map could be to add a reference map. Remember we talked about this yesterday. A reference map is also known as a base map. What it basically does is add a normal map behind what we have. We do this by using the annotation_map_tile function. That's the only change to the code, just an additional line where we've added this function. This is from the ggspatial package, if I remember correctly. Yeah, the ggspatial package, for reference. If we run this again, it's going to take a while, especially because we're using a base map. And there we have it. We now have our reference map added behind our crime types. This is a little more readable, and now we have Surrey located on a map and not just on a grid. All right.
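The three maps described above can be sketched as follows. This is a hedged illustration rather than the workshop's exact code: the points are invented, the object names (crime_sf, p_points, p_colour, p_base) are assumptions, and the plots are only built here, not drawn, because drawing the base map downloads tiles and needs an internet connection.

```r
library(sf)
library(ggplot2)
library(ggspatial)  # annotation_map_tile()

# Assumption: three invented crime points in WGS84
crime_sf <- st_as_sf(
  data.frame(
    crime_type = c("Drugs", "Anti-social behaviour", "Burglary"),
    longitude  = c(-0.74, -0.70, -0.72),
    latitude   = c(51.34, 51.35, 51.33)
  ),
  coords = c("longitude", "latitude"),
  crs    = 4326
)

# 1. Basic point map
p_points <- ggplot() +
  geom_sf(data = crime_sf)

# 2. Same map, with the points coloured by crime type
p_colour <- ggplot() +
  geom_sf(data = crime_sf, aes(col = crime_type))

# 3. Same again, with a tiled base map drawn underneath the points
p_base <- ggplot() +
  annotation_map_tile() +
  geom_sf(data = crime_sf, aes(col = crime_type))

# Printing a plot object draws it; p_base fetches map tiles when drawn
if (interactive()) print(p_base)
```

The layer order matters: annotation_map_tile is added before geom_sf so that the points are drawn on top of the tiles.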
Now let's say you were interested in looking at one type of crime. That map listed all 12 crimes at once, but maybe that's not your interest. So in this instance, I'm going to show you how to subset for just one type of crime, and we're going to be looking at anti-social behaviour. Most of this code is just from the dplyr package. What I'm doing is creating a new object called ASB, standing for anti-social behaviour, using the subset function and calling on the crime type from the SF object. I'm also using the select function to remove a few variables that are not of interest to us. If we run this code, we have a new object called ASB. Let's have a quick look at it. We have our month, who it was reported by and the location again, but most importantly, we're only looking at the one crime type now. And we still have that geometry, which is important for us.

So let's go ahead and map this in the same way as before. We're going to use the ggplot function and annotation_map_tile, then call on geom_sf with our data set called ASB. I am assigning this to a new object called ASB map, just for clarity, and you'll see that it has now been added to our objects. In order to plot this, you can simply type out the object name and press Control-Enter, or Command-Enter if you're a Mac rather than a Windows user. And now we have, let's call it the final map, of just ASB in Surrey. You can see there's much less overlap. It's a great little approach if you're looking at one crime type.

Cool. I'm going to give you about five to ten minutes to run these activities on your own computer. I want you to look at how anti-social behaviour compares to the crime type drugs. I've left some partial code here so that you can do this on your own computer.
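The subsetting step described above might look something like this. It is a sketch under assumptions: the data are invented, the object names are lower-cased guesses, and the two columns dropped with select are hypothetical examples, since the talk does not say which variables were removed. Note that subset is a base R function; select comes from dplyr.

```r
library(sf)
library(dplyr)

# Assumption: four invented crimes in WGS84
crime_sf <- st_as_sf(
  data.frame(
    crime_id   = c("a1", "a2", "a3", "a4"),
    crime_type = c("Anti-social behaviour", "Drugs",
                   "Anti-social behaviour", "Burglary"),
    context    = NA,
    longitude  = c(-0.74, -0.73, -0.70, -0.72),
    latitude   = c(51.34, 51.33, 51.35, 51.36)
  ),
  coords = c("longitude", "latitude"),
  crs    = 4326
)

# Keep only anti-social behaviour, then drop columns we do not need.
# The geometry column is "sticky": select() keeps it automatically.
asb <- subset(crime_sf, crime_type == "Anti-social behaviour") %>%
  select(-crime_id, -context)

# The same pattern for another crime type; the value is case sensitive
drugs <- subset(crime_sf, crime_type == "Drugs")
```

Either object can then be passed to geom_sf in place of the full data set to map a single crime type.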
Follow the steps and in five minutes I'll come back and write in the rest of the code.

A question has been asked in the chat about whether there are any restrictions on using LSOA data. As we're using open police-recorded statistics in this instance, there are no restrictions, because this is a variable that comes free to download from police.uk. It is a form of small area geography used very widely in government statistics, so most analysis can be done by researchers, uni students or anyone else.

All right, it's been about five minutes, so I'm going to go ahead and fill in the answers for the activity. The first step was to subset the data for those crimes recorded as drugs. All we have to do is use the subset function, call on our simple features object, which we named SF, call on the variable crime type and then give the type of crime, which is named drugs. I don't think it's capitalised. Maybe it is, in which case I might get an error. Should be fine. Then for step two I've asked you to assign this to a new object called drugs. So we'll do the same thing, and let me just make sure. Yeah, it is a capital D, apologies. There we go, capital D. That's the issue with R: it's case sensitive, so it can be a little bit annoying. And there's no S at the end of crime type. There we go. Then step three was to use ggplot to plot the point data over a base map. So we call on ggplot, then we use the annotation_map_tile function, and we call on the geom_sf function, and the data set that we've just created is named drugs. And now we have a map of drugs in Surrey.

So what can we say about the maps, about the amount of drug crime recorded in August 2020 compared to anti-social behaviour? We can say that there's probably much less drug activity happening in this month compared to anti-social behaviour. Is this expected?
Is this what we would have thought? These are questions that you should definitely be asking yourselves as researchers, as uni students. But obviously anti-social behaviour is a much more common crime type, so this is what we would expect to see.

That is topic one. We have explored the crime data set and shown how to produce some really nice maps using ggplot and geom_sf. I'm going to move on to topic two in just a minute. In this topic, we're going to be looking at what shapefiles are, why we use them and how to use them to enhance our maps. We'll start in two minutes, just to make sure that everyone has caught up from the last activity. While we're waiting, you can load the packages needed for this topic.

I'm just going to address a question I've seen in the Q&A, which asks: are there any options to use different reference maps? For example, if we had smaller geographic areas, would it show buildings? Yes, this is possible, but it would involve using additional data sets, which you would then have to join to your crime data set. This is known as street-level mapping. What you would do is use postcodes and addresses to identify which buildings are where, and then join this to your data set. I'm not too confident on this. I ran a project using street lights a few years ago, which was really interesting, where I geocoded the addresses of street lamps and compared these to different crime types. So it's definitely possible to go smaller, but it just involves a lot more conversion.

All right, so what is a shapefile and why are we using it? Basically, shapefiles are handled in R by the sf package, and they are a geospatial vector data format used in GIS software. Shapefiles store both geographic location and its associated attribute information.
It's a common format in GIS that stores vector data, so that's our lines, our points and our polygons, and it stores them as a single feature class, which means each file holds a single geometry type. Shapefiles don't mix points, lines and polygons, so you never really work with all three in one file. A shapefile is actually made up of multiple files. You have, as listed here, .shx, .shp, .dbf and .prj. It's not completely necessary to know all of these right now, but it's quite interesting. We're going to be mainly using the .shp file, which contains the geometry data. I'm going to show you here what that data looks like and where those four files are situated. Again, I've already downloaded the shapefile for you, so there's no need to go ahead and do this. It's all available here. But as you can see, once I downloaded the shapefile, I had all of these files associated with it. We've got the .prj, which contains the coordinate system and projection information. We have the .dbf, with the attribute information. We have the .shp itself, which contains the geometry data. And then we have the .shx, which is a bit complicated, but it's the positional index of the feature geometry.

Let's go ahead and read in the shapefile for Surrey Heath, because that is the area we're interested in. To read the shapefile, we use the st_read function, and I'm assigning this to a new object called shp_file. You might notice the double colon here. This doesn't change the code. It's just letting you know that this function comes from this package. If we run this, we get this information, and it says that we have an ESRI Shapefile, which is a multipolygon, and the layer is named England LSOA 2001. The first step is to go ahead and plot this shapefile. We're going to plot it as an empty shapefile, so there's no information about crime types here.
This is simply an empty representation of all the LSOAs in Surrey Heath, and it's really similar to how we plotted the crime data. We do this using ggplot and the geom_sf function again, and we call on the data set. We run this, give it a minute, and now we have an image of each LSOA in Surrey Heath, all of which border each other. This is really useful, because with this geometry we'll be able to assign crime more accurately to each LSOA, and we'll be able to identify the boundaries. In topic one, we just had the area as a whole, and we didn't have all these boundaries, so this is much more useful in terms of introducing policy changes and figuring out where crimes have happened.

We can also use the head function to view this file and see what we're dealing with. We have the name, the code and the geometry. This geometry variable is important because it's the same kind of variable that we have in our crime data set. So what we need to do is somehow join these two data sets together so that we can plot the crime data on top of this. There is more information here about shapefiles, but I'm not going to read that now. It's just to let you know that this is an empty shapefile and there's no crime information attached yet.

We're going to go ahead and join these two data sets. Firstly, we can view the geometry by using the st_geometry function, which is also from the sf package. If you run this, it tells you that this is a multipolygon. It's a polygon file, and it's described as a multipolygon covering 55 LSOAs. This function is just how you can obtain the list of geometries. So how do we go ahead and join these two data sets together? Your first step is to group the crimes per LSOA.
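Before moving on to the grouping, the shapefile steps described above can be sketched as follows. Because the real Surrey Heath boundary file is not available here, this sketch first writes two invented square polygons to a temporary shapefile and then reads them back; the polygons, names, codes and the temporary path are all assumptions for illustration.

```r
library(sf)
library(ggplot2)

# Assumption: two invented unit squares stand in for real LSOA boundaries
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}
toy <- st_sf(
  name     = c("Toy 001A", "Toy 001B"),
  code     = c("E01000001", "E01000002"),
  geometry = st_sfc(square(0), square(1), crs = 27700)
)
toy_path <- tempfile(pattern = "toy_lsoa_", fileext = ".shp")
st_write(toy, toy_path, quiet = TRUE)

# Writing one "shapefile" produces several files (.shp, .shx, .dbf, .prj)
list.files(dirname(toy_path),
           pattern = tools::file_path_sans_ext(basename(toy_path)))

# Read it back; the sf:: prefix only signals which package st_read is from
shp_file <- sf::st_read(toy_path, quiet = TRUE)

head(shp_file)         # name, code and geometry columns
st_geometry(shp_file)  # just the geometry list, with its type and CRS

# Plot the empty boundaries, with no crime information attached yet
p_empty <- ggplot() +
  geom_sf(data = shp_file)
```

Setting quiet = TRUE suppresses the summary that st_read normally prints, which is where the driver and layer name mentioned in the talk would appear.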
So now we're going back to our crime data set. The original crime data set contains one row for each recorded crime, so the LSOAs are repeated multiple times. This is because you would expect to see multiple crimes in one LSOA. If you think back to that first image we made, where there was a lot of overlap, that is why: there are multiple crimes happening in the one area. In order to show how many crimes have occurred in each LSOA, we can count the crimes per area and obtain a grouped statistic. We can use some simple data wrangling to do this. Again, I'm going to create a new object, this time called crimes grouped by LSOA. I tend to create new objects as I go. Some people just overwrite the original, but I like to keep things separate so that I don't overwrite anything and it doesn't cause any confusion. We're going to again use the assignment operator and call on our crime data set. Then we group by the LSOA code and count the crimes within each LSOA. This code does that all at once, which is really nifty. If we run this, you see that we have a new data set here called crimes grouped by LSOA, containing just two variables. Let's have a quick look at it. As you can see, we have the LSOA and the count of crimes within each area. That's brilliant, because it means we can go ahead and join this to the shapefile.

So let's go ahead and merge the shapefile with our crime data set. To do this, we're going to use a little function called left_join. The left_join function returns all the rows of the table on the left side of the join, plus the matching rows from the table on the right side of the join.
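The grouping step described above can be sketched like this. It is an illustration under assumptions: the rows are invented, and summarise with n() is one common way to get a per-group count; the workshop code may use a different dplyr verb to the same effect.

```r
library(dplyr)

# Assumption: invented crimes, with one LSOA appearing more than once
crime <- data.frame(
  lsoa_code  = c("E01000001", "E01000001", "E01000002"),
  crime_type = c("Drugs", "Anti-social behaviour", "Burglary")
)

# One row per LSOA, with the number of crimes recorded in it
crimes_grouped_by_lsoa <- crime %>%
  group_by(lsoa_code) %>%
  summarise(count = n())

crimes_grouped_by_lsoa
```

The result has just two columns, the LSOA code and its crime count, which is the shape needed for joining onto the boundary data by code.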
So what I'm doing here is creating a new object, again, called Surrey LSOA, using that left_join function. We call on our first object of interest, which is the empty shapefile, and then on the crimes grouped by LSOA, which we have just made. Then we give the two variables that match each other, because we want to join these by the LSOA codes. In the shapefile, the LSOA variable is named code, and in the crimes grouped by LSOA, the LSOA variable is named this. I'm saying LSOA a lot, apologies. Let's go ahead and create this object. Brilliant, that has worked successfully, and you can see we have 55 observations and five variables. Let's have a quick look at it. The head output shows six features and four fields. We now have that count variable added to our empty LSOA shapefile, which is really useful, and we see that the geometry has matched up properly.

These extra functions are just ways to explore the data set a bit more. I've already shown you what st_geometry does. It shows you the type of vector data that you have, and we have a multipolygon. Then we have st_bbox, which returns the bounding box of the object. It's not that important really.

Now we can move on to mapping this data. We use the same kind of code we used in topic one. We call on ggplot. We are going to add a base map, which involves adding annotation_map_tile. Then we call on geom_sf with that new data set called Surrey LSOA, and use the aes function to fill the map with the crime count. That new variable was called count. We're using an alpha of 0.05, which just makes the fill more transparent, and then we use scale_fill_gradient to improve the look of the map. If we run this, give it ten seconds or so, as I said, it's all a bit slow.
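The join and the filled map described above can be sketched as follows. This is a hedged example with invented data: the three square polygons, the counts, the object names, the alpha value and the gradient colours are all assumptions, and the base map layer is left out so that the sketch does not need an internet connection.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Assumption: three invented square "LSOAs"
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}
shp_file <- st_sf(
  name     = c("Toy 001A", "Toy 001B", "Toy 001C"),
  code     = c("E01000001", "E01000002", "E01000003"),
  geometry = st_sfc(square(0), square(1), square(2), crs = 27700)
)

# Assumption: invented counts; the third LSOA has no recorded crimes
crimes_grouped_by_lsoa <- data.frame(
  lsoa_code = c("E01000001", "E01000002"),
  count     = c(12, 3)
)

# Keep every polygon (left table) and attach the count where codes match.
# The key is called "code" on the left and "lsoa_code" on the right.
surrey_lsoa <- left_join(shp_file, crimes_grouped_by_lsoa,
                         by = c("code" = "lsoa_code"))

st_bbox(surrey_lsoa)  # bounding box: xmin, ymin, xmax, ymax

# Fill each polygon by its count, semi-transparent, on a colour gradient
p_count <- ggplot() +
  geom_sf(data = surrey_lsoa, aes(fill = count), alpha = 0.5) +
  scale_fill_gradient(low = "white", high = "red")
```

Because this is a left join, an area with no matching row in the counts keeps its polygon and gets NA for count, so it still appears on the map.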
But now we have a much more detailed map of the crime counts in Surrey. We have this really nice gradient scale, we have a base map and we have some transparency. This is a much more readable and effective map if you were to study specific crime types.

I'd also like to introduce a different way you could plot maps, which is by using the tmap package. The tmap package basically allows you to create thematic maps. There are two modes. The view mode is the interactive map, and the plot mode is an ordinary image, like we've just seen. In order to change the map that we've just seen into an interactive map, all you need to do is use the tmap_mode function to change the mode to view. Run this, and you see that tmap has now been set to interactive viewing. Then we can run similar code, although it's not quite the same code. We use the tm_shape function from the tmap package and call on Surrey LSOA. We use tm_fill with the count variable, and I've set the borders to green and, again, added some transparency so we can see the borders. If we run this, we now have an interactive map, which means we can see a bit more easily which LSOA has the highest crime count, as you can see this one here does. We can also zoom in and zoom out, which is really interesting because you can see it in relation to London and whatnot. That's a really fun way to make maps a bit more interactive.

The last little topic I'd like to talk about in this section is classification methods. How can we better visualize these counts? How can we take it one step further? Count data does not necessarily reflect the population distribution at hand, but tmap allows you to alter the characteristics of these thematic maps. That is this function here, tm_shape.
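The interactive map described above might be built along these lines. This is a sketch under assumptions: the grid of invented polygons and counts stands in for the real joined data, the object names are hypothetical, and the argument style follows tmap version 3 (newer tmap versions accept it but may print messages suggesting updated argument names).

```r
library(sf)
library(tmap)

# Assumption: a 4 x 3 grid of invented cells stands in for the LSOAs
grid <- st_make_grid(
  st_as_sfc(st_bbox(c(xmin = -0.8, ymin = 51.3, xmax = -0.6, ymax = 51.4),
                    crs = st_crs(4326))),
  n = c(4, 3)
)
surrey_lsoa <- st_sf(
  name     = paste("Toy LSOA", seq_along(grid)),
  count    = round(seq(1, 53, length.out = length(grid))),
  geometry = grid
)

# Switch tmap to interactive mode; "plot" switches back to static images
tmap_mode("view")

# Fill by crime count with some transparency, and draw green borders
count_map <- tm_shape(surrey_lsoa) +
  tm_fill("count", alpha = 0.5) +
  tm_borders("green")

# Printing the object draws the map (an interactive widget in view mode)
if (interactive()) print(count_map)
```

The same count_map object can be drawn as a static image simply by calling tmap_mode("plot") before printing it again.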
And what we can basically do is set different styles, and different styles result in different binning techniques. A binning technique is just the way that the crime counts have been grouped. In our previous map, I don't know why I got rid of it, I'll run it again, you'll see that the binning of the crime count has been done automatically for you. We've got 1 to 10, 11 to 20, and so on. But you might not necessarily want this binning technique. You might want to do something completely different, and there are classification methods in place to allow you to do so. In this next chunk of code, I've used three different classification methods. I've looked at k-means, Jenks and standard deviation. I'm not going to go into too much detail about what these mean because it's quite mathsy and statistics-based, but in short, Jenks and k-means tend to minimise within-group distances. You can have a read of these definitions a bit more thoroughly if you'd like.

I'm just going to show you how we can change the style of the maps. It's very similar to what we have done just up here, but adding a style argument to change the classification method. I've created three different maps, adding in the style argument and calling on the different classifications: k-means, Jenks and standard deviation. I've also assigned these to their own objects, again to make things a little neater. If we run it all through, you see A, B and C have been added to our environment. Now you can plot these maps by simply typing in one of them, so typing in A. And here we have a different way that our crime counts have been classified. You can have a look at B. And again, you see how the classification has changed.
We're now looking at 1 to 7, 38 to 53 and so on. And then we have C, which is the standard deviation. There are many different ways to classify your data, and you're not limited to these. There is a whole list to choose from, but you must think carefully about the choices you make, as they might affect your conclusions and your interpretations. Whichever one you do use, I'd recommend some consistency across your maps. You can't compare a standard deviation map to a k-means map, because the way that these have been classified is very different.

Let's say that you wanted to map all three together to show how different these can be. We can use the tmap_arrange function to do so. What I'm going to do first is change our mode back to the basic image mode, to get rid of that interaction. Now we see the tmap mode has been set to plotting, so we're no longer going to have an interactive map. Then I'm going to use tmap_arrange and put in the A, B and C maps that we created. So that is the k-means, the Jenks and the standard deviation in one image, which is a really nifty function. It's going to take a while because it's three maps. And then we have an image with the three maps and the three different classification methods. You can see huge differences in how these have been classified. You can see changes in the colour contrast and how different LSOAs appear to have increased or decreased in crime count. So whichever one you do use, make sure you understand how to interpret it and what it might mean for your crime counts.

Now we're going to move on to categorical variables. You can use small multiples to plot maps by a categorical variable. I'm going to skip most of this because it's not too important right now, but again you can read it in your own time.
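The three classification styles and the side-by-side arrangement described above can be sketched as follows. The data are an invented grid of polygons with made-up counts, the names map_a, map_b and map_c are stand-ins for the A, B and C objects in the talk, and the style argument follows tmap version 3 conventions (newer versions may print messages suggesting updated argument names).

```r
library(sf)
library(tmap)

# Assumption: a 4 x 3 grid of invented cells with made-up crime counts
grid <- st_make_grid(
  st_as_sfc(st_bbox(c(xmin = -0.8, ymin = 51.3, xmax = -0.6, ymax = 51.4),
                    crs = st_crs(4326))),
  n = c(4, 3)
)
surrey_lsoa <- st_sf(
  count    = round(seq(1, 53, length.out = length(grid))),
  geometry = grid
)

# Static images rather than interactive widgets
tmap_mode("plot")

# Same data and fill variable; only the binning (style) changes
map_a <- tm_shape(surrey_lsoa) + tm_fill("count", style = "kmeans")
map_b <- tm_shape(surrey_lsoa) + tm_fill("count", style = "jenks")
map_c <- tm_shape(surrey_lsoa) + tm_fill("count", style = "sd")

# Combine the three maps into one image; drawn when printed
all_maps <- tmap_arrange(map_a, map_b, map_c)
if (interactive()) print(all_maps)
```

Since only the style differs between the three objects, any difference in how the areas are shaded comes from the classification method and not from the data.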
But let's say that you wanted to plot each LSOA individually, in that you didn't want them bounded together; you wanted to plot them side by side. We could do this by using the tm_facets function here. I'm calling on the name variable, which is the name of the LSOA, and just supplying some aesthetics. I'm also using the tm_layout function just to make the image a little bit more readable. If we run this, again we're going to have to give it a minute. Don't worry about that warning sign. This is just because we have quite low crime counts; if you're using higher crime counts this wouldn't be an issue. So now we have each LSOA and its shape. But how useful is this to you? How useful is having each one separated out? It might be more useful if you had some sort of categorical variable present in your dataset. Maybe if you had the deprivation level, if you were using the IMD decile, which is a scale from 1 to 10, and that variable was present, which is possible to join, you could then use tm_facets to plot each area depending on its deprivation, which might be really interesting. I'm going to skip the additional features. This is just for you if you want to know how to make your maps a bit more aesthetically pleasing. There are ways to change the style and ways to change the legends, and you can also add cool things like compasses, scale bars and grids. In fact, I'm just going to run this one because I really like the way it looks. So you can get really cool little things added: a compass, a scale bar, a grid, which just make your maps a bit more fun to read. I'm pretty sure that is the end of topic two. I've been talking for a while, so I'll give you five to ten minutes to explore activity two. There isn't too much to do for this one. It's more about exploring those different classification methods.
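As a rough sketch of the small-multiples idea with a categorical variable, the code below facets a map by a three-level band. Everything here is a stand-in: it uses the North Carolina counties from the sf package, and the `band` variable is a hypothetical category built from the `SID79` column purely for illustration (in the talk the suggestion was something like a deprivation decile). The tmap version 3 syntax is assumed.

```r
library(sf)
library(tmap)

# Stand-in data: North Carolina counties from the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical categorical variable with three levels,
# playing the role of something like a deprivation band.
nc$band <- cut(nc$SID79,
               breaks = c(-Inf, 2, 10, Inf),
               labels = c("low", "medium", "high"))

tmap_mode("plot")

# One panel per level of `band`, all filled by the same variable.
facet_map <- tm_shape(nc) +
  tm_fill("BIR74", style = "quantile") +
  tm_borders() +
  tm_facets(by = "band", nrow = 1)

facet_map
```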
I want you to have a look at exploring the bclust and hclust methods. These are just different types of classification, known as bagged clustering and hierarchical clustering. Follow the steps in the activity and I'll come back in, let's call it five minutes, and then run through the answers with you. There are no specific questions about activity two, so I'm just going to go ahead and fill in the activity answers. I've seen a few questions in the Q&A, but I'm going to come back to those at the end. So let's have a look at these activity questions. You're first asked to explore the bclust and hclust methods, then to assign these to separate objects. Let's call them h and b. Let's start with hclust first. We'll just call this h. Use the tm_shape function and call on your object. We're using the Surrey LSOA object, with no quotes because it's an object. And then we'll be calling on the tm_fill function, in which case we want the count. And then we want the style, and the style we want is hclust. There's more information about these if you use the help pane on the right, or you can call help in the code as well. So that's our hclust map. And we'll go ahead and create our bclust one as well. Again, this is really simple. All you have to do is use your Surrey LSOA object. We use count to fill it in. And then we have our style; in this case, I'm going to use bclust. So let's go ahead and run these. h and b, yep. I probably should have called that something else, as we did have a previous map called b, but that's fine. It's just been overwritten. And now I've asked you to plot these together using tmap_arrange. You could plot them individually as shown, but this is really simple to do. You just pass the h and b maps that we created to the tmap_arrange function. And you'll see that there's some computing going on because this is a more mathematical clustering method.
Or rather, classification method. So here we have two new classification methods, and we see that different bins have been created. Again, it's really important to discuss the implications of these, and why there are high counts in certain LSOAs when using different classification methods. And then in question three, I asked you to plot an interactive map using the bclust method by changing the command in the tmap_mode function. To do this, we simply tell tmap that we want the interactive mode, which is done by using view. That's now been set to interactive viewing, and then we can go ahead and make that map again. We use the Surrey LSOA object, we use the count again, and we choose the style, which is bclust. Now if we run this map again, after setting the mode to interactive, we have an interactive bclust classification map of our crime count. I hope that's been helpful in exploring these different methods and the effect they can have on your work. Finally, we're going to move on to our last topic, topic three, which involves looking at the differences between crime rate and crime count, and the effect that this might have on your work. The first step is obviously to load the relevant packages. Most of these have been loaded in topics one and two, but it doesn't hurt to load them again. For this topic, we're going to be looking at population statistics. Crime counts on their own don't account for population density. So whilst topics one and two have been really useful in identifying certain patterns, this point-level open crime data is rarely used in isolation for detailed analysis. For one thing, as we talked about yesterday, the data points are geomasked, and this means that the points are highly likely to overlap and give a skewed picture of the distribution.
And this links back to the question that was in the Q&A, in that some of these points might actually have taken place in one LSOA but have been shifted and geomasked into another. Maybe a point was sitting on a border and was automatically moved, and this might drastically change your crime count in one LSOA. But there are ways around this, such as jittering or applying census-based data. Jittering techniques are covered in the additional topics; given the time, I'll see if we can go through that. But we're going to be using the second method, which is applying census-based data. So what I've done is obtain some statistics from InFuse, from the UK Data Service. The data is available in this folder called censuspopulation, and we're going to be looking specifically at this CSV file here. Now InFuse is really tricky to use, so I have attached, I'll just show it now, this Word document, which details how I collected the crime data, which can be seen here, how to collect the shapefiles, and some instructions on how to collect the census data. But it is very tedious and very clicky, so don't worry too much about that now. It would take way too much time; just use the data that's available. What I've done is specifically select the workday population and the residential population for Surrey. Now you might be asking what exactly the workday and residential populations are. In short, the residential population reflects the people who usually live in an area, whereas the workday population reflects those who work in the area, together with residents who either work from home or do not work. So you might expect an increased workday population as opposed to a residential population. We're going to look at how we can use these to apply some more context to our crime data. So the first step is just to read in this dataset.
I'm using a new object called pop, which just stands for population, and we can use the read_csv function to read it in. The slice function basically removes the first three rows of data, because they were just titles and not actual variables. I've also selected only the columns of interest. Again, I'm using janitor to clean the names, to make them all lowercase. And I've also renamed some of the variables, because the workday and residential population columns had been given unhelpful names made up of numbers. I've just renamed these so they're a bit clearer for us in the analysis. I've also had to mutate and convert these variables to numeric, because they were originally read in as character variables even though they are numbers. This is the mutate_at function, which allows you to do that across multiple variables. So let's go ahead and run this. And then we can have a quick look at our dataset again using the head function. As you can see, we have our geo code, the area it falls in, the population density, which was another variable that I added but which isn't strictly necessary for this workshop, the count from the workday population, and then the count from the residential population. Our next step is to join this population dataset to our Surrey LSOA object, which includes the count of crime in each LSOA. We want to merge these so that we have additional attribute information added to our Surrey shapefile. To do this, we use that left_join function again. We're calling on our first dataset, which is our shapefile with all the information that we made in topic two, and we're calling on pop, which is the census dataset that includes those two variables of interest. And we join these by the code and geo code, which is just the LSOA code. So let's go ahead and do that.
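The convert-then-join step can be sketched with a couple of tiny made-up tables. The object names `surrey_lsoa` and `pop`, the column names `code`, `geo_code`, `pop_count_wday` and `pop_count_res`, and all of the values are assumptions for illustration only. In the workshop the first table is an sf object read from a shapefile, and `left_join` works on that in the same way once sf is loaded.

```r
library(dplyr)

# Made-up stand-in for the LSOA table with crime counts (codes are invented).
surrey_lsoa <- tibble(
  code  = c("E01000001", "E01000002", "E01000003"),
  count = c(12, 40, 7)
)

# Made-up stand-in for the census table. The population columns start out
# as character, as described in the talk, and are converted to numeric.
pop <- tibble(
  geo_code       = c("E01000001", "E01000002", "E01000003"),
  pop_count_wday = c("1500", "5200", "900"),
  pop_count_res  = c("1600", "1400", "1700")
) %>%
  mutate_at(vars(pop_count_wday, pop_count_res), as.numeric)

# Keep every LSOA row and attach the matching population columns.
surrey_lsoa <- left_join(surrey_lsoa, pop, by = c("code" = "geo_code"))
surrey_lsoa
```

A left join is used so that every LSOA in the first table is kept even if a population row happens to be missing for it.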
And let's have a look at what we've got now. As you can see, we now have the workday population and the residential population added to our shapefile, alongside the original count. In fact, I might just open this up so you can see it a bit more clearly; there are a lot of variables. It'll take a minute, sorry. So we have that count that we added in, we have the population counts for the workday and the residential populations, and we have our geometry column. Don't worry about how the geometry entries look; the list of coordinates for each area is too long to display, so it's just listed like that, and it's not an issue. Now that we have these variables, we can start to figure out how to calculate the crime rate. A crime rate is basically calculated by dividing the number of reported crimes by the total population and then multiplying by a constant, conventionally 100,000. There's a little equation there for you. Don't get too confused, it's really not too difficult. You take the count, divide it by your population of interest, and then multiply by the constant, which for us is 1,000. For our dataset, I'm going to start by looking at the workday population. So in order to create this rate, all you need to do is divide the raw count by the workday population and times it by 1,000. You might be thinking, why are we using 1,000? Isn't that a really random number? Well, this is roughly the average population of an LSOA in England. So this number might vary depending on the boundary you're looking at, the country you're looking at, and so on. What I've done is, again, keep the same name, Surrey LSOA, and use the mutate function to create a new variable called crime rate. This is how you would do that. Mutate is found in the dplyr package, which is part of the tidyverse. And we use brackets to calculate this new variable.
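As a small worked sketch of that calculation, the code below computes a rate per 1,000 workday population with `mutate`. The table, the column names `count` and `pop_count_wday`, and the numbers are all made up for illustration; only the formula (count divided by population, times 1,000) comes from the talk.

```r
library(dplyr)

# Made-up counts and workday populations for three areas.
surrey_lsoa <- tibble(
  code           = c("A", "B", "C"),
  count          = c(12, 40, 9),
  pop_count_wday = c(1500, 5000, 900)
)

# Crime rate per 1,000 workday population.
# The brackets make the order of operations explicit.
surrey_lsoa <- surrey_lsoa %>%
  mutate(crime_rate = (count / pop_count_wday) * 1000)

# Roughly 8, 8 and 10 crimes per 1,000 workday population.
surrey_lsoa$crime_rate
```

Area B has more than three times the raw count of area A but the same rate, which is exactly the kind of difference that mapping rates instead of counts is meant to reveal.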
So we take the count and the workday population count, we times by 1,000, and we get our crime rate. If we go ahead and run this and have a quick look at the dataset again, I'll just view it here so it's a little bit clearer, we should have a new variable that indicates the crime rate based on the workday population. And here we have it: our rate of crime per 1,000 workday population. So what can we do with that? How can we start to explore this and map the data? We're going to use pretty much the same functions that we used in topics one and two, with ggplot and tmap. Let's start with ggplot. Again, we're going to call on a base map, because I want to make this a bit more readable and give it a bit more context. We call on geom_sf, supply our Surrey LSOA dataset, fill by the crime rate, apply some aesthetics, and you'll get a map that looks like this with our crime rate. What you could do is compare this crime rate to the original crime count that we had, see the differences, and see which areas come out higher or lower. Now I'm going to show you how to do this with tmap. We've already done this, so I won't explain it too much. We're just going to use tm_shape, fill it with the crime rate, and use the quantile style, which is the most basic one. It's not like the standard deviation method that we used. And now we have this really cool crime rate map, which is also interactive because I hadn't changed the tmap mode, but that's all right. Cool, so the last thing I'd like to talk about, and I'll spend a few minutes on this, is cartograms and how we can use them with ggplot. A cartogram is basically a type of map where different geographic areas are modified based on a variable associated with those areas. There are two types of cartograms, contiguous and non-contiguous.
This basically just means whether you want your LSOAs bounded together or not, quite similar to the small multiples that we looked at, where they were split up. I'm going to be looking at the type that shares common borders, so I want to have all the LSOAs in Surrey joined together. In order to create a cartogram, you basically need to join the statistical and the geographical data. So what you need is the modified sf object, which is our Surrey LSOA object and has everything we need, and you need to pass the values of interest. You need to supply a weight, which indicates some sort of population. In this instance, I'm just going to look at the workday population, and I'm assigning the result to a new object called cart. You'll get a mean size error message for each iteration; this is just it doing its calculations. The cartogram is basically just another type of thematic map, by the way. It's one that was mentioned in yesterday's talk. You'll see how this can look and why it might not be effective for crime data. It might be more effective for, I'd probably say, something like maybe trouble or weather, but you'll see why. And now we can go ahead and create this plot, and you'll see we get something that looks like this. This is an empty map; we haven't assigned a fill variable yet. So this is simply a cartogram of the LSOAs in Surrey. Let's go ahead and add our population count, which is that weight variable we just used. You do this by simply applying an aesthetic and using fill. You'll now see a map, a cartogram, filled in across Surrey, which is very interesting. It depends, you know: how readable is this map, and how reproducible is it for you and your work? And you can also go ahead and add more aesthetics to this. You can add a title, you can change the colours, you can add subtitles. I think one thing to think about when creating maps is to consider colour blindness.
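A contiguous cartogram along these lines can be sketched as below. It is not the workshop code: it assumes the cartogram package's `cartogram_cont` function, uses the North Carolina counties from the sf package as stand-in data, and treats `BIR74` as the weight in place of the workday population. The data are reprojected first because the cartogram calculation expects a projected coordinate system, and the viridis fill scale is chosen here as one colour-blind-friendly option.

```r
library(sf)
library(cartogram)
library(ggplot2)

# Stand-in data: North Carolina counties from the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Reproject to a projected CRS (EPSG:32119, NAD83 / North Carolina).
nc_proj <- st_transform(nc, 32119)

# Contiguous cartogram: areas are distorted according to the weight
# while neighbouring areas keep their shared borders.
cart <- cartogram_cont(nc_proj, weight = "BIR74", itermax = 5)

# Plot the distorted shapes, filled by the same weight variable.
ggplot(cart) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_viridis_c() +
  labs(title = "Contiguous cartogram (stand-in data)", fill = "Weight")
```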
That's a huge factor when making these maps: how readable is this for people who are colour blind? There are colour palettes in place that allow you to change this in this argument here. If you search in the help, you can have a look and see what's best. So now we've just finished up. You've learned how to create a crime rate and how you can go ahead and plot it. We move on to activity three. I'll give you five minutes to have a go at this and then we'll work through it together. What I want you to do is basically replicate what I've just done with the workday population, but have a go with the residential population now. It's the exact same code, but you're just changing a few variables to calculate the crime rate from the residential population. I see a message in the chat saying that there's an error message, can't find cartogram. That is probably because I didn't tell you to install it or load it. You would need to install the package cartogram. You can see it on line 125 here. So if you haven't installed it yet, just run install.packages and type in cartogram. Let me know if that works for you, and then make sure to load the library afterwards. We're coming towards the end of this talk, so I'm going to go ahead and finish up this activity. We might run over by a few minutes while I answer last-minute questions, but if you could, before you leave or when you leave, please do the survey evaluation. It would mean a lot to me and my colleagues because it helps us to know what you want to see next. So let's go ahead and finish off the last activity. Your first step was to calculate the crime rate. Again, remember that calculation: you have the count variable, and you divide this by your variable of interest, depending on your population. I've asked you to have a look at the residential population, is that right?
Which was called pop count res, and we're going to be multiplying it by 1,000 as well. Was it the residential? Yeah, it was residential. Great, now we have that new variable added. You'll see we now have 11 variables, which means we have a crime rate named crime rate 2 for our residential population. So now we can go ahead and plot this using ggplot. You simply call on your dataset, the Surrey LSOA object, and then we fill using crime rate 2, because this is the one relevant to the residential population. Now we can go ahead and run this and we'll have a nifty little ggplot, if it loads, of the crime rate for our residential population. And then we can do basically the exact same thing using tmap, which is just a different variation. We want to fill by, I forgot, crime rate. Crime rate 2. There we go. And there we have our more interactive map using a different package, which is tmap. I tend to prefer tmap to ggplot, but I have realised that ggplot has more options in terms of editing aesthetics and just making it look a bit prettier. It's personal choice, right? And now I've asked you to compare how the workday population rate looks against the residential one. We can do this by creating two new objects. I'm just going to call them e and f for simplicity's sake. We fill the first one by crime_rate, just the first one, and then we'll compare this to crime_rate2, which is relevant to the residential population. So we go ahead and run this. I've also changed the alpha to 0.3, but you can change this and see what happens to the aesthetics. And now we can plot these two together, simply using tmap_arrange. And we'll have this nice little map, hopefully. Perfect. Now we have two maps that show the crime rate relative to the workday population and to the residential population.
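The side-by-side comparison of the two rates can be sketched as follows. Again this uses stand-in data rather than the Surrey objects: the North Carolina counties from the sf package, with `SID74` standing in for the crime count and `BIR74` and `BIR79` standing in for the workday and residential populations. The variable names `crime_rate` and `crime_rate2` and the objects `e` and `f` follow the talk; the tmap version 3 syntax is assumed.

```r
library(sf)
library(dplyr)
library(tmap)

# Stand-in data: SID74 plays the crime count, BIR74 and BIR79 play the
# workday and residential populations.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  mutate(
    crime_rate  = (SID74 / BIR74) * 1000,  # rate per 1,000 "workday" population
    crime_rate2 = (SID74 / BIR79) * 1000   # rate per 1,000 "residential" population
  )

tmap_mode("plot")

# One map per rate, using the same classification style so they are comparable.
e <- tm_shape(nc) + tm_fill("crime_rate", style = "quantile") + tm_borders()
f <- tm_shape(nc) + tm_fill("crime_rate2", style = "quantile") + tm_borders()

# Draw the two maps next to each other.
tmap_arrange(e, f)
```

Keeping the style identical in both maps follows the earlier advice about consistency, although quantile breaks are still computed separately for each variable, so the legends should be read alongside the colours.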
You can see that there are differences between the two, in that there is a much lower maximum crime rate for the residential population. So what does that say about comparing crimes against different populations? If you were to use the census, you could also have a look at ethnicity and demographics and see how crime rates vary across them, which would be really interesting. It gives you scope to develop research questions and really ask how crime changes across a society or a demographic. And lastly, I've added a little code just to help you explore the cartograms. It's really simple. You pass the cart object to ggplot and then you fill using the pop count res. You can have a look and do the workday one as well, but that's up to you. And now we have this really nifty little cartogram of our residential population across Surrey Heath. And that brings this talk to a conclusion. I hope that it has been beneficial. We do have a few minutes left, so I'm willing to take questions, or in fact you're welcome to hang around for another five minutes while I look at the additional tasks. All the code and solutions are also available in one R Markdown file. So if you want to avoid all of the extra information I put in the others, all the code can be found here, as well as the answers to the activities. I'm going to talk through the additional topics. I won't necessarily go through them, but I'll quickly show you what's here so you can do this in your own time. I mentioned that you can produce maps using ggmap, which uses Google's API and basically allows you to put, let's call it a more realistic base map, underneath your maps. Unfortunately, in order to do this you need to set up a connection to the Google API, and that could take quite a while.
So I've included a separate Rmd script which gives you really step-by-step instructions on how to obtain a Google API key. It does take a while, so this is only for you if, you know, you love maps that much; then go ahead and go for it. Just a warning: it does require a credit card, but it doesn't actually charge you. It's a very strange system, I think it's for security purposes. But you can have a go at exploring different areas across Google Maps, anywhere in the world. The code for that is all here, and I've provided some aesthetics for that as well. Another interesting part is binning the data, which is what I mentioned before. With most crime data, and with most simple features objects, your geometry, that is your x and y coordinates, is combined into one column, but sometimes you might want to split these up and have them separate so that you can start to bin your data a bit more effectively. Now there is a function that was created at this GitHub link right here, so I did not write this, but if you run this function it basically allows you to separate your longitudes and latitudes from your geometries, and you can produce more effective ggmap plots, in short. It might look something like this. And there's an error, of course. I'll have a look at that and update the code; I'm not sure why it doesn't work, we'll see. And then I've included some more information about the leaflet package, which we didn't get to look at. This is just a different way to make interactive maps. We looked mainly at the tmap package, but this is just another option. And I've left some more resources on jittering and on using st_intersects. But that brings this talk to a conclusion. I realise we're five minutes over now. I'm not sure who is still present, but thank you all for attending.
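For anyone wanting to try the coordinate-splitting idea in their own time, here is one possible sketch. It does not reproduce the function from the GitHub link mentioned above; instead it swaps in `st_coordinates` from the sf package, which returns the x and y values of point geometries as a matrix. The `crimes` object and its three points are invented for illustration.

```r
library(sf)
library(dplyr)

# Made-up points (longitude, latitude); not real crime locations.
crimes <- tibble(
  id  = 1:3,
  lon = c(-0.57, -0.60, -0.74),
  lat = c(51.24, 51.32, 51.34)
) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Pull the coordinates back out of the combined geometry column.
# For points, st_coordinates() returns a matrix with columns X and Y.
coords <- st_coordinates(crimes)

# Store them as ordinary numeric columns alongside the geometry.
crimes_xy <- crimes %>%
  mutate(longitude = coords[, "X"],
         latitude  = coords[, "Y"])

crimes_xy
```

With longitude and latitude available as plain columns, the points can then be passed to binning layers that expect separate x and y aesthetics.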