Good afternoon everyone. It's gone a couple of minutes past, so we're going to make a slow start; I think there are a few more people still joining as the numbers are slowly creeping up, but welcome back to part two of mapping crime data in our live code demonstration. To reintroduce myself, my name is Nadia Kenner and I'm a research associate based at the UK Data Service with the Cathie Marsh Institute. We're also joined by Emma Green today, who will be facilitating this workshop. So if you head back over to GitHub, which you should be able to see now. I'm hoping you can see my browser page at the moment. I'm currently on GitHub, and this is the repository that stores all the material related to our crime mapping workshops. It's been run four times now, and we're going to be using the March 2023 folder. In this folder we have two subfolders: one is the code itself and one is the data. I'll take you through the data first. If you open this folder, you'll see that we have our shapefiles, our census data and our crime data. All information about how I downloaded these specific datasets can be found in this Word document, but to save you time throughout this workshop, all data is already available. So we have our crime datasets, with 12 months here. We have our census dataset, which includes the residential population and workday population. And we also have our shapefiles. If we head back to the main folder, you'll see we also have our code. This is where most of you will be working. We have the respective R Markdown files that work through sections one to three, which will be the bulk of this workshop. If you are unfamiliar with how to clone a repository from GitHub, there are interactive Binder links available if you scroll down a little.
Just to let you know, if you do end up using these Binder links, they can run a little slow, especially if multiple people are using them at the same time, so be careful doing this all at once. Binder basically means that you don't have to install and download RStudio, but I'm assuming most people have already got that up and running. So let's head back to the main repo. In order to clone a repository into R, I'll walk through this really briefly so that everyone can follow along. You're going to click that big green button and copy the HTTPS link, as seen here. Copy that link and head over to RStudio, which is here. To clone the repository, head to File and click New Project. You then click Version Control, because that is what GitHub is: it's like a cloud service that means people can work on the same project at the same time. Then you click Git, paste the URL we just copied, and save this into a folder somewhere on your computer. I would always suggest opening a new project in a new session, just so it doesn't clash with anything else you've got open already. But yes, that is how you clone the repository from GitHub so that you have the exact material I have on your own computer. I'm going to cancel this because I've already done it prior. So here we are in RStudio. As you can see, we've got three main panels at the moment. I'm just going to turn my camera off because it's distracting me, but I'll continue speaking. We've got our console, which is this big screen on the left. We've got our Files, Plots, Packages and Help panel to the right, and this is where you will see that cloned repository on your own computer.
You can tell we're working in an R project from this file here, which is a .Rproj file. What this basically means is that it sets your working directory for you, and a working directory is just a file path on your computer that sets the default location for any files you read into R. As you can see, we've got the material from December and from March present here; we'll only be using the material in the March folder. So head to the code folder, and the first thing you want to do is open up the preliminary tasks R Markdown file. This file contains some information about how to set your working directory, what a working directory is, and how to clone the repository from GitHub if you haven't done so already. To figure out what your working directory is, you can use a function called getwd with two brackets. I've typed that wrong. In fact, you might not be able to see my screen, so I'm going to zoom in for you a little; I hope that's a bit better. I typed getwed, not getwd. There we go. This tells me where my working directory is, where all my files are stored. So the first thing you need to do is install any necessary packages onto your computer. To install a package, use the function install.packages and supply the package name in double quotation marks. In R Markdown, you can run a whole chunk by clicking this little green arrow here, which runs everything in the lighter grey box. You can also run individual lines of code by simply highlighting them, or having your cursor on them, and pressing Command+Enter or Ctrl+Enter. So if you haven't got any of these packages installed, please go ahead and do that. I've already got all of these packages installed, which means the next step is to load your packages. To load a package, you use the library function and supply the package name in parentheses, without quotation marks.
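The preliminary steps just described can be sketched in a few lines of R (the package names here are illustrative; install whichever packages the preliminary tasks file actually lists):

```r
getwd()                    # print the current working directory

install.packages("sf")     # install a package: name in double quotes
install.packages("ggplot2")

library(sf)                # load installed packages: no quotes needed
library(ggplot2)
```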
So I'm going to go ahead and load my necessary packages for this workshop. As you can see, the code is running in the console, telling me that things are working, and we've got this orange message here. Typically, if you receive an orange message it's just a warning or information from R; here it's just telling me which packages have been attached, and it's that simple. You can dismiss the messages by clicking the little arrow. That's all you really need to know for the prerequisites, so I'll take you through the three sections now. The next file you want to open is section_1.Rmd. In this first section, I will show you how to read in the data, run some basic exploratory analysis and produce some point maps. Again, before doing anything in R, you always need to load your necessary packages, and I've already done this. Some people prefer to load each package only when they need it, but I tend to load all the packages at the start because then I don't need to worry about doing it later. But yes, let's go ahead and load our crime data so I can show you what we're working with. We'll be using crime data from police.uk; I have selected the data from 2022 across 12 months. So if you head back to the data folder and into the crime data, you'll see 12 subfolders, each with a CSV file inside. We're going to be working specifically with the February 2022 folder, and I'm going to show you how to read this data into R. Again, there's information about how I obtained this data from the police.uk website, which you can read in your own time. So the first step is reading in the data. As you can see, I've created a new object called crime.
I want my new dataset to be called crime rather than 2022-02-surrey-street, because that is a very long name and makes your work a little less reproducible; it's much better to have a smaller, shorter object name. I'm then using the read_csv function from the readr package, which will read in the data. You might also see this extra bit of code hanging off the end. This function here is called a pipe, and it basically means "and then": it allows you to extend your code and add additional parts so it's all in one statement. All I'm doing here is using the clean_names function from the janitor package to make all of our variable names lowercase. So let's go ahead and run both these lines. As you can see, we've got a new object in our environment panel: our crime dataset with 6,881 observations and 12 variables. The observations are the rows and the variables are the columns. You can also see some information that's been printed here. But let's explore our dataset a little more. We can use the head function, from the base package in R, which returns the first six rows of the data. Once we've run this, you see our dataset is displayed below, and these are our variables: the crime ID, who it was reported by, where it falls within, the latitude, the longitude, the LSOA code, the LSOA name, the crime type and the last outcome category. So I'm going to talk a little about this dataset and what exactly we're looking at. If you remember from the first talk, I mentioned that there are three types of structure when looking at vector data: points, lines and polygons. This dataset happens to include all three.
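Putting that together, the read-in step looks roughly like this (the file path and name are assumptions based on the repo layout described above; check the exact names in your cloned folder):

```r
library(readr)    # read_csv()
library(dplyr)    # the %>% pipe
library(janitor)  # clean_names()

# Read the February 2022 CSV, "and then" tidy the variable names
crime <- read_csv("data/crime_data/2022-02/2022-02-surrey-street.csv") %>%
  clean_names()

head(crime)  # first six rows of the data
```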
Our coordinate variables, the longitude and the latitude, represent our points: this is point data because it gives the exact location of one thing. We then have our location variable, which represents the lines; as you can see here, these are normally defined by a street, a junction or a road. And then we have the LSOA names, which represent our polygons. LSOA stands for Lower layer Super Output Area, a very common unit of measurement in census geography. Other polygon data could be boroughs, wards, districts, cities, and so on. You can also use the unique function, which creates an array of all the different possible values in a column. As you can see, I've called on the crime dataset, and I'm calling on a specific variable, the crime type, and it now lists all the different crime types we have. As you can see, we have 14. The numbers on the side just indicate the position of the first value printed on each line, so if you make the screen bigger and rerun it, you'll see the numbers shift. So yes, we've got 14 different crime types. Before moving on to some of the more complicated spatial topics, let's create some frequency tables for the different crime types, so we can have a look at what kind of trends we have. We can use the table function to create a frequency table of the crime types. By default, the table is sorted by category, but we can then use the order function to sort the table by count. So let's break this down. We first create the frequency table using the table function, again calling on our crime dataset and our variable, and I'm assigning this to a new object called counts. So let's just run that line. If you look in your environment panel, you can see it's now been added under Values.
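That exploratory step is a one-liner, assuming the cleaned crime data frame from the read-in above:

```r
# List every distinct value in the crime_type column (14 in this data)
unique(crime$crime_type)
```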
What we're now going to do is order the table by count, and I'm ordering it in a decreasing fashion. So let's go ahead and do this. In this instance, I've not created a new object; I'm overwriting counts in line 102, and you'll see why in a minute. Now we can print the first few entries to see what we have, using the print function. As you can see, it's telling us the total number of crimes for each crime type in a rough table. So, nice. Now we've summarised all our crime counts, we can go ahead and plot this to view it a bit better, right? We can use the barplot function, which is part of the base package in R, and if we run this, we get a very basic but useful chart showing the crime counts. When you're working with crime data, I would always suggest running frequency tables and basic descriptive analysis so you can start to understand what type of data you have. For example, anti-social behaviour and violence and sexual offences are among the higher crime counts, but drugs seem to be sitting quite low. So now you can start to question this, bearing in mind we're only using data from February 2022: why, in February, are anti-social behaviour counts so much higher than drug counts? We're now going to move on to the simple features and projection methods which were discussed in the first webinar. We currently have a data frame, our crime object, that is not spatial. Although the data frame contains latitude and longitude, RStudio doesn't know that the data is spatial, so we need to tell R that it is, right? We do this using simple features, also known as sf, a common standard in R that allows you to handle and manipulate the units of analysis: the points, lines and polygons we looked at earlier.
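The frequency-table-and-plot sequence just described looks something like this (assuming the crime object; the exact labels in the workshop file may differ):

```r
# Frequency table of crime types, sorted by category by default
counts <- table(crime$crime_type)

# Overwrite counts, reordering the table from highest to lowest
counts <- counts[order(counts, decreasing = TRUE)]
print(counts)

# Basic bar chart of the ordered counts; las = 2 turns the labels sideways
barplot(counts, las = 2, cex.names = 0.6)
```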
sf is also compatible with ggplot2, a very common package for creating visualisations. There is another package named sp, but within the last few years sf has largely taken over from sp, partly because of how well it works with ggplot2. To recap: when working with spatial data, you need to identify the coordinate reference system. This allows us to move from that 3D image of Earth as a sphere to that 2D image of a map on the screen. And as mentioned, there are thousands of coordinate reference systems. The one we mentioned in the talk was WGS84, the World Geodetic System, and we use this reference system because our data contains longitude and latitude. If your dataset contained Northings and Eastings rather than coordinates, then you might use the British National Grid instead; but because the World Geodetic System lines up with longitude and latitude, we are able to use it. To transform a non-spatial data frame into a spatial data frame, we can use a function called st_as_sf. A quick recap: sf objects are just data frames that are a collection of spatial objects, where each row is a spatial object; that is, each row represents a polygon, a different area of Surrey. To check whether your data frame already has a coordinate reference system attached, you can use the st_crs function from the sf package. If we run this, we see that R is returning NA. This means R doesn't know that this is a spatial data frame, and when it comes to matching shapefiles, we won't be able to do that without an attached reference system. So let's go ahead and transform our data frame. In the first instance, I'm creating a new object called sf, and I'm using the assignment operator to transform the data using the st_as_sf function. You then call on your data frame; in this case, the crime data frame.
And you need to supply coords. Our coords in this instance are the longitude and latitude. I will just say now that R is case sensitive, so if you typed Longitude with a capital L rather than the way our data frame is set out, this won't work. You need to make sure these are spelt exactly the same way as in your original dataset. Then we call on the coordinate reference system, and this is where we use that well-known identifier: 4326, the reference number associated with the World Geodetic System. We're also setting na.fail to FALSE. If we run this, again, you will see a change in your environment: we now have a new sf object. But what's really interesting here is that we've actually lost a variable, we've lost a column, right? We had 12 variables in our crime dataset, but now we only have 11 in our spatial data frame. So what have we lost? Why is there one fewer variable? Well, first let's double-check that this definitely has an attached CRS. Again, we run the same line of code we saw before, but on the new data frame. As you can see, we now have the EPSG code, the well-known identifier, 4326, which indicates the World Geodetic System. Fab, so that has now transformed. Let's run the head function to look at the first six rows of our data and see what's changed. As you can see, not much has happened here, but if we keep scrolling to the end, you'll see we have a new column called geometry, and we no longer have our latitude and longitude columns: they have merged into this geometry column. This indicates that we now have a simple features object that can handle spatial data. So we can move on to mapping our point data. We have this sf object containing the point-level data about crime in Surrey, and we can create some basic point maps from it.
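The conversion just walked through can be sketched as follows (in the workshop the new object is simply called sf; I've kept that name here, although it shadows the package name):

```r
library(sf)

st_crs(crime)  # returns NA: no coordinate reference system attached yet

# Convert the plain data frame to a simple features object.
# 4326 is the EPSG well-known identifier for WGS84.
sf <- st_as_sf(crime,
               coords  = c("longitude", "latitude"),  # case-sensitive!
               crs     = 4326,
               na.fail = FALSE)

st_crs(sf)  # now reports EPSG:4326; lat/long merge into a geometry column
```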
We can use the ggplot2 package to do this, and we're specifically using the geom_sf function. This is simply a unique aesthetic that expects to find a simple features column containing simple features data, which is what we have here: sfc_POINT, indicating what type of data we have. You then need to call on your data, and ours is named sf. If we run this line, we now have a point map of our crime data in Surrey. Now, looking at this, you would have no idea it's Surrey, because it's plotted on a grid rather than a map. So you can start to add different aesthetics and make the map more readable. You could, for example, colour in the different crime types, right? Because at the moment this is just a pile of little dots for all the crimes that have happened, without a map behind them. But what happens if we colour by crime type? Will that make a big difference to our map? As you can see, we've now got some more colours on here, but it's still not that readable; we still don't really know where we are. So we can supply different aesthetics. In this case, I'm using the aesthetics function, aes, to tell ggplot that crime type is my main variable and that I want the points coloured by crime type. I'm also now going to add a title, a subtitle and a caption, just to make the map a little more readable. So let's see what happens when we run this code. There we go. Now we have a bit more of a map, with a bit more idea of what we're looking at. But at the same time, we still don't have a base map; we still don't know where this is. So we can add our base map, our reference map, to see where we're looking. We can use the function annotation_map_tile to supply that base map, and we run the same line of code as before: we call on our data and supply the aesthetics, telling ggplot we want the points coloured by crime type.
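In code, the progression from bare points to a titled, tiled map is roughly this (annotation_map_tile comes from the ggspatial package; the titles are illustrative):

```r
library(ggplot2)
library(ggspatial)  # annotation_map_tile()

ggplot(sf) +
  annotation_map_tile() +              # OpenStreetMap base layer
  geom_sf(aes(colour = crime_type)) +  # colour each point by crime type
  labs(title    = "Reported crime in Surrey",
       subtitle = "February 2022",
       caption  = "Source: data.police.uk")
```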
So, with the addition of a base map, let's see what we're looking at. It always takes a couple of seconds to load, but yes, there we go. Now we have a proper map with a reference, showing the raw crime counts in Surrey in February 2022. But as you can see, there's quite a lot happening on this map; it's not entirely readable. We can just see there's a lot of pink at the moment, which would indicate a lot of either violent crime, maybe theft from the person, and vehicle crime as well. So we can start to explore certain crime types individually, by subsetting for specific crime types. In this bit of code here, I've created a new object called asb, because I'm interested in looking at just the anti-social behaviour. I'm using the subset function, which comes from base R, calling on our spatial data frame, and I'm calling on the variable crime type. In this instance, I'm using a double equals sign to select one specific crime type, anti-social behaviour. I'm also removing columns one, nine and ten, just because they aren't relevant for the rest of the analysis; they're additional attributes we don't need. And just to let you know again: if you were to write anti-social behaviour with a lower-case A, and I'll run this for you, and we highlight this, you'll see that we get zero rows. We have no information in this data frame, and that's because anti-social behaviour in the crime dataset is spelled with a capital A. I can show you this over here: here is the spelling of anti-social behaviour, you see, and a lot of these values start with a capital letter. So just remember that the janitor clean_names function only cleans the variable names at the top, not the values themselves.
So let's change that back to a capital A, rerun it, and rerun the head function to view the first six lines. Now you can see we have our data: this is the data frame containing just the anti-social behaviour reports. So again, let's go ahead and plot this. I'm going to plot it on a reference map using annotation_map_tile, and again using geom_sf. And now we have a much nicer map that shows the levels of anti-social behaviour alone, apart from all the other crimes. So yes, that was a very quick introduction to exploring your crime data and plotting some basic point maps. Now I've got an activity here for you which mimics what we've just done. I'm going to give you about five minutes to answer these questions, and then I'll come back and we can discuss them together. They aren't incredibly difficult: all you need to do is fill in the blanks marked with these lines here and follow the steps. There are clues in the steps as well. So do your best to fill these in, and in three to five minutes I'll come back and we'll fill them in together. Hello, I am back. I hope you've had a couple of minutes to explore this activity yourself, and I'm going to go through the answers now. If you've got any questions, please use the Q&A chat again. Step one was to subset the data for those crimes recorded as drugs; we did the same thing earlier for crimes recorded as anti-social behaviour. To subset a data frame, you first call on the data frame you're using, which was already supplied, and was sf. You then supply the variable name; in this instance it's crime type. You then use the double equals sign to subset the specific crime type you want, and in this case I asked you to get the ones for drugs. Bear in mind that if you type this with a lower-case D, you're going to have the same issue.
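Here's a sketch of the subset-and-plot steps for anti-social behaviour (columns one, nine and ten are as described above; the value string must match the data's capitalisation exactly):

```r
# Keep only anti-social behaviour, dropping three unneeded columns.
# A lower-case "anti-social behaviour" would return zero rows.
asb <- subset(sf, crime_type == "Anti-social behaviour",
              select = -c(1, 9, 10))

head(asb)

ggplot(asb) +
  annotation_map_tile() +  # base map from ggspatial
  geom_sf()
```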
So make sure this was a capital D, because it follows the same format as the dataset. Let's go ahead and subset those. Brilliant. As you can see, if we scroll along, we've got our crime type drugs selected. I'm then going to put this into a new object called drugs, to save that table we've just seen. So it's the same thing: you type in crime type, you type in the specific crime type you're interested in, and now we have a new data frame at the top called drugs, so that information is stored there. The final step was to use ggplot to plot the point data over a base map. If you remember correctly, the function for a base map is annotation_map_tile, which can be seen here with the brackets at the end, and you then need to provide the geom_sf function, calling on your data frame; in this case, the new data frame called drugs. If we plot this, we now have a map of our drug counts. And as you can see, there are far fewer points compared to the anti-social behaviour, right? Which is very interesting to know, though not surprising, I guess, because anti-social behaviour is obviously more common and, you know, easier to report. But that rounds off section one, so we're now going to move on to section two, where we start to look at shapefiles. Again, any comments or questions, put them in the chat; I'll have a quick look before we move on to section two. So what are shapefiles? Shapefiles can be handled with the sf package we've just been working with. They are a common format in the GIS industry that allows us to store our vector data: the points, the lines, the polygons. A shapefile stores a single feature class, which means it will only store a single type; it won't mix the different feature types. It will only store point data, or only a line shapefile, or only a polygon shapefile. You will never find a shapefile that has all three vector types at once.
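The completed activity, then, looks something like this:

```r
# Capital D: the value must match the dataset's spelling
drugs <- subset(sf, crime_type == "Drugs")

ggplot(drugs) +
  annotation_map_tile() +
  geom_sf()
```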
Now, a shapefile contains different geometry files, as listed here: a .shp, a .shx, a .dbf and a .prj. These aren't completely necessary to know, but I'm going to show you what they look like. If you head back over to your data folder and click shapefile, which is where the shapefile is stored, you will see those different file extensions here. The one we need is the .shp: this contains the geometry data, a 2D axis of your coordinate data. The .shx contains a positional index of the feature geometry, which isn't really necessary for us to know. And the .prj is an interesting file, because this is where the CRS and projection information is stored. If you are interested in where to obtain shapefiles, they can be found on the UK Data Service website; I think it's under census support, in the boundary data section. Again, all information about how I downloaded the different datasets can be found in the downloading-the-data document, which can be seen here. So, for the first step, I've obtained a shapefile for Surrey, and the next step is to read it in. Reading in a shapefile is unlike reading in a CSV. If you remember from section one, we used the read_csv function; that wouldn't work with a shapefile, because it has a different, geometric output, right? But st_read allows us to read in those .shp files. This is also from the sf package. Just to let you know, the double colon here is simply calling on the package; you don't have to prefix a function with its package name, but sometimes I like to, just to show you which function comes from which package. So let's go ahead and read in that dataset from the shapefile folder. As you can see, it says we're reading the layer from england_lsoa_2021, and it's telling us we're working with a multipolygon, which is a polygon in itself.
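Reading the shapefile in can be sketched as follows (the path and layer name are assumptions based on the repo folder described above; check the .shp filename in your own clone):

```r
library(sf)

# st_read() handles .shp files; the :: prefix just shows which
# package the function comes from
shp <- sf::st_read("data/shapefile/england_lsoa_2021.shp")

head(shp)  # attribute fields plus the geometry column
```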
And you can see that we have 55 features and four fields. You can also use the head function to view the shapefile, though it's a little messier. You can see that we have the name of the area, the label, and the LSOA name and code, which are just reference points. It tells us we're in a multipolygon, and then we have the geometry column, which holds the coordinates for those polygons. So yes, as you can see, we have a map. Oh, sorry, jumped ahead a bit. The first thing to do is plot this empty shapefile for Surrey Heath to see what we're actually looking at, because although we see a data frame of the shapefile here, it isn't really telling us much, right? It's just a geometric dataset that has the outline for each area in Surrey Heath; it's outlining and drawing each of those little areas in Surrey, but it doesn't look like much to us in a table. So let's get rid of that and plot our shapefile. We can use the same function, geom_sf, to plot a shapefile. So let's go ahead and do that. And yeah, here we have an empty shapefile that has drawn out the area of Surrey for us. To break this map down a little: each of these areas represents a different LSOA, a different Lower layer Super Output Area, that geographical unit typically used for mapping crime data. Our overall goal will be to map the crime data, the point data we made in section one, onto this map, so we have a much clearer picture of where these different crimes fall within different areas of Surrey. We can check the attributes of the file by running the class function. This just tells us that, yes, we've got our simple features object and that it's a data frame. And so, in total, the shapefile consists, as I said, of five variables.
The first four contain the information about the specific LSOA, and we can ignore the label column, because it's just another reference point. We can also use the attributes function, calling on the dataset and on our geometry variable, because I want to draw a little more attention to the geometry column. The geometry column can be split into two key sections: the feature and the geometry. The feature, in our case, is the polygon level, and it's referenced as a multipolygon, which is in fact a simple feature geometry list-column; it's the same thing we've been working with, just under a different name. The geometries are the numbers that follow. We can get a list view of our geometries by using st_geometry; if you expand this, it tells us a little more about the dataset. So now we have a really basic understanding of what a shapefile is. Currently we've just got an empty one, and I've shown you how to import shapefiles into R. The next step is to run some data manipulation and create some new data frames that can work with the format of shapefiles. One general point about working with crime data is that a lot of the stages before the analysis involve data preprocessing. This means organising your data into a format that can work with geometry files, including shapefiles, and that's what we're going to do here so that we can work with the census data in section three. So the first step is to group the crimes per LSOA. The original crime dataset contains the individual reports of crime across each LSOA, so these LSOAs are repeated multiple times, right? We can see this if we open up the crime dataset here and scroll along: we have multiple repeats of certain LSOAs, like this one here, for example.
So our crimes aren't grouped to a single row per LSOA; we have individual records here. The aim is to group the crime counts for each area. So let's go ahead and count the crimes per LSOA and obtain some grouped statistics. To do this, I've again created a new object, called crimes grouped by LSOA. A very long name for an object, but I wanted to keep things simple. Typically, you also don't need to create a new object every time you run a line of code, but I wanted to store all the different versions of the data we're working with so that you can see the changes. After calling on the new object, you then call on your data frame. In this instance we're using crime, the raw data frame that was obtained in section one. We use our pipe and the group_by function from the dplyr package to group by the LSOA code in the crime data frame, and we ask it to summarise this by the count. So let's go ahead and run this bit of code. You'll see that, again, we have a new object added to our environment, this time with 744 observations and just two variables. Let's use the head function to see exactly what's going on. This gives us the first six rows, and as you can see, we now have the total count of crimes grouped per area, per little shape that we saw in the shapefile. So now we have this new dataset and our shapefile. What do we do next? We need to somehow join them together so that we can put that data onto the map. So yes, in our new object, you'll see that we have the code and the total count, and we can join the shapefile and the crimes grouped by LSOA, the aggregated data, into one data frame. We can do this using a function called left_join, which returns all the rows of the table on the left side of the join and matches them with rows from the table on the right side of the join.
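The aggregation step looks roughly like this (the column name lsoa_code assumes the janitor-cleaned names from section one):

```r
library(dplyr)

# Collapse the raw records to one row per LSOA with a total count
crimes_grouped_by_lsoa <- crime %>%
  group_by(lsoa_code) %>%
  summarise(count = n())

head(crimes_grouped_by_lsoa)  # lsoa_code plus the count column
```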
And I'll try to explain this a little better. What I've done here is created a new object — surprise, surprise — named Surrey LSOA, because that's as clear as I can get: that's exactly what it is. We use the left_join() function and call on the two data frames we need: one is the empty shapefile, and the other is crimes_grouped_by_lsoa. To match the data frames, I'm using the variable LSOA21CD from the shapefile data frame and the LSOA code from crimes_grouped_by_lsoa. Let me explain that by opening up the data frames. The two data frames we're trying to merge are, number one, the shapefile — and here is its code column, as you can see — and number two, the grouped crimes — and here is its code column. So we're literally joining this column to this column to obtain the grouped statistics in the shapefile. I hope that makes sense. Let's head back to the R Markdown file. We've already looked at the dataset, but we'll run the head() function again on the — sorry, did I not actually run the line? That would make sense. Let's run that first. There we go. Now we have a new data frame, Surrey LSOA, which contains 55 observations and six variables. Let's have a quick look at what we're looking at. In fact, I'm just going to view this in the browser so that we can see how it's merged. And there we go — we've got that new crime count added to our shapefile, so for each LSOA we have the total crime count for that area. Some extra functions, if you're interested: you can view the geometry type with st_geometry_type(), which tells you we're working with a MULTIPOLYGON, and you can also obtain an object's extent in specific units using the st_bbox() function.
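For illustration, here is a base-R sketch of what left_join() is doing when the key columns have different names. The column names and codes here are assumptions standing in for LSOA21CD and the LSOA code, and merge() with all.x = TRUE plays the role of dplyr's left_join():

```r
# Left table: the shapefile's attribute rows (geometry omitted in this sketch)
shape_attrs <- data.frame(
  LSOA21CD = c("E01030400", "E01030401", "E01030402"),
  name     = c("Area A", "Area B", "Area C")
)

# Right table: aggregated crime counts per LSOA
crimes_grouped <- data.frame(
  lsoa_code = c("E01030400", "E01030401"),
  count     = c(3, 2)
)

# all.x = TRUE keeps every left-hand row, like
# left_join(shape_attrs, crimes_grouped, by = c("LSOA21CD" = "lsoa_code"))
surrey_lsoa <- merge(shape_attrs, crimes_grouped,
                     by.x = "LSOA21CD", by.y = "lsoa_code", all.x = TRUE)

print(surrey_lsoa)  # Area C gets NA for count: no crimes matched that LSOA
```

Every polygon survives the join; any LSOA with no matching crimes simply gets an NA count, which matters later when we map the data.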
And that just tells you the bounding box — the coordinate extent the LSOAs sit within — which isn't very relevant here. So let's go ahead and map this, now we've done all the mathsy stuff. We again use ggplot() and geom_sf(). I'm also supplying a base map using annotation_map_tile(), calling on our new data frame — the new shapefile that contains the grouped statistics — and asking it to fill this in by count. I've kind of already shown how it looks, but I'm also using the alpha argument here, which just sets how transparent, how see-through, the map should be. And we also have scale_fill_gradient2(), this last function here, which creates a diverging colour gradient from low to medium to high. If you run this plot it might take a little while, just because we're working with a shapefile. We now have a map with a graduated crime count of Surrey. Sweet. We can also plot this via the tmap package, which allows you to plot maps that are more interactive. There are two modes that tmap allows, known as view mode and plot mode. A plot map is simply a static map — the kind we've already been working with — but I'm going to show you the interactive version. If you run tmap_mode("view"), this sets tmap to interactive viewing, and you can use a very similar structure to ggplot to create these interactive maps. You call tm_shape(), which calls on the polygons, followed by tm_fill(), which supplies the crime counts in each polygon. I'm also making the borders different colours and setting some aesthetics. So let's have a quick look at how this would look as an interactive map. There we go — we have this interactive map of Surrey that shows the crime count across the area. You can zoom in, you can zoom out.
Each LSOA is also labelled as you hover over it, so this is really good for presentations and whatnot. Okay, so that is shapefiles. I'm now going to show you how to make some classification maps using classification algorithms, because sometimes you might want to better visualise those counts, right? Count data does not equally represent the population distribution at hand. tmap allows you to alter the classification of these thematic maps by changing the different styles. So yes — when mapping quantitative data such as crime counts, the variable typically needs to be put into bins, and I can show you exactly what a bin is if I just reload the map we opened. This legend here shows the bins: it's telling us the data is binned into categories from 1 to 20, 21 to 40, 41 to 60, 61 to 80. We can change the way these bins are set out by applying different classification algorithms, to make the count a bit more reliable. There are loads of different classification methods you can use, but for this example I've used k-means, Jenks, and standard deviation. There's some information here about what each of these classification methods does, but we're also going to see how they make your maps look different. I'm going to show you how to do this and skip over that extra information, but please feel free to read it in your own time. The first step is creating a new object called a. I'm using the tmap package here to create my maps: I'm calling tm_shape() on that dataset and filling it with the crime count, but this time, a bit different to the map before, I'm supplying a style — this is the classification method. And if you go to the Help pane and type in tmap — I mean, I could just do it — you should be able to find the full list of the styles that are available. So tm_style, I believe, somewhere in here.
But you'll be able to find the full list of all the different classification methods you can use. Maybe if we open tm_fill and see if it's listed — yes, here we go. So this is the method used to process the colour scale, and as you can see, it lists all the different classification methods that are available. So I'm running three separate maps, calling these a, b and c: a k-means map, a Jenks classification map, and a standard deviation map. I'm going to run these three at the same time, and then I'm going to change the viewing mode back to plot, because I want these to be static maps and we're currently set to view mode. Now it's told me it's back to plotting, which is great. And I'm going to use the tmap_arrange() function, which is for what's known as small multiples — it allows you to plot multiple maps in one image. If I run that, you'll see our three different types of map: our k-means, our Jenks and our standard deviation. You can see how the bins look different — we've got different break values for one, and higher numbers for another. You can also see how different LSOAs appear to have higher or lower counts than before, right? This large LSOA seems much lower under standard deviation than compared to Jenks. Now, obviously as a statistician or a crime mapper or a cartographer, you know the reason why the colour is different. But if you were to show this map to, let's say, a non-statistician, compared to the k-means map, what effect do you think that's going to have on their opinions? Because there are stronger colours present here, they might be obtaining different interpretations from these three different maps. So that's just something to think about when visualising your data, as well as understanding the effects these bins can have on your interpretations.
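To make the idea of classification breaks concrete, here is a small base-R sketch — not the tmap code on screen — that bins the same made-up counts in different ways. (The style names tmap accepts, like "kmeans", "jenks" and "sd", come from the classInt package; this sketch just mimics two of them by hand.)

```r
# Hypothetical crime counts for a handful of LSOAs
counts <- c(3, 5, 8, 12, 14, 40, 55, 90)

# Equal-interval breaks: split the range into 4 same-width bins
equal_breaks <- seq(min(counts), max(counts), length.out = 5)

# K-means-style breaks: cluster the 1-D values; clusters act as
# data-driven classes, with breaks falling between clusters
set.seed(1)
km <- kmeans(counts, centers = 3)
km_bins <- km$cluster

# Standard-deviation breaks: bins centred on the mean, one sd wide
m <- mean(counts); s <- sd(counts)
sd_bins <- cut(counts, breaks = c(-Inf, m - s, m, m + s, Inf))

# The same value can land in a different class under each scheme,
# which is why the three maps colour the same LSOA differently
table(cut(counts, equal_breaks, include.lowest = TRUE))
```

Comparing the bin memberships across schemes shows exactly the effect discussed above: the data doesn't change, only which polygons share a colour.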
Let me just see what time we're at — about five past two. I'll quickly show you this map, but I won't talk too much about it. If you're interested in producing a map of a categorical variable, you can use the tm_facets() function in the tmap package. As you can see in this line of code here, I've included the name variable from the Surrey LSOA data. Now, the name variable isn't the best example of a categorical variable — a better one would probably be something demographic, like gender or age or other attributes — but just for the purpose of this example I'm using the name variable, and you can see that this allows you to make multiple maps next to each other, much like tmap_arrange(). What this has done is broken up each of the separate LSOAs and plotted them as their own map. I'm not sure when this would necessarily be useful, but again, this is mainly used for categorical variables that are more relevant. I'm going to skip over these additional features, but feel free to run them yourself. In fact, I might as well run them; I just won't explain them. I'm just showing you how you can apply different styles. For example, in this one I've included a function that helps those who are colourblind to view maps a bit better. In this example I've shown how to add legends and titles to your map. And in this example I've shown how you can add borders, compasses and scale bars. I'll quickly run that so you can see how much better that visualisation is. So yeah, it's got all these additional features — you can include a legend on the graph, a compass, the scale bar, and it's all here, as you can see. I realise I've talked for quite a while, so we're going to move on to activity two, where I think there are three questions again.
So I'll give you five minutes to work through this activity; I'm going to pop down and get some more water, and then I'll be back to discuss the answers with you. Again, any questions or comments, I'll answer them as soon as I'm back. See you in a couple of minutes. Hello, everyone, I'm back — thanks for waiting patiently. I hope you had a minute to at least start exploring some of these activities; I'm going to start talking through them now. As I mentioned, task 1 asks you to explore the bclust and hclust classification methods and look at what the main differences are. I've shown you how to use the Help pane on the right to do this, but you can also — I believe, if you supply a double question mark, this takes you straight to the tmap package documentation, as you've just seen here. So that's another way to explore packages and get help. The main difference between bclust and hclust: hclust generates its breaks using hierarchical clustering, and bclust generates its breaks using bagged clustering. Again, I'm not going to talk too much about the difference between these, but let's go ahead and finish the activity. I've asked you to assign bclust and hclust classification maps to separate objects, call them h and b, and then plot them together using tmap_arrange(). Let's do that now. The first step is to create a new object called h, and you need to call on the dataset that contains the polygons from the shapefile — in this case, Surrey LSOA. We then fill this with the quantitative variable we have, in this case the count of crime, and supply the right classification method. If you looked in the help files, you'll see it's spelled "bclust", so that can go straight in there. We're then going to create the one for the hierarchical clustering.
Oh, sorry — let's make this one "hclust", because this object is h and the first one is the hierarchical clustering. The second one, in my case, is the bagged clustering. So again, you supply your data frame that contains the polygons, you call on the variable you need, and you specify which type of classification you want — this time "bclust". Let's run both of these. As you can see, they've been added to my environment panel at the top. The second part of the question was to plot these using tmap_arrange(). This is really simple: you just put the two objects side by side and run that line. Because we're looking at two maps, it might take a bit longer, but it's done the maths — it's done what it needs to do — and now we've got a hierarchical clustering map and a bagged clustering map. The final exercise was to plot an interactive map using the bclust classification method by changing the mode with tmap_mode("view"). If you remember, there are two tmap modes — we've got "plot" and we've got "view" — and we want "view" to change us back to interactive viewing. We then type in the map we've just created, which is b. And again, I think I said this about the style names: mind the quotation marks. If we run this, you'll now see that same map, but as an interactive view, which is pretty cool. So that draws shapefiles to an end, and we can move on to the last and final section, which is how we incorporate our census data. Again, this can be found in the census data folder. Let me open up section three, the R Markdown file — I'm actually just going to close some of these first. Here we have section three, which demonstrates the differences between crime rate and crime count. I'm just going to check the time — yeah, no problem.
We might run a little over an hour and a half, but I'll try to wrap this up as properly as I can. So, what is crime rate? Crime rate is understood, in totality, as crimes per 1,000 residents, and typically this is what the census would use. Using the rate reduces statistical bias and reduces the effect of the modifiable areal unit problem. This is because it's known that crime rates vary: in some populations and some periods, the prevalence of crime is much greater than in other populations and other time periods. Accounting for these findings is an enormously important task, because if we understand the causal processes that underlie variation, then we may be in a position to enact policy change that can bring about changes in the volume of crime in society at any given point in time. So, as per usual, loads of packages — obviously if you're working in the same project, you won't need to load them again. The first step is to read in the census data. This data was downloaded from the UK Data Service website via CCAM, which is a bit like an updated InFuse. I'm not sure if anyone here has worked with InFuse — quite a difficult website to work with — but CCAM is much more up to date, and I was able to obtain residential and workday statistics quite easily for the area of Surrey. Before reading in, I should probably explain the difference between the two datasets we have, as seen here: we've got a work count and a res count. The residential population simply reflects the usual activity of an area — the usual comings and goings, who usually lives there — whereas the workday population reflects the people who work there during the day, those present in the area for economic activity. There are obviously going to be differences in population flow when comparing workday and residential, but we can use these to calculate our crime rate. I'm going to read in just the residential population for now.
I'm going to be using the read_excel() function, and again supplying the pipe to chain more code. I'm using clean_names() to make everything lowercase, and I'm also using the rename() function from the dplyr package to rename some of the variables we were given. These second names were the original variable names, and obviously they're just not readable — they don't really mean much — but I know from looking at the codebook that this variable represented the LSOA, this one represented the name of the LSOA, and this weird number variable represented the count — that's the residential count. So let's go ahead and load this in. Ah — I haven't loaded a package. I think it's the readxl package; I ought to have included that in the list of prerequisite packages, but this is what you'll need to do: run install.packages(). There we go, that's now worked, and as you can see it's been added to our environment. So make sure to install that one — I'm not sure if I included it in the list of packages in the prerequisites. Let's have a look at the first six lines of our dataset. Again, we have our LSOA code and name — quite a tongue twister, so I apologise for you having to hear it so often — and we also have our residential count. This variable is our residential count; it tells you how many people reside in that area. There are also other attribute variables, but we don't need to worry about them because we don't end up using them. The next step is to join this dataset to our Surrey LSOA. We currently have the shapefile attached to the crimes grouped by LSOA, and now we just want to attach that extra residential count to our Surrey LSOA — adding yet another variable to this dataset. So I'm calling on the main dataset we've been using, and this is going to overwrite it, adding in that variable by joining the two datasets.
So I call on the Surrey dataset, I call on the census dataset, and I match these by the shared variable, which is the LSOA code. Let's go ahead and run that, and we can view it. Again, a bit messy, so I'm just going to open this in the viewer. If we scroll along, you can see that our census data has now been added to our shapefile, which is really cool — we've got that residential population count alongside our crime counts in this variable. The next step is to calculate the crime rate. It's really quite simple: you divide the crime count by the census population statistic — in this case, the residential count — and then multiply by 1,000. I've used 1,000 because this is roughly the average population of an LSOA. If you're working with larger polygons, this number might change — to 100, say — and if you're working with smaller polygons it might change as well, so Google it and you'll know what the average population of your polygon is. So let's calculate the crime rate and add this new variable to our dataset. Again, I'm calling on the main dataset we've been using, which is Surrey LSOA, and overwriting it so that we add the new variable rather than replace anything that exists. I'm using the mutate() function from the dplyr package to create this new variable, called crime_rate: I'm dividing the total crime count, from our original crime dataset, by res_count — the residential census statistic — and multiplying by 1,000. This gives us the crime rate. So we've run that; run the head() function again and you'll see we've got a new column added called crime rate. That's excellent, because now we can do everything we've done with ggplot and tmap and plot our crime rate instead of the crime count.
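As a tiny self-contained sketch of the rate calculation — base R rather than the mutate() call on screen, with made-up counts and populations:

```r
# Hypothetical joined data: crime count and residential population per LSOA
surrey <- data.frame(
  lsoa_code = c("E01030400", "E01030401", "E01030402"),
  count     = c(12, 40, 7),
  res_count = c(1500, 2100, 1800)
)

# Crimes per 1,000 residents; 1,000 is roughly the average LSOA population
surrey$crime_rate <- surrey$count / surrey$res_count * 1000

round(surrey$crime_rate, 1)  # 8.0 crimes per 1,000 residents in the first LSOA
```

Notice that the second LSOA has by far the most crimes (40) but, once population is accounted for, a rate of about 19 per 1,000 — the rate and the raw count can rank areas quite differently, which is the whole point of section three.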
Just looking at a comment — apparently there's an error in one line. Let me have a quick look. "Error in head(res_count): object 'res_count' not found." Ah — the object should be named residential_count, not res_count. Maybe I forgot to push some changes, but that should say residential_count, not res_count. If you try that, it should work — give me a thumbs up or a yes in the comments if that's worked for you. We'll assume that was the issue and move on. So we've got our crime rate; the next step is to plot it. Back to section one, where we first looked at ggplot: we're going to create a ggplot with a base map using the annotation_map_tile() function, calling on our Surrey LSOA and filling it with the crime rate instead of the crime count as we've been doing. I've also set the transparency to 0.05 and added some gradient colouring. Let's have a look at what this looks like. So this is our crime rate map — a little more accurate than plotting simply the crime counts. If you ever do work with crime data, I would definitely suggest using crime rate over crime count. We can also make this plot with tmap, using the tm_shape() function from the tmap package; I've decided to use the classification method known as quantile, and to set the borders to include some transparency. Let's have a quick look at what this looks like too. Oh — I forgot we were still in interactive viewing mode, but yes, this is how it looks: we've got the rate bins in the legend, missing data in white, and that transparent gradient going along the key. I'm going to skip the sections on cartograms and the extra ggplot examples because I'm watching the time, but if you'd like, you can uncomment this code by removing these hashtags.
If you're on a Mac, I believe the shortcut for this is Command-Shift-C. You can just uncomment all of that and run it in your own time — but the cartogram one can take a while to load, so be careful. Basically, a cartogram is a map where the geometry of the regions is distorted to convey the information of an alternative variable: each region is inflated or deflated according to its numeric value. But yes, that draws this section to a close. Again, I've got some activities — I'll give you about five minutes to run through them. I'm basically asking you to do some of that same exploration, but this time using the work count, which we haven't looked at yet. I've supplied the code for how to read in the work count data, and I've shown how to join this workday count to your Surrey LSOA, so again we have that additional variable — this is just adding another census variable onto our crime data. I'll just say apologies for any noise — drilling has just started in the office, so I'll try to speak a little louder. I'll quickly run these. Oh, interesting — it says the file doesn't exist. If I remember correctly, I can get rid of that bit of the path; this happens when you don't set your working directory right. There we go. This has now read in the second census dataset, which is the workday population. Again, using that left_join() function, I'm overwriting the Surrey LSOA by adding the second census statistic. Now if we look at this, we've got another variable — it's a bit too messy here, so I'm going to open up the viewer again — and if we scroll along, we now have that work count added as well as our res count. So spend the next five minutes running through these steps, and then I'll show you how to do it.
And given the time, I might work through one extra activity in the additional section, which is a way to use the Google API. But yes — five minutes, and then I'll come back. In that time I'm also going to look at some of the questions in the Q&A. I've just had a question asking how to set the classification and colour schemes when using ggplot. You can change the colour through the aesthetics — this function here — and you would use col = crime_rate. In fact, I think we did this in section one; let me scroll down to where this was shown. Yes, here: you can colour the different crime types using the aesthetics, col = crime_type. As for adding classification systems to a ggplot, that's a very good question. I've only ever used classification with tmap, but I'm going to have a quick Google because it's always good to know. I'm sure it might be possible to add a certain type of classification, but then again, ggplot is mainly for distinct visualisation. So I'll get back to that one. For now, let me finish off with activity three — where are we, scroll right to the end. The first step I asked was to calculate the crime rate and assign it to a new variable named crime_rate_2. Again, we're calling on our Surrey LSOA and overwriting it using the assignment operator, and I've got this new variable called crime_rate_2. We're going to divide the count by the workday variable — which I can't actually remember the name of; it's called work_count, there we go — and multiply by 1,000. Sweet, that's been added. So now we can plot this using ggplot: we call on our dataset, which is surrey_lsoa, fill it with the crime rate, set the alpha, and give it some gradient.
Let's go ahead and plot this and see what we're looking at. Interesting that it's taking a little longer than the other map, which tells me I might have done something wrong — unless it's just being a bit slow. Still running... interesting. Oh, that has run — no, it hasn't. I'm not sure why that looks so incorrect. Let me have a quick look to make sure I've got the correct variable name. So it's definitely named work_count. There are actually a few missing values, which is strange and hadn't happened before, so I wonder if the way I've joined the datasets might be a bit incorrect. Multiply by 1,000 — yeah, that should be correct. Maybe it's because we're using the workday variable, which is in theory a much lower population than your average population, so we obtain a lot of NAs. The example I had in here last year was actually for the official population count of the LSOA, not the economically active population. So if you'd like to try this yourself — although you're just not going to get very nice maps this way — I would suggest obtaining the official population of an LSOA, which will give you the most accurate crime rate, rather than working with populations that only capture certain demographics. If you were to plot this using tmap, you would just call on that new variable, crime_rate_2, in the same way. I'm not going to run it, because it took so long for ggplot to run that it would take far too long. I've then asked you to compare how the workday versus the residential population would look. Although this might not look correct because of the issue with the workday count, we can still show what a comparison of two maps looks like and how to run it. So I'm calling on the two new variables, crime_rate and crime_rate_2, and labelling them the workday pop and the residential pop. So let's — crime_rate not found?
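What happened on screen — missing values appearing after the join — can be reproduced in a tiny base-R sketch. If the right-hand table lacks some keys, the joined denominator is NA and so is the rate (the column names and codes here are made up for illustration):

```r
# Left table: every LSOA with its crime count
crimes <- data.frame(
  lsoa_code = c("E01030400", "E01030401", "E01030402"),
  count     = c(12, 40, 7)
)

# Right table: workday population, missing one LSOA entirely
workday <- data.frame(
  lsoa_code  = c("E01030400", "E01030401"),
  work_count = c(900, 3000)
)

# Left join: unmatched LSOAs keep their row but get NA for work_count
joined <- merge(crimes, workday, by = "lsoa_code", all.x = TRUE)

# NA denominators propagate into the rate, which is why the map
# showed blank polygons for some areas
joined$crime_rate_2 <- joined$count / joined$work_count * 1000
print(joined)
```

Checking `sum(is.na(joined$crime_rate_2))` after a join like this is a quick way to catch the problem before plotting.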
Is that what we named the crime rate here? Yeah, crime_rate. Oh, that's not good, is it — strange. So the crime rate here... oh, have I forgotten to supply quotation marks? I have a feeling that might be it. Yes — quotation marks make a big difference. I said this in section one, didn't I, and I've made the same mistake, so you know it definitely does happen. But sweet — now we've got two maps, and we can use tmap_arrange() to compare them. These will probably take a minute, but that does draw this talk to a conclusion. So I guess thank you all for listening and taking part in the activities. Looking at the time — quarter to — and there are still a few people here. So if you're interested, for the next couple of minutes I'm going to talk through the Google API. Given the issue with the dataset, I'll show how you can still use this code in your own work, as an outline or baseline for your own projects. So I'm going to open the last file, which is the additional material — additional topics — here. In this I have some more complicated, maybe more advanced topics, one being how to use the Google API along with ggmap. I'll scroll down and show you how to bin data — this code came from this GitHub account here, hence the reference, so it wasn't written by myself, but it worked excellently — then a second way to make interactive maps, and also some information about other important functions. Just a couple of minutes while people start to leave: I'll tell you about the Google API, because I thought it was really interesting. The only issue is that you'll need to follow about five or six steps to set up a Google API key before you can even follow this code.
You can definitely do this in your own time while I explain — the steps are all here. You just need to set up an account, make sure you meet certain requirements, and enable the different APIs on your account; everything you need to know about setting up can be found here. Then you need to register your Google API key with RStudio following this code — that's the last, final step. I'll quickly show you what you can do with Google Maps. There's a function called qmplot() which builds plots really quickly — it's a ggplot equivalent, basically, but while it's quicker, it's a little less accurate for plotting spatial data. It builds maps similar to the ones we see on our phones every day. If I run this first line — it's connecting now — what I've done is use Google's base map to plot our crime count over a reference map. Again, this isn't very neat at all; there's too much happening in one area and we can't really see what's going on, but I just wanted to show you a quick way to build a map, a bit different to how we did it in section one. But on to the Google API — the Google API is pretty cool. You can use the geocode() function to ask R to supply the longitude and latitude of an area. I've just chosen Crawley as a random example from the crime dataset. If I run this, it gives me the longitude and latitude of Crawley, which is pretty cool. It can also understand things that aren't necessarily an area — I think if you typed in the White House, it would also give you the longitude and latitude for that, which I think is pretty cool. So that's how the Google API works, by connecting to Google Maps. But let's go back to Crawley, because Crawley is what we need, right?
And what I've done with these coordinates is aggregate them into a data frame called crawley. I'm then using the get_map() function with the Google API and some aesthetics to plot this using Google Maps. As you can see, it prints these orange messages, but that means it's working and connected to the API. You can then use the ggmap() function, supplying that new map I've created, and now we've got a Google Map of the area of Crawley, which is pretty cool. What we can do with that is overlay the crime data on top. To do this, you use ggmap() again, but this time supply a geom_point() layer, and those points are the longitude and latitude — the point at which every crime has taken place. Let's quickly run that and you'll see what I mean. And boom — now we have our point data plotted onto, arguably, a more accurate map if we're using Google. You can then colour the crime types so we have something a bit more readable, and you can also size the crime types — changing the aesthetics so that crimes that are more prominent appear bigger. If you're interested in working with one specific crime type rather than this whole list of crime types, you can follow along with this code here, but I'm just going to run it all rather than explaining all the little bits and bobs again. The only thing that's different is I've filtered to just anti-social behaviour. If we scroll down — just give it a minute to load — yes, this is a map of just anti-social behaviour in Surrey, very similar to the ggplot map we made in section one, but an alternative way to do it. I've also included how to create a density map using ggmap, which some people prefer because it's easier to see the distribution of crime across the area.
Yes, that brings us to the end of the talk, so I'll close off there. I think I've done enough talking for today, but here are the references that were used to help build the code. The slide decks from today and tomorrow will be available on the GitHub page, and we'll also send them out, I think a couple of days after the event. I hope I've been able to provide you a good introduction to crime mapping and data in R in these workshops. Make sure you stand up and stretch your legs after this. And thank you again, Emma Green, for facilitating this workshop. If you've got any questions about this workshop that I couldn't answer today, feel free to drop me an email, or come and attend our computational social science drop-in, which I believe is on the 14th of April — I could have that date wrong, so check our webpage to see when it'll be. Thanks, guys. Thank you, thank you.