Well, it's gone 10 o'clock, so maybe we should introduce ourselves. Hi everyone, my name is Nadia Kenner and I am a research associate with the UK Data Service. This is a workshop demonstrating how to map crime data in R. All materials today will be available via the GitHub link. These include the data sets that we will use, instructions on how to download them, and the R Markdown files that contain the code demonstration. I'd also like to point you to a blog that I wrote for the UK Data Service. It provides a really good introduction to GIS and spatial data, and I suggest reading it after the workshop if you are interested in a real-world example. The workshop will follow this breakdown. We'll start with a quick presentation introducing some of the main topics in GIS and spatial data. The topics covered in this workshop are only introductory, but they will help you understand the code demo that runs in the second half. It would be impossible to cover every topic in GIS and spatial data in such a short workshop, but of course there are loads of readings online that can expand on the content in these slides. The code demo is broken down into three main topics. Topic one explores our crime data, more specifically how to turn non-spatial data into spatial data. Topic two introduces the use of shapefiles and how to join these to spatial data, and topic three covers census data and the differences between crime rate and crime count. So let's get started with the workshop. What exactly is GIS? GIS, or Geographic Information Systems, are a framework that allows for the creation and analysis of spatial and geographical data. It can be viewed as quite an abstract platform that integrates data onto a map through various methods.
GIS is present in virtually every field and every organisation, as it's a way to share information and to solve complex problems around the world. One of its key benefits is that it allows trends and patterns to be studied visually. GIS has quite an interesting history. It started around the 1960s with a pioneer named Roger Tomlinson. He was commissioned by the Canadian government to create a usable and efficient inventory of its natural resources. He tried various manual methods for overlaying environmental, cultural and economic variables, but all were too costly or too time-consuming. So he introduced the first automated computing system for the task, and he became known as the father of GIS. From there we have figures such as Laura and Jack Dangermond, who founded the Environmental Systems Research Institute, also known as Esri, which develops software for mapping and spatial analysis. There are various software packages available, including GeoDa, ArcGIS, FME, QGIS and R. All of these allow for spatial data analysis, geovisualisation, spatial autocorrelation and spatial modelling. This workshop will be using R and its IDE RStudio, due to the increasing number of packages that have become available for spatial analysis and mapping. GIS is known to produce two broad types of maps: reference maps and thematic maps. Reference maps tend to highlight natural or human-made features, including the positioning and heights of mountains or the layout of bus routes. This type of map is simply referencing what exists in our physical environment. On the other hand, we have thematic maps, which are used to highlight a spatial relationship. Thematic mapping is how we map a particular theme to a geographic area. It tells us a story about a place and is commonly used to map subjects such as climate issues, population densities or health issues.
I like to say that reference maps pinpoint data onto a map and can be quite descriptive, whereas thematic maps study a theme within a map and tend to be more explanatory. Even still, the differences between reference and thematic maps can become confusing, so I've created a few example scenarios to question whether they fall within the description of reference or thematic maps. If you'd like to head over to Mentimeter, you can participate in the quiz and we can discuss some different scenarios that draw on reference or thematic map examples. I think Saoirse will pop the code for Mentimeter into the chat; it's also at the top of the screen. Yes, I'll put it in the chat now. Perfect. I'll just give you a minute to head over to Mentimeter, pop in the code at the top of the screen, and we'll get going with the quiz. All right, hopefully we're all there and have Mentimeter up and going. Let's start with scenario one: the visualisation of road networks to improve road safety measures is a type of... is the quiz available? Sorry. Perfect, thanks for that, Julia. It looks like the pie chart has stopped moving, so let's go ahead and discuss the results we see here. At first glance, this might be said to be a type of reference map, as we are pinpointing existing road networks onto a map. However, this might also be said to be a type of thematic map, as we are studying the existing road networks in order to improve road safety measures, which can be seen as a type of accident analysis. One example might be the introduction of speed signs or zebra crossings outside schools or residential areas due to reports of accidents or dangerous driving. What we're doing here is studying, spatially, the relationship between road networks and accidents, and then mapping a particular theme to a geographic area.
So this thematic example does include an aspect of a reference map, but because we are using it to study a theme spatially, we can say that this is a type of thematic map. We'll now move on to scenario two: the presentation of the Earth's surface showing its elevation is a type of reference map, thematic map, or not sure. Pop in your answers; again, we'll give it 30 seconds to a minute just to let these answers roll through. Results are still coming in, but it looks like the majority of people, 79%, think that this is an example of a reference map. So we have 74 now; results are shifting. We'll just give this another 10 seconds to see if anyone's got any last-minute votes. All right, so we have 74% of votes for reference maps, 13% for thematic and 13% for not sure. The majority definitely sit in the reference map camp. The visualisation of the Earth's surface showing its elevation can be seen as a reference map. This is a type of topographic map, which refers to the graphical representation of the three-dimensional configuration of the surface of the Earth. In short, it is simply describing where the Earth's surface is elevated. These maps are normally represented by contour lines; I'm pretty sure you'll have seen them if you've ever been on any kind of walk or outdoor activity. However, you might also see this as a type of thematic map. Research has suggested that studying a topographic map is a great way to learn how to match terrain features with the contour lines on a map, such as the steepness of the terrain, the shape of the terrain, or whether it is above or below sea level. So in this instance, you might interpret it as studying, spatially, the relationship of contour lines to different features of the Earth. So whichever option you answered, you can consider yourself to be correct.
We're going to head over to the last scenario, which asks whether navigation tools such as Google Maps or Citymapper should be classed as thematic or reference; again, there's an option for not sure. I'll give you 30 seconds to a minute on this one again. It looks like most of our votes are in, and it seems that 52%, roughly half, of people think that this example is a type of thematic map. We also have 38% of people who think this is reference, and the rest are not too sure. And that's okay; it's absolutely okay not to be sure about these things, because when I first read the scenario, I immediately thought that this was an obvious type of reference map, since it highlights important physical features needed for travel such as bus routes, walking routes and cycle lanes, and typically you might even have options for cabs or taxis. However, reference maps portray a basic set of features such as coastlines, terrain or transport routes. But can we say that an app that plans your travel is a type of reference map, if it uses algorithms to get you from one place to another, normally choosing the fastest route or the cheapest route? Can we call this a type of thematic map instead, since it's overlaying information on a base map? Interestingly, researchers consider navigation tools to be fundamentally different from both reference and thematic maps, which opens a debate for a third category. It could be argued that all maps are navigational depending on how you use them; the difference with a digital map is that it is specifically interactive. This links back to one of the main differences between reference and thematic maps: thematic maps tend to be more interactive, which is why navigation tools might be considered a type of thematic map. Either way, the answer is open to debate, and the difference between the two depends on whether we are mapping places or whether we are mapping data.
We can explore these questions through a more detailed real-life example: tube maps. Some people might class tube maps as reference maps because they show the location of different tube stations and the location of each tube line. However, recent research has shown that tube maps can also be thematic, because they can be used to present life expectancy, poverty and median house prices. This was a study called Lives on the Line. Most governmental statistics are mapped according to official geographical units such as wards or lower layer super output areas. While such units are essential for data analysis and for making decisions about governmental spending, they are hard for people to relate to and they don't particularly stand out on a map. This is why the researchers tried a new method to show life expectancy statistics in a fresh light, by mapping them onto the London tube lines. So what can we sum up? Although maps fall broadly into the two categories of reference and thematic, there are ways in which these maps can overlap or share similarities. Almost every thematic map could be considered a reference map, but not every reference map could be considered thematic. The decision is up to you. It's not entirely necessary to define these in your work, but it is important to know what type of map you want to make, as this can be affected by the data you have. So let's just pause for a minute: if you can think of any maps that you think share qualities of both thematic and reference maps, this is an opportunity to note down some answers. We're also going to have a little two-minute break here, so feel free to get up and stretch your legs or grab a glass of water. Just pop your examples into Mentimeter again.
Wow, we've got really good examples coming through at the moment: bus maps, world maps, crime data overlaid on maps, walking routes in the Highlands, COVID death cases visualised on world maps. They keep coming through; there are so many, and this is good to hear. I'm not going to go through each example, but one which is important for this workshop is crime data overlaid on maps. One of the aims of these scenarios is to get you thinking about how reference and thematic maps can be overlaid, and this is exactly what we cover in our crime demo. It seems the answers are still coming in, so I'm just going to give it another 30 seconds to keep these opinions flowing. Heat maps are also a really good example: they highlight a base map, as well as additional data overlaid on top, which in this example is homicide rates. Time zone maps are also a really good example; again, they include some sort of base map, but they overlay additional information that tells us something spatially about a theme. Fear of crime is also really good, similar to the crime data overlaid on maps. We're going to move on to the next topic now, which is spatial data. What exactly is spatial data? In short, it is a representation of the real world. Spatial data, also known as geospatial data, is data that contains information about a specific location, which can then be analysed to better understand that location, and GIS enables this spatial data to be processed and analysed. So there's quite a strong relationship between GIS as this abstract platform and spatial data as the more upfront statistical tool. There are typically two types of spatial data: vector data and raster data. I'm not going to cover raster data much, but for anyone who is interested, it typically refers to imagery or satellite data formed from a grid of pixels.
Vector data, on the other hand, is much more common and consists of points, lines and polygons. Vector data is typically used in criminology and the social sciences, and is what you would normally expect to use. We can describe points as a pair of coordinates; this might be an x-y coordinate pair or a northing and easting. An example of point data might be the location where a robbery was reported. We then have lines, which extend points and include two or more points; this might be, for example, the street that the robbery was reported on. And then we have polygons, which extend lines and refer to three or more points; this might be the area, the city or the ward that that street belongs to. There is a strong relationship between points, lines and polygons. I think it's important to consider these three features as integrated rather than seeing them as separate, because you can study points within lines, lines within polygons, and points within polygons. So this is the main structure of spatial data. In this workshop we'll specifically be looking at point data and polygon data through the use of shapefiles; I will get on to that later on. Now that we have a basic understanding of spatial data and GIS, this is where we ask: how do we actually pinpoint a location on a map? We use map projection methods. Map projections try to portray the surface of the Earth, or a portion of the Earth, on a flat piece of paper or computer screen. In layman's terms, map projections try to transform the Earth from three dimensions to two. It's important to note that during projection the data can become distorted, affecting the area, shape, distance and direction of points. Although there are algorithms in place to control for this, all features can never really be preserved at once, so keep that in mind.
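To make the point, line and polygon idea concrete, here is a minimal sketch using the sf package. Note that sf is my choice for illustration here, not something the demo itself relies on, and the coordinates are made up:

```r
library(sf)

# A point: a single pair of coordinates (longitude, latitude)
robbery_point <- st_point(c(-0.1872, 51.1092))

# A line: two or more points, e.g. the street the robbery was reported on
street_line <- st_linestring(rbind(c(-0.1872, 51.1092),
                                   c(-0.1860, 51.1101)))

# A polygon: three or more points enclosing an area, e.g. an LSOA-like
# boundary (the first and last coordinates must match to close the ring)
area_polygon <- st_polygon(list(rbind(c(-0.19, 51.10),
                                      c(-0.18, 51.10),
                                      c(-0.18, 51.11),
                                      c(-0.19, 51.11),
                                      c(-0.19, 51.10))))

# The three features integrate: we can ask whether the robbery point
# falls inside the area polygon (TRUE for these made-up coordinates)
st_within(st_sfc(robbery_point), st_sfc(area_polygon), sparse = FALSE)
```

This points-within-polygons check is exactly the kind of operation we rely on later when grouping crime points by area.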
There are three different projection families, known as cylindrical, conical and planar, and within each projection family there are hundreds to thousands of different projections. They can get pretty confusing and they're not completely necessary to know for this workshop, but here is a quick example of how different projection methods can alter our perceptions of what we see. In this example I'm showing two different projection methods: on the left we see the Mercator projection, and on the right we see the Gall-Peters projection. The Mercator projection is something we are very familiar with; this is how we usually see our world. The Mercator uses something called angular conformity, which isn't necessary to know in detail, whereas the Gall-Peters is an equal-area cylindrical projection. The Peters projection is unique among world maps because the area ratios of all continents are the same as they are in reality; that is, Greenland doesn't seem larger than Africa, whereas on the Mercator map Greenland appears larger than Africa. On the Peters projection, the continents of Africa and Asia appear quite large, while usually inflated polar regions such as Canada and Greenland shrink back to their proper sizes. The Mercator projection grossly distorts the size of the continents but stays true to their shapes, and geographically speaking the shapes are often more important. It's far easier to change the scale of a map for different areas of the world than to adjust for the length-to-width ratio, as one needs to do with the Peters projection. In addition, the Mercator projection mainly distorts longitudinal distance, whereas the Peters distorts the scale almost everywhere, in both longitude and latitude. This is why the Mercator beats out the Peters in the world of cartography, and why apps such as Google Maps tend to use the Mercator projection.
So this is important, as it shows us how different map projections can portray different perceptions of countries. We'll be discussing projection methods in the crime demo and showing how to apply some of them. So how do we actually move from 3D to 2D? This is done with the help of a CRS, or coordinate reference system, where every place on Earth is specified by three main numbers known as its coordinates. There are two main kinds of coordinate reference system that are important to know: geographic coordinate reference systems and projected coordinate reference systems. A geographic coordinate system is a reference framework that defines the locations of features on a model of the Earth; it is shaped like a globe and is spherical. A projected coordinate system, on the other hand, is flat. It can get quite confusing, but a projected coordinate system is based on a geographic coordinate system: it converts the geographic coordinates onto a flat surface using different projection algorithms and parameters. The decision of which map projection and coordinate reference system to use is up to you, and it depends on the regional extent of the area you want to work in, the analysis you want to do, and often the availability of data. When working with more than one form of spatial data, it's important to ensure that the data sets are stored in the same CRS, or they will fail to line up in the GIS. I realise there are quite a lot of acronyms going on right now, so if you have any questions we can take a minute to pause and discuss them in the chat if you need further clarification. I'd also like to draw attention to the difference between a projection method itself and a projected coordinate system. They are two different things: they tend to overlap, but they should be kept separate.
The map projection itself is just a mathematical algorithm, whereas the PCS, or projected coordinate system, specifies how a particular round-Earth model is projected onto a flat map. The distinction can be quite difficult to grasp, but just remember that one is only a mathematical algorithm. We've discussed some of the challenges with map projections and coordinate reference systems, but let's tie this discussion back to crime data itself and discuss some of the challenges of mapping crime data. In our workshop we'll be using open police recorded statistics, but these can be criticised on a few grounds. Firstly, police recorded crime provides point information through the use of GIS; however, the accuracy of this spatial data is obscured by geomasking techniques that serve to protect the location privacy of victims. They never provide the location where an exact crime was reported. There are methods such as jittering, which are used to shake the exact location, so what we think might have happened outside a school might not have actually happened there; these are used to protect the privacy of victims. Secondly, police recorded crime statistics are known to contribute to the grey figure of crime, in that they underestimate the actual number of crimes committed, capturing only those that are both reported and recorded, which reduces the accuracy of statistical models due to a large amount of missing data. This isn't something that can be overcome with mapping techniques, but it is something that definitely needs to be considered if you are using open police recorded statistics. Thirdly, there are some conceptual issues surrounding the definition of crime types in police recorded statistics. For example, the police tend to combine violent offences with sexual offences, grouping them under one category.
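As a sketch of what "keeping data in the same CRS" looks like in practice, here is a hedged example using the sf package; the objects are illustrative, but the EPSG codes are standard (4326 for WGS 84 longitude/latitude, 27700 for the British National Grid):

```r
library(sf)

# Suppose we have a crime point stored as longitude/latitude
# (WGS 84, a geographic CRS, EPSG code 4326)
pts <- st_as_sf(data.frame(longitude = -0.1872, latitude = 51.1092),
                coords = c("longitude", "latitude"), crs = 4326)

# Inspect the CRS attached to the layer
st_crs(pts)

# UK boundary files are often supplied in the British National Grid,
# a projected CRS (EPSG code 27700). Before joining or plotting two
# layers together, transform one so both share the same CRS:
pts_bng <- st_transform(pts, crs = 27700)
st_crs(pts_bng)$epsg  # should report 27700
```

If you skip the transform step, the two layers will simply fail to line up on the map, which is the problem described above.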
These should be viewed with caution in analysis, as this applies an overly holistic definition by collapsing these different crimes into one category: not all violent crimes are sexual, and not all sexual crimes are violent, but unfortunately there's no way to distinguish between the two. Police recorded crime statistics also fail to include demographic variables, which is why, for a lot of research that does use crime data, you need some knowledge of how to join it to other data, whether census data or administrative data, that can provide more information. The last limitation is the impact of seasonality. With police recorded crime statistics we are limited to monthly and yearly data, which means that temporal analysis is limited to monthly or yearly trends. Additionally, we have to consider the effects of large events that break everyday routines, one being the effect of COVID-19. Over the pandemic, researchers identified a reduction in some criminal activity as a result of increased government restrictions and lockdown rules. One example is that there was a huge reduction in burglary as more people were forced to work from home, reducing opportunities to commit these crimes. So it's not entirely accurate to make year-to-year comparisons over the pandemic, and this is something you have to consider when examining crime data. We're going to take a little break here, but I'm going to give you the opportunity to discuss any more challenges that you think may arise when dealing with crime data. If you'd like, you can head back over to Mentimeter and put in any thoughts about issues that might arise when dealing with crime data. We're going to call this a five-minute break as well. Looks like we have some great examples coming through; we can discuss a few of these. The first being that crime data encodes existing systemic discrimination.
That is a great point, and it links back to that grey figure of crime, because why is there such a huge gap between reported and recorded statistics? Well, Graba and Stan did a study, I think it was in 2018, that highlighted that to call the police is a privilege of being white; the willingness to call the police is affected by demographics and varies across parts of the population. This is something to consider, because not all populations are therefore equally likely to call the police, so reports of crime raise questions about reliability across different sample sizes. On that, yes, we miss the wider context; that's very true. Football seasons probably impact the prevalence, location and timing of certain crimes quite a lot. Yeah, that's a very good point; we would definitely expect to see more antisocial behaviour arise during football games, but these might not be noted or recorded in certain studies. Crime data is partly a product of police activity rather than a measure of criminal activity; that's a great point. I'm just having to read through these. Nadia, we also got a question in the chat. Alex asks: does the available crime location data include information on what proportion of recorded crimes in that area is missing spatial data? That is, do the reports for an area tell you when the exact location is missing, or is that just left as missing values or not recorded? I think they're left as missing values in the crime data. We do explore some of this in the workshop; there are a few missing values dealt with when we group our crimes by LSOA. I'll have a look into that a little later on as well, but I think they are just marked as missing values, yes. Whether we can detect where exactly street crimes occur: that's very true, and this is because, again, of the geomasking and geoprivacy issues, we can never...
It's hard, because we can never really trust exactly where these crimes happened; we just have to trust that there is some accuracy involved. The balance of data collection processes based on uneven police actions toward different districts: very, very true. I mean, in Greater Manchester, I think it was in 2019, we lost, was it six to eight months of data due to a fault in the system? We have this huge gap in Manchester where we have no idea what levels of crime happened. Linking the crime data with levels of deprivation in different areas: yes, that's true. A lot of studies tend to link administrative data to levels of deprivation, but crime statistics in themselves don't include these. So crime data on its own is really quite limited unless you join in other data sets; census data sets especially would provide a lot of benefits for studying deprivation levels as well as disparities in our society. Some people might also question the use of quantitative methods as quite insensitive for crime statistics, and might favour the use of crime surveys to provide more individualistic experiences of crime. However, of course, large quantitative studies have proved effective in both the prediction of crime rates and the introduction or amendment of new policy. I'm just wary of the time, but these are really, really good points and definitely things you should be considering when using open recorded police statistics. So we've just finished the presentation on GIS and spatial data, and I hope we have been able to provide a brief introduction to the main topics. I understand I haven't been able to cover everything, but as I said, that would be incredibly hard to do in such a short demonstration. We're going to have a 10-minute break now while you prepare your RStudio space for the code demo.
If you follow the GitHub link, there is a step-by-step guide on how to set up your working directory and install the packages required. You can also clone the repo over from GitHub, which would make it easier to follow along with the code that we run. The option is up to you, but we'll give you, call it 10 minutes, to have a break and stretch your legs, and we'll come back and get going with the code demo. So we'll see you back in 10 minutes. All right, so hopefully everyone has got their RStudio up and running and you can see the same scripts that you see in front of me. If not, that's okay; you can just type the code as we go, but it might take you a little bit longer, and I don't want anyone to fall too far behind. This is topic one, where we'll be exploring our crime data, specifically how to turn non-spatial data into spatial data. These are the packages required; hopefully these are all installed and loaded into your library, and we will just get started with the crime data. We have, I think, about an hour and a half to run through these topics, so it'll be roughly 20 to 25 minutes per topic. The crime data was downloaded from data.police.uk. I've selected August 2020 to August 2021, specifically for the area of Surrey. There's information in this Word document about how to download this data if you're interested, in case you want to use it in the future. For now, just download that data, unzip it and load it into your directory. The folders are also available up here. We'll only be looking at one month of data throughout this workshop, but I thought I'd include a year just in case you wanted to do your own analysis. So let's first load the data. I've just called this object crime; you can call it whatever you'd like.
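As a sketch of this loading step: the file path and name below are assumptions based on how data.police.uk packages its downloads (year-month folders containing a `<year-month>-<force>-street.csv` file), so adjust them to match your own working directory:

```r
library(readr)    # read_csv
library(janitor)  # clean_names

# Load one month of street-level crime data (illustrative path)
crime <- read_csv("data/2020-08/2020-08-surrey-street.csv") |>
  clean_names()   # lowercase, snake_case column names

head(crime)       # inspect the first few rows
```

Using a relative path like this assumes your R project or working directory is set to the unzipped download, which is what the step-by-step guide on GitHub walks through.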
I'm also using the clean_names function from the janitor package, which just makes all variable names lowercase and a bit neater for the data analysis. So let's quickly explore our variables. As you can see, we have the crime ID, which isn't really relevant to us. We then have the date variable; as I said, we only have month and year, which is why temporal analysis can be really limited, and year-to-year comparison isn't entirely recommended when it comes to crime data. We then have the force the crime was reported by and the force area it falls within. We then have the longitude and latitude, our location, the LSOA codes, the LSOA names, the crime type, and also the last outcome category. Typically, LSOAs have a minimum population of around 1,000 and they're used to improve the reporting of small-area statistics in England and Wales. As I mentioned before, LSOAs aren't that recognisable to non-statistical minds or non-geographers, so it can be quite hard to share these results with non-researchers. But they are really useful when it comes to mapping administrative data, which is what we will be doing with our census data. We can also look at the different features of spatial data in this example. Point data is represented by our longitude and latitude. Our location variable represents the line, and this is normally defined by a street or junction; as you can see, we have "On or near St George's Court". And then we have the LSOA name, which represents our polygon. You could also use boroughs, wards or districts, but in this instance we are using LSOAs. So let's just briefly explore our data set. Currently, this is a non-spatial data set: even though we have longitude and latitude variables, R doesn't know that this is spatial data. So, for now, we're going to explore the data set as if it were non-spatial. Let's take a look using the ggmap and ggplot2 packages.
ggmap builds its plots with a function called qmplot, which is the ggmap equivalent of ggplot2's qplot. The basic idea driving ggmap is to take a downloaded map image, plot it as a context layer using ggplot2, and then plot additional layers of data, statistics or models on top of the map. In ggmap, downloading a map as an image and formatting the image for plotting is done with the get_map function; it's considered a kind of wrapper function. Arguably, qmplot is known as the quicker function, but it's less accurate for plotting spatial data. It's pretty useful for building maps that are similar to the ones we see in apps such as Google Maps. So first I just get an overview of the crimes on my map using qmplot. What we first do is call the longitude and latitude variables, call on our data set, and in this instance colour by crime type. We also set some aesthetic features. If we run this qmplot code, we'll see quite a basic image of our point data in Surrey; scroll down and we have this. Arguably not the neatest map, but we can see the different types of crime across our area of Surrey. You then might be interested in studying one specific area; in this example I'm specifically looking at Crawley 002B. This is just a random area from the crime data set. In order to get the location information of an area, you can use the geocode function. geocode simply identifies the longitude and latitude of an area; it is the process of determining geographic coordinates for place names, street addresses and codes. It uses Google's Geocoding API to turn addresses from text into latitude and longitude, and pairs them quite nicely. So if we run this geocode function on our area of Crawley 002B, we get our longitude and latitude values. We can then use the get_map function to plot our area of Crawley specifically.
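A hedged reconstruction of the two calls just described. The variable names follow the cleaned crime data (longitude, latitude, crime_type), the place string is illustrative, and a Google API key must already be registered for either call to reach the API:

```r
library(ggmap)

# Plot all crime points on an automatically chosen base map,
# coloured by crime type
qmplot(x = longitude, y = latitude, data = crime,
       colour = crime_type, alpha = I(0.5))

# Look up approximate coordinates for an area by name; this calls
# Google's Geocoding API, so a registered key is required
geocode("Crawley, West Sussex, UK")
```

Wrapping alpha in I() passes it as a fixed value rather than mapping it to a variable, which keeps the legend clean.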
And what we have to do is create a new object that defines the longitude and latitude for the area, which I've done in line 83. But you might be wondering why the longitude and latitude I have provided are different to the ones returned by the geocode function. This is because of geomasking and privacy issues: the coordinates in the police data have been shifted slightly, so they're different to what Google's API picks up. So these are again just the issues you really need to consider when looking at spatial data. Nevertheless, we're going to use this as an example. So we create a new object called crawley and identify the longitude and latitude. We then use the get_map function and set this to a zoom of 13, and that will load up. Sorry, I should open the console. Then we can plot this map using ggmap with the new map object we have just created. So we run ggmap on the new object, and we now have an aerial image of our area of Crawley 002B. Obviously this is not providing any information about our crime data set; this is simply an aerial image of Crawley. In order to include the point data from our crime data set, we can use the geom_point function to call specific points from it, and I've done this just here. So again we use ggmap, call on the map we've created, then use geom_point and set the aesthetics, which are the longitude and latitude variables, and then call on the data set. If we run this, we'll have the same image we've just seen, but with the point data plotted over Crawley. We can also... Nadia? Yes? Just to interrupt, a lot of people following along are having trouble with getting the API key. They're trying to run the code and it says "Error: Google now requires an API key". Okay, apologies, I didn't mention that. Yes, you would need to set an API key. I already had mine set from last year.
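A hedged sketch of the get_map workflow, including the register_google step the questioners are missing; the key and the coordinates here are placeholders, not values from the workshop.

```r
library(ggmap)

register_google(key = "YOUR_KEY")  # Google has required a key since mid-2018

# Manually specified centre, slightly shifted from the geocode result
# as discussed; the values here are illustrative only
crawley <- c(lon = -0.19, lat = 51.11)
map <- get_map(location = crawley, zoom = 13)

# Base map alone, then base map with the crime points layered on top
ggmap(map)
ggmap(map) +
  geom_point(aes(x = longitude, y = latitude), data = crime)
```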
I think you can just do this via... I'm just looking into it now. Yeah, that's fine. I mean, there are also some people who are having trouble getting the files off of Git. I don't know if we want to just take a quick break to let people download and install all these packages and try to get the files off of Git. Yes, I didn't realise I was running too fast. Yeah, it's okay. Just pause here and I'll have a look at the Google API right now and see if I can find some resources. If you hold on a moment, I'll just say something about how to download the files from Git. When people go to the GitHub repo, and I'm just sharing the link again so that people can follow: if you go directly to that page, there's a big green button that says Code and has a small menu arrow. If you click on that big green button, it gives you the option to clone, which uses Git, or to open with GitHub Desktop (but you have to have GitHub Desktop installed), or Download ZIP. Download ZIP will give you all of the files from this repo on a local drive on your computer, at which point you can unzip it and use those files within R in the way that you normally would. So hopefully that helps everyone get the files from GitHub if they were struggling with that. If anyone else still has questions about this, please do let me know. So yeah, this is the GitHub link on the web. An easy way to copy this whole repo is to click the green Code button; you can simply copy the HTTPS address. And if we head back over to RStudio (I'm going to stop sharing now), head back over to RStudio, you can click File, then New Project, then Version Control, then Git, where you can clone your project from the Git repo. You can simply paste the URL in there, and it will automatically copy the name. You can save this in a folder on your computer, anywhere you'd like.
This will automatically suggest opening in a new session so you don't have any clashes, and then this will automatically open up what you see on my screen. This is quite an easy way to go ahead and do that. Okay, super, that's really good. Hopefully that's also given people a bit of time to get all of the packages installed and libraries loaded and things like that up and running. So great, thank you so much. It can take a while to install packages. Thank you again, and apologies for the confusion. Hopefully everyone is a little bit more on track, but we're going to continue with the workshop; if you still have any issues, please just let us know in the chat and one of us will help you out. We're going to skip over the ggmap section, because that was completely my fault for not remembering the API key. But as I said, this isn't the best method to plot maps anyway, so don't worry about it. So we're going to move on to looking at simple features and projection methods. A simple features object is a data frame that contains a collection of spatial objects. Each row is a spatial object that may have data associated with it. Simple features is compatible with ggplot. There was an older package named sp, but within the last few years researchers and geographers have been moving from sp to sf. The only main difference you need to know between sp and sf is that the sp package didn't really work with all of our data frame structures, so the use of sf is much more beneficial. To recap, when working with spatial data you need to identify a CRS (coordinate reference system) in order to move from that 3D globe to a 2D map. sf objects are, as I said, data frames that contain a collection of spatial objects. There are thousands of coordinate reference systems, and typically when you are working with longitude and latitude variables, which is what we have in our case, you would use what we call the World Geodetic System 1984, or WGS 84.
With each coordinate reference system you have to identify a unique EPSG code, which I think stands for European Petroleum Survey Group. Each code is just a four- to five-digit number which represents a particular CRS definition. So when do you transform the CRS? Well, first it's worth considering when to transform the data. In some cases transforming to a projected CRS is essential, such as when using geometric functions like st_buffer, which computes buffer zones around your points. But if your data contains longitude and latitude, as I said, then the World Geodetic System is the one we would use. However, if your data contains eastings and northings, then you would use a projected coordinate system. An example of a projected coordinate system would be the British National Grid, but we don't need to worry about that now. In order to check what CRS you have, you can use the st_crs function, and you'll see that we have NA: we do not have a coordinate reference system attached to our crime data set. So in order to attach a CRS we can use the st_as_sf function, which turns our crime data set into a simple features object. The first thing you need to do is call on the data set, then use the coords argument to identify the longitude and latitude columns. You then set the crs argument; in this instance the unique identifier is 4326, for the World Geodetic System. You can also exclude or include NAs if you'd like, but that's not a necessary part of the code. To run this I'm creating a new object called sf, just to make things a little bit clearer between our original crime data set and our newly created simple features object. So we run this, and now we have a new object in our environment, and you can see that we have one fewer variable. The first thing to always do is check that the CRS has attached and has worked, so again we can use the st_crs function to do so.
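The st_as_sf step just described can be sketched like this, assuming the cleaned crime data frame is called crime:

```r
library(sf)

crime_sf <- st_as_sf(crime,
                     coords = c("longitude", "latitude"),
                     crs = 4326,       # EPSG identifier for WGS 84
                     na.fail = FALSE)  # keep rows with missing coordinates

st_crs(crime_sf)  # should now report WGS 84 rather than NA
```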
As you can see, we now have an attached coordinate reference system, in this instance the World Geodetic System. So let's have a brief look at our data set and see why we have one fewer variable. If we use the head function we can check all the variables in the data set. Scroll right to the end and we'll see a new column called geometry. You may also notice that the longitude and latitude columns have disappeared. This is because the longitudes and latitudes have been joined and turned into a geometry attribute that makes up one variable. This is the main difference between spatial and non-spatial objects. So let's go ahead and map our point data with our newly created sf object, which contains the point data. In order to plot the point data we can use ggplot with the additional function geom_sf, which is in line 161. geom_sf is quite similar to ggplot functions such as histograms or box plots, but geom_sf specifically works with spatial data. It has a unique aesthetic which expects to find a simple features column containing the simple features data; this is the geometry variable we just created. So if we run the first ggplot, we can see we have a plot of our crime data in Surrey: we have the longitude, the latitude and the points. Obviously this map doesn't really tell us much. We don't have a base map or reference map attached, so it's really hard to know where this is, and additionally we have yet to distinguish the different crime types. We can go ahead and do that by addressing the different aesthetics that the ggplot function has. So let's start by colouring the different crime types. We can do this by adding to the code: we again call ggplot, then the geom_sf function, then the data set, and then we use aes, the aesthetics function, to colour by crime type. So if we run this, you will see... hopefully... there we go.
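The two geom_sf plots described here, as a sketch:

```r
library(ggplot2)

# Plain point plot of the simple features object
ggplot() +
  geom_sf(data = crime_sf)

# The same plot with points coloured by crime type
ggplot() +
  geom_sf(data = crime_sf, aes(colour = crime_type))
```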
You see a map that identifies the crime type by colour. Although this is better than the first map, it still doesn't really tell us much, because we have no idea where this is, right? This could be anywhere. So why don't we go ahead and attach a reference map, which will help to identify those physical features of our area. To attach a reference map, you use the function annotation_map_tile, seen in line 169. You don't need to pass any additional arguments inside the brackets; simply add this line to the code. It might take a little longer to run, just because we are downloading a reference map. And then we have a new map plotted over our area of Surrey, as well as showing the different crime types. Although this is better than our first map, it is still quite messy, and we still have quite an overlap of crime types in certain areas, which makes this map really hard to read. One method to overcome this is to look specifically at certain crime types. We can do this by subsetting the data set and selecting only one crime type. So let's go ahead and do that. In this instance, I am subsetting for the crime type anti-social behaviour. You can follow along or use a different crime type; that's totally up to you. I create a new object called asb, subset our simple features object, call on the variable crime type and use a double equals sign to specify which crime type we want to look at. I've also removed some columns here because they weren't of interest to me; that's not necessary to do. So if we do that, we now have a new data set that just contains our anti-social behaviour crimes. Then we can use the same ggplot code to plot the same type of map. In this instance we obviously won't need the colour aesthetic, because we are only looking at one crime type, but I still want to include a reference map.
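The subsetting and base-map step sketched in code; the crime-type label is an assumption about the exact spelling used in the police data.

```r
library(dplyr)
library(ggplot2)
library(ggspatial)

# Keep only anti-social behaviour; the label must match the data exactly
asb <- filter(crime_sf, crime_type == "Anti-social behaviour")

ggplot() +
  annotation_map_tile() +  # OpenStreetMap reference layer
  geom_sf(data = asb)
```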
So let's just go ahead and plot the ggplot for anti-social behaviour. And now we have a map of anti-social behaviour just in Surrey. I realise that we might have gone through quite a lot in this topic, so we're definitely going to call a ten-minute break here, where I'll give you an opportunity to make sure that everything is running smoothly in that section. There is also a chance to run a little activity, as you can see at the end of the script. So if you have a moment in this break, go ahead and have a go at subsetting for just the crime type Drugs and then creating your own ggplot maps. I've used partial code in this instance just to make this run a bit smoother. We'll give you five to ten minutes to have a go at filling out these activities, and then we'll answer any questions that you may have. I don't want to rush anyone, but if anyone happens to finish the activity, what I might suggest is having a look at different CRS identifiers. Back where we used the st_as_sf function, you can have a look at what would happen if you change the CRS to something like the British National Grid. If you change the identifier to 27700 and create a plot, you'll see what kind of effect this has on your data, and you'll be able to see that we don't have our longitudes and latitudes presented. This is because longitudes and latitudes work with the World Geodetic System. I can run this after the activity if people would like to see the example from me. Alright, it's been about five minutes, so I'm going to go ahead and answer these activity questions at the bottom. Feel free to stop me at any point if you have any questions. So the first step was to subset the data for those crimes recorded as Drugs. The first thing you would do is call on your data set, call on your variable crime type and select the crime type; in this instance, we are interested in Drugs.
I remember it's a capital D, but we'll find out and see if that's right. So we subset this data set, and you'll see we get the result in this code here. In order to turn this into its own object, all you have to do is use the assignment operator and replicate this code. So we just do that again: replicate the code, with your crime type, which is Drugs with a capital D. Remember, R is case sensitive, so if you use a lowercase d you would get an empty data frame. We run this and then we have 217 observations of our crime type Drugs. Step three was to use ggplot to plot the point data over a base map. So let's do this. To create the base map, use the function annotation_map_tile, which is here; again, you don't need to include any information in the parentheses. And you call on your new data set, which in this instance is called drugs. We run this ggplot, give it a minute, and we now have a map of drug crimes in Surrey, and you can see that this is much less than the anti-social behaviour map. Just a little side note: if you'd like to present these maps side by side, you can use a function called plot_grid, which is in the cowplot package. Just install this library if you haven't got it installed already (remember to use quotation marks when installing packages) and load cowplot. The function used to plot these grids side by side looks like this: you call plot_grid, then call on the maps that you want to plot side by side. In this instance, we're interested in our drugs and anti-social behaviour maps. But before doing that, you need to make sure that these ggplots have been assigned to their own objects. So what you do is call this drug_map, for example, and assign the ggplot to it. This doesn't plot the map; it just creates an object.
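The activity answer as a sketch; the object names are illustrative, and the case-sensitive crime-type label is assumed to match the data.

```r
library(ggplot2)
library(ggspatial)

# Subset with the exact capitalisation used in the data
drugs <- subset(crime_sf, crime_type == "Drugs")

# Assign the plot to an object; printing the object draws the map
drug_map <- ggplot() +
  annotation_map_tile() +
  geom_sf(data = drugs)
drug_map
```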
If you scroll up, you then have to do the same for your anti-social behaviour map, which is here. So again we create a new object called asb_map: we use the assignment operator and assign the ggplot to our asb_map. As you can see, these have been added to our environment, but they're not data frames, they are lists. Then to plot these side by side, you simply call plot_grid on the two objects, asb_map and drug_map, and this should plot the maps side by side. So this is just a nice little function to make the work a bit neater. Hopefully this will load... there we are. Now we have both maps side by side, and you can see huge differences in the amount of anti-social behaviour compared to drugs. You can also add other features if you like; you can title these by using the labels argument. We call these "ASB" and "Drugs", which might just make things a little bit clearer for you. This is the issue with plotting maps, with making maps: everything takes so long. And then we have two maps with very boring titles, but they help to identify the differences between ASB and drugs. So hopefully you've been able to run that activity yourself. This topic has highlighted how we can plot point-level data using simple features objects. We've also looked at how to turn non-spatial data sets into spatial data sets. And I mentioned that you might want to have a go at using a different coordinate reference system to see what would happen. I'll just quickly show this. I'm going to use a coordinate reference system of 27700, which is the British National Grid. I'm going to call this sf2 because I don't want to get rid of my original object. So now we have a new simple features object called sf2 with that coordinate reference system. You can check the reference system again by using the st_crs function, and you can see we have assigned this to the British National Grid.
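The side-by-side step, assuming asb_map and drug_map have been assigned as just described:

```r
library(cowplot)

# asb_map and drug_map are the ggplot objects assigned earlier;
# the labels argument adds the simple titles mentioned above
plot_grid(asb_map, drug_map, labels = c("ASB", "Drugs"))
```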
So what would happen if we plot this data? We'll go back to our very simple map of our point data. Remember to change the object to the sf2 that we just created. If we plot this, we'll see quite a messy map: we don't have any coordinates like we did before, which tells us that we're in the wrong coordinate reference system. So this is just how you can tell which coordinate reference system will work for your data set. That's topic one. I'm going to change this back just so we don't have any confusion; I might even just remove the sf2 object, because we're not using the British National Grid anyway. So if anyone's got any questions, please ask away. Otherwise, I'm going to give it a few minutes and move on to topic two. We did have one question from Paul, who says the base map created by annotation_map_tile is quite blurry; is there a way to make that clearer? That's a very good question. There probably is, but I don't have the answer to it. That's fine; it's one of those things we can try to look up and answer at some point in the future. Yeah, I'm sure there is. I bet you could even change the, I don't know, the resolution of it maybe. Yeah, I'll have a look if we have some time at the end. Alright, so if everyone is happy, we can move on to topic two. Again, make sure these packages are loaded and we'll get started on our shapefiles. So what is a shapefile? A shapefile is a common format in the GIS industry. It stores our vector data, that is, our points, lines and polygons, and it stores it as a single feature class, which means it will only store a single type: it will only store point data, only line data or only polygon data. They contain multiple files and they're usable across multiple applications within GIS.
If you open the shapefile data set here on the bottom right in our Files tab, you'll see that there are multiple files attached to our shapefile. We have the .dbf file, which contains the attributes for each shape. We have the .prj file, which contains information on the projection format, including the coordinate reference system and projection information. We then have the .shx file, which contains the positional index of the feature geometry. And then we have the .shp file, which contains the geometry data for all the features; this is the file we'll be reading in. To read in a shapefile, you use the st_read function from the sf package. So if we read this file in, we have this information here provided, and it's telling us that we're dealing with a multipolygon. A multipolygon is the same as a polygon; it just means that we are looking at multiple polygons, which would be our multiple LSOAs. That's all that means. To plot the shapefile, you can then use the ggplot function as we did earlier, with the geom_sf function. If we run this, we'll see a very basic image of our shapefile, and what we see here is our shapefile of Surrey, with each LSOA. These are the boundaries of each LSOA, as this is what I selected when using the open boundary data from the UK Data Service. Again, I've provided information on how to download this data set, because it can be quite difficult to select what level of administrative geography you want, but all the information is available in the files. If we view this file, we'll see we have our LSOA names, our LSOA codes, and again our geometry, which identifies that this is spatial data, because we have combined longitude and latitude in one column. So let's move on to...
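The st_read step as a sketch; the .shp filename is a placeholder for the boundary file in the workshop repo.

```r
library(sf)
library(ggplot2)

# Hypothetical filename for the LSOA boundary shapefile
shp <- st_read("england_lsoa_2011.shp")

# Plain outline of every LSOA boundary, no base map
ggplot() +
  geom_sf(data = shp)
```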
Oh yeah, you can use the st_geometry function just to have a look at other features of the data set. It's pretty similar to reading it in, so not essential to know, but again it tells you it's a polygon and lists the first five geometries in our area. So let's move on to grouping the crimes per LSOA. The original crime data set contains the individual count of repeated crime types across LSOAs; therefore the LSOAs are repeated multiple times, because you would expect to see multiple crimes in one LSOA. In order to highlight how many crimes have occurred in each area, we can count the crimes per LSOA and obtain grouped statistics. This function here is quite a neat way to do this all in one. What I'm doing is creating a new object called crimes_grouped_by_lsoa. Arguably this isn't the most effective name for an object, but it makes this code a bit clearer. We call on our original crime data set, group by the LSOA code, and then summarise by the count. If we run this code, we'll see a new data set here with far fewer observations than the original crime data set, and this is telling us the crimes per LSOA. We can again use the head function to have a look at what we're dealing with. As you can see, each LSOA is listed on the left, and for each LSOA we have the crime count that happened within that area. This is the total across all crime types; we're not looking at a specific crime type at the moment. So now we have a separate data frame that contains our grouped crime statistics, and we need a way to join this to our shapefile. We can do this by using a function called left_join. This function returns all the rows of the table on the left side of the join, together with the matching rows from the table on the right side of the join.
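The grouping and joining steps sketched in code; the column names (lsoa_code in the crime data, code in the shapefile) are assumptions about the two files.

```r
library(dplyr)

# Count crimes per LSOA: one row per LSOA code with a total count
crimes_grouped_by_lsoa <- crime %>%
  group_by(lsoa_code) %>%
  summarise(count = n())

# Join the counts onto the boundary data by the LSOA identifier
surrey_lsoa <- left_join(shp, crimes_grouped_by_lsoa,
                         by = c("code" = "lsoa_code"))
```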
In layman's terms, all this is doing is matching the LSOA code from the shapefile, identified by this variable, with the LSOA code in the crimes_grouped_by_lsoa data set, this variable. If we run this code (I've assigned it to a new object called surrey_lsoa), we now have a new object called surrey_lsoa. So let's have a quick look at how this data set looks. What we have is a simple features object with each LSOA and the crime count that occurred within it. We can check the geometry type using the st_geometry_type function, and this again tells us that this is a multipolygon, because each LSOA is a polygon and we're dealing with 725 of them at this point. We can also use the st_bbox function to obtain the bounding box of the data: the minimum and maximum x and y values covered by our longitudes and latitudes. We can then go ahead and plot this data using ggplot again, similar to what we did before. We attach an annotation_map_tile, which provides the base map, and then we have the geom_sf function using our newly created data set surrey_lsoa, and we fill by the crime count, which is the new variable that we created. I'm also just setting the alpha so it's a bit clearer. If we run this... just close off that... there we are. And then we have a new map that highlights the number of crimes per LSOA, and you can see that on the west side of Surrey we seem to have higher numbers of crime, which is quite interesting. We can also use a package called tmap to plot this. It's got pretty similar functionality to ggplot and sf, and the code is really easy to use. There are two modes that you can use with the tmap package, which is pretty cool.
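The inspection and choropleth steps as a sketch, assuming the joined object is called surrey_lsoa with a count column:

```r
library(sf)
library(ggplot2)
library(ggspatial)

st_geometry_type(surrey_lsoa)  # reports MULTIPOLYGON for each feature
st_bbox(surrey_lsoa)           # min/max x and y of the study area

# Choropleth of crime counts over an OpenStreetMap base layer
ggplot() +
  annotation_map_tile() +
  geom_sf(data = surrey_lsoa, aes(fill = count), alpha = 0.5)
```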
One is an interactive map and one is a static map like this one, very basic, without that interactive element. To choose between them we use the view and plot modes: the view mode gives an interactive map and the plot mode gives what we see here. So the first thing you always need to do with tmap is set the type of mode you want. In this instance I'm going to set the mode to plot, and then we'll have a look at what happens if we change it to view. You use the tm_shape function to identify your shapefile, your polygon, line or point data. Then you use tm_fill to fill by our variable of interest, which is the crime count, and then we can use some aesthetics with tm_borders to identify the borders between our LSOAs. I'm just going to make these green so it's a little bit clearer, and these arguments just set the thickness and the opacity. So if we run this code under the plot mode of tmap, we'll have this map here. It's pretty similar to the ggplot version, but people have different preferences about which package they like to use; both do very similar things, so this is up to you. Let's have a look at what happens if we use the view mode. All you have to do is change the mode to view, and you'll get this warning, which is fine; this is expected. Then you just run the same code. It might take a little bit longer because we're using an interactive map, but yeah: the interactive map of our crime count in Surrey. You can select each area if you want to look at it in detail, and you can zoom in and out. Some people prefer this mode, some people prefer the plot mode; it really just depends what you're using the map for.
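The tmap version of the choropleth, sketched with illustrative border settings:

```r
library(tmap)

tmap_mode("plot")  # switch to "view" for the interactive version

tm_shape(surrey_lsoa) +
  tm_fill("count") +                                 # choropleth fill
  tm_borders(col = "green", lwd = 0.5, alpha = 0.4)  # LSOA boundaries
```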
So let's take a two-minute break while I go and get a drink, and then we're going to move on to classification methods. Alright, I'm back, I'm hydrated and I'm ready to go, so we're going to move on to looking at classification methods now. These help to better visualise count data so we can more faithfully represent the distribution at hand, and the tmap package allows you to alter the characteristics of these thematic maps via the style argument. When mapping quantitative data such as crime counts, the variable needs to be put into bins. As seen in the previous example, the default binning applied to the grouped LSOAs went from 1 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, and finally 51 to 60. These bins were decided automatically, but we can define more accurate classes that best reflect the distributional character of the data. There are multiple different classification methods that you can use. In this example I specifically look at k-means, Jenks and standard deviation. K-means is a method of vector quantisation that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean. Jenks, also known as natural breaks or the goodness-of-variance-fit method, aims to arrange a set of values into natural classes. And the standard deviation method bins observations by their standardised distance from the mean. I know there's quite a lot of statistics involved in these methods, and this is a very basic rundown, but if you are interested you can use the help function to get a better idea of the statistics and how these change the shape of your data. But let's just have a quick look at how these make our maps look different. Let's start off with k-means.
All we have done is use the same tmap code as before, but the only difference is we have added the style argument and used quotation marks to identify the different classification methods. In our first one, we're looking at k-means. So let's have a quick look at this. As you can see, our bins are very different to the automatic ones in the original. Arguably there are more counts in the east of Surrey, but it's hard to say. Let's also run the same thing with Jenks; I'm just highlighting the differences between how the bins change here. Again, you can see the bins change: here we're up to 53, whereas the automatic maximum was up to 60, which tells us that the automated classification system might be over-predicting some of these counts in areas. And let's have a quick look at standard deviation. Again, a very different classification system, and there's even a minus, because this is standard deviation. You can just see how each classification system changes the way you interpret a map, and that's why it's really important to consider using these different methods and not just stick with the automated system. The only way to map these together, similar to how we did it with cowplot, is to assign each map to an object. So let's assign each map to its own object; we're just going to call these a, b and c. a is our k-means. The next map, Jenks, is going to be called b, and then our standard deviation is c, as you can see. We can put these together by using the tmap_arrange function, which is obviously from the tmap package. All we have to do is identify those three new objects that we have just created: a, b and c. It might take a while, hopefully not. Yeah, we'll just give it a minute while this loads up. Ah, that's a good example.
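The style comparison sketched in code; "kmeans", "jenks" and "sd" are the style names tmap accepts for these three methods.

```r
library(tmap)
tmap_mode("plot")  # tmap_arrange only works in plot mode

a <- tm_shape(surrey_lsoa) + tm_fill("count", style = "kmeans")
b <- tm_shape(surrey_lsoa) + tm_fill("count", style = "jenks")
c <- tm_shape(surrey_lsoa) + tm_fill("count", style = "sd")

tmap_arrange(a, b, c)  # the three classification methods side by side
```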
The reason this has happened is because we are currently in the interactive view mode. Remember the tmap_mode function: this needs to be set to plot in order to use tmap_arrange. So we rerun these maps so they run in plot mode, and now we should be able to plot these three maps side by side. There we go, that's much better. Now we have three maps side by side, showing the maximum and minimum crime counts and how these vary across LSOAs. So just keep this in mind when you are looking at your own data and what impressions these choices might give your viewers. You can also facet by categorical variables in your data set. Just like the tmap_arrange function, tm_facets is a way to produce side-by-side maps, also known as small multiples. It's very similar to the facet_grid function in ggplot, if you've ever used that. I've put some information here if you want to read more about small multiples. In this instance we are using tm_facets just to plot these maps per LSOA, which arguably isn't ideal, but because we haven't got a clear categorical variable, we're just using this for now. If you had something like deprivation, gender or household income, that would be a better use of facets for categorical variables; I just wanted to provide an example of how you would do this using tm_facets. So we facet on the name variable, which just refers to the name of each LSOA. As you can see, it plots these really small multiples, one for each area in our area of interest. Probably not the easiest plot to read, because we are using LSOA names, but as I said, this is just an example in case you have another categorical variable in your data set that you're interested in showing. I realise we are pushed for time, so I'm going to skip over the additional features of tmap, but feel free to go through this yourself at the end.
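The faceting example as a sketch; "name" is an assumption about what the LSOA-name column is called in the boundary file.

```r
library(tmap)
tmap_mode("plot")

# One small multiple per LSOA, faceted on the assumed "name" column
tm_shape(surrey_lsoa) +
  tm_fill("count") +
  tm_facets(by = "name")
```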
So, activity two. I'm going to give you, I think 10 minutes should be enough, to go through these activities, and it's also an opportunity to ask any questions that you have, so please feel free to leave a message in the chat. In this activity I've given you the chance to explore different classification systems, how they can vary and what the main differences are. So just have a go at these activities and we'll move on through them in 10 minutes.

I just want to pipe up and say that the kmeans clustering option that you showed earlier counts as machine learning. So if you want to impress your friends at holiday parties, tell them how you used machine learning to map crime data.

That is very good to know. I didn't know that myself.

I also wanted to mention another comment that came up in the chat that I think is probably useful for everyone: you are allowed to create maps, share them with people and publish them freely if you're using open data. If you're using securely obtained data, that is, data that is given to you under certain conditions or that you have to apply for access to through secure means, you are still allowed to create all the maps you want as part of your analysis. But if you do want to share or publish those maps, you need to be very careful to make sure that they don't disclose anything inappropriate and that they meet the conditions under which you have access to the data. I just wanted to let people know that, because there are issues around privacy and security when using mapping styles of analysis. You're still allowed to map securely obtained data, but you should take care when sharing or publishing it. We have a comment in the chat.
Alex says st_crs(sf) gives "Error: object 'sf' not found". I'm guessing that just means the package hasn't been installed properly, but Nadia, do you have any other useful insight on that?

Yeah, that might just be because you haven't installed the package, or because you haven't created that sf object yet. Just rerun the function that creates the sf object and then you should be able to run st_crs().

Yeah, maybe just back up a couple of steps and rerun the code. There were some problems people had following along in topic one, so if you missed a step in topic one that topic two then refers to, that could give you an error like that.

Yeah, definitely. I understand that we've created a lot of objects. Typically this isn't recommended; you might just want to overwrite each object, but for clarity purposes it's easier to see the steps that we've made in this workshop.

Yeah, this is a sort of matter of style when you get used to working in R. I quite like to create lots of different objects, but I'm generally working in an environment that isn't limited for space or speed. If you're working on a machine that struggles with a lot of objects, or in a space where too many very large objects is going to cause a problem, then you might want to overwrite instead.

And when it comes to mapping data, you'll realise that R does struggle a little bit, especially if you have large data sets, so overwriting might be the better option with mapping. But I tend to create loads of little objects as well, because it almost keeps track of what you're doing, doesn't it?

It does, yeah. It's sort of a way of showing your work, which is quite useful for yourself: if you get mixed up and decide to go back and redo something at an earlier stage, you can see, oh, what have I done.
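As a quick troubleshooting aid for "object not found" errors like the one above, base R's exists() and ls() let you check what is actually in your environment before calling a function on it. The object name surrey_sf below is hypothetical, just a stand-in for whatever sf object your code expects.

```r
# Check whether an object with this name exists in the environment yet
exists("surrey_sf")               # FALSE until we create it

# Create a stand-in object (in the workshop this would be the real sf object)
surrey_sf <- data.frame(id = 1)
exists("surrey_sf")               # TRUE now

# ls() lists everything in the current environment, handy when a workshop
# creates lots of small intermediate objects
"surrey_sf" %in% ls()
```

If exists() returns FALSE, the fix is simply to rerun the earlier chunk that creates the object.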
But yeah, maps are quite large, and large data sets create very large maps.

Yeah. I mean, it's easy enough to delete an object; it's just the rm() function. So if you do find that there are too many objects in your way, you can use that function to delete some of them from your environment.

Okay. Just a quick follow-up question to the object-not-found question: could you perhaps show us which line of code creates the sf object?

It would be line 134 from topic one, where we create our simple features object, in this chunk here, where we identify the coordinate reference system.

Okay, that's great. I've just popped that into the chat so that it's easier to reference.

Okay, thanks so much. I think we'll go ahead and answer these activity questions; hopefully everyone's had enough time to explore the different classification methods. The first question was to create two different maps for the bclust and hclust classification systems. From what I remember, hclust is a form of hierarchical clustering and bclust is a form of bagged clustering. I'm not totally sure of the details, I'm not an expert on these, but we can explore the maps visually to see the differences. The first step is to identify your object; we're using the Surrey LSOA data. Here you need to identify your variable of interest, which is the count data, and then we use the style argument to identify the classification system. In this instance that's bclust. Run that and make sure it works. Then we do the same for hclust: you supply your data set, you address the variable of interest, and then again you call on your classification system. Just to let you know, if you didn't use quotation marks these wouldn't work, so make sure that you always use quotation marks when calling on classification systems. And great, these two maps have worked.
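For a feel of what the hclust style does, hclust() actually ships with base R's stats package (bclust comes from a separate package, e1071), so we can sketch how hierarchical clustering turns values into class breaks. The counts are made up for illustration.

```r
# Toy crime counts standing in for the Surrey LSOA data
counts <- c(2, 3, 5, 8, 9, 12, 20, 22, 41, 53)

# Hierarchically cluster the 1-d values by their pairwise distances
hc <- hclust(dist(counts))

# Cut the tree into 3 groups and use each group's maximum as a break point
groups <- cutree(hc, k = 3)
hclust_breaks <- sort(tapply(counts, groups, max))
hclust_breaks
```

The classification style simply hands your variable to a routine like this and maps each resulting bin to a colour.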
We can see the maps here at the bottom. In question two I asked you to turn these into separate objects and then use tmap_arrange(). So we run these; I probably shouldn't have called one of them b, because we already had an object called b, and I didn't want to make things complicated, but it will be fine for now. Then we use the tmap_arrange() function to plot these maps side by side, and you should see, just loading away, this image here. Great, so now we have hierarchical clustering and bagged clustering side by side. There are not huge differences in what we see here, but there are definitely darker areas in the north of Surrey when we're using bagged clustering than when using hierarchical clustering. For activity three I asked for an interactive map using the bclust classification system; again, this was just to explore the effects of the different modes. To plot an interactive map we use the "view" mode, so we change tmap_mode() to "view", and you know it has worked when you see this message. Again, we're using bclust. You can just copy and paste the code from above, but I do think it's useful to type out the code as much as possible, because coding in R and mapping data is quite repetitive in a way, so it never hurts to keep practicing. And now we have an interactive map of our Surrey area using the bagged clustering. If you have any questions about those activities, again, please leave them in the chat. Otherwise I'm going to move on to topic three, because I am aware of the time.

Interactive maps are very cool, though, so thank you for showing us how to make that.

Yeah, and there are additional activities too. So hopefully everyone's on track, and we'll start with topic three, looking at the differences between crime rate and crime count. As discussed before, count data does not entirely account for population density.
So whilst the maps above might help us identify interesting patterns, point-level open crime data is rarely used in isolation for detailed analysis. For one thing, the data points, as I said, are geomasked. This means that the points are highly likely to overlap, giving a misleading picture of the true distribution. There are ways around this, such as jittering or applying census population data; in this workshop we'll look at the latter and apply some census population data. So I think it's useful to discuss what crime rate is. In short, crime rate is best understood as crimes per 1,000 residents, per the latest census. Using a rate is a way to reduce statistical bias and reduce the effect of the modifiable areal unit problem, which is a big issue in GIS and data science. In this instance we're looking at population statistics obtained from InFuse. There is a lot of code going on here, but it's just some data manipulation and nothing too complicated. With the slice() function we're just removing rows that are not of interest, then selecting our columns, cleaning the names, and renaming our workday population and residential population, because these are the two population types I have obtained from InFuse. There are multiple variables that you can use from InFuse; these are just two that are easy enough to work with and provide quite a good example of the differences between using the workday and residential distributions. It's well known that crime rates vary: in some populations and in some periods the prevalence of crime is much greater than in other populations and other times. Accounting for these findings is a really important task, because if we understand the causal processes that underlie variation, then we might be in a position to enact policy change that can reduce the volume of crime in society at any given point in time.
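The slice/select/rename steps described above can be sketched in base R as follows. The tiny data frame is a made-up stand-in for the real InFuse download, and the column names pop_count_work and pop_count_res are the ones used later in the demo.

```r
# Stand-in for the raw InFuse population table
pop <- data.frame(
  code     = c("E01000001", "E01000002", "metadata row"),
  workday  = c(1500, 900, NA),
  resident = c(1200, 1100, NA)
)

# slice(): drop the rows we are not interested in (here, the metadata row)
pop <- pop[1:2, ]

# select() + rename(): keep the columns we need under clearer names
pop <- data.frame(
  lsoa_code      = pop$code,
  pop_count_work = pop$workday,
  pop_count_res  = pop$resident
)
pop
```

The workshop code does the same thing with a dplyr pipeline; the result either way is a clean table with one row per LSOA and one column per population type.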
The residential and workday populations are one way to do this. The residential population reflects the usual residents of an area, whereas the workday population reflects those who work in the area, plus residents who either work from home or do not work. So let's load this data set in and have a quick look at what we're dealing with. As you can see, we have our LSOA codes, our labels, and two different population types, workday and residential. We also have a population density, but we won't be using that today. The first step is to join this population data set to our new shapefile. This is the shapefile that we created in topic two, so just make sure you have that made already. We use the same left_join() function to do this: what we are doing is matching the LSOA code from the Surrey LSOA data to the LSOA codes in our population data. LSOA codes, LSOAs, I keep mixing them up. So we just use that left_join() function to merge the two data sets so that we have everything in the one data set. This is our LSOA code; you can see that I've also kept the name the same. You can change it if you'd like, but it might make things confusing. You'll see that our workday count and our residential count have been added to our simple features object, which is pretty cool, and which means that we'll be able to work out the crime rate. The crime rate is calculated by dividing the total number of reported crimes by the total population and then multiplying by a scaling factor. We take the count variable, divide it by one of the population variables, whether that's the workday or the residential, and then multiply by 1,000. In this instance we use 1,000 as this is roughly the average population of an LSOA, but if you're using a larger unit of analysis, then you might choose to multiply by 100,000.
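The workshop uses dplyr::left_join() for this step; base R's merge() with all.x = TRUE performs the same left join. The toy data frames below stand in for the Surrey shapefile attributes and the InFuse population table.

```r
# Stand-in for the Surrey LSOA attribute table (crime counts per LSOA)
surrey <- data.frame(
  lsoa_code = c("E01000001", "E01000002", "E01000003"),
  count     = c(12, 30, 7)
)

# Stand-in for the population table (deliberately missing one LSOA)
pop <- data.frame(
  lsoa_code      = c("E01000001", "E01000002"),
  pop_count_work = c(1500, 900)
)

# Left join: keep every LSOA in surrey, attach population where it matches
surrey <- merge(surrey, pop, by = "lsoa_code", all.x = TRUE)
surrey
```

As with left_join(), any LSOA with no match in the population table keeps its row and gets NA in the population column, which is worth checking for before computing rates.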
Just remember what effect this will have on your rate and how the denominator comes across in your results. So here we use the mutate() function to create a new variable called crime rate: we take our count variable, divide it by the workday population count, and multiply by 1,000, and this adds the crime rate to our simple features object. So we run this, and again we use the head() function just to see what has happened. Here we have a new column called crime rate added to our simple features, which means that we can now use the skills from topics one and two to plot these trends and see the differences between crime rate and crime count. Let's start with ggplot. The first step, again, is to pull in your base map, identify the geom_sf aesthetic, pull in your data set, and fill this with the crime rate instead of the count. Before we were using the variable count, so just make sure you've changed that. If we run this, we'll see it plots exactly what we made in topic two, but with the crime rate instead. And you can see that obviously the values are lower compared to our count: I think we were dealing with 60 as a maximum count before, but here we have 30. So there are big differences in how crime counts tend to over-predict some areas; just bear that in mind. We can then do the same thing with tmap. Again, instead of filling with the crime count, we're filling with our crime rate, so you can just run that the same. I'm using the quantile classification system here, which I did not introduce before, but it is just another classification system that you're welcome to use. And that plots a pretty nice map. Crime rate is much more informative than crime count, so definitely consider that when you're mapping your own data. But let's move on to cartograms.
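The mutate() step above boils down to one arithmetic line, shown here in base R with toy numbers standing in for the joined Surrey data.

```r
# Stand-in for the joined data: crime counts and workday population per LSOA
surrey <- data.frame(
  count          = c(12, 30, 7),
  pop_count_work = c(1500, 900, 700)
)

# Crime rate per 1,000 people (1,000 being roughly the average LSOA population)
surrey$crime_rate <- surrey$count / surrey$pop_count_work * 1000

round(surrey$crime_rate, 1)   # 8.0 33.3 10.0
```

Note how the second LSOA has the highest count and also the highest rate here, but with a larger population the ordering could easily flip; that is the whole point of rating.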
A cartogram is a map in which the geometry of regions is distorted in order to convey the information of an alternate variable: each region's area is inflated or deflated according to its numeric value, and in R the cartogram package is the best way to build one. So cartograms are a type of map where different geographic areas are modified based on variables associated with those areas. While cartograms can be visually appealing, they require prior knowledge of the data, which fortunately we have already discussed. In order to create this cartogram, you first need to supply a weight variable; in this instance we're going to be looking at the workday population. So make sure you have installed the cartogram package; I'm not sure if I included that above, so apologies, but make sure it's installed and loaded. Once that's done, you can use the cartogram_cont() function to create a continuous cartogram from your population counts for Surrey, and I'm just going to assign this to a new data frame. You should see this warning at the bottom; it's nothing to worry about, it's just R doing its maths. Right, that's stopped, and now we can just use ggplot to plot this cartogram; we use the same geom_sf aesthetic to do so. If we have a quick look, we'll see that we have this really distorted image, but this is what cartograms do: it's the type of continuous map that helps us understand how the different geographic areas are modified based on our population count. That's what we see here. We can also apply some aesthetics to this, so the first step is just to fill this in with our population count variable. As you can see, now we have the population count variable filled in, and it identifies areas with higher or lower population counts. The next step is then to make that map a little bit nicer.
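To give an intuition for what cartogram_cont() is doing while it "does its maths": it iteratively rescales each polygon until its area is proportional to the weight variable. This is only a toy base-R sketch of that idea (a single scaling step on idealised equal-area regions), not the package's actual algorithm.

```r
# Three regions with equal starting areas and different weights
area   <- c(4, 4, 4)            # current polygon areas
weight <- c(1500, 900, 600)     # e.g. workday population per region

# Each region's target area is its share of the total weight,
# keeping the overall map area constant
target_area <- sum(area) * weight / sum(weight)

# Linear scale factor applied to each polygon's coordinates
# (area scales with the square of a linear factor, hence sqrt)
scale_factor <- sqrt(target_area / area)
round(scale_factor, 2)          # 1.22 0.95 0.77
```

So the region with the largest weight is inflated and the others shrink, which is exactly the distortion you see in the plotted cartogram.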
Yeah, I didn't really explain what each of these does, but they're all covered in the help pages, so you can just look them up. And this leads us to activity three. We have mapped the variable pop_count_work, but in this activity you can replicate the same maps we have made with the pop_count_res variable, and then we can compare the differences between using the workday and residential populations. The first step is to calculate your crime rate, and then we can start plotting. As we're very short on time I'll give you five minutes to have a go at this; I might even cut that a bit short.

Yeah, I just want to point out, lots of people, if you're not used to working in R or coding with lines of code like this, you might feel it's cheating to copy, paste and then just change the relevant variables. It is not at all; that is basically how coding works. So you can absolutely copy and paste the lines of code above that are relevant and change just the bits you need to: the name of the object that you're creating, or the variable that it's pointing to, or something like that. That's a lot faster than trying to write it out, and you're less likely to make mistakes.

And that's exactly why I've included partial code, because the steps are very, very similar. There's no need to completely memorise everything line by line; you can make your life easier with these shortcuts. Okay, well, hopefully you've had enough time to run through these activities. I'm going to go through this quite quickly, just because of time. In this instance I want you to look at the variable pop_count_res, which represents the residential population. The first step is to identify that variable: we divide the count by the residential population and then multiply by 1,000, as that's roughly the average size of an LSOA.
Then we can plot this using ggplot. We call in our data set, which is just the Surrey LSOA data, and we fill this with our new crime rate. You might have noticed I've called this crime_rate_2, just so that we have both rates available in the data set; again, you can overwrite the original instead, but then you need to remember which is which. Now if we run that, you should have a ggplot with your crime_rate_2. Here we are, it should look something like this. I've also asked you to do the same with the tmap version: again, you just call your variable, crime_rate_2, here, using the quantile classification system.

I just wanted to interrupt briefly. Another user has shared the postcodes.io website, and PostcodesioR is an R package; either the website or the package might be quite useful if you're moving between postcodes and latitude and longitude.

Ah, and I made an error there because I didn't put in a quotation mark. Simple things like that will throw you off, but errors are really common in R, so don't let them put you off. Then I've just used the tmap_arrange() function for question four, where I'm asking you to compare the two crime rates. You supply the names of the variables, in quotation marks this time, do the same with crime_rate_2, and run both of these. Then we pass in our new objects and run that like so. And our last task is the cartogram: we use the Surrey data again and fill this with the variable pop_count_res, and then we have a cartogram of the residential population. You can see huge differences: the residential population is much more densely populated than the workday one. So these are just things to consider. And yeah, we haven't got much time left, so we'll end the demo here. Thanks, everyone. Take care. Bye, everyone.