Good morning everyone, and welcome to this UK Data Service workshop, Mapping Crime Data: An Introduction to GIS and Spatial Data. My name is Nadia Ken. I'm a research associate at the UK Data Service, based in the Cathie Marsh Institute at the University of Manchester. This presentation is an introduction to the main topics and issues involved with GIS and spatial data. It provides the necessary information for the code demonstration, which is taking place on Friday the 16th of February, also at 11 a.m. So let's jump into the content of this talk. Today we'll be discussing some of the major concepts in GIS and spatial data, so we'll start by discussing what exactly GIS is. We'll look at the differences between spatial data and non-spatial data. Just turning my camera off because there's a bit of a lag. We'll then talk about different types of maps, specifically the differences between reference and thematic maps. We'll then move on to discussing what projection methods and coordinate reference systems are, and we'll have a very brief discussion about spatial relations, or spatial statistics, which is a more advanced topic in GIS and mapping data. Lastly, I'll provide you with some resources for crime data and hopefully, given time, we can discuss some of the challenges of mapping crime data as well. For the second part of this workshop, which is being held on the 16th of February, these are the topics that we'll be running through in the code demonstration, which will be done in R. I will talk a little bit more about the code demonstration towards the end of this talk. So what exactly is GIS? GIS stands for geographic information system. This is simply a computer system for capturing, storing, checking and displaying data related to positions on Earth's surface. It can be viewed as quite an abstract platform that attempts to integrate data onto a map using various methods.
It's present in virtually every field and every organization, and it's simply a way to share information and solve complex problems around the world with this visual aspect. The biggest benefit is that it allows trends and patterns to be studied visually, which provides a new form of analysis. The term GIS was coined by a pioneer named Roger Tomlinson in the 1960s, when he was commissioned by the Canadian government to create a usable and efficient inventory of its natural resources. He tried various manual methods for overlaying environmental, cultural and economic variables, but all were too costly. So he helped to create what became known as the first computerized geographic information system, and this is how he came to be called the father of GIS. From there we've had people such as Laura and Jack Dangermond, who you might be familiar with. They founded the Environmental Systems Research Institute, also known as Esri, which develops software for mapping and spatial analysis. Now it's important to note that most data sets you encounter in your lifetime can be assigned a spatial location, whether on the Earth's surface or within some arbitrary coordinate system. So the question you need to ask yourself is: does my data need to be analyzed in a GIS environment? Let's have a look at a quick example. Imagine you were interested in question A here: identifying the 10 cities with the highest average income scores. Now if you look at the table on the left, which is just a simple table listing, you should be able to answer this question using this data set, right? It gives you the city and it gives you the average income score, and you can quite easily identify the top 10 cities.
However, if you're interested in answering a question like B, which asks whether these 10 cities are geographically clustered, then we need more information, specifically about the shape of the country and the geographical location of each city. So you might have something that looks like this instead, where we have the city, we also have the score, and we also have that spatial boundary column. Data in a geographic information system contains either an explicit geographic reference, such as a latitude and longitude or a national grid coordinate, or an implicit reference, such as an address, a postal code, a road name, a forest stand identifier, a census tract name, and so on. Either way, all data in GIS is referenced. An automated process called geocoding is used to create explicit geographic references from implicit references. These geographic references allow you to locate features, such as businesses or a forest stand, or events, such as an earthquake, on the Earth's surface for analysis. For crime data specifically, an attribute might describe the who or the what: it could be where the call was received, or even just the type of crime. There are various software packages available for running GIS. The most common are listed on the screen. We have GeoDa, ArcGIS, FME, QGIS and R. ArcGIS and QGIS are highlighted because they're probably the most commonly used for GIS. However, this workshop will be using R and its IDE, RStudio. This is because an increasing number of packages have become available for spatial analysis and mapping, and over the last decade or so R has definitely grown as an analytical tool for GIS and spatial data. But just out of interest, I would like to know what software you have used, or do use, for any kind of GIS mapping or spatial analysis.
So please feel free to head back to the Mentimeter and just pop in your answers. This is me just being nosy. This isn't really in relation to the content of the talk, but it will also help to gather some information for the code demonstration. It looks like at the moment the majority of users here use R, and we have a fair few who have said other. For those who have said other, I don't think I've given you the option to actually type that out, have I? But if you want to just pop that into the comments section or the Q&A, I'd be really interested to hear what other software you are using. We do have a very mixed range of experience, which is good. There's still a couple of votes coming through, so we'll just give it a couple of seconds. Fabulous. Okay. So how exactly is GIS used? GIS, as defined by Esri, can be used for six main use cases. It can be used to identify problems. It can be used to monitor change. It can be used to manage and respond to events. It can be used to perform forecasting, to set priorities and to understand trends. So if you've joined this workshop with one of these use cases in mind for a specific data set, then you've come to the right place. Let me just provide a quick crime example for one of these use cases. Here are a couple of questions that you might need GIS to answer. Let's say you're interested in where the most vulnerable communities are located. Well, if individuals wanted to use crime maps to safeguard their personal safety, for example by avoiding areas with a high level of street robberies, then clearly a geographical representation of crime would be appropriate. Maybe you're interested in why crimes occur in one area and not the other. There are several theories to help explain why crimes occur in some places and not others.
If you come from a criminological or sociological background, you might be aware of routine activity theory, broken windows theory, social disorganization theory, crime opportunity theory and, I believe, situational crime prevention theory. I'm sure I've missed a couple of theories out there, but all of these conceptualize the question. One theory to keep in mind is crime pattern theory, which integrates crime within a geographic context and demonstrates how the environments people live in and pass through influence criminality. The theory specifically focuses on places and the lack of social control, or other measures of guardianship that are informally needed to control crime. So for example, a suburban neighborhood can become a hotspot for burglaries because some homes have inadequate protection and nobody home to guard the property. Maybe you're interested in how offenders travel to the crime location. In other words, what is it about one specific place that convinces an offender to commit a crime there rather than at an alternative location? There are a few factors that affect how far an offender travels. The first, of course, is the offense type. There was a bit of interesting research done by Ackerman and Rossmo in 2015, I believe, and they found that, in general, violent crimes had a shorter median distance to crime than property crimes. There are also other factors such as the age of the offender, the gender of the offender, and things like neighborhood affluence and target suitability, all of which can then be mapped and visualized. And just the last question: maybe you're interested in where there are more or fewer stop and searches than we would expect in relation to the distribution of crime. By comparing the actual distribution of crime with the actual level of stop and search, we can start to question the necessity of stop and search.
You can start to ask questions about whether stop and search is a tool of crime reduction or crime detection. Or you can start to question police powers. And then you can start to map and visualize these trends. GIS are known to produce two broad types of maps. These are known as reference and thematic maps. Reference maps are used to communicate location or more static data points. They are used to pinpoint data onto a map and are inevitably very descriptive. On the other side, we have thematic maps, and these are used to highlight a spatial relationship, that is, to study a theme within a map. This is where it becomes a bit more explanatory. Here's a little example that shows what a reference and a thematic map might look like. The reference map, the map on the left, is highlighting natural or man-made features, which could include the positioning and heights of mountains or even the layout of bus routes or travel options. This type of map is simply referencing what exists in our physical environment. The map on the right, on the other hand, doesn't look like a typical map, right? Here we see some sort of spatial relationship being studied, in this case population density. A thematic map is how we map a particular theme to a geographic area. It tells us a story about a place, and it is commonly used to map subjects such as population densities, health issues, climate issues, crime trends, and so on. Sometimes the differences between reference and thematic maps can become confusing, so let's just have a look at a quick real-world example. This is a project called Lives on the Line. It was done back in 2012, I believe, and they basically started to question the use of different aggregation units.
They said that most government statistics are mapped according to official geographical units, such as wards or what we call lower super output areas. While these units are essential for data analysis and for making decisions about data, they're very hard for many people to relate to because they don't particularly stand out on a map. I mean, if I were to ask you which lower super output area you live in, I don't think many people would be able to answer that question. This is why they tried this new method in July to show life expectancy statistics and child poverty, I believe. Yes, child poverty, in a new light. They used tube stations as their unit of analysis to study this trend. So how would you class this tube map example? Would this be a reference or a thematic map? In one instance, you might think that it is a reference map, because it shows the location of different tube stations and the location of each tube line. But we are doing more than just showing the physical features of the environment. In this instance, it could be seen as a thematic map because it's being used to show life expectancy and poverty across the tube stations. There is a link if you are interested in exploring that project yourself. The slides will be shared towards the end of this talk. But I want you guys to have a go at categorizing these scenarios into whether you think they are reference or thematic maps. Our first example: the visualization of road networks to improve road safety measures is a type of what? Feel free to head to the Mentimeter and pop in your votes, and have a go at deciding whether you think these maps would fall into either reference or thematic maps. I think I just closed the votes, so it should be open now. There we go. Got the majority of votes so far saying that they could be both sitting at around 70%. And we've got 24% saying reference maps and 76% saying thematic maps.
The votes are going up for reference maps, which is really interesting. Okay, so we can break this scenario down into two aspects. At first glance, this might definitely be said to be a type of reference map because we are pinpointing existing road networks onto a map, right? We're just highlighting those physical features of Earth on a map. So it can definitely be seen as a reference map. However, this might also be said to be a type of thematic map, as we are studying the existing road networks to then improve road safety measures, which can be seen as a type of accident analysis. This could include, for example, the introduction of speed signs or zebra crossings outside schools or residential areas due to reports of accidents or dangerous driving. So in this instance, we're studying spatially the relationship between the road networks and accidents, which are then being mapped to a geographic area. So whichever of these you answered, in theory you are correct. They can be seen as both. Scenario two: the visualization of the Earth's surface showing its elevation is a type of what map? Is it a reference map, a thematic map, or do you think it could also be both? Again, I'll just give 30 seconds to put some votes in and then we'll have a little discussion. Looks like the majority of votes have said that this is a type of reference map. And we've got about 15 people saying it's thematic and 10% saying it could be both. So let's just have a little talk about what this might be. Well, the visualization of the Earth's surface is also known as a topographic map, which specifically shows elevation. A topographic map is a graphical representation of the three-dimensional configuration of the surface of the Earth. In short, it's simply describing where the Earth's surface is elevated. These maps are normally represented by those wiggly contour lines.
So arguably, yes, this might be seen as a type of reference map. Research has suggested that studying a topographic map is a great way to learn how to match terrain features with the contour lines on the map. So in this instance, could we say that this is a thematic map if we go beyond studying those contour lines to match them with other physical features of the environment? You could, for example, study the steepness of the terrain, you could study whether this terrain is above or below sea level, or you could study the shape of the terrain. So in this instance, you might interpret that we are studying spatially the relationship between contour lines and those different features of the Earth. So if you answered any of these options, I would consider yourself to be correct. Give yourself a pat on the back. Either way, this answer is open to debate. This is the difference between mapping places and mapping data. And there's just one last scenario here, which I think is a really interesting discussion point. Navigation tools such as Google Maps: do you think that these can be classed as reference maps or thematic maps? We can think about other apps like Citymapper and Waze as well. They all kind of fall into that navigational tool class. But do you think that this would sit within the definition of a reference map or a thematic map? Again, I'll just give this a couple of seconds for the votes to flow through. We've got a fair split between could be both and reference maps, which is really, really interesting. The numbers keep jumping. I think people are catching on to my message here. Yeah, there's definitely a fair split between those two answers. So let's just have a little talk about this because, yeah, immediately you might assume that this is an obvious type of reference map, right? It highlights the important physical features needed for travel. So this would include bus routes and walking routes and cycle lanes.
Sorry, bus routes, walking routes, cycle lanes, roads, et cetera. So yes, these reference maps portray a basic set of features. But can we say that an app that plans your travel is a type of reference map? It definitely uses some sort of algorithm to get you from one place to another, normally choosing the fastest, the cheapest or even the safest route. So can we call this a thematic map instead, since it's overlaying information onto this base map? This brings up a really interesting debate from Roger Tomlinson, who looked at the differences between analog and digital maps. Researchers have considered navigational tools to be fundamentally different from both reference and thematic maps, which opens the debate for a third category. But it could be argued that all maps are navigational depending on how you use them, right? The difference is that a digital map is specifically interactive, whereas an analog map obviously is not. And this links us back to the father of GIS, Roger Tomlinson, because he couldn't easily overlay his data onto an analog map, so he moved towards computation. So one main difference between reference and thematic maps is that thematic maps tend to be a little bit more interactive, which is why navigational tools such as Google Maps, Citymapper, Waze, Uber, et cetera, can be considered a type of thematic map. So just to sum up this section: in theory, all maps fall broadly into two categories. There are ways in which these types of maps overlap or share similarities, in that almost every thematic map can be considered a reference map, but not every reference map is a thematic map. This decision is kind of up to you. It's not entirely necessary to define these in your work, but it is important to know what type of map you want to make, as this can be affected by the type of data you have.
So can you give any examples of other types of maps that you think share qualities of both reference and thematic maps? Again, I'll just turn the answers on. I think I've given the option to put in multiple responses, so please feel free to drop in any suggestions that you have. Yeah, Google Maps is a great example. This is definitely a map that shares qualities of both reference and thematic maps, because Google Earth allows you to do more than just view the physical environment. It overlays multiple forms of information so that you can study things more spatially. Maps with natural resource information. Yeah, this could definitely be considered to share qualities of both maps because, in one instance, it would definitely be referencing parts of the physical environment. If it's overlaying this natural resource information on top, pretty similar to what Roger Tomlinson did in his original maps, then yeah, this could definitely be considered to be both. Other examples. Oh, we've got some more. There's deprivation maps and weather report maps. Yeah, great. Again, these two maps definitely show the qualities of both because we're attempting to study some sort of social phenomenon against the physical environment of Earth. These are some great examples. House price maps, exactly the same thing. It compares to that tube map example: mapping house prices against certain areas would definitely be a thematic map, because we're studying spatially how house prices vary across the country. Fabulous. Yeah, we'll move on from here. Oh, yeah, navigation maps that show expected journey times and delays. Absolutely. That can definitely be considered a map that shares both qualities. It has that interactive element to it, which might just make it a little bit more thematic rather than reference. Education level and income. Absolutely.
Another great example. We're mapping some sort of social phenomenon over the physical environment. So we're now going to move on to looking at what spatial data is and give some detailed examples. In short, spatial data is just a representation of the real world. So every example that we just spoke about in the previous slide can be considered a form of spatial data. It attempts to represent the physical features of the world in an accurate way, and GIS enables this spatial data to be processed and analyzed. There are typically two types of spatial data. These are known as vector and raster data. Vector data contains points, lines and polygons. That should say points, lines and/or polygons, because it doesn't have to include all three. Vector data is the most common form of spatial data. Points can be seen as a pair of coordinates. So just giving some crime examples, this could be the location where a missing person call was received. We then have lines, which extend these points and include at least two points. So this could be, for example, the street that the missing person call was received on. We then have polygons, which again extend the lines and include three or more points. So this could be the area, the city or the ward that the street belongs to. Imagine it like you're zooming out on a map. You're starting from one point on a map, zooming out to look at the street and then zooming out to look at the area. That is exactly what vector data is. Raster data, on the other hand, refers to imagery or satellite data that is formed from a grid of pixels. A raster map is basically, quote unquote, a dumb electronic map image that is made up of various pixels. You can't manipulate the information, as in you can't move a place name around, and when you zoom into the map it can quickly become pixelated and unreadable.
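To make the vector and raster distinction described above concrete, here is a minimal sketch in plain Python (the workshop's own demonstration uses R). All coordinates, the ward boundary and the temperature grid are made-up values for illustration, and `point_in_polygon` and `raster_value` are hypothetical helper names, not functions from any GIS package.

```python
# Vector data: geometries are stored as coordinates.
# All coordinates are invented (longitude, latitude) pairs.
call_location = (-2.2446, 53.4839)                    # point: one coordinate pair
street = [(-2.2460, 53.4830), (-2.2430, 53.4845)]     # line: at least two points
ward = [(-2.25, 53.48), (-2.24, 53.48),
        (-2.24, 53.49), (-2.25, 53.49)]               # polygon: three or more points


def point_in_polygon(point, polygon):
    """Ray-casting test: does the point fall inside the polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Raster data: a regular grid in which every cell (pixel) holds one value.
# Hypothetical temperature grid in degrees Celsius.
temperature = [
    [11.2, 11.5, 11.9],
    [10.8, 11.1, 11.4],
    [10.1, 10.6, 10.9],
]


def raster_value(grid, row, col):
    """Look up the value stored in a single pixel."""
    return grid[row][col]


print(point_in_polygon(call_location, ward))
print(raster_value(temperature, 1, 2))
```

The point, line and polygon mirror the "zooming out" idea: the call location sits on the street, and the street sits within the ward, while the raster has no discrete features at all, only a value per cell.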
Rasters, however, are well suited to representing data that changes continuously across a landscape or surface. They provide an effective method of storing that continuity as a surface, and they also provide a regularly spaced representation of surfaces. Temperature or elevation would be one example. If you had a data set for temperature or elevation, this would probably be some sort of raster data, where each pixel represents a different value within the overall range of values for that data. In GIS, you will encounter many different file formats for vector and raster data. Some file formats are unique to specific GIS applications and others are a bit more universal. For this course, we're going to focus on a subset of spatial data formats. These include shapefiles and GeoTIFFs. The GeoTIFF file format is commonly used for raster data and the shapefile format is commonly used for vector data. We're not going to spend any time on GeoTIFFs because that's not what we're working with. Crime data is not continuous, and crime data is normally represented in a vector data format, which is where we move on to shapefiles. A shapefile is a file-based data format. It is a feature class: it stores a collection of features that have the same geometry type, so this could be the points, the lines or the polygons, that share the same attributes and that have a common spatial extent. A feature class, by the way, is simply a collection of geographic features that share the same geometry type. Despite its name, which suggests a single file, a shapefile is actually composed of at least three files, which you can see on the screen. In this instance, I believe there are eight files, which is very common to see.
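As a rough illustration of a shapefile being several files rather than one, the sketch below checks a folder listing for the three component files a shapefile needs (`.shp` for geometry, `.shx` for the geometry index, `.dbf` for attributes) plus the optional `.prj` file. It is written in plain Python rather than the R used in the demonstration; the file names and the `describe_shapefile` helper are invented for this example.

```python
from pathlib import Path

# Components every shapefile needs, and one common optional component.
REQUIRED = {
    ".shp": "feature geometry",
    ".shx": "geometry index",
    ".dbf": "attribute table",
}
OPTIONAL = {".prj": "coordinate system information"}


def describe_shapefile(filenames, stem):
    """Report which components named `stem.*` appear in a list of file names."""
    present = {
        Path(name).suffix.lower()
        for name in filenames
        if Path(name).stem == stem
    }
    return {
        "present": sorted(present),
        "missing_required": sorted(ext for ext in REQUIRED if ext not in present),
        "optional_present": sorted(ext for ext in OPTIONAL if ext in present),
    }


# Hypothetical folder listing; "crimes" is an invented layer name.
listing = ["crimes.shp", "crimes.shx", "crimes.dbf", "crimes.prj", "notes.txt"]
report = describe_shapefile(listing, "crimes")
print(report)
```

A layer with an empty `missing_required` list has the minimum set of files; anything beyond that, such as the eight files mentioned above, is additional metadata or indexing.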
So a shapefile would just be this folder on your computer that has eight sub-files within it, and each file within the shapefile has a specific role in defining the shapefile. The file extensions that are important to us are the .shp file, which holds the feature geometry, so this is whether we're dealing with point, line or polygon data. There is the .prj file, which holds the coordinate system information, which we'll be discussing in the second part of this workshop. And the .dbf file is the attribute information. This would be all the extra information about the crime data set: possibly the type of crime, the date variable if it was there, the street name variable if it was there, and other textual facts for that crime data set. When managing GIS files in R, specifically shapefiles, there are several key points and packages for handling this data effectively. It's essential to understand how to import, manipulate and visualise spatial data. Here's just a very concise overview of what reading a shapefile into R might look like. No worries if this makes absolutely no sense, because we're running through all of this in the code demonstration. I just wanted to show you the two main packages used for reading vector data into R, which are the sf package and the rgdal package. So vector data provides us with a very precise way to represent spatial features, that's the points, the lines and the polygons. But while vector data does excel at representing the geometry of spatial features, it introduces us to a very complex interplay between spatial accuracy and spatial analysis. This interplay is where we encounter the modifiable areal unit problem, also known as the MAUP, which is a fundamental challenge in spatial analysis that affects our interpretations of spatial data.
The MAUP refers to the cartographic representation of data whose attributes are significantly influenced by the spatial scale used. In other words, the same basic data yield different results when aggregated in different ways. This applies wherever data is aggregated to specific areal units, which could take many forms: postcodes, street addresses, local government units, and so on. There are two key aspects to understanding the modifiable areal unit problem: the scale effect and the zoning effect. The zoning effect has also been referred to as the aggregation effect, so those terms mean the same thing, in case you've seen that anywhere else. So, the scale effect. This is where changing the size of the spatial units, maybe from neighbourhoods to districts, alters the statistical results, such as the means or totals. It occurs when the size of the spatial units used in an analysis affects the results. Generally, as the size of the spatial units increases, the variability between units decreases, and this can lead to very different conclusions about patterns and relationships within the data at different scales. For example, let's say we're analysing crime rates by neighbourhood. This might show a very different pattern compared to analysing crime rates by city, right? You're going to have completely different interpretations of the results. The second aspect is the zoning effect, or aggregation effect, and this is when altering the shape or configuration of the spatial units, even if the scale remains constant, impacts the results. This effect is observed when the way that the spatial units are aggregated, so how the boundaries are drawn, changes the outcome of the analysis. For instance, grouping neighbourhoods into districts in various ways can reveal or hide different spatial patterns of a phenomenon like poverty or disease.
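The scale effect described above can be sketched numerically. The example below, in plain Python rather than the R used in the demonstration, averages a row of made-up neighbourhood burglary counts into progressively larger units and compares the variance at each scale. The counts and the `aggregate` helper are assumptions invented for illustration, not real crime data.

```python
from statistics import mean, pvariance

# Hypothetical burglary counts for 8 neighbourhoods laid out in a row.
neighbourhood_counts = [2, 18, 5, 9, 6, 14, 11, 15]


def aggregate(values, size):
    """Average consecutive groups of `size` small units into larger units."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]


districts = aggregate(neighbourhood_counts, 2)  # 4 districts of 2 neighbourhoods
cities = aggregate(neighbourhood_counts, 4)     # 2 cities of 4 neighbourhoods

for label, units in [
    ("neighbourhoods", neighbourhood_counts),
    ("districts", districts),
    ("cities", cities),
]:
    print(label, units, "variance:", pvariance(units))
```

In this invented data set the variance between units shrinks at every step up in scale, so a neighbourhood that stands out as a hotspot at the finest level is averaged away once it is merged into a district or city.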
So I'm going to provide a quick example just to expand on this. I know it can be quite confusing, but here's an image of what the scale effect and the zone effect might look like. This first image at the top demonstrates how changing the scale from small to large spatial units affects the perception of data variability. It basically contrasts a detailed view of an area divided into many small areas against a simplified view where the area is represented at a much larger scale, and each large area here averages the variability of the tiny areas that it replaces. So this visualizes how analysis outcomes can vary dramatically with the scale of observation. The example at the bottom shows the zone effect. This image illustrates the aggregation effect by showing a region first divided into several regularly shaped areas, each representing a different neighborhood, and then aggregated into larger areas. Each new area averages the characteristics of the neighborhoods it combines, resulting in quite a generalized view that masks finer neighborhood-level variations. You might be familiar with the zone effect in real life. It can occur by accident or on purpose, but one example is when a political party wants to sway an election. You can basically manipulate any of the political boundaries below to favor a certain side, and this is typically how one would rig an election. It's also known as gerrymandering. So the MAUP is a phenomenon that reminds us of the importance of scale and zoning issues in spatial analysis. It occurs because the results of the analysis can vary significantly based on the size and shape of the spatial units we use. Just as vector data allows us to capture intricate details of the Earth's surface, it also exposes us to the complexities of choosing the, quote unquote, right scale for your analysis.
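The zoning effect, including the gerrymandering case mentioned above, can be shown with a small sketch in plain Python (the demonstration itself uses R). Nine made-up neighbourhoods are grouped into three districts in two different ways at the same scale; the neighbourhood labels, both zonings and the `district_winners` helper are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical majority party in each of nine equally sized neighbourhoods.
neighbourhoods = {
    1: "A", 2: "A", 3: "B",
    4: "A", 5: "A", 6: "B",
    7: "B", 8: "B", 9: "B",
}


def district_winners(zoning):
    """Return the winning party in each district for a grouping of neighbourhood ids."""
    winners = []
    for district in zoning:
        votes = Counter(neighbourhoods[n] for n in district)
        winners.append(votes.most_common(1)[0][0])
    return winners


# Two zonings at the same scale: three districts of three neighbourhoods each.
zoning_one = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zoning_two = [[1, 2, 4], [3, 5, 6], [7, 8, 9]]

print(Counter(neighbourhoods.values()))
print(district_winners(zoning_one))
print(district_winners(zoning_two))
```

The underlying data never changes, and party B holds five of the nine neighbourhoods throughout, yet party A takes two of three districts under the first zoning and only one under the second. The same logic applies when crime counts are summarised by differently drawn boundaries.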
So I've been talking for a while, so I'm going to head over to Mentimeter to ask some questions about the zone and scale effects. This is just a true or false question. The scale effect in the MAUP refers to changes in the statistical results caused by the size of the geographical unit used in the analysis. Is this true or false? I think questions have now been opened, so feel free to take a vote. Do not worry about getting this right or wrong. I understand the MAUP is quite a complicated phenomenon. But if you think that this statement is true, or if you think the statement is false, please put in your votes. It looks like most people are voting true. There's a couple for unsure and a couple for false. But yes, if you answered true, congratulations, that would be correct. The scale effect refers to the changes in the statistical results caused by the size of the geographical units used in the analysis. So using a finer analysis unit will inevitably lead to different analysis results. This is because the scale effect causes variation in statistical results between different levels of aggregation. And what about the aggregation effect, also known as the zoning effect? Which of the following best describes the zoning, or aggregation, effect in the MAUP? I'll give just a couple of minutes to answer this because there are a few options to choose from. We've got a couple of votes in for option B and we've got one vote in for option D. We'll just let this roll for a couple of seconds. Sweet, yeah, that's great, everyone. Thanks for taking part in that. I know this might seem like a bit of a tedious question to answer, but I think it's really important to rehash the definitions and the differences between the two, because they can get confusing. But if you voted B, you can consider yourself to be correct.
The zone effect, so sorry, the aggregation effect, describes how the arrangement and shape of spatial units can alter statistical results. So changing the shape of specific boundaries will change your statistical results. I guess I can provide a bit of an example here: an analysis using data that's aggregated to, let's say, one mile grid cells will differ from an analysis using one mile hexagon cells. And this zone effect is a problem because it's part of the analysis, right? It's part of the aggregation scheme rather than the data itself. And this is a limitation that would have to be addressed in your research, depending on which kind of scale and zone you choose to use.
So with all of this in mind, what do you think are some potential implications of ignoring the MAUP, specifically related to analyzing crime data? Feel free to just type one word or a sentence, or feel free to not do anything if you don't want to. But yeah, what do you think are the implications of just ignoring the MAUP? Yeah, of course, misrepresentation is a huge one. Type I or Type II errors. Yeah, that's a very good point. Wrong results, misleading, unfair distribution, wrong spatial interpretation. This is fantastic. And I'm glad that people are understanding what I'm trying to home in on. You know, it doesn't mean that there's a right or wrong way to aggregate your data. The important thing is to note how you do this aggregation in your steps and explain why you have chosen to aggregate at that particular level. But yeah, ignoring this can obviously lead to incorrect interpretations of crime patterns and trends. For example, what appears to be a crime hotspot in an analysis at one scale might not be identified as such when the analysis is conducted at a different scale, right? And this could mislead law enforcement and policymakers about where the resources are most needed.
And this impact on policy development and evaluation is significantly important in crime data, because policies aimed at reducing crime rates often depend on accurate spatial analysis. Ignoring the MAUP can lead to the development of policies based on flawed interpretations of crime data, potentially resulting in ineffective and misdirected crime prevention strategies. Just trying to see if I can think of any other potential implications. Obviously, there are ethical considerations here as well when ignoring the MAUP, because the policies or actions that result can disproportionately affect certain communities. There are obviously research and academic implications as well, because we then need to question the scientific rigor as crime researchers and as a crime community, to see how we can reduce how much we misguide our future studies and spatial analysis.
But sweet, thank you for taking part in that. I believe we're at a break. We've just reached 50 minutes, which is just a little over halfway. So we're going to take a five minute break here and come back for the second half of the talk.
Hello, I hope everyone's had a chance to stretch their legs and have a little drink. We're going to get started now just because we do have quite a bit of content to cover in this last half an hour. So yeah, let's get started. Now we have a very basic understanding of spatial data and those components that we spoke about in the first half. This is where we ask ourselves, how do we actually pinpoint a location on a map? And this is where we start to talk about projection methods. Projection methods allow you to move from the 3D to the 2D. And I hope you can appreciate my hand drawn image. Map projections try to portray the surface of the earth, or a portion of the earth, on a flat piece of paper or a computer screen.
So yeah, in layman's terms, we're just trying to transform the earth's spherical shape, which is 3D, into a planar shape, which is 2D. And I'd like to give the example of cutting up a football. So imagine you had a football and you began cutting it up with a knife. Now, if you tried to recreate this back into a sphere, you would be able to do this pretty well. It would fit quite nicely back up into a sphere, right? But if you tried to recreate this into a rectangle or a square or a triangle, the population wouldn't quite work, in that the pieces wouldn't fit together perfectly to form a rectangle, square or triangle. So these map projections are simply equations that tell a mapping system, which is the GIS, how to populate a new shape or area. So if we wanted to create a rectangle, we might have something like this where the whole area is populated. And this is how map projection systems, through the use of a GIS, are able to move a 3D image into the 2D.
But with this comes a significant issue and this is known as distortion. During this process, you can have the misrepresentation of four things, which are the area, the shape, the distance and the direction. There are algorithms in place to control for this, but it's rare that all four features are preserved correctly. So you can almost imagine a map projection as an attempt to reconstruct your face in two dimensions, right? Some maps will be able to get the shapes of all your features right, but not the sizes. So maybe your forehead and your chin, for instance, come out way bigger than they actually are. Other maps might be able to get the sizes right, but the shapes will be stretched out. So maybe your nose comes out wider than it actually is. And some maps preserve distances. For example, the measurement from the tip of your nose to your chin could be different in a different map projection.
And it all depends on which attributes you are willing to compromise. Some try to maintain the correct distance, others the shape, others the area and others the direction. But typically these algorithms tend to find some sort of sweet spot that will balance out all these factors. And this is why you see projections that are different for individual countries, regions, districts or states. There are three main projection families. These are the cylindrical, the conical and the planar. We're not going to talk too much about these because within each projection family, you have about 100 to 1000 different types of projection methods that are available, which is why mapping and GIS can be so complicated.
But I will provide an example just here to show how distortion can affect the interpretation of maps. In this example, we're looking at two maps. The map on the left uses a projection called the Web Mercator projection. And this is typically what we are familiar with. This is what Google Maps looks like to us. This is what the world map looks like to us. And on the right, we have the Gall-Peters projection, which is a different type of projection. As you can see, this map looks very distorted compared to what we normally know. The Mercator projection maintains angular conformity. It preserves the shapes but distorts the sizes, especially towards the poles. So it has this "Greenland appears larger than Africa" effect, which you might have heard of. It tends to maintain accurate direction and shape but distorts distances, increasingly so at high latitudes, so distances are only close to true near the equator. This Web Mercator map is used, as I said, in navigation and in web mapping services like Google Maps and so on, so it's very common and it's what we see day to day. There are obviously some cultural impacts as well, and historical implications, of the Web Mercator.
It can be criticized for exaggerating the size of the north. It has been said to have historically favoured imperialist countries at the expense of the equatorial regions, which in theory reflects and reinforces a colonial bias. But there's so much history to the Web Mercator that we won't even discuss. On the other hand, we have this Gall-Peters projection on the right, which aims to maintain area representation, showing land masses in true proportion to their sizes on the Earth's surface. But what this does mean is that the shapes and the directions are distorted, especially near the poles and at low latitudes. So this provides a more, call it, equal representation of continent sizes, correcting the exaggeration of the northern lands seen in the Mercator. The Gall-Peters projection is not a very common projection, if I'm not mistaken. In fact, I think it's used for educational purposes to show the effect of distortion. Because, yeah, obviously it highlights this unequal representation of land, right? Emphasizing development, population density and resource allocation issues are all use cases for it.
But, yeah, the choice of projection depends on the specific needs of the analysis or application. And it's very crucial to be aware of potential distortions in any map projection and to select the one that best aligns with the intended use. So understanding the distortion effects of these projections is key to selecting the most appropriate one for different applications and avoiding the reinforcement of biased world views.
So, how do we... Oh, sorry, yeah, this was just an interactive projection example. I'll just come out of the slides for a second. This is an interactive map projection website that I found. I thought it was really cool and it allows you to explore the Mercator projection and three others as well. It's also got the Gall-Peters here.
If you click the Gall-Peters, you can see how differently the sizes, the shapes and the areas are all distorted depending on the projection. But yeah, feel free to explore that in your own time. Let's get back to the slides.
So, the question then is, how do we actually move from the 3D to the 2D? We've talked about these projection methods, but how do these projection methods actually work? What is involved in these projections? And this is where we talk about our coordinate reference systems. Coordinate reference systems are part of projection methods. The move from the 3D to the 2D is done with the help of a CRS, where every place on Earth is specified by three numbers. These are the latitude, longitude and altitude. And each number indicates the distance between the point that you're interested in and a fixed reference, also known as the origin.
There are two main coordinate reference systems: the geographic coordinate system and the projected coordinate system. The geographic coordinate system uses three coordinates, the longitude, latitude and altitude. This is simply a reference framework that defines the locations of features on a model of the Earth. So this is the "where", as in where is the point on Earth. A GCS uses an angular distance, usually degrees, and the most commonly used geographic coordinate system is the World Geodetic System 1984, which we'll be using in the code demonstration. So if your data contains longitude and latitude, the World Geodetic System is typically what you would use. So yeah, a GCS just defines where the data is located on the Earth's surface. We then have the PCS, or the projected coordinate system, and this uses two coordinates. A PCS is flat. It contains a geographic coordinate system, but it converts this onto a flat surface by using the projection algorithms.
In this case, the units are linear, most commonly in meters, as you can see. So this is the "how": a PCS tells the data how to draw onto a flat surface, whether it be a map or a computer screen.
So let's take a quick example of a coordinate reference system that you might see. The GCS, the geographic coordinate system, is the bulk of the code that we see at the bottom part of this table, and this tells me where on Earth the data should be drawn. It provides a way to describe a location on the Earth's surface using a three dimensional spherical surface. Locations are defined by latitude, which as I said is the angular distance north or south of the equator, and longitude, which is the angular distance east or west. A geographic coordinate system is based on an ellipsoid model of the Earth, which accounts for the Earth's shape and size. In this instance, the geographic coordinate system referenced here is the World Geodetic System 1984, which is the WGS. And when combining the WGS and the Fuller system, which is the projected coordinate system, you're essentially taking point locations defined in a three dimensional space and transforming them into a two dimensional map that attempts to preserve certain properties. In this case, because we're using the Fuller system, it would attempt to preserve area and shape. The Fuller projection is a polyhedral projection, which projects the Earth onto an unfolded surface, attempting to minimize distortion in areas and shapes. So using Fuller as a projected coordinate system would mean that spatial data initially referenced in the WGS would be projected onto this framework for your analysis purposes. So they work hand in hand in order to move from the 3D to the 2D.
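As a simpler stand-in for the Fuller example on the slide, the move from a geographic coordinate system (degrees) to a projected one (metres) can be sketched with the spherical Web Mercator formulas discussed earlier. This is a minimal Python sketch, not the workshop's R code. The function names are hypothetical, and the radius is the WGS 84 semi-major axis, which Web Mercator treats as the radius of a sphere. The second function gives the factor by which Mercator inflates areas at a given latitude, which is the source of the size distortion towards the poles.

```python
import math

# WGS 84 semi-major axis in metres; Web Mercator uses it as a sphere radius.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_area_inflation(lat_deg):
    """Area exaggeration of the Mercator projection at a latitude: 1 / cos^2(lat)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Approximate coordinates for Manchester, used only as an illustration.
print(lonlat_to_web_mercator(-2.24, 53.48))

for latitude in (0, 30, 60, 75):
    print(latitude, round(mercator_area_inflation(latitude), 2))
```

The inflation factor is 1 at the equator and about 4 at 60 degrees latitude, so a region at that latitude is drawn roughly four times larger in area than an equal-sized region on the equator. In practice a GIS library would handle the transformation for you; the formulas are written out here only to show what such a library is doing.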
Yeah, this choice of CRS and projection depends on the specific goals of your project: whether you're aiming for precise navigation or location accuracy, which is where the World Geodetic System would shine, or whether this is for conceptual, educational visualizations that challenge the traditional perspectives, like the Fuller projection I showed earlier.
So we're now going to move on to looking at spatial relations and spatial statistics in GIS. This moves a bit more towards the advanced topics. We will be looking at one type of spatial analysis in the code demonstration on Friday. But just to clarify the terms again, spatial relations refers to the way in which different locations, areas or objects are situated in relation to each other on the Earth's surface. Spatial analysis then refers to studying these entities or these relations by examining, assessing, modeling and evaluating the spatial data features. So spatial analysis is kind of seen as the method section of a paper, for example. It uses a variety of computational models, analytical techniques and algorithmic approaches to assimilate geographic information and define its suitability for the target system.
So let's take a brief look at the most common analysis methods you might see, especially when working with crime data. The first is point pattern analysis. A point pattern analysis is primarily concerned with identifying the spatial arrangement of individual points within a given area and determining whether the observed pattern is clustered, uniform or random. It focuses specifically on the spatial distribution of events or features without necessarily assessing the intensity or the concentration of these events. It's just showing a distribution. It's commonly used in crime analysis to understand the distribution of crime across different areas.
And this analysis helps in understanding the underlying spatial process that may lead to certain patterns. It's quite a descriptive analysis, I guess, but it proves important in crime for understanding that overall distribution of crime.
On the other hand, we have something called hotspot analysis, which is specifically designed to identify areas of high intensity or concentration of events. So it evaluates not just the location of the individual points, as the point pattern analysis did, but it focuses on the clustering of points with high values. So this is how related each of these different crimes are to each other. Typically, you'll have a map that has one colour that varies in intensity, and this is how hotspots are shown. The most common technique used in hotspot analysis is kernel density estimation. This works by creating a smooth surface of crime intensity over a map, which can then highlight areas of concern that aren't apparent from the raw data alone. And this can be helpful in strategic planning, things like placing more street lights in areas with high nighttime crime rates or placing more CCTV in areas that tend to be prone to theft. This title should say hotspot analysis, so please just ignore that title for now.
But we'll move on to looking at C, which, oh, this should say C. Ignore the numbers for now. Spatial autocorrelation is the third analysis I'd like to briefly talk about. Spatial autocorrelation is a technique that measures the degree to which crimes occur near each other in space. It is a measure of the degree to which one object is similar to other objects in its surrounding area. For instance, if one neighbourhood has a high rate of burglary, then spatial autocorrelation would suggest that neighbouring areas might also show high burglary rates.
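One common way to put a number on spatial autocorrelation is the global Moran's I statistic. The Python sketch below is a minimal illustration with made-up burglary rates for six neighbourhoods arranged in a line, where each neighbourhood's neighbours are simply the areas next to it. The function name `morans_i`, the `chain` neighbour structure and the data are all hypothetical, and the binary neighbour weights are an assumption made for simplicity.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    `neighbours` maps each area's index to the indices of its neighbours,
    so the weight w_ij is 1 for listed neighbours and 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [value - mean for value in values]
    # Sum of w_ij * (x_i - mean) * (x_j - mean) over all neighbouring pairs.
    cross_products = sum(
        deviations[i] * deviations[j]
        for i, linked in neighbours.items()
        for j in linked
    )
    total_weight = sum(len(linked) for linked in neighbours.values())
    sum_of_squares = sum(d * d for d in deviations)
    return (n / total_weight) * cross_products / sum_of_squares


# Six areas in a row: each area neighbours the areas directly beside it.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

clustered = [8, 8, 8, 2, 2, 2]    # high rates sit next to high rates
alternating = [8, 2, 8, 2, 8, 2]  # high rates sit next to low rates

print(morans_i(clustered, chain))
print(morans_i(alternating, chain))
```

For the clustered pattern the statistic comes out at about 0.6 (positive autocorrelation), and for the alternating pattern at about -1 (negative autocorrelation). Values near zero would suggest no spatial pattern. A real analysis would also test whether the value is statistically significant, which this sketch leaves out.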
You use something called Moran's I, and the resulting indices can be plotted to give a visual representation that identifies clusters of high crime rates. Again, there are very practical implications for this, especially for police departments, because they can use these metrics to predict and concentrate on areas that might require more attention or resources. Spatial autocorrelation can also be seen as a tool for prediction because it allows us to predict what crime levels in neighbouring areas might be like. That kind of sums up spatial autocorrelation.
The last technique is spatial interpolation. This map here uses a gradient of colours to interpolate crime rates between known data points, creating a map that basically estimates crime rates in areas with no direct data. You might have heard of interpolation in the general data world. This is just a way of estimating unknown values of certain variables. Spatial interpolation is the process of estimating unknown values at certain locations based on known values from the surrounding areas. There are various methods that you could use for this, such as inverse distance weighting, kriging and splines. These can all be used to create crime maps that predict crime in unsampled areas. Spatial interpolation is significantly important because it allows us to predict unknown values for any geographic point based on the values surrounding that area. In the crime world, it can be used to estimate crime rates in new urban developments, for example, based on surrounding neighbourhoods. Obviously there are a lot of limitations of interpolation, especially in the spatial world. It assumes spatial homogeneity and may not account for unique local factors. Spatial homogeneity basically just refers to the uniformity or consistency of patterns across a defined spatial area.
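Going back to inverse distance weighting, mentioned above as one interpolation method, the idea can be sketched in a few lines of Python. The estimate at an unsampled location is a weighted average of the known values, with weights that shrink as distance grows. The function name `idw_estimate`, the `power` parameter default of 2 and the two sample points are assumptions for illustration only, and coordinates are treated as plain planar x/y values.

```python
import math


def idw_estimate(known_points, x, y, power=2):
    """Inverse distance weighted estimate at (x, y).

    `known_points` is a list of (x, y, value) tuples; nearer points get
    larger weights of 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for known_x, known_y, value in known_points:
        distance = math.hypot(x - known_x, y - known_y)
        if distance == 0:
            # The location coincides with a known point, so use its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up areas with known crime rates per 1,000 residents.
known_rates = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]

print(idw_estimate(known_rates, 1.0, 0.0))  # halfway between the two points
print(idw_estimate(known_rates, 0.5, 0.0))  # closer to the low-rate point
```

Halfway between the two points the estimate is the plain average, 20, and at x = 0.5 it is pulled towards the nearer value and comes out at about 12. The method only knows about distance, so it cannot pick up local factors that make an area differ from its surroundings, which is the limitation described above.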
In the context of spatial analysis, it implies that the measured variables exhibit little to no variation across different locations within the study area. This concept is crucial for understanding spatial data. It impacts the assumptions and methodologies used in various ways. Those are the four main spatial analysis and spatial statistics methods that are prominent in exploring crime and crime trends.
I've also just put together this little table of data sources for finding crime data. I've gone for four main categories, which are UK data, non-UK data, specific R packages that allow you to directly download datasets into R, and various other resources. I'm not going to talk through every dataset here, but I do know it is quite difficult to pinpoint specific crime data because so much data exists out there, and so many datasets have different variables. Some don't include spatial variables. Some don't include date, some don't include time, some are very restrictive. But yeah, this was just for your information.
So this brings us onto the question then, what are the main challenges of mapping crime data? Well, open police recorded statistics provide point information through the use of GIS. But the accuracy of the spatial data is typically obscured by geomasking techniques that serve to protect the location privacy of victims. They never provide the exact location of where a crime was reported. And a technique called jittering is used to do this. It almost shakes the coordinates of a location so as to protect the location privacy of that victim. So what we think might have happened outside a school might have actually happened around the corner or down the road.
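The general idea of jittering can be sketched as moving each point by a random distance in a random direction. The Python sketch below is only an illustration of that idea and not a description of how any police force actually anonymises its published data. The function name `jitter_point`, the 50 metre limit and the example coordinates are hypothetical, and coordinates are assumed to be in metres in a projected coordinate system.

```python
import math
import random


def jitter_point(x, y, max_distance, rng):
    """Move a point by a random offset of at most `max_distance`.

    The square root makes displaced points spread evenly over the disc
    rather than bunching near the original location.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = max_distance * math.sqrt(rng.random())
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


rng = random.Random(2024)        # fixed seed so the sketch is repeatable
true_location = (1000.0, 2000.0)  # made-up projected coordinates in metres

masked_location = jitter_point(true_location[0], true_location[1], 50.0, rng)
offset = math.hypot(
    masked_location[0] - true_location[0],
    masked_location[1] - true_location[1],
)
print(masked_location, round(offset, 1))
```

The published point ends up somewhere within 50 metres of the true one, which is why a crime mapped outside a school may really have happened around the corner. Any analysis at a scale finer than the masking distance is therefore unreliable.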
Secondly, police recorded crimes are a known contribution to the grey figure of crime, in that they underestimate the actual number of crimes because they count crimes that were recorded and not just reported, which reduces the accuracy of statistical models due to missing data. And this isn't something that can be overcome with mapping techniques, but it is something that definitely has to be considered. So again, these are issues with the collection of data. There are some conceptual issues surrounding the definitions of chosen crime types, specifically violent offences and sexual offences. They are recorded as one crime type, which should be viewed with caution because it does tend to apply quite a holistic definition by conceptualising these very different crimes into one category. And then lastly, the impact of seasonality. How has COVID-19 affected police recorded crime statistics? If we think about comparing the data collected in 2019 with the data collected in 2020, which was peak COVID, there are going to be significant differences in the results. This is because of the introduction of increased government restrictions and lockdown rules. In fact, there was actually a reduction in burglary as more people were forced to work from home. So there was a reduction in the opportunity to commit these crimes. So it's not entirely accurate to make year to year comparisons, specifically over the pandemic.
Just a couple of minutes for you to type and think about what other main challenges there are in mapping crime data. What are challenges that you have come across? What are challenges that you're worried that you might come across? Feel free to put in a couple of words or sentences and we will slowly move on to the Q&A. Location of crime versus address of offender or victim. Very good point. Quality. Consistency of data. Depth. Consistency of data between different forces.
Yeah, that's a really good point actually, because different police forces across the UK do have different processes for not only recording crime but also reporting crime. So that makes us question the reliability of comparing, I don't know, Cheshire Police with the London police. It makes us question the underlying recording processes. Missing data, yeah, absolutely. Missing data is huge in crime, isn't it? And this is why spatial interpolation is so important, because in order to have an accurate representation of the distribution of crime in an area, we would need data in those areas, and interpolation allows us to estimate those unknown values. But these estimations aren't entirely accurate, are they? Reliability: crimes are placed at intersections instead of actual places. Absolutely, that's a really good point as well. We never truly have the actual place where a crime was committed, right? Numbers will be higher in high population areas, may need to look at rates. This is a fantastic point and you've jumped the gun, because this is what we'll be addressing in the live code demonstration. We'll be looking at the differences between mapping crime rates and mapping crime numbers, and how different these interpretations can be. Accessibility. Yes, if you mean accessibility of data, I would say that's probably a key factor. Not all crime data is available to the public, only police recorded crime, which in theory limits what we can do with it. If you meant accessibility in terms of software and working with specific code, then there are of course accessibility issues there as well. Thank you all for inputting those answers.
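The point about counts versus rates can be shown with a tiny Python sketch. The two areas, their crime counts and their populations are made up for illustration, and the names `areas` and `rate_per_1000` are hypothetical. This is not the R code from the live demonstration.

```python
# Made-up figures: (recorded crimes, resident population) for two areas.
areas = {
    "Area A": (900, 60000),
    "Area B": (150, 5000),
}


def rate_per_1000(crimes, population):
    """Crimes per 1,000 residents."""
    return crimes / population * 1000


rates = {name: rate_per_1000(crimes, pop) for name, (crimes, pop) in areas.items()}

highest_by_count = max(areas, key=lambda name: areas[name][0])
highest_by_rate = max(rates, key=rates.get)

print("rates per 1,000 residents:", rates)
print("highest count:", highest_by_count)
print("highest rate:", highest_by_rate)
```

Area A has six times as many recorded crimes, but once population is taken into account Area B has twice the rate (30 against 15 per 1,000 residents). A map shaded by counts and a map shaded by rates would therefore point to different areas.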
There are way more challenges of mapping crime data that we could talk about. For example, it does encode some sort of systemic discrimination, and this links back to that grey figure of crime, where there is such a huge gap between the reported and recorded statistics. And crime rate data are still affected by certain demographics of the population. In fact, there was a study by Graber and Stern in 2018 that highlighted that to call the police is a privilege, and additionally police legitimacy can affect the willingness to call the police. So yeah, loads of variables that we could talk about. But thanks all for taking part in that. I believe that is the end of our talk.
I'll quickly just talk about the live code demonstration for tomorrow, sorry, not tomorrow, for Friday, and then I'm going to take any questions I can see in the Q&A. So Emma, if you could share the GitHub link for us in the chat, that would be fantastic, and I'll quickly just step out of here and show you the repository itself. So this is our GitHub repository for all our crime workshops. There are currently five that have been produced. We will need the data from the Feb_2024 folder. Once you have this link you should be able to clone this repository into your RStudio. This involves copying this HTTPS link here that you can see. If you copy that and you head over to your RStudio, you click File, you click New Project, just give it a couple of seconds, and once you click New Project you then click Version Control, because GitHub is the version control system that we're using. We then click Git and you simply paste that URL into this box here, give it a name, save it where you want to save it, typically open this in a new session just to avoid confusion, and create your project. This will allow you to work alongside me during the live code demonstration, which would look something like this. We'll be working through four main sections, sections one, two, three and four.
For those who are not completely familiar with R, there is a prerequisite document here which tells you exactly what I just told you, how to clone this repository into R, so all the steps that are needed are there for you. If you open up this folder, so Feb 2024, and go to code, all those sections are also available there. So if you click your prerequisites.rmd file, this will tell you everything about setting up a working directory and installing some of the basic packages needed. You can also open this as an HTML link, I believe, or maybe not, sorry. Yes, that is just a quick rundown of how to get the data for the code demonstration.
And yeah, thank you all. References can be found here. Please feel free to complete the survey as you leave the Zoom, and if you want, my contact details are here on the screen. We're going to stop the recording so that we can have a quick Q&A session. I'm just going to quickly check the chat. The recording will be stopped now. Thank you all for listening.