Welcome to this UK Data Service workshop, Mapping Crime Data in R. My name is Nadia Kennell. I am a research associate with the UK Data Service, based in the Cathie Marsh Institute at the University of Manchester. Welcome to this presentation, an introduction to GIS and spatial data. It is meant as an introduction to the main topics and issues involved with GIS and mapping data. It isn't going to provide a huge array of information, but it will cover what is necessary to understand the code demonstration being held tomorrow.
This workshop is split over two days, today and tomorrow. Today we're going to look at exactly what GIS is and at the differences between spatial and non-spatial data. We'll then talk about different types of maps, the main two being reference and thematic maps. We'll move on to discuss what projection methods are and how we use coordinate reference systems to make these projections. And then we'll start to look at the challenges of mapping crime data more specifically.
All of this content will help you understand what we'll be doing tomorrow, which is the live code demonstration in RStudio. There are three main topics we'll be covering. We'll explore some of our crime data using some really basic packages and functions. We'll then move on to understanding shapefiles and how we can use these with our crime data. And we'll look in more detail at how we can join census data to our crime data to get more context from it. Given the time, we'll also look at some extra topics, such as interactive maps using the leaflet package, which is really fun, and the Google API function. That is the content for these next two days. I'll talk a little bit more about the code and how to get it set up toward the end of this workshop. But yeah, let's begin with the presentation. So what exactly is GIS?
GIS stands for geographic information systems, and they can be defined as computer systems for capturing, storing, checking and displaying data related to positions on the Earth's surface. They can be viewed as a theoretical framework that allows for the creation and analysis of spatial and geographical data. Arguably it is quite an abstract platform, and it aims to integrate data onto a map using various methods. GIS is present in virtually every field and every organization, as it's a way to share information and to solve complex problems around the world. The biggest benefit is that it allows trends and patterns to be studied visually, which provides a new form of analysis.
Now there's quite a rich history to GIS. In the 1960s there was a pioneer named Roger Tomlinson, who was commissioned by the Canadian government to create a usable and efficient inventory of its natural resources. He tried various manual methods for overlaying environmental, cultural and economic variables, but all were too costly. He couldn't find an effective solution, so what he did was create the first automated computing system. This is why he is known as the father of GIS, and the map that you see here is one that he created. From there we have researchers such as Jack and Laura Dangermond, who developed Esri, which stands for the Environmental Systems Research Institute. It produces software for mapping and spatial analysis that is used quite a lot in today's work.
There is various software available for GIS and spatial analysis. We have GeoDa, ArcGIS, FME, QGIS and R. Spatial data analysis, geo-visualization, spatial autocorrelation and spatial modelling can all be done within this software. But obviously for this workshop we'll be using R and its IDE, RStudio.
The reason I've chosen to do this in R is that an increasing number of packages have become available for spatial analysis, mapping and modelling. And I feel like R has been an increasingly popular tool over the last decade, so I support its promotion and its growth.
GIS are known to produce two broad types of maps: reference maps and thematic maps. Reference maps are used to communicate location on more static data points, that is, to pinpoint data on the map, and we can think of these as more descriptive. They tend to highlight some sort of natural or synthetic feature, such as the positions and heights of mountains or the layouts of bus routes. This type of map is simply referencing, hence the name, what exists in our physical environment.
On the other hand we have what are known as thematic maps. These are used to highlight some sort of spatial relationship, as in to study a theme within a map, and they can be viewed as more explanatory. A thematic map is how we would map a particular theme to a geographic area. It tells us more of a story, tells us more about a place, and it is commonly used to map issues such as climate, population density or health. We'll be looking at how to do this tomorrow in the code demonstration through our use of the census data.
Now I understand that the differences between reference and thematic maps can sometimes become confusing. So I've created a few example scenarios where I want you to guess whether each one falls within the definition of a reference or a thematic map. If you would like to head back over to Mentimeter, you can get involved in these next few questions. I believe there are only three scenarios. Don't feel worried about getting the answer wrong. We're here just for a discussion.
So your first scenario is: the visualization of road networks to improve road safety measures is a type of reference map, thematic map, or not too sure. We're quite evenly split at the moment. I'll give it a few more seconds, but the majority are sitting within reference maps, which is quite interesting. And that's right. I would say that at first glance this might be said to be a type of reference map, because we are simply pinpointing existing road networks onto a map, right? However, for those 34% that said thematic, this might also be a thematic map, as we are studying those existing road networks to then improve road safety measures, which is an example of a type of accident analysis. This might be, for example, the introduction of speed signs or zebra crossings outside schools or residential areas due to reports of accidents and dangerous driving. So what we're doing here is studying spatially the relationship between road networks and accidents. We are mapping a particular theme to a geographic area. But I would like to reiterate that there isn't necessarily a right or wrong answer here, because in this scenario we are definitely covering the definitions of both, right? The importance comes down to your research questions, your research aims, your research purposes, your conceptualizations.
We'll head over to scenario two, which states: the visualization of the Earth's surface showing its elevation is a type of reference, thematic, not sure. Answers are coming in straight away. We've got a majority sitting at reference: 80% reference, 20% thematic. That's really interesting. Just for some clarification, this type of map can be described as a topographic map, which tends to refer to a graphical representation of the three-dimensional configuration of the surface of the Earth.
So in short, it's simply describing where the Earth's surface is elevated. These maps are normally drawn with contour lines, little wiggly lines. So arguably, yes, you might see this type of map as a reference map. But researchers have suggested that studying a topographic map is a great way to learn how to match terrain features with the contour lines on the map, right? Such as the steepness of the terrain, the shape of the terrain, or whether it sits above or below sea level. So in this instance, you might interpret that we're studying spatially the relationship between contour lines and different features of the Earth. So I would say that again, if you answered any of these options, you would be considered to be right, because we have this overlapping definition in this scenario. Just checking that there are no questions so far. Everyone seems to understand that there's this real ambiguity between the two.
We'll move on to our last scenario. Navigation tools such as Google Maps or Citymapper can be classed as reference, thematic, or you just don't know anymore. Maybe I've just confused you too much. In this type of question, I think you might immediately assume that this is an obvious type of reference map, as it highlights important physical features needed for travel such as bus routes, walking routes, cycle lanes, etc. We're just pinpointing, we're just describing the layout of different travel routes, which would be a type of reference map. However, yes, reference maps portray a basic set of features such as coastlines, terrain and transport routes. But can we say that an app that plans your travel is a type of reference map? Travel apps and websites use an AI algorithm to get you from one place to another, right? Normally choosing either the fastest or the cheapest route.
So can we call this a thematic map instead, since it's overlaying information on a base map? This brings the argument back to Roger Tomlinson, the father of GIS, and his automated computing system. It draws attention to the differences between analog and digital maps and their place in GIS. It's really interesting that researchers have recently considered navigation tools such as Google Maps or Citymapper to be fundamentally different from both reference and thematic maps, which opens debate for a third category, although it could be argued that all maps are navigational depending on how we use them. The difference is that a digital map is, let's say, interactive, whereas an analog one is not. This links us back to Roger Tomlinson: he couldn't overlay his data on an analog map, so he moved toward computation. And one main difference between reference and thematic maps is that thematic maps are more interactive, which is why navigation tools might be considered a type of thematic map. Either way, this answer, as I said, is open to debate. The difference is between mapping places and mapping data.
Thank you for participating in those scenarios. I hope they were useful in distinguishing the differences between these types of maps. And now I'd like to draw on a real-life example, using the Tube map. There was a study that looked at the Tube map in London as a tool to predict life expectancy, and it was named Lives on the Line. Most people, again, might say that this is a reference map because it shows the location of the different Tube stations and each of the Tube lines. But it turns into a thematic map because we're using these to predict life expectancy, poverty and median house prices. I think this is a really good example of how real-life cases come into play.
And just a bit more context about this study: typically most government statistics use geographical units such as wards or lower layer super output areas, which we will address tomorrow. In this study they used Tube lines, which are not a standard unit of measure. But it brought a fresh light to life expectancy and showed that there are different ways to map data.
So, what can we sum up? Maps fall broadly into two categories, but there are ways in which these types of maps overlap or share similarities. It could be argued that almost every thematic map is a reference map, but not every reference map is a thematic map. The decision is up to you. It is not entirely necessary to define these in your work, but it's important to know what type of map you want to make, as this can be affected by the data you have. This draws back to the whole idea of mapping places versus mapping data.
So can anyone here give any examples of types of maps that might share qualities of both reference and thematic maps? I'll give you a few seconds to put in some answers, as well as giving myself a little break. You're probably tired of hearing me speaking. Feel free to type in some examples of any maps that you think might share qualities of both reference and thematic maps. Equality conditions across different boroughs in London. Yeah, that's definitely a great one because, again, placing those boroughs would be a type of reference map, but because we're using them to examine equality conditions it then becomes a type of thematic map, because we're understanding something spatially. Great example. Crime maps, maps of housing prices. Yeah, crime maps, great one. Obviously a great example, since that's what we're going to be doing tomorrow.
You could use this to analyze where crimes are most prevalent in order to reduce crime or amend policy. That's a great example of a thematic map. We have deprivation across local authority areas, great. I'm hoping to do something similar to this tomorrow with the census data. Maps of soil ecosystem services, burglary rates. We've got loads, I'm scrolling through. Disease mapping examples in general. A map of all train lines in a given city, definitely. Again, how would you use that? It might just be read as a reference map, but if you're using it to study something spatially then it becomes a type of thematic map. Traffic accidents on a road map, definitely, that comes back to one of the scenarios we looked at. A map of the UK showing unemployment rates, great example again.
Fear of crime at lower super output areas. That's a really interesting one. Because again, you could simply map these LSOAs, but the fact that you're using another data set, some contextual data set, to study them spatially makes it a kind of thematic map. And then it becomes something more interactive, I suppose. Heat maps in football analysis, great one. Heat maps are a great type of thematic map. There are some other specific examples of thematic maps: as well as heat maps, we have choropleth maps, we have cartograms, we have dasymetric maps. There are loads that fall into this category. That's why they can be said to be more interactive.
We have a response in the Q&A, which is forested areas across the UK. Great one. That'd be really interesting. That's, again, examining the visual landscape. Thank you for getting involved with that. I hope these differences have been made clear. But we will move on.
So we're going to move on to looking at spatial data: what is it and why is it important? I found this quote, which I think is quite useful in explaining it. It states that spatial data, or geospatial data, is a data frame that contains information about a specific location, which can be analyzed to better understand that location. I think this is a really clear way to understand what it is. Some of those types of thematic maps that we talked about use spatial data. The heat maps used to understand certain trends, or deprivation statistics, would definitely be considered some type of spatial data.
So what exactly is spatial data? We can break this down a little bit more. In short, spatial data is a representation of the real world. It attempts to represent physical features in an accurate way. As I said, it's a data frame that contains information about a specific location, which can then be analyzed to better understand that location. And GIS, those geographic information systems, enable the spatial data to be processed and analyzed.
Now there are typically two types of spatial data. We have vector data, which contains points, lines and polygons, and we have something called raster data, but we're not going to be covering that in this workshop. Vector data is the most common form, and it's made up of these three constructs. Points represent a pair of coordinates. This could be, for example, the location of a robbery call, or simply the location where a robbery happened. We then have lines, which extend these points and include at least two points. This could be, for example, the street that the robbery call was received on. And then we have polygons, which extend the lines and include three or more points. This could be the area, the city, the ward or the borough that the robbery or that street belongs in.
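The three vector constructs described above can be sketched in a few lines. This is only an illustration in plain Python (the workshop itself uses R), and the class names, the `contains` helper and all coordinates are made up for the example rather than taken from any real data set.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # (longitude, latitude) in decimal degrees


@dataclass
class Point:
    """A single pair of coordinates, e.g. where a robbery was reported."""
    coord: Coord


@dataclass
class Line:
    """An ordered run of at least two coordinates, e.g. a street."""
    coords: List[Coord]

    def __post_init__(self):
        if len(self.coords) < 2:
            raise ValueError("a line needs at least two points")


@dataclass
class Polygon:
    """A closed ring of at least three coordinates, e.g. a ward boundary."""
    ring: List[Coord]

    def __post_init__(self):
        if len(self.ring) < 3:
            raise ValueError("a polygon needs at least three points")

    def contains(self, point: Point) -> bool:
        """Ray-casting test: count how many edges a horizontal ray crosses."""
        x, y = point.coord
        inside = False
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


# Hypothetical coordinates, loosely placed around central Manchester.
robbery = Point((-2.2426, 53.4808))
street = Line([(-2.2450, 53.4800), (-2.2400, 53.4815)])
ward = Polygon([(-2.26, 53.47), (-2.22, 53.47), (-2.22, 53.49), (-2.26, 53.49)])

print(ward.contains(robbery))
```

The `contains` check is the kind of question a polygon lets you ask that a point alone cannot: which ward does this robbery fall in?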
So they're really interrelated, if you want to put it that way. Now that we have a basic understanding of spatial data, this is where we ask: how do we actually pinpoint a location on a map? What we have is projection methods, and this is how we move from 3D to 2D. Map projections try to portray the surface of the Earth, or a portion of it, on a flat piece of paper or a computer screen. So we're trying to move from a spherical 3D shape to a planar 2D shape.
I'll draw on a slightly quirky example that might help you understand this. Imagine you have a football and you begin by cutting it up. It wouldn't fit together perfectly if you tried to lay it out as a rectangle or square, right? You have these really empty spaces and mismatched shapes, and the empty spaces can't be filled perfectly. What projection methods do is form a simple equation that tells the mapping system how to populate these new areas or shapes. So if we wanted to create a rectangle, we would end up with something where all the area is populated. That's all these projection methods do: they take a shape and populate it into a full, flat area.
Now, with these projection methods it's important to note that the data can become distorted. The area, the shape, the distance and the direction of points can all be affected. Although there are algorithms in place to control for this, all four features are rarely preserved together. It all depends on which attributes you're willing to compromise. Some projections try to maintain the correct distance, others the correct shape, and others the correct area. In the football example, we're trying to maintain the correct area by distorting the shapes. So it's all about finding the sweet spot, some sort of balance between all these factors.
And this is why you see so many projections that are made for individual countries, regions, districts or streets. There are three main projection families: cylindrical, conical and planar. Within each of these families there are literally hundreds to thousands of different projection methods, which can become very confusing, but it's not completely necessary to know them all.
I would like to draw on an example of how maps can be distorted by different projection methods, and draw attention to the Web Mercator versus the Gall-Peters projection. These are two projection methods, and both are from the cylindrical family. The map on the left is typically what we know and what we have seen in school and education, on TV and in every part of life. The Gall-Peters projection is a much newer one. It is unique among world maps because the area ratios of all the continents are the same as they are in reality. That is, in this example, Greenland doesn't seem larger than Africa, whereas in the Mercator projection that is much different. What the Mercator projection does is grossly distort the size of the continents, causing this "Greenland is larger than Africa" effect, which is really interesting. But it stays true to their shapes. So, geographically speaking, the shapes are more important. It is far easier to change the scale of a map for different areas of the world than to adjust the length-to-width ratio, as one needs to do with the Peters.
So this is a really interesting example of how map projections can cause distortion and differences in map shapes, which is why it's really important to know what type of map projection you will be using. The Web Mercator projection is built on WGS 84.
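To put rough numbers on that distortion, here is a minimal sketch in plain Python, assuming a spherical Earth. It uses the standard scale factors for the spherical Mercator projection and for the cylindrical equal-area projection with standard parallels at 45 degrees (the Gall-Peters case). The function names and the choice of 72°N as "roughly central Greenland" are assumptions made for illustration.

```python
import math


def mercator_area_scale(lat_deg: float) -> float:
    """Area exaggeration of spherical Mercator at a given latitude.

    Mercator stretches both directions by sec(lat), so shapes are kept
    locally but areas are inflated by sec(lat) squared.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


def gall_peters_scales(lat_deg: float, standard_parallel_deg: float = 45.0):
    """Return (east-west scale, north-south scale) for cylindrical equal-area.

    The two factors are reciprocals, so their product (the area scale) is
    always 1, while their ratio shows how much shapes are stretched.
    """
    lat = math.radians(lat_deg)
    lat0 = math.radians(standard_parallel_deg)
    k = math.cos(lat0) / math.cos(lat)   # stretch along a parallel
    h = math.cos(lat) / math.cos(lat0)   # stretch along a meridian
    return k, h


for lat in (0, 45, 72):  # 72 degrees north is an assumed stand-in for Greenland
    k, h = gall_peters_scales(lat)
    print(
        f"lat {lat:>2}: Mercator area x{mercator_area_scale(lat):.2f}, "
        f"Gall-Peters area x{k * h:.2f}, shape stretch {k / h:.2f}"
    )
```

At high latitudes the Mercator area factor grows roughly tenfold while the equal-area factor stays at 1, and the equal-area projection pays for that with a large shape stretch. That is the trade-off between preserving shape and preserving area described above.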
And if you're familiar with spatial data and GIS, you'll know that WGS 84 is very widely used, and it is the one we'll be using tomorrow. Now that we've had this overview and example of different projections, we can move on to look at coordinate reference systems. How do we actually move from 3D to 2D? This is where our CRS, or coordinate reference systems, come in handy. Every place on the Earth is specified by three main numbers, our coordinates, known as latitude, longitude and altitude. Each number indicates the distance between some point and a fixed reference, also known as the origin. And these CRS are basically how we move from 3D to 2D.
There are two main types of CRS. We have the geographic coordinate reference system, which uses three coordinates. This is a reference framework that defines the locations of features on a model of the Earth. It's shaped like a globe, as seen in the image on the left. The most commonly used one is the World Geodetic System, WGS 84, which is the one I briefly mentioned and the one we'll be looking at tomorrow. So if your data contains longitude and latitude, then this is the CRS you'd most likely use.
On the other hand, we have projected coordinate reference systems, which typically use two coordinates. A PCS is flat, as you can see in the image on the right. What is really interesting about a projected coordinate system is that it contains a geographic coordinate system, and converts it onto a flat surface using projection algorithms. These are the two main systems in use, and the type of data you have will affect which system needs to be put in place. Just a little note that a geographic coordinate system does fail to measure distance.
Therefore, these projected systems become meaningful when the information is needed on a flat screen, which can be done via various projection methods. So when working with more than one form of spatial data, it's important to ensure that the data are stored in the same coordinate reference system, or they will fail to line up in the GIS. The decision of which map projection and CRS to use depends on the regional context of the area you want to work in, on the analysis you want to do, or, most importantly, on the availability of your data. So that is the complicated stuff out of the way, and I hope I've been able to explain it quite clearly.
I've just seen a question in the Q&A where someone has asked: I have some data with eastings and northings rather than longitude and latitude. Is there a way to translate one to the other? Yes, there is. I'm not sure what software you're using, but if I'm not mistaken this can be done in RStudio. I'm just trying to remember the name of the package. I think it's called the rgdal package, and it uses a function that simply translates these across. There are also coordinate converters online that can do this for you, if you're just looking to do it online. But yes, it's definitely possible to translate northings and eastings to longitude and latitude.
All right, we'll carry on with the last part of the presentation. As I said, we're running quite quickly, but that's no problem because it gives us time to explain the prerequisites for tomorrow. So far we've talked about the main components of GIS and we've looked at spatial data. We've looked at projection methods. We've looked at coordinate reference systems. We've looked at the differences between reference and thematic maps.
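As a rough illustration of the two points above (translating between metres and longitude/latitude, and why degrees are awkward for distance), here is a sketch in plain Python. It is an assumption that the eastings and northings in the question were British National Grid; that conversion involves a datum shift and is best left to a library such as `sf::st_transform` in R or `pyproj` in Python. The sketch below instead uses the simpler spherical Web Mercator formulas, which have a closed form, and the function names and sample coordinates are made up for the example.

```python
import math

R_WEB_MERCATOR = 6378137.0  # sphere radius used by Web Mercator, in metres
R_MEAN_EARTH = 6371008.8    # mean Earth radius, in metres


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = R_WEB_MERCATOR * math.radians(lon_deg)
    y = R_WEB_MERCATOR * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x: float, y: float):
    """Invert the projection: Web Mercator metres back to degrees."""
    lon = math.degrees(x / R_WEB_MERCATOR)
    lat = math.degrees(2 * math.atan(math.exp(y / R_WEB_MERCATOR)) - math.pi / 2)
    return lon, lat


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R_MEAN_EARTH * math.asin(math.sqrt(a))


# Hypothetical point in central Manchester: round trip through the projection.
x, y = lonlat_to_web_mercator(-2.2426, 53.4808)
print(web_mercator_to_lonlat(x, y))

# One degree of longitude is not a fixed distance on the ground.
print(haversine_m(0.0, 0.0, 1.0, 0.0))        # at the equator
print(haversine_m(0.0, 53.48, 1.0, 53.48))    # at Manchester's latitude
```

The last two lines show why a geographic CRS is a poor ruler: the same one-degree step covers far fewer metres at 53°N than at the equator.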
But now I'm going to draw attention to the use of crime data, as that's what we'll be using tomorrow. I think it's best to start by looking at the challenges. What are the main challenges of mapping crime data?
Firstly, using open police data can be criticized. Police recorded crime provides point information through the use of GIS. However, the accuracy of these spatial data is obscured by geomasking techniques and geoprivacy measures that serve to protect the location and privacy of victims, in that they never provide the exact location where a crime was reported. I think the method for this is known as jittering. Let's say that we had a record of where a burglary happened in Manchester. Techniques such as jittering are applied, which shake the coordinates and move the point away from the actual location. So what we think might have happened outside, let's say, a school might have actually happened elsewhere, down the road outside Sainsbury's. This is why you need to think about how generalizable these results are if you were to conduct some sort of research.
Secondly, police recorded crimes are known to contribute to what is called the grey figure of crime, in that they underestimate the actual number of crimes: not everything that is reported is recorded, which reduces the accuracy of statistical models due to missing data. This isn't something that can be overcome with mapping techniques, but it definitely has to be considered and will need to be discussed in your limitations.
Thirdly, there are some conceptual issues surrounding the definitions of crime types, especially in police recorded crime statistics.
So police recorded crime combines violent offenses with sexual offenses into one category, and this should be viewed with caution, especially in analysis, as it applies an overly holistic definition by conceptualizing these different crimes as one category. Violent crimes and sexual offenses are very different crimes, but they're always grouped as one in police recorded crime statistics. Again, something to consider in your limitations.
Another important factor would be the impact of seasonality, trend and noise. How has COVID-19 affected police recorded crime statistics in the last two years? You would suspect that there's been a large decrease in the amount of crime reported due to changes in people's routine activities. More people are working from home, which might reduce the likelihood of burglaries happening, for example. There has been key research over the pandemic which has identified a reduction in specific criminal activities as a result of the increased government restrictions, lockdowns and rules. So it's not entirely accurate to make year-to-year comparisons over the pandemic.
Can you think of any more challenges of mapping crime data? I'll give you a few minutes to put your thoughts into the Mentimeter. If you can't think of any challenges specific to crime data, then even just mapping in general: do you think there are any challenges? This could come down to cost or to time. Oh, yeah, brilliant. The modifiable areal unit problem. This is a very, very common problem in mapping data. It's basically a statistical bias: data that is spatially aggregated to a specific group or a specific region is not entirely accurate, because the result depends on the units chosen. It's definitely a great problem to address, again, in your limitations.
There are ways to reduce the modifiable areal unit problem, and this would be by using smaller areas rather than larger ones, because this reduces that variation. This is typically why government statistics tend to use lower layer super output areas, because comparing crime across the whole of England with the whole of Wales might not be as effective as comparing the smaller units in between.
Crime reporting is not consistent. Very good point. That definitely brings us back to the grey figure of crime and the differences between reporting and recording. We have differences throughout the year due to seasonality. Selective reporting to the police. Yep, definitely, good point. Definitions of crime differ between different countries. That's a great point. Different countries, and in fact some authorities, are said to have different definitions and different recording practices, which makes it really hard to compare visualizations across different cities. Behavioral crime is recorded in addition to the most serious other offenses, so a single instance can on occasion result in multiple points being mapped. Yeah, that's a great point. We also have the problem of missing repeat locations. Getting the XY coordinates of where the crime happened. Yeah, this goes back to the geomasking and geoprivacy issue.
So yes, there are obviously loads of limitations to mapping data, and to mapping crime data specifically. But do these outweigh the pros? Can we still use our hotspots and our thematic maps as effective tools for policy amendment in crime? That's a very big question. I would say that we can. In the last 10 to 15 years our computational skills have vastly increased, and we have found ways to reduce limitations and reduce variation in our crime statistics. Great little discussion. Thank you all for that. And I think that brings our talk to a conclusion.
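Two of the challenges discussed above, geomasking by jittering and the modifiable areal unit problem, can be illustrated with a small sketch in plain Python. Everything here is hypothetical: the points are invented coordinates in metres, the uniform random offset is only one simple way to jitter and is not a description of how any police force masks its data, and the square grid stands in for real units such as LSOAs or wards.

```python
import random
from collections import Counter


def jitter(points, max_offset, rng):
    """Move each point by a random offset of up to max_offset on each axis."""
    return [
        (x + rng.uniform(-max_offset, max_offset),
         y + rng.uniform(-max_offset, max_offset))
        for x, y in points
    ]


def count_by_cell(points, cell_size, origin=0.0):
    """Aggregate points to a square grid and count how many fall in each cell."""
    return Counter(
        (int((x - origin) // cell_size), int((y - origin) // cell_size))
        for x, y in points
    )


# Six invented crime locations in metres; four of them cluster near (500, 500).
crimes = [(480, 480), (490, 510), (510, 490), (520, 520), (100, 100), (900, 900)]

# Same data, same cell size, different zone boundaries.
grid_a = count_by_cell(crimes, cell_size=500)
grid_b = count_by_cell(crimes, cell_size=500, origin=250)
print("busiest cell, grid A:", max(grid_a.values()))
print("busiest cell, grid B:", max(grid_b.values()))

# Jittering can push points across a boundary and change the counts again.
masked = jitter(crimes, max_offset=50, rng=random.Random(42))
print("masked counts, grid A:", dict(count_by_cell(masked, cell_size=500)))
```

With the first grid the cluster is split across four cells and no hotspot appears; shifting the boundaries by 250 metres puts all four crimes in one cell. Nothing about the crimes changed, only the units they were aggregated to.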
So just to talk about the references that were used in these slides: I referenced a few articles. One that I'd like to draw attention to is Ratcliffe, on the spatial and temporal challenges of mapping data. I think it's an excellent read. It goes on to talk about other factors such as systemic discrimination, how this links back to the grey figure of crime, and why there is such a huge gap between the reported and recorded statistics. It also covers how crime rate data are still affected by differences in reporting among demographic groups in the population. And they highlighted that being able to call the police is like a privilege of being white. Additionally, police legitimacy can also affect the willingness to call the police. These are really intricate but interesting thoughts on the limitations, and on how your results might be affected when using police recorded crime.