The reason sf is good is that it's compatible with ggplot2. As we've discussed, the data-handling side will be different because you're dealing with geographic data, but when you get to the visualisation side of things, the structure of the code required to make a map is almost identical to, for example, making a bar plot or a scatter plot. It's worth mentioning at this point that sf is what we're covering today because, at least in my interpretation, sf is the future of mapping and GIS in R. But there is another package called sp. I'm not actually sure what sp stands for, but it's older, it's been around for many years, and it's where spatial and mapping work began in R. So it's worth remembering that some examples you see might use sp objects rather than sf. It's very similar to the data frame and tibble distinction we've discussed before: they basically do the same things, just with a different way of storing and handling the data. Again, it's not a big deal, because you can convert between the two very easily. So we're going to focus on sf today, but be aware that some examples you come across online, or from colleagues, might use sp, and you can just think, okay, that's sp. Probably what you want to do is convert it to sf straight away if you want to use the skills we cover today.
But yeah, sf is definitely the way forward. I have a book within reach, and any modern book that comes out, like this one, which I think is a year or two old now, will focus on sf. It will mention sp, and may use it in places, but books like this definitely focus on sf, because that's where things are going in the R world. So I'm very pleased that sf exists, because it's made my life a lot easier. But there is bad news as well. I know there are some people here who are geographers and have a stronger and longer geography background than me. But if you are completely unfamiliar with geography and haven't used GIS before, then a lot of this territory will be completely new to you. The good news is that the visual side is easier, but it does mean you have to learn a few more skills that are quite specific to geography and to the field of GIS. Geographic information systems are basically software for handling and analysing geographic, spatial data. People might have heard of QGIS, or ArcGIS, which is quite a popular one. These are pieces of software specifically designed to handle spatial data, so you can use ArcGIS as a GIS, but GIS is also the name of the field in its entirety. So, in my mind, there are three things you should focus on if you're completely unfamiliar with GIS. First, be aware that spatial data is slightly different; I'm going to cover this in a minute. Second, an extension of spatial data is the idea of projections, which some people might be familiar with anecdotally, because it's basically how we portray a more or less spherical Earth on a flat surface. And third, there are some additional visualisation issues you might need to consider.
I will cover these briefly now, but as always it's a bit of a crash course, and if you really want to get into mapping in R and you're completely unfamiliar with GIS, I do recommend you read books like this one, or GIS books that give a more general introduction. So, spatial data. As social scientists, when we think of data we typically think of what we've been referring to as data frames or tibbles: rows and columns, like a spreadsheet. Spatial data is slightly different. There are two main types of spatial data: vector data and raster data. We are going to focus on vector data today, purely because it is overwhelmingly the most popular data type in social science research. I don't have a long formal geography background, but my understanding is that raster data tends not to be used in social science research. It tends to be used in things like geology and physical geography, where it's very popular, whereas in social science and human geography it's less common. So you can ignore raster data for now. Spatial data is a representation of the real world. When you look at a map, it's trying to represent the physical features on the ground in an intuitive and accurate way that you can manipulate and use in GIS software. Vector data does this by representing features on the ground using points, lines or polygons. We already covered points a little, because we were talking about crime locations: a point is just an X and a Y coordinate, which together locate a specific point in space.
For example, to pinpoint where a specific crime was committed on the Earth's surface, you need an X coordinate and a Y coordinate. Lines are just an extension of that: you have a series of points, and you connect them up in a particular order to create the line. So the data might contain, say, four points; it tells you what their coordinates are, and then what order they should be connected in. And when a line comes back on itself to form a closed shape, that's a polygon. So, intuitively, points will probably be used to represent things like crime locations, or, if you're mapping where post boxes are, you would probably collect and visualise that data as point vector data. Lines might be used for things like rivers or roads. And polygons might be used for things like buildings. Sometimes there's a bit of ambiguity. For example, a road is a lot more than a single line of connected points. You could represent a road using a polygon, because roads have particular widths, and not all roads are the same shape; they might bulge out in certain places. Broadly speaking, you will intuitively have an idea of how you want to represent your data. And you might not even be creating or collecting your own data; you might just download it from websites like the UK Data Service.
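To make the distinction between points, lines and polygons concrete, here is a minimal sketch in plain Python. The session itself uses R and sf, which store geometries differently. The sketch assumes made-up planar coordinates. `is_closed`, `line_length` and `polygon_area` are hypothetical helper names, and the polygon area uses the standard shoelace formula.

```python
import math

# A point is a single (x, y) coordinate pair.
crime = (2.0, 3.0)

# A line is an ordered sequence of points; the order says how to join them up.
river = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]

# A polygon is a line whose last point returns to the first, closing the shape.
building = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def is_closed(coords):
    """True if the sequence ends where it starts, so it can form a polygon."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def line_length(coords):
    """Sum of the straight segments joining consecutive points, in coordinate units."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area enclosed by a closed ring, using the shoelace formula."""
    if not is_closed(ring):
        raise ValueError("ring must start and end at the same point")
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2


print(line_length(river), polygon_area(building))
```

The same ordered list of points is a line when left open and a polygon once its last point repeats the first. That matches the description above.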
It's useful to be aware that if you download, for example, census boundaries from the UK Data Service, like the LSOAs (neighbourhood units) we were talking about yesterday, they will be vector data made up of polygons. That's because a neighbourhood boundary is a line that closes up at the end to form a shape. We deal with both polygons and points today, so hopefully that will give you a reasonable spread. To give you a less abstract example, I'll just check the chat first. Okay, hopefully Julie can help you out with the Dropbox folder. So, as an example, here is a satellite image of the LA Dodgers stadium. This isn't actually reality, because a satellite image is itself a representation of reality, but let's pretend we're looking at it from a helicopter and viewing what's happening in real life. A huge amount is going on: the stadium in the middle, the car park all around it, loads of intertwining roads of different types, from minor roads to motorways, probably a few footpaths through the forested area, and tons of buildings of different shapes and sizes. So when it comes to spatial data, the idea is that we want to create a representation of this real-life area that is intuitive and useful, and that can be read and manipulated in R, a GIS, or whatever it might be.
I use this example because I did a project with a colleague recently where we were interested in the relationship between crowding and crime. We pulled information about attendance at Major League Baseball games, which is why it's the LA Dodgers, and looked at the relationship between how many people attended each game and how much crime occurred on that day. That's a relatively straightforward research question, and we were also interested in the built environment: for example, how many buildings surrounded the stadium. The intuition was that in a very densely built-up area, people are forced into smaller spaces, crowds are compressed, it becomes more crowded, and there might be more crime. That was the hypothesis. Realistically, we could have pulled every single bit of geographic data around the stadium and used it in the analysis, but that would have been far too complicated and completely unnecessary, so we wanted to simplify as much as possible. What we ended up doing was to pull only the building footprints within a mile or two of the stadium; this visual was created in R using sf and ggplot2. We got the footprints from the OpenStreetMap API, and the tutorial I sent a link to yesterday shows you how to do that. If you focus on the black shapes to begin with, this is vector polygon data representing the building footprints around the stadium. It's a highly simplified version of the previous image, but it's enough information to answer the research question we were interested in. We could have pulled lines representing the roads surrounding the stadium.
We could have pulled additional polygon data representing green spaces around the stadium, or even point data representing individual trees or lamp posts. But we decided just to go with this. So in this example, the black shapes you're looking at are polygon vector data. Each building is represented by a series of points, the data tells us the order in which those points should be joined up, and they come round and close together to form a closed shape, which we call a polygon. You can also see little red dots, which are the crime locations. They are point vector data. We got them from open police-recorded crime data in the United States, which gives a latitude and longitude coordinate for each crime that together pinpoint a location, and you can then plot them on top of the map. So this data was entirely vector data: point vector data and polygon vector data. Hopefully that clarifies that we're just representing real life in a highly simplified way using vector data. I'll check the chat again: does anyone have questions on that in general? I'd also be very interested to hear from geographers about their use of vector data, or whether any geographers here have used raster data for social science research. I would say, going back to the point about whether you use points, lines or polygons: I mapped the tram network in Manchester, and I had the stations as polygons and the tram lines as line vectors. Although a tram line has a physical width, representing it didn't make any sense, because it's not an accessible width: people taking the tram can't choose to move right or left along the line.
It's about the logic of what kind of shape you want to use to represent something. Yeah, that's a really good example. Maybe the tram line data does exist in polygon form, but even if it does, you might decide it doesn't need to be, because the width of the track is probably identical across the whole network. For the purpose of the visualisation, or if you were calculating tram travel times, you don't need it to be a polygon; in fact, it might not be possible to calculate travel times if it is a polygon. To give an example from the paper I sent people yesterday, which was a practical example of using OpenStreetMap in R: we pulled the locations of London Underground stations from OpenStreetMap, created a buffer around each station, and then counted the number of crimes occurring in that buffer. I suspect OpenStreetMap does have polygon data on London Underground stations. That polygon would probably just be the main station building, nothing underground, just the building you see at ground level when you want to get on the Tube. But for the purposes of the research, we didn't need the polygon version of the station, because we knew we were going to draw a buffer around it. So we only used the point location, a specific XY coordinate, to locate each station. Of course, a station is much more than an XY coordinate; it's an entire building. But for our research question, to keep things simple and intuitive, we just used a point. So sometimes you're limited by the data available, and sometimes you actually have a choice and can decide to use one or the other.
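The station-buffer approach can be sketched in a few lines. This is a plain-Python illustration, not the code from the paper. It assumes the stations and crimes are already in a projected CRS measured in metres, such as the British National Grid. The coordinates and the 200 m radius are invented for illustration. In R with sf, the equivalent would typically use `st_buffer()` followed by a spatial join or intersection.

```python
import math

# Hypothetical station and crime locations, in metres on a projected grid
# (for example British National Grid eastings and northings).
stations = {"Station A": (530000.0, 180000.0), "Station B": (531000.0, 180500.0)}
crimes = [
    (530050.0, 180020.0),
    (530120.0, 179900.0),
    (530300.0, 180000.0),
    (530900.0, 180480.0),
    (532000.0, 181000.0),
]


def count_in_buffer(centre, points, radius_m):
    """Count points whose straight-line distance to centre is at most radius_m.

    A circular buffer around a point is every location within radius_m of it,
    so this distance test is equivalent to 'falls inside the buffer'.
    """
    return sum(1 for p in points if math.dist(centre, p) <= radius_m)


counts = {name: count_in_buffer(xy, crimes, radius_m=200) for name, xy in stations.items()}
print(counts)  # {'Station A': 2, 'Station B': 1}
```

The distance test only makes sense because the coordinates are in metres; the same test on raw latitude and longitude would mix up degrees and distances. With real data, overlapping buffers mean a single crime can be counted for more than one station.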
Sam, can I just build on that with a couple of points? The trams are a really good example because there's directionality to them. Buses are the same: you've got two sides, one going in one direction and one going the other, and that can actually be relevant. The stadium is a good example because if the building is that big, you might want to look at what's happening within different parts of it, so it can actually have polygons within it; you could break it up into a set of polygons. And the third one, which is confusing, is that sometimes people aggregate crime data to grids of squares. In effect you can then use some raster techniques, because you're aggregating point data to a raster. That's where the two blur, but it depends on what you want to do and how you're thinking about it spatially. Yeah, that's a really good point. For people who aren't aware, raster data is basically a very fine-grained grid over an entire area. That's why it's often used to measure things like cloud cover or pollution: these cover entire areas, and you might be able to measure or estimate a pollution level for every square metre of an entire country, for example. It's often not used in social science research because we're usually not working with data at that level. But, as Andy said, in some crime research people create an artificial grid, for example of 100 by 100 metre squares, probably as polygons, covering an entire area such as a whole city. Then you aggregate the points into those squares.
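Aggregating points to a regular grid amounts to working out which square each point falls in and counting. Here is a minimal sketch in plain Python. It assumes projected coordinates in metres and a grid anchored at a hypothetical origin. The crime locations are invented. In sf, a grid of square polygons can instead be built with `st_make_grid(x, cellsize = 100)` and joined to the points.

```python
from collections import Counter

CELL_SIZE_M = 100  # side length of each grid square, in metres (hypothetical choice)


def cell_index(x, y, cell_size=CELL_SIZE_M, origin=(0.0, 0.0)):
    """Return the (column, row) of the grid square that contains point (x, y)."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    return col, row


def aggregate_to_grid(points, cell_size=CELL_SIZE_M, origin=(0.0, 0.0)):
    """Count how many points fall in each grid square; empty squares are omitted."""
    return Counter(cell_index(x, y, cell_size, origin) for x, y in points)


# Hypothetical crime locations in metres on a projected grid.
crimes = [(530010, 180020), (530090, 180099), (530150, 180010), (530450, 180730)]
counts = aggregate_to_grid(crimes, origin=(530000, 180000))
print(counts)
```

With this floor-division convention, a point lying exactly on a boundary is assigned to the square above or to the right. A real analysis should choose and document its own convention.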
So that brings up another good point. Firstly, if you use grid analysis in criminology, it might have similarities with raster data. Even so, I suspect most of the grid analysis people do uses vector data: it might be a grid of squares, but I imagine it will usually be vector data. What was the other point I was going to make? Oh yes. Secondly, vector data doesn't necessarily have to represent something that's even real. Here, the crime locations are real in the sense that a crime occurred at, or roughly at, that location, and the buildings typically exist. But you can also create vector data of something completely synthetic. A 100 by 100 metre grid does not exist in reality, but it's quite useful for the purposes of the research, and a 100 metre cell size might make sense theoretically or empirically for your research. So vector data is usually a representation of real life, but it can be completely artificial if that's useful. Neighbourhood boundaries kind of don't exist in a formal sense on the ground either. People have their own idea of what a neighbourhood is, but it's useful for research to draw boundaries, such as LSOAs, to define a neighbourhood. On the ground those boundaries might not exist, or might just be a road, but they're a representation of something that makes them useful. The other bad news, and I think this is the final bit, is that spatial data is kept in different formats. That's on top of getting used to the idea of vector data, which I hope is relatively intuitive; you don't need to understand it in a highly technical way, just know that it's a thing. Today we will focus on a format called a shapefile, which I think is probably a little dated.
I think a lot of people who work in GIS use other formats for spatial data, like GeoJSON or KML. But shapefiles are extremely popular, and when you download data from practically any government website, whether in the UK, the US or elsewhere, you will get the option of a shapefile. So it's a useful format to know about, and a very common way of storing geospatial vector data. When you download shapefile data, you usually download it as a zip folder, and that folder contains several components. One has the file extension .shp, which is the spatial data itself: the lines, points or polygons. The folder also contains other files that provide additional information, such as a .shx index and usually a .prj file describing the coordinate reference system. You will probably have one called .dbf, which holds the attribute data underlying the spatial information. Say you go on the UK Data Service website and download a shapefile of LSOAs in Manchester, which is what we'll use in the example. You'll get a zip file containing a .shp file, with the actual boundaries of the LSOAs, and a .dbf file, with the data associated with each LSOA. If that data included the number of people living in each LSOA, or the level of deprivation, that's what is contained in the .dbf. The .dbf is linked to the spatial data, so each polygon representing a neighbourhood has data from the .dbf underlying it that describes its attributes. It's also useful to know that a .dbf can be opened on its own: I'm pretty sure if you drag a .dbf into Excel, it will open as a normal table with rows and columns. And you can also load a .dbf into R.
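A .dbf opens as an ordinary table because it is one: a short header listing the fields, followed by fixed-width records. The sketch below is plain Python and only illustrates that layout; it is not how any R package reads the file. It assumes a simple dBASE III-style table with character and numeric fields only, and returns every value as a string. The field names and values in `build_sample_dbf` are hypothetical.

```python
import struct


def read_dbf_bytes(data):
    """Parse a simple dBASE (.dbf) table into a list of dicts of strings.

    Layout: a 32-byte header, 32-byte field descriptors ending in 0x0D,
    then fixed-width records each starting with a one-byte deletion flag.
    """
    n_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    pos = 32
    while data[pos] != 0x0D:  # 0x0D terminates the field descriptor array
        desc = data[pos:pos + 32]
        name = desc[:11].split(b"\x00")[0].decode("ascii")
        fields.append((name, desc[16]))  # byte 16 of a descriptor is the field width
        pos += 32
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # '*' marks a deleted record
            continue
        offset = 1  # skip the deletion flag
        row = {}
        for name, width in fields:
            row[name] = record[offset:offset + width].decode("latin-1").strip()
            offset += width
        rows.append(row)
    return rows


def build_sample_dbf():
    """Build a tiny two-field .dbf in memory (hypothetical neighbourhood attributes)."""
    fields = [(b"LSOA_CODE", b"C", 9), (b"POP", b"N", 5)]
    records = [(b"LSOA_0001", b" 1523"), (b"LSOA_0002", b" 1688")]
    record_len = 1 + sum(width for _, _, width in fields)
    header_len = 32 + 32 * len(fields) + 1
    header = struct.pack("<B3BIHH20x", 0x03, 24, 1, 1, len(records), header_len, record_len)
    descriptors = b"".join(
        name.ljust(11, b"\x00") + ftype + b"\x00" * 4 + bytes([width, 0]) + b"\x00" * 14
        for name, ftype, width in fields
    )
    body = b"".join(b" " + code + pop for code, pop in records)
    return header + descriptors + b"\x0d" + body + b"\x1a"


rows = read_dbf_bytes(build_sample_dbf())
print(rows)
```

Nothing here touches geometry. The .dbf is only the attribute table, which is why it can be read on its own while the spatial components in the .shp are ignored.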
There's a function called read.dbf in a package called foreign, and that will load the .dbf as a data frame, as you're used to, ignoring all the spatial components. So that's quite useful to know as well. So, different types of spatial data. Again, I'd be interested to hear from the geographers here about whether they still use shapefiles, because there is a debate over which data formats are actually most useful. People in the chat have said that if you're using raster data, you should use GeoTIFF, which I've never used. For vector data, there we go: ESRI shapefile, which is what we're using, but apparently you can also keep vector data in a CSV. I didn't even know that. 3D data is a whole different ballpark; I've never used 3D spatial data, but of course that will require different data formats as well. In my mind, if you're a social scientist and you want to create some maps to supplement your analysis or explore some data, you'll probably get by pretty well with shapefiles, so I wouldn't overthink it. Projections are probably the most complicated thing I've come across in GIS. But again, in my mind (and I know some geographers will disagree with me), you can conduct perfectly accurate, good-quality research and analysis with a very minimal understanding of projections, which hopefully we will cover today in sufficient detail. If people haven't come across projections before, the idea is this, and people know it intuitively: the Earth is an oblate spheroid, so it's more or less spherical, right?
But when we look at a printed map on a wall, or a map on a computer screen, that map has undergone some kind of transformation from the Earth's shape. The Earth is roughly spherical, so it has to be flattened onto paper or a screen before we can look at it and manipulate it in GIS software. That transformation is known as a projection. The projection itself is a highly technical, mathematical process. We don't need that technical understanding, because software like R or QGIS does the projections for you. But it is very important, when you download or collect data yourself, that you think about which projection you are using. In R this is commonly referred to as a coordinate reference system, or CRS. One you will have come across before is a geographic coordinate reference system whose code name is WGS 84. I'm not actually sure where this link takes you, but WGS 84 is basically latitude and longitude coordinates, and it's a very popular way of identifying where things are on the Earth's surface. Often when you download data it will be in latitude and longitude. When you load it into R, the important step, which I will show you later, is to tell R which CRS it's in. So if you've downloaded data in latitude and longitude coordinates and it's WGS 84, you need to warn R in advance that the data you're giving it is WGS 84. But there are other types of CRS that are more useful and more commonly used in research: projected CRSs. There are probably hundreds, maybe thousands, of projected CRSs, and that's because which one you use might depend on what you're doing.
Because the transformation we spoke about is imperfect, there's always going to be a flaw in turning something more or less spherical into something flat, and different projected CRSs are good at different things. Some are particularly good for navigation, some are better at representing the shape of something, and some are better for measuring distance. So there are multiple projected CRSs even for the same geographic area; the United States probably has dozens, each good and bad at different things. For our purposes, the one you must know about is the British National Grid, which I would crudely describe as the standard projected CRS for the United Kingdom. Take the example we'll go through later. If you download data from the UK Data Service, I'm pretty sure it will always be projected in the British National Grid. That's the standard, and it's almost certainly the one you should be using for your maps and your spatial analysis. But in some cases, and we do get this in the example here, you'll download data for the United Kingdom and it will be in latitude and longitude coordinates, in WGS 84. If so, the first thing to do is tell R that it's WGS 84, and the second thing you almost certainly want to do is project it to the British National Grid. The British National Grid is tailored to the United Kingdom, so measuring distances or calculating areas in it will be more accurate and more suitable for what you're doing.
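To see why measuring in raw latitude and longitude is risky, compare one degree of longitude at the equator with one degree at roughly Manchester's latitude. Below is a minimal plain-Python sketch. It assumes a spherical Earth with a mean radius of 6,371 km and uses the haversine great-circle formula. It does not perform a British National Grid transformation. In R with sf, the workflow described above would typically be `st_as_sf(df, coords = c("lon", "lat"), crs = 4326)` to declare WGS 84, followed by `st_transform(x, 27700)` to project to the British National Grid. The data frame and column names there are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def naive_degree_distance(lon1, lat1, lon2, lat2):
    """Euclidean distance in raw degrees: what you get by measuring in an unprojected CRS."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude at the equator versus at about 53.5 degrees north.
equator = haversine_m(0.0, 0.0, 1.0, 0.0)
manchester = haversine_m(-2.0, 53.5, -1.0, 53.5)
print(round(equator), round(manchester))
```

Both pairs are exactly one "degree unit" apart. On the ground, however, the northern pair is roughly 40% closer together, because meridians converge towards the poles. That convergence is why distances and areas should be measured after projecting to a suitable CRS.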
Hopefully that will make more sense when we do it in R. The main takeaway message is to be aware of the CRS you're working in. When you download data, take note of its CRS. Then, whenever you're measuring distances or areas, or sometimes visualising things, make sure you're using an appropriate CRS. Fortunately, if you're doing research in the UK, you can just stick with the British National Grid. I'll do more of that later. If you want to read more about projections, there's loads in the QGIS documentation, which goes into detail about how projections are actually done and their various accuracies. As it says here, map projections are never absolutely accurate representations of the spherical Earth; they're always distorted. I also have a link up here; hopefully you can still see this. It's a little app website that demonstrates the idea of distortion. If I go to Mercator here: it's a very popular projection for the whole world. People recognise that shape instantly because it's the projection often used on school maps, in textbooks, on posters and so on. On the right is the Mercator map, and on the left is what's happening in reality on the Earth. I know this in advance, so I'm cheating a little: the Mercator projection doesn't distort things much around the equator, because of the way the transformation is made. If I move this square up, down and side to side around the equator, you can see that the square on the left, on the actual Earth, is distorted, but not too badly.
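The demonstration can be summarised numerically. On a spherical Mercator projection, lengths at latitude φ are stretched by a factor of sec φ in every direction, so areas are inflated by sec² φ. The minimal sketch below assumes a spherical Earth and ignores the ellipsoidal refinements real CRSs use.

```python
import math


def mercator_area_inflation(lat_deg):
    """How many times larger an area appears on a spherical Mercator map than on the globe.

    Lengths are stretched by sec(latitude) both east-west and north-south,
    so areas are inflated by sec(latitude) squared.
    """
    return 1 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 30, 60, 72):
    print(f"{lat:>2} deg N: area scale x{mercator_area_inflation(lat):.1f}")
```

At the equator the factor is 1. At 60 degrees north it is already 4, and at around 72 degrees north, roughly central Greenland, areas appear about ten times too large. This is why Greenland looks so big on school maps.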
The same goes if I move it west to east around there. But if I move it north or south, you can see it begins to distort heavily. Greenland is a very common example of this distortion: as I move towards it, you can see it squeezes things around the poles, and the same happens if I go down to Antarctica. And it's different for different types of projection. Here's a cylindrical projection: at the equator it's actually pulling things apart, stretching them more than the Mercator did, and there's a different type of distortion at the poles. Every single time you make a projection, that transformation process is happening, and different projections distort things differently. That's why you should think about which projection is most useful for what you're doing. But I will stop talking about it now, because I'm not an expert on projections. I'm still absolutely convinced, though, that you can conduct good, appropriate analysis and visualisations by understanding just the abstract idea of what a CRS is: make sure you're using the right one and transforming to the right one. The other bad news is visualisation issues. There are challenges when it comes to making a bar chart or whatever it might be, but when you begin to make maps, other issues come up. I linked to a blog here by Jerry Ratcliffe, a criminologist who does quite a lot of spatial work and policing research. He wrote quite an amusing blog post that sarcastically lists what you should and shouldn't do when you're making a map.
Things like a scale bar, for example, are things you wouldn't really consider when making a bar chart, but they can be very important when making a map. I hope this link is... no, it's gone. OK, I will fix that link. If you Google Jerry Ratcliffe's map blog, it will almost certainly come up, because I think it's quite popular. It was on the UCL website, apparently, but I think it's on his personal website as well. Sam, I think he had to take the sarcastic one down because people in America took it literally. He works in America, and he's had to redo it as a non-sarcastic one, unfortunately. I do remember realising there was both a sarcastic one and a deadly serious one, so maybe he's taken the sarcastic one down. Either way, it's worth a read. And of course, generic GIS books will also give you a good idea of what you shouldn't do when it comes to mapping. Another issue you might be familiar with is becoming increasingly controversial in the mapping of political results. There was quite a famous tweet about the United States election. If you look at the latest US election result on a map by county, it instantly looks like the Republicans won a landslide victory. But that's very misleading, because very densely populated areas often vote Democrat. On a map of the United States there'll be a sea of the colour representing the Republican Party, and it will look as if it dominates the whole country. That's because of the massive variation in the size of counties across different states, and it can be quite misleading.
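The county-map problem comes down to the difference between share of land area and share of people. The toy calculation below uses invented numbers, not real election data. It shows how a party can "win" almost all of the coloured map while receiving well under half of the votes.

```python
# Invented areas: (name, land area in km^2, number of voters, share voting for party A)
areas = [
    ("rural 1", 900, 40_000, 0.60),
    ("rural 2", 800, 35_000, 0.58),
    ("rural 3", 700, 30_000, 0.62),
    ("city", 50, 400_000, 0.35),
]

total_land = sum(land for _, land, _, _ in areas)
land_won_by_a = sum(land for _, land, _, share in areas if share > 0.5)
total_voters = sum(voters for _, _, voters, _ in areas)
votes_for_a = sum(voters * share for _, _, voters, share in areas)

print(f"Map area coloured for A: {land_won_by_a / total_land:.0%}")
print(f"Actual vote share for A: {votes_for_a / total_voters:.0%}")
```

In this toy case about 98% of the map is coloured for party A, yet A gets only about 40% of the vote. The single small, dense "city" area decides the result. This is the gap that cartograms and regular grids try to close.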
This is a paper, talking about myself again, that I did with Reka from the University of Manchester, where we investigated the same issue but for the Brexit referendum result. Look at the Brexit referendum result on the left: this is the proportion of people in each electoral area that voted Remain. This is just England, I think, isn't it? Yeah, just England. If you weren't familiar with the data, you might look at the image on the left and think barely anyone voted Remain, because it's absolutely dominated by blue and green, which are the lower percentages of Remain voting. But of course, densely populated urban areas tended to vote Remain. So that map in itself is at best not particularly useful, and at worst potentially very misleading when people want to draw a conclusion from it about the referendum result. So we looked at various ways of manipulating these boundaries to better portray the underlying data. It's like what I was saying yesterday: visualisation is an act of communication, and you want to convey the underlying data in a way that is understandable and accurate. This is kind of a catch-22. No, not a catch-22; it's more of an ironic situation, where mapping out the raw data can actually be quite misleading. So maps have this challenge, and we used various methods. Cartograms you might have come across before; they expand and contract polygons according to a particular variable. Then there are hexograms, which are a completely bespoke method, and that map performed the best: we did a survey about how people understood the maps, and the hexogram performed the best. It was practically invented by Richard Harris at the University of Bristol and some of his colleagues, I think.
Then there are square grids and hexagonal grids, where, as with the hexogram, each cell is still a local authority or whatever the unit is, but they're all a regular shape, so your eye isn't drawn to particular areas of the map. We cover this in the extra material: if you go to the worksheet and click on Extras, I cover the code required to transform this map of Manchester into the regular grid that you see on the right. And as I say in that exercise, these are not always the best things to use. You can mislead people by using a regular grid, even more than by using the raw boundaries; it very much depends on the variation in size of the polygons you're using. But it's still quite useful to be aware of. Okay, I thought I wasn't going to talk, and then I ended up talking for about 35 minutes, but hopefully that's given you a little crash course in mapping in general, GIS, and specifically mapping in R. I'm probably not going to talk much more. If anyone has any questions, whether it's GIS-related stuff or about sf or R, I'll give it a minute in case people think of something. I just want to say I posted a couple of links in the chat to some research on maps, how different maps can show you different things and change the appearance, and also one that uncovered geological features of a region based on voting patterns. It was quite a popular Twitter thread last November, I believe. Cool, that sounds good. I haven't seen them. Excellent. I was quite pleased to see this issue getting attention, actually. There was a whole segment on the BBC, whenever the last election was, where they spoke about basically this issue, which was great. I think the BBC ended up using a kind of hexogram-type map to report the election results for that specific reason, and they gave a bit of justification for it, which I thought was good.
So it's becoming a common thing to understand, I think. I think, yeah, you're right. It's definitely a laudable change that people now expect to have to justify their choices. Yeah, exactly. Even though it's becoming more widely understood, I think some people still fall into the trap. I saw a paper in the Lancet, which is quite highly regarded, about how mobility has changed during the pandemic. They used a map almost identical to this one on the left, but much smaller, and there were about six maps like it. You just could not tell what was going on. The map might as well not have been there, because London was almost invisible. Obviously most people know roughly where London is, but if you weren't that familiar with it, you either wouldn't draw any conclusions from the map, or you might draw completely the wrong conclusion. So it's definitely worth thinking about. And yeah, the hexagon maps look quite cool as well, to be honest. Yeah, I think they look better than squares. Yeah. And in this example, with the square grid and the hexagonal grid, people actually drew less accurate conclusions from the map, which is quite interesting. You really can mislead people with these techniques, but they can be quite useful. Okay, I think that's basically it. The break's at 11, isn't it? Yeah, we've got a bit of time. So what I might do is very briefly go through the first few lines of this exercise, because hopefully that will at least clarify the idea of coordinates and the idea of the CRS. This is the mapping one, which I know some people might have started already, but I'll just go through it very quickly.
So the data for this exercise is burglary records, which you should have on the Dropbox. I'll open it in Excel so everyone's seen it. I think I cleaned this a little bit, but you can very quickly see... Actually, I'll just load it into R, because then people are familiar with it. These are skills we covered a little bit yesterday. I'll load readr, because that's how I load in data, and I'll load the sf package, which handles the spatial stuff. I'll create an object called burglary_df again. Sam, can you move your screen onto RStudio, please? Oh yeah, sorry. You're still sharing the web page. Yeah, you basically have to stop sharing and share again. Has that changed it? It does seem to have, yes. Cool. So yeah, tell me if it changes. I've loaded readr, I've loaded sf, and I'm creating an object called burglary_df, and then I'm going to load in this burglary records CSV. The CSV I've given you, which I'll show you once I've loaded it, is basically the same as what you get when you download open police-recorded crime data for England and Wales from the website we covered yesterday. I've loaded it in; I'll just click on it to take a look. You can see we've got a crime ID for each individual crime, so each row is a specific crime. We've got the month the crime occurred, and we've got the LSOA (lower layer super output area) the crime occurred in. But remember these are individual crime records, so the LSOAs are repeated, because some crimes occur in the same LSOA. Then there's the outcome of the crime, for example "Investigation complete; no suspect identified". It has the longitude and latitude coordinates, and it has a local authority. So basically, when I keep banging on about the CRS: if you download data like this, you take a look at it and you see, okay, they've given me latitude and longitude coordinates.
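In the session the file is read with readr's `read_csv()`. So that a sketch runs without the Dropbox file, the block below builds a few made-up rows in memory instead. The column names follow the standard data.police.uk street-level layout (only a subset of columns); the cleaned course CSV may differ, and all values, including the `LSOA_A`-style codes, are hypothetical. Counting rows per LSOA shows why the LSOA codes repeat:

```r
# Hypothetical rows laid out like a data.police.uk street-level crime CSV.
# check.names = FALSE keeps the original column names, spaces included.
burglary_df <- data.frame(
  `Crime ID`              = c("id1", "id2", "id3", "id4"),
  Month                   = c("2020-01", "2020-01", "2020-02", "2020-02"),
  Longitude               = c(-2.2426, -2.2430, -2.2301, -2.2520),
  Latitude                = c(53.4808, 53.4811, 53.4724, 53.4650),
  `LSOA code`             = c("LSOA_A", "LSOA_A", "LSOA_B", "LSOA_C"),
  `Crime type`            = "Burglary",
  `Last outcome category` = "Investigation complete; no suspect identified",
  check.names = FALSE
)

# One row per crime, so an LSOA appears once for every crime recorded in it.
table(burglary_df$`LSOA code`)
```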
And what I should be thinking is: when I load this into R, I need to tell R that it's in the WGS 84 coordinate reference system. That's basically what you should be thinking. For example, based on my probably not very good explanation, you might intuitively think: okay, this data is from the United Kingdom, it's been released by British police forces, and therefore it will be in the British National Grid. You might load in this data and say, okay, these are the coordinates, and they're projected in the British National Grid. If you do that, it will be wrong. The points will be completely inaccurate. You'll probably be able to tell, because it will look ridiculous; it won't look like what you expect, but the point is that the points won't represent reality in any way whatsoever. So it's very important that you get that step right. I frequently get that step wrong, purely by not thinking or by misunderstanding the data, but I usually realise that I've done it wrong because I always conduct some checks afterwards to make sure that it's projected in the right way. For example, I might overlay a satellite image of the area just to make sure that things actually overlap correctly and that it has the right projection. I know we've only got six minutes, but I'll just show you the first line of code that you will run; I do cover this in the exercise. Basically, if you have a data frame that's not spatial at all, like this one, which contains coordinates but R doesn't know it's spatial, the first thing you'll want to do is convert it from a typical data frame or table into a spatial sf object. So I'm going to create an object now called burglary_sf, so the name is subtly different. And there's a function called st_as_sf. The help page describes it as "Convert foreign object to an sf object".
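One cheap check can be run before choosing a CRS at all: WGS 84 longitude and latitude are in degrees, so they must satisfy |x| ≤ 180 and |y| ≤ 90, whereas British National Grid eastings and northings are in metres and run into the hundreds of thousands. A small base R sketch; `looks_like_lonlat()` is a hypothetical helper, and the coordinates are made up:

```r
# Hypothetical helper: are these raw coordinate columns even plausible as
# longitude/latitude in degrees? This can rule WGS 84 out, but it cannot
# prove the CRS: it is no substitute for overlaying the points on a basemap.
looks_like_lonlat <- function(x, y) {
  all(abs(x) <= 180 & abs(y) <= 90, na.rm = TRUE)
}

lon <- c(-2.2426, -2.2301, -2.2520)  # made-up Manchester-ish longitudes
lat <- c(53.4808, 53.4724, 53.4650)  # made-up Manchester-ish latitudes

looks_like_lonlat(lon, lat)                              # degrees: plausibly EPSG:4326
looks_like_lonlat(c(384000, 385500), c(398000, 397200))  # metres: not lon/lat
```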
The contents of this function will probably be a data frame, and you just have to specify the things in that data frame that are spatial. Like I said yesterday, if I type ?st_as_sf, it tells you what you need to put in. So x is the object to convert to sf: x = burglary_df. The second bit of information you need to put in is the coordinates. In case my screen is in the way, it says: "in case of point data: names or numbers of the numeric columns holding coordinates". So you say coords =, and then you use c(), which is what you use when you want to list several things. The x column goes first, and you've got to get the names exactly right: in this data they're Longitude and Latitude, with capital letters. So x is Longitude, because longitude is the equivalent of the x axis. Which way round longitude and latitude go is something I have to Google almost constantly, but I don't think I'm alone in that. Maybe it's just me. So you say, okay, this is the data frame that contains spatial information; then, with the coords argument, you say these are the columns containing the coordinates that I want you to convert to spatial point data. And the final one you need, which is the really important one, is the CRS. You say crs =. What you put there, and I'm not sure what the technical name for them is, is a code: each coordinate reference system has a unique identifier number. It's EPSG, the EPSG Geodetic Parameter Dataset, and you can basically consider it to be a library of different coordinate reference systems, each of which has a unique ID.
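The EPSG lookup can also be done from inside R rather than on a website: `st_crs()` accepts an EPSG code and returns the full CRS definition from the PROJ database that ships with sf. A minimal sketch comparing the two codes used in this exercise (4326 for WGS 84, 27700 for the British National Grid):

```r
library(sf)

# Look up CRS definitions by EPSG code (resolved via the PROJ database).
wgs84 <- st_crs(4326)   # WGS 84: longitude/latitude in degrees
bng   <- st_crs(27700)  # British National Grid: eastings/northings in metres

wgs84$Name  # human-readable name of each CRS
bng$Name

st_is_longlat(wgs84)  # TRUE: geographic (unprojected) CRS
st_is_longlat(bng)    # FALSE: projected CRS
```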
When you want to refer to that CRS in R, you just need to know what that ID is. We want the ID for WGS 84, because these are latitude and longitude coordinates and we need to tell R that we're dealing with coordinate data in the WGS 84 coordinate reference system. Because I do this constantly, I've remembered that the code for WGS 84 is 4326. I'm probably going to get it wrong now, but I think it's 4326. We can actually check: if I Google "EPSG WGS 84" and go on spatialreference.org, it gives you a summary of this coordinate reference system, and you can see at the top that it's 4326. The British National Grid, which is a projected CRS, has its own code as well, which we'll deal with in a minute, but for now you want to make sure you're telling R that you're dealing with WGS 84, with the code 4326. Now when I run that, it creates a new object, burglary_sf, and importantly, what you'll notice is that it's identical to burglary_df in terms of content. If I click on it, it still has all the same information: the month, the LSOA code, the crime type, the outcome. But instead of separate latitude and longitude columns, it has converted them into a new column called geometry. That holds exactly the same latitude and longitude coordinates as the original data frame, but now R knows that it's spatial and knows to treat those coordinates with the WGS 84 CRS. And the final step, hopefully to encourage you, is to show you how straightforward it is to then plot these points on a map, because I can use the same structure of code as we did before: ggplot, with data = burglary_sf.
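Put together, the call described above looks roughly like the sketch below. A few made-up points stand in for the real `burglary_df` so the block runs on its own; the column names Longitude and Latitude are assumed to match the course CSV.

```r
library(sf)

# Made-up points standing in for burglary_df (the real one comes from the CSV).
burglary_df <- data.frame(
  crime_id  = c("id1", "id2", "id3"),
  Longitude = c(-2.2426, -2.2301, -2.2520),
  Latitude  = c(53.4808, 53.4724, 53.4650)
)

# x column first (Longitude), then y (Latitude); crs = 4326 declares WGS 84.
burglary_sf <- st_as_sf(x = burglary_df,
                        coords = c("Longitude", "Latitude"),
                        crs = 4326)

burglary_sf               # same rows; the coordinates now sit in a 'geometry' column
st_crs(burglary_sf)$epsg  # confirms the declared CRS
```

By default `st_as_sf()` drops the original coordinate columns (`remove = TRUE`); passing `remove = FALSE` keeps Longitude and Latitude alongside the new geometry column.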
The way I happen to write ggplot code is that I specify the data, then I add the plus sign, and then I add a geom. But rather than points or bars or whatever it might be, the geom for sf objects is geom_sf(), because it's a simple features object. I make sure I have sf loaded, and I write geom_sf(). So all I've done is specify the data and say that the geom is geom_sf(). It says could not find function "ggplot". Okay, I haven't loaded the ggplot2 package, so I'll do that now, library(ggplot2), and then, fingers crossed, it will work. There we go. On the right-hand side you should see all these points. As an intuitive check, although I would probably do further checks, at least I know that that is basically the shape of Manchester. If I'd specified the CRS of the British National Grid, which is 27700, it would look completely wrong. Shall we see what that looks like? Yeah, let's have a look. I'm going to change the CRS to 27700, which basically means lying to R: even though these are latitude and longitude values, I'm telling it they're British National Grid. So I run that again, then I'll try plotting it. And it looks completely wrong, like it's squished, and you might intuitively think, okay, I know this is wrong. It's so wrong that for some reason ggplot hasn't even managed to draw an x and a y axis. I'm not sure why that is; it's probably because these coordinates don't make sense in that CRS. Yeah. And you can tell it's wrong. If, for example, in QGIS or in R, I overlaid a satellite image on top of this, it probably wouldn't show anything useful, or I'd be in the middle of the sea or something. Very frequently I do this and find I'm in the middle of the sea, and then I know that it's wrong. So, always a sensible check: am I in the sea? Yeah, exactly.
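The plot and the deliberate mistake can be reproduced in a few lines. The sketch below uses made-up points. It contrasts declaring the degree values as EPSG:27700 (which only relabels them, so R thinks they are metres sitting a few metres from the grid's false origin, out in the sea south-west of the Isles of Scilly) with `st_transform()`, which genuinely reprojects correctly declared WGS 84 points into British National Grid eastings and northings.

```r
library(sf)
library(ggplot2)

# Made-up points standing in for the burglary coordinates.
pts <- data.frame(Longitude = c(-2.2426, -2.2301, -2.2520),
                  Latitude  = c(53.4808, 53.4724, 53.4650))

# Correct: declare the values as WGS 84, then plot with geom_sf().
burglary_sf <- st_as_sf(pts, coords = c("Longitude", "Latitude"), crs = 4326)
p <- ggplot(data = burglary_sf) + geom_sf()
print(p)

# Wrong: the same degree values labelled as British National Grid (metres).
# The bounding box spans only a few hundredths of a "metre".
burglary_wrong <- st_as_sf(pts, coords = c("Longitude", "Latitude"), crs = 27700)
st_bbox(burglary_wrong)

# Right way to get British National Grid: reproject the correctly declared data.
burglary_bng <- st_transform(burglary_sf, 27700)
st_bbox(burglary_bng)  # eastings/northings in metres, around central Manchester
```

Checking `st_bbox()` against a rough idea of where the data should be is a quick, code-only version of the "am I in the sea?" test.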