It's time. I'm a research fellow at the University of Leeds, doing criminology, geography and related research, mainly in collaboration with police forces or working with police force data. So that's a lot of what we're going to be working with today. Police data is not perfect, but it has a lot of interesting advantages for teaching: it's often spatial, because it often comes with coordinates for where things happened, and it's also temporal, in the sense that you can look at things over time. I spend a lot of my time, probably too much time, dealing with data like that, and that's what we'll be focusing on today. But a lot of the visualisation and mapping skills we'll be looking at are basically generic, which sounds like an insult to what we're doing. Actually, I think generic skills are an excellent thing, because you can apply them to more or less anything. If you have any questions throughout the day, tomorrow or after the workshop, you can email me. There are also contact details on my website, or if you're nosy about my background, you can look at that. Some of the material is on the Dropbox. You should have had access to the Dropbox yesterday, and there are now folders for day two and day three. Hopefully you can see me cycling through the Dropbox webpage: when you click on the data visualisation folder, there's a data folder, which is the data we're going to be using today. Then there's a PDF of the slides that I will be going through, and the same is true for tomorrow for the mapping. So if you want to look at the slides on your own laptop, that's fine, or you can skip ahead if you want to. The actual worksheets I've put online, at this web link. The shortened web link is rpubs.com slash Lantern underscore.
And of course, if you get the slides from the Dropbox, you can just click on the link. When you go to that link, there are three separate worksheets that I've created. Worksheet number one is graphics. The idea is that we go through that today; if you want to skip ahead, that's completely fine, because I know people have different experiences. So the plan is that we do the graphics worksheet today, the maps one, number two, tomorrow, and then the extras probably later on tomorrow, once you've familiarised yourself with the graphics and the maps. The purpose of giving you these now is that you can go through them in your own time, whether it's going back to check something or skipping ahead if you're feeling super confident. I can't see the link at the bottom because of Zoom, but there should be a link to GitHub there where it says "material". I don't know whether you use GitHub; some of you might. If you're interested in downloading the material from GitHub, you're more than welcome to; it's completely up to you, whatever you're comfortable with. But with the Dropbox and the RPubs link, you should have everything for today. OK, so the structure of today is more or less what it was yesterday. I am going to talk and give a presentation for the next 45 minutes or so. I will try not to make it exclusively slides, because I don't want to completely exhaust people: I will do a bit of presenting on the slides and then a bit of stuff in RStudio. Then we have a break at 11, and then there's the practical, where the idea, in theory, is that you go to that RPubs link and work through the graphics worksheet. If you open it, you'll see there's a contents page focusing on ggplot2, which is basically what we're going to be using for the two days, various examples of what we're doing, and then some extra resources at the end.
It's very long. It looks horribly long and you might be quite worried about it, but obviously there are a lot of visuals; I reckon half of it, if not more, is just visuals. So you can work your way through that. Periodically I will answer questions, and if I randomly think of something that is hopefully interesting for you, I will butt in and just begin talking about it. Then there's the lunch break, and then we've got the guest presentation to finish off. Does anyone have any questions on that? I've already seen some chat-related stuff. OK, that's not for me. That's good. OK, so data visualisation in general. I think we all know intuitively what data visualisation is, but I think it's quite useful to step back and think about it before we even begin doing anything in R. Andy Kirk, who's a data visualisation expert and an all-round motivating and nice person, describes it as the visual representation and presentation of data to facilitate understanding. That probably aligns with what most people think intuitively. To borrow the words of John Burn-Murdoch, whom I will refer to constantly today because he's a very talented data visualisation person at the FT, and he uses R for almost everything, I think: he's described data visualisation as an act of communication. It's basically a method for conveying information. That information might be results you've come up with from very complex analysis, and you want to convey them to people in a way that is understandable. So people understand the data underlying the visualisation, and they enjoy the visualisation: it's not horrible to look at. It conveys the information, but it's also enjoyable to look at.
Then, importantly, and this is something I wouldn't have thought of, John Burn-Murdoch has mentioned that ideally a visualisation should be memorable as well. So you're representing information in a way that people understand, enjoy, and will actually remember and retain. If you're an academic, you want people to be able to talk about your research in the future, so if you can give them a visualisation that they understand and remember, they're more likely to be able to recall it, talk about it and share it with other people. So, data visualisation in R: this is just a random selection of visualisations that the BBC uses. A lot of their visualisations are actually made in R, and they have a lot of teaching material for R as well, which I can share with people. These are all examples of communicating information to people. The top-left one is quite relevant for today because we're talking about crime: that's homophobic hate crimes, I think in the United Kingdom. They've got information about the victims and suspects, the age groups, and the time span. There's underlying data, probably police-recorded data or maybe from a survey or court records: very complex data that involved lots of filtering, selecting and joining, just like you did yesterday. And the person has asked: how can we convey this information in a way that people will actually understand and enjoy? We've also got some political examples. I think that's the German parliament, maybe; I know the Sweden Democrats, so these are election results. Rather than just doing a bar plot, they've come up with a figurative array of seats in the parliament, which is quite a cool way of showing it. In the bottom left you've got the temporal aspect of data, about Brexit.
They've made it look like a calendar, which is quite an intuitive way to think about time for most people, because people are used to looking at calendars. And in the bottom right, of course, we have maps, which are widely used in political contexts, in sociology, in criminology and pretty much any field you can think of. I would say they're relatively straightforward to make in R, as hopefully you'll see tomorrow, but maps are not an easy way to visualise data in general: there are lots of challenges with visualising geographic data that you don't have with non-spatial data, but we'll discuss that a bit later. The point is that you see these visuals all the time, and every time you look at a visualisation and consume the information, you're hopefully understanding the underlying data, enjoying it, and, if it's a good visualisation, remembering it for the future. So I just wanted to quickly go through some examples of visualisations that I think are good or influential, both crime and non-crime related, before we even begin to look at R-related stuff. People might know this already. This is by John Burn-Murdoch. He's become even more prominent in the past year because he has been using a lot of skills in R, including ggplot, to visualise and track the spread of the COVID-19 pandemic. These visuals are very powerful, and they've been influential internationally. I think it's a fantastic example of conveying a lot of information in a very concise and powerful way. If you Google something like "COVID-19 tracker" (hopefully you can still see my screen), this web page is now free. The FT is behind a paywall, but this web page is free.
There's a huge amount of data handling that's gone into these visuals, but he's using tools like ggplot to convey that information in a way that is very engaging for people. Even though it's quite a morbid topic, it's very engaging, which is a bit weird, but it's very powerful and very widely circulated internationally. And these are exactly the skills that we'll learn today, if that's motivating for people, even though it's a morbid topic. Mapping is another great one, and again a skill we're going to learn tomorrow. Of course we're going to learn how to make maps, but this is what's called a bivariate map, I think, where you're mapping two variables at once. If you look at the bottom left, you've got greater health deprivation, and Zoom is ruining my enjoyment of this visual because I can't see it properly, but I think the other variable is something like the population's risk from COVID. Again, there's a huge amount of information in this: information about health, age, deprivation, lots of different stuff underlying it, and two different variables on one map of Greater Manchester. That's also quite a powerful way to visualise things. This one is by the same person, Colin Angus. It's tracking the temporal spread of COVID throughout the UK; I think these are local authorities. It speaks for itself. I'm not 100% sure how it's ordered; I guess it's ordered by first cases of COVID or something like that. But as soon as you look at that visualisation, you see a pattern. Within two or three seconds of looking at it, you see that there's an underlying pattern.
You can consume and understand a huge amount of information very quickly when it's portrayed like this, much more than if it's in a table, which is blasphemy to a lot of visualisation people. Interactive stuff in R is also quite good. We're not going to cover interactive stuff today, but I'm happy to tomorrow if people are interested. This visual was made by Trafford Data Lab, which is like the data science bit of Trafford Council. It's quite a heavy web page, so hopefully it won't crash my Zoom and you can see it. It's basically an interactive dashboard of deprivation, with various dimensions of deprivation such as income, employment and crime. Every time we click on something, it's probably querying an API or something like that, and it updates the interactive map. We've got a bar chart there as well. We're actually going to look at deprivation data for Greater Manchester today. But again, it's a great example of using data visualisation to increase public accountability: basically local accountability of local government, and accountability of statistics. It's democratising data: with this kind of dashboard, the public can look at open data about their local area in an accessible way. So it's very important stuff. I will periodically, and very self-indulgently, use examples from my own work. That's why I didn't introduce myself too much at the beginning; otherwise I'd just talk about myself constantly. I made this as an example of different mapping techniques. The reason it's an odd shape is that it was designed to be a poster, so I think it's about 60 centimetres high and 40 centimetres wide. I did it to show how you can visualise neighbourhood deprivation in England in various different ways.
If you click on the link at the bottom in the PDF, you'll get a higher-detail version. The idea is that when you map things like deprivation, highly deprived neighbourhoods are often geographically quite small, because deprived neighbourhoods tend to be very densely populated, given the way neighbourhoods are designed in England. If you look at a map of wealth in England, geographically it looks like England is very wealthy, but that's just because rural areas are very large and are often wealthier. So mapping can be very misleading in that respect. But there are various other ways of visualising it, using hexogram-type maps, grid-type maps, or Dorling cartograms, if people have heard of these. Again, we'll deal with stuff like that tomorrow. That's just an example of how, when you're mapping stuff, you don't have to use the raw underlying data; you can do it in various cool ways. A visualisation doesn't even have to investigate a particularly serious social science topic. This one reminds me of the shape of Wales for some reason, but it's a visualisation of what I think is called the random pi walk. I'm not a mathematician, so I'm not going to go into it in too much detail, but obviously pi is a number with infinitely many decimal places. The person who made this visualisation generated a scheme where, I think, zero is at 12 o'clock and the digits go round like a clock face up to nine. Pi starts 3.14..., so the first step goes in the direction for 3, the next in the direction for 1, and so on. So you end up walking not in a random direction, but in the direction given by whatever the next digit of pi is.
You end up with this quite incredible visualisation: if I had walked this route in real space and tracked where I'd gone, this is what it would look like. It's kind of a trivial topic, but for someone who might be interested in maths, you're conveying a number to millions of decimal places with one visualisation, in a way that to me is basically artwork. It bridges the gap between maths and art, so it's a very cool thing to do. I'm not entirely sure, but I think Nadine, the person who made this graphic, did it in ggplot as well. That kind of thing you have to do computationally; you're not going to use Paint or something to draw something like this. Crime data visualisation specifically is becoming more common, I think, because in criminology and social science in general, people are becoming much more familiar with coding and with software. So I think ggplot and visualisation in general are increasingly used, and it shows. When you watch presentations by police forces or governments, or at academic conferences, to me the most powerful presentations by criminology or crime science researchers are the visual ones, and they're the ones that keep people engaged. This is a paper by Matt Ashby, who's at UCL. I think in this paper he was looking at whether the weather might predict crime victimisation in the United States. There are five different cities, which you can see on the right-hand side. I think the red estimate is outdoor crime and the blue one is indoor crime, something like that. The point is, he's conducted some very complex regression analysis, and lots of data handling has gone into this.
Lots of different statistical techniques have gone into this, and he's conveyed a huge amount of information that could have been a horrible table: people either wouldn't have understood it or would have been instantly turned off by it. Instead he's portrayed these estimates in a very accessible way, in a way that people enjoy and will probably remember as well. That small step from the table to the visualisation can have quite a big impact. This is a less colourful visualisation, but I think an equally powerful one, by Reka, who's a lecturer at the University of Manchester. She did some research using Twitter data. Greater Manchester Police often tweet when there is a missing person, and she was looking at to what extent people actually share these tweets. Sometimes Greater Manchester Police include a photograph of the missing person and sometimes they don't, and she was interested in whether having the photograph affects whether people share the tweet, and whether the content of the photograph does. Again, you have to query the Twitter API or scrape data from Twitter somehow; you have to clean it, handle it and analyse it. A lot of work has gone into that, and you do justice to all that work by portraying the results in this way. A lot of information is behind this graphic, and you get it across to people in a powerful way. So I think that's another good example: it's not colourful, but it shows that black-and-white visualisations can also be quite powerful. This one is more colourful, which is good. This goes back to mapping.
Again, this visual will have used ggplot, the skill we're going to learn today and tomorrow. It shows the number of crime incidents in downtown San Francisco, I think. He's embedded a satellite image from Google Maps as the first layer of the graphic, and then on top we've got, I guess, grid cells or raster cells or something like that to portray crime hotspots. Another example of crime mapping is by Ian, who is, I think, an analyst at Essex Police. He might correct me on that, but I'm pretty sure he is. He's a big R user, and he's collated a lot of different data to create interactive dashboards that the police actually use for deploying resources around places like Colchester. It's similar to the Trafford Data Lab map, because you can tick things on and off, zoom in and look at various things. This is quite useful when you want to identify spikes at particular locations, like if a new pub, shop or gas station opens, or closes for that matter, and you want to look at the trends over the past couple of weeks: not necessarily to predict what's going to happen, but to identify areas that are currently problematic and may be problematic in the future as well. It's a very cool practical example; it's not just an academic thing. Talking about myself again: this is a paper I did in the past few months about the impact of the COVID-19 pandemic on crime. The reason I put this one up is that a very important aspect of data visualisation is communicating uncertainty. Whether you're doing statistics, which most of the time estimate or summarise things rather than giving you all the information, or making predictions like we did here, you should ideally try to communicate that uncertainty in the visual.
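One common way to show uncertainty in ggplot2 is a shaded band drawn with geom_ribbon() around an estimate, with the estimate and the observed values drawn as lines on top. This is a minimal sketch, not the code behind the paper mentioned here: the monthly counts, the expected values and the fixed band of plus or minus 10 are all invented for illustration, and a real analysis would take the interval from the model rather than hard-coding it.

```r
library(ggplot2)

# Invented monthly data: observed counts and a model's expected counts.
# These values are purely illustrative, not real crime data.
crime_df <- data.frame(
  month    = 1:12,
  observed = c(100, 104, 98, 110, 70, 65, 80, 90, 95, 99, 101, 97),
  expected = c(101, 102, 100, 105, 104, 103, 102, 101, 100, 100, 99, 98)
)

# Assumption: a fixed band of +/- 10 stands in for a real prediction interval
crime_df$lower <- crime_df$expected - 10
crime_df$upper <- crime_df$expected + 10

p_uncertainty <- ggplot(data = crime_df, mapping = aes(x = month)) +
  # Shaded grey band: the uncertainty around the expected values
  geom_ribbon(mapping = aes(ymin = lower, ymax = upper), fill = "grey80") +
  # Dashed line: the expected values themselves
  geom_line(mapping = aes(y = expected), linetype = "dashed") +
  # Solid line: what was actually observed
  geom_line(mapping = aes(y = observed))

print(p_uncertainty)
```

Layers are drawn in the order they are added, which is why the ribbon comes first: the two lines then sit on top of the shaded band rather than being hidden by it.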
When people look at your visual, you want to convey a message and you want it to be powerful, but there's a very fine line between presenting something like an advert to persuade people and potentially misleading people by not showing the uncertainty in what you're portraying. A good example of this is the R number during the COVID-19 pandemic, which everyone always talks about. Lots of headlines will say the R number is currently 0.9, or that it's 1.4 or something like that. What they very rarely say, or at least don't always show in visuals or in text, is that there is always uncertainty around that number. I don't know if people know David Spiegelhalter. I think he's an academic at Cambridge; he's a sort of superstar statistician, and he's written quite a few books. He in particular is always very keen to point out that when people quote a particular number or show a particular statistic, there's generally uncertainty around it. So that's something to think about in your visuals. Here we have the confidence interval around what we expected crime to have been during the pandemic. Many people, for the sake of simplicity, might not have included that shaded grey area around the expected value, even though I think it's very important to communicate the fact that these things aren't certain. Data visualisation is a science in some respects, but you also have to consider how people actually consume information and how they might interpret it, for good and bad. So, as I've been going on about, the common thread of all these things is ggplot2. ggplot2 is a package in R, which you already used yesterday.
So you must be reasonably familiar with the name, at least. It's part of the tidyverse, which again you used yesterday, and there are a few reasons why that's good. The key reason ggplot2 is useful for working in R in general is that it's compatible with the tidyverse: a lot of the skills and intuition you used yesterday carry through to ggplot2 as well, because they're literally designed to work together. Another reason ggplot2 is cool, in a way that's not specific to R, is that the "gg" in ggplot stands for the grammar of graphics. In this workshop we're interested in learning the code behind ggplot2, because you have data that you want to visualise. But data visualisation is a whole field in itself. A lot of research has gone into how people consume charts, which chart designs are most appropriate for certain things, and which colour palettes are most appropriate for certain things or certain people, because a lot of people have colour vision deficiencies such as colour blindness. Which charts can be misleading? Which charts can be inaccurate, completely unintentionally? A lot of this research informs the grammar of graphics, the grammar of graphics being a framework for creating data visualisations. So just by using ggplot2 and following the framework and the code that the package gives you, you're already steering yourself towards the grammar of graphics, and towards creating visuals that will probably be appropriate for what you're trying to do.
It doesn't mean you can do whatever you want in ggplot2 and automatically create a fantastic visual, but it does give you a structure for creating visualisations that are consistent with the grammar of graphics and with a lot of the research on data visualisation. I don't want to say it cuts corners, because there's still a lot to think about, but it definitely structures your thinking around creating visualisations, and it gives you a good framework to start from, which is the grammar of graphics. You can read up on the grammar of graphics, and I would recommend it, because it's just one paper, by Hadley Wickham, who created ggplot. I can't remember when it came out, but I'd recommend briefly going through it to familiarise yourself with what the grammar of graphics is, because it's quite useful to know about. I'll cover the grammar of graphics a bit more in a moment. Let me just check: does anyone have any questions about what I've spoken about so far? I know I haven't really spoken about R much yet. OK, Andy has put some links up for Ian at Essex Police. I'd recommend looking at those; there's some really interesting stuff there, and there are other people in police forces in the UK who use R for this kind of thing. A lot of public sector bodies are notoriously behind in IT-related stuff, but I think a lot of people in the police are actually quite forward-thinking about it, so that's definitely worth looking into. I always love looking at practical examples of what people are doing, because it's inspiring in general: it's either motivating or it gives you ideas for your own work. OK, so the grammar of graphics. It's not 11 yet, right? Yeah, OK, so the grammar of graphics.
ggplot2 is a package for creating graphics in R based on the grammar of graphics. I've actually already put a link there: that's the paper by Hadley Wickham, "A Layered Grammar of Graphics", in the Journal of Computational and Graphical Statistics, 2010, and there's a preprint of it there. I do recommend you take a look; it's quite interesting. A fundamental component of the grammar of graphics is that graphics are made up of layers, and at first this idea might not be that intuitive. If you're used to creating graphics in Excel or SPSS, for example, the idea of layers might not be intuitive, but I think you have to force yourself into the mindset of using layers, and hopefully you'll quickly realise it can be quite an intuitive way to think about graphics. It's a very useful way of structuring your thinking when creating a graphic. I'll go through these in turn, but the first layer of the graphic is the data: you have a data frame of rows and columns, and that's what you're going to visualise. Then you have the aesthetics, which are the variables you want to visualise. On a basic level, that might just be the x and the y; for a scatter plot, for example, it's the x and the y. Then the geometries are the shapes you're going to use to convey that information. There are more layers than this, but these are the fundamental three that we're going to focus on today. If that seems abstract now, hopefully it will become clearer when we go through the code. I'm going to give you an example of using ggplot within the slides, and before 11, I promise, I will actually open RStudio and show you how to do this. But first I'll go through this example quickly within the slides.
So here we have a very simple data frame. We've got three variables, called var1, var2 and var3, and five rows of information. var1 and var2 are just numbers; that could be age and income or something. var3 is what you might call a categorical variable, or what's called a factor in R, with values like AA and BB. That could be gender or something like that. So you might ask: what's the relationship between variable 1 and variable 2? Going back to our layers, the first layer is already done: we know the data is df1. In R, the function for specifying that first layer is ggplot. All we do is write ggplot(data = df1), df1 being the name of the object in the R environment. As you did yesterday, when you loaded in data you assigned it to an object called my_data or whatever. In this scenario, df1 is that data frame of three columns and five rows. When you run that, you just get a completely blank space. Then slowly, and this will make more sense when I go into RStudio, you build up the layers. You've laid down the data layer; next you move on to the aesthetics, which are the variables, in this case var1 and var2. So you add to the ggplot function: you lay down the first layer with data = df1, and then you map var1 and var2 to the aesthetics. The mapping concept might seem slightly odd, but basically you have various aesthetics; for now, let's just say the x and the y. You can think of the mapping as pulling out the variable you want and sticking it on the x or the y axis. So the mapping is the aesthetics. Here we say mapping = aes(x = var1, y = var2): x we want to be variable 1 and y we want to be variable 2.
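The first two steps can be sketched in R like this. The slide's actual values aren't given in this transcript, so df1 below is a hypothetical stand-in with the same shape: two numeric columns, one categorical column with values AA and BB, and five rows.

```r
library(ggplot2)

# Hypothetical stand-in for the slide's df1 (values invented):
# var1 and var2 are numeric, var3 is categorical
df1 <- data.frame(
  var1 = c(3, 5, 7, 9, 12),
  var2 = c(6, 9, 11, 14, 17),
  var3 = c("AA", "AA", "BB", "BB", "BB")
)

# Step 1, data layer only: this draws a blank panel
p_blank <- ggplot(data = df1)

# Step 2, data plus aesthetics: var1 is mapped to x and var2 to y.
# The axes appear, but no geometry has been added, so nothing is drawn on them.
p_axes <- ggplot(data = df1, mapping = aes(x = var1, y = var2))

print(p_blank)
print(p_axes)
```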
When you execute that code, you still haven't actually drawn anything yet, but this basic graphic will appear in the plot window. You can see it's mapped var1 from df1 and stuck it on the x axis, and mapped var2 from df1 and stuck it on the y axis. All it's done is plot the extent of those two variables. The maximum of var1 is 12, so it's automatically made the x axis go up to roughly 12; it probably actually goes a little past that. And the y axis goes up to around 17. But again, we haven't actually drawn anything yet. So we then have to think about the final layer, which is the geometry. We know our visual is going to have this x and y axis, but what shapes are we going to use to portray the relationship between variable 1 and variable 2? Intuitively, many people would say a scatter plot. In ggplot, we refer to a scatter plot as a point. ggplot has loads of different geometries, and they're always geom_ followed by something; a scatter plot is geom_point. When we add that, it's already got the mappings of x and y laid out, and you're saying: the shape I want to represent the relationship between variable 1 and variable 2 is the point geometry. And you can see it's plotted the points. We'll go through this with the crime data later on, but that's the general idea. What becomes more interesting and more exciting is when you look at aesthetics. We've dealt with an x and a y aesthetic, but you can also add more. If you wanted to look at, for example, how variable 3 factors into this relationship, you can use various additional aesthetics.
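Adding the geometry is one more layer, joined on with +. In this sketch (using the same hypothetical df1 with invented values, redefined so the block runs on its own), the plot from the aesthetics step is stored in an object and the point layer is added to it, which mirrors the layer-by-layer build described above.

```r
library(ggplot2)

# Hypothetical df1 (values invented for illustration)
df1 <- data.frame(
  var1 = c(3, 5, 7, 9, 12),
  var2 = c(6, 9, 11, 14, 17),
  var3 = c("AA", "AA", "BB", "BB", "BB")
)

# Data and aesthetics: axes only
p_axes <- ggplot(data = df1, mapping = aes(x = var1, y = var2))

# Geometry: add a point layer on top, giving a scatter plot.
# This returns a new plot object; p_axes itself is unchanged.
p_points <- p_axes + geom_point()

print(p_points)
```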
We've already used an X and a Y, but you can also colour things according to variables. You can fill in the colour of things according to variables. You can change their shape. You can change their size. You can alpha them, if that's a word, alpha basically meaning making things transparent. Or you can change, for example, the line type, so a solid line or a dotted line or whatever it might be. There are loads of different aesthetics; these are just some examples. So if you go back to that data frame and think, OK, how would we represent variable 3 in the scatter plot? You look at these aesthetics and I would probably say, OK, we could colour each point according to variable 3. It might not feel simple at first, but you have the mapping of the X and the Y, and then you just add this additional aesthetic of colour and map it to variable 3. So again, you pull out the information for variable 3, you stick it into the colour aesthetic, and the points are coloured according to that variable. That concept is applicable to basically any type of data. The more you go through these exercises, and the more you use different aesthetics with different types of variables, the more you will begin to think in this layered way: adding the data, mapping the variables you're actually interested in to particular aesthetics (the X, the Y and others), and then choosing which geometry, which shape, you want to represent the data with. With that being said, you can add other things. For example, here I used (I don't know why I'm pointing at my screen, you can't see it) the colour aesthetic, but you can also use shape.
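Mapping var3 to colour is one extra argument inside aes(). This sketch again uses invented toy values; ggplot2 picks the two colours and builds the legend automatically.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
df1 <- data.frame(
  var1 = c(2, 5, 7, 9, 12),
  var2 = c(4, 9, 11, 14, 17),
  var3 = c("a", "b", "a", "b", "a")
)

# Extra aesthetic: colour each point by the categorical variable var3.
# Because colour is mapped inside aes(), a legend is added for it
p_colour <- ggplot(data = df1,
                   mapping = aes(x = var1, y = var2, colour = var3)) +
  geom_point()

print(p_colour)
```

If you write `colour = "red"` outside aes() (inside geom_point()), every point gets that fixed colour and no legend appears. Inside aes(), colour is tied to a variable instead.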
So if you look at the difference between these two screens at the top, all I've done is change colour to shape. That means rather than colouring the points by variable 3, I'm shaping the points by variable 3. By default, it chooses a circle and a triangle if you've only got two different categories. It updates the legend for you: all the a's will be circles and all the b's will be triangles. And the great thing about using the grammar of graphics and the layers is that there are loads of different geometries. We've dealt with geom_point, but there's geom_bar for a bar chart, geom_density for basically a smoothed histogram, and geom_smooth, which draws a line of best fit. There are dozens and dozens of different geometries, and the structure is the same. All you have to do is get your head around the example I just gave about mapping variables to different aesthetics and then choosing your geometry. This will be challenging for some people, but you will get it eventually. And the same structure applies to basically every single visualisation in ggplot2. So these here are just random examples. We've got a density plot; we've got a bar plot, but one that's been filled in by a variable as well; we have a scatter plot, which if you remember is geom_point; and then we have a map as well. All these visualisations were created with basically the same code. We're talking about one or two words being changed between these bits of code, and they create drastically different visualisations, which is why ggplot2 is so useful and why it's worth investing time in understanding how to use it, even in a rudimentary way.
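To show how little changes between geometries, here is a sketch with the same invented toy data: swapping colour for shape, then swapping the geometry for a density plot and a filled bar chart. The specific plots are illustrative choices, not the ones on the slide.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration)
df1 <- data.frame(
  var1 = c(2, 5, 7, 9, 12),
  var2 = c(4, 9, 11, 14, 17),
  var3 = c("a", "b", "a", "b", "a")
)

# colour -> shape: by default the two categories get a circle and a triangle
p_shape <- ggplot(df1, aes(x = var1, y = var2, shape = var3)) +
  geom_point()

# Same structure, different geometry: a smoothed distribution of var1
p_density <- ggplot(df1, aes(x = var1)) +
  geom_density()

# Bar chart of how many rows fall in each category, filled by var3
p_bar <- ggplot(df1, aes(x = var3, fill = var3)) +
  geom_bar()

print(p_shape)
print(p_density)
print(p_bar)
```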
Because just by altering a few lines of code, you can create a hugely diverse array of visualisations, and these visualisations will be consistent with the grammar of graphics, an established and respected framework for creating visualisations. So hopefully that at least persuades you that it's a cool and very useful thing to use. What I'll do now is give you a very quick example, in about 10 minutes, of applying those skills to the data you're going to be working with. I won't show the worksheet, because you'll be going to it shortly, but this is all completely transferable to what you're about to do. I'm going to load in a CSV file (like an Excel spreadsheet); you already have access to it on the Dropbox, and you'll be doing this yourselves in a minute. I'll load in the data and then apply the same idea of data, aesthetics and geometry to a real-life data set. First I'll load the packages I know will be needed, and to try to make it clearer, I'll add comments. I might zoom in a bit, actually; if anyone can't see the font, please let me know. So I'll begin by loading the packages. I know I'm going to use a package called readr, which you might have used yesterday; it's part of the tidyverse and is basically used for loading in data. So I'm going to use readr, and I know I'm going to use ggplot2. The first thing I do is load those packages. You might have to install them if you haven't already, but you probably have them. Then I'm going to load the data. I'm going to assign the data to an object called burglary_df, which is just my way of naming objects.
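A minimal sketch of the loading step, assuming readr and ggplot2 are installed. The real call would point read_csv() at the workshop CSV (the file name in the comment is taken from the talk, and the path is assumed). So that the example runs anywhere, it writes a tiny made-up CSV to a temporary file instead. The column names lsoa, burglary_count and income_score are hypothetical and may not match the real file.

```r
library(readr)
library(ggplot2)

# With the workshop data you would read the CSV from the Dropbox folder,
# e.g. (file name and path assumed):
# burglary_df <- read_csv("GMP_17.csv")

# Self-contained stand-in: a tiny, made-up CSV written to a temporary file.
# Column names and values are hypothetical
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "lsoa,burglary_count,income_score",
  "area_1,3,0.04",
  "area_2,7,0.09",
  "area_3,15,0.15",
  "area_4,22,0.21",
  "area_5,40,0.32"
), csv_path)

# read_csv() returns a tibble and prints the column types it guessed,
# which is a handy reminder of the variable names
burglary_df <- read_csv(csv_path)

# In RStudio, View(burglary_df) opens a spreadsheet-style viewer;
# print() works everywhere
print(burglary_df)
```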
I tend to put an underscore and then the type of object it is. Because I know it's going to be a data frame, I call it burglary_df, since it's burglary data that you're going to use. And I'll use read_csv to pull in the actual data. I think the CSV you're going to use first is called GMP_17.csv. So you can see it's loaded in. I'll then view this data just to show you what's actually in it. It's at LSOA level, LSOA being a neighbourhood unit in England and Wales, so these are basically neighbourhoods in Greater Manchester. For each LSOA we have a burglary count, and I think it's the burglary count for the whole of 2017. We have the local authority that the LSOA is in. We have the IMD score, IMD being the Index of Multiple Deprivation, so that's basically how deprived the LSOA is. We have its rank in the whole of England. We have the decile of how deprived it is. And then we have the income score, the income score being a measure of income deprivation. I'm pretty sure that, counterintuitively, the higher the score, the higher the income deprivation, so the more deprived the area is. So in this data set, and you're going to do this a bit later, one of the first things I'd be interested in, just as an example, is the relationship between income deprivation and burglary. We have about 1,600 neighbourhood units in Greater Manchester, and we have the burglary count and the income score for each one. So basically I'd like to know: what is the relationship between income deprivation and burglary? In other words, are areas with higher income deprivation more likely to have high burglary counts? Are they more likely to suffer from burglary victimisation?
Or, and I think it could be argued both ways, are areas with less income deprivation, so wealthier areas, more likely to be burgled? Perhaps because they're more attractive targets if you have a big house and lots of fancy things. So, to look at the relationship between those two variables, I think you could describe them both as continuous variables. You've got a burglary count, which ranges from something like 1 to 236, and then the income score, which I think is a percentage, but it's clearly continuous: there are lots and lots of different values. This is where we want a scatter plot again. So you can already begin thinking. You know what the data is, because it's burglary_df, and that's going to be the first layer you input into your ggplot code. Then you think, OK, the next layer is aesthetics. Remember, with aesthetics you're thinking, number one, what variables am I interested in? And number two, which specific aesthetics (x, y, colour, fill, whatever it is) am I going to map those variables to? In this case, because we've decided to do a scatter plot, we know we're just going to be using x and y. So the data is going to be burglary_df. The x, intuitively, will be the income score and the y will be the burglary count, just because people typically put the dependent variable on the y axis. So we know beforehand what our aesthetics will be, and because we kind of cheated a little already, we know the geometry for a scatter plot is geom_point. So we can go through this again. I'll add a comment saying, I don't know, scatterplot one, and, as per the slides and the framework for creating ggplot code, you write the function ggplot and say data equals burglary_df. Just as before, if I now run this, it does what it did in the slideshow for that mini data frame. It doesn't draw anything, but we have laid down the first layer.
I like to think of it as laying out the canvas. When you say ggplot data equals something, you're laying out the canvas for what you're about to do, because the plot window has been activated: it's gone grey and it's ready to receive whatever information you want to give it. So we've laid out our data; next we want the aesthetics. Just as we did before, you say mapping equals aes, and the aesthetics we're going to use are x and y. So x equals income score, y equals burglary count. If you look in the console on the bottom left after running read_csv, it tells you which variables were loaded in. That's how I knew what they were called; I haven't got that good a memory. So if I now run this, you're executing the first two layers, the data and the aesthetics. And it's done what we expected based on the little example I gave: it's laid down the x and the y. It's pulled out the x, which is income score, this variable here on the right-hand side, and mapped all those values of income score onto the x axis. Then it's pulled out all the values of burglary count and mapped them onto the y axis. But we're still lacking the third and final layer, which is the geometry. So at this point, you add a little plus to the end. The plus is how you add new layers. After the break, unless you're sick of me talking, I'll do another example where I add lots and lots of different layers to the ggplot, and you'll notice that each time I do, I put a plus sign at the end, which is hopefully quite intuitive because you're adding something new to the plot. So I've laid down the data and the aesthetics. Then all I need to say is geom_point.
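The first two layers on the burglary data might look like this. It is a sketch with a small hypothetical stand-in for burglary_df: the values are made up and the column names income_score and burglary_count are assumptions, so check the real names in the read_csv() output.

```r
library(ggplot2)

# Hypothetical stand-in for burglary_df (made-up values, assumed column names)
burglary_df <- data.frame(
  burglary_count = c(3, 7, 15, 22, 40),
  income_score   = c(0.04, 0.09, 0.15, 0.21, 0.32)
)

# scatterplot one, layer 1: lay out the blank canvas
p_canvas <- ggplot(data = burglary_df)

# Layers 1 + 2: income score on x, burglary count (the dependent variable) on y.
# This draws the axes, but still no points
p_axes <- ggplot(data = burglary_df,
                 mapping = aes(x = income_score, y = burglary_count))

print(p_axes)
```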
And if I just type geom and don't do anything else, it will show you all the different geometries that are available. Some of these you'll be familiar with: geom_boxplot is a boxplot, as you can imagine. If I keep going down, there's geom_density, which we dealt with, for looking at distributions; error bars for displaying uncertainty; histograms; a standard line graph; and lots of others. But the one I want is geom_point. If I go to geom_point, you can see it says the point geom is used to create scatterplots. If I click that, it fills it in for me, and then I can run this whole bit of code. And it's done exactly what we did in the presentation with var 1 and var 2: it's pulled out the x, income score, and the y, burglary count, and it's portraying that information using the point geometry.
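All three layers together, again as a sketch on a hypothetical stand-in for burglary_df (made-up values, assumed column names):

```r
library(ggplot2)

# Hypothetical stand-in for burglary_df (made-up values, assumed column names)
burglary_df <- data.frame(
  burglary_count = c(3, 7, 15, 22, 40),
  income_score   = c(0.04, 0.09, 0.15, 0.21, 0.32)
)

# Data, aesthetics and geometry. The + sits at the END of the line,
# which tells R the command continues on the next line
p_scatter <- ggplot(data = burglary_df,
                    mapping = aes(x = income_score, y = burglary_count)) +
  geom_point()

print(p_scatter)
```

The position of the plus matters. If it were moved to the start of the `geom_point()` line, R would treat the first line as a finished command (printing just the empty axes) and then raise an error on the dangling `+ geom_point()`.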