My name is Dr. Julia Kazmaier. Please ask me any questions you have at any time. Oh, it's misspelled my name; of course it has.
Okay, so first I want to talk a little bit about what crime data we are using today. As I mentioned before, there is the British Crime Survey 2007-2008, as it was called at the time; it has since become the Crime Survey for England and Wales. I'm using the unrestricted access teaching data set available from the UK Data Service. I'm also using criminal justice statistics and sexual offenses in England and Wales. I got this data set through the UK Data Service as well, and I'm using a subset of it. I'll also be using some UK population estimates that I got through the Office for National Statistics; again, only a subset of that. And I will be using some national statistics on police and Border Force seizures of drugs in England and Wales. They put out data tables every year about different kinds of drug seizures, and I'll be using a subset of those statistics.
Rather than use the entire data sets, I've used subsets, and I have manipulated them in a couple of different ways to make them applicable to the questions we're looking at today. All of these data sets are fully available online with no restrictions; you can just go to these sites and get them. The full references, including links to the websites where you can get the full data sets, are available in the folder for day one.
But now the real question: what is the tidyverse? It describes itself as an opinionated collection of R data science packages for data exploration, manipulation and visualization. Its key features are that it uses a structure called a tibble, it uses tidyr and dplyr for data manipulation, and it uses ggplot2 for visualization. We will get into all of this, so don't worry if that sounds like a bunch of gobbledygook. Let's start off with the grammar.
What is a tidyverse grammar? It's a set of functions that accept and return tibbles, tibbles being the structure used within the tidyverse. You can string functions together with what is known as a pipe. It is written percent sign, greater than, percent sign (%>%). Rather than type that out (it's only three keys, but you do need to hit shift each time), there's a shortcut in RStudio, which is Ctrl+Shift+M. That'll pop you down a pipe right away and make it a little bit easier.
The goal of tidyverse grammar is readability and order, and I'll show you an example of what that means. So let's compare the following. A non-tidyverse set of functions looks like this. That looks really terrible, but it is essentially the same as this, which is to say you take a data frame called my data frame, you pass it to a function called group by, which groups it by user ID, date and category ID, and that passes the result to summarize, which counts. So you can see, looking at these two things, that they accomplish exactly the same, but it's much easier to read the tidyverse version. At least I think it is. I assume you also think it is. Tell me if you don't.
So let's look a little bit at tidyverse structure and function. Tidy is about shape, which describes the structure or organization of a data set. It is not about the quality of that data set. Any given data can be made tidy or not by restructuring it. And tidyverse functions work best when data is in tidy tibbles. So what exactly are we talking about with these tibbles? We'll come to that. Data may, oh, that's a grammatical error on the slide; let me start over. Data may need restructuring before you can effectively use tidyverse functions on it.
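The comparison described above (a nested call versus a piped one) can be sketched roughly as follows. This is only an illustration: the slide itself is not in the transcript, so the contents of my_data_frame are hypothetical, and the use of n() inside summarise() is an assumption about what "summarize count" looked like.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for "my data frame" from the talk
my_data_frame <- tibble(
  user_id     = c(1, 1, 1, 2),
  date        = c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-01"),
  category_id = c("a", "a", "a", "b")
)

# Non-piped version: functions are nested and must be read inside out
nested <- summarise(
  group_by(my_data_frame, user_id, date, category_id),
  count = n(),
  .groups = "drop"
)

# Piped version: reads top to bottom in the order the steps happen
piped <- my_data_frame %>%
  group_by(user_id, date, category_id) %>%
  summarise(count = n(), .groups = "drop")

# Both approaches should give the same result
identical(nested, piped)
```

The piped form puts the data first and each step on its own line, which is the readability and order the grammar is aiming for.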
So because tidyverse functions work best when data is in tidy tibbles, if your data is not in a tidy tibble, it will need a little bit of restructuring before you can expect the functions to work well. Well, that one missing word really threw me for a loop.
Let's talk about what shape tidy data is in. Each observation has its own row. Each variable has its own column. And each value has its own cell. That might seem obvious, of course, if we're talking about grids and tables. Surely all data should be tidy, right? Well, let's look a little bit critically. Here we have a small table, incredibly small, possibly too small to be worth talking about, but this is just an illustrative example. Does each observation have its own row? Again, there's a typo; I really should have gone over this once more. Does each variable have its own column? And does each value have its own cell?
Feel free to write to me in the chat, or you can turn on your microphone if you're happy for your voice to be recorded. Tell me, is this tidy or not? And if it is not, which of these features of tidy data does it not meet? I'll give people just a moment to think about that, maybe to write in answers or suggestions. I will, of course, tell you the answer, so you can just wait, but it is good to challenge yourself to come up with the answer before I give it to you. As before, the deafening silence is ringing out in the chat. So I suppose I'll just... ah, yes, we have a suggestion. The date, yes. In this particular instance, these are all the same date, but you're right. If we split it into a column for day, a column for month and a column for year, that would be useful if we wanted to group things by day, month or year, for example. So you have picked up on the date. There is another issue with this data that I will show you now, and it is that each row was not its own observation.
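As a rough illustration of the issue just raised, here is a minimal sketch of a table where each row holds two observations, and one way to restructure it so each observation gets its own row. The table on the slide is not in the transcript, so the countries and numbers below are made up, and the use of tidyr's pivot_longer() is an assumption about how one might do the restructuring rather than what was shown.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical wide table: one row per country, but two observations per row
wide <- tibble(
  country     = c("A", "B"),
  incidents   = c(10, 20),
  resolutions = c(4, 15)
)

# Restructure so that each observation (one count of one type) has its own row
long <- wide %>%
  pivot_longer(
    cols      = c(incidents, resolutions),
    names_to  = "type",   # hypothetical name for the new "kind of observation" column
    values_to = "count"   # hypothetical name for the new value column
  )

long
```

The long version repeats each country once per type, which is harder on the human eye but matches the one-observation-per-row shape described above.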
If you look back on this, you'll see that each row is a country, but that there are two observations, one for incidents and one for resolutions. If we swap that out for this, we see that each country now has more than one row, because each type of observation now has its own row. Essentially, this has taken a wide data set and made it longer by replicating the country and splitting out the incidents and resolutions by type. Yeah, Julia Demigliel Velasquez has pointed out that each incident could have its own row. Yeah. Right, so let's move on.
This is a little bit hard to wrap your head around. You might think, well, this is less easy to read for people, but it is easier to read for the tidyverse. How we as people use data in tables is not exactly the way the tidyverse works best with data, and so that's a little bit of a shift in mindset.
But let's carry on a little bit more about tibbles and how they're different from classic data frames. When you print a tibble, it specifies the type of each column. This is not true of classic data frames. It only shows you 10 rows unless it is told otherwise, whereas classic data frames will just print them all. And it fits the columns to the display, whereas classic data frames will go off to the right-hand side and you have to scroll sideways to see all the columns. This is pretty unremarkable stuff. Most people probably don't print a whole lot of data frames in RStudio, but it's good to know that it's different.
Another important difference is how you subset the data within a tibble. You can subset a column as a value using the name of the tibble (that's this part here), the dollar sign, and then the name of the column. You can also do it with the name of the tibble, double square brackets, and the column name in quotes, or the name of the tibble, double square brackets, and the number of the column. And in R, there's no zero column.
It starts with one. You can also subset a column as a variable. You will notice in this case there is only one set of square brackets rather than two. And you can subset a row as a variable, again using one set of square brackets. The comma is different: if you want a column, it's comma and then number, and if you want a row, it's number and then comma. This tells the tidyverse which direction it's subsetting in. But again, it counts from one. Something else you can do with a tibble is embed subsetted variables and values in a pipeline using the pipe and the full stop, so you can pass things around between functions and between data frames. Don't worry about this too much. This kind of thing is all in the cheat sheet, and you may or may not find these shortcuts useful. Maybe you prefer to go the long way round, saving the variables as separate entities and then using them. That's fine, because it can help to do things the long way round, at least at first, so that you really understand the processes.
So we've talked a lot about what tibbles are. How do we create a tibble? Well, if you have an existing data frame, you can smash it into a tibble by creating a new name for your tibble and using the assignment operator in R, which is less than, dash (<-). You use the as_tibble function, open the parentheses, and put the non-tibble data frame here. What this does is take this object, the non-tibble data frame, apply this function to it, and save the result under this name. You can also create a new tibble from existing columns using this function. You'll note that this time it is just tibble; there's no as underscore. But column A, column B, and column N have to already exist in your data environment.
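The subsetting and creation steps described above can be sketched roughly like this. All object and column names here (non_tibble_df, my_tibble, column_a, column_b, country, incidents) are hypothetical placeholders rather than names from the workshop materials.

```r
library(dplyr)
library(tibble)

# A hypothetical classic (non-tibble) data frame
non_tibble_df <- data.frame(
  country   = c("A", "B", "C"),
  incidents = c(10, 20, 30)
)

# Convert an existing data frame and save the result under a new name
my_tibble <- as_tibble(non_tibble_df)

# Build a new tibble from vectors that already exist in the environment
column_a <- c(1, 2, 3)
column_b <- c("x", "y", "z")
new_tibble <- tibble(column_a, column_b)

# Subset a column as a value (a plain vector): three equivalent forms
by_dollar <- my_tibble$incidents
by_name   <- my_tibble[["incidents"]]
by_number <- my_tibble[[2]]          # columns are counted from 1

# Single square brackets keep the result as a tibble
one_column <- my_tibble[, 2]         # comma then number: a column
one_row    <- my_tibble[1, ]         # number then comma: a row

# Inside a pipeline, the full stop stands in for the piped tibble
via_pipe <- my_tibble %>% .$incidents
```

The double-bracket and dollar-sign forms hand back the bare values, while the single-bracket forms hand back a smaller tibble, which matters when the next function expects one or the other.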
And finally, my personal favorite is that when you're importing data, you can just pass it through the as_tibble function. When the data is imported with the read function, in this case read CSV, but it could be read table or read text as you like, it is passed immediately to the as_tibble function, and the whole thing is saved under your tibble name. So we'll be playing around with that a little bit.
Once you have the tibble, you can add columns to it. Again, you have a column variable. The original tibble name is piped to the add_column function along with the column variable, and the whole thing is saved under a new data name. This will come in handy. You will actually need this for the workshop exercises today, but don't worry; this function is in the cheat sheets. You can also add rows to an existing tibble in much the same way, but with add_row instead of add_column.
So we have arrived at our first set of exercises. What I want you to do is open up, if you don't already have it open, the day one tidyverse worksheet and work through the first group of exercises on tibbles. Stop when you get to this, oops, stop when you get to this symbol, this sort of line break that I've added. The next set is the tidy exercises. I want you to know that fully functional code for each question in these exercises is written out in a separate and appropriately named R Markdown file. It does contain all the answers, so if you really cannot be bothered, you can open that up and just look at the answers. But I think you will learn more effectively if you try them yourself, make a bunch of terrible mistakes and then look at the answers; you're more likely to actually remember the steps and the processes. Also, the cheat sheets are available. There are some on tibbles, for example, on how you create, how you add, how you subset, all of these things. Thank you.
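For reference when working through the exercises, the import-then-extend steps described above might look something like the sketch below. The CSV file is a small temporary file written on the spot, and every name in it (csv_path, my_tibble, resolutions, with_column, with_row) is hypothetical; it is not the workshop data. As far as I know, readr's read_csv() already returns a tibble, so the as_tibble() step here mainly mirrors the pattern from the talk.

```r
library(dplyr)
library(readr)
library(tibble)

# Write a tiny hypothetical CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,incidents", "A,10", "B,20"), csv_path)

# Import the data and pass it straight to as_tibble()
my_tibble <- suppressMessages(read_csv(csv_path)) %>%
  as_tibble()

# Add a column from an existing vector; the result is saved under a new name
resolutions <- c(4, 15)
with_column <- my_tibble %>%
  add_column(resolutions = resolutions)

# Add a row in much the same way, giving a value for each column
with_row <- with_column %>%
  add_row(country = "C", incidents = 30, resolutions = 21)

with_row
```

Saving each step under a new name, as here, is the long way round mentioned earlier; the steps could equally be chained into a single pipeline.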