Well, first of all, thank you very much, Julian and Sam, for having me here today. I'm sure all of you have learned a lot with both of them. I know Sam very well and he always explains everything super, super clearly. And he's also a very good coder, much better than me. So you're very lucky to have them both. What I'm doing here today is basically showing an exemplar presentation, an exemplar study of how R can be used for different research purposes in criminology, and also making a critical point about the use of police data, for instance, to explore crime-related research questions.

For the last couple of years, I've been doing some research about how police recorded crime data, which I guess you have used or will use soon, is affected by different sources of measurement error and bias, and why we need to be aware that these problems of measurement error can affect our understanding of crime: they can affect the theories that we construct when using crime data, but also our understanding of the geography of crime when we use police data to do crime mapping, for instance. So although police data provide very valuable information and are very easy to access (for instance, in the UK we have APIs and open access data that we can just download and put on a map), we need to be aware that this type of data is affected by some sources of bias and some sources of measurement error that can affect our results. So that's what I'm trying to explore here.

The title of the presentation is "Assessing the impact of measurement error in crime data", and I present two applications that we have done when trying to understand how the measurement error that affects crime data affects, in one case, the results of regression models when we use police recorded crime data as a dependent or independent variable.
And the other one, how it affects the geography of crime and crime mapping when we use police data for geographic crime analysis. This presentation has been co-authored by many people from different universities. One of the papers presented has also been co-authored by Sam Langton and others.

So the overview of the presentation is: I will explain a little bit what I mean by bias in crime data, why we say that police recorded crime statistics are affected by measurement error and which sources of measurement error are involved in this type of problem, how it affects crime analysis in practice, but also in theory and in research, the two research questions that we will be addressing today, and how we have addressed those questions using simulation studies.

So let's start from the beginning. What are crime statistics? Crime statistics recorded by the police are any crimes that have been reported to or recorded by police forces and that they make public for anyone to research and analyze for different types of purposes. We know that police recorded crime data are used by police forces to design and evaluate policing strategies; for instance, they decide where to locate more police patrols based on known police records. They are also used for policymaking: the police design and evaluate crime prevention policies based on police statistics. And they are used by academics: researchers make daily use of police recorded crime data to develop and test theories of crime and deviance. So if we are using this type of data for so many different purposes, we need to be aware of all the types of measurement error that affect these data and how they can affect each of those different outputs.

We know that police statistics are affected by different sources of measurement error. To mention only three of them: first, not all victims are equally willing to report crimes to the police.
We know that this changes depending on the gender of the victim, the sex of the victim, ethnicity, age, and victims' perceptions of the police. So we know that victims who do not trust the police will report their victimizations less frequently. But we also know that the police don't control every single area equally. The police locate more patrols in some areas than others, and therefore they are more likely to witness crimes when they happen in some areas than in others. And also counting rules. We know that, for example, in England and Wales there are now common counting rules that should be used by all police forces, but that's not always the case. We have several reports published by HMIC saying that some police forces follow different counting rules than others. And therefore measurement error can affect crime data differently in some police forces than in others.

All this affects what we call the dark figure of crime, which is all crime that is not recorded in official crime statistics. However, this doesn't necessarily need to be a problem if all areas, all victims, or all local authorities, whatever level of analysis we are using, suffer the same amount of missing data. If the likelihood of a crime victimization being reported to the police is the same for every single individual in society, then none of our analyses will be affected by measurement error. But if some victims report more than others, or some areas report more than others, or some regions report more than others, then all our analyses are likely to be affected by these differential sources of measurement error affecting some places or some individuals more than others. So that's what we try to address here.
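The contrast between equal and differential under-reporting can be illustrated with a minimal sketch. It is not the simulation used in the papers: the number of areas, the distribution of true crime, and the reporting rates (a flat 40% versus rates spread between 20% and 60%) are all made-up assumptions, and the helper names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation, written out to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def top_k(values, k):
    """Indices of the k areas with the highest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def simulate_reporting(n_areas=1000, seed=1):
    rng = random.Random(seed)
    # Hypothetical "true" crime per area (right-skewed, an assumption).
    true_crime = [rng.lognormvariate(3.0, 0.5) for _ in range(n_areas)]
    # Scenario A: every area has the same 40% of its crimes recorded.
    uniform = [0.4 * c for c in true_crime]
    # Scenario B: the recorded share differs by area (20% to 60%).
    differential = [rng.uniform(0.2, 0.6) * c for c in true_crime]
    k = n_areas // 10  # treat the top 10% of areas as "hotspots"
    hot = top_k(true_crime, k)
    return {
        "corr_uniform": pearson(true_crime, uniform),
        "corr_differential": pearson(true_crime, differential),
        "hotspots_kept_uniform": len(hot & top_k(uniform, k)) / k,
        "hotspots_kept_differential": len(hot & top_k(differential, k)) / k,
    }


result = simulate_reporting()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under equal under-reporting the recorded counts are just a rescaled copy of the true counts, so the correlation with true crime and the set of hotspots are preserved. Once the recorded share differs between areas, both degrade, even though the average share recorded is the same in the two scenarios.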
How does this differential measurement error, affecting some individuals or victims more than others and some areas more than others, affect our understanding of crime? So this is a very easy to understand representation of the dark figure of crime. The dark figure of crime is all crimes that are not detected, or not considered as such by victims, or not reported to the police. Official data would include different sources of data, but here we are exploring measurement error in one of them, which is police recorded crime data.

Not only that, but during the last few years in criminology there's been a move towards analyzing smaller and smaller levels of analysis. We know that in the 40s, 50s, and 60s, criminologists started doing maps of crime at the neighborhood level, for instance. But then we also realized that there were features of the environment that were generators and attractors of crime, and these were located in very specific geographical places: it can be a bar, it can be a pub, it can be a club. And we know that around those places crime increases. So we started doing maps at very small levels of analysis to understand which features of the urban landscape affect crime more than others. We call them microplaces, and we started doing maps of crime at the level of microplaces. I guess you have already talked about this or you will talk about this tomorrow.

What we try to do here is understand whether this measurement error that we were talking about has more effect on maps at large geographical scales or on maps at smaller geographical scales. Our hypothesis, which we will address later, is that maps produced at very low-level geographical scales, such as microplaces, which can be street segments, streets, or different types of very small grids, are affected by more bias and measurement error than maps produced at larger geographical scales. And why do we think that's the case?
Because communities are much more homogeneous at smaller levels of analysis. For instance, in one street you can have 80% of the population who do not trust the police, and in the next street this proportion can go down to 40%. Therefore, the likelihood of crimes being reported to the police will change very much between microplaces. If we choose larger levels of analysis, these proportions will merge together, and the characteristics of communities will become more similar across areas. That's the type of question we're trying to address here. So we have hypothesized that the dark figure of crime may vary very much between microplaces. I guess you will use this type of data source tomorrow: in the UK we have data.police.uk, where anyone can access and download police data and produce maps at very low levels of analysis. But we don't know how measurement error affects our research output when we do that.

So we are addressing two specific research questions. One is: are crime maps produced at small, very socially homogeneous spatial scales at a larger risk of bias compared to maps produced at larger, more socially heterogeneous scales? This research question was addressed in a paper that I did with Angelo Moretti and Sam Langton, and it should be published soon in the Journal of Experimental Criminology, but it's still in press. And the second one is: to what extent does the measurement error that affects police recorded crime data impact the estimates of regression models exploring the causes and consequences of crime? This is another paper, now under review, that I co-authored with Jose Pina-Sánchez, Brunton-Smith and Alex.

So let's start with the first research question, where we were trying to understand... my face is in the middle of the results. How is the geographical scale we use for our output when we do crime mapping affected by measurement error and bias? So what we have here is the relative difference.
We did all this research using simulation studies, by the way, and I can explain later how we simulated crime statistics for each case. I didn't want to spend too much time on that because I wanted to just explain the results for now, but we can come back to it later if you have questions.

So let's start with the first question. How does bias in police recorded crime data affect crime mapping at different geographical scales? What we have here is the relative difference between police recorded crime data and all crimes that happen in society. This, again, is simulated data. And here we have the relative bias at the level of output areas, which are very small levels of analysis in the UK; LSOAs, which are a little bit larger; MSOAs, a little bit larger again; and wards, which are the largest scale in this study. We see that the mean difference between all crimes and those known to the police is 62 across all geographical scales. However, when we analyze the standard deviation of this difference between all crimes and crimes known to the police, which is how much it varies between one area and another (I will visualize this later, so it's going to be clearer), this standard deviation is much larger when police crime data are aggregated at very small levels of geography. What does this mean? That in some microplaces or output areas the dark figure of crime, the difference between all crimes and those known to the police, will be very large and in others very small, while at the ward level this difference between areas becomes very small. So all neighborhoods or wards will have similar amounts of the dark figure of crime. This is visualized better here. By the way, Sam did this visualization.
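The pattern described here (same mean, shrinking spread as areas get larger) can be reproduced with a toy aggregation. This is only a sketch under strong assumptions, not the study's simulation: the crime counts and recording rates are invented, and the rates are drawn independently for each small unit, which ignores the spatial clustering of real communities and so probably overstates how quickly the spread shrinks.

```python
import random
import statistics


def dark_figure_by_scale(n_small=2000, group_size=25, seed=1):
    rng = random.Random(seed)
    # Hypothetical small units: true crime count and local recording rate.
    true_crime = [rng.randint(5, 60) for _ in range(n_small)]
    rate = [rng.uniform(0.1, 0.7) for _ in range(n_small)]
    recorded = [t * r for t, r in zip(true_crime, rate)]

    # Share of crime unknown to the police in each small unit.
    small = [1 - rec / t for rec, t in zip(recorded, true_crime)]

    # Aggregate consecutive small units into larger areas, then recompute.
    large = []
    for start in range(0, n_small, group_size):
        t = sum(true_crime[start:start + group_size])
        rec = sum(recorded[start:start + group_size])
        large.append(1 - rec / t)

    return {
        "small": (statistics.mean(small), statistics.pstdev(small)),
        "large": (statistics.mean(large), statistics.pstdev(large)),
    }


summary = dark_figure_by_scale()
for scale, (mean, sd) in summary.items():
    print(f"{scale:>5}: mean dark figure = {mean:.3f}, sd = {sd:.3f}")
```

The mean share of unknown crime is about the same at both scales, because aggregation only pools the same crimes into bigger areas. The standard deviation falls at the larger scale, because high and low recording rates within each large area average out.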
So the visualization shows, for different levels of analysis (output areas, LSOAs, MSOAs, and wards) and different types of crime (residence crime, theft, vehicle crime, and violent crime), that when we choose very small levels of analysis, the difference in the dark figure of crime between areas becomes very large. In some areas we don't know around 80% of crimes and in some areas we don't know around 30% of crimes. So maps produced from police recorded crime data at the level of microplaces may be very unreliable. Maybe; again, this is a simulation study. If we produce maps at larger scales of geography, such as ward, MSOA, or even LSOA, this difference in the dark figure of crime becomes smaller for all crime types.

This is, again, visualized better using maps. Here we analyzed police recorded crime data at microplaces, and the colors represent the dark figure of crime. When an area is very dark, it means that between 80% and 100% of crimes in this area are not known to the police. And when it's very light, it means that between zero and 20% of crimes are not known to the police. So at very small levels of analysis, the dark figure of crime varies very much between areas. This is Manchester. When we choose larger geographical scales, this proportion of crimes unknown becomes very similar between all areas.

Now, the second question: how does this measurement error or bias affecting crime data impact the results of linear models when we use police data either as a dependent or an independent variable? I will show both examples. So let's start with this very simple linear model. We're trying to explain worry about crime in areas from a measure of crime affected by measurement error, because it's police recorded crime, and a measure of perceptions of neighborhood disorder. So we estimate this model, in which only one of the variables is affected by measurement error.
So here, again, we show the bias in beta one, which is the regression coefficient of the first independent variable of the model. The black line shows the case where there is no random error, that is, where the dark figure of crime does not differ between areas. In dark gray we have the random error as observed in the real case, based on the Crime Survey for England and Wales, and in light gray it is twice as observed. The bias in the regression coefficient is very large in all cases, but it becomes larger when there's no random error between areas.

Now, this affects not only the bias in the coefficient of this first variable, but also the other variable. When we have one variable in our model affected by measurement error, not only the coefficient for this variable will be affected by bias, but all regression coefficients in our model. So our whole model may be somehow wrong. And we also show here that the bias becomes larger, of course, when the average level of under-recording becomes larger, where more crimes are unknown to the police. It also affects the measure of error of the regression coefficients.

However, because we don't only want to be negative, we also want to be a little bit positive, we tried to do the same analysis but log transforming crime rates: instead of introducing police recorded crime data just as it is, as a count, we log transform it. And we see, because here the scale of the y-axis is much smaller, that this one is the same but much smaller, and this one is also smaller, and this one is the same. The measurement error has... I have a question in the chat: could everyone have your PowerPoint slide deck? Definitely, yes, I can share it later. And, well, when we transform police recorded crime rates using log transformations, the impact becomes smaller.
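A simplified sketch of why the log transformation can help when recorded crime is a predictor. It is not the model from the paper: it drops the disorder variable, assumes the recording rate multiplies true crime and is independent of it, and uses made-up coefficients (`beta_raw`, `beta_log`) and distributions. Note also that the two fits use different hypothetical outcomes, one linear in crime and one linear in log crime.

```python
import math
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def relative_bias(n_areas=5000, seed=1):
    rng = random.Random(seed)
    # Hypothetical true crime and area-specific recording rate (assumptions).
    true_crime = [rng.lognormvariate(3.0, 0.5) for _ in range(n_areas)]
    rate = [rng.uniform(0.2, 0.6) for _ in range(n_areas)]
    recorded = [c * r for c, r in zip(true_crime, rate)]  # multiplicative error

    beta_raw, beta_log = 0.05, 0.5  # made-up "true" effects on worry
    worry_raw = [1 + beta_raw * c + rng.gauss(0, 1) for c in true_crime]
    worry_log = [1 + beta_log * math.log(c) + rng.gauss(0, 1) for c in true_crime]

    # Fit each model with recorded crime in place of true crime.
    est_raw = ols_slope(recorded, worry_raw)
    est_log = ols_slope([math.log(c) for c in recorded], worry_log)
    return {
        "raw": (est_raw - beta_raw) / beta_raw,
        "log": (est_log - beta_log) / beta_log,
    }


bias = relative_bias()
print(f"relative bias, crime as recorded: {bias['raw']:+.2f}")
print(f"relative bias, log of recorded crime: {bias['log']:+.2f}")
```

In this setup the log turns the multiplicative error into an additive one: log(recorded) = log(true) + log(rate). The average of log(rate) is absorbed by the intercept, so only the variation in rates between areas is left to distort the slope. On the raw scale the systematic under-recording rescales the predictor and inflates the coefficient.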
We don't know for sure yet; this is a first piece of research, the first study that we have done trying this type of transformation, but it seems to work. So one way of addressing this problem of measurement error when we use police crime data in linear models could be log transforming.

But we can also use crime data as a dependent variable, right? In this case, we are trying to explain crime, affected by measurement error, using measures of worry about crime and perceptions of disorder. Again, the bias in regression coefficients when our dependent variable is affected by measurement error goes through the roof when the dark figure of crime becomes larger. However, if we again log transform the measure of crime, the impact of measurement error on regression coefficients becomes much smaller in every single case; not when we analyze the error of the coefficient, but when we estimate the coefficient itself, and these estimates are in most cases not affected by measurement error. So, again, log transforming crime rates can be good practice if we use police data in linear models.

Some conclusions from these two papers. We have seen that aggregating crime data recorded by the police at very detailed levels of analysis can increase the risk of producing inaccurate maps, while producing maps from police recorded crime data at the level of neighborhoods or larger scales may show a more accurate image of the distribution of crime across geographies. And from the second paper, we can say that linear regression models that use police recorded crime rates are biased, but this bias may be mitigated if we log transform crime. These two papers are published for now as preprints, and the one about geographical analysis of crime will be published soon in the Journal of Experimental Criminology. And all our code is available on GitHub.
So you can log in, see what we have done, reproduce it in other areas, and so on. For instance, here you can access our GitHub, see the code that we have used, and also recycle it for other purposes. And this could be the map that we used... sorry, the code that we used; you can access it and reproduce everything on your computers. And yeah, that's all for today. Hopefully it's been interesting.