All of that backstory and we're three minutes in, so we're still making good time. If you want to look at my notes for this talk, or any future research that I do, there's a link here. Feel free to grab it. It goes to a live document that has this presentation and some of the other stuff I'm working on alongside it included in it. And as I mentioned before, I could add so many more names to this, but at the time of working on this talk, these were the names that were most prominent to me. The idea for this came from a friend of mine, Wayne Jones, who had done something similar by documenting shooting incidents in New York City. Now, New York City is a much larger, much more crowded space than San Diego, but I figured if he was able to make sense of all that data, I could as well. So what is the purpose of doing all of this? It's not to prove a point. It's not to do any of those things. It's simple: make data readable. I have seen so many times, and I will show as we jump into this, that the data I got was unreadable at first. There was a lot of stuff, and some of it wasn't even in CSV files, sorry. It was very, very challenging. The Racial and Identity Profiling Act data I was trying to dig into was split across 12 different CSV files that all documented the same 20,000 incidents, so having to create joins just to make sense of all that data was a mess. My goal is to make data readable for people who make real decisions: politicians, people who manage city and municipal budgets, zoning laws, and training. But also to think of other use cases for this data. You see, if we make the data readable, it isn't just for those in tech, or those in Congress or at the state capitol. It could also be for real estate; it could help people make decisions when they're moving to an area.
And I'm talking about this specific data set, my police call data. So we can see how many parties have had the cops called on them in a single area of where I'm from. It also allows resource allocation: if we know that one particular area of San Diego has more police calls and more traffic stops than another area, but that area is grossly underfunded or under-resourced, we can start reallocating those funds and resources. And of course, it helps with training. I say all of this to encourage you, because many cities have open data. They follow what's called the PDDL, the Public Domain Dedication and License, which means the government has to make this data available, and you're allowed to use it for whatever reasons you want. So, I've done enough talking. Let's look at some data. At this point, I do need to switch screens, because browser sizes and me looking at cameras will not make sense. Let me find... okay, there we go. Perfect. Awesome. First, let me find my mouse. There we go. As I mentioned before, all of this data is available on San Diego's open data portal. I simply went to police, then police calls for service, and grabbed all 3 million records that existed in there. What you can see here is an incident number and some other information. This is not readable; it doesn't give us any help. So we're going to go to our Jupyter notebook. It's Python, and I'm going to walk everyone through it. I'm going to be using a data frame library called pandas. Pandas lets us recreate that Excel-spreadsheet look and feel on our data. So I do this read_csv command, and at this point I just load the data set in. I'm actually going to make sure nothing breaks. All right, cool. Perfect. I'm doing this live; all of the data is stored on my computer or in a safe cloud place, so don't try to do this without the data.
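The loading step looks roughly like this in pandas. The file name and column names below are stand-ins (the real portal export will differ), and I build a tiny in-memory sample so the sketch runs on its own:

```python
import io

import pandas as pd

# Tiny synthetic sample standing in for San Diego's calls-for-service export;
# the real file has roughly 3 million rows. Column names are illustrative.
raw_csv = io.StringIO(
    "incident_num,date_time,day_of_week,address,call_type,beat,priority\n"
    "E21010001234,2021-01-01 00:12:00,5,500 FIFTH AVE,415,521,2\n"
    "E21010001235,2021-01-01 00:15:00,5,100 IVY ST,1186,523,3\n"
)

# With the real download this would be: pd.read_csv("pd_calls_for_service.csv")
df = pd.read_csv(raw_csv)
print(df.head())
print(len(df), "records")
```

From here on, `df` behaves like that glorified spreadsheet: columns you can select, filter, and join.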
So, you see, the first thing that I did was try to get a definition of what these call types are, because if you've ever watched your favorite buddy cop film, you've heard lines like, "We've got a 10-16 on Juniper and Ivy." We don't know what that means. But if we have the description, we can add it. So that's the first thing that I did: add the call types data set. And I built some helper scripts. I had to do this, unfortunately, because not only was the data unreadable, it was also not consistent. Over the years the process has changed and improved for the better, which is great. However, we also have to make sure we can read the data consistently over time. So we've built our helper scripts; I'm just going to run that really quick to make sure they exist. Then we start loading data. I'm going to use just the 2021 data, because I'm live streaming this and I don't want my computer to explode. What we're doing here is reading the CSV file with just the 2021 data, which I think was last updated in early April. Then we're going to read that data as well. So we hit that really quick and, boom, we now have all of that data, and you can still see those incident numbers and times of day; we didn't change anything here. This is where it gets fun. I'm here to have fun, and I hope y'all are too. What I did here was create what I called the df_with_call_type. The df_with_call_type is simple, and by simple, I mean it's rocket science. We're going to do a merge on our data frame and that call types data set that we added. We're going to merge on the call type, remove all the empty values in that column, and drop the duplicates. (Can I make a request? There's been a request for you to make this bigger.) Absolutely, let's make it large. Can we do it? I tried to do it, but I don't have the power. There we go. Yes, I hope that's better. Okay.
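One way helper scripts like these could smooth out the inconsistencies is by normalizing the call-type codes before the join. The exact cleanup rules here (missing values, stray whitespace, mixed casing) are my assumptions, not the speaker's actual scripts:

```python
import pandas as pd

def normalize_call_type(code) -> str:
    """Coerce a call-type code to a clean, comparable string.

    Handles the kinds of drift you see in multi-year exports:
    missing values, stray whitespace, and mixed casing.
    """
    if pd.isna(code):
        return ""
    return str(code).strip().upper()

# Illustrative messy codes, as if they came from different years of the export.
call_types = pd.DataFrame({"call_type": [" 415", "1186 ", None, "ct1"]})
call_types["call_type"] = call_types["call_type"].map(normalize_call_type)
print(call_types["call_type"].tolist())
```

Running the same normalizer over both the calls data and the call-types lookup is what makes the later merge line up cleanly.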
So I should still be able to read this. All right, I'm going to break this line up a little bit. The pd.merge basically takes that data frame, that glorified Excel spreadsheet we made, and combines it with those call types. The fillna is going to take any of the NaN (not a number) fields and make them just a blank string, because sometimes a blank string is just so much easier. And then something weird happened when I did this: it started creating some duplicate records. I'm not sure why that happened; I've asked questions and haven't gotten the right answers yet, but I was told that if you drop duplicates, it will not affect your data. I tested that, and of course the numbers did not change, which is what we like to see. So can we make sense of this data? Let's find out. If I get an error message, I know why: I did not run this cell. There we go. Let's try that one more time. There we are. All right, so we're going to create a sorting function. That's what we did here. This lambda function just says, hey, sort by the length of the beat. Now, a beat is, well, not even a district; it's a segment of the city that police officers patrol. So you'll hear me talk about a particular beat here; for this example, I'm going to use beat 521. We're going to take the beat and the call type columns, and we're going to sort by the beat number first. If we run that, what we see is a list of beats and a bunch of calls. Now remember, this is just from January to about April. Beat 521 is very busy, over 9,000 calls, and we can go through that. I'm presenting this data in a Jupyter notebook because I know in the data science space that seems to be the default for Python data science and presentations; I want you to keep that in mind as we run through the next five minutes. Beat 523, we have that. Let's look at just those calls.
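The merge, fillna, drop_duplicates chain described above can be sketched like this; the codes, descriptions, and rows are made up for illustration (the real call-types file supplies them):

```python
import pandas as pd

# Stand-in for the calls data; note the accidental duplicate row for E2.
calls = pd.DataFrame({
    "incident_num": ["E1", "E2", "E2", "E3"],
    "call_type": ["415", "1186", "1186", "999"],
    "beat": ["521", "523", "523", "521"],
})

# Stand-in for the call-types lookup; "999" has no description on purpose.
call_types = pd.DataFrame({
    "call_type": ["415", "1186"],
    "description": ["DISTURBING THE PEACE", "SPECIAL DETAIL"],
})

# Merge in the readable descriptions, turn NaN into blank strings,
# and drop the duplicate rows the merge surfaced.
df_with_call_type = (
    calls.merge(call_types, on="call_type", how="left")
         .fillna("")
         .drop_duplicates()
)
print(df_with_call_type)
```

Because the duplicated rows are identical after the merge, drop_duplicates removes the extras without changing any of the counts, which matches what the speaker observed.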
If I hit update here, we get some records. We can see the day of the week, the address, the incident number, all of which we've covered before, and now a description of what those calls actually were. A little more readable: instead of 1186, we get "special detail." Great. So what about intersections? I want to see what the busiest intersections are based on our records. That involves doing a groupby and some more sorting, but it is possible, and we can get the busiest intersection, which just happens to be Fifth and F. That's the busiest for my city. But of course, nothing is more readable than a chart. So I'm going to update this chart really quick, and we can actually see the priority of calls for this one beat, beat 523. It looks like most of the calls are priority two. Priority two means it is not of the utmost emergency, but police should respond as fast as possible. Priority three, you can take your time; priority one, you should break laws to get there. And then we have some other priorities that have kind of faded away over time. And of course, the other thing we can check is the most frequent police calls: why are people calling the cops in areas of San Diego? It's disturbing the peace, and then people reporting a crime that's happening who would like someone to come and actually do something about it. That's great, but it's a little narrow. As I mentioned before, I'm a developer advocate for Elastic, so in most cases when I need to store and understand a lot of data, I take the Elastic route, which is Elasticsearch. What I can do is take all of this data, not just 2021 but all of it, and store it in an Elasticsearch server, which gives me the ability to use visualization tools like Kibana Lens, as well as save my computer from exploding. So now I can actually show you that. And that's what this is. There's a lot going on, a lot of colors.
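Both the busiest-intersections groupby and the priority breakdown are essentially one-liners in pandas. Here is a hedged sketch with made-up rows standing in for the merged data:

```python
import pandas as pd

# Made-up calls for one beat; the real frame is the merged 2021 data.
df = pd.DataFrame({
    "address": ["FIFTH & F", "FIFTH & F", "FIRST & A", "FIFTH & F", "FIRST & A"],
    "priority": ["2", "2", "3", "1", "2"],
    "description": ["DISTURBING THE PEACE"] * 3 + ["SPECIAL DETAIL"] * 2,
})

# Busiest intersections: count calls per address, largest first.
busiest = df.groupby("address").size().sort_values(ascending=False)
print(busiest)

# Priority breakdown; chaining .plot(kind="bar") onto these counts
# renders the chart in a Jupyter notebook (matplotlib required).
priority_counts = df["priority"].value_counts()
print(priority_counts)
```

The same value_counts pattern on the description column gives the "most frequent police calls" view from the talk.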
I'm a colorful person; I like colors. But we can now break down, by the day of the week or the date, what the calls were and how they break down, and we can still see the data persist. We can see disturbing the peace. If we want to check on a beat: I don't have 520, but I do have 521 right here. We can go to 521, and based on the geo coordinates, the beat 521 map should have changed. (A quick interruption to say: four more minutes.) Perfect. So we can actually see where beat 521 is, which is in this area, and it's a really nice area. There's also a pier there; you should go sometime if you're ever in the area. Now, I mentioned earlier that Elasticsearch, beautiful as it is, is not how data scientists present data to people who make decisions. They do it through Jupyter notebooks. Well, we have a tool called Eland, and Eland is pandas, but for Elasticsearch. So I'm going to show you how easy it is, and I'm going to do it in three minutes or less, because I will get in trouble if I don't. Traditionally, what we would have to do is import elasticsearch, set up a client (we're going to have to do that anyway), and then do a bulk operation to upload all of the 2021 data, applying any manipulations we want along the way; that's what this actions line is here. But now we can do all of that work inside of data frames. We can do things like a client search on the index that matches all of the hits we want. And with Eland, it's just one line of code: an Eland DataFrame, where we pass in our client and the name of our index, and we get this nice little Elasticsearch ID, but all the same information, in a slightly different order than we were looking at before. And as I mentioned before, we can do all of the same things: we can sort, and we can highlight just a particular beat, like in this case where we just get beat 523. They look the same.
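Here is roughly what the bulk-upload actions and the Eland one-liner look like. The action-building helper is runnable on its own; the client setup and the Eland call are shown commented out because they need a live cluster, and the host and index name are assumptions:

```python
import pandas as pd

def to_bulk_actions(df: pd.DataFrame, index: str) -> list:
    """Turn each DataFrame row into an action for elasticsearch.helpers.bulk."""
    return [
        {"_index": index, "_source": row._asdict()}
        for row in df.itertuples(index=False)
    ]

calls = pd.DataFrame({"incident_num": ["E1", "E2"], "beat": ["521", "523"]})
actions = to_bulk_actions(calls, "police-calls-2021")
print(actions[0])

# With a live cluster, the rest is a bulk upload plus one line of Eland:
#
#   from elasticsearch import Elasticsearch, helpers
#   import eland as ed
#
#   client = Elasticsearch("http://localhost:9200")   # host is an assumption
#   helpers.bulk(client, actions)                     # bulk-load the rows
#   ed_df = ed.DataFrame(client, es_index_pattern="police-calls-2021")
#   print(ed_df[ed_df["beat"] == "523"])              # pandas-style filtering
```

The point of the Eland DataFrame is that the filtering, sorting, and plotting calls look like pandas while the heavy lifting happens inside Elasticsearch.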
And as I mentioned, the most readable way to do this is to give them a nice graph. We can even use the pandas functionality of providing graphs and plots using matplotlib. Nothing changes; our data stays the same. Ignore the really long red errors, because I did not reload that. And that's it. So my encouragement for you, in the last 20 seconds: if you want to take action, and you keep being told "the data doesn't say that," the data is probably available, it's probably out there, but not many people have read it or seen it because it's not really readable. Hopefully, using tools like pandas, using tools like Eland, maybe even tools like Kibana where you can pull up maps and fancy colors, you can make your data more readable, you can present it, and hopefully make the change that you do want to see in the world.