So, I'm going to talk to you about a project that has consumed me for the past six months. It's basically a Moby Dick project, which I've done several of. I just can't sleep until I solve it. And I think it's also one of the projects that's perfect for this industry, this domain of data visualization. I feel like if we solve it, we can essentially mic drop and just stop, because it's one of those things that I think is so hard. We will have covered so much ground if we're able to solve it.
The story behind this project starts with Paul Lavondi, who's here somewhere. I saw him give a talk at a D3.js meetup in New York City, and he told the story about incarceration data starting with this slide. To paraphrase him: incarceration has gotten really bad in the US. On the left-hand side, the axis is prisoners per 100,000 people in the US. In the 70s, it was around 300 people, and then over the past 40 or 50 years it's gone up dramatically. It's at the point that if you're a man in the US, there are 1,000 incarcerated individuals per 100,000 people. And if you do the math, since we're all statistically inclined here, that's around 1%. And that is a US average. If you look at Louisiana, for example, it's much higher. We'll see their data in a second.
And the funny thing about this is that it's a bipartisan issue. Republicans, Democrats, none of them are psyched when they look at this chart. More or less everyone wants to decarcerate, which is great. And the current rhetoric around how we would decarcerate is pretty familiar. A lot of you, in your heads, are probably like, yeah, this number is ridiculously high, and obviously it's the war on drugs, so let's just stop doing that. Or it's systemic racism, or mandatory minimums. How many people know what mandatory minimums are? Okay, actually, that's a lot more than I thought.
So you commit a crime and there is a minimum sentencing guideline for the judge. Or it's private prisons. That has become a hot topic in politics: that there's a financial incentive for the private prison industry to keep people in prison, which could be the case too. I mean, these all could be the case. The funny thing is, when I ask people how much they think each of these contributes to the increase in the incarceration rate, everyone's like, a lot, but nobody really knows, at least anecdotally when I ask them.
This is perpetuated by a lot of the stuff that we're seeing in media right now. How many people have seen this documentary on the right-hand side, 13th? Okay, it's a real tear-jerker. It's very intense. It talks about racism in America from slavery to the present-day criminal justice system, and spends a lot of time talking about the war on drugs. And Jay-Z also did this great video with the New York Times on the war on drugs. You see those and you're like, oh, we just need to decriminalize marijuana and cocaine possession, and if we do that, the incarceration rates will plummet. And this is a really interesting story. I think when you're on the top layer of the onion, it kind of makes sense, but when you pull back those layers, you start to see a lot more complexity.
So this is a chart looking at some of the state-level data that I mentioned before, with Louisiana at the very top. Roughly 0.8% or 0.9% of their population is in prison right now, which is crazy. That's close to 1 in 100 people, women and men, in prison in Louisiana. There are countless reasons for that, and a lot of that is obscured when we go back to this chart, and also by what we're seeing in the media and in culture right now. And then we also have the fact that there are states like New York which have decarcerated over the past 20 years. That's the second-to-bottom line right there.
And then if you pull back the layer of the onion one more, it's actually just New York City, and the rest of New York State has actually increased the number of prisoners that it's had over the past several years. So really, we're going as deep as we can and seeing how complicated incarceration is.
The other kind of ridiculous thing about going back to these line charts is that incarceration is a flow system. The way that we do it right now is that every December 31st they count how many people are in prison, all the states submit their data, and we get this aggregate line right here. But prison is a flow system, so imagine it like traffic. You have people in the fast lane, this lane right here. Those are your individuals serving one to two years. They're in and out of prison before the counts are even done sometimes. And there are people in the slow lane, and those are the people serving 15- or 20-year sentences. They're going to be there for a really long time. But if we just stopped (I'm actually going to close Keynote) and looked at just this frame right here, we'd really be getting a myopic view of what's going on.
This is a flow chart that I didn't make, but it's actually really helpful. It was done by the Bureau of Justice Statistics several years ago. What we measure is that circle on the upper right. That's the prison system. But these are all the things that contribute to prison, starting on the left-hand side with crime. So people commit a crime, and then it's reported to the police, hopefully. And then the police decide to investigate, and then maybe it goes to an arrest, and then it goes to the DA. It's basically your episode of Law & Order, and then you hit charges filed. And then they plead out or not. There are all these if-then statements, and then we end up right there in prison. How we go from the left-hand side to the right-hand side is one of the things that I can't even wrap my head around, and I've been researching this for six months.
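One way to picture that flow chart is as a chain of stage-to-stage rates multiplied together. The following is only a minimal sketch under stated assumptions: the stage names loosely follow the chart, but every rate is a made-up placeholder (not a BJS figure), and `FUNNEL_RATES` and `prison_admissions` are hypothetical names. It shows how nudging a single rate carries through to the number of people who reach prison.

```python
# Hypothetical stage-to-stage rates; illustrative only, not BJS figures.
FUNNEL_RATES = {
    "reported": 0.40,       # share of crimes reported to police
    "arrest": 0.25,         # share of reported crimes leading to an arrest
    "charges_filed": 0.50,  # share of arrests where the DA files charges
    "convicted": 0.70,      # share of filings ending in a plea or conviction
    "prison": 0.40,         # share of convictions resulting in a prison sentence
}


def prison_admissions(crimes: int, rates: dict[str, float]) -> float:
    """Multiply the crime count through every stage of the funnel."""
    count = float(crimes)
    for rate in rates.values():
        count *= rate
    return count


baseline = prison_admissions(1_000_000, FUNNEL_RATES)

# Nudge one dial: charges filed goes from 50% to 55% of arrests.
nudged_rates = {**FUNNEL_RATES, "charges_filed": 0.55}
nudged = prison_admissions(1_000_000, nudged_rates)

print(f"baseline admissions: {baseline:,.0f}")
print(f"after nudge:         {nudged:,.0f}")
print(f"relative change:     {nudged / baseline - 1:.1%}")
```

In a purely multiplicative chain like this one, a five-point bump in one stage's rate becomes the same relative bump in admissions. The real system has branches and feedback (pleas, parole violations, repeat offenses) that this sketch leaves out.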
Even the rates at which we move from arrest to charges filed matter: little tiny percentage changes in that dial can actually propagate to way more prisoners downstream. So when you're thinking about, again, going back to the Holy Grail line chart, what will bring that line down to the levels before has tons of different inputs going into it. And reducing it to something as simple as the war on drugs, even if it's accurate, would be a far too simplistic view of what's going on.
So my goal with this project, which was actually done for OpenVisConf, was to visualize all the things. If I were to just spend a ton of time researching all the facets of this funnel, what would we end up with? How could I understand the system so that it looks more like the traffic video than the line chart that we had previously?
The first thing that I did was start collecting the data. This was a Google Doc I was working on with someone else, covering each of the stages of that funnel. So we have crimes committed, arrests, crime clearances, felony filings, and all the various data sources that would be there for each of these stages. One thing that makes this entire project really difficult is that the prison system is not like an API that you can just call. There are actually 50 different states doing entirely different things. The way Louisiana runs its prison system is obviously different from New York State, and they are either good data bookkeepers or they're terrible. Putting all that data together, which is what the Bureau of Justice Statistics does, is an immense undertaking, and they do the best that they can with their budget, but uniform, clear, clean data is not a reality for visualizing the system.
The other really annoying thing is that the good data, I can't even open. So you'll see this little red highlighted background thing. This is the best data set that I think is out there, which is from the National Corrections Reporting Program.
They give you offense-level data, like the number of people in prison for marijuana versus property crimes versus murder, and to open this data set to the extent that I wanted to, you have to have a PhD and be supported by an academic program. So that's crazy, and it's not like they're just being annoying. It's because there's a lot of personal data in that data set. These are people who have felony convictions. They already have a hard time, and having them be the guinea pigs for a data study, with their names getting thrown around, is something where I start to empathize with why they're restricting the data access.
The data that I was able to get was slightly anonymized, but the problem is it's almost too anonymized. So for example, sentence length right here: this isn't the actual sentence length, it's a bin number. So number four is something like less than a year, but number one is probably something like three to five years. So we're not getting the level of fidelity that you would need to actually visualize the system. You can't just say the car's on the road for five to ten years. It needs to be five years, one month.
So attempt one on this data, using what I just showed you, I put out a couple of months ago. The first thing I wanted to answer, at least visually using the data I could find, was how many people are in prison for certain crimes. And the thing that I'm always shocked by is that drug crimes are about 16% of the total prison population, which is a lot lower than what people think. We keep hearing it's higher, and that's because a lot of the data that is reported is shaped and nudged a certain way to seem higher than it actually is, but on a national level it's still much lower. And what that meant for me is that even if we released all of the people in prison for drug crimes, we would still have a hugely high incarceration rate.
I left out the statistic before, but the US has 25% of the world's prison population and 5% of the globe's human population, so the US has a disproportionately high incarceration rate, and even letting out all the drug offenders will not really move the needle that much. And the really disturbing thing is that half of the people in prison are violent criminals. When we're talking about decarceration and softer-on-crime policy, we're talking about people who committed murder and robbery and rape and aggravated assault, and those are things that are tough to talk about when you're thinking about softer-on-crime, less punitive policies.
The next thing that I wanted to know about was how long it took for certain prisoners to leave prison. One of the anecdotal things I kept hearing was, oh, you have a low-level drug offender in prison for 20 years, and that's probably the case. I'm sure there's someone in prison who got the short end of the stick and is serving a much longer sentence than they deserve. However, these are the 350,000 people who went into prison in the year 2000. Here they are marching into prison, and they're distributed by the years they have left in prison. About half of them will end up with a prison term of less than 12 months, so after one year they're moved over into the free world bucket, and half of them are already out. By 2003, so three years later, about 75% of them are out, and over the course of five years, 86% of them are out. Obviously there are still people left in this 10-year bucket, which is highly unfortunate, but most of the people in prison will be out within a very short time period, which starts to make you understand how much churn there is in this flow system. The traffic is moving very fast. People are getting on and off constantly.
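The cohort arithmetic here is easy to check by hand. The following small sketch uses only the figures quoted in the talk (350,000 admitted in 2000; about 50%, 75%, and 86% released after one, three, and five years). The names `COHORT_SIZE`, `RELEASED_BY_YEAR`, and `still_inside` are hypothetical.

```python
COHORT_SIZE = 350_000  # people admitted in 2000, as quoted in the talk

# Cumulative share released N years after admission (figures from the talk).
RELEASED_BY_YEAR = {1: 0.50, 3: 0.75, 5: 0.86}


def still_inside(cohort: int, share_released: float) -> int:
    """Number of people from the cohort who have not yet been released."""
    return round(cohort * (1 - share_released))


for years, share in RELEASED_BY_YEAR.items():
    remaining = still_inside(COHORT_SIZE, share)
    print(f"after {years} year(s): {remaining:,} of {COHORT_SIZE:,} still inside")
```

Read this way, roughly 49,000 of the original 350,000 are still inside after five years, which is the slow lane in the traffic analogy.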
With the crappy data that I had before, this is the best I could do to try to simulate this turnover rate. So we start the year 2000 with what is that black line, then add in the admissions and remove the people released, and move on to the next year. These are all of the admissions, which is a staggering number relative to the people who were already there. Again, the turnover rate is really high, and at the end of the year we basically let out the same number. So that's why the prison rate is roughly stable on that first slide: a huge number in, a huge number out, every single year.
So I was actually really underwhelmed by these results. The thing that I wanted was the bird's-eye view of the system, and what I had was basically an abstraction built on a lot of assumptions baked into the data. And that was the case until I came across this data set. I basically freaked out, cleared my calendar, and just started working on this again. On r/datasets, so reddit.com/r/datasets, there was this post: the entire database of the current Florida prison population, 100,000-plus inmates. And I almost couldn't believe it until I went to the site, which is right here. And there it is: we have a 1.2 gigabyte Microsoft Access database file available for you to download.
And it started to dawn on me why they do this. If you've ever heard of Florida Man on Twitter, or read news about the crazy things coming out of Florida, it's not because crazy people live in Florida. It's because journalists have unprecedented access to what happens in the police and court system in Florida, due to the state's stance on transparency. They will grant more FOIA requests than most other states. And for whatever reason, they're doing it for prisons. So the detail on this is just amazing. You can look up any offender. Not to pick on anybody, but this is the level of detail: photos, names. This person actually died in prison.
You can see what they went to prison for, even after they passed away: they were there for a felony gun charge. But then they had a previous prison history for burglary. So they were in prison May 1st, 1990, left prison December 31st, 1990, then were back again a year and a half later. This is the bird's-eye view that I wanted, right?
A little bit more detail about the data in the Microsoft Access database: hundreds of thousands of rows of data about exactly why they're in prison as well. So this is the adjudication charge, and we can see this person had failure to appear, felony bail. There are some people up there with sexual battery, unspecified. Kidnapping, ransom. Crazy amounts of detail. The funny thing is that they obviously never clustered this into something like "drug offense." It's just raw text that I'll have to parse eventually. And their names. We saw this before in that profile that I brought up. Alongside the individuals' names, they actually have their addresses as well, which is just mind-blowing. I'm kind of scared to even mention it for fear that they pull it down. And there are tattoos as well. If you look at row two: Yosemite Sam, rose, dragon, Linda. That's a cool data vis project as well that I'll probably do at some point.
All right, so the prompt that I gave myself was: if we folks were to be the wardens of the Florida prison system, what would we make? We wouldn't make a simple bar chart. I would want something that really gave me an aerial view of what's going on. And the thing that I kept coming back to is these bl.ocks by Elijah Meeks, who maybe is here and I haven't seen him. This is representative: particles moving through different nodes. And Nico from Uber earlier gave an example of illustrative particles that were randomly generated. I want these to be people. I want to know how long it takes this person to go from here to here. Here's another example that I was really inspired by.
You can imagine that this is the BJS flow chart that we saw before: crime on the left-hand side, release on the right-hand side, or parole. The thing that this would be missing is the recidivism rate, so the reoffense rate: people who leave prison and then come back.
So, attempt two. This is a project that I have been coding for the past few weeks, up until this morning. They say never to do a live demo, but screw it, we're going to do a live demo. Let's make this a little bit bigger. Okay, so this is the Florida prison population in January 2010. The way that this is framed is kind of like that linear timeline that we saw before: sentencing on the left-hand side, people locked up in prison, and then release on the right-hand side. Each of these boxes represents 50 people, but it's not an abstraction of 50 people, it's actually 50 people. So if we click one of these nodes, these are the individuals in that little cluster up there (I'll get to what it means in a second), and we can click on a profile and see all the detail. Obviously I could also add that into the visualization, which I'll hopefully get to, but this is the level we're at. We're not working with abstractions, it's not illustrative, it's trying to visualize the system in its detail.
And then there are people in this section here who are there for life. It's a little bit weird to see this level of humanity in a data visualization, but I think it's actually really important when we're talking about statistics in prisons. These are almost all old men, predominantly old black men. When you think about a punitive criminal justice system and saying we're going to lock them up and throw away the key, you start to empathize with the fact that it's probably unlikely these people will reoffend, and now it's like, well, why are they still in prison for life? One of the other cool things is advancing the timeline.
So it's not on autoplay quite yet, but when we advance it one month to February 2010, we have the little red squares moving from the left-hand side to the right-hand side and adding on to the visualization. If you were the warden, just watching the whole system from your bird's-eye view, all the red squares would be new faces from January to February. So, continuing this process, moving from January to April: four months have passed, and again all the red squares are new faces. This is getting at the turnover rate of prisons. If we advance this to July, only seven months later, this entire section up top is almost entirely red. Again, completely new faces entering prison. That top section, for the people who can't see it, is people serving less than 12 months. This middle section is people serving one to three years, and then people serving three to five years. If you're entering on a 10-year sentence, for example, you're going to be slotting in right at this square, and those are about 4% of the people admitted every month.
The other thing is just jumping around in this timeline with the data that we have. So moving to, let's say, present day, December 2016, it looks pretty much the same, which makes sense because we have 97,000 people in Florida prisons right now, and at the start of this visualization we had 103,000. So again, a really, really high turnover rate, but roughly where we were before, going back to that crappy example that I had using the anonymized data.
The other thing that's really fun with this is watching the release column build. That's this far-right column right here. Because these people only have one month left on their prison term, if you advance it, they move over here, and I have a box tracking how many people have been released from prison over that time period.
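The month-by-month stepping in the demo can be sketched roughly as follows. This is a minimal sketch, not the code behind the visualization: the `Inmate` record, `advance_one_month`, and `into_boxes` are hypothetical names, and the toy population is made up. It assumes each person carries the number of months left on their term. Advancing one month releases anyone with a month or less remaining, ages everyone else by a month, and flags that month's admissions as new (the red squares).

```python
from dataclasses import dataclass


@dataclass
class Inmate:
    inmate_id: int
    months_left: int      # months remaining on the prison term
    is_new: bool = False  # True only in the month of admission (a "red square")


def advance_one_month(inside, admissions):
    """Advance the timeline one month; return (still_inside, released)."""
    released = [p for p in inside if p.months_left <= 1]
    staying = [Inmate(p.inmate_id, p.months_left - 1)
               for p in inside if p.months_left > 1]
    arrivals = [Inmate(p.inmate_id, p.months_left, is_new=True)
                for p in admissions]
    return staying + arrivals, released


def into_boxes(people, size=50):
    """Group real individuals into boxes of `size`, like the demo's squares."""
    return [people[i:i + size] for i in range(0, len(people), size)]


# Made-up toy population: 120 people with 1 to 24 months left.
inside = [Inmate(i, months_left=(i % 24) + 1) for i in range(120)]
admissions = [Inmate(1000 + i, months_left=12) for i in range(7)]

inside, released = advance_one_month(inside, admissions)
print(len(inside), "inside;", len(released), "released this month;",
      sum(p.is_new for p in inside), "new faces;",
      len(into_boxes(inside)), "boxes")
```

Keeping each box as a list of actual records, rather than a count, is what lets a click on a square resolve to the individual people in it.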
So if we move forward, again, like six months, we have 17,000 people released from prison, which again is insane considering there are only 103,000 people in prison. So again, a really, really high turnover rate, which I think is just kind of mind-boggling when you can see the entire system laid out. All right, so that's my live demo.
The user for this is not the academics. I can show this to them and they're like, yeah, water is wet. I already know all of those things. They know that drugs don't drive most of the prison system. They know that terms are really short. But they're actually pretty inconsequential in setting policy. The people who drive prison policy are our grandparents and our parents and our siblings, the voters. The biggest fulcrums in the system are actually prosecutors and the policies that we want our police officers to follow: are they arresting you for marijuana possession, or are they turning a blind eye? If I can get my grandmother to understand some of these things, and that it's not just down to mandatory minimums and three-strikes laws and the war on drugs, things that aren't actually going to decarcerate the system, it's a huge win, and that's the promise of data visualization in my mind.
I left off one point that I really wanted to mention, which is that this is still visualizing the prison system, and I want to emphasize that the people getting released matter a lot too. These are felons who will have a record, potentially for the rest of their lives. It's hard to get jobs, it's hard to find housing. It is the scarlet letter that we place on them in our society. So thinking about roughly 20,000 people released in six months, as a percentage of the people who are in prison already, is really, really crazy.
There's still a lot of work I want to do on this beyond my train ride up here. Running simulations: what would happen if you cut the admissions in half next month and did that for all the following months?
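A simulation like that could start from a very simple stock-and-flow model. The following is a minimal sketch under loudly stated assumptions, not the author's simulation: it assumes a constant share of the population is released each month (derived from the roughly 17,000 releases over six months and the 103,000 starting population quoted in the talk) and that admissions were previously equal to releases. Real releases depend on individual sentence lengths, so the output is only illustrative. `simulate` and the constants are hypothetical names.

```python
def simulate(population, monthly_admissions, release_rate, months):
    """Stock-and-flow: each month, release a fixed share, then admit."""
    history = [population]
    for _ in range(months):
        population = population - population * release_rate + monthly_admissions
        history.append(population)
    return history


START_POPULATION = 103_000           # Florida figure quoted in the talk
RELEASE_RATE = 17_000 / 6 / 103_000  # assumption: constant monthly release share
# Assumption: before any change, monthly inflow equals monthly outflow.
STEADY_ADMISSIONS = START_POPULATION * RELEASE_RATE

halved = simulate(START_POPULATION, STEADY_ADMISSIONS / 2, RELEASE_RATE, months=60)

for year in range(6):
    print(f"year {year}: {halved[year * 12]:,.0f} people")
```

Under these assumptions the population drifts down toward half its starting level rather than toward zero, because releases shrink as the population shrinks. A version built on the per-person sentence data would replace the constant release share with actual release dates.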
How long would it take for the prison population to go down? Which obviously is going to happen very quickly. In fact, if you go back to some of those slides that I was showing you earlier, because about 30% of people leave prison every year, if we just stopped admitting people, we would be down to the 1970s rate in a few years. So just thinking about admissions as the main driver of incarceration is incredibly useful for understanding the power and control that we have over decarcerating America.
Offenses: I still haven't added in offenses, because I have to parse all of those annoying fields. Repeat offenders: I've started looking at this already, I just haven't added it to the vis. In any given month, about 25% of the people admitted had been in a Florida prison previously. So showing that movement back into the system is really important, and also for understanding that scarlet letter that we place on former felons, or people who have committed a felony and have been released from prison. They're committing crimes again or violating their parole, and they're ending up back in the system. And then upstream data. I met a woman working on this, on a team at the ACLU. They're thinking about arrests in Massachusetts, and tying all of those things together, and thinking about the little tiny percentage rate changes that could propagate into the prison system, is immensely important. That's it. You can follow along on The Pudding, and thank you.
And the last thing I want to talk about is the talk that ended our day yesterday, Matt's talk. We've gotten several emails from folks, and there have been some really wonderful conversations that have happened since. There are a couple of things I wanted to say and share with all of you. The first thing is that we recognize that the use of the photos was against their photo policy. Those folks did not consent to have their photos shown, and they certainly will not be in the videos. We'll blur them out.
And thank you to all of you who pointed that out. It's really important and valuable, and we'll take them out. The second thing is that of course, of course, we are regretful and we're sorry to anyone who was made uncomfortable by that talk. OpenVisConf is a place that is safe, and a place where we have a conversation as a community around things that are really hard: around our work, around data ethics, around the right and the wrong things to do with our data. Matt has received some of this feedback. He also feels regretful that his talk made folks feel uncomfortable, and he would really, really love to improve upon his work. And so he's here today. He would absolutely love to talk to you guys if you feel comfortable relaying some of your feedback to him. If you don't, but still want to relay feedback, feel free to email us at openvisconf@bocoup.com and we will send it to him anonymously after the conference. He really does want to do a good job.
And I just want you all to know that we're committed to continuing to make this a safe space for everyone. I know that this is a hard conversation, but by the same token, I'm really proud of us as a community for being able to have it. We're here, we have a diverse audience and a diverse set of perspectives, and we can really help each other learn. I know that I've been in data sets for a long time where you just sort of lose it. You lose your perspective. And it's really wonderful that we're here and are able to share that with each other. Hopefully you guys feel a little better, and I'm just happy that you're here and that you came back. Oh, thank you, thank you, thank you.