Hey folks, I'm Pat Schloss. From time to time on this channel, I like to take a step back from demos and instructional content and instead apply those principles to create data visualizations. I've done that in the past with some baseball data, and we've written a paper together here on YouTube. Here in the United States, it's the middle of June as I'm speaking to you, and June 19th is just a few days away. June 19th is a special holiday in the United States, one that I would say is probably not very well known or appreciated outside of the Black community. It is the anniversary of the emancipation of Black slaves in Texas, and it is, as I understand it, widely regarded as the true Emancipation Day from slavery in the United States. That was June 19th, 1865. The Emancipation Proclamation from President Lincoln was released in 1862, freeing all slaves in Confederate states, but these things take time to get out. Unfortunately, it wasn't until the amendment outlawing slavery that slavery was banned throughout the United States in 1865. As we all know, the end of slavery did not mean the end of racial violence. It did not mean the end of racism by any means, and we know that from events over the past year and many years: George Floyd, Breonna Taylor and so forth. So what I would like to do as a commemoration of June 19th this year is spend a couple of episodes looking through a database that I was lucky enough to find online from researchers at the University of Georgia, the University of Washington, the University of Illinois Chicago and, I'm sure, many other places, who have painstakingly gone through and tried to catalog and validate lynchings that occurred between, I think, 1880 and 1950. So what is a lynching? If you're like me, you perhaps have a certain image in your own head of what a lynching is.
The researchers who put together this database, as well as other researchers who study this history of racial violence, define a lynching by four criteria: there is evidence that the person was killed, the killing was illegal, at least three people were involved in killing the victim, and the killing was justified with reference to tradition, justice or honor. Now, as the developers and curators of this database are very quick to point out, this does not include all cases of racial violence. It ends around 1950. It also doesn't include police violence, right? So there's a lot that's not captured in here. Yet I still think it tells a very stark story of racial history in the United States from Reconstruction going forward. And I think it is useful for telling a story, for thinking about how we can use data visualization to tell that story, and for thinking about some of the factors that might go into designing an effective data visual that tells that story. I know I am a very limited person, and taking on this large and really important data set, I think, bears a lot of responsibility. I hope people will appreciate what I have to share here in that spirit. The data that I'm going to be using I obtained from the CSDE database at the University of Washington. CSDE stands for the Center for Studies in Demography and Ecology. Because the data aren't mine and aren't publicly available, I'm not going to be posting the data or making them directly available to you. All the code that I develop will be linked down below so that you can go to the blog post, see the code that I've developed, and see what I'm doing at different places. I would encourage you, if you're interested in this history, to go to the CSDE website and put in a request to get the data. I found that Professor Amy Bailey was very responsive in getting me the data very quickly.
There's the data that I'm going to be talking about, but then there's also all of the background material validating what they are calling a lynching. This project, I know, was initially started by Dr. E. M. Beck and Professor Stewart Tolnay, who were originally, as I understand it, at the University of Georgia. And then I believe Professor Tolnay went to the University of Washington. Again, I would strongly encourage you to check out this website. They've got a lot of great resources there that really encourage you to investigate and to learn more about this horrible chapter in American history, which I think you could safely argue has probably not ended but continued on under different guises. To get started here in RStudio, I'm going to go ahead and create a new project so that I can have a directory with all my data, my code, and any visuals that I end up producing. I've got an existing directory, so I'm going to browse to that on my desktop. I've called it Juneteenth. This is going to be the directory that I use to house everything I do for this project. And great, I see that I now have in my files corner down here in RStudio the Juneteenth.Rproj file, which tells me this is a project now, and that I've also got this weblist IDs CSV file, which is one of the files that I obtained from the CSDE database. As I think about this data set, a couple of questions immediately come to mind. The first is, what is the time course like for the number of lynchings, perhaps across the United States as well as by state? What states had the most lynchings? And basically, how widespread was this practice of lynching? These are the types of questions that we will be approaching with this data set. To get going, I'll go ahead and do library tidyverse, so we can have all the functionality from the tidyverse here. We'll then do read_csv, and the file is again the weblist IDs CSV. And I'm going to go ahead and pipe that to View.
That way we can see what's going on in this database and what it looks like. This is a rectangular file that has 3,935 entries and 27 columns. We have information about the lynching: the date, the name and alternative names of the victim, their race, their sex, their age, the county and state where it occurred, information about the mob, what the method of death was, what the accusation was, the source, all sorts of information to help tell a story about lynching in the United States. One of the other things I notice about this data frame is that I've got mixed capitalization for my column names. I'm going to go ahead and rename those columns, so I'll do rename_all with tolower. This then gives us our column names in all lowercase. From there, I want to start thinking about cleaning up the data. One of the things that I want to do is limit the analysis to only the Black victims. There were whites and Hispanics who were subject to lynching, but the focus of the story I want to tell is on the Black victims. So I'll do filter, victim race, I guess it's victims race, equals equals capital-B Black. And we now see that there are 3,246 victims there. To look at the numbers by year, I can do a count on year, and this will tell me how many victims there were in each year. This becomes a starting place for moving into ggplot and building a first visual of the time course of lynchings across the United States. I'll go ahead and save this as lynchings per year. And then I'll take lynchings per year and pipe that into ggplot with aes, x equals year and y equals n. And then we'll do geom_line. I'll go ahead and save this with ggsave as lynchingsperyear.pdf, and I'll do width five, height five. I'm going to go ahead and clean up my axis labels as well as the theme. So I'll do labs, x equals Year, y equals Number of lynchings. And I'll do theme_classic.
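The steps described so far can be sketched in a few lines. This is a minimal sketch rather than the exact script from the episode: because the CSDE data are not public, it uses a tiny made-up tibble in place of read_csv, and the column names (`Year`, `Victim_Race`, `Lynch_State`) and the output path are assumptions chosen for illustration.

```r
library(tidyverse)

# Made-up stand-in for the CSDE file; values and column names are
# placeholders for illustration only, not real records.
toy <- tibble(
  Year = c(1890, 1890, 1891, 1893, 1893, 1893),
  Victim_Race = c("Black", "White", "Black", "Black", "Black", "Black"),
  Lynch_State = c("MS", "GA", "MS", "LA", "GA", "MS")
)

# Lowercase the column names, keep Black victims, count rows per year
lynchings_per_year <- toy %>%
  rename_all(tolower) %>%
  filter(victim_race == "Black") %>%
  count(year)

# Line plot of the yearly counts with cleaned-up labels and theme
p <- lynchings_per_year %>%
  ggplot(aes(x = year, y = n)) +
  geom_line() +
  labs(x = "Year", y = "Number of lynchings") +
  theme_classic()

# Written to a temporary directory here so the sketch leaves no files behind
ggsave(file.path(tempdir(), "lynchings_per_year.pdf"), plot = p,
       width = 5, height = 5)
```

With the real file, the `toy` tibble would be replaced by the result of read_csv on the file obtained from CSDE, and the filter would use whatever the race column is actually called.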
I'd like to have a title that says something about the total number of lynchings, as well as the year that the number of lynchings peaked. To get at that, let's come back up here to lynchings per year. Again, we have this data frame where we've got the year and the number of cases. First I'll come up with the total number of lynchings. We can do a summarize with n equal to the sum of n, and you see there are 3,246. So I'll save that as total lynchings. And then let's get at the peak. We'll take lynchings per year, and then I will do top_n by the n column, with n equals one. We see that occurred in 1893, when there were 137 lynchings. Then I can do pull on year, and I'll call this peak year. Okay, I'd also like to get the range of years that we have data for. To do that, I'll grab my lynchings per year and feed it into a summarize function, where I'll do early equals min of year and late equals max of year. That gives me 1877 to 1950. Then I can create a new column that I'll call range, and I'll use the glue function for that. In quotes, I'll put early in curly braces, a hyphen, and then late in curly braces. It's going to yell at me if I don't load the glue package. I also want to pull the range to get that value out, and I'll call this year range. All right. Now we can come down and add our title. The title will be "Between" and then, in curly braces, range. And before I forget, I need to put this inside of the glue function. So between those years, there were, and then we will say total lynchings, with the peak occurring in, and then peak year, which I believe was what I called it. Yeah, peak year. And instead of range, that should be year range.
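As a rough illustration of how those three summary values and the title string fit together, here is a small sketch. The yearly counts are made up so that the block runs on its own; the variable names follow the ones spoken in the episode, written with underscores as an assumption.

```r
library(tidyverse)
library(glue)

# Made-up yearly counts standing in for the real lynchings_per_year
lynchings_per_year <- tibble(
  year = c(1890, 1891, 1892, 1893),
  n = c(2L, 5L, 3L, 9L)
)

# Total across all years
total_lynchings <- lynchings_per_year %>%
  summarize(n = sum(n)) %>%
  pull(n)

# Year with the largest count (top_n keeps ties, if there are any)
peak_year <- lynchings_per_year %>%
  top_n(n = 1, wt = n) %>%
  pull(year)

# First and last year, glued into a single "early-late" string
year_range <- lynchings_per_year %>%
  summarize(early = min(year), late = max(year)) %>%
  mutate(range = glue("{early}-{late}")) %>%
  pull(range)

# Title text built from the computed values instead of typed-in numbers
plot_title <- glue(
  "Between {year_range}, there were {total_lynchings} lynchings ",
  "with the peak occurring in {peak_year}"
)
```

The resulting string can then be passed to `labs(title = plot_title)`, so the title stays in step with the data if the database is updated.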
And so I'm noticing a couple of things. I forgot a word: I need it to say 3,246 lynchings with the peak occurring in 1893. I'd also like to make the title a little bit bigger and bolder. So let's get this cleaned up. What we'll do is theme, and then I'll do plot.title, and I'll do element_textbox_simple. To get that, I also need to do library ggtext. I'll go ahead and add some arguments to this. I'll say size equals 18, face equals bold. That's looking a little bit weird. What I'd like to do is justify the title to the left side of the figure, not to the y axis. To get it justified to the left side of the plot, I'll do plot.title.position, and I'll set that to plot. That moves it to the left side of the figure. I am still noticing, though, that my text gets truncated up at the top, so in here I'll add margin information. I'll say margin equals margin. And I believe it starts at the top, then the right, then the bottom. Yep. So I'll do top equals five, right equals zero, b equals five, and then left equals zero. That looks a little bit better, but let's add some more to the margin. I'll do 10 and 10. I think that looks pretty good. Maybe we could just change the bottom; let's make that 15. That looks nice. I think one thing we could do to add a little bit of polish would be to add a comma after the three. To do that, I'm going to come back up to total lynchings. As you'll recall, this creates my n. I'm going to do a mutate on n to be format on n, and I'll do big.mark equals and then, in quotes, a comma. And I'll do a pull on that n. Now if I look at total lynchings, I see I've got that 3,246 with the comma. So I think this looks pretty good. One thing, as I read this "between 1877 hyphen 1950": let's put an "and" in there instead of that hyphen. And again, we can do that up here.
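The title styling just described might look roughly like the sketch below. It assumes the ggtext package is installed, uses made-up counts, and settles on the margin values mentioned last in the narration (10 on top, 15 on the bottom); treat those numbers as a starting point rather than the final script.

```r
library(tidyverse)
library(glue)
library(ggtext)

# Made-up yearly counts, chosen only so the total needs a thousands separator
lynchings_per_year <- tibble(
  year = c(1890, 1891, 1892),
  n = c(1200L, 900L, 1100L)
)

# format() with big.mark turns 3200 into the string "3,200"
total_lynchings <- lynchings_per_year %>%
  summarize(n = sum(n)) %>%
  mutate(n = format(n, big.mark = ",")) %>%
  pull(n)

# Year range joined with "and" rather than a hyphen
year_range <- lynchings_per_year %>%
  summarize(early = min(year), late = max(year)) %>%
  mutate(range = glue("{early} and {late}")) %>%
  pull(range)

p <- lynchings_per_year %>%
  ggplot(aes(x = year, y = n)) +
  geom_line() +
  labs(
    x = "Year", y = "Number of lynchings",
    title = glue("Between {year_range}, there were {total_lynchings} lynchings")
  ) +
  theme_classic() +
  theme(
    # Wrapping text box for the title, larger and bold, with extra margin
    plot.title = element_textbox_simple(
      size = 18, face = "bold",
      margin = margin(t = 10, r = 0, b = 15, l = 0)
    ),
    # Align the title with the left edge of the whole figure
    plot.title.position = "plot"
  )
```

Using `element_textbox_simple` instead of `element_text` is what lets a long title wrap rather than run off the edge of the figure.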
We have the range built with glue from early and late. So I think that looks pretty good. I'm happy with this. One thing that I'm thinking about is color. I generally don't like having color for color's sake. I also feel that this is a pretty stark, grim story, this story of lynchings in the United States, and I don't feel like making it prettier by adding color that would distract from that story. I think a simple black and white figure will go a long way. The next question that I had was, how did the number of lynchings vary by state? So I'm going to go ahead and create another R script that I'll save as Juneteenth by state. And I'm going to grab some of this other stuff that I had for reading in the data, right? For now, I'm going to remove that variable lynchings per year, because I want to get lynchings per year by state. I think I know what I want to do: I want to count year and state to see what things look like. Again, this was the data frame we had counting year, giving year and n. If I do year and state, well, I don't think state was the column name, so let me double check what the column was. I believe it was lynch state, so the state where the person was lynched. And now I see that I have year, lynch state and n. I'm going to call this lynchings per state per year. That's a mouthful for sure. Then we can think about how we might plot these data. Well, we can make a line plot, right? We could do ggplot, aes, x equals year, y equals n, and then color equals lynch state. We're going to have 12 lines. I already know this is not going to look good. We'll go ahead and do a geom_line on that and see what it looks like. And then I'll save this as lynchings per state per year dot PDF, width equals five, height equals five. So again, this is a hot mess. This is not very attractive.
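Counting by two columns and mapping the state to color can be sketched as follows. The data are made up and the `lynch_state` column name follows what is said in the episode; with only three states the toy plot is readable, which is exactly what stops being true with twelve overlapping lines.

```r
library(tidyverse)

# Made-up records: one row per victim, with year and two-letter state code
toy <- tibble(
  year = c(1890, 1890, 1890, 1891, 1891),
  lynch_state = c("MS", "MS", "GA", "GA", "LA")
)

# One row per year/state combination, with the count in column n
lynchings_per_state_per_year <- toy %>%
  count(year, lynch_state)

# One line per state; this gets crowded quickly as states are added
p <- lynchings_per_state_per_year %>%
  ggplot(aes(x = year, y = n, color = lynch_state)) +
  geom_line()
```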
Something that I am thinking about, though, is what if we were to turn this into a heat map? We've got 12 states, so we could put each state as a different row and each year as a different column. Then we could turn up the heat, or turn up the color, perhaps from white to red or something like that, to indicate those times when there were higher numbers of lynchings in a state. If you've seen my earlier episodes where I talk about heat maps, you know that I think they're appropriate when we're making more qualitative comparisons of the data, which is really what I'm after here: looking for those peaks in cases of lynchings. So x will still be year, y will be lynch state, and color will be n. And I don't want geom_line, I want geom_tile. This more or less looks right. I did my silly thing again of using color instead of fill, but you can kind of see where the cases were, again on a very qualitative basis. Let's go back here, and instead of color we'll do fill. And we again see that Mississippi seems to have higher numbers throughout the history. Louisiana had one really bright year here as well. I'm going to go ahead and clean this up. I'll do theme_classic. I'll also do scale_fill_gradient, where I'll make low white, so FF FF FF, and high red, so red is the first channel, FF, followed by 00 00. That's the hexadecimal. That gets us in pretty good shape. I'd like to remove this y axis line, so I'll do theme, and I'll do axis.line.y equals element_blank. And I'll also do labs, x equals Year. I think the y is obvious, that that's the state. So it looks pretty good. I don't know that I need those tick marks. And it might be nice to clean things up a bit and get rid of that x axis line too. So I can remove that dot y to remove both axis lines. And then I can also do axis.ticks equals element_blank.
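A minimal sketch of that heat map, using made-up counts, is shown below. Dropping the y-axis title with `y = NULL` is an assumption on my part; the narration only says that the y label is obvious.

```r
library(tidyverse)

# Made-up per-state, per-year counts
lynchings_per_state_per_year <- tibble(
  year = c(1890, 1890, 1891, 1891),
  lynch_state = c("GA", "MS", "GA", "MS"),
  n = c(1L, 4L, 2L, 6L)
)

p <- lynchings_per_state_per_year %>%
  # fill (not color) controls the interior of each tile
  ggplot(aes(x = year, y = lynch_state, fill = n)) +
  geom_tile() +
  # White for low counts through pure red for the highest count
  scale_fill_gradient(low = "#FFFFFF", high = "#FF0000") +
  labs(x = "Year", y = NULL) +
  theme_classic() +
  theme(
    axis.line = element_blank(),   # removes both the x and y axis lines
    axis.ticks = element_blank()   # removes the tick marks
  )
```

Mapping `color` instead of `fill` in `aes()` only changes the tile outlines, which is the slip mentioned in the narration.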
A couple of things that I know I want to do: I need to change the name of my legend here, and I also want to bring in the actual names of the states. Looking at the legend, I can do name equals Number of lynchings here. And maybe what I'll do is put lynchings on a second line with backslash n. I'd also like to have a zero on my legend. So what I'll do is limits equals c of zero and NA. That gives me the lower edge, to indicate that white means zero. The next thing I want to turn my attention to, as I mentioned, is bringing in the actual state names. R has a number of built-in vectors. There's letters, right, which gives you the lowercase letters, and there are the uppercase letters. But you can also get state.abb for state abbreviations and state.name for the full names, right? So I'm going to create a data frame of abbreviations and names that I can then join to my data to get the full name of each state. Coming up here, I'll do a tibble. I'll do abb equals state.abb, and then name equals state.name. I realize I forgot my e there. That gives me a table, and I'll call it state lookup, right? And then in my lynchings per state per year, I can go ahead and do an inner join. The data that's coming through the pipe will go in as that period. Then I'll join that to state lookup. And then I'll do by. What's coming through is going to be lynch state, and that's going to equal abb, and that needs to be in quotes. Now if I look at lynchings per state per year, I see that I have the name, the year and the n. And instead of y being lynch state, I can do name. It looks good. I think one thing I would like to add to each state name is the total number of lynchings, perhaps in parentheses right next to the state name. And I can do that.
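The lookup table and join could be sketched like this, again with made-up counts. `state.abb` and `state.name` are built-in R vectors; the `lynch_state` column name is taken from the narration.

```r
library(tidyverse)

# Lookup table from two-letter abbreviation to full state name
state_lookup <- tibble(abb = state.abb, name = state.name)

# Made-up per-state, per-year counts keyed by abbreviation
counts <- tibble(
  year = c(1890, 1890, 1891),
  lynch_state = c("GA", "MS", "MS"),
  n = c(1L, 4L, 6L)
)

# Match lynch_state in the data to abb in the lookup table
lynchings_per_state_per_year <- counts %>%
  inner_join(., state_lookup, by = c("lynch_state" = "abb"))

p <- lynchings_per_state_per_year %>%
  ggplot(aes(x = year, y = name, fill = n)) +
  geom_tile() +
  scale_fill_gradient(
    name = "Number of\nlynchings",  # "\n" puts "lynchings" on a second line
    low = "#FFFFFF", high = "#FF0000",
    limits = c(0, NA)               # start the legend at zero; NA keeps the data maximum
  )
```

An inner join silently drops any rows whose abbreviation is not in the lookup table, so it is worth comparing row counts before and after the join on the real data.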
If I come back up to lynchings per state per year, which is the data frame I'm working with, I can use that to get lynchings per state. So I'll bring down my lynchings per state per year and do a group_by, and the state name is now name. It's kind of an unfortunate name. And we'll do summarize, capital N equals sum of n. So now I have my N for each state, and I can call this lynchings per state. Then I'll create a column that has the name plus the number in parentheses. I'll again do glue, and in curly braces I'll put name. And then in parentheses I'll put the capital N, and again, that needs to be in curly braces. We see now that we have the name and then that glue column. That's not quite right, so I'm going to name this column: state equals that. The next thing I want to do is bring that stylized name into my lynchings per state per year data frame. I can do that here on line 24 as part of my overall ggplot pipeline. I'm going to need to do that in a join, so I need to double check that I know what the column names are. If I look at lynchings per state per year, I've got lynch state and name, and in lynchings per state I've got name. So I can do an inner join on that name column. I'll join with lynchings per state, by equals name. That joins together and looks good. I can then fold this into my ggplot. Instead of y equals name, I can do state. So now I've got my state name with the number of cases in parentheses, and that looks good. I would like to order these rows by the number of cases. To do that, I will come back up here to after my inner join, and I will do a mutate. I'll do state equals fct_reorder on state, and then the by column is going to be the N. That should be a column that's coming through from lynchings per state, right? So I'm going to order the state column, this last column, by the N.
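Here is one way the per-state totals, the labels, and the reordering might fit together, using made-up counts. Wrapping `glue()` in `as.character()` is a defensive choice of mine so that the label column is a plain character vector before it becomes a factor; it is not something stated in the episode.

```r
library(tidyverse)
library(glue)

# Made-up per-state, per-year counts with full state names
lynchings_per_state_per_year <- tibble(
  name = c("Mississippi", "Mississippi", "Georgia", "Louisiana"),
  year = c(1890, 1891, 1890, 1891),
  n = c(3L, 2L, 2L, 1L)
)

# Total per state, plus a label such as "Mississippi (5)"
lynchings_per_state <- lynchings_per_state_per_year %>%
  group_by(name) %>%
  summarize(N = sum(n)) %>%
  mutate(state = as.character(glue("{name} ({N})")))

# Attach the label to every row, then order its factor levels by the total
ordered <- lynchings_per_state_per_year %>%
  inner_join(lynchings_per_state, by = "name") %>%
  mutate(state = fct_reorder(state, N))

levels(ordered$state)
```

`fct_reorder` sorts levels in ascending order of `N`, and ggplot draws the last level at the top of a discrete y axis, which is why the state with the most cases ends up in the top row of the heat map.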
And now, as you can see, Mississippi is at the top, followed by Georgia and so forth. With the lack of a map, we don't really have geography to sort these. Previously, we sorted them alphabetically. I don't think that makes sense. I think it makes sense to order them by the total number of lynchings across this time period. I'd like to go ahead and put a title in here. So I'm going to come back to my Juneteenth by year script and grab that title. I'm going to recycle some code. We'll go ahead and put a comma there, and then title, "Between" year range. I'll say Mississippi had the most lynchings. And I think I've got extra parentheses there. Okay. So I need to get year range, and I think we saw how we could do that up above here, right? From lynchings per year, we'll take all of that and put it here. We need to modify this a bit because we don't have lynchings per year; we have lynchings per state per year. And this gives us 1877-1950. We see we still have this problem with the styling. Again, I'm going to recycle my theming from over here, and let's add it to the rest of this theme stuff where we were modifying the axis ticks. And we should be good. One thing I notice is that we have a little bit of extra margin space around the title. Something else that occurs to me is that we could perhaps program in the Mississippi. Perhaps the database gets updated and now Georgia is above Mississippi or whatever. Who knows. So let's go ahead and do that. Lynchings per state is again the data frame that we're looking at for that. We could take lynchings per state, do top_n with n equals one, and do it by the N column. That gets us Mississippi. And then what we could do is pull name, and that gives us Mississippi. Let's put this on separate lines so it's easier to read. We'll call this peak state. And then down in our title, instead of Mississippi, I can put in peak state.
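Computing the state name for the title, rather than typing it, could look like the following sketch. The totals and years are made up, and the object names follow the narration with underscores added.

```r
library(tidyverse)
library(glue)

# Made-up per-state totals and per-state, per-year counts
lynchings_per_state <- tibble(
  name = c("Georgia", "Louisiana", "Mississippi"),
  N = c(2L, 1L, 5L)
)
lynchings_per_state_per_year <- tibble(
  name = c("Mississippi", "Mississippi", "Georgia", "Louisiana"),
  year = c(1890, 1893, 1890, 1891),
  n = c(3L, 2L, 2L, 1L)
)

# Range of years, taken from the by-state data frame this time
year_range <- lynchings_per_state_per_year %>%
  summarize(early = min(year), late = max(year)) %>%
  mutate(range = glue("{early}-{late}")) %>%
  pull(range)

# State with the largest total (top_n keeps ties, if there are any)
peak_state <- lynchings_per_state %>%
  top_n(n = 1, wt = N) %>%
  pull(name)

plot_title <- glue("Between {year_range}, {peak_state} had the most lynchings")
```

If two states ever tied for the largest total, `peak_state` would hold two names and the title would come out as two strings, so a tie would need to be handled explicitly.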
So now we've got Mississippi embedded in the title using code, without me having to type it and worry about misspelling Mississippi. Again, let's go ahead and shrink that margin, perhaps down to 10 and five. One last thing that I'd like to do is put a caption on both of these figures indicating where I got the data. I'll come back to my title here and add caption equals "Data provided by CSDE lynchings database". I will put that into my Juneteenth by year labs as well. Like we saw with the plot title, it's aligning the caption relative to an axis, so it's putting it basically at the right edge of the x axis. I can move that all the way over to the right by adding plot.caption.position equals plot. I'll also do plot.caption equals element_text, and I'll do face equals italic. And I'll make sure I've got this same styling for my by year figure. I think both of these figures tell a compelling, provocative story. One of the things that I like about having both of these together is that they allow us to drill down into the data. This first one allows us to look across the United States. It's again those 12 Confederate States plus West Virginia. West Virginia, for those of you that don't know, was part of Virginia before the Civil War and then seceded from Virginia to become its own state. And then we drill down to the state level over time. One of the things that rings in my ears when I recall the protests from last summer, thinking about Black Lives Matter and whatnot, is the constant refrain of "say their name," saying the name of Breonna Taylor and so forth. One of the things that I think gets a little bit lost in this is that we boil it down to 3,246 lynchings and we're not saying the name. In some contexts, it's not possible to get to that very fine-scale level of data, to see that this peak represents 130 people and that they're actually people, right?
They're not just data, they're people who were killed. So what I'd like to do in the next episode is think about how we can say their name with these data. I encourage you to make sure that you're subscribed to the channel so you know when that video is released and you can see what I'm going to do as an approach to saying the names of the victims of these lynchings between 1877 and 1950. I think I've got an idea that I'm kind of excited to put together and to share with you. Again, even though this is not my usual fare of microbial ecology data or whatever it is I talk about, hopefully you can see that there's really nothing new here, right? Everything that I'm doing here is what I've done in previous episodes, and one of the great ways that you can practice your skills and learn R is to take those skills and apply them to something very different. And in some ways, we also have a responsibility, right? So this is my hope: that through data visualization I can build awareness of our history of lynching and build awareness of Juneteenth as a holiday to commemorate. I would really encourage you to figure out what you are passionate about. What is going on in your community, your part of society, where you can use your R skills to better the community while at the same time, of course, improving your own skills? So anyway, please share this with everyone. I will put these figures up on the blog post that's linked down below in the description of today's episode, so check that out. If you're interested in the data set, I'd really encourage you to go to the CSDE website. I'll put a link for that down there as well. And we'll see you next time for another episode of Code Club.