So thank you everyone for coming today, whether you're joining us remotely or here with us in the room. I'm Rebecca Cummings. I'm the Interim Director of Digital Matters. And before I turn the time over to our presenter, I do want to make one quick announcement. And that is that it's that time of year again where we're accepting applications to the Digital Matters fellowships for fall 2023. So the deadline for that is March 31st. If you have an idea for a digital project that you'd like to see funded, please do feel free to reach out to me. So without further ado, I am so pleased to announce our speaker for today, Dr. Kaley Alexander, who is an ACLS Emerging Voices Fellow and also in residence right now in Digital Matters. Kaley has her PhD in art history and visual culture from Duke University and is currently teaching data visualization and culture here at the University of Utah. Kaley's talk for today, as you can see, is "Death by Data: Auditing the U.S. Cemetery Landscape." Please join me in welcoming Kaley Alexander.

Thank you, Rebecca. And thank you everybody for joining today. I'm going to be talking about a really new project that I only, you know, started in the fall of 2022. So it's definitely a work in progress. And I'm going to give some background into what led me to this project, talk a little bit about the early process and methodologies that I'm using, and then hopefully a little bit of results, but we'll see. And I'm always, you know, open to comments and questions and suggestions at the end of the talk. There we go. So the project that I have started here in Digital Matters I'm calling, for now at least, the U.S. Cemetery Audit. And you see we've got our lovely Salt Lake City Cemetery there representing this project for now. So this project is in part inspired by Monument Lab's monument audit that came out about two years ago. They basically did a survey of all the monuments in the U.S.,
trying to look at the different demographics that were represented in public sculpture in the United States. And I thought, well, why can't we audit the cemetery landscape as well? And what would that mean? I'm going to give a little bit of background on myself and how I came to this, because I am an art historian. What is an art historian doing studying, you know, cemeteries as these kind of built environments? But I started my Ph.D. working on French cemeteries in the 19th century. My book is going to be out in August, so this is a little plug for that; it will be on pre-order in July. What I was working on was really how we can study monuments that no longer exist, in an aggregate form. In France, you're only guaranteed five years of burial. So over time, there's a lot of missing monuments, which gives us a very different view of the cemetery than, you know, what we see if we go to famous places like Père Lachaise today. So I wanted to study these more vernacular forms, from non-elite individuals, that didn't survive into the present, things that haven't been deemed cultural heritage, and try to see what those types of objects could tell us about people and how they were commemorating their loved ones in the 19th century. So I was really interested in survival bias and how to use data to combat survival bias. Of course, we're never going to eliminate all of that bias, but at least working in the aggregate, we can start to see patterns and trends that we don't get in traditional forms of art historical research, where we're looking at one or two objects in detail. So I was using a lot of text-based sources as proxies for this inaccessible material evidence. We had books where I had records of 2,000 monuments, and it was like, okay, that can give me a little bit of a start. And then also, because the French are really meticulous about record keeping, I had burial records for everyone who had been buried in the city of Paris from 1804 to 1902.
I did not gather the data that far, because there's a lot of manual transcription. But combining these different sources together, I was able to somewhat account for what monuments would have looked like in the 19th century, what the demographic makeup of the individuals who purchased them would have been, and how they were really perceiving, you know, death and commemorative action at this time, as opposed to elite individuals. Also, I've been an editor for the Radical Death Studies blog for the past couple of years. So the Collective for Radical Death Studies is an organization of both scholars and professional death workers working to decolonize death studies. And they do a lot of different work at the intersection of very different disciplines, working with different geographies, but really trying to de-center the typical narrative of American cemeteries and burial practices from white, Eurocentric narratives. So I was working with them, working on my French project. And then I also started collaborating with the Friends of Geer Cemetery. Geer Cemetery is the first public African-American burial ground in Durham, North Carolina. So that's where I was based while I was doing my PhD. And as I was finishing up my dissertation, they were getting ready to set up this exhibition in the cemetery called In Plain Sight. And they really wanted to give the history of the cemetery and show how the cemetery reflected the history of Durham's rich Black community, which was known in the era of Jim Crow for its Black Wall Street, among other things. So my contribution to the In Plain Sight exhibition was really about how to mobilize data to study the cemetery in depth. The cemetery is not in great shape. They're still cleaning it up. Almost every weekend they have cleanups, but there's a lot of work to be done still. There's not a lot of monuments available to us; there's about 200 extant monuments.
Some have been uncovered recently as they've been doing more archaeological work in the cemetery. But there's not a lot of physical evidence in the cemetery to go on to tell us the story of the individuals buried there. That said, the Friends of Geer Cemetery, which kind of formed as a grassroots group in the 1990s and then officially in 2003, has been working for the past 30 or so years to gather as much information on people buried in Geer Cemetery as possible. One of the forms this has taken is a Google Drive folder with a lot of different documents, which at the time they were calling a database, but it wasn't really searchable and it wasn't really useful in terms of studying people en masse. One of the things that they had been collecting, though, were these death certificates for every person they could find that had been buried there. Like I said, there are only about 200 monuments in Geer Cemetery, but by 2021 they had records for about 1,500 individuals based on archival records and documents like this. I see this and I see a wealth of data that we can use to study the community in various ways. So I went to FamilySearch, which is a nice, easy tool for tracking death certificates. And they had more of the death certificates than the Friends of Geer had collected over time. And a lot of this information had been transcribed. So I led a team to kind of scrape all this data from FamilySearch, clean it up, and then add to it. Because as you can see, not all of the fields that are represented on the death certificate were transcribed on FamilySearch. Things like the attending doctor, the hospital where the person may have passed away, and the cause of death are not listed. For instance, even the names of the undertakers are not transcribed here, but they are on the death certificate. And this was a really rich source of information.
So after scraping it, we kind of took to manual transcription, filling in all the blanks from the death certificate scans themselves. And we ended up building a database, mostly for descendants and genealogists, to go and research individuals by name and get a little bit of information about what that person's life might have been like. So their age, when they were born, occupation, the neighborhood they lived in, this kind of summary information, and then of course leaving out what might be more traumatic information like the cause of death, which often was a racialized category at the time. So we kind of wanted to keep that on the back end and make sure people knew it was available if they requested it, but not make it so public and in your face here. So what do we do with this next? Now we have a tool that descendants and genealogists can use to search for individuals. But I also wanted to deploy this as an aggregate study, to try to figure out how we can study the cemetery as a microcosm of the city as well and find out more information, because it had been seen as an elite cemetery and we wanted to kind of test if this was true or not. So we started plotting individuals that were buried in the cemetery, and we wanted to give it this kind of humanizing effect as well, where it's not just a bar chart where people become increasingly anonymous, but really thinking about visualization techniques to humanize as well. So here we have every individual represented by the year in which they were buried, and this is an interactive visualization. You can hover over the block to get information about the individuals themselves and then perhaps be prompted to go a little bit deeper. That said, we did want to build some basic charts to understand what's going on. And like I said, we started with about 1,500 individuals. We ended up with over 1,800. The list is still growing; this is something we're working on.
We have a data collection team here. And what this allowed us to do is really track the cemetery and what's going on. There are no death certificates prior to 1909, so no information. We can't really say anything about those who were mostly born during the era of slavery. We mostly have that kind of middle-of-the-Jim-Crow-era data. And then you can see that the information starts to trail off. And that's about when the city actually opens a city-run cemetery for African Americans in Durham. So people start moving to that location instead. And then the history of Geer kind of stops there in terms of burial history. So we know a lot about this chunk of time now. Based on the death certificates, we're able to look at different types of occupations. This allowed us to see that it was not such an elite cemetery. We were able to see that it was demographically diverse, representing people from all different walks of life, from doctors to servants and whatnot as well. So a much richer history of the cemetery came out of this. Which made me think, okay, we've done this kind of study of one cemetery using data, working in the aggregate. And the missing records there represent a deep gap in the history, whereas for a lot of white cemeteries at the time we have really detailed documentation to study individuals from. So I wanted to see, is there a way to kind of pull as much information together as possible and maybe study how different groups are underrepresented in the cemetery landscape? So that's the origin of the cemetery audit today. So, questions: how can we study cemeteries on a large aggregate scale? What might that aggregate study of U.S. cemeteries reveal about access to burial or representation in burial spaces?
So do the cemeteries and the people buried there represent each other? Are local communities represented in the spaces of death that are adjacent to them? And then what methods could be used to undertake such a study? And that's a big thing for me. I'm really interested in the methods of doing data-driven humanities work and trying to find humanistic solutions for kind of computational humanities. So the data: as many of you may know, there's lots of different websites out there for tracking burials. There's Find a Grave, BillionGraves, and Interment.net, and all of these have really rich collections of data accounting for millions of burials across the globe. Ultimately I wanted to keep my data consistent, at least at a starting point. So I opted to just focus on Find a Grave. It's a well-known source of crowdsourced data. It was founded in the mid-90s as kind of a hobby project for tracking famous graves. Eventually it became a much bigger data source, and since 2013 it's been owned by Ancestry.com. They have over 400,000 known burial grounds accounted for in the U.S. and over 107 million individuals. This is considered one of the most comprehensive databases, though it doesn't have as much information about the cemetery itself, which is definitely a gap. But in terms of scraping the data, the website's also got a pretty consistent format that made it easier to collect the data from. So, consistent web layout, and I'll discuss some of that in a moment as well. So the methods that I'm using you can call slow digital art history. This is a term borrowed from Koen Brosens at the University of Leuven in Belgium. Slow digital art history is basically gathering data on objects over a long period of time. It's a project that continues to grow and maybe doesn't have an immediate result, but over time will have more and more rich results to come. So, very slow digital art history on this project. Then I'm using web scraping of Find a Grave.
So I spent the fall semester using the Web Scraper extension for Chrome to gather data on 20 states in the U.S., including DC. And I chose the Web Scraper because I'm not a coder. I come from a humanities background, and I like the Web Scraper as a kind of first-step tool for understanding how web scraping works, kind of a non-coding version of getting your feet wet. So I basically created two different selector charts, which you can see here, to grab the data. And what the web scraper does is this: I told it to look for the city and grab the name of the city, then go into the cemeteries here and grab the names of all the cemeteries, and then for each cemetery grab again the name; the location (sometimes it has coordinates, sometimes not, so that's something we'll have to think about); the cemetery ID, so I can always link back to Find a Grave and check information; and then just the number of memorials there. So this is important, because it's not the number of burials, it's the number of memorials recorded in this particular dataset. So this isn't really a question of how many people are buried there, but how many people have been reported as being buried there. I had to do a second scraper because, as you can see, sometimes they have it broken down by city and sometimes it's just the cemetery on that main level. So I created a second one to grab the name of the cemetery and then go in and grab the same information, just on two levels instead of three. So then after grabbing all of that, of course, the data is very messy. The web scraper doesn't allow you to grab everything in the most optimal way. You have to kind of grab boxes of information, and the bigger the box, the more information you're likely to get correct the first time and not have to redo. But this means that the data needs to go through a big cleaning process to make sure there's not too much information in single cells.
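As a rough illustration of that cleaning step, here is a minimal Python sketch that splits one oversized scraped cell into separate fields. The cell layout (a comma-separated location, a `|` separator, then a memorial count) and the function name `split_scraped_cell` are assumptions made up for this example; they are not the actual Find a Grave page structure or the workflow used in the project.

```python
import re


def split_scraped_cell(cell: str) -> dict:
    """Split one combined scraped cell into location parts and a memorial count.

    Assumed (hypothetical) layout:
        'City, County, State, USA | 1,234 memorials'
    """
    # Everything before the first '|' is treated as the location text.
    location, _, tail = cell.partition("|")
    # Break the location into its comma-separated components.
    parts = [p.strip() for p in location.split(",") if p.strip()]
    # Look for a number (possibly with thousands separators) before 'memorial(s)'.
    match = re.search(r"(\d[\d,]*)\s+memorials?", tail, flags=re.IGNORECASE)
    count = int(match.group(1).replace(",", "")) if match else None
    return {"location_parts": parts, "memorials": count}


if __name__ == "__main__":
    # Invented example value, for illustration only.
    print(split_scraped_cell("Salt Lake City, Salt Lake County, Utah, USA | 1,234 memorials"))
```

Keeping the location as a list rather than fixed city/county/state fields is a deliberate choice here, since the talk notes that some pages have two levels and others three.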
So I use OpenRefine here to split up the different cells and look at the different components of the address, for example, but I was also using this as a method of categorizing the cemeteries. You can see the variables for context, type, cultural, religious, and cremains, which is a binary variable representing whether or not the location houses cremated remains. And these were all kind of built from text mining the names of the cemeteries. Not a perfect method, but if you search for certain keywords you can start adding these types of attributes and trying to understand if there are specific people that are being buried in certain spots. So is it a religiously affiliated cemetery? A military cemetery is another one that came up quite often. What type of site is it? We have cemeteries, but we also have burial grounds and graveyards, we have individual family plots and whatnot, so I wanted to gather that information. And then if it was religious, what is the religious group that's being represented? If it's a specific cultural group that's being represented, I wanted to grab that as well. For instance, you can see if you just type in the word Baptist, chances are it's going to be a religious cemetery, so that was a pretty easy one. But it's a lot of trial and error too, because you have terms that will appear in multiple types of titles. So for instance, me being Jewish, I thought, oh, if I type "saint" that'll definitely be Catholic. Well, apparently it's not just Catholics that have saints, so that didn't work and I had to redo it. Also, finding the Jewish cemeteries was a lot easier for me because I know the keywords to search for, so there's definitely some bias that's introduced there that I'm going to have to reckon with as we go along. So the next step was trying to visualize this, because now I have like 90,000 cemeteries and I can't really comprehend all this information on its own. You may have seen this on our advertisement for the talk today. So here I created two tile
maps, one showing the density of memorials and one showing the density of cemeteries. All the white ones I haven't gathered data for yet, so the map looks like this, which doesn't look like the U.S., but just to give you an idea of what I have so far. And you can see states like New York have more memorials but fewer cemeteries; states like North Carolina have the most cemeteries but maybe fewer memorials. This is because in New York you have a lot of big cemeteries with thousands and thousands of people, and in North Carolina you have a lot of homestead burials with maybe two or three family members buried there. So this is another thing I have to account for: what this data actually represents. Using the individual locations, too, you can plot them in Tableau here and try to figure out what the different locations represent. I built a couple of dashboards, which you can access with the QR code if you're so inclined. This is a very, very beta version of the dashboards. It just allows you to kind of explore the data a little bit, hover over the dots and see which cemetery it is. You can filter based on cemetery types. If you want to look at memorials versus cemeteries, in the map on the right you can. If you want to look at just religious cemeteries, you can do that as well and sort of get an idea of what's represented. All right, so some very preliminary observations before I talk about how I'm going to completely change this project. So the first finding is that reporting itself tends to be overwhelmingly white and Protestant. This is not surprising. This was also the finding that Monument Lab found when they did their study of public sculpture. This is probably less true of the actual sites than it is of Find a Grave and the people who use Find a Grave, but I still find that to be an interesting thought. Who is reporting whom is a really interesting question that I'd like to answer. For instance, we have a lot of Indigenous burial grounds that are recorded on Find a Grave, but then it says zero memorials, so
what does that tell us about who is being accounted for and who should be accounted for? So that brings us to inequities in reporting and research attention. Some cemeteries, Green-Wood in Brooklyn: lots of information. Other cemeteries, I don't know, in St. George, Utah: not so much information on these pioneer cemeteries. So I need to reframe my research questions and definitely need to update my methods. And this led me to the question, to code or not to code, which is the dilemma of digital. So again, I am not a coder. I'm self-taught on all the technologies that I use, and when I teach, too, I like to use technologies that are very user-friendly from the start, kind of drag-and-drop systems that a nice programmer has made for us to use, and then I can revise later if need be. But while I was at Digital Humanities Utah this year I met Jesse Vincent, who's a programmer at BYU in their department of digital humanities, and he basically said, well, you spent the whole fall gathering data and you could have done it in two days, and I'm going to help you do this in a much more efficient way. But I still see the benefits of this kind of trial and error, right? Because you get to test different tools, you get to see how the technology works, and then you get to go and collaborate with people who actually know what they're doing and can refine the project. So not all was lost. So my next step is to throw out the 90,000 data points I have and recollect the data. So Jesse has written a Python script for me which works with layout inconsistencies. That means that I will be able to gather data not just from Find a Grave, but eventually I can also gather from BillionGraves and kind of sync the data up to fill in gaps and missing information. It also scrolls automatically, so I don't have to go one page at a time, which is nice, and I can gather more detailed information faster. So I was gathering just the cemetery-level data, but now I can gather memorial-level data too. So his test run, which took a couple of days and
made me very angry, got almost 55,000 cemeteries scraped and indexed, with almost a million different memorials across 816 different counties. And basically this is the data that I was grabbing, just the number of memorials, but with his scraper I can gather this information too. You can see at the bottom of the screen there's a description of the cemetery. Not every record has this, so it would have thrown off my scraper if I tried to grab it for some and not for others. Now I can grab it, but it's still a starting point. And I don't know if you can see it because of the Zoom thing there, but you can also grab "also known as," the different names of the cemeteries, which again wasn't on every single page, so I couldn't grab it before. Now I can. And then it goes into the View Memorials tab, which I thought was going to take me years and years and years to do, and it'll grab the life dates of every person included in that View Memorials tab. And this helps me to answer one question that I wasn't really sure how to address before, which is how do I add a time series to this data? Maybe, you know, in the description it'll say a founding date here and there, but ultimately that's going to be very rare at the scale of the data. But with this memorial-level information I can grab the first known burial year, which will allow me to kind of estimate when the cemetery opened and see how the funerary landscape in the U.S. developed from the 18th century to the present century. So I can continue working on my categorization methods and levels. Once I have the other known names of the cemetery this might become easier as well, because I'll have more information to go on. Descriptions, too, will allow me to look for more information and evidence about what type of cemetery this is. Also, I tried my hand at a little bit of topic modeling. It didn't yield very much, and it still needs a lot, a lot of work, because 50 clusters is way too many to comprehend. This is just topic modeling on
the names of the cemeteries for now, but a couple of weird things do emerge. You can start to see that cemeteries are being given kind of consistent names. So "memorial garden" is something that comes up quite a bit. That's a mid-20th-century kind of way of talking about cemeteries that denies death a little bit, in the traditional literature, but you can see here it's also associated with war and military cemeteries, and that's an interesting decision, if you're going to start calling it a memorial garden to kind of cover up the traumas of war. You can see also some religious words that are clustering together. So what's going on? What are people calling Episcopal cemeteries, for instance? What do we know about evangelical cemeteries, and so on and so forth. It's interesting, too: "house" is the number one, which makes sense because you have a meeting house and the cemetery's right next to it. So you can see the model is working quite well, but I will continue to refine it once I have more detailed data about the individual cemeteries. Yeah, so next, part two is to use the descriptions to find the first reported burials, as I mentioned. I'm also going to welcome any suggestions on finding ways to account for area. Justin Sorenson here found a great data layer for GIS that had all the cemeteries in the U.S., but it had only about half the cemeteries that I had and it didn't have area included. So that data isn't out there, but I'm thinking about how to account for area so you can really look at, you know, the number of individuals buried per square meter or something of that nature. Whether some cemeteries are more crowded than others is definitely a question I'd like to answer. More visualization trial and error, apparently Python trial and error as well. And then, yeah, I'll open it up for questions now, and suggestions. I know that was fast and not a lot of information, but hopefully it sparked something.

Sure. Great presentation, Kaley. I think you're at an interesting point in this project.
One of the questions that I had is simply, have you tried any of the other repositories? I mean, it sounds like you've got a more robust way to capture more data on Find a Grave. I think it would be really interesting just to see, you know, are there different demographics using different methods, if at all, or different, you know, repositories or whatever. So I'm just curious if you've taken a look.

I haven't taken a look yet. I know that Interment.net tends to have more historical information about the cemetery, which I definitely want. But with so much information on the website, it's hard to tell from a glance which groups are reporting more on which. They all tend to be used for genealogy, so they're built for that; they're not built for people looking at all the cemeteries in the U.S. But no, that's definitely something I want to explore once I get the Find a Grave data straight and, you know, find space on a computer for all that. And then I can go in and get the other repositories as well and, yeah, definitely compare the three. But I need a team for that. A team and money.

So there was so much I found exciting about this presentation, especially that you found such a good collaborator at DHU. I think that's like the point of why we do it, so it's really great to hear how you two connected and how Jesse's been able to help your research. So one question I have for you is just, does Find a Grave know you're doing this? Like, have you corresponded with anyone there?

I have not. I know they're just in Lehi. And I've spoken to a couple of people who are like, oh, do you know the founder? I feel a little silly saying no. I probably have to do that, though, so they know.

It'd be kind of interesting if, you know, at some point they had a link to your data set where other users on Find a Grave could interrogate your data set over time.
Yeah, I mean, I'd definitely, you know, like to make it something that anybody could interrogate over time, because I have a set of questions, but there's an endless number of questions that people can ask, even looking at subsets in particular.

On the sort of memorials page, are those hyperlinks for each individual name?

Yeah, so you can get to their profiles. So those are sort of profiles on Find a Grave. They might link to documents that are on Ancestry, but to keep things anonymous I don't want to grab all that information. I think the scraper will automatically grab their name, and then I'll wipe it out and give a record number. But yeah, what I'm interested in is just the dates for that.

We have a question online. Yeah, if you want to take one of these. So, coming from, I think it's Canada: regarding the size of monuments and occupancy of the cemeteries, will you try to account for, as an example, land burials versus mausoleums versus columbaria (I'm not sure what that is), each with different numbers of interments and space? I personally don't know.

A columbarium is where you kind of inter cremated remains. And those I have accounted for, because some of the cemetery names include that there's a columbarium at that site, and so I've accounted for that with like a binary variable that says yes or no, it's there. In terms of accounting for memorial size, that's hard, because none of the sources I've found actually give the dimensions of the tombstone, so without somebody going out and measuring everything that's kind of difficult to say. If I find a way to get the area of the cemetery, one thing I do want to do is calculate density, because one of the things we found at Geer Cemetery, for instance, was that the cemetery was almost twice as crowded as the local white cemetery.
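The density comparison mentioned in that answer could be sketched as follows in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the memorial counts and areas are invented placeholder numbers (the talk notes that area data is not yet available), and "memorials" here means reported memorials, not actual burials.

```python
def memorial_density(memorials: int, area_m2: float) -> float:
    """Reported memorials per square meter for one cemetery."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return memorials / area_m2


def crowding_ratio(density_a: float, density_b: float) -> float:
    """How many times more crowded site A is than site B."""
    if density_b <= 0:
        raise ValueError("density_b must be positive")
    return density_a / density_b


if __name__ == "__main__":
    # Placeholder figures for two hypothetical sites, not real measurements.
    site_a = memorial_density(memorials=1800, area_m2=9000.0)
    site_b = memorial_density(memorials=5000, area_m2=50000.0)
    print(f"Site A is {crowding_ratio(site_a, site_b):.1f}x as crowded as site B")
```

Because the counts come from crowdsourced reporting, a ratio like this would describe reported density, which may understate crowding at under-documented sites.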
So that's definitely a question that needs to be answered and will give a lot of weight to this project. But again, this is very slow digital history, so it's going to take some time to get to that point, but it's a really important question.

I don't know, I might have to chunk it up. Okay. I've not worked with a data set that big yet. I assume I'm not going to be doing it on my laptop. I assume I will not be putting it all into Tableau either.

So, I'll be so curious to find out if OpenRefine is actually...

Yeah, for that amount of data or not, I have no idea. I mean, already with the 90,000 it was moving pretty slow when I was using text filters. Yeah, for doing like big batch changes it's fine. But I think I'm going to have to split it up and clean per state, ultimately.

If the cemetery is not a segregated cemetery, is there a way to keep track of race, or is that even knowable?

It's not knowable for public and religious cemeteries at large. For instance, I do have a lot of cemeteries that are marked AME, so African Methodist Episcopal. For those you can assume that it's mostly African Americans buried there. There are some burial grounds for the formerly enslaved as well that we can account for. So it's not perfect, but you can start to get at marginalized spaces of death. Asylums and institutional burial grounds I'm trying to account for as well. But yeah, not perfect, but it'll give us some idea.

Do the individual memorial pages on Find a Grave, that maybe link out to documents on Ancestry, indicate race as well?

They could. I mean, a death certificate from, you know, the early 20th century would. You have to think, do you want to go that deep and maybe violate privacy as well? Another thing we found using death certificate data is there tend to be some really horrible terms used, and I don't want to replicate those traumas either. So it's a balancing act that we're going to have to figure out as we go. Yeah.
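The keyword tagging of cemetery names discussed in the talk, including the AME example in the answer above, could be sketched roughly like this in Python. The keyword lists and the function name `tag_cemetery` are illustrative assumptions, not the project's actual categories, and matching on whole words is a choice made here so that a short keyword like "ame" does not fire inside a longer word such as "James".

```python
import re

# Hypothetical keyword lists. The talk mentions Baptist, AME, military, and
# columbarium as cues; the real project would need a much longer, reviewed list.
RULES = {
    "religious": ["baptist", "ame"],
    "military": ["military"],
    "cremains": ["columbarium"],
}


def tag_cemetery(name: str) -> dict:
    """Return one True/False flag per category, based on whole-word keyword matches."""
    tags = {}
    for tag, keywords in RULES.items():
        # \b word boundaries keep 'ame' from matching inside 'James'.
        pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
        tags[tag] = bool(re.search(pattern, name, flags=re.IGNORECASE))
    return tags


if __name__ == "__main__":
    # Invented example names, for illustration only.
    for example in ["Mount Zion Baptist Church Cemetery", "James Family Plot"]:
        print(example, tag_cemetery(example))
```

As the talk points out with the "saint" example, a keyword can appear in several kinds of names, so flags like these are a starting point for manual review rather than a final classification.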
So another fascinating thing to me about this is, you know, there's sort of this debate in digital humanities centers. It's like, do you equip people to learn the digital tools, or do you hire programmers to help them with their research? And I feel like you hit this inflection point. Like, you spent a ton of time learning digital tools, which is sort of the Digital Matters system up to this point, as we fund people to learn on their own, whereas BYU has a lot of programmers to help with their research. Do you mind saying a bit more about the pros and cons of each of those different systems?

Yeah, I mean, I think you need both. Honestly, I think having a programmer in, you know, proximity to you is something definitely necessary once you start getting to a certain point, you know, down the digital humanities rabbit hole. There's definitely projects that don't need programmers. And my first step is always to try to do it myself without a programmer, so I at least understand the data that I'm gathering. I found this, too, you know, with going through the French burial records. There are a lot of tools out there that'll, you know, try to OCR handwritten script, but it's imperfect. And then I'm not engaging with the sources much, and I'm not getting that information as a historian out of it. So I'd like to do this kind of combination of really easy tools, manual labor, and then when necessary go to the programmer. As I said, working with Jesse is kind of in the middle as well. He's written the script, but I'm working with him so that he'll teach me how to run it. So now I've gained that skill as well. And then I can, you know, engage with my data firsthand rather than having somebody else do it for me and then transmit it. But, you know, it's a delicate question, and everybody comes with different skill sets to digital humanities. And even, you know, so many years now that it's been here, it's still kind of an emerging field, right?
And we don't traditionally train humanists in programming, so, you know, if you have somebody with a computer science and history background, that's great, but that's probably not what you're going to have. But that's why digital humanities requires so much collaboration. So again, anyone who wants to work on this project... It's definitely not something I can handle alone.

Well, I mean, I'm curious, will you go for an NEH grant or something like that?

Ideally, I would like to get a research group together to work on this. So yeah, grants will probably be number one.

Do we have any more questions online or in the room?