Hi, I'm so glad people came. I wasn't sure if anyone would actually care about this, so I'm glad you did. I actually went to college in Portland. Did anyone else in here go to Reed? I know Gabrielle did. We got it. Yeah, all right. So cool. So it's awesome to be back here.

I teach digital humanities at UCLA, and I wasn't sure coming in if people would actually know what that is or not. Who knows what that is? Maybe, sort of. Okay, cool. Well, the fact is, if you don't know what it is, you're in good company. A lot of people don't know what it is, and there's even a website called, like, whatisdigitalhumanities.com, I think, and every time you refresh it, you get a different definition. It's very frustrating. I like to say it's the use of digital tools to explore humanities questions. And I always say explore and not answer because, as you probably know, the humanities isn't really about definitive answers to questions. You can't just say there's one interpretation of Jane Eyre or one interpretation of medieval art. You're always kind of gesturing toward interpretation rather than coming to distinct conclusions. And because of that, we have some particular needs and difficulties in our field that I was hoping might be interesting to you. I was hoping they might be interesting enough to even get you to think about working on some projects or tools or talking to us about getting involved because, as you'll see in a minute, we have some needs.

So why study data in the humanities classroom? We probably have some humanities majors in here. The humanities, you know, is about interpretation and close reading. We love paradox and changing the terms of the debate. That's kind of our thing. And that's not necessarily something one associates with data, except that, as you know, the entire cultural record is slowly or quickly becoming digitized.
And it's become important, I think, to help students and other people who think about the humanities to not cede all of that digitized cultural record to people who are not trained in the humanities. So at UCLA we've been really working hard on helping students to think about data as a representation of one view of reality. So developing some fluency and literacy with data, but also understanding it in context, and understanding that it's certainly not the only way to think about the world. Right now, we have about 70 students in our DH minor, and we're growing pretty fast. We've only been around for about five years. And I think DH programs are popping up all over the world. I mean, certainly in the United States, I think they probably number in the hundreds, at least. So it's really a growing field.

At UCLA, we're on the quarter system, which is 10 weeks. So in our intro class, which I usually teach, we have 10 weeks to get students up to speed very, very quickly. And our students are really lovely, but they tend not to come in with a ton of data literacy or indeed technical literacy. The typical DH minor is an undergrad non-STEM major. Their experience with technology is usually limited to Microsoft Office, you know, Word, PowerPoint, and maybe Excel. Maybe Photoshop or the Adobe Creative Suite. And we have 10 weeks in the quarter system to get from an absolute zero to data literacy, while also trying to incorporate a lot of theoretical reading and discussion about what the humanities are, what data is, et cetera, et cetera.

So to try to get them there, you know, we want to teach them how to work with data in a basic way, but we also want them to be able to put a data set into context with a lot of critical reading and research, because otherwise, you know, a data set visualized on its own doesn't really mean anything.
So if you're working with data about comic book characters, which is one data set we've used in the past, and you're trying to visualize genders of comic book characters, we want you reading all the critical secondary literature on gender and comic books and narrative art in addition to doing that database and statistical work.

And so the assignment that we use in my DH 101 class is that everyone kind of adopts a data set. Like in sitcoms, you know, people will have to adopt a bag of flour and treat it as their baby for a few weeks. It's kind of like that, but with a data set for groups of students. It's kind of their baby for the quarter, and they have to get from zero to being able to visualize it, map it, write a narrative about it, and then put the whole thing on a website. And it's like a Swiss watch: nothing can fall out of place ever, because if it does, the whole course goes off the rails.

I wanted to share with you some features of the kinds of data sets that my students can work with versus the ones they can't work with. These might be surprising to you because you're used to working with people who have some data literacy, but I've noticed some very specific things about data my students can work with. So first of all, they need some context to the data set. They need some background about where it came from and how it was recorded, so not just stuff about what kinds of characters are in a field, but actually where did that field come from. They need some kind of model. It was really helpful for them, when they were doing visualization of a data set about comic book characters, to see FiveThirtyEight's work on those characters. My students can't work with APIs. I know we all love APIs and things like that, but they're not at a place yet where they can do that. I mean, I can do it at the beginning of the quarter and get the data for them and prepare it, but the less I have to do in advance, the better.
If I don't have to learn someone's API, then I'm a lot happier. So in keeping with the title of this conference, they love CSVs. They can't work with XML or JSON or Turtle or anything like that. They need to be able to double click the thing and open it in Excel, because they want to see the structure of the data set, and they want to be able to drop it into something like Tableau and just immediately visualize it. And the ideal size for my group is like 2,000 records, because it's too big for them to individually change every record, but not so big it'll crash Excel.

So these are hard-learned lessons that I've picked up over the years of teaching beginners, and I was hoping maybe people would find this compelling enough to think about creating training data sets when they're building out larger APIs and linked data and stuff. That is awesome and wonderful, but also maybe think about smaller training data sets for students who are getting started, because the good news is they become very fluent very quickly. They learn so fast it's scary, and they'll learn your API and they'll learn about SPARQL queries and they'll learn about all that stuff, but they need to practice with something a little bit more manageable.

I don't know how interesting this is to you. This is the suite of tools that we tend to use in my class. Maybe these are familiar. DataBasic is a wonderful teaching tool for students. You can drop a CSV in and it'll tell you what each of the fields is. OpenRefine for cleaning, CARTO for mapping, although I'm mad about their new interface. RAW is a really nice data visualization package. Tableau, Cytoscape, Fusion Tables, WordPress, and P5.js, which I just wanted to flag as something that we're more and more interested in using for each student. Does anyone know P5? Do you want to explain what it is?

That's the Processing port to JavaScript. It's used a lot for creative work as an easy way to do visual design. Students come in learning P5 already.
What's great about P5 is that as opposed to D3, which I also love and use a lot, P5 takes as your primitive not a bar chart or a scatter plot, but a circle or a square. If you're interested in doing some experimental visualization, you can have a circle bounce against a square every time a female comes up in your gender field. I'll explain why that's interesting to us in a minute. Maybe not immediately compelling.

I wanted to give you a quick look at some of the work my students do after a quarter. Oh, I can't because it's down. Perhaps I'll share this out on my Twitter feed in a moment, but I did want to say they get pretty far. You'd be surprised. In 10 weeks, they get pretty far. This was using a data set about all the performances from the New York Philharmonic, which I think Dan Fowler shared with me initially. They were able to show the influence of German composers in the early years of the New York Phil, which was pretty good work for a bunch of undergrads.

I wanted to talk about this tool wish list that we have in the humanities that has to do with our specific way of looking at the world. I don't think I can play this for you because my Wi-Fi connection is down, but maybe you've seen this map of slave ships that was in Slate. It was drawn from the Voyages database of the travels of vessels carrying enslaved people. It's a really, really powerful visualization. You can see the ships traveling over time. You can slide back and forth. I wanted to emphasize it not only because it's an inherently powerful visualization, which it is, but because it also shows something that a lot of people in the humanities are trying to do all the time and really having a hard time with, which is showing time, space, and movement all at the same time. Everybody thinks there's some tool you can just drop a data set in where you have a scrubber and you can move it back and forth and see how people migrated or see how an object traveled.
It's the most common request I get and one of the hardest to fulfill. For some reason, I don't know. It's interesting to think about why that's so important to humanists. It's hard for us to do it without custom programming.

I think that humanists think about uncertainty in a way that's a little different from statisticians, in the sense that the techniques I've seen in statistics for visualizing uncertainty depend on quantifying either a margin of error or a degree of uncertainty. But for us, uncertainty is more about epistemological uncertainty or ontological uncertainty. This is a pretty standard network visualization of some data we gathered about the early race film industry. It's fine. It's all the people involved in the industry, and they're connected via the films they worked on together. But I think what was really interesting to us in this project was trying to figure out what a race film was, because any individual criterion for a race film, like directed by a black director or contains an all black cast, any individual criterion will fail because the race film industry was a community of practice. It wasn't like a binary category that a film could sit in. We were really interested in showing some of these films were really race films and some of them were kind of not race films. And we didn't really have a good technique in our repertoire for showing that kind of strangeness and fuzziness in our data. So we love thinking about fuzziness and strangeness and how to depict that.

We're also always interested in multi-perspectivality. I wanted to show this map because in a way it's pretty straightforward. It's a map of Harlem in the 1920s and you can pull down a menu and map these different events. But what's striking about this map to people who spend time with it is that you realize, looking at the categories, that this isn't Harlem for people who lived there in the 20s.
This is Harlem for the police who were surveilling Harlem: assault, domestic assault on police, automobile crash. So right now there's one map of Harlem, but wouldn't it be cool if you could flip it so then it was from someone else's perspective? This is the police view of Harlem. Well, what did Harlem look like to someone who actually lived there? It's cool to think about. We don't really have a way of doing it.

So Lauren McCarthy, the one who developed P5, the JavaScript port of Processing, and I have just convened a series of workshops where we're kind of working through some of these issues. It's called Scopelab. It's a feminist data workshop that really thinks about uncertainty and contingency really seriously. So we're working on it. We're hoping maybe you'll think about working on it too. Talk to me about working on it. And I think that's it. Thank you. So I think I made it through. I'll go back to the previous slide so I can get that address. Yeah.

I have a point. I see that you use Cytoscape. Was there a particular reason? Because when I tried to use it, I was very frustrated. So I'm surprised with the adoption.

It kind of has, yeah, in DH. What do you use? For graphs.

I use R most of the time. Yeah. And it's frustrating because I don't know. Yeah. The root of networks and traditional networks is always very cool.

Yeah. That's a really good question. So we do tend to use Cytoscape. I know Thomas and I both use Cytoscape when we're doing network graphs, only because Gephi stopped working, more or less. And so we switched to Cytoscape, which is fine. We sometimes use Palladio, which is another kind of web-based interface for doing network graphs. What I found about network graphs in the humanities is that humanists sometimes are interested in traditional measures of networks, like various kinds of centrality and stuff. But more often they're just wanting to see connections.
So it would be nice if there was a beginner-level network graph tool that could handle a lot of different nodes but didn't have all the bells and whistles that Cytoscape or Gephi have. Yeah.

I see you're using a lot of tools. I've thought about this problem quite a bit.

Oh cool.

I was wondering if you tried to put a Python focus, just because of the idea of teaching people who don't know much about code one single environment. Because I love the web, I'm a JavaScript programmer as well. But you have a lot of levels of abstraction and complexity to deal with in web technologies. That's curious. I think from zero of the digital industry with so many things to learn.

It's really true, what you say, that we're kind of immediately moving them away from actually dealing with whatever the data is at its most basic level by asking them to visualize it immediately. And I do know people who take that approach where, you know, they start very granular with what is data, what's the difference between a document and an individual record. I don't know if this is true or not, but I guess my suspicion is that my students want to see things right away. They want to know what they're dealing with right away, so the quicker we can get to visualization, the more comfortable they are. But I also see some very compelling reasons to get them programming with Python quickly. Yeah, every year I'm like, I'm going to teach this class differently. Next year for sure. Yeah, please.

That is so true. I mean, that's kind of one of the most satisfying things to me, seeing the students having those conversations. And actually I find that happens most often when they're cleaning the data. So that's why I love teaching them OpenRefine, because they have to make these decisions about, OK, well, you know, we have too many fields here, which ones are we going to get rid of?
And like, oh well, if we get rid of, you know, this element, then we won't be able to track it. Or, OK, if we clean up this language and standardize the vocabulary, we're losing this nuance. And so they really see it very directly when they're trying to aggregate terms in OpenRefine. And sometimes they get it and sometimes they just don't want any part of it. They're like, you know, I don't want that, there's a field, it's called statistics, it's fine. Don't make me talk about theory in the humanities at all. Yeah.

Good idea. You were really walking through part of what you think is kind of problematic about the API access in the context of the concept, maybe the scope of the data set. But I'm curious if you can comment on any sort of contextual information around the data set, like features of the description of the data in terms of quality, provenance, enhancements that have been made to the data, that you've seen or that you think might be helpful.

Yeah. I mean, actually it's kind of not what you might expect. I know data people are always very interested in tracking provenance and tracking changes made to the data set over time. And my students are kind of not there yet. They're not thinking about data in that way yet. What they find most helpful is actually talking to the person who, if paper records exist, took the paper records and entered them into some kind of structured form, because they don't understand how data gets from out in the world to structured data they can work with. So they really like to see, you know, for example, take the New York Phil collection: they like to see the document that described these performances and then how someone transposed those messy categories onto a spreadsheet or whatever.

Yes, I'm glad you asked that because it's such a good point. I subscribe to that newsletter too and love it.
I find that with students, I mean my humanities students, it's not that you couldn't make a humanities argument out of municipal water records, but they're not at a place where they can see that yet. So they need traditional cultural artifacts, you know, like the Philharmonic's performances or costume design over the years. Things like that, where they can have an easier time seeing how to make an argument that speaks to the humanities about it. And finding those data sets is just awful. If anyone here has tried to do it, they can back me up. There's just a grapevine where we all tell each other about these things, like Dan told me about the New York Phil thing. You know, FiveThirtyEight will sometimes put out an interesting data set. So those of us who do this kind of work have been talking for years about aggregating it somewhere.

Well, one could imagine a different data set pretty easily, although I don't know that it exists, like understanding the neighborhood. I guess one could also imagine just switching to a different data set via this menu. I think what would be more interesting and challenging would be seeing features on the map actually change as they reflect the way in which people perceive them. This Cartesian map really reflects an authoritative view of the way space works. Whereas if someone's actually navigating their neighborhood, wouldn't it be cool if different features would change in size and shape as they took on a different importance to a person who lived there? Like maybe the church would be really big. I know it's not statistically accurate, but sometimes to the humanities that's not as important.

Yeah, I think we need to wrap up.
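
The data set criteria described in the talk (a CSV that opens directly in Excel, around 2,000 records, no API required) suggest one way a data publisher might prepare a training subset alongside a larger API. What follows is only a minimal sketch in Python under stated assumptions: the nested record shape, the field names, and the helper names `flatten` and `training_csv` are hypothetical and do not come from the talk or from any particular API.

```python
import csv
import io
import random


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts so that every leaf value becomes one CSV column."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Join lists into a single cell so the row still opens cleanly in a spreadsheet.
            flat[name] = "; ".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def training_csv(records, max_rows=2000, seed=0):
    """Return CSV text with at most max_rows randomly sampled, flattened records."""
    rows = [flatten(r) for r in records]
    if len(rows) > max_rows:
        rows = random.Random(seed).sample(rows, max_rows)
    # The union of all keys, in first-seen order, becomes the header row.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records shaped like a nested API response (assumed, for illustration only).
sample = [
    {"id": i, "work": {"composer": "Beethoven", "title": f"Piece {i}"}, "soloists": ["A", "B"]}
    for i in range(5000)
]
csv_text = training_csv(sample)
```

To produce a file that can be double-clicked, the returned text could be written out with `open("training.csv", "w", newline="")`. The fixed seed only makes the sample repeatable; the 2,000-row default mirrors the size mentioned in the talk and is easy to change.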