Thank you for coming to this session. I know you have a lot of other exciting and interesting sessions to go to, so I appreciate you coming here to learn about the Documenting the Now project. It's quite strange to be speaking to this kind of room configuration; actually, it's quite strange to be speaking in general, so I apologize. Also, I didn't manage to get my speaker notes working properly, so if I stumble around more than usual, that's probably why.

So I'm going to talk to you about the Documenting the Now project. The three names you see on the lower side of the screen are the principal investigators on the project. But I also want to mention two other people: Chris Freeland, who's sitting over here, helped us get the grant started at Washington University in St. Louis, and Meredith Evans, who is no longer at Washington University, also helped get the project started there. So their names kind of need to be there, but since they're not actively working on the project right now, these are the three main people.

Before I leave this slide, I should say that the purpose of the Documenting the Now project is to build community, tools, and ethical practices around collecting, working with, and preserving social media data. That's the big umbrella this work is happening under. I won't spend any time trying to convince you that this is a worthwhile thing to do, because I think all you have to do is look at the current political situation and what's happened in social media in the last year or two. Could we ever understand what's happened in the last year without looking at what has happened in social media? I think the answer, for me anyway, is no, so I'm not going to try to convince you that collecting social media is a valuable thing to do.
But I do hope to convince you about how you go about doing it, if you choose to do it. You may have guessed this already from the three names, but the three main institutions working on this are the University of California, Riverside, Washington University in St. Louis, and the University of Maryland. We got some very generous funding from the Mellon Foundation to help us work on this problem together, and I'll explain how these three institutions came together, hopefully, as part of the narrative of how the project came about.

The work I'm going to be talking about is work that these people did. I had a hand in it too; I'm down there at the bottom. Desiree Jones-Smith is our project coordinator at Washington University in St. Louis. Dan Chudnov calls himself a data engineer on this project; he does lots of other things, but on this project he calls himself a data engineer. Alexandra Dolan-Mescal is our user experience designer, and Francis Kayiwa is a DevOps engineer. Vernon Mitchell, who's at Washington University, is a PI on the project, and Bergis Jules, down here in the lower center, is also a PI; he's working on the community outreach aspects of the project, building a community, which I'm going to talk to you about. And I'm a developer on the project, so that's normally my excuse for my presentation, you know, that I'm not used to doing this, but it gets old.
I use it every time I talk. And this is our advisory board. As I'll tell you in a second, these people have been very instrumental in helping us figure out what we're doing. I'm not going to list all their names for you now, but if you know any of them, you'll probably recognize that we're very lucky to have them working on the project. We recently drew on these folks: they came to St. Louis during the summer, and we had a really excellent meeting that was largely focused on the community and ethical aspects of the work I'm going to tell you about.

If you have a laptop open, and what I'm saying is getting kind of dull, and you want to play around with this prototype: this is a development prototype, really almost a strawman at this point. It's an application we're using to explore the design space around what it means to collect social media data, really focused on Twitter at the moment, and also what it means to collect the web resources that are referenced within the Twitter data. So feel free to poke at it. You do need a Twitter account to actually do anything, so it'll boot you over to Twitter to log in, grant this application permission to use your account, et cetera. But I see lots of people looking directly at me, so I guess you don't need to play around with it if you don't want to.

I kind of alluded to this on the first slide, but there are three main things we're trying to do. One is to build tools to help people do social media data collection, analysis, and preservation. Another is to build communities. I mean, communities already exist for doing this stuff, right? So part of it is just us tapping into them, figuring out who they are, where they are, and what we have in common with them.
But for the digital preservation community specifically, I think there's also a need for communities of practice around this work. And then there's this issue of ethics: what does it mean to collect data that's out on the quote-unquote public web? I'm going to get into this in a moment, but I put the three things on the same slide because they're really very interdependent. You can't really have effective tools without an effective community, and communities that are trying to do things need tools to do them. This probably all sounds obvious, but the thing that perhaps wasn't obvious to me a couple of years ago is that the tools themselves have ethics built into them. Whether we want to recognize it or not, they're there; the question is, do we know what they are? I feel like I didn't always recognize the ethical decisions I was making as a software developer. So this project has actually been a lot of fun to work on, because it's brought me into contact with people who are thinking about the tools in a very different way than I traditionally have.

The work began in 2014, when Michael Brown was killed in St. Louis by Darren Wilson, a police officer. Bergis and I happened to be at a Society of American Archivists meeting here in DC, and we were, along with a lot of other archivists at that meeting, talking about what people will know about this event 25 years from now, 50 years from now. Everybody knew it was significant the week it was happening, just based on what you were seeing in social media. So we put this data set together; this is basically a visualization of it. The thing that happened that we didn't expect, and I probably should have, is that these events continued, right?
We saw that right up until the current day. The thing that happened in Ferguson, this heightened awareness in social media about the killing of African-American men and women by the police, was something that was going to continue, and awareness of it was going to accelerate. And a whole social movement was going to swell up underneath that awareness: the Black Lives Matter movement. We didn't know that was going to happen, but Bergis, as an archivist who is attuned to these social issues, saw it happening, and so we worked together to collect data around, not each of these, but probably about 15 or so different incidents.

Another thing we didn't realize getting into it: we were doing a lot of this data collection with the idea that researchers would find it valuable to use. And certainly at the University of Maryland, we've been working with a lot of, actually primarily sociology, PhD students and professors who want to look at what happened in Ferguson. That's been super, and it speaks to the mission of the organization I work at, the Maryland Institute for Technology in the Humanities; this is what they want to be doing. But what we didn't anticipate is that the first request for the data would be this email that you're seeing here. I've cut the from address off, because I feel like it's maybe not a great idea to show it, but it was a defense contracting company in Boston that wanted access to the data. You go to their website and there's all this stuff about doing anti-terrorism work. So I was suddenly in the position of, well, here's a researcher who wants to use the data. Do I give it to them? My answer was not to respond to this, just because.
Now, you may have heard Bergis Jules, who is the person who usually speaks about this project in public, talk about Documenting the Now. He was at DLF recently, I think, and at the Collections as Data conference here in DC, so you may have heard this before. But the thing that really stuck with me when I heard him talk about it is that data is about people, right? So yes, we collected 13 million tweets in the two-week period after Ferguson, after Michael Brown was killed, but in that data are the conversations of actual people. What does it mean to get that content of theirs and to keep it, to take it from the web?

Some of the people here are on our advisory board. On the far left, the gentleman with the red shirt on is Rasheen Aldridge, who was an activist using Twitter as part of his activism on the ground in St. Louis. All four of them were doing this. You see Kayla Reed right to the right of Jonathan Fenderson, whose head is unfortunately not fully visible; to her right is Alexis Templeton; and to Alexis's right is Reuben Riggs. At the meeting in St. Louis, we heard from them about how they want to be remembered in an archive. That really got the creative juices going when we were in St. Louis, because it brought all these issues right to the center.

So I thought I'd briefly describe some of the things we're thinking about building into this tool that you're probably not playing with. It's not going to look very much like that app.docnow.io application, but these are some of the things we're hoping to build in that will address some of the ethical considerations around doing this kind of data collection on the open web.
Well, on the open web, but social media in particular, where people are using platforms to communicate; not necessarily all web content. I'm really focused here on social media, but if you do web archiving work, you may see some parallels.

The first one is notification. The idea there is: if we're collecting the Ferguson hashtag, what if the application tweets out into the stream of content? It says this researcher is doing data collection, this is who they are, and this is why they're doing it, and puts that out as a tweet into the stream. Dan Chudnov and I have had some back and forth about whether this is of value or not, but I think we've come to the conclusion that it is an effort to put a signal out into the stream that this activity is happening. The key thing is that the tweet will have a URL in it that takes people to the application, where they'll be able to see who is doing the data collection and why. And importantly, if they don't want to be part of the data set, they can authenticate as a Twitter user, say, I don't want to be part of this data collection, and remove themselves.

Interestingly, and this was an idea that Trevor Muñoz, who I work with, had, perhaps it's also an opportunity for people to opt in: to say, not only can you use my data, but I actually wouldn't mind talking to you in person about what's going on, if you're looking for people to interview, perhaps. Basically, a way for people to opt in to a conversation.

The next one was one of Jarrett Drake's ideas: data retention. This comes right out of archival work, but what if you could, when creating a collection, stipulate how long that collection would live?
So say I'm creating a collection of Ferguson-related tweets. I'm going to do some work with it, but I actually don't want it to live longer than a month. We see this kind of thing with IRBs; if you've gone through an IRB before, you've been sort of forced to think about these kinds of issues. We're building some of that into the tool.

The next one we could talk about for a long time, and I only have a few minutes left, so maybe I should skip ahead in case there are questions. But the main thing with tweet IDs is that Twitter's terms of service are very prescriptive about how data that you get from their API can be shared with what they call third parties. One of the things they do let you do is share a data set of tweet IDs. An ID is basically a unique identifier for the tweet, just a number, but kind of a long one. You can share those with researchers, and when researchers get those tweet ID data sets, they have to turn them back into data, and to do that you use Twitter's hydration API. In the process, if anybody has deleted a tweet, you can no longer hydrate it; you can no longer turn it back into data again. So it basically empowers content creators to decide whether or not they want their data to be widely circulating on the internet.

Another idea we want to try implementing is traditional knowledge labels, which is an idea from the Mukurtu project. Again, we could talk for an hour just about this. But the idea is that they've done a lot of work, if you're familiar with them already, around special labels that content creators come up with to describe how they want their resources to be used.
It really grew out of work with, I believe, Aboriginal communities in Australia, but it's been used in Native American communities as well that want to share their content online, but also want to share it in particular ways. And I believe the Library of Congress Folklife Center is also working closely with them.

Another idea is warrant canaries. You may have been following Brewster Kahle's work at the Internet Archive, and how he's had these requests for data where he's not been able to, what's the word for it? I always blank on the name of this. It's when the authorities come and ask for data, but you're not allowed to talk about the request. What was it? Well, I think the Patriot Act is a vehicle that they use, yeah. Gag order, there you go. So the idea of warrant canaries is you put up a notification saying, I haven't gotten a gag order this year. The next year you say, I haven't gotten a gag order this year. And then the third year, you don't say anything.

Underlying a lot of this is that we're really thinking of the tool as an appraisal tool. This is me projecting my own interests onto it a little bit, but I think I share them with the other team members. The idea is that it's a tool to see what is going on in social media on a particular topic, and what's going on on the web related to that conversation. Not necessarily a tool to go grab it all; it's more a tool to help you see what's going on and then make decisions about what is of value here. That's where I'm hoping it will get to. So for example, we saw those activists who were working in Ferguson. How do we find them as archivists, when we want to document what's going on in the Ferguson protests?
You can see already that that interest is very closely aligned with that of surveillance, right? So that's actually something we need to sort out.

This last one I'm going to skip through, because I know we're running out of time, but it's the deed of gift. The idea is that if you can identify individuals with social media accounts that are of value to the conversation, perhaps you can get into a conversation with them along the lines of: it'd be great if you could download your Twitter archive and donate it to the archive.

I'm sure there have been lots of conversations here about big data, and big data does matter, right? I'm not here to say that big data does not matter. Certainly we're concerned about making sure the tool we build scales, and we're going to use the scalability of the cloud for that. But really the project is about small data. I don't know if you're familiar with this idea of small data, and I don't really even know if people are still talking about it. Two Amelias, Amelia Abreu and Amelia Acker, came up with this idea of small data, or at least they articulated it a couple of years ago, and it's really resonated with me for a few years. The idea is that yes, big data matters, but the stories within that data matter too; the context that's present in the smallness of the data matters as well. And for archivists, that's extremely important.

I can just keep going. I think this is the second-to-last slide. These are three projects that we think of as sister projects, in a way. We're not directly tied to them, but they're doing very similar work to what we're doing, and we're cross-pollinating ideas between them.
George Washington University is over there on the left because of the Social Feed Manager project, which some of our team members have actually worked on previously. We share a little bit of tooling between the two applications, so in the spirit of open source, we're looking to have some collaborations there. The one in the middle, Rhizome, is there for the Webrecorder project. They're another Mellon-funded project, and they're a web archiving tool, but they're very curator-driven, so it speaks to the small data side of our project, where curators are making decisions to collect particular things. Webrecorder is a really nice tool for doing that, if you haven't checked it out. It's basically a user-driven application; they used to call it co-browsing, I think, back in the day. And then Mukurtu, which we talked about previously: this is sort of an aspirational link at the moment. We have had some conversations with them, but we haven't actually picked any labels yet or worked on any labels with any communities yet. This is the direction we're headed in.

And yeah, we need help, if any of this sounds interesting. The tools are all open source, so you can assist there. This is our website, and there's our blog, where we write about what we're doing. But we're using a Slack channel for mediating the community development, and that's been kind of a big surprise. We've had upwards of 200 people join this kind of ragtag band. I don't know, they're not that ragged, actually; they're pretty nice, upstanding people.
The idea there is that we're a distributed team. The core project team itself is distributed, so we need a place to collaborate online, because we're not working together physically. But then the idea is to grow that circle outwards to other people who are interested in doing similar work, and we gather in this Slack channel. If you need help getting to that, let me know, but there is a form where you say, I'd like to join, with your email, and then you get an invite. And then there's a GitHub where our stuff is, our code and experiments.

So thank you for your patience. I really appreciate it. Yes.
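
A note on the tweet ID idea from the talk: the sketch below illustrates why sharing only IDs leaves deletion in the hands of the person who wrote the tweet. It is a minimal illustration, not the project's code and not Twitter's API. The names `dehydrate`, `hydrate`, `collected`, and `still_public` are all hypothetical, and the `lookup` callable is an assumed stand-in for whatever service turns an ID back into a tweet.

```python
from typing import Callable, Dict, Iterable, List, Optional

Tweet = Dict[str, str]


def dehydrate(tweets: Iterable[Tweet]) -> List[str]:
    """Reduce collected tweets to the bare list of IDs that gets shared."""
    return [tweet["id"] for tweet in tweets]


def hydrate(
    ids: Iterable[str],
    lookup: Callable[[str], Optional[Tweet]],
) -> List[Tweet]:
    """Turn shared IDs back into tweets, keeping only those that still exist.

    `lookup` is a hypothetical stand-in for the platform: it returns the
    tweet for an ID, or None if the tweet has been deleted.
    """
    hydrated = []
    for tweet_id in ids:
        tweet = lookup(tweet_id)
        if tweet is not None:
            hydrated.append(tweet)
    return hydrated


# Made-up example data: three tweets gathered at collection time.
collected = [
    {"id": "1001", "text": "first example tweet"},
    {"id": "1002", "text": "second example tweet"},
    {"id": "1003", "text": "third example tweet"},
]

# What is still public later on; the author of 1002 has deleted it.
still_public = {tweet["id"]: tweet for tweet in collected}
del still_public["1002"]

shared_ids = dehydrate(collected)
rehydrated = hydrate(shared_ids, still_public.get)
print([tweet["id"] for tweet in rehydrated])  # ['1001', '1003']
```

The researcher who receives `shared_ids` never sees the deleted tweet's text, which is the property described in the talk: the data set can circulate, but a deletion by the author still takes effect for everyone who hydrates it afterwards.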