Welcome to our software for social good luncheon. Thanks for coming. A couple of housekeeping details before we get started. One, please be aware that our luncheons are webcast live and recorded for posterity on our website. So you all are being recorded right now, FYI. Two, please tweet us @BKCHarvard during the event if you have things you would like to tweet. And three, please, if you can, also join us on Thursday next door in Milstein East B. We're having a book talk for Network Propaganda by Yochai Benkler, Robert Faris, and Hal Roberts. That will also be live webcast if you can't make it. But there is a reception afterward at the HLS Pub, so it will be more fun if you can make it. Introductions. Hi, I'm Andromeda Yelton. I'm a software developer here at Berkman Klein. And speaking today about software for social good, we're going to have Sebastian Diaz, who is the Berkman Klein Center's Director of Technology, or Chief Geek. He guides the Berkman Klein Center's IT enterprise through a landscape of ever-changing technology and priorities. Sebastian manages the technology group, which consists of a Harvard-renowned development team, an infrastructure team, and a technical project management team. We will also have Rahul Bhargava, who is a researcher and technologist specializing in civic technology and data literacy. He creates interactive websites used by hundreds of thousands, playful educational experiences across the globe, and award-winning visualizations for museum settings. As a research scientist at the MIT Center for Civic Media, Rahul leads technical development on projects ranging from interfaces for quantitative news analysis to platforms for crowdsourced sensing. He has a special interest in how new technologies are introduced to people in settings focused on learning.
And we also have Peter Suber, who works for the growth of open access to research within Harvard and beyond, using a combination of consultation, collaboration, research, tool building, and direct assistance. He's the Director of the Harvard Office for Scholarly Communication in the Harvard Library, Director of the Harvard Open Access Project in the Berkman Klein Center, and a senior researcher at Berkman Klein. So starting with Sebastian. Thank you, Andromeda. Welcome. So I'm going to talk today a bit about software for the social good, because the common good is something that today you hear a lot, and how it applies to software is not necessarily evident. So I want to start with this question, which is, what is software for the social good? And everyone has different perceptions, I think, of what social good is. And I want you to think maybe for a second on what this is for you, what is social good for you. And while you're thinking, I'll just talk over your thoughts here for a second. I'd like to offer this, which is sort of my thoughts on the question, and talk about where BKC's tools, where the Berkman Klein Center's tools, fit into this universe, fit into this arena. And ideally, your answer to the question, this question, that you've already come up with, is in some place in this talk, and there's an overlap. There's an overlap there. So OK, not just one question, but maybe another one. But this one's not for you. This one is more for the software. This is what you ask when you're looking at the tool, when you're looking at any particular piece of software. And you're looking at a piece of software, and you want to ask, well, I guess you can ask a lot of questions. You can say, what does it do? What devices does it work with? Does it work with my phone? Does it work with my laptop? Is it a Mac? Windows? Does it have ads? There's so many questions you can ask about this. 
But this is one that I want to sort of concentrate on right here, which is, does it contribute something? And you can ask this question sort of in an open-ended way. And say, does it contribute something to those who use it? Most software does, I suppose, or we wouldn't use it at all. You're not just going to have something that does nothing. I guess some people might, but not necessarily on my phone. But does it contribute something to the common good? That, I guess, is the question I want to ask. Some example answers would be, it contributes to safety. It contributes to privacy. It contributes to secure communication. It contributes to understanding. But how can a software tool contribute? How specifically? So for understanding, for example, because this is sort of a BKC core pillar (we build, study, educate, and connect), many BKC projects contribute to research. Or they provide something to the community, maybe starting with a smaller community. This is the BKC community, which is, I guess if you took it as max active community, it would be about 300 people. Or a larger one, so a wider audience. And here you can consider that that wider audience is the university, which is, I guess, slightly bigger. Or the city, Cambridge. The country. Globally. And globally is easy because you consider everything is online if this is an online tool. Online is accessible anywhere, or anywhere that has internet. Or, I guess Jason Griffey's here, also anywhere that a non-connected internet device is as well. So the reach is as far as it's useful, in effect. But so what is its usefulness? Like what is a tool? How is that useful within research? How is it useful within the community? So software tools, which are just the latest incarnation of tools, I suppose, from a wrench to a screwdriver to a club to anything, can be employed to advocate, educate, or connect.
And this is a powerful idea, again, because the BKC pillars Build, Study, Educate, Connect fit in there. They can be powerful levers in areas that need help. So think of the examples that I had before of safety, privacy, secure communications, understanding. Those contributions, or others, can help advocate for a cause, can educate your audience, can connect your community together. But why does that matter? So I hope this answer in your head is fairly obvious to you. Software tends to shape our lives more than we notice. Don't play a passive part in it. Anyone can be a change agent. And with online reach and global reach, this reach can be far. It can be global. And there is certainly a caveat to this. Some software tools are not necessarily used for social good. Some were created for the social good, and then maybe along the path they strayed and then got commercial (commercial is not necessarily bad), but didn't necessarily stick to their initial mission and got lost. So I just read a recent article that was going around the ethical software group about the co-founder of WhatsApp, Brian Acton. So if you guys know of WhatsApp, he sold it to Facebook for a lot of money. And realized shortly after that Facebook wasn't necessarily aligned with his priorities, with what he wanted to do with his software. The change that he wanted to bring about wasn't necessarily aligned with the change that Facebook wanted. And he left Facebook, and left a hefty sum there, I think because a portion of his stock hadn't vested yet or whatever it was, because he thought that the change that he wanted his original tool to make wasn't aligned with Facebook. So he left and then started funding open encryption, health care, early child development. So not all is necessarily for the common good or the social good. And there's a balance. And it's up to you to make it, up to the person, the tool's writer or conceptualizer, to make it.
So anyone who wants to learn coding, has access to open source or open data, can create impact. Learn to code. There's lots of amazing things that can be done with being able to express your ideas through code. Explore open source projects, not necessarily to code, but to be able to contribute to them in some way. Not everything in software is writing code. Explore open data. There are so many sources for open data, and you can leverage the gathering and implementation work that was done already. Explore it. Take a look. And this can even be not necessarily raw data, but maybe even already visualized data. And I think you guys do more of this than you realize, because many of the things that we look at in media, in news, are already this analysis. And you're doing your analysis as well. So contribution is also participation. So how does this application contribute to the social good? It goes hand in hand with how can users participate? How can they use the data? How can I use the app? But also, how can they contribute data to an open data platform? How can they contribute code to an open source project? So back to how tools contribute. How can a tool contribute to research? So a tool can contribute to research with data, or by providing a platform for experimentation, with the platform itself being the research. So you can have a tool that is just complicated, where it's not necessarily known how it's going to be implemented. And the research of building that tool can actually be that contribution. So for data, a tool can help by gathering data, analyzing data. You guys see this data analysis everywhere. Visualizing data, you guys, everyone loves data. There's stuff about data on visualizing data, on beautiful data, on analysis, and storing this data. So disparate, incongruous, nonsensical, large data that when you look at it later, when you do this analysis, when you do this visualization, you then see a different meaning.
You then see a different and interesting concept. So keep in mind that these projects I'm going to show you in these next slides range from an established platform that has been running for 10-plus years gathering data on a broad set of things, to things that, to borrow a slightly less dramatic clickbait title from last week's Bruce Schneier talk, will blow up when you click them. Not everything necessarily blows up. I want to walk through some examples that are active projects at Berkman Klein. So tools that gather data, we have three tools that I thought of, one of which Rahul will actually talk about later, Media Cloud. The Internet Monitor project specifically goes about analyzing online content for controls and activity by monitoring and gathering data on internet activity, on what is currently being censored, on what is currently being blocked, on how the internet controls are being deployed. Lumen gathers data specifically about how DMCA notices are used and how they are sent to content owners and providers. Media Cloud looks at online news sources and aggregates those into a large corpus. All these have tons of data. I think at the last check-in, the Lumen database, which has infringing URLs as part of the database, had, I wish Adam was here, had like 3 billion URLs that have been reported via DMCA takedown notices. That is a lot of data. You have to find ways to analyze it. Media Cloud has an analysis engine. We worked with the Wikimedia Foundation to analyze the underlying access logs for Wikipedia to find out where it's being censored. So this is part of an anomaly detection system. And that uses these different tools down here: Jupyter Notebook and pandas, which is a Python tool, and the ELK stack, although it's not really ELK, because we don't use Logstash. Sorry, technical there. Visualizing data, there's lots of ways that you can have fun with a lot of the data that's being grabbed already from these other tools.
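To make the anomaly detection idea concrete: flagging a censorship event in access logs often comes down to comparing each day's traffic against a rolling baseline. This is only an illustrative sketch with pandas, not the actual Berkman Klein/Wikimedia pipeline; the window size, threshold, and sample numbers are all assumptions.

```python
import pandas as pd

def flag_anomalies(counts: pd.Series, window: int = 7, threshold: float = 3.0) -> pd.Series:
    """Flag days whose count deviates from the trailing rolling baseline
    by more than `threshold` standard deviations."""
    # Use only *prior* days for the baseline so an anomaly can't mask itself.
    baseline = counts.shift(1).rolling(window, min_periods=window).mean()
    spread = counts.shift(1).rolling(window, min_periods=window).std()
    zscore = (counts - baseline) / spread
    return zscore.abs() > threshold

# Hypothetical daily page-view counts with a sudden drop on the last day,
# the kind of signal you might see when access to a page is blocked.
views = pd.Series(
    [100, 102, 98, 101, 99, 103, 100, 97, 101, 5],
    index=pd.date_range("2018-09-01", periods=10),
)
anomalous_days = views[flag_anomalies(views)]  # just the day that dropped to 5
```

The design choice worth noting is the `shift(1)`: if the anomalous day were included in its own baseline, a large drop would inflate the rolling standard deviation and hide itself.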
The Internet Monitor has a dashboard that we use to visualize the data. Curricle has a way, for the students here, to visualize the Harvard curriculum, and ideally, as we branch out, the curricula of other academic institutions. Dotplot has the ability to visualize any disparate data set in different ways that you want to be able to communicate to your audience. And AI Compass was one of the things that we sort of experimented with. This is the one that will blow up when you click on it. I didn't say that, did I? I did. That will display sort of the different areas that are being worked on within the governance of AI. Storing and aggregating data? You see Media Cloud on here, like, everywhere. I love Media Cloud too. It's pretty awesome, which you'll see. Curarium, which stores stories and visual elements. Curricle, again, aggregating the data of the curriculum to be able to present it. The Emily Dickinson Archive, which was the first online archive to have all of Emily Dickinson's fascicles online. So you can restructure and experiment with them, because this is an interesting fact: no one knows how Emily Dickinson's fascicles, the papers that she put together in her notebooks, go together. There are opinions, but no one has concrete evidence on how they go together. So two scholars who look at poems that seemingly match could disagree. And it's really neat to be able to go in there and say, well, I can see the needle punch on this fascicle and be able to think that that's where she bound it and how it mates with the previous poem. So disparate, weird data. The Criminal Justice Debt Reform Project, which is a project to unite the data that is stored by each state on criminal justice debt. The digital literacy resource platform, which is intended to store digital literacy resources for teaching digital literacy on different topics.
And Tag Team, which is storing media feeds, sorry, RSS feeds, and being able to classify those and order them via filters and tags. And Peter Suber is actually going to talk about that later as well. So they can also be platforms for experimentation. We're out of the data section here. And they can provide a platform where you can work on something, where you just want to take a look at different methods that you want to prove or different ideas for an experiment. With Media Cloud, we worked on the anomaly detection with a large data set. And some of the Twitter research tools that we use fit in that category. Technical implementation: a lot of this stuff is hard. The software and thinking behind these research projects aren't necessarily easy. They don't come across as simple projects. These are complex problems that are interesting in themselves to solve. Really, you said it's a tool to think with. A thing to think with. So Road Trip is an implementation of a paper that was written to analyze network traffic from an off-path host. That means I can look at the conversation that two hosts are having and not be on that path, not be looking at the packets, not be looking at the communication that's happening, not be on the network and look at them, and be able to analyze the round trip time, how long it takes for them to communicate, if there's something censoring them, if there's a problem with one host. That's wild. And so that idea of thinking of that is the research problem itself. OK, so there's lots of ways to skin a cat, or in this case, cut an avocado. But in these examples of how a tool can contribute to research, these all have grounding within the Berkman Klein Center. So back to how tools contribute again. Maybe you see a pattern here. I keep asking questions. How can a tool contribute to the community? In the context of scholars, researchers, activists, students, they can help provide understanding. They can provide a platform. They can help provide connections.
They can provide a service. So these tools contribute, and you have to think that they contribute both in their online form, their end form, but also in their open source form. So take the example of ShariaSource here. So it lives in many of these spheres, where ShariaSource, it's down there under Platform, is a platform providing content and context on Islamic law with a mission to organize the world's information on Islamic law in a way that is accessible to the public and useful. So this can't be done alone, or rather, I guess it could be. But it's more easily done as a community. It's more easily done with a large community. So a platform is needed for the community to work. It's working towards understanding because it's publishing, in this case, the compendium of sharia law. And it's organizing it in a way that's accessible. It's providing scholars a place to connect because they see the work that is being done by the other scholars, who's working on what. And it provides a service because the scholars, the researchers, the activists, the students, the journalists all benefit from the resources gathered by that community. So all these tools work in some similar way, where they connect in these various spheres with relative priorities: providing understanding, providing a platform, providing ways to connect, and providing a service. So back to how tools contribute again. For those that didn't see the pattern, I like questions. How can a tool contribute to a wider audience? So for university-wide, city-wide, country-wide, globally, you can have tools provide platforms to organize knowledge, like ShariaSource, the example I was just talking about, to preserve knowledge, to provide access to knowledge, and to provide tools to its users, to the activists, to the students, to the journalists. So organizing knowledge, something seemingly simple, can be as simple as a URL shortener.
We wrote a URL shortener that was fairly interesting that we wanted to have because we didn't want to use Twitter's or Bitly's. And we wrote a URL shortener just as a test, as an experiment. Tag Team, like I mentioned, organizes your feeds in a way accessible to you, in a way accessible to your audience. Preserving knowledge, we wrote a tool called Amber, which allows you to take content that may have been taken offline and create a local cache for your users to see. I guess Perma would be more the one for you to see. The Library Innovation Lab wrote a tool called Perma, which allows you to specify your URL and create a snapshot of it, create an archive of it, much like the web archive, and allow that to be accessible at a URL, a Perma URL. And the time capsule encryption tool that was also written by LIL allows for you to take some information, put it in a time capsule, and encrypt it so that at a specific time later, it becomes accessible. Access to knowledge, we have some tools that actually are in production or in development that don't even have a logo. The patent database that we work on, the risk assessment tool database, which looks at the risk assessment tools used in the justice system to categorize risk of a specific individual, the criminal justice debt reform tool, which I spoke about, ShariaSource, which I spoke about, the Caselaw Access Project, which aims to take all of US case law and make it free and available online, and the digital literacy resource platform. And tools for activists, I guess you can define activists however you want, but in terms of finding out where circumvention is, to be able to target your research or your activism, to be able to find out who's being censored, to be able to provide censored content, those are all great.
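The URL shortener mentioned above is a good example of how small a useful organizing-knowledge tool can be. The core is just a table mapping a short base-62 code to a long URL. This is a minimal in-memory sketch, not the Berkman shortener itself; a real service would persist to a database and sit behind a web framework.

```python
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters for short codes

class Shortener:
    """Map auto-incrementing integer IDs to short base-62 codes."""

    def __init__(self):
        self._urls = []  # the index in this list is the ID behind each code

    def shorten(self, url: str) -> str:
        self._urls.append(url)
        return self._encode(len(self._urls) - 1)

    def expand(self, code: str) -> str:
        return self._urls[self._decode(code)]

    @staticmethod
    def _encode(n: int) -> str:
        digits = []
        while True:
            n, rem = divmod(n, 62)
            digits.append(ALPHABET[rem])
            if n == 0:
                break
        return "".join(reversed(digits))

    @staticmethod
    def _decode(code: str) -> int:
        n = 0
        for ch in code:
            n = n * 62 + ALPHABET.index(ch)
        return n

s = Shortener()
code = s.shorten("https://cyber.harvard.edu/")
original = s.expand(code)
```

Base-62 keeps the codes short: six characters already cover tens of billions of URLs.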
And also, not necessarily affiliated with the BKC, but the Guardian Project or Signal, those are all kinds of tools that you've heard about that you want to think about where they place. So I want to just end over here for a second to think, like, who writes these? Who runs these? And who thinks up these tools? So I sent out an email earlier, and I sent it out too late. But I took just some mental notes of the things that I know are happening within the Berkman Klein Center. And the community is gigantic. There are so many people and so many avenues of participation and contribution within the Berkman Klein Center that there is one that fits you, not even just in the core group, which is just the top left line there, but in the community at large. And these are people who don't necessarily write code, aren't necessarily technical, but think of these ideas and contribute to those ideas because they're thinking of this idea of social good. So with that, we are going to move, yeah, we're going to move to looking at one of the tools, Media Cloud, and I'll leave it up to Rahul. Sure. Thank you. Thanks. Okay, my name is Rahul Bhargava and I work down the way at the MIT Center for Civic Media. So thanks for having me here. I'm not going to stand here and shill Civic's work. We work in a very similar domain. We care about technology and social change, the ways it can help it, the ways it can impede it. And similarly, we attack it from both the technological point of view, building things that try to attack this, and the process and theory point of view. So you can check out civic.mit.edu if you're interested in that stuff. We collaborate with all our lovely friends at the Berkman Klein Center on one specific project that's one of our larger projects called Media Cloud. And as we heard, Media Cloud is about helping you do media analysis online. And it started at the Berkman Klein Center maybe 10 or more years ago easily.
And now we have a nice partnership where Berkman Klein does a lot of great amazing research and drives the tool forward and works on a lot of systems back end stuff. And at Civic Media, we do a lot of front end development for web based tools and do a lot of research as well. So this has covered topics from the health of the democratic sphere and the role media plays in it, to public health, to development online, and now looking more and more at hate speech. So lots of different topics driven by funding and different partnerships we developed. So it's quite a big project. Now it's focused on building a giant database of news content and information from around the world in multiple languages. And on top of that, a set of web based tools that help us dig into that data when we have a question. But of course, the innovation isn't really the technology; the technology is seldom the hard part here. The hard part is the process we use to evaluate what's happening. So it's just as much innovation on theory and process as it is on technology. So what I wanna do is show you a little bit about concretely what the tools are and invite you in a couple different ways that you might want to engage and be able to use our stuff, okay? So I'll do that. Let me just set my timer here. That's not my phone. Let me set my timer here so I don't go over. Okay, so all of the tools are hosted at mediacloud.org. So you can find and link to all the tools there. You basically just have to create an account to sign in, mostly so we don't get swamped and overload our servers. Once you do, there's three main tools that you can use. The first one is called Explorer. Explorer lets you start to dig into media coverage of something you're interested in. TopicMapper is the second one, and that's when you really wanna dive in and understand the different ways a topic is being framed, how a network might be talking about it, and dig into different subtopics.
And then we have a tool called Source Manager that lets you actually explore what we have within our system. So I'll start off with just a concrete query. So I loaded this up earlier just to get an example. And what I've got here is I've got a search that I'm running on three different things. One, coverage of Kavanaugh. Two, a set of media sources that I'm searching. So this is, oh, that's all right. The number two there is a set of media sources I'm searching within. So this particular one is called US Top Online News 2017. It's from a Pew study. If I click on that, I go over and I can find out exactly what sources are in it. And the third thing is a range of dates. Number three there, I've got one month of coverage. And then when I hit search, I just get a bunch of different data from our giant database. So we're going out and there's a set of sources that we're just regularly pulling into our system. We're not Google, we don't have everything, but we have a lot of stuff that we think matters. We have a collection like US Top Online News for around a hundred countries in probably a dozen different languages. Some countries we have state level coverage, like in India we have pretty good state level coverage based on where things are published. So one of the things that Media Cloud offers is global coverage in a very accessible way. So when we start to look at results, we try to help analyze things in Explorer, which is just the introductory tool, on three dimensions. First is just regular attention. How much is this being talked about? And you can see I've actually got two queries. The second one is Kavanaugh and one of the survivors that has accused him of sexual assault. So on our attention chart I've got two lines. One is coverage of Kavanaugh, the other is coverage of Kavanaugh and a first-pass draft query, one you'd probably want to optimize and do much better, of coverage of the scandals around him.
And you already see a story here, which is that the scandal framing of Kavanaugh is dominating the coverage mentioning him since mid-September. So we can already see this big split pre-post accusations coming out. And we'd want to dig in and try to figure out why. If you click on one of these dates it shows you more stories, things like that. You get a bunch of other stuff. Some of the stories you can find out more about them. The second dimension is language. Of course you want to look at how things are being talked about. So again I can compare between two different... This is just a word cloud where the size of the word is determined by how often it's used in a sample of these articles. Everything here you can download. So you can download bigrams. You can download tons of stuff. You can see it in different ways. We've also got a more unique visualization that we created that tries to use some more complicated math to tell you what different types of conversations might be happening. That there's one about the allegations and the accusations and that's not really connected to the party line. And then you can compare words actually and look at how the words are connected. We've got some metadata we've added to try to do topic modeling on a story to say it's about news or politics. It's about... And here it's done a pretty good job. Politics and government, most of the stories are about that. Crime and sex crime. So it's done a pretty good job of categorizing these stories. The last dimension is what we call entities when we're doing this type of computational language analysis, which are like proper nouns, like organizations, people, places. It's just a fancy word for that. So you can see who the main players are here, right? And how many of these stories are mentioning them, the organizations they're mentioning, and geography. Not surprising, this is mostly just all US coverage. So this is what Explorer can do.
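A word cloud like the one just described is, at its core, term-frequency counting over a sample of article texts, with common stop words removed and each word sized by its count. Here's a toy sketch of that counting step; the tokenizer, the tiny stop-word list, and the sample headlines are simplifications, not what Explorer actually runs.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "on"}

def word_frequencies(articles, top_n=10):
    """Count non-stop-word tokens across a sample of article texts."""
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

# Hypothetical article snippets standing in for a sampled corpus.
sample = [
    "The committee voted on the nomination.",
    "Coverage of the nomination dominated the news.",
    "The news cycle focused on the committee vote.",
]
top_words = word_frequencies(sample, top_n=3)
```

Rendering the cloud is then just mapping each count to a font size, which is essentially what Explorer's language view draws for you.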
It can let you just start to dip your toe into understanding media coverage of things. You can pick the thing, number one, the thing you're trying to dig into, whether it's Kavanaugh or, you know, robotic automation or whatever you're interested in. Two, pick your geographic locality with these collections of media sources. And that third tool I mentioned, Source Manager, lets you do that. And three, your time span. The time span varies quite a bit based on different media sources, like we don't have coverage back 50 years, obviously. Generally, we say if you're researching things in the last year or two, you're gonna do pretty well because you won't have the problem with web pages disappearing and things like that. Researching older things, there's ways to do it, but it's harder. I just wanna leave you with a quick pointer to some of the more advanced things that we try to do with Media Cloud and then talk about ways that you can engage. We have this tool called TopicMapper, for once you have a query in Explorer that you think really is interesting. The limitation of Explorer is that it really focuses on what we have in our database already. And of course, you wanna find out all of the coverage of the thing you're looking at, or as much as you can. So with TopicMapper, we do three things. And this is the thing that takes longer to do. We follow links to discover more content. It's called spidering. You follow the links, you grab a story. You follow the links in that story. You grab all those, you follow the links in those. We do 15 rounds of that, and we filter it by the same set of keywords, and then you've got a much better corpus, a much better collection of things to look into. We add in some of the measurements of influence that we think are really interesting. So what's being linked to most? In many geographies, linking is something news articles do to each other, definitely in the US. So what's linked to most?
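The spidering Rahul describes, follow the links in each matching story, then the links in those, for a fixed number of rounds, keeping only pages that match the query, is breadth-first link-following. Here's a skeleton of that idea with a stubbed-out fetcher so it stays self-contained; the real system fetches over HTTP, runs 15 rounds, and is far more robust than this sketch.

```python
def spider(seed_urls, fetch, matches_query, rounds=15):
    """Breadth-first link-following: keep pages that match the query,
    then follow their outgoing links for a fixed number of rounds.

    `fetch(url)` returns (text, outgoing_links);
    `matches_query(text)` decides whether a page joins the corpus.
    """
    corpus = {}
    frontier = list(seed_urls)
    seen = set(frontier)
    for _ in range(rounds):
        next_frontier = []
        for url in frontier:
            text, links = fetch(url)
            if not matches_query(text):
                continue  # filter by keywords: non-matching pages are dropped
            corpus[url] = text
            for link in links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return corpus

# A tiny fake web to exercise the logic.
FAKE_WEB = {
    "a": ("story about kavanaugh", ["b", "c"]),
    "b": ("unrelated sports story", ["d"]),
    "c": ("kavanaugh hearing coverage", ["a", "d"]),
    "d": ("more kavanaugh analysis", []),
}
found = spider(
    ["a"],
    fetch=lambda url: FAKE_WEB[url],
    matches_query=lambda text: "kavanaugh" in text,
    rounds=3,
)
```

Note the `seen` set: without it, cycles of links (story A links to C, C links back to A) would make the crawl loop forever.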
We add things like what's being shared most on Facebook, and there's some ways to do some Twitter analysis as well. And then how can you slice and dice this? If you're finding that there's different conversations about the topic that you're interested in, how can you actually carve it and compare? Oh, how does the traditionally right wing ecosystem in the US talk about this differently than the left wing? How do online blogs talk about this differently than traditional media sources, things like that? So TopicMapper is for more in-depth analysis, and you can use all of this stuff, again, for free, with some limits on how much. So you can't spam us with five bajillion queries each day, there's some quota on there. So the last thing is how can you engage and participate in this? Like we talked about, for us, this is about understanding the stories that get told, how they get told, and of course the media that governs how we see the world around us. The news that I'm reading governs how I understand the world. So what can you do to get engaged? Feel free to use our tools, they're open and free, if you do media analysis. If you're interested in coverage of a specific place that isn't the US, there's great ways to get engaged and help us flesh out these collections for different countries. In Source Manager, you can browse by geography and look at our Mongolian coverage, and it's terrible; if there's a researcher in Mongolia that's really interested in doing work there, they could help, because we're not doing anything there. So if you're particularly interested in an area, you can do that. We have support, if you go to the mediacloud.org homepage and hit the support tab, there's ways to watch a webinar about how this stuff works. There's a bunch of documentation. You can join our community mailing list to ask questions there. We have a support email. Again, we focus on doing the more in-depth academic research, but we like building our tools so lots of people can use them.
So the Washington Post can use it to look at analyzing hurricane coverage and how it might differ in different areas. And nonprofits can use it to look at how their issues are being talked about. We can't do deep collaborations that aren't funded, but we try to make it open and available as much as possible. That's the 10-minute-ish overview of Media Cloud and ways to engage. Thanks for the time, and I'll hand off. Peter Suber is going to talk about one of our other projects, Tag Team. We have to switch. That's right. Switch computers here. Almost working. Here we go. Perfect. Thanks, everybody. I'm here to talk about Tag Team and I've got 10 minutes. Somebody should play the role of my timer. I can give you the history and the motivation for Tag Team, which I think are at least as interesting as the software, but to stick to the 10 minutes, I'll focus on the software. But if you have questions about the background, I'd be happy to talk about them later. This is the Tag Team homepage, which is not the tool itself, but I bring it up partly to show you that there is a homepage and partly to highlight this. If you want to go back to the homepage, that's the short URL. That's all you have to write down. Everything else is on the homepage. As you can tell, this is a Berkman wiki. And if you have a project at Berkman, I encourage you to get a wiki for your project. It makes organization much easier. Tag Team is open source software developed at the Berkman Center for a grant-funded project that I ran. I guess I still run it, but the grant has run out. And I got two back-to-back grants, and the first one was in 2011. So we started developing Tag Team in 2011. It was ready for use in 2012, and I've been using it for my work since 2012. The introduction said I focus on open access to research, and most of my work in open access to research has been done through Tag Team since 2012, when it launched. And I'll say more about that.
But to use Sebastian's categories, I consider this a tool for organizing knowledge but also a tool for activists. I'm an activist for open access, and I organize other activists, and we work together. We collaborate through Tag Team. That's the team part of Tag Team. Here's the homepage of the Harvard instance of Tag Team. I'm already logged in, so it's showing my hubs on the right. If you're not logged in, it would show a general menu of the most active hubs. A hub in Tag Team is simply a tagging project. So if I wanna follow what's going on in the world of open access, I create a hub devoted to what's going on in open access. Mine is called the Open Access Tracking Project, but that doesn't matter. You could have a hub devoted to anything you want to track, anything you want to help coordinate with colleagues elsewhere. You could also make a hub in which you're the only participant. And you tag things that are relevant to your hub. So I tag things that are relevant to open access. You wouldn't believe how much happens every single day related to open access. In the early days of the open access movement, I could keep track of it in a blog, and I did for about eight years. I tried to tag every darn thing, until it got to be too much. Obviously that was a side effect of success, but it meant I had to give up what I was doing and look for something else. So I wanted a crowdsourced method to organize what was going on. And now when I find something new that happened, I tag it for Tag Team. Tagging it puts it in the Tag Team database, not only for storage, but also for search, and also to publish a feed for people who wanna follow what's going on in open access. I'm gonna give you an example of how it works. I'm going to Google News just because it has pages I've never seen and they're not in the cache. So it's another presentation about Kavanaugh. But I'm gonna pick one that's not yet been tagged.
So I'll bring up the page and I'll say, oh, that's relevant to my topic. I'm gonna tag that. I've got a tag button, a bookmarklet, in my browser bar. I click it and this form pops up. This is the tagging form, and I fill in some relevant tags. And then I try to put in a description, some text. It could be a verbatim quotation in quotation marks, or it could be my own paraphrase or translation of what's going on on the page. The title, notice, was grabbed by the software automatically. The URL was grabbed by the software. I have more than one hub in Tag Team, but the last hub that I tagged for is my open access hub in the upper left. So that was entered by default, but I'm gonna change it. I don't want this in my open access hub; it's not about open access. So I'm putting it in my test hub. Then I just click this button and it's automatically added. That's all it takes to tag something. So if you understand why a page is relevant to your project, you can tag it in about 30 seconds. And if there's a lot going on in your topic area, in open access let's say there are 30 new pieces of news every day, you or your team of people can tag all of them in a given day. That's valuable to do, because it puts all of them in the database and it puts all of them in the feed for people who are following along. I'll show you what it looks like. I put it in my test hub, and this is the front page of my test hub. I have to refresh to pick up that new thing that I just tagged. There it is. The top item is the one I just tagged. This is what it looks like inside Tag Team when I'm looking at all the items that have been tagged. It's organized like a blog, with the most recent items at the top. The tags are all showing. I can use that green plus sign if I wanna add a tag right here that I forgot to add back at the time. I can also go into the tag record.
This is just sort of the abbreviated version of it, but the tag record is the full version, and it contains that little summary that I cut and pasted. And from here I can not only add new tags, I can remove tags, I can modify tags. If I wanna change Trump to Donald Trump, I can do that here. If I wanna say that every time someone tags with Trump, there should also be a tag in the record for, let's say, president, I can make sure that's done here. That way I don't have to needle my fellow participants in the project and say, don't forget, whenever you tag Trump, add president. They're just gonna forget. I can make that automatic. I'll say more about that in a minute too. But in other words, you can zero in and look at the tag record and modify it directly. My big project is the Open Access Tracking Project, and I'm bringing that up just so I can show you some things inside it. It's an example of a big hub inside Tag Team. It has more than 70,000 tagged items and more than 80 taggers. And until my funding ran out, we were the most comprehensive source of open access news anywhere. Now we're running on an all-volunteer basis, and we're probably still the most comprehensive, although not quite as much as we were before. If you know a more comprehensive source of news on open access, please let me know. Peter, a quick question? Yes, does Tag Team integrate with citation software like Zotero? It's on the wish list. You can tell that when you tag something, you're recording metadata about that article, and what you'd like to do is export that metadata in a form that you can put into the bibliography section of an article, let's say. It's on our wish list to not only export in one standard format that allows you in turn to export in yet other formats, but to export basically in all the standard formats right away.
So if you have tagged 50 items relevant to your dissertation, you'd like to export all of them in the same format in one operation, for cutting and pasting into your doc. We want that, but we're not there yet. We have the metadata; we just don't have the feature in Tag Team to do that. This is the primary feed of my big hub, OATP, and it's live today. People are tagging things today, and they're not me. There are people other than me who are tagging things today, and that's good; that's the way we want it. That's the team part of Tag Team again. And I think I had a tab here on a record inside, and I just showed you a record, so I can skip that. Now, let me back up and explain why we bothered to make Tag Team when there were already other tagging platforms. There's still one called Delicious, and at the time we launched this, Delicious was much bigger than it is today. It was a good question why we didn't just use Delicious instead of this. There was another one called Connotea, which was open source and developed by Nature, the scientific journal publisher. And I picked Connotea originally. Remember, Tag Team wasn't ready at the time that I started the project. And Connotea was flaky, and it died in the middle of the first year. But even when it was healthy, and even when Delicious was better than it is today, they didn't have a feature that I wanted. For example, I want a standard vocabulary for the different subtopics within open access. One of the major topics is journals, open access journals. Some journals are open, some are not. But different participants could tag those items with the word journal in the singular, and others could use journals in the plural. It sounds like a small thing, but if they do that, then if you search for one, you're not gonna get the other. And so we need what's called tag convergence. We need some mechanism to get people using the same tag for the same meaning. I wanted to automate that.
I didn't wanna nag my colleagues to say, look, do it in the plural, you forgot to do it in the plural. I wanted a way to automate this. So if we picked the plural as the preferred form, I want the software to convert the singular to the plural every time, so I don't have to nag anybody. In Delicious and Connotea, you can do this for your own tags. And in any tagging platform I've seen, you can change your own tags retroactively. What I wanted was project-level tag management. So if we're doing a project on open access that might have 80 participants, I want the project leaders to be able to decide preferred tags and deprecated tags, and automatically convert the deprecated ones to approved ones, so that those changes apply to everybody in the project. And you join the project with consent, so you shouldn't be offended if somebody changes your tags later on. The result is what we call folksonomy in, ontology out. In the beginning, even now, we support user-defined tags. You can use journal in the singular. You can make up a tag that's never been used before. But the project will consult with its members and, over time, decide which tags to deprecate, which tags to approve. And when it makes one of those decisions, it writes what we call a hub-wide filter. Here are some examples of hub-wide filters. Looks like my time is up, but I'll just give you some examples. Some hub-wide filters change a deprecated tag to an approved tag. Some of them supplement a given tag with a second tag. So Sci-Hub, you may know, is a database of, let's say, stolen journal articles. It's an example of what Aaron Swartz called Guerilla Open Access. And some people like to use the tag guerilla to indicate that type of open access. So I wrote a supplement filter that says whenever somebody uses the Sci-Hub tag, also add this other tag, so that people who go searching for Guerilla Open Access will find all those Sci-Hub items, even if the original tagger didn't think to do it that way.
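The real Tag Team is a web application with its own data model, but the hub-wide filter idea just described, renaming a deprecated tag to its approved form or supplementing one tag with another, can be sketched in a few lines. The tag names and filter tables below are made-up examples, not Tag Team's actual internals:

```python
# Sketch of Tag-Team-style hub-wide filters (illustrative only).

# Rename filters: deprecated form -> approved form ("folksonomy in, ontology out").
RENAME = {"journal": "journals", "european_commission": "europe"}

# Supplement filters: whenever this tag appears, also add these tags.
SUPPLEMENT = {"sci_hub": {"guerilla_oa"}}

def apply_hub_filters(tags):
    """Apply rename filters, then supplement filters, to one item's tag set."""
    renamed = {RENAME.get(t, t) for t in tags}  # convert deprecated forms
    extras = set()
    for t in renamed:
        extras |= SUPPLEMENT.get(t, set())      # add supplementary tags
    return renamed | extras
```

So a participant who tags an item `journal` and `sci_hub` ends up, hub-wide, with `journals`, `sci_hub`, and `guerilla_oa`, and nobody has to be nagged.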
Do I have one more minute? Is this for me or is that just somebody? Oh, okay. Let me just take this last minute. One of the most valuable features of Tag Team is the way it makes sharing easy. So if you want all the items that have a certain tag, there's a URL for that. And the URL is self-explanatory. If you want all the items tagged by a given person, regardless of the topic, there's a URL for that. And it's self-explanatory. You could even compose it without looking it up. If you want a URL for all the items with a given tag by a given tagger, there's a URL for that. And so on; you get the idea. But if you're in a Twitter feed and somebody says, what's going on with open access to medical research in Brazil? Well, there's a URL for that. There's a Boolean search for that inside Tag Team, and there's a URL for the results of that Boolean search. And you can stick that into Twitter. And by the way, it fits. That's the whole point. You can answer the question with all the knowledge in our database in a short URL. And that's how we end up benefiting other people. People can subscribe to our feeds, but they can also see the benefit through these URLs. Okay, thanks. Thanks to our speakers. We have about 10 minutes for Q&A. I understand some of you may have places you need to be at one, but for those of you who can stick around, we can have a little Q&A. Three quick housekeeping notes on that. One, as it is the beginning of the year, I would like to particularly encourage new voices. Two, we have a microphone runner. Please use the microphone, even if you think you do not need one. Other people here may need you to use it. And three, a comment is not a question. Do not make me hurt you. Any questions? I know that several of our speakers are inviting additional questions, because there's only so much you can say in 10 minutes. I had a question about Tag Team.
Is there an ability to communicate within, as far as if one hub is focused on an area that you're interested in, could people communicate between taggers to join together? You know, I wondered where communication fit in. There's pretty good communication between participants of the same hub, but not participants in different hubs. To save memory and be efficient, if two different hubs tag the same item, Tag Team uses one record under the hood, but it shows up differently in the two hubs, depending on who tagged it and what privileges they have. But that's really the only cross-hub communication we have right now. I have 80 taggers, and one of the difficult jobs in recruiting the crowd for a crowdsourced project is giving them feedback, because we do have a standard vocabulary. We don't want to tag sloppily. We don't want to leave out descriptions. I want to give them feedback on how they're doing. And in the early days, before Tag Team had certain features, I had to do this by hand. I had to write emails to each of the separate taggers, giving them feedback on what they were doing. But one of the new features we just got, thanks to Sebastian, is email feedback through these modifications. If somebody uses a tag that's deprecated, for example: we use the tag Europe to stand for European Commission, European Union, all kinds of other things, so that if you use any of those roughly synonymous tags, they all get this one outcome. That helps tag convergence; it helps searching. But if somebody uses European Commission and I modify it inside the hub to Europe, the fact that I modified it will propagate back to them in the form of an email. And that way, if they're paying attention, they see that they used a deprecated tag and that somebody corrected it. Now, they could just say, okay, somebody fixed it, I don't have to do anything, which is also true. But eventually, we're hoping this registers and they start to use the right tags in the future.
It doesn't matter so much, because the tag is actually now correct. We also have a way to send an email message to every participant in a hub. So for example, if we coin a new tag for a meaning that we've never bothered tagging before, I'd like to tell all of our taggers about that. And I can do that now through Tag Team. I used to have to do that by collecting all those email addresses separately. Anyone else? Hi, a question for Tag Team and possibly, sorry, Media Cloud also. Are there plans to sort of combine the capabilities of the two systems? Presumably, if you look up an article in Tag Team, there's some data that you can harvest from Media Cloud. I mean, I guess the articles are probably already processed there, in which case there are entities there which could nicely map to tags in your own system. So are there any plans for something like this? There could be. There are no plans today, but you can take all the information in your hub and export it. And right now, the main purpose of doing that is to import it into another instance of Tag Team, in case you're kicked off the Harvard instance, or in case you just want to launch a new one, you want to fork this off, or you want to do something else: you can get all your data out. And once you have it out, you could do whatever you want. You could write some scripts, for example, to integrate it with another dataset. But we don't know anybody who's doing that, and we haven't taken any trouble to make it easy. But in principle, it could be done. Yeah, I mean, like most open source projects, Media Cloud is built as a series of puzzle pieces that either are connected in a pipeline or are sort of duct-taped together with code. So for instance, the entity tagging stuff is its own platform, called CLIFF-CLAVIN, that I wrote with my colleague Catherine D'Ignazio. You can easily just send it text or a URL, and it spits back the tags, the same metadata, in a machine-readable format that a computer can understand.
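To make the "machine-readable format" concrete: CLIFF-CLAVIN returns JSON describing the people, organizations, and places it found in a text. The field names below are assumptions based on its public documentation, and the sample response is fabricated for illustration; a minimal sketch of flattening such a response into taggable entities might look like this:

```python
# Illustrative CLIFF-CLAVIN-style response (field names assumed; a real
# response carries more detail, e.g. geocoding information for places).
sample_response = {
    "status": "ok",
    "results": {
        "organizations": [{"name": "World Health Organization", "count": 2}],
        "people": [{"name": "Peter Suber", "count": 1}],
        "places": {"mentions": [{"name": "Brazil"}]},
    },
}

def entity_names(response):
    """Flatten an entity-extraction response into (kind, name) pairs."""
    results = response.get("results", {})
    pairs = []
    for kind in ("organizations", "people"):
        for entity in results.get(kind, []):
            pairs.append((kind, entity["name"]))
    for place in results.get("places", {}).get("mentions", []):
        pairs.append(("places", place["name"]))
    return pairs
```

Pairs like `("people", "Peter Suber")` are exactly the sort of entities the questioner imagines mapping onto tags in a system like Tag Team.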
And we take advantage of that with other people's platforms, so we build things in the same way, so that these kinds of things can be put together as puzzle pieces, which is a great way to sort of repurpose other people's work for social good efforts when you're doing coding. Back there. Ellen, there's someone behind you. Could you wait for the mic? Thanks. The application is called Media Cloud? Yes, mediacloud.org. So how have people used that? That's a great question. The short answer is there have been a bunch of different types of use, going from simple to more in-depth. The most simple is kind of the media navel-gazing. So this is the, like, Washington Post writing about how Hurricane Maria was talked about very differently than other hurricanes in the last year and a half. Yeah, so a lot of media partners. So there's a set of media partners that have done work like that, which is kind of like media-introspective stuff. There's a set of, let's say, actors in the social good space, nonprofits, things like that. So people like, at a bigger scale, the WHO, wanting to look at the dominant narratives around the latest monkeypox outbreak in South Sudan, or wherever it was, I don't remember. So things like that: trying to understand what the main narratives are around something. That's, like, a medium level, where they're working with us to generate a report or some findings. As you get more complicated, then you start to see people that are basically using our system programmatically; that whole web interface is built on top of an API layer, so you can use programming to get that same data out. And so there's a set of academics that are doing that, basically sucking in things like a long list of URLs and doing their own deep analysis on that. So you have a bunch of people that are doing that, either around elections or other issues. And then the most complicated is the stuff that happens in-house here at Berkman Klein.
So basically pushing the system to its limits to do complicated analysis, like the book that's about to come out on the US election. There's a series of big reports like that that have been generated by our colleagues at Berkman Klein, and, a little bit less in-depth, papers coming out of the Media Lab, at the group I'm in. So it's a spectrum, and that's the shortened version. There are a bunch of different ways, but that's a lot of it. We're having the event for that on Thursday at 5:30. Yes, I was hoping you would remind me. Very good. Thursday. Y'all should come. Any other questions? Okay. Oh wait, one here. Still got a couple of minutes. Rahul, what are the intellectual property arrangements that let you suck in a good chunk of the world's media sphere? Yeah, I'm not the most qualified to answer. The short answer is that the content never leaves our servers. So once we've pulled in a bunch of webpages, nobody can say, like, give me the text of this story. You can't get that. You can get a little excerpt. You can do aggregate work, but you can't get the copyrighted material off. So then we're in fair use territory, and then there's a lovely army of Harvard lawyers that are in front of us, sort of as a blockade, and you gotta have some deep pockets and confidence to come at an army of Harvard lawyers on a fair use claim. I'm definitely not the most qualified to answer; that's my short version, if other folks on the team have more knowledge, but that's the quick short answer. All right, so, protected by our brigade of Harvard lawyers, let's go enjoy the rest of our afternoons. Thanks for coming. Thanks.