Excellent, thank you. So, hi everybody, and welcome to a virtual talk. Always a weird thing, probably weird for everybody, and really weird as a speaker, because you have no idea whether you're even getting all the way through to people. But hopefully this works well for you.

I'm Kelsey Breseman, and I currently work at the Environmental Data and Governance Initiative. I used to be a lot more on the tech-speaking scene, in the Node.js and open-source hardware communities with the Tessel project, and I've since moved from that space more toward purpose-oriented thinking and purpose-oriented technology. One of the things that concerns me about hanging out in communities of technologists is that a lot of times people get really, really excited about building things just because they can. I realized it was really important for me to always be thinking about who's actually going to use the thing, what the valuable use case is, and to work backward from there. That's probably the perspective I bring to the D-Web.

Probably most of you have heard this phrase before (well, I'm making some assumptions), but there's an early-internet Tim Berners-Lee rant on the W3C style guide called "Cool URIs don't change." I really like it as context for the decentralized web, and in fact Tim Berners-Lee is now working on Solid for the D-Web. I recommend looking the rant up and reading through it, but just to highlight a quote: when somebody follows a link and it breaks, they generally lose confidence in the owner of the server. You can see how, in the early internet, this was a really important way of thinking about the domains you manage.

My current work with the Environmental Data and Governance Initiative has a different take on this same quote, this concept of loss of confidence in the owner of the server. This is climate.gov as of March 1st, 2017, the first year of the Trump presidency here in the United States. The Environmental Data and Governance Initiative sprang up because it was worried about things like the deletion of public information from public access, which is a thing that has happened. By the way, this is not a politics talk. My organization was originally politically motivated, but not so much for or against a particular political candidate; it's really about advocacy for public data around the environment. Basically, if we're going to have a democracy and an educated public making comments on policy, we need the public to be able to become educated about the policy they're commenting on.

So what we do is watch the way that public information, especially around the environment, is handled. These are a number of reports that we've produced over time, and by the way, we plan to continue regardless of who takes office; there are no clean track records here. There have been a number of issues with sites being taken down from public access, from where they originally resided on .gov servers. It results in a pretty big loss of confidence, and a loss of the public's ability to make good public comment about events that are occurring. And we've seen a really interesting pattern emerge.
It's hard to say how much of this is a strategy and how much of it happens incidentally, but we've seen some websites be taken down, and then a few months later a big policy change comes out and is open for public comment, but the website you used to go to for information about that thing is no longer there. That's the sort of thing we track, and that's how we arrived, as an organization, at being folks who are really interested in the decentralized web.

If you rely on a .gov page for certain information, how do you know that there is an archive? How do you know where to look for it? Maybe it doesn't exist. If you do find it, how do you know it's exactly what was originally there? Maybe it has been modified in the meanwhile, and sometimes that matters a lot; I can get into that more a little later. And maybe you decided to save it yourself. Maybe you've heard of the data rescue movement: there were a lot of people, right around the election, who based on this very specific fear spent a lot of time saving data. That was actually kind of the beginning of my org, and a lot of that went into the Internet Archive, which I can examine a little later in terms of how safe it is. But if you saved it on your laptop, is that any better than it just getting deleted? It's a little better for you, maybe, but it's not going to help with the public comment problem.

This is back to Tim Berners-Lee's rant: what you need is for the web server to look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment. And that's exactly what the D-Web protocols typically do.

So here's an example of a URI versus a URL. By the way, I originally designed this presentation to speak to people who are not already experts on IPFS or the decentralized web. I interface a lot with archivists, with folks who are involved in environmental protection and in lawsuits around the environment, and I'm trying to be a bridge between decentralized web communities and folks who have really, really strong use cases for data being held by the public and held safely. So some of this might be a bit redundant with things you already know, but I'm going to go through it anyway, just in case there are folks out there who don't already have all of this background.

The difference between a URI and a URL is that a URL tells you where to look for something, and a URI tells you what it is you're looking for. It's how I'm asking, who I'm asking, and where to look, versus how I'm asking and just what it is that I want back. And of course, that's what the D-Web does well.

A lot of this talk is actually based on slides by my sister Dana, who I asked to put them together for me in the fall; she went and did a whole bunch of research that I want to credit her for. And if you're not as familiar with these concepts, what I'm going to show you here is a very picture-heavy version. If you do better with text, I'll post a version of her slides, which are much more text-based, in the IPFS channel.

So, to get straight into it: here's what happens when you put files on the decentralized web. We start with some files, and you want to get them on the web. Let's have a little look in that folder: we've got a text file, a PNG, and a really big CSV file. Well, not that big, but big comparatively.
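Before we start processing those files, here's a tiny sketch in Python to make that URL-versus-URI distinction from a moment ago concrete. This is my own illustration, not from the talk: the domain is made up, and the ipfs:// string is a plain SHA-256 digest rather than a real IPFS CID, which would use multihash and base encoding. The shape of the contrast is the point.

```python
import hashlib

# A URL names WHERE to look: a host plus a path. If the host renames,
# moves, or deletes the page, the link breaks.
url = "https://www.example.gov/air-quality/report.html"  # hypothetical address

# A content-addressed URI names WHAT you want: an identifier derived
# from the bytes themselves. Any peer holding matching bytes can serve it.
content = b"<html>the exact bytes of the report</html>"
uri = "ipfs://" + hashlib.sha256(content).hexdigest()  # illustrative, not a real CID

print(url)  # who and where to ask
print(uri)  # only what you want back
```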
So the first thing we're going to do is standardize the sizes of these files. This is a process called chunking. The chunk size depends on the protocol, but for the purposes of this presentation, let's just say 256 kilobytes. You can imagine it as a kind of box, and if a file is bigger than that box, you're going to need another box. But you don't want to mix things up: if you put a small amount into the first box, even though you could fit more in it, you're not going to, because you don't want it getting confused with all the different stuff in there. A very unsophisticated take, but this is really important because you need to know that no chunk is bigger than 256 kilobytes. That's going to help you understand what hardware you're able to use; it lets you make a lot of different estimations.

The next thing you're going to do is hash each of these: take the full content of each chunk and basically tag it with a hash. To break down hashing a little, here are some of its characteristics. It has to be one-way. Here's an example I found on the internet demonstrating the concept of one-wayness: if I say the output was two, you won't be able to tell me what the input was, because in this very simple example the input could be anything that ends in a two. That's one version of one-wayness; what we actually have in these protocols is more complex, as you'll see later, but that's the concept. A hash has a fixed size: no matter how big a file you put in, whether it's that little fox text file or the PNG we had earlier, the hash comes out the same length. It's deterministic: the same input will always produce the same output. And it's collision resistant: different inputs will produce different outputs. This one is a little more complex than the other concepts because it's probabilistic, but in general a hashing algorithm will have such a vanishingly small likelihood of collisions that you can treat distinct inputs as producing distinct outputs. I won't get deeper into that here.

Once this process is done, each chunk receives a hash that is uniquely coded to it. What this gives you is file verification and a unique identifier, and it's efficient, because you can identify any file, even a very large one, with a short text string. And you'll notice this one is two chunks, so it's going to be a couple of short text strings.

The hardest thing for me about this non-in-person speaking is that I have no feedback anywhere. So I hope this is going well for you; feel free to hit me up on Twitter or something later if there's any part you want to talk more about.

Next we go into the Merkle DAG, a directed acyclic graph, also called a Merkle tree. Basically, all we're trying to do is say: hey, I have this set of files and I want them all to go together and be treated as a group. So we're going to produce a hash that has all the same characteristics as the other hashes, and it's going to represent all of the things in it. What that looks like is: if you take this hash and copy and paste it into a text file with the other hash next to it, what you have there is another hashable item. It's just a string of text, so you can make a hash for that too, and you can keep doing that until you only have one.
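Here's a minimal sketch of that whole pipeline in Python: cut the bytes into 256-kilobyte boxes, hash each box, then pair hashes up and hash the concatenations until one root remains. To be clear, this is a toy of my own using SHA-256 and a bytes literal standing in for survey.csv; real IPFS builds its Merkle DAG out of UnixFS nodes and CIDs, but the structure of the idea is the same. The last two lines preview the verification property coming up next: add a single white space and the top-level hash changes completely.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # the 256 KB "box" from the example

def chunk(data: bytes) -> list:
    """Cut a file's bytes into boxes of at most CHUNK_SIZE."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def h(data: bytes) -> str:
    """One-way, fixed-size, deterministic, collision-resistant digest."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(hashes: list) -> str:
    """Pair hashes, hash each concatenation, repeat until one remains."""
    while len(hashes) > 1:
        pairs = [hashes[i:i + 2] for i in range(0, len(hashes), 2)]
        hashes = [h("".join(pair).encode()) for pair in pairs]
    return hashes[0]

# A stand-in for survey.csv, just big enough to need two boxes.
survey = b"station,benzene_ppb\n" + b"deer_park,1.2\n" * 30_000
chunks = chunk(survey)
print(len(chunks), "chunks")                    # 2
print(merkle_root([h(c) for c in chunks]))      # the top-level identifier

# One added white space anywhere flips every hash up the tree.
modified = survey.replace(b"1.2", b"1.2 ", 1)
print(merkle_root([h(c) for c in chunk(modified)]))
```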
And once you have that one, if you're able to map that hash back onto its original content, then you're able to split it again into the individual pieces and trace all the way back down the tree. This gives you the same ID length even for many files. And here's a little more about the file verification concept: any change anywhere in any file will result in a different hash, which follows from what we already talked about with cryptographic hashing. So here's our original hash. If we modified survey.csv, if we just added a white space somewhere in its second chunk, that chunk's hash changes, which means every hash above it changes, such that the top-level hash is different. That doesn't necessarily mean it's wrong, by the way; it just means it has a totally different identifier. That could happen when you update a date stamp on something, and it's got slightly different information in it now.

You can also do this for partial verification. If you don't want to receive everything in the tree, say the purple hash is the only part you want, you just want the image: you don't have to trace all the way back down the tree and get the CSV back as well.

I recognize I'm going pretty fast through this, but I hear these videos are going to be released, so just play me at 0.5x speed later if you like.

So: you access files by requesting their URI. Returning to that Tim Berners-Lee quote, sorry, I just really like that rant: you need to look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment. Dietrich was talking in the introduction about how the 0.5 release has significantly reduced that lookup time. One of the things about IPFS that I don't fully understand is exactly how this process of finding and returning the file works, but what's really important is that it does in fact do it, by sourcing the content across all of your peers. On IPFS, that indicator is the top-level hash we just built with the Merkle tree. You go and ask your peers: hey, here's this hash. IPFS protocol, find it for me. IPFS asks everybody, and then people send back all the little bits that you need. What's important about IPFS is that one URI will always yield the exact same static content. This is huge for my use case, where we may need a file that is exactly, perfectly a record of what happened, or what the data looked like, or what a website looked like on a particular day. IPFS will be able to return that.

Dat actually has a really different take on this: the URI is the read key for the directory. Like an SSH key pair, there's a key held by you, the author, and a key made accessible to the public, and you ask, using the public key, for the thing with the latest date stamp. I am definitely paraphrasing how this works, but the protocol goes and compares all the different available date stamps that correspond to the correct key, and sends back whatever is latest for that. This is really important because instead of saying "this is the static thing," it's saying "this was definitely authored by this entity," whoever owns the private key. If you trust the entity, for example the US government, you would want to check against the key they said was their public key. And this is very helpful because the URI will not change. However, if you don't trust the entity you're getting content from, this may not be helpful, because they can change the content at any time.
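Here's a loose sketch of that author-key model, using Ed25519 keys from the third-party `cryptography` package. This is not the actual Dat wire protocol, just the shape of the idea as I described it: the public key acts as the stable address, the author signs each timestamped snapshot with the private key, and a reader keeps whichever validly signed entry is newest. All the names here are my own invention.

```python
import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The author holds the private key; the public key IS the stable address.
author_key = Ed25519PrivateKey.generate()
address = author_key.public_key()

def publish(content: bytes, timestamp: float):
    """The author signs (timestamp + content) so readers can verify origin."""
    record = str(timestamp).encode() + content
    return (timestamp, content, author_key.sign(record))

# Two versions of the same page, published an hour apart.
feed = [publish(b"air quality: normal", time.time() - 3600),
        publish(b"air quality: benzene detected", time.time())]

def latest(entries):
    """A reader keeps the newest entry that this address's key really signed."""
    newest = None
    for timestamp, content, signature in entries:
        try:
            address.verify(signature, str(timestamp).encode() + content)
        except InvalidSignature:
            continue  # forged or corrupted entries are ignored
        if newest is None or timestamp > newest[0]:
            newest = (timestamp, content)
    return newest

print(latest(feed))  # the latest validly signed version
```

The trade-off from a moment ago falls straight out of this sketch: the address never changes and you can prove who authored what, but nothing stops the key holder from signing different content tomorrow.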
One more property: when you get content back, you automatically re-host it onto the network and can serve it to peers, which makes you a node that strengthens the decentralized web.

That is a lightning tour of how the protocols work, which I hope is helpful. I wanted to bring this back to our use case within EDGI, the Environmental Data and Governance Initiative, with something that came up just over a year ago. There are a lot of air quality monitors around, typically maintained by government or quasi-governmental entities. Here's an example of air quality data collected daily at some sites in Deer Park, near Houston, and they've produced a whole bunch of these different data sets. On March 18th of 2019, there was a massive fire at a petrochemical plant, and a bunch of tanks containing chemicals that burn to produce benzene, which is a carcinogen, released it into the atmosphere. The response on the ground was absolutely awful. The fires burned for days, and there was some misunderstanding among the responders about how many parts per million were okay to breathe; they were wrong by, I think, two orders of magnitude. They sent kids back to school and workers back to factories, and first responders didn't have the appropriate gear as they were sent in to help. Cancer is not something that gives you an immediate reaction, right? You're just breathing the air. But in a decade, we're very likely to see a pretty big set of cases where a lot of folks have health issues based on this fire that was poorly responded to.

Something weird happened during the fire: one of the air quality monitors was taken down by the monitoring entity. But this air quality data is kind of the basis for a legal case that would help folks cover their bills as the benzene takes its effect, so it's really important that the data be saved, and that's something we were called in to help with. How do we make sure this data, which is clearly endangered, stays available? The entity that puts the data online is potentially one of the entities that would be held responsible; they would potentially be culpable in the legal case. So we can't really trust them as a source, and we really can't trust that ten years from now this data will be available in a usable format in a way that we trust. This is a case for why we might need data to go onto a static, verifiable, provenance-showing repository, like something on IPFS.

I also worked on a paper earlier this year in the Data Science Journal about data risk matrices in a couple of different contexts, and this is one of the contexts we used. This was through the Earth Science Information Partners (ESIP) group. They did some workshops with folks who do a lot of different types of archiving: some of them with university libraries, some of them rescuing data from old on-paper NASA data sets, some with collections of botanical specimens, different ways that important data can be put at risk. And there's this great quote in the paper from a workshop participant: data were considered to be at risk unless we had a dedicated plan for them not to be at risk. Take a second to think about that. We have this idea that data on the internet never disappears, but it's not accurate to say that.
It's accurate to say that data you might want to disappear maybe never disappears; anything you do on the internet might be saved by someone. But that doesn't mean that all data that has ever been on the internet is safe. I'm probably preaching to the choir with this crew.

One of the things we did in this paper was develop a risk matrix. We took the data sets from this Deer Park fire and stored them with a partner in a sort of traditional archive, and these are some highlights from the risk matrix as I applied it to that space. The way you read it is: the left column is categories of risk, and the top row is the different lenses through which we might view each risk, like severity, time to recover, and the degree of control we have over the problem. You'll notice that we're at low risk of loss from something like political interference: the likelihood of occurrence is low. It's just data in an archive; they're not going to go mess with it. Our degree of control is also low. But you'll see there are a lot of high-risk areas, like whether a catastrophe will occur. This particular archive has its servers based in San Francisco, and catastrophes can happen. If one did happen, the time to recover would be, well, maybe forever. This might be lost.

We also stored many of the same data sets on IPFS, specifically through Qri, which is data set tracking software. It's definitely not a panacea; in fact, you'll notice that the areas of high risk here are actually the ones that are low risk in a traditional archive. Look at loss of knowledge, for example: the severity of the risk of losing the knowledge of how to access this data set on IPFS, on Qri. We don't know where Qri or IPFS are going to be in a decade. Hopefully they'll be doing great, but possibly not. And if you're a lawyer who isn't involved in this but is involved in lawsuits about benzene, how are you going to learn that there exists a data set stored on something called Qri, on this alternative internet? We kind of need a chain of people passing knowledge along in order for this to work, which is largely an early-technology type of problem, and something that potentially IPFS could change. But then we also have this other problem: in theory, we could recover quickly from a dependence on a service provider, because in theory the data set would be resilient, since lots of people across the entire world would have copied it. In practice, I'm not 100% certain it exists in more than one place.

So those are some of the issues we have to think about, the same set of questions I brought up at the beginning: does the archive exist? If it exists, do we trust it? Is the archive actually safer than the original? Take a look at the paper if this interests you; it's a really interesting way to assess the different impacts that decentralized web technologies might have on different spaces. And I would love to hear about it if you use it in your work.
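Just to make the shape of that matrix concrete before I summarize, here's a rough partial reconstruction as a Python structure, filled in only with the handful of cells actually mentioned above; the paper has the full matrix, with more categories and more nuance.

```python
# Rows are categories of risk; keys inside are the "lenses":
# likelihood, severity, time to recover, degree of control.
# Only cells mentioned in the talk are filled in.
risk_matrix = {
    "traditional archive": {
        "political interference": {"likelihood": "low", "degree_of_control": "low"},
        "catastrophe at server site": {"severity": "high",
                                       "time_to_recover": "maybe forever"},
    },
    "IPFS via Qri": {
        "loss of knowledge of how to access": {"severity": "high"},
        "dependence on service provider": {"time_to_recover": "quick, in theory"},
    },
}
```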
My summary is what I already said: single-point-of-failure risks go down, and new-technology risks go up, when you use an IPFS-based technology. We're not really actively using IPFS right now at EDGI. We're really excited about it, and we're waiting for the moment when we can broadly pitch it to all of our activist friends. Maybe that's a familiar challenge for you all. You know, I try to get my family on Signal and Riot and all this sort of stuff, and they're like, that's great, but I definitely don't know how to use that. This is just a more professional application of the same problem.

On the other hand, and again I'm probably preaching to the choir, the strongest way to strengthen this web is to use it. You all are working on this, and I'm glad to be here and honored that I get to talk to you all. One of the things I thought could be cool: a project I was working on for a bit was just trying out different D-Web technologies and taking notes to share back to the project owners from different perspectives, different archivists' perspectives. I've talked to different communities, typically ones not involved in the D-Web space, about looking at this stuff, and I've gone back and made issues on a whole bunch of different repos saying: hey, in this setting, this thing doesn't make sense, or this thing is confusing, or I tried to follow these directions and got stuck; just being an early user. These are sort of lightweight ways of being on the D-Web, but you're probably aware of them.

I'll finish out by giving a little plug for a group that I host, data-together.org. This is sort of a splinter off of EDGI, and it's a partnership with Qri and with Protocol Labs, actually. We have a reading group where we talk about the different contexts in which the decentralized web, and the concept of decentralization, play into civics and ethics. We organize it like semesters, and this season our topic is polity: what defines the group to which I am responsible, whose laws I follow or build upon? So if hour-and-a-half monthly discussions with well-chosen readings are interesting to you, go ahead and check that out. Thanks very much.

Thank you so very much, Kelsey. This was wild, and we were all talking about it in the chat. Chat, please give Kelsey a huge round of applause. You did such an amazing job explaining the underlying technologies, but also some, I would call them, deadly serious implications of these technologies. It's so very far out. Thank you so very much for sharing that with us.

Thanks for having me.