Hey, it's nice to see everybody. My name is Karissa, and I'm from Oakland, California. It's not as sunny as you think. You can find me on Twitter at @okdistribute. Today I'm going to talk about the web of commons.

So let's play a game, if you're ready to play a game. I know it's 9:30 in the morning, but get ready. Two people are accused of a crime. Neither of them did it, of course, but just in case, they have a pact that they'll never confess. A cop comes in and separates them, with the goal of getting a confession out of them. You can map this game onto outcomes: if one of them defects and confesses, they get out of jail free and the other serves 20 years. If they both defect ("oops, we both tried to get over on each other"), they each get five years. And if they both stay silent, they each get one year.

The ideal scenario, the one that optimizes for the lowest total jail time, is for both of them to keep their pact: to cooperate with each other. However, because defection results in a better payoff for the individual, it's more likely to happen than cooperation. It's a dominant strategy. So the dilemma, the prisoner's dilemma, is that mutual cooperation is better but less likely to happen. This is an economic theory that gets applied to a lot of environmental phenomena, for example climate change or overfishing: people take more than they give, and they don't cooperate.
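As an aside, the payoff matrix above is easy to sanity-check in a few lines of JavaScript. This is just an illustration of why defection dominates, not code from any real project:

```javascript
// Years in jail, indexed by my move and the other prisoner's move.
const years = {
  silent: { silent: 1, defect: 20 }, // I stay silent
  defect: { silent: 0, defect: 5 }   // I defect and confess
};

// Whatever the other prisoner does, which of my moves gives me less jail time?
function bestResponse(theirMove) {
  return years.silent[theirMove] < years.defect[theirMove] ? 'silent' : 'defect';
}

console.log(bestResponse('silent')); // "defect": 0 years beats 1 year
console.log(bestResponse('defect')); // "defect": 5 years beats 20 years
// Defecting is the dominant strategy, yet mutual silence (1 + 1 years)
// beats mutual defection (5 + 5 years) in total. That gap is the dilemma.
```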
So the common-pool resource depletes over time. The proposed solution, which has been the proposed solution since probably the 18th century, is called enclosure: literally creating fences around the land and giving property ownership to certain people. It was thought that this was the only way to protect places, that isolated, autonomous individuals will always choose the path that's best for themselves, to get out of jail quickly. So in 1968 Garrett Hardin, the famous person who came up with this "tragedy of the commons," said "freedom in a commons brings ruin to all," and this was the prevailing theory for a long time.

That lasted until 2009, when Elinor Ostrom became the first woman to win a Nobel Prize in economics. For over 40 years she studied this concept, and she proved that Hardin exaggerated the problems involved in managing a commons. She asked the question: what if we aren't isolated, autonomous individuals? What if we have the ability to mutually cooperate? She came up with rules and norms that people use to sustain their mutual cooperation over time, and she did this by studying real people in real places. For example, this is her in Nepal in 1993, where she studied irrigation management systems. She did this all over the place: in Guatemala, Turkey, and Kenya, and she even went to Los Angeles and Chicago and looked at policing as a commons, at how cops might be able to mutually cooperate. So instead of thinking about people as isolated rational actors, we're thinking of people as managers of a shared resource, together.

So what does that have to do with the internet? In 2007 she published a book with Charlotte Hess called Understanding Knowledge as a Commons. So what is knowledge? In the digital world, we're talking about libraries, online classes, open source, wikis.
We have these digital artifacts that we see online, and they can be managed as a commons. They are all used by and benefit a wide community, largely for free, and it costs close to nothing to share, remix, and copy their contents. However, they can be polluted, for example by actors that aren't doing right by the community. So looking at knowledge as a commons, as a shared resource, looking at it as what it actually is, allows us to really understand its possibilities and also what threatens it. So what threatens digital knowledge today?

If we think about managing these commons, there's a blueprint for managing them, and when the resource is knowledge, we're actually more likely to see many communities managing themselves rather than one monolithic community that manages the whole web. So today I'm going to talk about our work in these particular areas, public data, scientific articles, scientific data, and libraries, and how we can apply commons theory to these kinds of things, along with decentralized technology.

I work with Code for Science & Society. I co-founded this nonprofit; we're funded by the Knight Foundation and the Sloan Foundation. I founded it with this guy, Max Ogden, and this guy, Mathias Buus. We make tools that help scientists share data.
That's our primary concern, although you're developers, and you might find the tools underneath to be very interesting for things that you're doing. Most recently we've been talking about scientific data as a commons, and it's a really good use case for a commons, so let me give an example of what's actually happening today.

Who's familiar with Elsevier and JSTOR? You guys familiar? Okay, not super many of you. These are journals and publishing companies. There are a lot of them, but basically they take public information, information that was either funded by the public or otherwise generated by researchers in labs, and they literally build walled gardens around it. There are paywalls. They literally build fences. It's like what we were talking about earlier with enclosure; they're doing the same thing. They're building a fence around digital knowledge that you have to pay money to get access to. This means that certain people with power can profit from it, and it reduces everyone's ability to use it. It prevents innovation and it prevents the spread of knowledge. So this is a bad thing. We don't like this.

In a web of commons, in a different kind of world, you might see that users are in control and would be trusted to self-govern and manage this data together, rather than giving it all to one company. So I want to return to Elinor Ostrom and see how we can use her eight principles of common-pool resource management, this sort of algorithm for creating commons, and apply them to scientific articles and data.

One of the first rules is that you need to define clear community and resource boundaries. In this case the community is researchers, librarians, universities, and labs, and the resource we're looking at is scientific data and code. We're really focusing here on data, because it's really large and really difficult to store, and nobody really wants to touch it. It's a really expensive thing to manage.
So we've been really tasked with this in particular. You also want to match the usage rules to local needs and conditions. That means we've been working with real users: we've been working with the University of California, Berkeley, along with other places, to try and make sure that this all works together. One of the biggest problems they have is that they have a server, and everybody comes to that server to try and get the data. If you only have one server for hundreds of terabytes of data, bad things happen, right? So what we're trying to do is get away from these centralized and enclosed services. We don't want to recreate the problem that exists now. We want something that's decentralized.

We want something that doesn't have the 404 problem. Most scientific research has a problem where it cannot link back to the data that it was using, and a lot of this is because of HTTP. Literally, if the server changes, or you lose access to your Google Drive, or a grad student moves on or something like that, then the links entirely break. So we're really interested in using a decentralized service that has content addressability, where the link doesn't change over time.

But wait a second. Don't they have a good reason for enclosure? Don't we want to have good control over copyright? Why are we trying to decentralize and give everything away for free? Well, copyright in this case means that the public pays twice: the public funds the research, and then it has to pay again to get access to it. It doesn't really make sense, especially if we're able to reduce the cost of distribution to zero, which I'm going to try and do. Another rationale given is that it's really expensive.
It is really expensive to host all this data and give it away, that's true. It is expensive to manage and maintain databases and these kinds of things. But we try to reduce that cost as low as possible by using a distributed network, kind of similar to BitTorrent. As more people come and download the articles and data, the more people have copies, and the faster and cheaper it gets, rather than more expensive and slower.

The great thing about BitTorrent compared to centralized services is that it's distributed, like I just said. It's also massively adopted and simple, so you can build a client on top of BitTorrent. The links are future-proof: if I link to something in BitTorrent, the link will probably never change, even if the content changes servers. The problem is that it scales really poorly, files cannot be updated, and it's not really secure: I can look and see everything that you're downloading, all the time.

Sci-Hub and other places have been using tools like BitTorrent, and there's been a lot of debate about whether it's ethical to use peer-to-peer or other kinds of technology to get around copyright. But the dream, and this is the dream we're trying to do in practice, and legally, with the cooperation of universities, is this: we're working with the University of California, and we want to turn their previously siloed servers into something shared. Right now every university has a big server farm that's underused, maybe 50% used, or with only a few gigabytes used on drives with hundreds of terabytes of capacity.
So there's a lot of unused space. What happens if we connect these siloed servers and build them into a distributed network, a commons, where they share their resources? You look at storage and bandwidth as a resource that should be shared and cooperated around as a commons, rather than a competitive commodity like it is today.

Going back to Ostrom: we want to make sure that those affected by the rules can participate in modifying the rules. Digital libraries will handle the power structures here, so we're working with the California Digital Library. We also want to make sure that the rule-making rights of community members are respected by outside authorities. In this case, the outside authorities are governments and corporations. We want to make sure that the universities are autonomous and able to manage their own data without interference. One of the things that happened in Turkey recently is that people haven't been able to say what they want to say in universities, and they get censored or even kicked out of their jobs for criticizing the government. It's even happening in the United States and other places around the world. We want to make sure that governments and corporations don't have the ability to interfere with science; science should be separate from these.

So the main rule we're trying to optimize for, when we're talking about rule-making rights, is: how do we have rules that keep the data online? One of the biggest problems with BitTorrent (I know none of you have used BitTorrent before, of course) is when there aren't any peers online. You see this awesome thing that you want to download, and it's not there. This is something we really need to prevent from happening here, because if there aren't any stable, interested peers, then in the case of scientific data it's equivalent to a forest fire.
It's completely disastrous, and it's not something we want. So we want to develop a system, carried out by community members, for monitoring members' behavior. What do I mean by behavior? What are we monitoring? We want to see who is doing their part to back up data, and if one of the universities, or some other partner who's downloading data, can't keep that data online, we need to be able to see that over time. For example, we can detect how many copies are available in different places, just like BitTorrent does, and compute the health of the data. If we have data hosted at the Internet Archive, the University of Pennsylvania, and UC Berkeley, it's probably really healthy and has a low probability of ever going offline. These groups would also be considered members of the community that governs it. And this is a lot safer than having, say, a hundred laptops, right? By being able to see who's hosting the data, through IP addresses and that sort of thing, you can make sure that the data is going to stay online.

We want to use graduated sanctions for this, so we might have a score for how healthy the data is. For example, if one host continuously keeps going offline, we might start marking that university as less stable, or something like that. We also want to provide means for dispute resolution; I won't really talk about that here. And we build responsibility for governing the commons resource in nested tiers, from the lowest level up to the entire interconnected system. That means we involve everyone. In the system we've been building, that means anyone should have permission to download the data and re-upload it, so users own the data.

It's also very secure. Unlike BitTorrent, people can't just sniff your network and see what you're reading. Files can also be updated over time.
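Before moving on, the copy-health idea above can be made concrete with a minimal sketch. Everything here, the scoring function, the uptime numbers, and the host names, is hypothetical and invented for illustration; it is not how Dat actually scores hosts:

```javascript
// Hypothetical health score: the probability that at least one complete
// copy is reachable, assuming hosts go offline independently.
// `copies` is a list of { host, uptime }, where uptime is the observed
// fraction of time that host has been reachable (0..1).
function health(copies) {
  const pAllDown = copies.reduce((p, c) => p * (1 - c.uptime), 1);
  return 1 - pAllDown;
}

// Data mirrored at three stable institutions: very healthy.
const institutions = [
  { host: 'archive.org', uptime: 0.99 },
  { host: 'upenn.edu', uptime: 0.95 },
  { host: 'berkeley.edu', uptime: 0.90 }
];

// The same data on a single flaky laptop: a forest fire waiting to happen.
const laptop = [{ host: 'grad-student-laptop', uptime: 0.2 }];

console.log(health(institutions) > 0.999); // true
console.log(health(laptop) < 0.5);         // true
```

A score like this is also what graduated sanctions could key off: a host whose observed uptime keeps dropping gets marked as less stable, lowering the health of every dataset that depends on it.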
I'm going to talk in a little bit about how we keep content addressability but also update the files. Files can be updated, which is really important in science, because often there are retractions or problems and you have to go back and edit some part of it. So we keep it up to date, and you can also look back in time to see what kind of data a scientific article used when it was published.

It's really simple: you take a command-line tool (we also have a desktop app), you type dat share and the path to your data, and it creates a link. This link is actually a public key that references your data on the network. You can give this public key to someone else, they just clone it, and then they get access to the data. They're connected to other people in the network through a DHT (we're actually using the same one that BitTorrent uses), through DNS (we also have a DNS server that can connect people), or over the LAN, and it just connects the computers together and figures that out. It's using a special kind of key, though, to keep things secure.
We're not putting this key on the network directly. We actually compute something called the discovery key, which is the hash of this key. Only peers with the original link can compute the discovery key and find each other on the network. This means that universities can even share private data between themselves, and data that isn't yet available to the public won't be visible. It means that people sniffing the network don't know what you're sharing and downloading, unlike BitTorrent.

This link never changes, even as files are updated, and you can verify the hash of the file system at specific versions and go back in time. This is really good for what I was talking about before: one in five articles suffers from reference rot, which is insane, right? For one out of five articles in science, articles that are supposed to last forever, you can't get the original data they were looking at, or the link they were linking to.

So how are we doing this? I'm going to start getting a little technical, so prepare yourselves. We're using append-only logs. An append-only log is a list that only ever gets appended to: you append items to the list, and that's it. Why append-only logs? It's a simple data structure, it's immutable, there's a logical ordering to it, and it's really easy to digest and index. If you go onto our GitHub and start poking around at the code, you'll notice it's pretty easy to get started building your own application.

How can we share append-only logs over a peer-to-peer network where we don't necessarily trust people? People with laptops could be downloading this data and re-hosting it on the network, so we have to make sure the data is secure and that people aren't tampering with it. For that we use Merkle trees. A Merkle tree is just a tree that verifies data, and it's unrelated to Angela Merkel.
So when you add data to Dat, you get a hash, and Dat starts computing a tree where each root hash can verify all the hashes underneath it, and the data below those. As these trees get built up, it's really nice, because you can verify all the data. So if Alice wants to share a certain piece of data with Bob, we want to make sure that's really easy to do. If I have a 300-terabyte scientific dataset, I don't want to have to share all 300 terabytes with you; I might just want to share one little piece, maybe one file or half a file. This is a really important part of this. To do this, Alice only needs to share the intermediate hashes, not all the data and not all the hashes, and Bob can recreate the tree from those hashes and verify that she has the right root hash. This takes only log(n) hashes, which is pretty nice.

But how do we do this in a way that can update data over time and maintain history? What I just showed you works for static data. When we start changing data, every time we append data the root hash changes. In BitTorrent, every time you append or change data, you have to create an entirely new torrent, with a whole new swarm of peers. We don't really want to do that.
So what we're trying to do is maintain the peers that we've always had. Crypto to the rescue: we generate a key pair. The public key is the link, and as long as Bob trusts that public key, we're fine, because as people add data to the structure, they sign it with their secret key, which verifies that the data was added by the right person. As peers check the root hash signatures, they just use the public key they were given to verify those signatures. Right now Dat has only one writer per log, but there's a new library called HyperDB, which I'll show you a link to and tweet later, that has multiple writers. It's pretty experimental, but you can start today with one writer.

So how do we turn append-only logs into a file-sharing tool? Because that's what we're really talking about: scientific data is files, websites are just files, most of what we do is just files. If we take a file and cut it into pieces, we just insert each piece into the log. It's pretty simple, right? The tricky thing is where you slice: different kinds of binary files might need different kinds of slices. We also insert the file name and file metadata into another log, so you can grab the entire metadata of a dataset without grabbing any of the content. This is unlike Git: Git forces you to download all the metadata and all the data at once, because it stores the data in the tree. We separate these, which makes the system a lot more efficient and malleable. You can read more in our paper, online at datproject.org; just take a look.

One of the coolest use cases we've done recently is work with the California Digital Library to back up data from data.gov. So if data is deleted from the US government website, we will still have a copy available on Dat. Don't worry. When we downloaded this data, we saw that most of it is actually HTML. If you look at this chart,
it's ordered by frequency, and most of the data is HTML. It's not CSV files or zip files; most of the "data" is just links to web pages that themselves link to the data. So we need a lot of help, right? There's a lot of data on the web that might go offline tomorrow, and we need to make sure that data stays available. So we've been starting to download this data and put it on our website. We have a registry where people can upload and publish data under their name, like on GitHub. We have California campaign finance data, we have the backup of open data released before it was deleted this year under President Trump, and the rest of the registry is a lot of cool data that we're trying to keep alive.

Someone, his name is Richard Smith-Unna, built a distributed science journal called ScienceFair with Dat. So my use case around commons in science can also be applied to a lot of different applications. This is just one application built with Dat, where you can search for a particular article and go get that article from the decentralized web. Wouldn't it be really cool if people had a decentralized, peer-to-peer application where they could publish their data and their scientific articles without permission, and anyone could get those articles? You'd actually have a hostless journal, which is really cheap to maintain. That's kind of the dream of this app.

There's also another app built on Dat called Beaker, which is a distributed web browser built by Paul Frazee. Basically, it's a fork of Chromium where you can go to dat:// plus your link, and you get the website that's hosted there.
You can even have a DNS record that points to the dat URL, so people who go to the website in Chrome or Firefox can still see it over HTTP, but if they go to it in Beaker, it'll use the peer-to-peer version.

There are lots of other community-built things here. We have IngestDB, a peer-to-peer database, so you can actually use SQL-style commands on top of a database that's peer-to-peer. There's Beaker Browser, which I talked about. Fair Analytics is a peer-to-peer analytics server, a replacement for Google Analytics and similar centralized services; check it out, it's actually really cool and really modular. This is osm-p2p-db, a peer-to-peer OpenStreetMap editor: as different people edit the map, you can see the history of the map and get it over a peer-to-peer network. There's also dat-pki, a public key infrastructure for Dat, which is how you can create friends and contacts over Dat, publish a list of dats, and things like that. You can take a look at what we've built on our GitHub. Hyperdrive and HyperDB are the two underlying JavaScript modules that manage a lot of the more complicated stuff, the Merkle trees and so on that we've been talking about here. If you want to play around with Dat and Node, please go ahead and build your own app.

So let's take a step back, because I'm going to wrap up here in a minute. Maybe one of you is asking right now: why aren't you using blockchain, or Ethereum, or doing some ICO? Why aren't you with all the cool kids? I have to take you back to the beginning of the web, and to what really inspires me and inspires us on our team: some of the most foundational people. Here, on the left:
This is Tim Berners-Lee, who invented the World Wide Web, and on the right is Vint Cerf, who invented TCP/IP. They came from research labs. And look, that's me at the Decentralized Web Summit at the Internet Archive; it was pretty fun. They gave their protocols away to the public for free, simply as products of scientific inquiry. There was no profit motive; their only judge was science, not profit. With a commons approach to the decentralized web, the most ideal approach is guided by where we came from. I'm excited about creating protocols that are easy to use, develop with, and extend without asking for permission. I don't want to have to pay money if I don't have to, and I believe that some of the best tools aren't created for the market at all. They're funded by donation, built for the public good, and given away for free. So the protocols I want to build should optimize for science and collaboration rather than optimizing for profit.

Today we can look at the decentralized landscape (this is the Decentralized Web Summit last year) in the context of what people were doing back in the day, and wonder if we're continuing their legacy. 1.6 billion dollars were invested in ICOs, initial coin offerings, in 2016 alone. This is a huge, growing way to raise money and to build applications; almost all of the money is going there. So why aren't we doing one of these? Why aren't we doing an ICO?
If you really look at what they propose, many only offer siloed internets that are privatized, with money being invested into coins that may not ever even be available in production. You're putting a lot of money and faith into a small team that has a lot of control over the coin and how it operates. The web of commons is not blockchain. Blockchains and Bitcoin assume that I don't trust anyone in the network, and that is pretty much the opposite of what we're assuming with the web of commons: everyone in the network is trusted, and those who aren't following the rules are kicked out. What we're trying to do with Dat is build tools for the common good, that can also be used for other fun things, and use sound technology. Decentralization, I believe, is not just a technological problem; it is also a human one, and we have a lot of work to do.

If you want to donate, go to datproject.org. We're a 501(c)(3), so your donation can be tax-deductible if you want. And don't forget to check out our stuff online and build something with Dat. Thank you. Find me on Twitter; I'll be tweeting those links.

I found that really interesting, thank you. I also thought the live captioning was absolutely amazing: this is going to New York and back, being live-captioned by a person there in real time, which is pretty mind-blowing. There were a few questions. I think you answered one of them already, which was: is Dat similar to IPFS, the InterPlanetary File System? I don't know if you have anything else you want to add on that.

Dat is similar to IPFS in its construction, but it's different in its goals, I guess.

A question by Isaac, which was: I share your dreams, but what if I want to be published in, say, one of Elsevier's journals, where they make you grant all the rights to them? What happens then?
Sorry, so if you get published by Elsevier, they make you grant all of the rights to them. Could you still use this? How does that limit it?

Scientific researchers already have a little loophole when they give away their copyright, which is that they can publish their preprint, a slightly modified version of the one whose copyright they sign over, on a university server or on their public website. So they could probably use their preprint, which probably only has one word that's different.

Right, okay, and that counts. Glen asked: how do you protect copyrighted files? How do you make sure someone doesn't upload a Game of Thrones episode to this?

I mean, you can use Dat for anything; that's kind of the great thing about it. But the way we've built it is optimized for data sharing, and we don't condone or support any illegal activities.

Nice. And someone else, James, asked: the public keys for trust relationships need to be shared over some other medium. Do you have a preferred way of that happening, a preferred medium for sharing links?

Right now we have a website that you can go to, log in, and publish links with short names. So you can have an account, like on GitHub, and create a short link for your dataset, and people can go and view that. Go to datproject.org, and you can log in and publish a dataset. But you can also create a DNS record for it, like I was saying before: it's just an HTTP website that has a Dat-compatible link inside it. Either one, I think, is a good way. But if you wanted to keep things really private, you could also share the link over something like Signal or some other encrypted channel, which would mean that the link and the data you share are entirely private.

Okay, great. I think that's it. Thank you very much. Thanks.