The Internet has this founding myth that it's a very decentralized thing, but if you look at actual usage today, much of it goes through a few centralized commercial entities like Google or Amazon or Facebook. And there are many single points where authorities or other interested parties can throttle or censor access to content. Our speaker, Will Scott, a recovering academic who has worked on distributed systems and security, among other things, thinks that the Internet should not have these central points of control, and wants to tell us about the building blocks of decentralization that will allow us to build another Internet that is less centralized and more resilient to central attacks. Have fun with the talk.

Great, thank you. I'm really happy to be talking today, and it's been great to see so many people at RC3, even if just through my screen. In reflecting on this talk, I think there are a couple of messages. The primary one, for me, is actually really hopeful: when I think about what we have with decentralization and what's getting built, the trajectory looks promising in a lot of ways. The second message is that when we think about the technical building blocks of decentralized systems, we really are at the very beginning; there's a lot left to do and a lot of work ahead of us.

So I want to start with a story, a story of community and of building new systems. The first step and the first question is: how do you build that community? How do you find it? I think the answer we have is things like this event. And this is an area where maybe we don't have all the systems of discovery; there's a whole talk here that we don't have time for, which is how you find the people who share similar interests. The place where we can start to think about decentralized technologies is instead once you've got this community and you want to sustain it.
And we actually have a bunch of decentralized tools and a really rich ecosystem to do that. We can talk with federated systems like Mastodon. We can have video chats with Jitsi. We can host our own files and store our own data. We can collaborate on software projects by running GitLab. So when you think about decentralized systems that help a community be self-sustaining and independent, we have a set of fundamentals that do this well. And what we're doing now is not so much developing new things as reducing the barrier to entry. Instead of having to find someone to run a server and operate something complex like GitLab, you have new systems like Radicle that make this easy: people just run a piece of software and don't need to worry about the complexity of deployment. So we're reducing the barrier to entry, but we already have the building blocks that allow a community to do this.

So you've got your community, formed around an idea, around an ideology, and you can build software and start to make that idea grow. The next step is: how do you build services, ways for that idea to reach more people, once that stops being something you can just self-host on your own servers? We have building blocks for this now: underlying systems and design patterns for taking your service and letting it scale while maintaining independence. These are somewhat less developed; the user bases are maybe one or two orders of magnitude smaller than in the previous category. But you can have files hosted on a decentralized CDN like IPFS or a number of others.
You can use distributed database abstractions, be they GUN or Earthstar or Hyperswarm or others, that allow messages to be passed in a decentralized way and allow for some amount of synchronization and collaboration, so that a service can go out to millions of users without needing to fall back on centralized technologies from Google or Microsoft or Facebook and so on. So you can have things that not only support your community but can then reach other communities. We're following the trajectory: this layer is less developed, and then we can go to the next step, which is even less developed. That step is not only running a virtual service but starting to interact with the real world, because when we think about the successful technologies, the Googles or the Facebooks of the world, they're not purely digital; they go into our real lives. This is a part that's still very much in development. Cryptocurrencies are a whole thing, but what's maybe interesting about them is that they help us bridge into the real-world financial system, so our services can maintain independence while still touching that aspect. Likewise, the promise of the DAO, the decentralized autonomous organization, is that you can maintain decentralization while interacting with real legislative, political, and similar systems. The thought is that rather than having some core owner who has to hold power, you can encode beforehand a contract for how the organization will work, based on its principles, and have power remain decentralized: there's no single person that existing real-world regulatory systems can identify as the source of all the accumulated power, go after, and influence or manipulate apart from what the community wants.
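As a toy illustration of that last idea, encoding the governance rules up front so that no single owner can later override them, here is a minimal sketch in Python. This is a simulation only: `ToyDAO`, its members, and the strict-majority rule are all hypothetical, not the API of any real smart-contract platform.

```python
class ToyDAO:
    """Toy governance: the rules are fixed at creation, so there is no
    owner who can later override them (hypothetical sketch only)."""

    def __init__(self, members, quorum_fraction=0.5):
        self.members = set(members)
        self.quorum_fraction = quorum_fraction
        self.votes = {}  # proposal name -> set of members in favor

    def vote(self, member: str, proposal: str) -> None:
        if member not in self.members:
            raise ValueError("only members may vote")
        self.votes.setdefault(proposal, set()).add(member)

    def passes(self, proposal: str) -> bool:
        # The threshold was encoded up front; no single party can change it.
        in_favor = len(self.votes.get(proposal, set()))
        return in_favor > self.quorum_fraction * len(self.members)

dao = ToyDAO(["ana", "ben", "chi", "dee"])
dao.vote("ana", "fund-mirror-servers")
dao.vote("ben", "fund-mirror-servers")
print(dao.passes("fund-mirror-servers"))  # False: 2 of 4 is not a strict majority
dao.vote("chi", "fund-mirror-servers")
print(dao.passes("fund-mirror-servers"))  # True: 3 of 4
```

The point of the sketch is only that the rule lives in code agreed on beforehand, rather than in the discretion of a single operator; everything about interacting with real legal systems, the hard part the talk describes, is outside it.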
And so this takes the form of evolving systems like oracles: how you have something that exists in a decentralized way and still let it interact with existing real-world systems. The answer ends up involving both advances in cryptography and thinking about where you find the hooks for these bonds of interaction.

What I want to do in the rest of this talk is start with a grounding in a set of things I'd consider decentralized systems, to give us a sense of scale. Then I'll abstract one layer up to talk about the underlying models and technological building blocks that are common to a lot of them. And then I'll talk about two ways to think about the limits. One is the emerging pain points: as these things grow, where do we know the existing technologies run into trouble? The other is the model properties: these systems already fail us in some ways, so how do we think about whether there are alternatives that do not have those same properties? Are these things we can bolt on or do better within the current systems, and how?

In terms of existing systems and where we are: BitTorrent remains probably one of the largest decentralized systems, in a sense. There are three to four million active users out there on any given day, which starts to reach meaningful percentages of the global population actively using BitTorrent. One other aspect of BitTorrent is interesting: you've got a lot of users.
But you've also got a metadata layer for finding new torrents. That core, which needs to exist to let you discover the peers in your specific federated instance of BitTorrent, comprises something on the order of 400 open torrent trackers that maintain metadata, plus a Kademlia DHT, a distributed hash table made up of more normal peers that store metadata. There are roughly four million torrents contained in that DHT, as metadata about who is participating in which torrent.

Another federated system that's had a great year is Mastodon. As an exemplar of the ActivityPub protocol, and the Fediverse more generally, it has on the order of three million users. The ActivityPub Fediverse more broadly spans something like 50 different projects, with on the order of 5,000 ActivityPub servers. This fall there's been measurement from the Community Data Science Collective, an academic group, looking at activity on Mastodon specifically. They find something like 70,000 tooting users, who collectively toot roughly 30,000 times per day. This ratio of measured activity to total users is not unexpected; on pretty much any social network, you'd expect the majority of people to be consuming rather than posting public content.

More broadly, when we think about federated things, a lot of foundational Internet technology, like email, is still federated. There are on the order of six million email servers, that is, distinct IP addresses running the SMTP protocol. There are a bunch of WordPress servers as well, which federate through comments and pingbacks and so forth. Jitsi reported this year that they have on the order of 20 million users of their primary service infrastructure; they didn't say how many independent installations there are, because that's much harder to measure.
And Matrix had a great year, growing to two and a half or three million users, with at least 11,000 independent instances. Moving along this spectrum toward external impact: IPFS passed 2 million users this year. They have a DHT using the same Kademlia protocol as BitTorrent, but they self-select a smaller core of active nodes to participate in it, as a way to provide better latency and availability guarantees. Their DHT is composed of somewhere between 5,000 and 10,000 nodes. Of those 2 million users, somewhere around 20% or 100,000 are running a desktop instance that stays online, rather than accessing the network through a web browser. Participating in the DHT, one finds something on the order of 20 million different pieces of content being stored in IPFS, and in a given day something like 8 million being retrieved through that DHT, or having metadata looked up about who has them.

Secure Scuttlebutt is an offline-first decentralized network. It's got on the order of 10,000 users, and it uses public servers, on the order of 100 of them, as a store-and-forward mechanism for when users are offline. I mentioned Earthstar earlier, which uses a similar partial replication; that network goes even further on decentralization and is meant for small independent collectives more than many of the others.

And finally, Bitcoin has only about a million active addresses, or accounts, and about 11,000 nodes running the full Bitcoin protocol. That's not a dissimilar ratio to what you'll find in the other projects of this type: a fairly large gap between users and the servers, the elevated nodes doing more of the coordination work. Part of the reason, in the Bitcoin case, is that in order to validate and run the full protocol, you need all the historic data, which is up to about 300 gigabytes.
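Before moving on from this survey: both BitTorrent and IPFS above rely on a Kademlia-style DHT, which also comes up later in the talk. As a rough sketch of the core idea, hashing nodes and content into one ID space and storing each item on the node nearest its key, here is a minimal simulation. Two loud simplifications: every node here sees the full peer list, whereas real Kademlia nodes keep only O(log n) contacts and route iteratively, and all the names (`ident`, `dht_put`, `dht_get`) are invented for this sketch.

```python
import hashlib

def ident(data: str) -> int:
    """Map a node name or a piece of content to a 160-bit ID
    (Kademlia uses SHA-1 for this shared key space)."""
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

class Node:
    def __init__(self, name: str, network: list):
        self.id = ident(name)
        self.store = {}          # key -> value this node is responsible for
        self.network = network   # simplification: full membership view
        network.append(self)

    def closest(self, key: int) -> "Node":
        # XOR distance: the node "responsible" for a key is the nearest one.
        return min(self.network, key=lambda n: n.id ^ key)

def dht_put(entry: Node, value: str) -> int:
    key = ident(value)                    # content-addressed: key = hash(value)
    entry.closest(key).store[key] = value # stored on the responsible node only
    return key

def dht_get(entry: Node, key: int):
    return entry.closest(key).store.get(key)

net: list = []
nodes = [Node(f"peer-{i}", net) for i in range(8)]
k = dht_put(nodes[0], "hello decentralized world")
# Any node routes to the same responsible peer, so the value is found:
print(dht_get(nodes[5], k))  # hello decentralized world
```

Because every node agrees on the distance metric, put and get converge on the same responsible peer no matter where the lookup starts; that is the property that lets the network behave, from outside, like one big key-value store.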
So the cost of running a full node is nontrivial; it's not something you can run on a phone, or even really on a local desktop necessarily.

From these systems we can ask: what commonalities do they share? You might have noticed I included a lot of federated systems, and we can think of federation as a partial decentralization: we've gone from a central Facebook or Google, where there's one entity, to federated systems, Matrix, Jitsi, etc., where there are many instances of a server, but the server is distinct from a client. There's a distinction to be drawn here, because there are things like BitTorrent, although BitTorrent again has a tracker that acts like a server separate from the clients, even as the tracker is getting phased out in favor of the DHT. The point, and this is why I think of federation as a large part of the story, is that what federation does in some sense is externalize the heterogeneity of resources. You've got some nodes that are more powerful, that will be always on, that have bandwidth, and in a federated system we explicitly run different software on them: that is the server software. In systems we think of more traditionally as true peer-to-peer, the software ends up containing heuristics that try to do the same thing, that try to guess whether a node should take on more of the coordination work. One thing all of these networks want to do is make efficient use of resources, and those resources are not evenly distributed: some nodes have more disk space, more bandwidth, better availability, and your system will perform better if you can take advantage of that.

I don't know if I think of this as cheating, but you can think of these as two different problems. One problem is: how good a set of heuristics can I have to self-select nodes into being servers versus clients? The other is: how do I make decentralized protocols? Those are somewhat different. As a result, you can ask whether you need to figure out good heuristics at all, or whether you just start with federation, which seems to be getting us a couple of orders of magnitude more users and more success, and then make packaging, auto-selection of client and server roles, and opting into those positions a thing that happens independently, or later, as a different problem to solve.

The counterpoint is to think of message passing, and how your communication model works, purely in terms of a single node. When you do that, when you take the view of a single node among other nodes like it, what you naturally end up with is something that looks like gossip. This is what Secure Scuttlebutt uses, or is based on initially. The basic concept is: I get a message, and I want to send it out to everyone else I'm connected to, so the message disseminates through the decentralized mesh. We have a bunch of optimizations on that initial concept, things like first sending out an identifier of the message to check whether the peer has already received it, and skipping the send if they have, so I don't waste the full bandwidth of sending the message over every edge. But one thing that's true when you start there is that there's no concept of the structure of the network itself, because you're considering a node in isolation. What you don't have is any ability to pull back and look at the full network and say useful things about it. So it's tough to answer questions like: will a message reach its recipient, and how quickly? These aren't questions about a single node; they're about the structure and connectivity of the network more broadly. It becomes hard to mesh this sort of communication with questions like when to form additional connections, whom to remember, how to maintain a strong network topology. One thing we maybe need to be thinking about is how to combine those two, so we can have better guarantees and a better understanding of the actual dynamic properties of these decentralized networks, because with this sort of message passing by itself, in isolation, we end up with very few properties we can say useful things about.

To give a somewhat more concrete building block, the other one that gets used is the distributed hash table. A DHT, which many of you have probably seen at least in passing, works like this: we've got a bunch of data, and a bunch of computers or participants in our network, and we come up with some identifier, maybe a hash, of both participants and content, so that they live in the same name space. The different nodes, the different computers, then store the content that is close to them in that space: if I'm the node at position three, I store all the content that hashes to four, five, six, up until some other node's position, so I get a section of the space of content based on where I hashed to. When I want to find content, I find the node I know about that is closest to that content and ask them, and if they know someone closer, they forward me on, so I end up finding the node responsible for that content. There are a few different implementations of this algorithm, and DHTs are in wide use. One of the nice things is that it's an abstraction: from outside you can think of it as a centralized database that I can put things into and get things out of, without thinking about the dynamics within. And it has the nice property that it grows as you have more users: you can expect each user to bring data, but the data per user isn't necessarily growing, so it efficiently spreads data across all the different users.

The other building block we have here is consensus: how does a set of decentralized nodes agree on something? This is what cryptocurrency really relies on: a bunch of nodes that all want to agree on something. But that's a much more general problem; we've had this problem, and a set of systems for it, in centralized settings for a while, and it turns out to be more efficient there. The reason is that, in those cases, the consensus systems have been able to get away with a weaker threat model. Paxos and Raft are consensus protocols that work within a fail-stop threat model: nodes that are broken may not respond, may delay, may freeze, may crash, and despite some threshold of nodes failing or crashing or not responding, the system guarantees that the other nodes that are still working will agree on an outcome. In a decentralized world, that ends up not being quite enough; what you want is a stronger notion of what a bad node looks like. That's what gets called BFT, Byzantine fault tolerance, which tolerates Byzantine nodes: nodes that can act maliciously and send arbitrary messages. It's not just that they fail to send a message; they may send a message indicating something different than they should have.

So Bitcoin came along and did proof of work. In proof of work, you use the scarce resource of computation, how many times you can hash something, as a sort of race, so that as long as the majority of the network follows the protocol, someone trying to do malicious things doesn't have enough power to disrupt it unless they control that majority. That has now shifted, I think partly because of environmental concerns and so forth, primarily toward proof of stake, which uses how much of the existing resource you hold, rather than computation, as the scarce resource. There's a whole theory in here about how, if you already hold a lot of the asset, you don't want to lose its value, and so you're more likely not to act maliciously. But in both cases, these are not systems that lead toward additional decentralization.

One thing I think is at the heart of a lot of the problems for decentralized systems is this question of Sybils and identity. Within the system itself, we don't really have any firm notion of identity. Who is a user? Is some entity a collection of users acting as one user, or multiple users? We can't meaningfully enforce it. When you think about how even centralized systems like Facebook or Google enforce it, they go to governments: I'm going to identify you as a citizen, a person with a driver's license. If you don't have that external authority, appealing to authority stops making sense, so instead a lot of effort goes into incentives, into trying to make it not worthwhile to appear as multiple users when you are in fact only one. But the flip side is that this incentive actually incentivizes centralization, because it ends up meaning it's better to be a single large entity than many small entities, and so that single large entity becomes more and more powerful over time.

Okay, so that was a set of building blocks. Let's go through their limitations. DHTs do grow with users: as you get more users you can store more content, and the load and burden on each user in the DHT does not increase. However, there's a set of applications you just can't do with our current understanding of DHTs. If I want to do web search and put identifiers for all of the web, really many millions of identifiers, into the DHT, suddenly that's a lot of load I would need out of each user. We can do something like a few million torrents total with that user base, but are you going to be able to store hundreds of millions of items, all the individual files, and make them individually addressable? That suddenly becomes really hard. So this is a question of naming: what is an identifier that I want to be able to search for or look up, and what are the entry points for lookups?

There's another interesting limit in terms of scale: a lot of the applications we want are interactive in nature. We want to look something up and get a response in an interactive way that's compatible with web browsing, with a user sitting there. But DHT lookups take asymptotically O(log n) hops to find content: I find the node that I know is closest, they tell me someone even closer, and so on. As a DHT grows, that goes from two or three nodes I need to talk to, to three or five, and if each of those hops takes around 100 milliseconds, that starts adding up. Even though log n is great, it's more than constant, and it stops being competitive with our centralized systems.

The second limit is gossip. As a gossip network grows larger, we really find ourselves in a situation where we don't expect messages to make it all the way across the network. I'm in a local neighborhood, but I can't necessarily find a friend through the network, because it becomes far too expensive for all messages to reach the whole network. So there's something about network structure, about thinking beyond gossip, that we need in order to scale those networks.

The final limit is infrastructural: there continue to be no real-world incentives pushing the infrastructure we have to support decentralized technologies. It's still true, as it has been for decades, that a normal user's upload bandwidth is about half of their download bandwidth. This holds for fixed-line broadband, but it's also true for mobile. Likewise for latency between end users: for a centralized system, the latency for me to reach it keeps going down, because they keep building edges closer to more of their users, so on average they end up closer to their users with lower latency. But when you look at RIPE Atlas probe data, from around 2010 or 2015 until now, the latency between end users doesn't actually go down much. Looking at a bunch of nodes trying to reach into Europe, the latency from Russia goes down a little, but India stays about the same.

We don't have money being invested in the links that carry traffic between two different end users, because nothing incentivizes that, and until we figure out a way to solve that, we're going to end up with a very static latency and bandwidth profile.

Okay, I think I'm running into question time, but I'll finish with the more meta properties, rather than these scaling limits, that we need to think about. The first is coming up with a much stronger model of metadata exposure and privacy. We have end-to-end encryption; we're able to protect the contents. But we typically protect neither the size nor the communication patterns of who's talking to whom. Without servers and intermediaries, in basically all of these decentralized systems (less so federated ones, but even there), end-user IP addresses, the identity of where users are and what they're connecting as, end up being exposed as the price of removing the intermediary. As such, there's no real guarantee about how private you are, who could learn that you're participating, or whom you're talking with. Coming up with models where we get limits on retention, where we can say that at some point after you've stopped talking, the system will not continue to hold your IP address, whom you've talked to, or your communication patterns, is important, and it's not part of any of these models.

So I'll end with two things that I go back to when thinking about this. The first is a paper on the impossibility of full decentralization in permissionless consensus. This is about blockchains, but it's essentially the impossibility result I brought up earlier: in order to disincentivize Sybils, we move toward encouraging centralization. And I think this is a framing
question: the incentive to decentralize actually lives in a larger picture. It's not the system itself, but the entities around your system, that are the incentive not to centralize: the government, the regulations, the other dynamics that are the reason you'd want to avoid centralization. And we don't have that in our model. Likewise, the tech-policy perspective says essentially that there are very powerful existing entities that will attempt to identify the points of power and regulate them, and that ends up being why we need decentralization in some sense: to prevent being co-opted by existing systems of power. That's where we find the motivation for decentralization. So I'll end there; I guess we have maybe a couple of minutes for questions.

You're muted, for me. Yes, thank you. We have a few questions from the audience. The first is: what do you think is the future of decentralized social media? Will there have to be a major event for decentralized social media to gain more traction?

It's a good question. Social media is an interesting and problematic beast in its own way; it's something we haven't really traditionally had in the same form. One of the questions, maybe stepping back, is what you actually want. The part of social media that is you and your community and your friends, I think we can already do, and we'll continue to see more localized systems, because they give us the same experience. Figuring out how you broadcast a celebrity, or these larger-scale zeitgeist ideas, is where we're much less developed. Being less developed also means there's more potential for further technological development to make that possible. So I'm optimistic that we get there, but I think it's further off.

Okay, and another question related to that, but more in the direction of user experience: can we abstract away the decentralization to ease use for the average user?

Yes. There are a couple of things there. We certainly have libraries and patterns that other developers can build better user experiences on. We're also seeing easier-to-use, end-user-compatible systems that are views into decentralized networks, be that things like Manyverse or Planetary, mobile apps for Secure Scuttlebutt that came out this year, or Radicle, a desktop app that provides Git and other decentralized network views without requiring setup or work. We're getting better user interfaces, and we're getting building blocks that are not themselves particularly visible and don't require configuration.

A more technical question that just came in: what do you think of the trade-offs made by single-hop DHTs and their huge local routing tables?

This is an interesting question, and it's where a bunch of thought is being put right now. One thing you could say is that in the same way a federated system externalizes resource heterogeneity, between the highly available nodes that might make up your DHT and the more transient end-user nodes querying it, you could have a DHT that tries to self-organize to take advantage of even more heterogeneity. You find nodes that are more powerful, and you have multiple layers, or a hierarchical DHT where some DHT nodes forward queries into a smaller, more powerful center. You can probably get rid of some of the log-n-style latency and scale further, at the expense of something that starts to look more like centralization: there end up being a smaller number of nodes in the center that, if they go down, or if they decide to manipulate what they send, end up with a lot of power in the system, which starts to be scary. So there's a trade-off there.

Next question: given the popularity of Kademlia, have probabilistic models won in decentralized architecture, or are they just better suited to decentralization?

It's a good question. I don't know if Kademlia won just because it's probabilistic; I think it has something to do with the fact that it's fairly simple to conceive of. We often say these distributed systems are really hard to reason about, and in some sense it's a case of "I've done one thing, and I'm out of ideas." One thing to take away from the set of building blocks I presented is that there were only three or four things underlying all of these systems; we really have not explored much of this design space. So I would not be surprised if, by trying something radically different, we can find other models of decentralization that are more performant and have different properties; we just haven't explored enough. So I think that's a no.

Okay, the last question for now: do you think decentralized writable storage like IPFS is threatened by toxic data sets like LibGen?

No. Okay, so there's an assumption there that LibGen is a toxic data set, but beyond that: I think there's a very reasonable story, for any of these systems, for how you sanitize them or make them compatible with external power structures. In an IPFS-like system, users opt into pinning or storing data as nodes; it's only if you've decided to pin data that you make it re-available to others. So as a node, if you get a DMCA complaint, or a complaint that something is illegal, you can blacklist it: you can say, I'm not going to pin or re-serve that data. There's a way for each individual user in the system, as they get complaints, to stop making that data re-available and limit their liability. As there are different regulatory environments, people will be able to comply with their own regulatory environments, and I think that's probably enough. The bigger question is whether that causes the software to be seen in the same light as BitTorrent, and that's a more philosophical question; it's as much about perception, and finding good uses you can counter with, as anything else.

Okay, thank you for the questions. Thank you for the interesting talk and for being here answering the audience questions. Thank you, this is great.