All right, let's get started. Today I want to talk about Blockstack. The reason we're looking at this paper is that it touches on three questions that I find quite interesting. One, at maybe the lowest level, is how to build a naming system, or really a public key infrastructure, that maps from names to public keys. This is a very important question. Nobody has ever really figured out a convincing way to build a general-purpose global public key infrastructure (PKI), so any progress in this area is interesting. Another reason I'm interested in Blockstack is that it's a non-cryptocurrency use of a blockchain, and it's an interesting question whether blockchains are useful for anything other than financial stuff. And finally, and maybe most interestingly, Blockstack is really a proposal for a very different architecture for how internet services, or really websites, ought to be constructed: very different from the way they're constructed now, with quite different properties. The idea is that the kind of approach Blockstack takes might yield websites that are better in some ways than current websites. Now, Blockstack is a real system. There's a company developing it, it's in use by some applications, and it does have some users. However, you should view it as more a work in progress than as the final answer. They've been developing it and making it better over some years now. I don't think it's really to the point where very many people would decide to abandon the way they build websites now and switch to Blockstack. But it's very important that somebody out there is exploring how things could be different and better, and Blockstack is one of a number of projects that are trying to push in a different direction for the overall architecture of websites. All right, so the pitch from Blockstack is that people ought to be building decentralized applications. So what does decentralized mean?
This is an idea that's been in the air for a couple of years now. I think maybe the best summary is that these are applications built in a way that moves ownership of data out of centrally controlled websites, like ordinary web servers, and one way or another puts control of users' information more into the users' own hands, so that it's realistic to say that users actually own their own data, instead of Facebook or Gmail or whoever essentially owning their data for them. The success and the interesting properties of Bitcoin have been a lot of what's driven the recent activity in this area. It's kind of an old idea. It dates back at least to peer-to-peer schemes like Gnutella and Napster from around 20 years ago, and farther back than that. But Bitcoin really prompted people to think hard about this and to have a bit more faith that these kinds of ideas could be realized. All right, so I want to first outline what a typical, centralized, current website looks like, talk a bit about why some people aren't really pleased with the way current websites work, and then outline how things might work under a decentralized scheme like Blockstack. So this is current websites. The deal is you've got a bunch of users sitting in front of browsers, and there's an internet that they all talk to. You have some website out there, say Gmail, and Gmail has a bunch of web servers that are owned by Gmail, or by whatever the web service is, and sitting behind the web servers is some kind of database. It's a familiar picture for us. This database holds the users' data. If you use Gmail, user one's mail is sitting in Gmail's database somewhere. This is a database server that Gmail owns and has control over. The logic for how Gmail operates essentially sits in the web servers, which are owned by Gmail and talk to the database to get at your data.
All right, so there's totally nothing surprising here, and this is the way almost every website works. There's often some JavaScript or something sitting in the users' browsers, but all the critical stuff is sitting in web servers, or some kind of servers, that the website owns. For different websites, the data in the database is going to be things like blog posts, or mail, or comments you post on other people's articles on Reddit, or maybe it's your photos, your calendar, or your medical records. There's a lot of data out there at various websites that is in some sense the user's data. It's really the user's mail, but it's Gmail that has control over it, or it's Reddit that has control over the user's comments on other people's articles. Now, this setup has been super duper successful, right? One of the reasons is that it's extremely easy to program. All the logic here is running in servers controlled by Gmail. They can talk to these databases, which are often things like SQL databases that have very flexible query interfaces. There are no restrictions on what data can be accessed. Say this is eBay that's running here. The users' bids are sitting in eBay's database server. Now, the bids are quite private, right? If I'm bidding on something, I don't want other users to see it. But there are no restrictions on what eBay's web servers can look at. They can look at all the bids in their own database. They can look at other people's bids and find the highest bid. So it's very, very convenient for web developers, and it's very successful. From that point of view, we should be skeptical that anything else could possibly be this successful or overtake it. But there are reasons why you might not think this current setup is perfect. There's a bunch of reasons that users might be dissatisfied.
One is that if I store my mail in Gmail, I really have to use Gmail's interface to get at it. Maybe they provide some other ways of getting at it, but I generally don't have a lot of freedom. Gmail sets the rules for how I get at my own email. So I might be a little bit irritated. It's my email, but I don't get to choose what interface I use. I can't just use any software. It pretty much has to be software that Gmail provides or Gmail supports. For situations like maybe Facebook, where other people might sometimes see my data, it's really the website that gets to set the rules for who gets access to my data and how, if at all, I can control that access. And websites are often a bit murky in their promises about how they enforce this stuff. Again, if it's the user's data, like my photos or my posts, it's really not that great that I don't have much control over what the website can do with it. Another thing that people complain about with the current setup is that the website can snoop on my stuff. Gmail wants to look through my mail. They might have good reasons for it. Maybe they're training their spam filters, so that's okay. But maybe they're looking at my email to decide which advertisements to show me, or to tell their advertising customers what people are interested in these days. Worse, there's a chance that some of the employees who work at the website are corrupt and may be snooping on people's data for personal reasons. So maybe the company that runs the website is perfectly above board, but it can't necessarily claim that all of its employees are always following all the rules. Anyway, some people have reservations about the way the current system works. At a more technical design level, one way to view what's going on here is that the main interface is between the entire website and the browsers.
So it's HTML that's flowing back and forth here. Typically the web servers and the database are integrated on the website's side of the interface, and all the browser really gets to see is often this HTML, a kind of final packaged form of the data. HTML is a very user-interface-oriented representation, and it has nothing to say about the data itself or how the data is controlled. The much more interesting interface, and where this whole discussion is going, is the one between the web servers and the database, because it's much closer to the data. But in the standard setup there's no real boundary there. How this works is the internal business of the website. All right, so that's the existing plan. Now for the kind of plan that Blockstack is proposing. There are a number of ideas for how decentralized apps might work, and this is one of them. I'm not going to call it Blockstack yet, because it's a very simplified version, but we'll just say it's a decentralized architecture. Here the game is that we still have a bunch of users, and users run iPads or browsers or something, but in this new decentralized scheme all of the app code is going to run in the client machines, in the users' machines. So this is much more like downloading an app from the app store on an iPad, or buying old-style PC software, like buying a copy of Microsoft Word that just runs on your laptop. You just buy some software and run it on your laptop. So we're no longer running the application code in web servers.
Well, if all you want to do is use your own data on your own laptop, then we're done. But what's really interesting about internet-based applications is that you can store your data in the cloud. That means that if you have multiple devices, which most people do, you can get at your data from any of them, maybe from your iPhone as well as your laptop. And if you store data in the cloud, you can share data with other people and build multi-user applications like eBay, or Reddit, or shared calendars, or who knows what. So the other half of the decentralized vision is that there's going to be some sort of cloud storage system out there, by which I mean some sort of service that you can buy, maybe from Amazon AWS or who knows where, which will just store data for you. You can stick data in it. It's your data. They store it for you and back it up, hopefully with some sort of access control so other people can't get at it, and then you can retrieve it later from any of your devices. So now, for single-user applications, where I just need to edit some documents but I want to keep them in the cloud, maybe user one is buying storage from this storage server, maybe it's Amazon, and user two is buying storage from maybe Google's cloud storage system. For my own data, my application code talks across the internet to the storage service that I buy storage from, and that I presumably pay for myself, and user two talks similarly to their storage service. But if we run applications that are built to allow users to share data, then if I know how to talk to user two's storage service, I can run an application that reads the data that they allow me to read as well.
So if you wanted to build some sort of Facebook-like thing on this, the application would know who my friends are and reach out to my friends' storage, looking for updates or photos or who knows what that my friends have stored in their own storage. That means that instead of interacting with Facebook's website, in this new model I would download an app from Facebook and run it, and that app would know how to find my friends and look at the data that they're storing. And if my friend uploads a photo to their storage, it's really still their storage. They're paying for it. It's their photo. They can use it with Facebook, or they could use it with other applications too, because the applications are now quite separate from the data instead of being combined with it as in the existing architecture. All right, so at the technical level, the storage interface is now the main interface. We have some sort of put/get or read/write or who knows what interface as the main interface. It's no longer HTML. The primary interface we're worried about is the storage-style interface, which is a much nicer interface to write applications to than HTML is. Furthermore, as I mentioned, in this architecture where users really own and pay for and organize their own storage, there's a much clearer notion of data being actually owned by the user and controlled by the user, much like you own the data on your laptop or in your Athena account. This is user one's account here in the storage service. Of course, now we're very interested in the design of the storage system, because instead of being hidden away inside websites, this is now the primary interface in the system, so we care a lot about how it's designed. First of all, it's quite critical that it be an internet service out in the cloud, so we can get at our data from any of our devices.
It really needs to be general purpose, since all the application code is here, on the clients. In this architecture we don't really get to have application-specific code at all on the server side, because the servers don't have anything directly to do with applications. So we need a general-purpose interface that's powerful enough to let us do whatever we need, which is a little bit difficult to design. The storage has to be paid for, and now the most obvious person to pay for it is the user themselves. Maybe they're willing to do that, maybe they're not. We'd really like to have sharing, but we also want to have private data, and maybe we only want to share our data with certain other people. So the storage interface and the storage system one way or another need reasonably powerful sharing, permission, and access control systems. A more subtle issue is that I may run multiple apps, some of which I don't trust. If I just download some multi-user game from the internet, maybe I don't want it to be able to look at my email while I'm playing that game. That means that as well as having a notion of this user with this user's permissions, we may want to have subsidiary permissions, where we can talk about not just this user as a whole, but this user when running application two, who has certain permissions, maybe just for game files, and this user when running application one, who is allowed to get at the user's email as well. All right, and interestingly enough, this storage interface is not as much of a stretch as it might have seemed, say, 10 or 20 years ago, because there are a number of storage services out there that are not unlike this. Amazon S3 is very widely used, and while it's missing some of the things we would need here, it's definitely a public storage system. You can buy storage, and you can let other people use your storage.
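To make the "this user when running this application" idea concrete, here is a minimal sketch of a generic put/get store whose permission checks take both a user and an app. Everything in it (the `StorageService` and `Bucket` names, key prefixes as the unit of permission, owner-only writes) is an assumption made for illustration, not something taken from the paper or from any real storage service.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One user's storage plus the permissions that user has set up."""
    owner: str
    data: dict = field(default_factory=dict)
    # app name -> key prefixes that app may read and write on the owner's behalf
    app_scopes: dict = field(default_factory=dict)
    # (other user, app name) -> key prefixes that principal may read
    read_grants: dict = field(default_factory=dict)


class StorageService:
    """Generic store: put/get only, no application-specific logic."""

    def __init__(self):
        self.buckets = {}

    def create_bucket(self, owner):
        self.buckets[owner] = Bucket(owner)
        return self.buckets[owner]

    def _allowed(self, bucket, user, app, key, write):
        if user == bucket.owner:
            prefixes = bucket.app_scopes.get(app, [])
        elif write:
            return False  # in this sketch only the owner may write
        else:
            prefixes = bucket.read_grants.get((user, app), [])
        return any(key.startswith(prefix) for prefix in prefixes)

    def put(self, bucket_owner, user, app, key, value):
        bucket = self.buckets[bucket_owner]
        if not self._allowed(bucket, user, app, key, write=True):
            raise PermissionError(f"{user}/{app} may not write {key}")
        bucket.data[key] = value

    def get(self, bucket_owner, user, app, key):
        bucket = self.buckets[bucket_owner]
        if not self._allowed(bucket, user, app, key, write=False):
            raise PermissionError(f"{user}/{app} may not read {key}")
        return bucket.data[key]


# Usage: user1 lets the mail app see mail and the game see only game files.
service = StorageService()
bucket = service.create_bucket("user1")
bucket.app_scopes["mail"] = ["mail/"]
bucket.app_scopes["game"] = ["game/"]
service.put("user1", "user1", "mail", "mail/inbox/1", b"hello")
try:
    service.get("user1", "user1", "game", "mail/inbox/1")
except PermissionError as err:
    print("blocked:", err)
```

The design point is that the server only understands users, apps, keys, and prefixes. Anything application-specific has to live in the client code.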
S3 doesn't have all the access control we'd like, but it's not too far from what's needed here. And indeed today's paper observes that they can layer their storage system on top of any of a number of existing storage systems. Dropbox is another candidate for something like this. So this is not as pie in the sky as it might seem. Okay, so what would the point of this kind of architecture be? Why would anybody care? The people who might care are the users. This might give users more control of their data. It may make it easier for users to switch applications. If I've uploaded a bunch of photos and I'm using one photo organization app or photo editing app, then since my photos are totally separate from the app, I could switch photo apps and still use the same old set of photos that I already have stored. It may be easier in this architecture to have applications that look at multiple kinds of data. Maybe it'd be nice to have my email system be able to look at my calendar, and the other way around. Maybe it'd be nice to be able to write backup software that could periodically back up all of my data, no matter what it was. Maybe I'd like to have a general-purpose file browser that would allow me to look at all of my data. None of this is possible or convenient in the current architecture, but it all seems within reach now that we've concentrated all the user's data into storage that they own. And finally, there may well be advantages in terms of privacy and snooping, instead of entrusting my data to a web service that's doing who knows what with it. If we play our cards right, we can use encryption. These applications can encrypt the data before it leaves my client machine, so that the only thing that's ever stored here is encrypted data. And when I read it back, I read back encrypted data and then decrypt it locally on my own machine.
So again, the storage service never sees private data in the clear. Anyway, those are all the tantalizing possibilities for why users might like this architecture. All right, so if you dig down to the nitty gritty of what these applications actually have to do, you would need to work out a whole lot of details. If my application is going to be looking at your data, there need to be conventions for how data is stored here. For example, if I'm going to look at the recent posts you've made in our social networking application, you have to have stored them in your storage under a key or a name that my application knows to look for. And you have to use a format that we all understand. So if we want sharing, there are standardization obstacles to overcome that don't really exist for big websites, because they can just store their data however they like. Okay, so there's a question: does this adversely affect application performance? Absolutely. This is likely to be pretty bad for performance, because the existing scheme can be implemented with very high performance. The web server may be making hundreds of requests to the database. When you look at an Amazon web page, for example, boy, there are hundreds or thousands of pieces of information that had to be pulled out of Amazon's databases. They're all in the same machine room, and those fetches from the database take dozens of microseconds. But if one of these applications needs to reach across the internet, maybe hundreds of miles away, to some storage service, now everything's going to take 10 or 100 times as long to fetch individual pieces of data. So that's certainly an issue. And it's the kind of issue that clever designers can find ways to deal with.
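To get a feel for those numbers, here is a back-of-the-envelope calculation. The specific values (200 fetches per page, 50 microseconds per local fetch) are assumptions picked to match "hundreds of requests" and "dozens of microseconds", and the fetches are assumed to happen one after another.

```python
def page_load_us(num_fetches: int, per_fetch_us: int) -> int:
    """Total time in microseconds if the fetches are issued sequentially."""
    return num_fetches * per_fetch_us


NUM_FETCHES = 200     # assumed: "hundreds of requests" per page
LOCAL_FETCH_US = 50   # assumed: "dozens of microseconds" per fetch

local_us = page_load_us(NUM_FETCHES, LOCAL_FETCH_US)
for slowdown in (10, 100):
    remote_us = page_load_us(NUM_FETCHES, LOCAL_FETCH_US * slowdown)
    print(f"{slowdown}x slower fetches: {remote_us // 1000} ms "
          f"(vs {local_us // 1000} ms in one machine room)")
```

Under these assumptions a page goes from about 10 ms of data fetching to 100 ms or a full second, which is why an application written this way would want to issue far fewer fetches, or overlap them.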
So performance would certainly be a problem. But my guess is that, on the total list of reasons why this architecture might not work, there are a number of other equally unhappy puzzles. It would absolutely change how people write applications, because instead of writing applications that use lots and lots of pieces of data, you would have to be much more parsimonious. I think people could work around it. All right. Okay, so any questions about this overall arrangement, which is the sort of arrangement that Blockstack is shooting for? We should try to guess, even at this level, what kinds of things might go wrong. One is that this interface is likely to be less flexible than database interfaces. And this actually goes back to performance a little bit. This is subject to design, but we're unlikely to be able to support super flexible things like SQL queries. And it's certainly unlikely that we're going to be doing SQL queries across other people's data as well as ours for shared data. So one potential problem is that this interface may not be very expressive, and that's going to be painful for programmers. Another question is, could this give users an amount of traffic they might not be able to handle? Yeah, that's also a potential problem. Much of what SQL is doing when you talk to a real SQL database is causing the database server to look through a lot of data and find just the one answer you're looking for, maybe the sum of all votes or something. It sends just that one little piece of final data back. Whereas if you don't have a powerful query language, you may end up having to fetch a lot of stuff and do the filtering or aggregating yourself. And that just might be a lot of data to be sending across people's links. Yeah, so things might be slower.
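The difference between "the database finds the one answer" and "fetch everything and aggregate yourself" can be sketched as follows. This uses Python's built-in `sqlite3` module as a stand-in for a website's SQL database and a plain dictionary as a stand-in for a get-only storage service; the `posts` table and the key naming scheme are made up for the example.

```python
import sqlite3


def build_db(rows):
    """Stand-in for a website's SQL database, held in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, votes INTEGER)")
    db.executemany("INSERT INTO posts (id, votes) VALUES (?, ?)", rows)
    return db


def total_votes_sql(db):
    """The server scans the table; only one row crosses the wire."""
    result = db.execute("SELECT SUM(votes) FROM posts").fetchall()
    return result[0][0], len(result)


def total_votes_key_value(store):
    """With only get(), the client fetches every record and sums locally."""
    fetched = [store[key] for key in store]  # one get per key
    return sum(fetched), len(fetched)


rows = [(i, i % 7) for i in range(1000)]
db = build_db(rows)
store = {f"posts/{post_id}": votes for post_id, votes in rows}

sql_total, sql_rows = total_votes_sql(db)
kv_total, kv_rows = total_votes_key_value(store)
print(f"same answer: {sql_total == kv_total}")
print(f"records transferred: SQL {sql_rows}, key-value {kv_rows}")
```

Both approaches compute the same sum, but the key-value version moves every record to the client first, which is the extra traffic the question is worried about.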
Things would be slower, and it's a question whether they'd be too slow. Maybe in a future in which everybody has broadband internet and we have 5G cell phones, none of this performance stuff will matter, or maybe it'll be important. I don't know. Another problem with this setup is that there are some websites, like eBay, where it's really not the case that every piece of data is definitely owned by one user. Actually, I have two points here. One is that some data is not owned by any user. Think about the front page of Reddit, right? There are some clever algorithms that Reddit runs to pick the order of items on the front page, having to do with the votes and who knows what. Where do those algorithms run, where do they get the data, and where do they store their conclusions about the front page? That's something that doesn't really fit in. Maybe it could be fit in here, but that would be a little bit hard. Another kind of website that seems like it could be hard here is eBay, where you want to bid against other people. eBay tells you whether you have the current highest bid, which requires eBay to look at other people's bids. And when you finally win, the amount you pay has to do with the second-highest bid. But those bids are private, right? You don't want other people to see your bids, because then they could just bid one cent higher than you and win at a low cost. So maybe user two has stored a bid here. If I'm bidding against user two and we need this application to tell me whether I'm the winning bidder, then in order to answer that question this application probably needs to know user two's bid, which means that user two's bid has to be accessible to me. But if my application code knows it, well, it's running on my computer and I can change it, right?
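To see why the auction logic needs everybody's bids in the clear, here is a small sketch of settling a second-price style auction. The function name, the 50-cent increment, and the exact pricing rule are assumptions for illustration; the only thing taken from the discussion above is that the price depends on the second-highest bid.

```python
def settle_auction(bids: dict, increment_cents: int = 50):
    """Return (winner, price_cents) given every bidder's maximum bid.

    The price is the second-highest bid plus a small increment, capped at
    the winner's own bid. Computing it requires reading ALL of the bids.
    """
    if len(bids) < 2:
        raise ValueError("this sketch needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    (winner, top_bid), (_, second_bid) = ranked[0], ranked[1]
    return winner, min(top_bid, second_bid + increment_cents)


winner, price = settle_auction({"user1": 1500, "user2": 1200, "user3": 900})
print(winner, price)
```

Whoever runs `settle_auction` sees every bid. On eBay's own servers that is fine, but if this function runs inside an app on a bidder's machine, that bidder can read the other bids too.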
The usual rule is that I can change any code I run on my own computer, and if I change my application code to actually reveal your bid, then that's totally cheating from the point of view of what eBay is trying to do. Nobody would trust an auction system that allowed that. So there are probably tricks that could be used, but if we just use this architecture in a straightforward way, websites like eBay that need to look at other people's secret data without revealing it are quite a puzzle. And, as I already mentioned, websites that have to keep their own data, like indexes or vote counts, are also a puzzle, because there's no notion here of the website itself. There's just application code and generic user-owned storage. You would probably have to augment this with some trusted servers to run the privacy-critical part of eBay or whatever, but that doesn't really fit into the model that well. Another thing that's going to turn out to be bad news here is that if I have data that I want to share with some people and not others, like I want to share data with just 6.824 students but not outsiders, how is that actually enforced? We'd really like to use end-to-end encryption so we don't have to trust the storage server, because after all that was a big motivation for moving away from the current website architectures. We don't want to have to trust these cloud services. So I could encrypt the data so that 6.824 students could read it, but it's actually quite difficult to do that in any straightforward way. I could encrypt the data a hundred times, once with each of the 6.824 students' keys, or maybe I could encrypt the data once with some unique key and then encrypt that key with each of the 6.824 students' keys or something.
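The second option, encrypting the data once and then encrypting that one key for each reader, can be sketched as below. This is only a structural illustration under loud assumptions: `toy_xor_cipher` is a toy and is NOT secure, and each student is given a symmetric key as a stand-in for the public-key encryption a real system would use. All function names are hypothetical.

```python
import hashlib
import secrets


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """TOY cipher, NOT secure: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key gives back the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def share(plaintext: bytes, reader_keys: dict) -> dict:
    """Encrypt once under a fresh data key, then wrap that key per reader."""
    data_key = secrets.token_bytes(32)
    return {
        "ciphertext": toy_xor_cipher(data_key, plaintext),
        "wrapped_keys": {name: toy_xor_cipher(key, data_key)
                         for name, key in reader_keys.items()},
    }


def read(blob: dict, name: str, reader_key: bytes) -> bytes:
    """Unwrap the data key with the reader's key, then decrypt the data."""
    wrapped = blob["wrapped_keys"][name]  # KeyError if name is not a reader
    data_key = toy_xor_cipher(reader_key, wrapped)
    return toy_xor_cipher(data_key, blob["ciphertext"])


students = {name: secrets.token_bytes(32) for name in ("alice", "bob", "carol")}
blob = share(b"lecture notes", students)
print(read(blob, "bob", students["bob"]))

# Removing a reader means picking a new data key and re-encrypting for
# everyone who remains; the old blob stays readable to anyone who kept it.
remaining = {name: key for name, key in students.items() if name != "bob"}
new_blob = share(b"lecture notes", remaining)
```

Even in this toy form, removing a reader is the awkward part: it forces a new data key and a full re-encryption, and it does nothing about copies the removed reader already fetched.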
With either scheme you then run into questions if somebody drops the course and you don't want them to be able to see the data. How do you make sure that now they can't see the data? So you can use encryption for privacy, but once you get into complex multi-user applications with groups of users, for example, cryptography can be quite difficult to use to solve your privacy problems. Okay, so these are ways in which the system may be awkward to program, and that awkwardness may leak through and limit the set of application features you can have, which is not going to make users very happy either. All right, so that's the high-level view of what Blockstack is working towards. Now let's focus a little more on Blockstack specifically. Blockstack actually originated as a project on secure naming, and you can still see that the paper we read today has a lot of preoccupation with naming, although if you look at their current website and the current stuff they write, it's much more about this decentralized architecture and applications and much less about naming. But naming is still very important for them. So the question is, why are they interested in names, and what do they need from a naming system? The kind of names they're talking about in the paper, and in Blockstack in general, are user names. These are really human users. So we're talking about names like maybe Robert Morris. That's the kind of name they're talking about, because in their decentralized architecture the players in the game are the users. The users own the data, and the users need to control who can see their data, so they need to be able to name other users. Here are the specific things they need to solve with naming. If I want to look at your data, Blockstack needs to find where your data is.
You're storing your data on some storage server somewhere. I need to know whether you're using Amazon AWS or maybe Microsoft Azure, and if so, which server at Microsoft is storing your data. So Blockstack needs a way to map names to the location where you store your data. That's one big thing that they're doing with names. But if I'm going to read your data, I also need to be able to do things like check that it's really your data. The whole point of this is to not have to trust the storage services. So in order for me to be able to check that it's your data, we need a way to map the name to the public key, and we're going to assume that when you store data you sign it with your private key first. So we need to map names both to where to find the person's data and to the public key that we can use to check, when we retrieve data, that it's really data that you wrote and not some misleading thing cooked up by the storage service or someone else. Now, this name-to-public-key mapping is used in other ways too. If I want to encrypt data so that only you can read it, probably the way I'm going to do that is to encrypt the data, or some other key, using your public key, so that only your private key can decrypt it. So if I want to implement cryptographic ACLs, or really almost any permission or access control scheme, I need to be able to name the people who can use the data. Access control lists are usually driven one way or another by names, and I need to be able to name the people who can read my data. In particular, this part that maps names of people to public keys is often called a public key infrastructure, or PKI. So what Blockstack is proposing, among other things, is a general-purpose, public, global PKI to map user names to users' public keys.
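Here is a minimal sketch of how the two mappings fit together: a name resolves to a storage location and a public key, and the reader uses the public key to check a signature on whatever the untrusted storage returns. The RSA here is a deliberately TOY textbook version built from two known Mersenne primes so that the example needs only the standard library; a real system would use a vetted cryptography library. The record layout, the URL, and the function names are all assumptions for illustration, not Blockstack's actual formats.

```python
import hashlib
from dataclasses import dataclass

# TOY RSA key pair (illustration only: no padding, weak primes).
_P, _Q = 2**61 - 1, 2**89 - 1
PUBLIC_KEY = (_P * _Q, 65537)                                  # (modulus, e)
PRIVATE_KEY = (_P * _Q, pow(65537, -1, (_P - 1) * (_Q - 1)))   # (modulus, d)


def _digest(data: bytes, modulus: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def sign(data: bytes, private_key) -> int:
    """Sign with the PRIVATE key: only the owner can produce this value."""
    n, d = private_key
    return pow(_digest(data, n), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    """Check with the PUBLIC key: anyone who resolved the name can do this."""
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)


@dataclass(frozen=True)
class NameRecord:
    storage_url: str   # where this user's data lives
    public_key: tuple  # used to check that fetched data is really theirs


# Hypothetical name database and a stand-in for an untrusted storage service.
names = {"rtm": NameRecord("https://storage.example/rtm", PUBLIC_KEY)}
untrusted_storage = {}


def publish(url: str, key: str, data: bytes, private_key) -> None:
    untrusted_storage[(url, key)] = (data, sign(data, private_key))


def read_verified(name: str, key: str) -> bytes:
    record = names[name]
    data, signature = untrusted_storage[(record.storage_url, key)]
    if not verify(data, signature, record.public_key):
        raise ValueError("signature check failed: not written by " + name)
    return data


publish("https://storage.example/rtm", "posts/latest", b"hello", PRIVATE_KEY)
print(read_verified("rtm", "posts/latest"))
```

If the storage service swaps in different bytes, `read_verified` raises instead of returning them, which is the sense in which the reader doesn't have to trust the storage service.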
A PKI like this is actually quite important, because people have known for a long time, decades, that in order to make big advances in internet security, almost certainly the only way to do that is to have some sort of public key scheme, so that people can sign data that they produce, like email, check signatures on email or data that they receive from other people, and also encrypt to ensure privacy, so that only the intended reader can read the data. Almost any internet-wide scheme, or large scheme, intended to get cryptographic privacy or cryptographic authentication ends up having to involve some sort of public key infrastructure, so that given the identity of the person I want to talk to, I can find their public key. And yet there kind of isn't a successful public key infrastructure out there. Nobody's really figured out how to build one of these that's actually useful. As a result, people have tended not to build or deploy systems with cryptographic privacy and authenticity, because there's no PKI, and maybe because of that people haven't worked on PKIs, because it's not clear who would use them. At any rate, one of the reasons why Blockstack is interesting is that they're trying hard to build a global-scale public key infrastructure. Now, if you remember, the paper talks about the Zooko's triangle thing. The style of names that the paper's talking about has three interesting properties. One is that they're unique, and what that really means is that the names have global meaning: the name Robert, for example, has the same meaning to everyone in the world. It maps to the same data location and the same public key for everyone in the world. Of course that's a little ridiculous for Robert. Presumably my ID under Blockstack would be much longer than that.
Maybe Robert Morris, but there are a lot of Robert Morrises, so maybe something like the 67th Robert Morris to register with Blockstack would be closer to what my name would be under Blockstack. Anyway, the point is that everybody in the world, when they see this name and run it through the PKI, gets the same information about it. So "global" might be a better word for this property. The second property the paper talks about for names is that they're human readable, just like Robert Morris. Somebody can look at a name and make a guess at what it means, and some people may be able to remember the names because they have human meaningfulness. And the final thing they're interested in is that the naming system and the allocation of names be decentralized. The paper claims, and this is an old claim, that it's difficult to get all three. Apparently not impossible, since the paper does it. The intuitive reason why it's hard to get all three is this. Suppose you have a system that's decentralized, so there's no one entity in charge of allocating names. Well, if you do that, then it's very hard to ensure uniqueness. If you don't have some single central trusted entity handing out the names, how do you know you don't end up handing out the same name to multiple people? You can actually have decentralized and unique names, but the most obvious ways to do that sacrifice the human-readable part. Suppose you decide your names are going to be public keys, thousand-bit public keys in a public/private key cryptography system. Anybody can make up a new public/private key pair. They're typically made using random number generators. Since anyone can make one up, and they're generated randomly, they're going to be unique, but they're not human readable. So many of the obvious ways of trying to get all three of these at the same time don't work so well. Here is how Blockstack solves this, at a very high level.
They're going to produce a decentralized system with no central entity handing out names, where the names are human readable and everyone sees the same set of mappings. The way they do this, at a high level, is that they rely on Bitcoin's ability to produce a single ordered log of transactions. That's one way of viewing Bitcoin: everybody agrees on what the sequence of Bitcoin blocks is. Maybe you get temporary forks, but Bitcoin rapidly resolves any forks and causes everyone to agree on the sequence of blocks. Okay, so once we have Bitcoin causing agreement on a sequence of transactions, anyone can stick transactions into the Bitcoin log that, as well as being valid Bitcoin transactions, also have name reservation records hidden away in them. So this is naming on the Bitcoin blockchain. Bitcoin was already getting us a unique and globally agreed sequence of transaction blocks, and anybody can submit a transaction, so in that sense it's totally decentralized. The way Blockstack uses this for naming is that if I want to register a name, I can pick any name I like, say RTM, as long as it's not already in use. I submit to Bitcoin a transaction that happens to be a valid Bitcoin transaction but is also meaningful to Blockstack. It says: please allocate the name RTM and map it to my public key and to information about where I store my data. Anyone can submit these, and all the Blockstack servers watch the Bitcoin blockchain as it evolves. Every time they see one of these records that's a Blockstack transaction as well as a Bitcoin transaction, the Blockstack servers think about adding this mapping to their name database, but they have a set of rules for rejecting bad Blockstack transactions in the Bitcoin blockchain.
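As a rough sketch of what each name server does with the chain, here is the replay in Python. The record format is made up for illustration: the `Registration` fields and the `NameDB` class are hypothetical names, not Blockstack's actual data structures, and real records are embedded in Bitcoin transactions rather than handed over as Python objects. The one rule shown is the one just mentioned, that a name already in use can't be registered again.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Registration:
    """Hypothetical name-registration record carried inside a Bitcoin transaction."""
    name: str
    public_key: str
    zone_hash: str


@dataclass
class NameDB:
    """Name database built by reading the chain's blocks in order."""
    names: dict = field(default_factory=dict)
    height: int = 0  # number of blocks processed so far

    def apply_block(self, registrations):
        """Apply one block's registrations, in the order the chain lists them."""
        for reg in registrations:
            if reg.name in self.names:
                continue  # rule: an allocated name cannot be allocated a second time
            self.names[reg.name] = reg
        self.height += 1


def replay(chain):
    """Build the database from scratch by processing every block in order."""
    db = NameDB()
    for block in chain:
        db.apply_block(block)
    return db


# Two blocks: the second one contains a late attempt to re-register "rtm".
chain = [
    [Registration("rtm", "key-of-rtm", "hash-1")],
    [Registration("rtm", "key-of-attacker", "hash-2"),
     Registration("alice", "key-of-alice", "hash-3")],
]
db = replay(chain)
print(db.names["rtm"].public_key)
```

Because every server processes the same blocks in the same order, every server that runs this ends up with the same owner for each name. A server that has already caught up only needs to call `apply_block` on each new block.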
So for example, suppose some bad person, after I've allocated RTM, tries to allocate RTM themselves. They can submit any transaction they like, so they can perfectly well submit a transaction trying to steal the name RTM from me and map it to some other public key that they know the private key for. Well, all the Blockstack servers are watching the Bitcoin chain; there's only one Bitcoin chain and it has only one set of contents. So the Blockstack servers, as they look at successive transactions in the Bitcoin chain, are going to see my allocation first, and then they're going to see this other person's allocation for the same name. The rule is that if a name's already allocated, it can't be allocated a second time, so the Blockstack servers will ignore this attempted registration. What's being implemented here is a first-come, first-served scheme for allocating names: the first person to get an allocation record into the blockchain wins that name. Okay, so as far as the three Zooko's triangle properties go: it's decentralized, because we believe Bitcoin is decentralized and there's no other entity deciding who gets what name; it's really just this first-come, first-served scheme. The names can be any strings whatsoever, so it's perfectly reasonable to put human-readable names in here. And everybody's looking at the same blockchain, so for any name they all agree on what the first registration of that name is. So it's unique, or globally meaningful, as well. So Blockstack has managed to actually get all three of these Zooko properties in their naming system. There's a question: does this mean that the Blockstack servers have to scan the entire chain from back to front for adding new names?
Yeah, so in principle, sure: the state of the name database is really the result of interpreting the whole blockchain. But of course the Blockstack servers cache the latest state they've seen in a database. So each Blockstack server has read up to some point in the Bitcoin blockchain and has a database with the current mapping for every name seen in all the blocks before that point, and when it sees a new block from Bitcoin it just looks at the transactions and updates its database incrementally to reflect them. Getting a new Blockstack server up to speed actually does take quite a long time; I think some paper I read said it might take up to a couple of days. But once your Blockstack server is up to speed, it's all incremental additions after that. There's maybe a larger point, which is that Blockstack is indeed piggybacking on Bitcoin, and you could easily argue that Bitcoin is ultimately not very scalable, or uses up too much electric power, or is too slow and takes a long time to reflect new transactions. These are all somewhat undesirable properties that Blockstack inherits from Bitcoin. Nevertheless, I'm not aware of another way of getting all three of those Zooko's properties in a naming system. So if you value them, you don't have a lot of options other than this approach. Okay, so we may ask ourselves whether this naming scheme, this way of mapping names to public keys and to places to find the data, really has properties that we actually like. So let's go back over those three properties. One, the names are unique: everybody in this system agrees on what RTM means. It's really that the names have global meaning. So the question is whether we care about this, whether this is a good property.
One thing on the plus side is that having globally relevant names means we can talk about names with each other. I can email you a name and that name will have the same meaning for you as it does for me, because we're both going to look it up in the blockchain and get the same result. So that's nice. It also means that I can look at names that are recorded somewhere, like in an access control list, and know what they're going to mean. But something that's maybe not so great is that you have to choose your names from a single global pool, because that's what we're doing here, right? Since there's just one naming system, there's just one set of names. That means it'll actually be hard to look at a name and decide if it's the name you want. My name would probably be, as I mentioned before, something like RTM95587, depending on how many RTMs there are. It's very hard to look at that and decide whether it's the RTM you really meant. So that really undermines the human-readable property. The bigger the system is, the less valuable human-readable names are. If it's just people at MIT, maybe there's only one Robert Morris at MIT, although actually there's more than one. But across the world, the justification for caring about whether names are human readable is very slim. Human-readable names can also be deceptive. If you see a name in Blockstack that looks like rtm@mit.edu, it's tempting to imagine that it's connected to that email address, right? Because it's human readable; it looks like it has meaning. And that's the whole point of human-readable names: they suggest meaning to people. At least for Blockstack, that's deeply misleading.
In Blockstack the names really don't mean anything. It's simply first come, first served. All we can tell by seeing the name rtm@mit.edu in Blockstack is that it refers to the first person who registered it. That's all we know initially. It might be me, it might be somebody else. There's no reason to believe it's me, or that it's associated with MIT, or anything else. All we know is that this name is owned by whoever registered it first. Now, if I establish, say, a secure email conversation with whoever owns this name, using the key that Blockstack maps it to, and I spend some time talking to them, maybe I can eventually convince myself that they're the person I think they are. But the name alone looks meaningful and probably is not in fact very meaningful. So that's a real defect: human-readable names can be deceptive. Related to that, if I know who I want to talk to, the Blockstack naming scheme doesn't really help me find that person's name. Maybe you want to send email to Robert Morris. Gosh, this is deeply unhelpful, and the only things in the Blockstack naming system are names like this. So it's not necessarily solving the problem people have, which is: I know in my head who I want to talk to, but I don't know their public key and I don't know their Blockstack name either. How do I find their Blockstack name? So that's a defect in this system. You really have to already know the name if you want to use Blockstack's naming scheme, but how do you find those names? There are some other options you could consider for naming in a larger decentralized system. One is that we could just abandon human-readable names, not try to get all three of those Zooko properties, and just use public keys directly.
That would mean that if I want to interact with you, I need to find your public key somehow. Maybe you just send it to me, maybe you tell me something over the phone that I can use to get your public key, maybe you send me a secure message or write it on a slip of paper. So we could just use public keys directly, and then we wouldn't have to solve all these naming problems, although of course public keys are awkward. Maybe I can store the public keys I know about in my personal contact list, and that'll help. It'll be like telephone numbers: telephone numbers don't mean anything, but once I know your phone number I can stick it in my contact list. Another possible approach would be to abandon the decentralized part and try to cook up some central entity that would actually reliably verify identity. Maybe it's the Social Security system that hands out Social Security numbers, or whoever hands out driver's licenses, and we'd piggyback on their work to establish a centralized notion of unique, verified names. That's actually remarkably difficult also, but it's another avenue to think about. Anyway, Blockstack took this particular approach to names. All right, let me outline the big picture of the pieces in Blockstack, which is a refinement of the decentralized application diagram I showed before. At the bottom they have the Bitcoin system, chugging along with Bitcoin blocks that carry along, unknown to Bitcoin, these Blockstack transactions. There's a bunch of Blockstack naming system servers. It's not really clear whether they intend ordinary people to run them or whether they'd be run as a service. In a way it makes the most sense for ordinary people to run them on their own laptops, because you have to trust them, but that may not be so great. Anyway, these Blockstack naming servers read the blockchain and accumulate a database.
At least in the first instance, what's in the Bitcoin blockchain is public keys and the cryptographic hash of information describing where each user stores their data. You could imagine storing that information directly, something like "RTM stores his data in Amazon AWS," but that's too big to conveniently store in Bitcoin. So there's this intermediate layer called Atlas, whose only job really is to map the hashes stashed in Bitcoin into the zone information, one zone record per user. That means that if I have an RTM registration in Blockstack, it holds the hash of my zone record, and the zone record just has the name or the internet address of where I store my data. It might be AWS plus whatever identifier I use to uniquely identify the stuff that I store in AWS. This is really a reference to where my storage sits, where all my key-value pairs are stored. Now, I think the paper's vision is that you'd be able to have your zone record point to any cloud storage out there. In fact, the cloud storage system has to obey the Blockstack interface, so you can't just use any existing cloud storage system. In practice, at the moment at least, these all point to Blockstack's own Gaia servers. Blockstack runs them, and they're just storage servers that know about different Blockstack users and store their key-value pairs for them. That means that if I'm running a Blockstack application that wants to read your data, I need to supply your name somehow. I've got to find your name; maybe you tell me your name over the phone. I type your name into the application I'm using. Maybe it's a to-do list manager and it needs to go out and find your to-do list items to show me. My app is going to contact a Blockstack naming system server and ask it to translate your name. That server has been watching the blockchain and keeps a mapping, and it knows how to use the hash to find your zone record.
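The name-to-zone-record step can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: SHA-256 as the hash, a JSON zone record with a `storage` field, the `gaia.example` URL, and plain dictionaries standing in for the name server's database and for Atlas. The point is only that the hash from the chain lets the client check whatever zone record it is handed.

```python
import hashlib
import json


def zone_hash(zone_record: bytes) -> str:
    """Hash of a zone record (SHA-256 is an assumption, not taken from the paper)."""
    return hashlib.sha256(zone_record).hexdigest()


# A hypothetical zone record saying where rtm's data lives.
zone_record = json.dumps({"storage": "https://gaia.example/rtm/"}).encode()

# What a name server learned from the chain: name -> public key and zone-record hash.
name_db = {"rtm": {"public_key": "key-of-rtm", "zone_hash": zone_hash(zone_record)}}

# What the Atlas-like layer holds: hash -> full zone record.
atlas = {zone_hash(zone_record): zone_record}


def resolve(name, name_db, atlas):
    """Translate a name into (public key, storage location), checking the hash."""
    entry = name_db[name]
    record = atlas[entry["zone_hash"]]
    if zone_hash(record) != entry["zone_hash"]:
        raise ValueError("zone record does not match the hash recorded on the chain")
    return entry["public_key"], json.loads(record)["storage"]


print(resolve("rtm", name_db, atlas))
```

Since only the small hash has to sit in Bitcoin, the zone record itself can be stored and served by machines that aren't trusted; a record that has been tampered with fails the hash check.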
Your zone record points to some data owned by you in Gaia, and then my app fetches this data. It needs to verify the data, so all Blockstack apps expect data in Gaia to be signed by its owner. To check your signature on the data I need your public key, and your public key is embedded in your record in Bitcoin, so my app can use it to check the signature and make sure that this is data that you actually produced and not something that an untrustworthy Gaia server cooked up. Okay, so that's a basic outline of how this works. It turns out that embedding their information in the Bitcoin blockchain is not as straightforward as I described. They need to make special efforts to detect forks, for example, because the Blockstack name servers don't get told directly that there's been a fork; they have to detect it, and they need to detect it because the fork might be part of an attack. Also, Bitcoin isn't filtering out bad records for them, so they have to enforce their own rules on the records they get out, like ignoring duplicate registrations. It also turns out they charge fees for registering a name. That means the Bitcoin transaction that registers a name has to pay some bitcoin to what's called a burn address in order to have the right to register that name. The Blockstack name servers check that each name registration transaction paid enough bitcoin to this burn address, for which there's no private key, so that money simply disappears. The reason they require every name registration to waste some money is that otherwise it's too easy for bad people to just register lots and lots of names.
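The two server-side rules mentioned so far, ignoring duplicates and requiring a burn payment, can be sketched together in Python. The burn address string, the minimum fee, and the representation of a transaction as a list of `(address, amount)` outputs are all placeholders invented for this sketch, not real Bitcoin or Blockstack values.

```python
BURN_ADDRESS = "burn-address-placeholder"  # hypothetical: stands for an address nobody holds a key for
MIN_FEE = 1000                             # hypothetical minimum fee, in the smallest currency unit


def burned_amount(tx_outputs):
    """Total amount this transaction pays to the burn address."""
    return sum(amount for address, amount in tx_outputs if address == BURN_ADDRESS)


def accept_registration(names, name, public_key, tx_outputs, min_fee=MIN_FEE):
    """Add the mapping only if the name is free and enough money was burned."""
    if name in names:
        return False  # duplicate registration: the first one wins
    if burned_amount(tx_outputs) < min_fee:
        return False  # fee rule: registration must pay the burn address
    names[name] = public_key
    return True


names = {}
print(accept_registration(names, "rtm", "key-of-rtm", [(BURN_ADDRESS, 1500)]))
print(accept_registration(names, "aaa", "key-of-squatter", [("some-other-address", 1500)]))
```

Bitcoin itself would accept both transactions as valid payments; it's only the name servers, reading the chain afterwards, that decide the second one doesn't register anything.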
Certainly that was the experience with the domain name system, where for a while name registration was free: people would register every possible one-, two-, and three-letter combination, and they would own all these names for free. Or somebody who knew that I would really like to own the name RTM might register it before me, and then if I wanted to use it I'd have to pay them. So in order to deter that, they require fees. That's probably an important part of the design, since free stuff on the internet tends to be abused or drowned out in intentional spam. All right, one detail I left out of this picture. In this picture we have the client device, which is running some app. Whenever the app writes data into my Gaia storage, it needs to be able to encrypt it, ultimately using my private key. And when it fetches data back, it needs to be able to decrypt it, also ultimately, one way or another, using my private key. So these applications need to get at private keys, but private keys are super-duper sensitive, whereas these apps are just whatever junk I downloaded from the Blockstack app store and are possibly totally untrustworthy. So we never want to give them my private key. What actually happens is that there's a separate program that I'm always running called the Blockstack browser, and it's this program that knows my private key. If the app wants to do stuff as me, it's got to do it through the Blockstack browser. In fact, the way this plays out is sort of complicated and detailed: the Blockstack browser essentially makes up a per-app private key, and the app uses just the per-app private key and not my real master private key. So the app, again, doesn't get to know my real private key.
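I haven't described how the per-app key is actually made, so take the following only as one plausible way to get the property that matters: each app receives a secret that depends on the master secret but doesn't reveal it. The sketch uses HMAC-SHA256 keyed by the master secret over the app's origin; the function name `per_app_key`, the origins, and the master secret value are all made up, and this is not claimed to be Blockstack's derivation.

```python
import hashlib
import hmac


def per_app_key(master_secret: bytes, app_origin: str) -> bytes:
    """Derive a 32-byte per-app secret from the master secret and the app's origin."""
    return hmac.new(master_secret, app_origin.encode(), hashlib.sha256).digest()


master = b"master secret known only to the browser program"  # placeholder value
todo_key = per_app_key(master, "https://todo.example")
photo_key = per_app_key(master, "https://photos.example")

print(todo_key.hex())
print(todo_key != photo_key)
```

The same app gets the same key every time, two apps get unrelated keys, and an app holding its own key has no practical way to work backwards to the master secret.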
This issue of not revealing sensitive key material to these apps, which may indeed be quite untrustworthy, is an important detail, and Blockstack keeps my master private key secret. Now, on the topic of private keys, a weakness in essentially every such system, Bitcoin itself and Blockstack too, is that users tend not to be as careful as they ought to be with private keys. If I'm going to use Blockstack from my phone, that means my phone has to know my private key. If I leave my phone in the cafeteria, then whoever finds it now has a device with my private key in it and can do anything as me, because as far as Blockstack is concerned, if they know my private key, they are me. Users also tend to lose private keys. I don't use the service, and so the private key, for a little while, and I forget whatever passphrase was protecting the private key. Or I put my private key on a USB key somewhere for safekeeping and then lose the USB key. That's a completely routine problem that users have, and Blockstack does not really have an answer to it. They pretty much assume that users will be careful with their private keys. If you lose your private key, Blockstack can't get it back for you. In order to be super secure, in order for you not to have to trust Blockstack, only your client knows your private key. If you lose your client, or forget whatever the passphrase is, you're just completely out of luck and Blockstack can't help you. And this is just a difficulty in real life: people don't want to use systems that are that brittle.
In real life, what ends up happening is that even systems that have serious cryptography usually have some sort of key recovery scheme, whereby there's something I can tell the service, maybe my mother's maiden name, or they send an SMS to my telephone, some scheme I can use to recover my private key. And if you want to attack a system, it's often the password recovery or key recovery aspect of the system that's the easiest to attack. I just call up Blockstack and tell them: I'm Robert Morris, you've got to believe me, please reset Robert Morris's key or password for me and tell me the new password. If I'm convincing enough, and the system allows key resets, they're going to let me have it. And presumably, if it's really an attacker who is calling and pretending to be me, they'll let the attacker reset the password or the key too. Blockstack happens not to allow that, because it's so obviously insecure, but real-world systems, if they don't want their users to abandon them, need to have a better plan, and it's not clear how to make that better plan. All right. There are a couple of issues I want to talk about that come up in the system. For me, Blockstack is really a source of questions to think about, or even suggestions for more things to work on. I think Blockstack's situation now is that you probably wouldn't actually want to use it to build a real system for real users. But it's trying to point the way to a system that might someday, if enough cleverness and enough development were put into it, be both convenient for programmers and of real value to users.
It's probably not there yet, but it's interesting to think about how it could be designed differently or better in order to get it closer to something that would really be useful. So one question you might have, especially in the context of 824, is whether Blockstack really needs to use Bitcoin. Bitcoin's really not that great. The fees that you have to pay to register a name vary in value along with Bitcoin, at times by factors of almost 100 overnight. In addition, people really don't like the way Bitcoin uses proof of work to burn up CPU in order to be secure. So Bitcoin is not perfect, although it's an important part of the system; it's not clear how they would do names without this whole Bitcoin tie-in. So one question you might have from 824 is about certificate transparency, which we looked at last week. Certificate transparency does not have mining and does not have proof of work, and yet it's powerful enough to be helpful in a naming system. So the question is whether, instead of Bitcoin, Blockstack could use something like certificate transparency in order to enforce adequate rules about names. I actually don't know the answer to that. My guess is that the answer is no. My feeling is that certificate transparency can reveal conflicts, where a conflict is really two people registering the same name. If you required everybody to submit their name registrations to a certificate transparency log, yes, indeed, you would be able to see that two people had registered the same name. But certificate transparency doesn't resolve ownership conflicts. Suppose last year I registered RTM and I've been using it happily for a year, and then somebody else registers RTM today.
They'll submit their registration to a certificate transparency log, and maybe that will make my name unusable or something. But it's not clear who should own the name, because certificate transparency doesn't have very powerful mechanisms for resolving these conflicts. You might think that order would be enough, but the same records in different certificate transparency logs can appear in different orders, because there's nothing forcing the different logs to have exactly the same order. So how come Bitcoin can force every replica of the blockchain to have the same order? I believe the answer really boils down to Bitcoin's mining. It's Bitcoin's mining that resolves forks: it resolves differing copies of the blockchain and forces agreement. If you don't do mining, or something like mining, it's not that clear how to enforce agreement on the order of the records. In addition, the fees that Blockstack charges are probably critical to avoiding various kinds of spam and abuse in the naming system. Blockstack built on Bitcoin can automatically require people to pay to register. With Blockstack built on certificate transparency, there's no direct mechanism to require fees. In fact, I think the point here is actually quite a bit larger: a lot of people talk about using blockchains for lots of stuff other than cryptocurrency, but it seems difficult to use open blockchains, with unrestricted access, except when they're coupled with some kind of cryptocurrency. Again, I don't know if that's true, but it's certainly my impression. All right, so a big question with Blockstack is whether it's going to be convenient for programmers. To me this is absolutely critical; it's one of two very critical questions.
The other critical question is whether it makes users' lives better. My perception at the moment is that Blockstack is not particularly convenient for programmers. I've programmed with Blockstack, and I've tried to build systems that are like it, and my strong impression is that it's just a lot more difficult to build a web application on one of these decentralized platforms than on the ordinary platform. That's damaging, because if the website developers aren't on board, then it's not going to get a lot of traction. If the website developers feel that the system is difficult to program, the only way you're ever going to get any traction is if the attraction to users is so strong that users demand decentralized applications, and that might force programmers to use it. But for programmers, speaking just for themselves, my guess is that the architecture in which basically all the code sits in the client, and there are no special website servers, is just pretty painful. It's hard to have data that's specific to the application, because all data is owned by users. It's hard to have indices, or counts of likes, or vote counts; the kind of front-page rankings I mentioned for Reddit or Hacker News are difficult. There's just all kinds of stuff that's a pain if you don't have a notion of the website itself with its own data. Access control is equally painful. In a traditional website it's very easy to write the code that decides who gets to see what data. In a decentralized system, access control can really only be enforced cryptographically, or at least that's the way it seems in the example of Blockstack. And it turns out that, for anything except very straightforward cases like one user using their own private data, using cryptography to enforce access control is just pretty painful. So programmers might only be excited if users were excited.
Are users going to be excited? One way to ask that question is whether this kind of decentralized, user-owned storage is good for user privacy, because one of the big pitches is that storing data on storage services that users own and pay for may keep the data more private and more secure than storing it on websites. That really amounts to asking: is it better than trusting Facebook or Google to keep my data private, from Facebook employees, from other users of the site, and from hackers who might try to break in? And that's just a question, right? It depends on how much you trust Facebook. The fact is that you're still storing your data out there in the cloud on some service, just maybe not Facebook. And you're still running software on your client that is presumably provided to you by Facebook. So you're still trusting this code that Facebook gives you, or wherever you get your code from. The real hackers among us can look at the code, because you're running it on your own computer, and convince yourselves that it's okay. But for the general public, the difference between talking to Facebook's software on their web server and running Facebook's software on your own client may not seem very great. And who knows, maybe the Facebook app you're running is sending Facebook information about what you're up to, snooping on you. There's a question: why is cryptographic access control painful for programmers? One way of looking at it is that the access control checks you have to do in a standard website are very straightforward. You just write a little bit of Python code, or whatever it is, to decide if some user should be able to see some data. And you can even compute using data that the user shouldn't see, as long as you don't reveal it to the user.
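To make "a little bit of Python code" concrete, here is the kind of check a traditional web server might run. The friends-only rule, the data, and the function names are invented for this illustration; the point is that the server holds all the data and simply decides what to hand back.

```python
# Server-side data: the web server can read all of it.
FRIENDS = {"alice": {"bob"}, "bob": {"alice"}, "carol": set()}
POSTS = [
    {"author": "alice", "text": "private note", "friends_only": True},
    {"author": "alice", "text": "public note", "friends_only": False},
    {"author": "carol", "text": "carol's note", "friends_only": True},
]


def can_see(viewer, post):
    """Authors see their own posts; friends-only posts are limited to the author's friends."""
    if viewer == post["author"] or not post["friends_only"]:
        return True
    return viewer in FRIENDS.get(post["author"], set())


def visible_posts(viewer):
    """The posts this viewer is allowed to read."""
    return [p["text"] for p in POSTS if can_see(viewer, p)]


def total_post_count():
    """Computed over every post, including ones a given viewer can't read."""
    return len(POSTS)


print(visible_posts("bob"))
print(total_post_count())
```

The count is an example of computing over data the viewer isn't allowed to see without revealing it. Changing who can see what means editing `can_see`; nothing already stored has to be touched.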
Whereas in anything but simple situations, doing the cryptography to allow some users but not others to get at your data just requires a lot more thought. Suppose the MIT registrar maintains a list of all the people taking 824. So that's a group. And I want to use that list to govern the protections for some file stored with cryptographic protection in Blockstack. Because that group list of 824 students may change, what I do for encryption may have to change too. I could encrypt the data once with the key of each user in the 824 group list, and that would work, because each of them could just read the copy that was encrypted for them. But then the registrar changes the list, adds or deletes users. My software needs to notice that the list has changed and busily go out and change the way stuff is encrypted: re-encrypt for the new users, delete the encrypted copies for the old users. That's a level of complexity that doesn't exist in current systems. It's not that it can't be done, but it requires a lot of machinery that is not ordinarily needed. Another trust issue from the point of view of users is that they still have to trust their storage provider to preserve their data, and to always serve up the most recent copy. A cheating storage provider might try to cause trouble by serving up an old version. So at least in the Blockstack design, you're really trusting your storage server, which from your point of view is a central service, to do the right thing with your data: to preserve it, to back it up, to produce it when asked, and to produce the right version. And it is a bit of a question for ordinary people: if you're trusting Amazon AWS to store your data correctly and not lose it, it's not that much bigger a step to trust Amazon itself to run the website.
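Going back to the class-list example from a moment ago, here is just the bookkeeping half of that machinery in Python, with the actual encryption left out. The function name and the idea of tracking "who currently has an encrypted copy" as a set are assumptions made for this sketch.

```python
def plan_reencryption(encrypted_for, current_members):
    """Compare who has an encrypted copy with who is on the list now.

    Returns (users needing a new encrypted copy, users whose copy should be deleted).
    """
    encrypted_for = set(encrypted_for)
    current_members = set(current_members)
    to_encrypt = sorted(current_members - encrypted_for)
    to_delete = sorted(encrypted_for - current_members)
    return to_encrypt, to_delete


# The file was encrypted for last week's list; the registrar has since changed it.
last_week = ["alice", "bob", "carol"]
this_week = ["alice", "carol", "dave"]
print(plan_reencryption(last_week, this_week))
```

Even this small piece implies more: something has to notice that the list changed, fetch each new member's public key, and redo the work for every protected file. None of that exists when a web server simply checks the list at request time.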
We can argue about whether that's exactly true, but I think from a high-level point of view, for most ordinary people, it's really a pretty small distinction. And you would have to overcome that in order to persuade people that, boy, the Blockstack approach of using Amazon as a storage service is better than the standard way of using Amazon as a website. Another pitch for why the decentralized architecture might be better for users is that it gives them more control, not over privacy, but over what applications they use with their data. If you want to switch applications but still use the same data, like changing photo editing apps, as I mentioned, in principle that should be easier with this decentralized architecture, because, again, the data is not owned by the application's website. If you want to use the same data in multiple different applications, like running a calendar app that uses data from my email app, that also is relatively convenient with the decentralized scheme, because the data is, again, independent of the applications. Maybe users want this, maybe they don't; it's probably not at the top of anybody's list. And there's an additional problem: for that vision to work at all, there has to be a lot of standardization of file formats. My calendar program has to store its calendar data in a format that my email program can understand; otherwise that doesn't work. And if I'm going to switch email applications, my old email application had better have been storing my email in a format that my new email application can understand. Otherwise this vision of decentralized apps being easy to switch among can't be made to come true.
And a final issue that worries me about this whole thing is that it's not clear that users are going to be willing to pay for their own storage. If people aren't willing to pay for their own storage, then this whole arrangement is pretty unattractive, because a lot of the point was to give users more responsibility for storing their own stuff. Users are, I think, so used to free, advertisement-supported services that they just might not be willing to get on board with paying for internet stuff. All right, nevertheless, I feel like this whole area is well worth keeping an eye on, maybe even worth working on different pieces of it if you're looking for research problems. Even though I don't really believe in it right now, for the reasons I outlined, I think it's absolutely worth pursuing, because the way these kinds of decentralized systems work has been getting better and may eventually be good enough to be serious competition for existing website architectures. And I would just love it if serious competition like that were to arise. All right, that's all I have to say. Next Tuesday, the last class meeting, is going to be project presentations, so we'll get to hear what everybody who hasn't been doing lab 4 has been up to. Please ask me questions if you have them.