I'm James, or substack, from the internet. If you want to follow along online, there's a link to these slides; you can poke around and just be on your laptop the whole time, I don't care. Anyways, I don't really work anywhere. I do some freelance stuff sometimes, but I'm not very good at it. Mostly I just tinker around on the internet making open source modules, and I want to talk about permanence, and perhaps rethink how we build web apps here in the present so that our future, aka FutureJS, is a little bit nicer.

One thing that really sucks is that we have all of these web services, but they don't tend to last very long. I listened to a talk by Brewster Kahle of archive.org, who had some stats that the average web page lasts 100 days, which is pretty terrible, I think. And even the more popular web services are always getting acquired, pivoting into some other stupid thing, getting shut down, turning evil left and right. It's pretty terrible how the business models of the web run themselves into the ground, and then we developers, with all of our third-party endpoints, have to put up with that.

Also, people haven't really adopted offline techniques very much, because offline is hard. What makes it hard is one of the two hard problems of computer science: one of them is cache invalidation, another one is naming things, and the third one of the two is off-by-one errors. But basically, the network, as you will doubtless be aware, is extremely unreliable, and we really shouldn't pretend that the things we make are going to be able to use the network in any meaningful capacity if people are actually out in the real world, and not in a suburb of Menlo Park, California, building a web app.

There's also this really huge problem, in terms of making things that last on the web, of a power asymmetry: clients are bundled with the services they depend on, and identity is a kind of commodity, which is really unfortunate. So how can we get away from some of these things? Well, if someone owns a service, they can take it down whenever they want. You depend on them to provide the service, and if they don't want to provide that service anymore, there's really nothing you can do about it. So how can we build services that nobody owns? I think that's a really interesting question, and we can use a lot of techniques from distributed computing to answer it effectively.

So here are some pieces that a lot of modern web apps use that we can replace with distributed alternatives, and a lot of these techniques are only just becoming practical on the web, with tools like WebRTC and the various torrent modules. We've got to have identity; we've got to have a system for doing trust that isn't baked into the SSL-cartel, domain-name, rent-seeking enterprise that we have now; and we've got to have some way of doing feeds, a sort of Twitter-style data model for pointing at big blob storage.

So, for identity: how can we do this? Well, if you've ever used an SSH connection before, it's quite simple. You run an ssh-keygen command that you copy-pasted from Stack Overflow, and then you just copy your id_rsa.pub onto the server you want to connect to. At no point did you have to talk to a third-party service to register a domain or register a username. Your username doesn't have to be unique.
It already is unique, because it's based on entropy. So a really common technique for building distributed identity is to use either the public key directly as the identity, or the hash of the public key, depending on what kind of keys you have. Another interesting technique I've found is that instead of having a unique name registry, you just operate like the real world, where your name is whatever your friends call you, and if your friends don't call you that name, maybe they're not your friends anymore. You can have two Jameses in the same room, and we don't explode. You can even have two John Smiths, and that's okay: we figure out ways of resolving those conflicts informally ourselves. Humans aren't just hashes that have collisions, fortunately enough.

So how can we do this kind of stuff well? A really good module for doing this in Node is called sodium, which is based on the NaCl library. You require sodium and then take .api (I'm wrapping it so I don't have to do that), and then you just call sodium.crypto_sign_keypair(), and that gives you a key pair. So if I run that program in node... oops, sorry, I think it was .api... ah, goodness, I've already broken this terribly. Anyways, if we have a program like this, we can create a key pair and load it into something like keys.json, and then we have a way of having identity in a service. This particular example also runs in the browser, with a module called sodium-browserify/browser, for some reason; I'm going to bug Dominic Tarr to fix that.

But anyways, if we listen on a port now, and I go to localhost:5001, I've prepared a little example of doing distributed peer-to-peer identity in the browser. So here I have sodium, and this one actually works: crypto_sign_keypair() gives us an object with a public key that's 32 bytes and a secret key that's 64 bytes. So we take the public key, which at 32 bytes is the same size as a lot of hash outputs anyway, and convert it to hex. And there we go: here is my username. Not exactly memorable, but there are techniques for building systems on top of this primitive as the username.
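Reconstructed, that little key pair program looks about like this; a minimal sketch, assuming node-sodium's low-level API (crypto_sign_keypair on the .api object), with the keys.json path just for illustration:

```js
// keygen.js: generate an ed25519 signing key pair and use the
// public key as a self-sovereign identity. No registry required.
var sodium = require('sodium').api
var fs = require('fs')

var keys = sodium.crypto_sign_keypair()

fs.writeFileSync('keys.json', JSON.stringify({
  publicKey: keys.publicKey.toString('hex'), // 32 bytes: this is the "username"
  secretKey: keys.secretKey.toString('hex')  // 64 bytes: keep this one private
}))

console.log(keys.publicKey.toString('hex'))
```

Running node keygen.js prints a hex string like the one in the demo: unique because it's 32 bytes of entropy, not because anyone registered it anywhere.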
So that's sort of an introduction to how we can start tackling the idea of identity, but we're going to need a lot of other data structures if we want to build a new distributed web that really challenges how the web currently works and makes it more permanent, more accessible, these kinds of things. Another really useful technique for building the web apps of the future is the Merkle DAG. I bet almost everyone here has been using Merkle DAGs, probably every day, because they're the data model that git uses. Merkle DAGs are a sort of content-addressed data store with some really nice properties, and we'll also be looking at some kinds of content-addressable blob storage.

Content-addressable data is really fun. You know how one of those hard problems of computer science is naming things? With content-addressed storage, the name is just the hash of the content. The name for this piece of data? I didn't have to come up with anything; it's this hash. And content-addressed data is really nice because when you get data from someone and you already knew its hash, you can instantly verify it. So Merkle DAGs have this really nice property of being self-verifying, in a sense.

Here's an example of what a Merkle DAG looks like. The really nice property of a Merkle DAG is that if you include the hash of the previous document in the current document, you create this chain of custody for your application where you keep all of history around. So here's an example where the first message has a previous-hash header of null, the second message points at the first, and the third points at the second. It goes one after the other, and you can create this linked list that represents history. The nice thing about the structure is that replication is very trivial: you have two logs, you concatenate them together, maybe perform some other operations, but it's relatively trivial. And if we can keep all of history around, and we can't rewrite history (which is this sort of Orwellian idea that we should maybe step away from), then we can build really robust web apps that will last as long as video games from the 80s. Maybe. If we're lucky.

There's a great module on npm called hyperlog that lets you do this kind of stuff. So if we start from zero, as I often like to do, we can make a log.js, load up hyperlog, and instantiate a new log. We'll need a LevelDB handle to do this, so we can require level and make a db. LevelDB is really nice: it's a really simple key/value store, and it's a lot like SQLite in that you just give it a directory and it lives in your process. So if we instantiate a hyperlog with the db, then we can do operations like log.append() with a message. Here I'll read from standard in: I require concat-stream, and process.stdin.pipe(concat(...)) gives us the body. Now we call log.append() with the body. (There's also log.add(), if you know the hash of the previous content, but with log.append() you don't necessarily need to know that.) And that's all we need to do. So now if we echo beep boop into node log.js, and fix our program so that it's not completely broken with an extra paren: there we go. Great, it prints nothing; that means it succeeded.

If we want to read out this data (I'll add another entry first), we change the program so that instead of piping from standard in, we call log.createReadStream() and print out each record. So here we go: here is our data. And note, just like in the Merkle DAG example, just like in git, the hash of the current document is 523-something, and the next document points at the hash of the previous document, and so on and so on.

And the really nice thing is that replication is pretty trivial, so I can take this log program and make a replicate.js out of it. Now we just call log.replicate(), and one really nice trick I like for symmetric protocols is to just pipe standard in to the replication stream and pipe that to standard out. So now we need a second program: this one replicates from ./data, but if we make the directory come from process.argv[2], then we can replicate to another place. So if I run node replicate.js on data, I need to pipe the standard in of another program into the standard out of this one, this sort of mix-and-match operation where each side's output feeds the other's input. There's a nice little 20-line program I wrote to do that, called dupsh.
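Here's roughly how those two files end up; a sketch against the published hyperlog, level, and concat-stream APIs (the file layout is just how the demo went):

```js
// log.js: append stdin to an append-only Merkle DAG stored in LevelDB.
var hyperlog = require('hyperlog')
var level = require('level')
var concat = require('concat-stream')

var log = hyperlog(level('./data'))

process.stdin.pipe(concat(function (body) {
  // append() links the new node to the current head(s) for us;
  // add() is the lower-level call where you pass the previous hashes yourself
  log.append(body, function (err, node) {
    if (err) console.error(err)
  })
}))

// to read everything back out, swap the stdin pipe for:
// log.createReadStream().on('data', function (node) { console.log(node) })
```

```js
// replicate.js: replicate the log over stdin/stdout.
// The protocol is symmetric, so both sides run the same program.
var hyperlog = require('hyperlog')
var level = require('level')

var log = hyperlog(level(process.argv[2] || './data'))

process.stdin.pipe(log.replicate()).pipe(process.stdout)
```

With dupsh cross-wiring the two stdio pairs, replicating between two directories is one line: dupsh 'node replicate.js ./data' 'node replicate.js ./clone'.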
So we can run this program again, replicate.js, and I'll say ./clone. So it might have actually worked, but it doesn't always hang up correctly. We do see that there's now a directory called clone, which is good. If I take that log file and make a clone.js out of it... whoa. So now instead of data we'll print out the clone, and hopefully it'll give us something. Yeah, great, it succeeded: we got the data set.

What's even crazier is that all of what I just showed you works in the browser. To make it work in the browser, all we need to do is take our replicate.js and, instead of speaking standard in and standard out, replace level with require('level-browserify') and replace standard in and standard out with a WebSocket. So let's actually do that. First we'll need some browser code. We use level-browserify and instantiate our database (it doesn't matter in the browser what name you use). And instead of standard in, we use a WebSocket; I like to use this module called websocket-stream. So require websocket-stream, open ws://localhost:5002, and now we do ws.pipe(log.replicate()).pipe(ws). This is an example of a duplex stream; you can read all about this in the stream handbook if you want, or you can take the stream-adventure nodeschool challenge. But anyways, this is a common pattern if you're doing stuff in node.

So now I run a static server. I'll just make an index.html file really fast: a script tag for bundle.js, and a body around it. Okay, so now I browserify (I think I was using replicate.js, yep): browserify replicate.js -o bundle.js, and give that a second. And now I run a static web server on port 5000. If this works, our browser code should open a WebSocket connection, so it expects to talk to localhost:5002, but I realize I actually need to use a different thing on the other end. The other endpoint can be a WebSocket connection too, so I've got this handy little script like netcat, which talks standard in and standard out but over a WebSocket. So I run that dupsh command from before, with node replicate.js ./data on one side, and listen on port 5002 on the other. If everything goes well, then when I go to localhost:5000... that 404 is just the favicon thing. So: failed, unspecified error event... success! Possibly. Because it printed nothing, which is good.

If we want to actually get out the data that we put into our connection, we'll have to change our program a little bit. So now I put back in log.createReadStream().on('data', ...), and in the browser you actually have to bind console.log to console, which is silly, but there we go. So I browserify that again, give it a sec, and maybe it'll work, but it's okay if it doesn't. This is just to give you an example of how few moving parts you need to implement something like peer-to-peer data replication in the browser. So that's still running, and not getting data, but that's okay.
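For reference, the browser side comes out to almost nothing; a sketch assuming the websocket-stream and level-browserify modules named above, with the port number from the demo:

```js
// browser.js: the same hyperlog replication, but over a WebSocket.
// Run `browserify browser.js -o bundle.js` and load it from index.html.
var hyperlog = require('hyperlog')
var level = require('level-browserify')
var websocket = require('websocket-stream')

var log = hyperlog(level('whatever')) // the db name doesn't matter in the browser

var ws = websocket('ws://localhost:5002')
ws.pipe(log.replicate()).pipe(ws) // replicate() is duplex, so wire it both ways

log.createReadStream().on('data', function (node) {
  console.log(node) // dump each replicated record to the console
})
```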
So anyways, the next part of how we can build web apps that will outlive us, hopefully, is that we're going to need to figure out a way to do trusted updates. When you get a web page, you normally fetch it from a server. That server lives at a DNS record, so you have to pay some amount of money per month to keep that server running: you have to pay for hosting, you have to pay for all of these things. And what if you just decide to stop paying? Then all of your users suddenly can't use your software anymore. So how can we implement a system for giving people trustworthy updates, so that they can fetch data from other peers in the network? All of these things, app stores, SSL certs, whatever, are not very future-friendly, I like to think, because they have expiry built into them and they cost money. So I've got this little experiment called trustlog that uses signed messages on top of hyperlog to give you things like trusted add and revoke. You can do things like you might be used to with GPG, where you have a key that's trusted, but you can revoke it and you can add more keys around a single identity, so it gives you something more like a multi-device setup.

So here's where this is building towards. I've got a little demo for this project called appfeed that I've been working on. With appfeed (I think there are some versions already in there), there's a publish command: you give it a version and then you pipe some HTML into the command. So if we have a web page... whoa, I'll make that web page on the spot. There's our web page, whatever. If we pipe that into appfeed publish with version 1.1.0, then I get a hash. This is the hash of the content: if you ran the same hash algorithm on that HTML page, you would get the same value. And now I can list out the versions that I have in my local feed: I had version 1, now I have version 2. If I want to replicate with another server: here I've got a server running, and it just prints out the versions that it knows about. Right now it doesn't have any versions, but if I replicate with that server using the same WebSocket trick that we did before (it doesn't kill itself correctly yet), and then I curl that page... I guess it still doesn't work... oh, there we go. Hooray, it works. Great; I just didn't wait two seconds. All right. So now our server has the entire history of everything that we've pushed to our page. It's not like a deploy where your index.html just gets overwritten in place; your users can verify and ingest that whole history from your web server. With this kind of a technique, you can have an experience where, if you disappear, the users of your web app don't have to suffer that much: they can take that data and put it somewhere else instead. I think this is a much better model for building web apps.

But that's sort of an active technique. If we want to remove single points of failure from web apps, we can also get data from peers. And how can we do that? Well, there's this really great suite of modules in node built around abstract-blob-store; there are dozens of these things. One of my favorites is called content-addressable-blob-store, and I'll show how that relates to peer-to-peer data in a moment. With content-addressable-blob-store, you put data in and the key of that data is just the hash of the content, just like git, just like what I was doing on the command line a moment ago. So the first thing you do to use this stuff is require content-addressable-blob-store, and then we can instantiate that with a directory. With abstract-blob-store modules you get four methods, but the two that are important are createReadStream and createWriteStream. So if I do process.stdin.pipe(b.createWriteStream(...)), I get a callback that has access to the key of the data: if there's an error I log the error, else I log the key, which is the hash of the data I put in.
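As a sketch, assuming the abstract-blob-store calling convention (a directory option whose exact name varies between stores, and a createWriteStream callback that receives metadata with the content hash as .key):

```js
// put.js: write stdin into a content-addressable store, print the hash.
var blobs = require('content-addressable-blob-store')

// the option shape here is an assumption; check the module's README
var store = blobs({ path: './blobs' })

var w = store.createWriteStream(function (err, meta) {
  if (err) console.error(err)
  else console.log(meta.key) // the key is just the hash of the content
})
process.stdin.pipe(w)

// reading back out is the mirror image:
// store.createReadStream({ key: someHash }).pipe(process.stdout)
```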
If I want to get that data back out again, I can do b.createReadStream() with that key and pipe it to standard out. So if I take that hash from before, I get out the data that I put in. Great.

So we can use this basic kind of interface to implement a torrent blob store over BitTorrent. Why not? That's pretty straightforward. All that you need to do to use BitTorrent instead of a local directory on disk is swap out content-addressable-blob-store for torrent-blob-store. And you don't even need to configure anything there, although I do have a local tracker running here. So here's what you can do to use torrents: if I do echo 'hello futurejs' | node post.js, now instead of a sha hash as the key, I get a magnet link, which has a hash in it. You can feed this magnet link into any BitTorrent client; I've got one on the command line here that I like to use. So this connects to the tracker in a sec (sometimes it takes several seconds)... calculating... estimated... 100%. Great. So this torrent made a file called "file" that has our content: hello futurejs. That's basically all you need to use BitTorrent as an arbitrary blob store for your applications. I might also add that all of what I've just showed you works in the browser too, using this project called WebTorrent, and you can go to webtorrent.io to read more about that.

So this is great: we have peer-to-peer distribution of blobs, and we have a system for building replication to handle our HTML payloads. But we're still sort of dependent on servers. I mean, we're going to have servers no matter what, but it would be good if the servers were just there to provide some redundancy and availability to our network. So how can we build a peer-to-peer log? This is something that I've only gotten working in the past few weeks. There's an extension to the BitTorrent protocol called BEP 44, and I made a pull request back in April implementing it that's still not merged, but it's almost passing all of the tests. The nice thing about this extension is that you can put mutable content into BitTorrent. Normally, BitTorrent only lets you put immutable blobs, like a giant Linux ISO, for example. But if you want to change data on the BitTorrent DHT, with this patch you can. You only get one kilobyte to work with, but that's more than enough if all that you need to do is put a few hashes that point at other torrent files or other immutable data.

So all that we need to do to build a peer-to-peer Twitter, for example, on top of BitTorrent is save the head, which is just the most recent hash, and have every new document that we put into BitTorrent point at the hash of the previous document. Then we can create a program that just walks that linked-list data structure all the way back until we're out of messages, most recent first, which is kind of a nice model for something like Twitter. So let's go ahead and build that. I've got a little prepared sequence for this, because I can't quite live-code that kind of thing completely fresh. So here is how you can build a BitTorrent-backed Twitter. The first thing: I've got a little module that just sets up the key pair infrastructure, because it's a little bit annoying to do by hand, but it's not even that hard, so you just instantiate that.
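Sketched by hand, that setup looks about like this, assuming ed25519-supercop for the keys, since that's the signing scheme the bittorrent-dht documentation uses for BEP 44 mutable items (the demo's actual helper module may differ):

```js
// keys.js: generate and persist a BEP 44-style signing key pair.
var ed = require('ed25519-supercop')
var fs = require('fs')

var keypair = ed.createKeyPair(ed.createSeed())

fs.writeFileSync('keys.json', JSON.stringify({
  publicKey: keypair.publicKey.toString('hex'), // 32 bytes
  secretKey: keypair.secretKey.toString('hex')
}))

console.log(keypair.publicKey.toString('hex'))
```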
If you run this program, you get a 32-byte public key printed to standard out as hex. The next thing we can do is instantiate the bittorrent-dht module from npm directly. I'm not bootstrapping it in this case, because I'm on conference Wi-Fi, but the main thing you need to remember is to pass in a verify function. This just checks the signatures: we're using these elliptic curve keys, and if a node saves a value, it needs to be able to verify the signature to make that work. I guess I should step back a little: how that BEP 44 patch works is that the location in the DHT, the distributed hash table, for mutable content is the hash of a public key, so by signing messages it's guaranteed that whoever put that data there has ownership of that public key.

So anyways, we instantiate this DHT instance, and the next thing we can do is keep a pointer around that records the latest head of our append-only log. Then you can just call dht.put() with the signed data (there's a little helper there called .store to handle that), and now we can publish things to the DHT. I'll just take lines from standard in and pump them through to our DHT instance: each value puts the hash of the previous content as its first line, with all of the data payload as everything else in the file, and then we call dht.put() to put that value into BitTorrent and update the head. So that is all that you need to do: here's the entire program for publishing Twitter-style messages to the DHT. That's it, 36 lines, and lots of empty lines.

So if I run this post.js program, the first thing we get is this hash. I've also got a little get program right here; it's very simple too: all that it does is call dht.get() on whatever hash we pass in. So if I run get with the first hash, I get back null, because we don't have any data yet. If I type a message, I get a new hash, and now when I load the first hash again, I see a pointer to that message. So now if I run get on that value, I see the first message and the pointer to the previous message, which is null. If we make another message, we get another hash, so if I load the identity, the hash of the public key, again, I get the most recent pointer, which is now this different hash: 5b-whatever. It says "another post" and gives me a reference to the first post, which is "hello world". Cool.

So we can take this get.js program and create a feed walker that will just render all of the messages. We take the body; the hash is going to be the first line, so we split on newlines and take the first element, and if the hash is not equal to null, we call dht.get() on the hash. So all that we need to do is make our little get program recursive. And if everything is working properly, if I run get.js on the hash of that public key (I'll add a third message just for fun)... we get "hash not found". Oh, goodness: "invalid id". I'm not sure why that happened. I'll make some new data really fast... oh, I know why it happened: that should be a buffer. Okay, so now if I run the program on the hash of the public key (and we'll have a BitTorrent demo; luckily I'm using a local tracker, but it still might fail)... oh, I know what it is: I'm not actually printing out any messages anymore. I should do that; of course it's not going to print anything otherwise.
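Reconstructed against the published bittorrent-dht API, the two programs come out roughly like this. This is a sketch: the line format, the Date.now() sequence number, and the file names are assumptions, not the demo's exact code (which has a .store helper and a few more conveniences).

```js
// post.js: publish Twitter-style messages as a hash-linked log on the DHT.
var DHT = require('bittorrent-dht')
var ed = require('ed25519-supercop')
var readline = require('readline')
var keys = require('./keys.json')

var publicKey = Buffer.from(keys.publicKey, 'hex')
var secretKey = Buffer.from(keys.secretKey, 'hex')

// nodes must be able to verify our signatures before storing mutable data
var dht = new DHT({ verify: ed.verify })

var head = 'null' // hash of the most recent message, 'null' to start

readline.createInterface({ input: process.stdin }).on('line', function (line) {
  // each message is an immutable item: previous hash, newline, message text
  dht.put({ v: Buffer.from(head + '\n' + line) }, function (err, hash) {
    if (err) return console.error(err)
    head = hash.toString('hex')
    // point the mutable item (stored at the hash of our public key) at the
    // new head; seq must increase with every put, so a wall clock stands in
    // for a properly persisted counter
    dht.put({
      k: publicKey,
      seq: Date.now(),
      v: Buffer.from(head),
      sign: function (buf) { return ed.sign(buf, publicKey, secretKey) }
    }, function (err) {
      if (err) console.error(err)
      else console.log(head)
    })
  })
})
```

```js
// get.js: resolve the mutable head, then walk the chain, newest first.
var DHT = require('bittorrent-dht')
var ed = require('ed25519-supercop')

var dht = new DHT({ verify: ed.verify })

// argv[2] is the hash of the public key: the location of the mutable head
dht.get(Buffer.from(process.argv[2], 'hex'), function (err, res) {
  if (err) throw err
  walk(res.v.toString())
})

function walk (hash) {
  if (hash === 'null') return process.exit(0) // reached the first message
  dht.get(Buffer.from(hash, 'hex'), function (err, res) {
    if (err) throw err
    var lines = res.v.toString().split('\n')
    console.log(lines.slice(1).join('\n')) // the message body
    walk(lines[0]) // first line points at the previous message
  })
}
```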
So there's the pointer to the first message, there's our first message, there's our second message, there's our third message. We just made Twitter on top of the DHT. Thanks!

So if you're interested in this kind of stuff, there is plenty more where that came from. There's a project called GitTorrent that uses that same patch to implement a GitHub-style thing on top of only BitTorrent, and I think it also uses Bitcoin to do a naming system. There are lots of different techniques for naming systems; I'm not as personally familiar with those. There's also a project called IPFS that's definitely worth checking out. They don't quite have a pure JavaScript implementation yet, but it's the same kind of ideas, a little bit nicer. And there are tons more torrent modules; check out WebTorrent.

And I have some homework for all of you. That was pretty easy, building a distributed Twitter, so what else can we build as a service that nobody owns and no one runs? That's a good thing. If you're a developer: distributed Flickr and SoundCloud, those are all easy, you just put the payloads as magnet links in the DHT. I've also thought it would be pretty straightforward, using these techniques, to build an auto-archiving, YouTube-style combination live-stream service, where you live-stream a torrent file. And a git issue tracker that was peer-to-peer would be pretty cool. But the main thing is: GeoCities. Never forget. RIP. So, thanks.