My name's Tara Vancell, and I'm gonna tell you about Merkle trees and how they enable the decentralized web.
I wanna start out by talking about two centralizing forces that affect how the web works. The first is what we'll call the server problem. If you want to upload files to the web, you need to have a server, but most of us don't have servers lying around, and they're also sort of burdensome to manage. So typically we choose the more convenient option, which is a hosting service like YouTube or Medium. Now this is convenient, but what it results in is a lot of web content being concentrated on the infrastructure of a handful of providers.
The second centralizing force relates to how we address content on the web. We currently use what's called a host-based addressing model, and what this means is that if I upload a video to YouTube, the URL for my video is tightly bound to the youtube.com origin. And if I ever wanna move my video to Vimeo, for example, I need to get an entirely new URL. So there's a lot of friction involved in exercising choice between hosting providers, and once I've made my original choice, I'm probably gonna stick with it.
Now luckily both of these problems can be solved with a peer-to-peer system. And the first part of the solution I wanna talk about is called content addressing. Content addressing is pretty much what it sounds like. It's generating an address for a piece of content based on the content's value. And we do this with a cryptographic hash function. A hash function is a one-way function that takes an input, like a file, and generates a fixed-length output, typically 32 bytes. And it's important to know that with a well-designed hash function, it's practically impossible to find two different inputs that generate the same output. So in practice, a hash is a unique identifier for a piece of content. So how does this help us solve the host-based addressing problem?
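Before answering that, here is a minimal sketch of what a content address looks like in code. It assumes SHA-256 as the hash function (the talk only says the output is typically 32 bytes), and the name `content_address` is made up for this illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the content.

    SHA-256 is an assumption here; any cryptographic hash would do.
    """
    return hashlib.sha256(data).hexdigest()


video = b"pretend these bytes are my video file"

# The address depends only on the bytes, not on where they are stored,
# so computing it on two different hosts gives the same result.
address_on_host_a = content_address(video)
address_on_host_b = content_address(video)

print(address_on_host_a == address_on_host_b)              # True
print(len(bytes.fromhex(address_on_host_a)))               # 32
print(content_address(video + b"!") == address_on_host_a)  # False
```

The last line shows the other side of the same property: change even one byte of the content and the address changes with it.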
Well, if a hash is a unique ID for a piece of content, that means we can also use it as an address for that piece of content. So at this point, if I wanna move my video from YouTube to Vimeo, it can maintain the same address, no matter where it's being hosted.
But we still haven't talked about how to reduce the burden of hosting files on the web. And we can also do this with a peer-to-peer network. We can take a data set, split it up into small chunks, and distribute it across a network of peers. And then those peers can share the responsibility of providing bandwidth and disk space. Now, this does make the responsibility a little bit more manageable, but there's something else really important that we need to consider, and that's trust. Participants in a peer-to-peer network are anonymous and untrusted, so it would be really risky to download a file from a peer without first verifying that it's the file that you actually asked for, and not something malicious.
And this is where Merkle trees come in. A Merkle tree has some pretty unique properties that allow us to do efficient data verification across a network of peers. So what is a Merkle tree? It's a binary tree where the nodes store hashes instead of storing chunks of data. Hashes of what? Well, the leaf nodes store hashes of chunks of data, and the parent nodes are a little bit different. Each parent node is the result of concatenating the hashes stored in its left and right children, and then applying the hash function to that result.
But let's actually step through constructing a Merkle tree so we can see how this works. We have four chunks of data, and we start out by determining the leaf nodes. We're simply going to apply the hash function to each chunk. From there, we'll determine the parent nodes. Let's look at A. To do A, we concatenate the hash of one and the hash of two, and apply the hash function to that result. And we'll do the same exact thing for B and its children, the hash of three and the hash of four.
Finally, we do the same thing for C, the root node: concatenate A and B, and apply the hash function. The root node has a special name in a Merkle tree. We call it the root hash, and as we'll see in a few moments, it plays a pretty important role in peer-to-peer networks.
So let's look at an example. Let's compare two Merkle trees to see if they're equal. Remember that a Merkle tree is just a representation of a data set, so in effect, what we're doing is comparing two data sets to see if they're equal. You'll notice that these trees have different root hashes, and you might be able to see why. It's because one of the leaf nodes in the tree on the right is different from the corresponding leaf node in the tree on the left. As a result, its parent node is different, and consequently, that node's parent is going to be different as well. So a change in any single leaf node bubbles up all the way to the root hash, and this means that the only thing we need to consider when comparing two Merkle trees for equality is the root hash. And this is wonderful, because not only does it reduce the number of comparisons we need to do, but more importantly, if we're doing this comparison over a network, then the only thing we need to send over the wire is the root hash, and typically that's 32 bytes.
But how does this come into play when we start talking about peer-to-peer systems? Let's see another example. I mentioned that the root hash has a special role, and the first value it provides is serving as an address for a set of files or data. This is a lot like how URLs work on the web. A URL points to a set of files that live on a server somewhere, and if you visit the URL, you can download the files. But you would never visit a link from someone you don't trust, because it might just point to a virus, and we have to apply that same principle here. When we get a root hash, it needs to come from someone we trust, so that's what we've done.
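The four-chunk construction and the root hash comparison described so far can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular project: it assumes SHA-256 as the hash function, plain byte concatenation of the two child hashes, and a chunk count that is a power of two (the talk does not cover what to do with an odd number of nodes). The names `h` and `merkle_root` are made up for the sketch.

```python
import hashlib
from typing import Sequence


def h(data: bytes) -> bytes:
    """SHA-256 digest (32 bytes). The choice of SHA-256 is an assumption."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks: Sequence[bytes]) -> bytes:
    """Return the root hash for a power-of-two number of chunks."""
    n = len(chunks)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch only handles 1, 2, 4, 8, ... chunks")
    level = [h(chunk) for chunk in chunks]  # leaf nodes
    while len(level) > 1:
        # Each parent is the hash of its left child followed by its right child.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


left = [b"1", b"2", b"3", b"4"]
right = [b"1", b"2", b"3", b"5"]  # one leaf differs

# Spell out the walkthrough by hand: A, B, then the root C.
node_a = h(h(b"1") + h(b"2"))
node_b = h(h(b"3") + h(b"4"))
root_c = h(node_a + node_b)

print(root_c == merkle_root(left))              # True
print(merkle_root(left) == merkle_root(right))  # False
print(len(merkle_root(left)))                   # 32
```

Comparing the two data sets comes down to comparing two 32-byte values, no matter how many chunks sit underneath them.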
We got the root hash C from a friend, and at this point, we can start asking peers on the network to send us the files that are associated with that root hash. And we got the files. We have four chunks of data, the numbers one through four, but remember that they came from a bunch of random people on the network, and we don't trust them, so we need to verify the data. So let's ask ourselves, how might we do that? Well, this is the second role that the root hash plays. We have the root hash from our friend, so why don't we just reconstruct the Merkle tree to get another root hash and see what we get? And that's exactly what we can do. We'll construct the Merkle tree with the data we received from the network until we get a root hash, and we see that it's the same root hash that we got from our friend. So we know, for all practical purposes, that the data we received wasn't accidentally corrupted or intentionally tampered with, because if it had been, we would have gotten a different root hash.
So at this point, you might wonder, if the purpose of the root hash is to let us address a set of content, and to also do some integrity checking over that content, why don't we just concatenate all the data together and then apply the hash function to that? Why bother with a tree at all? And that's a pretty astute observation. That would work, but it turns out there is some value in constructing a tree, and we're gonna see why in a second.
In the last example, we needed to download all of the data before we could verify it, but on a peer-to-peer network, we download data from people all over the world and at different times, so it would be convenient if we could verify data as we receive it. Likewise, we might not be interested in the entire data set. Maybe we only need one or two files, so being able to partially verify data sets would be pretty cool. It turns out we can do that, and that's why we need to construct a tree.
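The whole-download check from the friend example (rebuild the tree from what the peers sent, then compare root hashes) might look like the sketch below. The assumptions are SHA-256, plain concatenation of child hashes, and a power-of-two chunk count; `verify_download` is a hypothetical helper name, and `trusted_root` is computed locally only as a stand-in for the value a friend would hand over.

```python
import hashlib
from typing import Sequence


def h(data: bytes) -> bytes:
    """SHA-256 digest; the hash function choice is an assumption."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks: Sequence[bytes]) -> bytes:
    """Root hash for a power-of-two number of chunks (sketch only)."""
    n = len(chunks)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch only handles 1, 2, 4, 8, ... chunks")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_download(chunks: Sequence[bytes], trusted_root: bytes) -> bool:
    """Rebuild the tree from untrusted chunks and compare root hashes."""
    return merkle_root(chunks) == trusted_root


# Stand-in for the root hash C that came from someone we trust.
trusted_root = merkle_root([b"1", b"2", b"3", b"4"])

received_ok = [b"1", b"2", b"3", b"4"]
received_bad = [b"1", b"2", b"x", b"4"]  # one chunk was altered in transit

print(verify_download(received_ok, trusted_root))   # True
print(verify_download(received_bad, trusted_root))  # False
```

Note that this check needs every chunk in hand before it can say anything, which is the limitation the tree structure is about to remove.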
So let's see another example that's pretty similar to the last one. We received a root hash from someone we trust, and we've started asking peers on the network to send us data, and we got the first chunk of data. It's the number four. So let's ask ourselves, what other data do we need to reconstruct this root hash and verify that the number four belongs in this Merkle tree? Well, it turns out it's not very much data at all. We already have the number four, so we can determine the hash of four. We'll need the hash of three so that we can calculate B, and finally, we'll need the node A so that we can calculate the root hash. And again, we see that we got the root hash that we expected, so we know that indeed, the number four belongs in this tree.
Now, it's really important to notice that the only thing we needed to download from someone we trust was the root hash. The data itself and the proof required to verify it came from an untrusted peer, and this is a critical piece of making it possible to distribute the responsibility of hosting files across the network of peers.
Merkle trees are used in dozens of projects, and they actually have some other cool properties that I won't have time to discuss today, but they're of vital importance to peer-to-peer networks and decentralized systems. Being able to address content with the root hash means that we get consistent links that stay the same no matter where the content is hosted, and being able to efficiently verify data sets makes it possible to distribute the responsibility of hosting files. In summary, Merkle trees make it possible for the web to be built on the small contributions of many rather than the concentrated resources of a few. Thanks.
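As a companion to the last example in the talk, here is a minimal sketch of verifying the chunk "four" with only the hash of three and the node A. It assumes SHA-256 and plain concatenation of child hashes. The proof format, a list of `(side, sibling_hash)` pairs ordered from leaf to root, and the name `verify_chunk` are inventions for this illustration, not something the talk specifies.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    """SHA-256 digest; the hash function choice is an assumption."""
    return hashlib.sha256(data).digest()


def verify_chunk(chunk: bytes,
                 proof: List[Tuple[str, bytes]],
                 trusted_root: bytes) -> bool:
    """Recompute the path from one leaf up to the root.

    Each proof step says on which side the sibling hash sits ("left" or
    "right") so the two hashes are concatenated in the correct order.
    """
    node = h(chunk)
    for side, sibling in proof:
        if side == "left":
            node = h(sibling + node)
        elif side == "right":
            node = h(node + sibling)
        else:
            raise ValueError("side must be 'left' or 'right'")
    return node == trusted_root


# Build the four-chunk tree once, playing the role of the original publisher.
hash_1, hash_2, hash_3, hash_4 = (h(c) for c in (b"1", b"2", b"3", b"4"))
node_a = h(hash_1 + hash_2)
node_b = h(hash_3 + hash_4)
trusted_root = h(node_a + node_b)  # the only value that must come from a friend

# What an untrusted peer sends along with chunk four:
# the hash of three (left sibling of the leaf), then node A (left sibling of B).
proof_for_four = [("left", hash_3), ("left", node_a)]

print(verify_chunk(b"4", proof_for_four, trusted_root))  # True
print(verify_chunk(b"5", proof_for_four, trusted_root))  # False
```

For four chunks the proof is two hashes; in a balanced tree it grows by one hash each time the number of chunks doubles, which is why a single chunk can be checked without downloading the rest of the data set.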