Hello, everybody. My name is Nathan Evans, and this is Christian Grothoff. We're here to talk to you today about "Routing in the Dark: Pitch Black". Basically, this is a talk about security problems in the latest version of Freenet's routing algorithm.

So the first important thing I'm going to talk about is what exactly the problem is that Freenet is trying to solve. Basically, performing efficient routing in decentralized networks is a very difficult problem for peer-to-peer file-sharing networks and all kinds of other networks. Freenet claims to solve this problem with logarithmic routing, that is, logarithmic in the size of the network. Our question is whether or not Freenet's new routing algorithm is resilient, robust, and resistant to our attacks. In order to explain the attack, I'm first going to go over exactly how routing works, so that it's easier to understand. But as a spoiler alert: we don't really think that it is robust or resilient to our attacks.

So what exactly is Freenet? For those of you who don't know, it's not an anonymous file-sharing network. Well, it's touted as an anonymous file-sharing network. Freenet adds structure to an unstructured network by using node locations in a cyclic address space. It's a friend-to-friend network, which means that all the peers are defined; they're not discovered in any way, as they are in a lot of other peer-to-peer networks. The important thing to remember is that our focus is only on the routing, not the other aspects of the network.

For the routing, you need to understand two things. First off, routing in the best case assumes that all the data is stored at the node whose location is closest to the data's key. All data is identified by a key, which is in the same cyclic address space as the locations. Also, we assume that all nodes forward requests based on the proximity of the key to node locations, so a node forwards a request to the closest peer that it has.

So there's a theoretical basis for the Freenet routing protocol.
It's based on small-world networks. Since routing uses node locations to determine proximity to the data, the algorithm works best in small-world networks, but it would work in any kind of network. We also need to know that Freenet uses location swapping in order to help structure the network and make the routing more efficient.

So in order to understand how swapping works, we have an example here. First off, since we're going to use this example throughout the talk, you need to know that the circles represent nodes and the numbers inside the circles are the node locations. On the edges are the distances between the nodes. They're only listed for the two nodes that we're considering swapping, because those are the only ones we really care about. We're looking here at a potential swap between 0.90 and 0.60. As you can see, the edge distances are large, and the idea for a swap is basically that we want to reduce the average edge distance.

So here's the result of the swap, and as you can see, the average edge distances have gone down.
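The swap criterion discussed in this part of the talk (shorter edges are better, and the decision depends on the ratio of the products of neighbor distances before and after the swap) can be sketched roughly as follows. This is only an illustration, not Freenet's actual code: the function names, the neighbor locations, and the choice to leave the two swapping nodes out of each other's neighbor lists (their mutual edge keeps the same length either way) are assumptions made for the example.

```python
import math
import random


def circ_dist(a: float, b: float) -> float:
    """Distance between two locations on the cyclic address space [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def dist_product(loc, neighbor_locs):
    """Product of the distances from `loc` to every neighbor location."""
    return math.prod(circ_dist(loc, n) for n in neighbor_locs)


def swap_probability(loc_a, nbrs_a, loc_b, nbrs_b):
    """Probability that nodes a and b exchange locations.

    nbrs_a / nbrs_b hold the locations of each node's *other* neighbors
    (assumption: the swap partner itself is excluded from both lists).
    """
    before = dist_product(loc_a, nbrs_a) * dist_product(loc_b, nbrs_b)
    after = dist_product(loc_b, nbrs_a) * dist_product(loc_a, nbrs_b)
    if after == 0.0:  # degenerate case: a node would sit exactly on a neighbor
        return 1.0
    # A ratio above one means the swap shortens edges, so it is always taken.
    return min(1.0, before / after)


def should_swap(loc_a, nbrs_a, loc_b, nbrs_b, rng=random):
    """Randomized decision based on the swap probability."""
    return rng.random() < swap_probability(loc_a, nbrs_a, loc_b, nbrs_b)


# Hypothetical neighborhoods: 0.90 is linked to nodes near 0.60 and vice versa.
p_forward = swap_probability(0.90, [0.55, 0.65], 0.60, [0.85, 0.95])
p_reverse = swap_probability(0.60, [0.55, 0.65], 0.90, [0.85, 0.95])
print(p_forward, p_reverse)  # the forward swap is certain, the reverse one is rare
```

In this made-up neighborhood the swap between 0.90 and 0.60 shortens every edge, so it is always accepted, while undoing it would only happen with a small probability.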
So that's a good swap.

Here's the actual equation for how swapping actually happens. Another thing to remember is that all peers randomly initiate swaps, so any peer tries to swap with its neighbors all the time. Basically, P(a, b) is determined by the ratio of the products of the distances to all the neighbors that a peer has before and after a swap. That's kind of a mouthful; if you want to look at the equation, that's what it says. But basically, if the result of this is greater than one, a swap always happens, and otherwise a swap happens with probability P(a, b), as it's shown up there.

So again, in order to understand the routing, I'm going to go over get and put requests. This is how a get request works. Again, we have to remember that the peers use locations and their proximity to the data's key for routing. Basically, a client starts a get request and sends it to the neighbor whose location is nearest to the key that it's trying to find. Unless the data is at that neighbor, the request is forwarded to that peer's nearest neighbor with relation to the key, and so on. This forwarding stops when the HTL (hops to live) equals zero, the data is found, or the request has been seen before, and that last one is just to avoid circular routing.

So now we have an example of an actual get request in our example network. We're starting from the node 0.90, and we're searching for data identified by the key 0.22. So first, 0.90 finds its closest peer, which is 0.10, and it routes to it. 0.10 doesn't have anybody to route to, so it replies that it doesn't have the data. 0.90 then tries its next closest peer, which is 0.60. 0.60 forwards to its closest peer, which is 0.25. 0.25 responds to 0.60 that it has the data, and 0.60 responds back.
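The get request walked through above can be sketched as a greedy search with backtracking. This is a simplified illustration rather than Freenet's implementation: the `route_get` function, the way the HTL is counted (one unit per forwarded hop), and the four-node topology are assumptions chosen so that the trace matches the example from the talk.

```python
def circ_dist(a: float, b: float) -> float:
    """Distance between two locations on the cyclic address space [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def route_get(peers, stores, start, key, htl=10):
    """Greedy get request with backtracking.

    peers:  dict mapping a node location to the locations of its peers
    stores: dict mapping a node location to the set of keys it holds
    Returns (location of the node that had the data or None, nodes visited).
    """
    seen = set()      # nodes that have already handled this request
    trace = []        # order in which nodes received the request
    remaining = htl   # hops to live, decremented on every forward

    def visit(node):
        nonlocal remaining
        seen.add(node)
        trace.append(node)
        if key in stores.get(node, set()):
            return node
        # Try peers in order of proximity to the key; fall back to the
        # next-closest peer when a branch reports that it has no data.
        for nxt in sorted(peers[node], key=lambda p: circ_dist(p, key)):
            if nxt in seen:       # avoid circular routing
                continue
            if remaining == 0:    # hops to live exhausted
                return None
            remaining -= 1
            found = visit(nxt)
            if found is not None:
                return found
        return None

    return visit(start), trace


# Hypothetical topology consistent with the example in the talk.
peers = {0.90: [0.10, 0.60], 0.10: [0.90], 0.60: [0.90, 0.25], 0.25: [0.60]}
stores = {0.25: {0.22}}
found_at, trace = route_get(peers, stores, start=0.90, key=0.22)
print(found_at, trace)
```

With this topology the request visits 0.90, 0.10, 0.60, and 0.25 in that order, mirroring the dead end at 0.10 and the successful lookup at 0.25.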
I know this is kind of boring, but it's important to understand the routing before we get to the attack.

So this is how put requests work. Basically, they're really similar to get requests. The client initiates the put request, or insert, and routes it to its nearest neighbor. That neighbor checks whether any of its peers has a location closer to the key than its own, and if so, it forwards the request to that peer. If it is itself the closest out of all of its neighbors, it resets the HTL to its maximum and then forwards the request out to all of its peers. This is kind of weird. We think it's mostly for replication, but it's largely irrelevant to what we're doing with the attack. And again, the routing continues until the hops to live equals zero or a circular route happens.

So here's a quick put example. This one starts from our node 0.25, and we're inserting data with the key 0.93. So 0.25 sends to its nearest neighbor, which is 0.60. 0.60 has a peer closer than itself, so it forwards the request to 0.90. 0.90 decides that it's the closest out of all of its neighbors, so it resets the HTL to the maximum and forwards the request to all of its peers.

So now we get into the fun part. The basic idea for our attack is that Freenet relies heavily on a balanced distribution of node locations for storage and routing.
But if there's some way that we could remove the diversity of all these locations, unfair storage responsibility and bad routing start to happen. Basically, this is because if there's a lack of locations close to a data key, all data ends up clustered at a few nodes and all routing responsibilities go to a few nodes. The reason we can do this is that peers can't verify the neighbors of their neighbors. So basically, you can force swapping if you lie about who your neighbors are.

So here are the details of our attack. Basically, we just create a malicious node, or nodes, with a specific location or locations. When a swap occurs with a malicious node, a random location, a good location, is removed from the network, and then the malicious node resets itself to a bad location again. Bad locations can also spread between non-malicious nodes just through the regular swapping protocol. After enough time, we found that even a few attackers can create large storage problems and clusters around certain bad locations.

So here's our example network again, this time showing the attack. In this network, we tell 0.90 to go bad. 0.90 then resets its location to one that we choose. It then forces a swap with each of its neighbors: first with 0.85, and then it resets its location; it forces a swap with 0.10 and resets its location again; then it forces a swap with 0.60 and resets its location again. You can see that it has taken over pretty much all of its peers. Again, we can force these swaps by lying about who its peers are. And although they would swap only with a low probability, we're showing 0.500 and 0.45 swapping just so you can see that swaps do occur between non-malicious nodes and still spread the malicious locations. After that swap happens, the malicious node again forces a swap with the new 0.45, and then we reset its location again.

So imagine storage or routing in this network for any data whose
key is above 0.5. You can see that all routing requests are going to go to the highest malicious location in the network, which in this case happens to be our malicious node, but it could be any of them. This makes a large portion of the storage responsibility and of the routing requests go through one node, which is the problem.

So how did we implement the attack? You don't need to be a leet haxor to write what we did. The code is a set of very minor changes to the actual Freenet code base. The attack nodes follow all the steps of the protocol, except that they lie about who their peers are. We found that over a long enough period, a single attacker can spread malicious locations to most nodes in the network. Using multiple locations and multiple attackers in our attack helps make it go faster and makes the effects more defined.

Okay, so here's the testbed we used. It's an 800-node testbed. We created an overlay topology that conforms to small-world networks as defined by Watts and Strogatz. We monitor the network to find path lengths and to track the swapping of locations. For simplicity, in our network content is stored at the closest node with relation to the key. This is again the assumption for routing, but we just put it at that one node because we don't want to worry about replication and things like that. Also, we set a bound on the storage at each node, which is true in the real world as well.

So here's an example of how our malicious nodes attack the network. The picture on the left is an initial distribution of node locations for 800 nodes. You can see it's pretty well distributed around the circle. In the picture on the right, you can see that there are large clusters around our malicious locations, and there are large gaps where we've basically taken away all the good locations.

Okay, so now the fun examples.
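To see why clustered locations matter under the storage model just described (content stored at the closest node, with a bound on each node's storage), here is a small sketch. It is not the authors' testbed: the node count, capacity, cluster positions, and the first-in, first-out eviction policy are assumptions picked for illustration, with the inserted volume set to one quarter of the total capacity.

```python
import random
from collections import deque


def circ_dist(a: float, b: float) -> float:
    """Distance between two locations on the cyclic address space [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def simulate_inserts(locations, keys, capacity):
    """Store each key at the closest node and return the fraction of keys lost.

    Each node keeps at most `capacity` keys and evicts the oldest one
    (first in, first out) when a new key arrives at a full store.
    """
    stores = {loc: deque() for loc in locations}
    lost = 0
    for key in keys:
        owner = min(locations, key=lambda loc: circ_dist(loc, key))
        store = stores[owner]
        if len(store) >= capacity:
            store.popleft()   # oldest content is pushed out and lost
            lost += 1
        store.append(key)
    return lost / len(keys)


rng = random.Random(0)
nodes, capacity = 100, 100                        # hypothetical network size
keys = [rng.random() for _ in range(nodes * capacity // 4)]

uniform = [i / nodes for i in range(nodes)]       # well-spread locations
clustered = [0.5 + i * 1e-4 for i in range(nodes)]  # all locations near 0.5

loss_uniform = simulate_inserts(uniform, keys, capacity)
loss_clustered = simulate_inserts(clustered, keys, capacity)
print(loss_uniform, loss_clustered)
```

With well-spread locations every node receives only a small share of the keys and nothing is evicted, while in the clustered case the two nodes at the edges of the cluster become responsible for almost the whole key space and most of the inserted content is pushed out.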
These are our data loss examples in our network. It's important to go over how the data loss actually happens. Basically, if you remember the example I showed you with 0.5 as a malicious node: if somebody was inserting data with keys above 0.5, all that data would end up stored at 0.504 with high probability. Once 0.504 runs out of storage capacity, it has to push out data in a first-in, first-out fashion. So that's how the data actually gets lost.

So here's our example. On the x-axis is time; it goes from zero to two hundred, which is roughly five and a half hours. On the y-axis is the percentage of data lost in the network. Also, the attack always starts at 75 time increments. You can see here that after about two hours of attack time, roughly 20% of the data in the network is lost, and this is with only two attack nodes out of 800.

So here's the data loss example, this time with four attack nodes, again in our 800-node network. Exact same scales for everything. Here we have about 30% data loss with four attack nodes.

Okay, so here's with eight attack nodes, which is 1% of the network, and it's pretty drastic. You can see that there's about 60% data loss in the network. Also, the lines above and below are the standard deviation, because we averaged this over multiple runs. So you can see that sometimes it's a lot more drastic and sometimes it's less. This is because we don't choose our malicious nodes in any particular way; we just choose them randomly, so we don't know exactly what the topology looks like when we choose them.

So what are some possible protections that Freenet could use against this?
One thing they could do is check how frequently a node swaps similar locations. But as soon as you define "similar", you're limiting the size of your network, and all peer-to-peer networks want to have as many peers as possible. So that's not a good solution.

Another idea is limiting the number of swaps with a particular peer. But if you cap the number of swaps at, say, five with a certain peer and it's still advantageous for routing to swap with that peer, then you're screwed, so that's not a very good protection either.

Can you determine that a node is bad because its location is really, really close to yours? No, because again, if your network is large enough, you're going to expect to be swapping with people whose locations are really close to yours.

Another idea is secure multi-party computation to compute the formula we showed you earlier. But that doesn't really do anything, because an attacker can always lie about who his friends are, and there's no way to fix that. You can never know who the friends of your friends are in a friend-to-friend network.

So in conclusion, we don't believe that Freenet's routing algorithm is robust enough to be used. Adversaries can remove the diversity of node locations and therefore screw up storage responsibilities and routing responsibilities, and we can cause significant content loss with even a few attackers. We use Freenet's own code against itself, so it's not like we're doing anything really bad; we're just tweaking it a little bit. And since swapping is such a crucial part of the routing algorithm, it's a really tough problem to fix. It's just a fundamental problem. We also think that churn in the network, natural churn, can cause similar location loss. But to find that out, come read our paper; the code is available there.

Okay, thanks everybody for listening. Questions?

Could you repeat that? Are you saying that?
Okay, so the question is: if everybody has a huge data store, does that solve this problem in the short term? Probably. I mean, you could tweak a lot of the parameters. In our network, we're inserting 25% of the total storage capacity of the network. So yeah, if you tweak that, it's going to change things either way.

Essentially, what it mostly changes is how much data loss occurs and how quickly, right? If every node has, whatever, a terabyte of storage space and your overall network only contains a terabyte of data, then you're fine; nothing will happen. If every node only has, whatever, 500 megabytes of storage space and your network contains ten times as much, then your results will be much worse than what we have right now. So the point is, it's really just gradual, and we just picked one quarter as a data point to show what we would realistically expect such a network to be able to handle, right?

It essentially approaches a hundred percent. If you keep it going, it will approach a hundred percent, as you can see in the graphs. I mean, obviously the curve flattens out as the malicious node locations spread, right, and they start spreading slower and slower over time. But they will continue to spread because of the random swaps that the network does. You will eventually take over everything. Even with just one malicious node, you would eventually take over everything.

Questions? You know, how long do you want to run these simulations, and
800 nodes is a lot; we don't have that much processing time.

If anybody didn't hear that, the question was what happens if you run this for a really long period of time. Our answer is that eventually even one attacker can take over all the locations in the network. I mean, given enough time, random swaps are going to happen and all the nodes will become bad.

Yeah, of course, there are other networks that don't suffer from this, but...

So the question here is, I think: is this an architectural Freenet problem, or is it... Okay, and then how general a problem with friend-to-friend networks it is. Well, here it's a really big problem because of the swapping, because they really rely on who your friends are for this equation. In a friend-to-friend network that doesn't rely on swapping for such a big portion of the routing, it wouldn't really be as much of a problem.

And essentially, it's an architectural problem in the sense of the Freenet routing algorithm specifically. The other parts of Freenet are not impacted. But it's kind of essential that you find your data quickly, and where you store your data, and so on. So in that sense, yes, it's the Freenet architecture. It's not a general friend-to-friend problem. I would just say friend-to-friend networks make it incredibly hard to do these kinds of things to begin with, and the answer is just that this is not a solution, at least not a good one.

Yeah, the order of magnitude for routing and the number of swaps, okay. The question was what's the order of magnitude of our algorithm in relation to the network, what we have to do. Well, the answer is obviously that you have to have O(n), as in a linear number of swaps, to take over the entire network, right? And the rest, I mean, this is unit swaps, your neighbors. How long it will take to spread again depends on how often the nodes themselves swap, right? We didn't actually... well, I just, I mean, so...

Any more questions? Okay. Thanks, everybody. Thank you.