Hello, and welcome to BitTorrent Hacks! My name is Michael Brooks, and we are here to... We are pirates, we are hackers, and we like BitTorrent. Our motivation for this project is to improve the protocol. A part of this is to educate people on the weaknesses. Here's my co-speaker, David.

My name is David Aslanian. I'm a programmer, and I test avionics software. I may have worked on some of the products that some of you have flown here with.

This is my trophy room. These are companies I have hacked. This is my milw0rm page. My number is 677.

The BitTorrent protocol is now seven years old. The protocol has become widely successful in a very short period of time. But with this success have come growing pains. The protocol originally envisioned by Bram Cohen is fairly secure. Vulnerabilities have been introduced by people trying to do things with the protocol that it was never meant to do, such as private networks and encryption.

Most vulnerabilities are identified by the CVE, or Common Vulnerabilities and Exposures, system. More than 37,000 vulnerabilities are identified by CVEs. Most CVEs fall into specific categories and are given a Common Weakness Enumeration, or CWE, number. For instance, all SQL injection vulnerabilities are CWE-89. In the current BitTorrent specification, there are three CWE violations. In order to understand the vulnerabilities, you must first understand the protocol.

All right, so I'm going to go over some of the basics of the BitTorrent protocol here. All the data used in BitTorrent communication is bencoded. The two simple data types are shown here. The first one that you see is an integer. The second one that you see is a string. As you can see with the string, the size of the string precedes the colon. Integers can be of any size. They start with an i and end with an e. The next data type is a list. This example shows a list that contains one number and two strings. Pretty simple.
The fourth and final data type is a dictionary. This is very similar to a Python dictionary. The dictionary shown here contains a single integer and two strings. Dictionaries access their members using strings, whereas lists access their members using numerical indexes.

This is a torrent file. The four data types that I just showed you can be used together to create more complex data structures such as irregular trees, as shown here. The first step in starting a torrent is obtaining the torrent file, obviously. The torrent file uses bencoding, as we just described, to build a data tree. A couple of things to notice about this data tree. The leaf nodes in this tree are either integers or strings. Another thing to notice is the announce-list subtree. That's a list of all the trackers that the torrent can connect to. Also notice the pieces part of the info subtree. It contains SHA-1 hashes of all the chunks of the file that will be transferred.

During client-tracker communication, torrents are identified by their info hash, which is a SHA-1 hash of the info subtree of a torrent file, shown highlighted in yellow here. One thing to note is that the list of trackers, as well as some other information about the torrent, can change without the info hash ever changing.

Immediately after opening a torrent file, your BitTorrent client sends a scrape request, which looks like this, to one of the trackers listed in the torrent file. The scrape request is a simple HTTP GET. Then the tracker responds with statistics about the torrent, including the number of people who have downloaded the file and the number of people who are seeding and leeching the file, and that's pretty much it.

The next thing your BitTorrent client will do is send an announce request to the tracker. The announce request sends more information about your client. For instance, in this announce request, the event is started, as you can see, most of the way down there.
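The four bencoded data types and the info hash described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client: the function names `bencode` and `info_hash` are made up for this example, and re-encoding the info subtree only reproduces the real info hash when the original file was encoded canonically (real clients hash the raw bytes of the info subtree as they appear in the file).

```python
import hashlib


def bencode(value):
    """Encode an int, string, list, or dict using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # Integers start with 'i' and end with 'e', and can be of any size.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings are prefixed with their length and a colon.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are strings and are written in sorted order.
        items = [
            (key.encode("utf-8") if isinstance(key, str) else key, item)
            for key, item in value.items()
        ]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def info_hash(torrent):
    """SHA-1 of the bencoded info subtree (hypothetical helper)."""
    return hashlib.sha1(bencode(torrent["info"])).hexdigest()
```

Because only the info subtree is hashed, two torrent dictionaries that differ only in their tracker list give the same value from `info_hash`, which matches the point made above about trackers changing without the info hash changing.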
For example, if you had just completed a download, the event would be completed. Zero bytes have been uploaded and downloaded in this announce request. Another important thing to notice is the port number, which the tracker will use when identifying this client in an announce response.

This is an announce response. What the announce response does is give the client a list of peers that it can connect to. The peers, as you can probably see, are not bencoded. Instead, the list is a binary representation of IP address and port pairs.

At this point, now that the BitTorrent client has a list of peers, it's going to begin a series of handshakes to establish connections with other peers in the swarm. The first packet of a handshake is going to be a UDP ping request, as shown here. This is the response to the UDP ping request. Once the client receives this UDP ping response, the client knows that the remote peer is available and is not firewalled, behind a NAT, or otherwise unreachable. There are many other UDP packets that will be sent before any sort of file transfer begins. In addition, there are extensions to the BitTorrent protocol that allow things like chat to occur after the handshake is complete.

Take a swig of Jameson. Ah!

Client trust. BitTorrent was never meant to be an exclusive protocol. As a result, two vulnerabilities were created. The first one is CWE-602, Client-Side Enforcement of Server-Side Security, better known as client-side trust. With every announce request, the client is expected to report how much it has uploaded and downloaded, and how much is remaining to download. Private BitTorrent networks use these announce requests to enforce the share ratio. For instance, some private BitTorrent networks will impose a 50% ratio, in which you must upload at least half as much as you download. The tools to spoof this information are well known.
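To make the announce exchange concrete, here is a rough Python sketch of the two data formats involved: the announce request's query string and the binary peer list in the response. The helper names, the tracker URL, and the `compact` flag are assumptions made for this illustration. Note that `uploaded`, `downloaded`, and `left` are simply values the client supplies, which is why a tracker that relies on them is trusting the client.

```python
import ipaddress
import struct
from urllib.parse import quote, urlencode


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None):
    """Build an announce URL; every statistic is reported by the client itself."""
    params = {
        "info_hash": info_hash,    # 20 raw bytes, percent-encoded below
        "peer_id": peer_id,        # 20 bytes chosen by the client
        "port": port,              # port other peers should connect to
        "uploaded": uploaded,      # self-reported byte counts
        "downloaded": downloaded,
        "left": left,
        "compact": 1,              # ask for the binary peer list
    }
    if event:
        params["event"] = event    # e.g. "started" or "completed"
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params, quote_via=quote)


def decode_compact_peers(blob):
    """Decode 6-byte entries: 4-byte IPv4 address plus 2-byte big-endian port."""
    peers = []
    usable = len(blob) - len(blob) % 6   # ignore a trailing partial entry
    for offset in range(0, usable, 6):
        ip_bytes, port = struct.unpack("!4sH", blob[offset:offset + 6])
        peers.append((str(ipaddress.IPv4Address(ip_bytes)), port))
    return peers
```

For example, `decode_compact_peers(bytes([192, 0, 2, 1, 0x1A, 0xE1]))` yields a single peer at 192.0.2.1 on port 6881.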
These tools send spoofed HTTP GET requests informing the tracker that the client has uploaded data when in fact no data has been uploaded. RatioMaster is an example of a spoofed client. However, some BitTorrent communities have been able to detect this method. A method that is more difficult to detect is for the BitTorrent client to lie to the tracker, saying that it hasn't transferred any data. In addition, the client never reports that the transfer is completed. From the tracker's perspective, the BitTorrent client is unable to contact other peers in the swarm, which could happen under legitimate conditions. As a result, the bandwidth ratio stays constant. The BitTorrent client that is able to accomplish this is called Stealth Bomber. I created it by modifying the original Python BitTorrent 5.22 client written by Bram Cohen. Stealth Bomber is open source and is on the DEF CON CD.

The next vulnerability is immortal sessions. All private BitTorrent communities are vulnerable to this. It is CWE-384, which ranks in at number 7 on the OWASP Top 10. All BitTorrent clients authenticate with private trackers using a session ID sent as a GET parameter. These values cannot be changed because there isn't a system in the protocol to notify the BitTorrent client of a new session ID. The session ID is a part of the announce URL in the .torrent file. One could exploit this vulnerability by sniffing the network for announce requests or .torrent files. Another tried-and-true method is XSS in the web application. Due to this session fixation vulnerability, SQL injection in the tracker leads to immediate access without the need to break the password hash. The most popular private BitTorrent community codebase is TBDev, and in it the MD5 password hash is also the immortal session ID. Password hashes should always be kept secret. The hashing of passwords is meant to be a last line of defense after the database...

Thank you, thank you. More Jameson! More Jameson! Oh, where was I? All right. DMCA bots.
How do you get busted? Just out of curiosity, how many people in the crowd have gotten a DMCA violation? Quite a few of you. When I asked that at Black Hat, no one raised their hand. I'm like, ah, you guys aren't pirates.

Another client-side trust issue is that the number of seeds for a torrent is calculated by looking at the number of requests which have recently set the event to completed. It is trivial to inflate this number by forging announce requests. Just because a torrent has a lot of seeds doesn't mean it's good. In fact, it could be a trap laid by a DMCA bot.

Announce requests are important to anti-piracy software. These automated bots will download torrent files of the copyrighted material in question and periodically contact the announce server, obtaining a list of IP addresses. The DMCA bots will connect to you pretending to be a normal BitTorrent client, attempting to download a chunk of the copyrighted material as evidence to use in court. PeerGuardian is a popular peer-to-peer plugin that prevents you from connecting to known DMCA bots. PeerGuardian is similar to a real-time blacklist used to fight spam. I don't like PeerGuardian at all. IP addresses are very cheap and proxy servers are free.

Unfortunately, the safest method to avoid anti-piracy bots is to use private BitTorrent networks. Demonoid is invite only. However, Demonoid users will get DMCA violations. This is because the community is too open. Other communities such as waffles.fm and what.cd are very exclusive. I don't think anyone in these communities will ever get a DMCA violation.

The third vulnerability to affect BitTorrent falls under CWE-300, Channel Accessible by Non-Endpoint, better known as man in the middle. This is a vulnerability in the Message Stream Encryption protocol used for BitTorrent encryption. This is not the first security problem that affects MSE. A company by the name of ipoque is able to detect and throttle MSE traffic using deep packet inspection.
Furthermore, a research paper entitled Attacks on Message Stream Encryption goes over many problems with MSE. The biggest problem with this paper is that many of the attacks discussed are highly theoretical and may not be possible in the real world. To make matters worse, no source code has been provided to prove or verify any of the attacks presented. I think this is a common problem with academics. They like to talk about security problems, and they don't write exploit code. On the DEF CON CD is proof-of-concept code that I wrote to show that man-in-the-middle attacks on MSE are possible. So here it is.

This is MSE in short. MSE relies on three cryptographic primitives. The very beginning, before the dot dot dot, is the Diffie-Hellman key exchange. This is to produce a shared secret between the two users. This secret is then concatenated to the end of the string keyA. Finally, the info hash of the .torrent file is concatenated to the end. Then the SHA-1 hash of this entire string is used as the key. What is lacking from this protocol is authentication. The info hash is sent in a GET request, usually in plaintext, and can be obtained. Or often you're downloading torrents from The Pirate Bay, in which case all info hashes are known. This is not a reasonable method of authentication. This protocol is very similar to unauthenticated SSL. Both can use DH key exchanges, both can use RC4, and most importantly, they're both vulnerable to man-in-the-middle attacks.

So, speaking of unauthenticated SSL, all of these BitTorrent clients whose icons are listed have a completely broken SSL implementation. When communicating over SSL, they don't even bother to check the validity of the certificate. What you're seeing right here on the bottom is a pop-up from Vuze. What it originally said is basically something like, would you like to trust this invalid certificate? And what they're really saying is, do you want to get owned?
And I think, unfortunately, the answer for most users would be yes, because most users aren't going to understand the subtleties of what this pop-up box even means. Vuze also fails to check the certificate; instead, it claims that all certificates are invalid, which is just as bad. These are not problems in the SSL library. They're problems in how the SSL library is being used. This is like The X-Files: trust no one, not even your certificate authority. Very powerful security tools like SSL can be used completely incorrectly, even if the libraries they're using are completely secure.

So, while working with the Python BitTorrent client, I found some awesome code comments. The first one is, this function is a boat made of shit floating on a river of shit. The next one is, Azureus sucks equals choke, I'm not shitting. Yeah, great stuff, great stuff. I don't know if Bram Cohen wrote that, but it'd be funny.

Okay, attack surface. So, what is the attack surface of an application? The attack surface is any code the attacker is able to influence through data. The data structures I've shown you, .torrent files, scrape requests, announce requests, as well as the handshake between peers, are all data that touches BitTorrent code. Web applications have a very large attack surface, whereas that of GUI and desktop applications is usually very small. There are also inner workings of the application that aren't attack surface at all, such as cron jobs, scheduling, and logic to support configurations. It's best to think of an application as a fortress, but hackers make software into jumping castles. Oh, missing a slide. No, there we go, fun.

Any data that the attacker controls is called tainted. If a variable containing tainted data is passed to a sensitive function, that is, a sink, then a vulnerability is created. In this example, the taint is coming from the command line using scanf. We're reading in 100 characters, and then strcpy is being used as a sink, and that is triggering a buffer overflow.
Another example is in PHP with MySQL. In this case, the taint is a GET variable, and mysql_query is the sink. Input validation is a way of taking tainted data and making it safe to use with a sink. In this case, we're using intval on the GET variable before it enters the query, making sure that it's a number.

All right, so we did a lot of fuzzing with BitTorrent clients, and we were looking into any tools that are already out there to help us fuzz this application, to fuzz a variety of applications, and one of the tools that we ran into was Peach. Peach is a popular fuzzing framework, and let me digress for a little bit here.

What you're looking at is another .torrent file displayed as a tree. Our fuzzing framework is not Peach, but it uses Peach. What our fuzzing framework does is traverse this tree with a recursive function, and whenever it finds a leaf node, it inserts juicy fuzz data in place of the original data. For example, once we get to the length node under the info subtree, we see that it's an integer, and then we query Peach for nasty integers that we can insert into the torrent file at that point, and that becomes one fuzz test case. Given just one torrent file like this, we can generate over 150,000 fuzz test cases by using this method. I should note that the Peach fuzzing framework does not have an API to be able to say something like, I want an integer that would be great for fuzzing, or I want a string that would be great for fuzzing. We basically had to use a debugger to dig into Peach and find out where we could get that sort of data.

At this point, I think it's important to make the distinction between fuzzing data and fuzzing decode routines. This is applicable not just to BitTorrent. In the example at the top here, you can see data being fuzzed. The encoding is correct for the first two examples under fuzzing data. However, the data itself is what's being fuzzed.
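The leaf-substitution idea described above can be sketched roughly as follows. This is an illustrative Python reconstruction, not the speakers' framework: the function names are hypothetical, and the two short lists of "nasty" values are stand-ins for the much larger sets they describe pulling out of Peach. It operates on an already decoded torrent tree; each generated tree would then be bencoded correctly and handed to the client, so only the data is fuzzed and not the encoding.

```python
# Placeholder fuzz values; the talk's framework obtains these from Peach.
NASTY_INTS = [0, -1, 2**31 - 1, 2**31, 2**32, 2**64, -(2**63)]
NASTY_STRINGS = [b"", b"%n%n%n%n", b"A" * 5000, b"../" * 20]


def leaf_paths(node, path=()):
    """Recursively yield (path, leaf) for every integer or string leaf."""
    if isinstance(node, dict):
        for key in node:
            yield from leaf_paths(node[key], path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from leaf_paths(item, path + (index,))
    else:
        yield path, node


def replace_at(node, path, new_value):
    """Return a copy of the tree with the leaf at `path` replaced."""
    if not path:
        return new_value
    head, rest = path[0], path[1:]
    copy = dict(node) if isinstance(node, dict) else list(node)
    copy[head] = replace_at(node[head], rest, new_value)
    return copy


def fuzz_cases(tree):
    """Yield one mutated tree per (leaf, fuzz value) pair."""
    for path, leaf in leaf_paths(tree):
        candidates = NASTY_INTS if isinstance(leaf, int) else NASTY_STRINGS
        for value in candidates:
            yield path, replace_at(tree, path, value)
```

The number of test cases is the sum, over all leaves, of the number of fuzz values for that leaf's type, so a torrent with many leaves and a large fuzz value set multiplies out quickly.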
In the first case, we're fuzzing for format string vulnerabilities. This is in contrast to actually fuzzing the routines that would decode this bencoded data. In the first example on the bottom of the slide, you can see that we corrupted the size of the first string, tiny, to see what would happen when the decode routine comes across an invalid string size. Below that, it's the same thing, just vice versa.

One important thing to note is that when you find a memory corruption vulnerability at a lower level, at the encoding level, the vulnerability will appear in multiple places in the application. For instance, bdecoding routines are used when you're opening .torrent files, when you're communicating with the tracker, and during handshakes. All three of those interfaces would be vulnerable to the buffer overflow, and it would be up to the hacker to choose where to exploit it.

Our approach to fuzzing is a little bit different from Peach's approach to fuzzing, as you may already have inferred. What you're seeing here is what they call a Peach Pit file. Peach is good because it has a lot of pre-made tools that are useful for fuzz testing a variety of different types of software. Peach is bad because you still have to know a lot about the protocol in general, or the file type that you're fuzzing. It can take a lot of time to track down the documentation, or if there is no documentation, you might even have to reverse engineer the protocol before you can even start this process. Creating a Peach Pit file, this XML file shown here, is really time-consuming and can be somewhat difficult. What this XML file defines is the type of data you're fuzzing, where the data should go, and how the fuzz test should actually be run. The biggest reason we specifically can't use Peach is that it's not good at modeling irregular trees, such as all of the bencoded data types that we've shown you thus far.
It has the ability to represent binary trees, but that's hardly ever useful in the real world.

Now we're going to talk a little bit about code coverage. All of these pictures, all of these products, have something in common, and that is that they're all safety-critical applications. Code coverage has likely been used in the testing process of each of these products, even the robot from RoboCop, obviously. The reason is that code coverage is really good at finding certain types of bugs. It's really good at quantifying how good your tests are. For example, in safety-critical industries like the aerospace industry, you have to show that your code base was thoroughly tested. What code coverage does is provide visibility into what your tests are actually doing, and it also provides a way to indirectly quantify the quality of your testing. This is really important in fuzzing, because fuzzing can be like shooting in the dark. You don't really know what your fuzz tests are doing as a whole, and code coverage can provide much-needed visibility into which fuzz inputs are actually covering the parts of the application that you're interested in, basically your attack surface.

So what is code coverage? Code coverage is a form of white box testing. It provides visibility into how the code that you're running is being executed. From an attacker's standpoint, it allows you to test more of your attack surface. It allows you to identify your attack surface and attempt to increase how much of that attack surface you're testing.

There are many different types of code coverage metrics, and there are many different ways that you can get code coverage data. For example, we have statement coverage, which is pretty simple: how many statements were executed? We have decision coverage, which is how many control flow structures, like if statements, took both their true and false paths. We have condition coverage, which is how many conditions within each decision evaluated to both true and false.
In this context, a condition refers to a Boolean sub-expression, say, within an if statement. Then the new concept here is sink coverage, which is how many sinks have I executed, and we're able to infer this metric from statement coverage. What we do is look at the output from a source code analysis program, for example the open source analysis program RATS, and determine what percentage of the sinks that we're interested in have been executed during a run of a specific test or a set of tests.

Just to provide a graphical example, this is a statement coverage report produced by LCOV. LCOV is an HTML front end to the gcov application, which comes with GCC.

So there are a couple of caveats that you have to be aware of when trying to interpret code coverage metrics. One of them is that just because you've executed code does not mean that you've tested code. In this example, with user input of less than 10 characters, you can execute this code without the buffer overflow being discovered. In addition, in this case, you're only going to be able to find the null free that exists in this code snippet if the condition is false, in which case you'd have less than 100% code coverage. So 100% code coverage really doesn't mean anything. It's how you interpret these metrics that makes a difference.

So this is basically the same example, except it's with decision coverage. You can completely execute this code with the two test cases listed without ever discovering the buffer overflow in the strcpy.

So there are a lot of code coverage tools out there, but there are a lot of problems with the code coverage tools that are out there. A lot of them are short on features. They don't integrate source code analysis, which is really useful whether you're a software tester or an attacker. They don't allow you to add, subtract, or compare coverage reports.
They don't allow you to automate the process, and this is really a big problem.

So how can we use code coverage with fuzzing? During DEF CON 15, Charlie Miller came up with a pretty cool concept that says you keep adding smart fuzz templates until the cumulative coverage does not increase. We built a tool to automate this so that we can have a set of smart fuzz templates to base all of our other fuzz test cases on. We also integrated source code analysis into this tool so that we could have an idea of what dangerous function calls were or were not being executed during our fuzzing. So, I'm going to give it to Mike from here.

Charlie Miller's a doctor. He has a doctorate. Man, I would go to any... But he actually doesn't use Doctor in his name, and I asked him why, and he said because it was too pretentious. I have to agree with him. But hey, if there was a talk by Dr. Doom, even if it was a cooking talk, I would be there, cooking with Dr. Doom.

I'm going to show you some smashed stacks. All right, so this is the fuzzing application. I've set it to a very special place in the test cases, and I hit enter. That's the stack trace, and more took out. It's actually multi-threaded, and I have about 32 threads for it. Fucking segfault bitches. This hit at like 2 a.m. I did this on purpose. Not all of them are crashing, but a lot of them are. I was screaming and jumping up and down with excitement, and the apartment above me, they were like, they're hitting the floor, quiet down. They're just like, yeah. Anyway, got to call all Python.

Anyway, thank you for seeing our talk. This is the end of BitTorrent Hacks. Thank you.
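The two coverage ideas from the talk, sink coverage and adding fuzz templates until cumulative coverage stops increasing, can be sketched as set operations. This is a minimal Python illustration under stated assumptions, not the speakers' tool: the function names are hypothetical, coverage is modeled as a set of executed line identifiers per template, and the greedy "largest gain first" ordering is one possible way to apply the stopping rule rather than the method the talk necessarily used.

```python
def sink_coverage(executed_lines, sink_lines):
    """Fraction of flagged sink lines (e.g. from a static analyzer) that ran."""
    sinks = set(sink_lines)
    if not sinks:
        return 0.0
    return len(sinks & set(executed_lines)) / len(sinks)


def select_templates(coverage_by_template):
    """Keep adding templates until cumulative coverage no longer increases.

    `coverage_by_template` maps a template name to the set of lines it executed.
    Returns the chosen template names and the cumulative covered set.
    """
    remaining = {name: set(lines) for name, lines in coverage_by_template.items()}
    covered = set()
    chosen = []
    while remaining:
        # Pick the template that adds the most lines not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining template increases cumulative coverage
        chosen.append(best)
        covered |= gain
    return chosen, covered
```

Modeling coverage as sets also gives the report operations the talk says existing tools lack: union adds two reports, difference subtracts one from another, and intersection compares them.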