Good afternoon. This is HiveMind. We are looking at distributed file storage using JavaScript botnets. I am Sean Malone, principal security consultant at FusionX. We are definitely hiring. FusionX needs a little bit of an introduction though, so let me tell you a bit about what we do. We do a combination of penetration testing, red teaming, and sophisticated adversary assessments. Basically, we assess your entire organization, not just a particular network or system or application. So if that sounds like something you'd be interested in, hit me up after the talk. The problem we're looking to solve here is that even when using encryption to store sensitive data, the data is still present. It's simply encrypted. And if it's encrypted in a way that we can recover it, then someone else can force us to recover it for them, whether with a court order or a five-dollar wrench. So encryption is not always going to be enough. If we can't simply store the files encrypted on our own systems, what can we do? The first thing that comes to mind: store the files on someone else's system. That way if your system is seized, the files aren't there. The problem is that that's usually illegal. So what I want to do is look at a way to do that with standard functionality, in a way that's at least less illegal, mostly legal. The way we do this is standard functionality, no exploits. We're just using some tips and tricks and looking at the standard features in web browsers. What I mean by this is that all of the features my technique uses are used in real web applications. So there's nothing to patch. Removing these features would break modern web applications. That's a great advantage here, because this is something that's going to work for the foreseeable future.
It's not something that is only going to work until some vendor patches a particular vulnerability. First a disclaimer though. This is a research project. I'm not responsible for what you do with this software. It's not intended to be used to store critical data at this point, though the concept should be able to get there eventually. Also, I'm not a lawyer. Nothing in here is legal advice, and I'm not responsible for anything legal or illegal that you choose to do with this software. Web browsers have undergone some significant changes in the last 15 years or so. We started off with the most basic form of client-side storage, the browser cookie. We had JavaScript for data processing, and Ajax, asynchronous JavaScript and XML, for that back-end client-to-server communication. That's changed recently with the advent of HTML5 features. We have all of those older technologies still present in the browser, but they've all been upgraded. Now we have web storage to store larger amounts of data in the browser. We have web workers that can spin off JavaScript threads separate from the main GUI thread, so you can do a lot more processing without gumming up your application. And we have web sockets, which create a persistent socket from the client browser back to the server. The end result here is that a web browser is basically a computer program that will communicate back to my server, execute any arbitrary code that I hand it, and store any arbitrary data that I ask it to store. Sounds like a botnet node, right? You might ask, what about sandboxing? Doesn't that make it impossible to access system data or execute code on the system? Yes, it does. That's the purpose of some of the browser security improvements, but the short answer is I don't care about that. I don't need to do anything outside of the normal browser security model. I'm simply running code in the context of the domain that loads the code and accessing data that I've stored on that same domain.
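As a rough sketch of that node-side role: the code below stores and enumerates blocks through the web storage interface. The key names and helper are hypothetical, not the released code; in a browser, `storage` would be `localStorage`, but anything with the same `getItem`/`setItem`/`key`/`length` interface works.

```javascript
// Minimal sketch of a node's block store, assuming a web-storage-like
// object. Key prefix 'block:' is an illustrative convention.
function makeNodeStore(storage) {
  return {
    // Store a block pushed down from the server.
    putBlock(id, data) {
      storage.setItem('block:' + id, data);
    },
    // Return a stored block so it can be sent back to the server.
    getBlock(id) {
      return storage.getItem('block:' + id);
    },
    // List the block IDs this node currently holds, for the heartbeat.
    heldBlocks() {
      const ids = [];
      for (let i = 0; i < storage.length; i++) {
        const key = storage.key(i);
        if (key.startsWith('block:')) ids.push(key.slice('block:'.length));
      }
      return ids;
    }
  };
}
```

In a real node this would run inside the hidden iframe, with the server pushing block data over the web socket and the heartbeat reporting `heldBlocks()`.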
So it's all on the same origin, it's all within the browser security policy. Again, these are features, not bugs. So let's look at what it takes to actually build a botnet on top of web browsers. The first step in building any botnet is going to be the node infestation. How do we actually get our code running on the node? How do we take control of that particular node? The first and most obvious technique is to simply use a site that you own. If you own a site that's getting a thousand hits every five minutes, then you have the capability to execute whatever code you want on a thousand different web browsers every five minutes. That's a lot of power. Most sites don't do anything with that, but there's definitely the potential there. Next one is compromised sites. Any time there's a persistent cross-site scripting vulnerability, where we can store a piece of JavaScript on the site that is executed every time somebody visits that particular site, we can include every visitor to that compromised site in our botnet by adding that piece of persistent JavaScript onto the compromised site. URL shorteners are a fun one. Normally you have a URL shortener that simply redirects to the target, but what if we instead load a full-screen iframe showing the intended URL, and in the background we have a second iframe that is running our botnet code? You can use ad distribution networks. There was a great talk at Black Hat this year about various ad distribution networks where, instead of distributing an image, you can actually give them an iframe source and they'll put an iframe on the target pages that then sends traffic back to your site. The intent is to use this for SEO, page-rank type things, but if you have people going to your site you can make them a member of your botnet. My personal favorite is the anonymous proxy server. I stood up an anonymous proxy server, just an open anonymous proxy listening on port 80.
I stood this up a few weeks ago, let it just sit there, didn't advertise it, didn't solicit traffic at all, and right now it's getting hit by about 20,000 unique IP addresses every 10 minutes. This is completely unsolicited traffic. I never promised to do anything with this traffic. I never promised to return any particular content. I never promised that the page I return is the actual page they request. Usually it looks a lot like the page that they request, but it also has an iframe in it. So it's another great way to build a botnet very easily and very quickly. Command and control is done through the HTML5 web sockets. This quote here is from the official working group publication on web sockets: "to enable web applications to maintain bidirectional communications with server-side processes." That could have been written with botnet communication in mind. That's exactly what you want for your command and control channel. When that doesn't work, you should always have a way to fall back to Ajax. Older browsers don't support web sockets, and sometimes when you're going through proxies, web sockets and proxies don't play nicely, so it's always good to have that additional fallback there so you don't lose your nodes. Data storage is done through HTML5 web storage. Again, a quote from the working group publication. The part that I like here is "web applications may wish to store megabytes of user data." What they really mean is megabytes of application data, megabytes of whatever the application server decides to push down to the client. So I'm making that megabytes of my data being stored on all of these different browser nodes. The back end is a Ruby on Rails application with a MySQL database for the Active Record database abstraction layer. In addition, I'm running a Redis server as well. Redis is an in-memory key-value store that has some nice features for what we're doing here. Redis by default has persistence.
It writes to disk, but you can disable that, meaning when the power is pulled the Redis values are gone. And you can also expire particular keys. So say you're uploading a file and splitting it into blocks. If those blocks temporarily live in Redis, you can simply set a key expiration there and those blocks disappear after a particular time. It gives you a sort of time-to-live for the blocks of all the different files. So that's what it takes to build a JavaScript botnet. We're going to be using this JavaScript botnet for data storage, but there's definitely more that we can do with this. Other fun botnet uses would be network scanning, simply checking to see what ports are open. And again, all of this is coming from your nodes. This does not show as coming from a source IP address of your command and control server. DDoS attacks are another fun one. And data processing with web workers: anything that you can break up into a relatively discrete task, you can push down to these nodes and have the nodes do all of the heavy lifting for you, so long as you can write it in JavaScript. JavaScript is not going to be nearly as efficient as something like C, but consider that you can spin off multiple threads, so you can have four different threads running on four different cores if your node is a quad-core system. And if you can do this on, say, a persistent cross-site scripting vulnerability on a popular viral video or something, that's a lot of processing power there. And it's free. Now we have the botnet; let's look at what it takes to actually build a file system on top of that botnet. First, a few definitions. A file block is what I'm using to refer to a piece of an uploaded file that has a set maximum size. So a file is going to be made up of multiple file blocks. A node is simply any web browser that's a member of the botnet.
And the server is the central command and control server that also serves as sort of the phone book for these files. It's the directory of what files have been uploaded and where all of those different file blocks live. So when we're storing a file, we upload the file through the web application just like any other web application. And it is going to need to live on the server for a very short period of time while we execute the following steps. We break this file into the name, the MIME type, and the data. We take all of this and put it into basically a JSON encoding, so it's a simple string at that point, and encrypt that. This is just a simple additional step of AES encryption, so that when we push these blocks down to the nodes, the nodes can't see the actual data in the file. The end result is the encrypted data, which is a base64 string. We split that into a bunch of different file blocks: simply take the first 1024 characters, pull those off into a block, then the next 1024. All of these elements are tunable, so there's no particular reason that I'm using 1024. Depending on the particular file and the reliability of the nodes, you may want smaller or larger block sizes. Sounds like it's time for a quick break. All right, what's this called? Shot the Noob. All right, it's really hard to get accepted for a talk here at DEF CON. So congratulations to our new speaker. Very competitive. All right, I need someone from the audience. Over there. Come on up. Yep, you. First time speaker, first time at DEF CON. Right, there you go. I don't even say it anymore. All right, to our first time DEF CON attendees and speakers. Rough night. Not doing well. Three shots this hour. You can just stay there for the rest of the talk if you need to. So we now have file blocks from our uploaded file. The next step is storing those blocks in our botnet. B1 represents a particular block, block one from our uploaded file, that is living on the server.
We're going to pull in a certain number of nodes from our botnet. We randomly pick a certain number of nodes that have checked in with us in the last minute or so, so we know that they are online. We push this block down to those nodes. And so now the block lives on the nodes and does not live on the server. The server keeps track of which nodes have that particular block, and it keeps track of the checksum for the block, but it does not keep the block data itself. Now, this is going to be a very transient botnet. As nodes come and go, these particular nodes may only be online for another few minutes, maybe even another 30 seconds. So what we do is a constant heartbeat, where every 5 or 10 seconds, depending on how you have this tuned, the nodes check in and say, hey, I'm a node. I'm still online. Here's my node ID. Here is the ID and checksum for each block that I have stored in my browser local storage. Eventually some of these nodes are going to go offline, or the data is going to be corrupted, either intentionally or unintentionally. We have to keep in mind that we can't trust the nodes here. Somebody running a node could be intentionally modifying the data. So once the number of live, confirmed-good nodes drops below a certain value, we replicate. We pull in a set of new nodes that do not currently have this block. The server sends a query down to the existing good nodes, pulls that block back up to the server, and distributes it to the new nodes. So we're back up to that safe level of replication to ensure that we don't lose the block. We have to go through the server. We can't do this in a strict peer-to-peer fashion, because JavaScript can't actually open a port from within a browser and listen for an incoming connection. From my perspective, it would be great if we could, but it's not such a great security move. Retrieving a block looks very similar.
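Before getting to retrieval, the replication decision just described can be sketched server-side. The thresholds and names here are illustrative, not the released code; the confirmed-good set would come from heartbeats whose checksums matched.

```javascript
// Illustrative thresholds: replicate when confirmed copies drop below
// MIN_REPLICAS, back up to TARGET_REPLICAS (the talk suggests 10/20
// for a production botnet).
const MIN_REPLICAS = 10;
const TARGET_REPLICAS = 20;

// holders: node IDs the directory lists for this block.
// liveGood: Set of node IDs whose last heartbeat confirmed the checksum.
// allLiveNodes: node IDs that checked in recently.
// Returns null if healthy, {lost: true} if no good copy remains, or a
// plan naming a node to fetch from and fresh nodes to push to.
function replicationPlan(holders, liveGood, allLiveNodes) {
  const confirmed = holders.filter(id => liveGood.has(id));
  if (confirmed.length >= MIN_REPLICAS) return null;      // healthy
  if (confirmed.length === 0) return { lost: true };      // block is gone
  const candidates = allLiveNodes.filter(id => !holders.includes(id));
  return {
    lost: false,
    fetchFrom: confirmed[0],
    pushTo: candidates.slice(0, TARGET_REPLICAS - confirmed.length)
  };
}
```

Note that when the server is seized, `allLiveNodes` empties out, every plan degrades toward `{lost: true}`, and this is exactly the failure mode the design wants.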
The server simply sends out a query to all of the nodes containing a particular block, saying, hey, please send me this block, and the nodes send it back up to the server. All of the nodes will send it back up. The server does a checksum verification on the server side to make sure that what it's getting back is what was actually stored. And then it stores that temporarily in the Redis data store, and it puts it in there with an expiration of, say, 20 seconds. So all of the blocks are requested, and they're stored locally in memory on the server for that time to live. This lets us rebuild the file. We've requested all of these blocks back from the nodes; we simply concatenate them and rebuild the encrypted data. The password is provided at this point by the user, and the decryption is then done, providing us with the name, the MIME type, and the actual file data. We rebuild that into a particular file and provide it as a download to the user. From the user's perspective, once all of this is set up and running, it's very simple: provide a file and a password, upload the file, come back later, provide that password, download the file, and have the file back on your system. But in the meantime this file has been distributed across all of these different nodes. So getting back to where we started this talk, we want to do this so that the file is not living on the server itself. So when everything goes wrong, here's what happens. Pick your favorite three letter agency. They come in and seize this server because they've heard that you're storing some sort of data that they want to know about. What happens when they seize the server is that the server goes offline and the nodes go offline. They're no longer connecting to the command and control server. In this case, the block replication is going to fail because the nodes are going offline, and they're all going offline.
The server isn't getting that heartbeat. The blocks aren't being replicated to new nodes. The end result is that the blocks are lost. And when those blocks are lost, the server no longer has a correct phone book. The phone book for those blocks is out of date. It doesn't know where to find those blocks if you want to go back and download that file. So the end result is that the files are unrecoverable. Now let me be clear on what I mean by unrecoverable here. Practically speaking, it's not feasible to recover the file. It is definitely possible to go out and seize all of the nodes, or at least a critical mass of the nodes in the botnet, but that's going to be at least an order of magnitude more difficult than simply seizing a server and getting a court order for the owner to decrypt the data on that server. It's also possible to poison the botnet: if you're part of this three letter agency, you inject enough of your own nodes deliberately into the botnet, log all of the block data, and then rebuild the file after you seize the server. You have the additional layer of encryption here, but as we talked about, sometimes that's not enough. So the only real protection that you have against this is to have a sufficiently large botnet that it would be difficult to seize every node. There's also a certain element of security through obscurity here, in that you have to know that this is how the files are being stored before the server is seized. You can't go back afterwards and inject nodes once the server's gone offline, because those blocks can't be recovered in order to be replicated to your nodes. Obviously, if the server itself is compromised, and I mean compromised instead of seized, so if that three letter agency is able to access the server without the server going offline, then they can issue that rebuild command and intercept the file on the server itself.
So there are definitely some limitations to be aware of, but there's always going to be that security-usability trade-off, and I think that what we have here provides a drastic increase in security, in that it is significantly more difficult to recover the file if you're looking at a server seizure situation, but it's still very usable from the end user perspective. So there are some interesting unanswered legal questions here, and I'm deliberately labeling these unanswered. I have my own personal opinions on these, but I think there's still a lot of unknowns here. The first one is: is this legal? I'm calling it mostly legal. There are definitely legal ways to build the botnet, such as if somebody's going to a site that you own. But consider the very act of storing a significant amount of data that's unnecessary for the functionality of the site, data the user never intended to download. Is that legitimate, or does it constitute unauthorized use of a computer? And the same question applies to bandwidth and processing power, because any time we're doing that heartbeat and the block traffic, we're using bandwidth and we're using processing power as well. This is even more true if we're doing an actual data processing botnet with web workers, and the bandwidth question is even more pressing if we're, say, conducting some sort of high traffic application using those nodes. I look at this and say, you know, this sounds a lot like an animated Flash advertisement. If you go out to a particular site and they push down a Flash advertisement, it's additional bandwidth when that ad is pushed down. It's additional storage, at least temporarily, in the browser there. And it's definitely additional processing power. So we're talking about a difference in quantity as opposed to quality.
My opinion is that legally it's acceptable, because somebody did deliberately go to that site, and when you go to a site there's sort of an implicit assumption that you're going to download and execute in your browser whatever that site gives to you. There's not an opt-in for each and every component. But it is unanswered. I'm not aware of legal precedent in this area. From the other side, what if you're storing data without encryption or without any form of encoding, and so it turns up in a forensic search of one of the nodes? So somebody's running their web browser, happens to become a member of the botnet, and you push down data. If their system is later analyzed forensically and this illegal content shows up on their system, that's going to look pretty bad for them. So are you responsible for data that a site you deliberately went to loaded in a hidden iframe and pushed down onto your computer? Are you responsible for that data? I don't know. Demo time. So we'll start off showing the node side of things. This is my personal website. I'm loading it through this proxy here; I've got it running with FoxyProxy. And if we look at the source for the site, most of this is normal source, but down at the bottom, right before the closing body tag, we've got this hidden iframe. This is done through a simple nginx proxy, and there's a Lua rule in there that simply does a find and replace on the body content of the response, replacing the closing body tag with the iframe followed by the closing body tag. It's really simple, it's very efficient, and it pushes out iframes to thousands of different nodes. On the console side of things, we see all of these different requests going back and forth. The check queue, and again, I've had this fall back to Ajax because we're going through the proxy, and it's also easier to see, because some of Firebug's debugging features haven't really caught up with persistent WebSocket connections.
So these POST requests for check queue are basically asking: is there anything that I need to do? Are there any blocks that you need me to store? Are there any blocks that you need me to send back to the server? The heartbeat here, let me see if I can grab one of these. So the POST data here is simply the block ID, that's the file block's UID, and the MD5 checksum for each of those file blocks. So these are blocks that are currently being stored on this node. It does that heartbeat every so often to just let the server know, hey, I'm still here. These are the blocks, these are the checksums. However, if I close down Firebug, you see my pretty face and no traffic there. So it's all completely transparent in the background. Here's what the C2 server interface looks like. Again, this is a Ruby on Rails application. We've got a simple interface showing the files that have been uploaded. And there's a separate page here for the nodes. So this is the list of nodes that have been active within the last minute. In order to retain a little bit more control over this particular demonstration, I'm not having this run with thousands of different nodes. This is just from a few IP addresses and systems that I control. The last updated time is simply the last time we've heard from the node. The UID is something that we store in a cookie on the node to keep track of which node is which, and then we correspondingly use that in the Redis data store for tracking which blocks live on which nodes. So let's take a look at what it takes to upload a file. We simply put in the name of the file, put in a password, choose a file that we're going to upload, and go ahead and upload it. Basically the same as any other web application file upload. The file itself is assigned a UID for directory tracking purposes. We go over to the detail view. We see it's got this file name that we assigned it, but the original file name is encrypted with the file data and stored out on the nodes.
Here's the listing of all of the file data, with each of the file blocks and then the nodes that each file block lives on. At this point we've got the replication set to four nodes. In a production botnet you definitely want to have that set higher: say, distribute across 20 different nodes and, if it drops below 10, replicate until you're back up to 20. There's a large number of blocks here because I have my block size set relatively small. Again, all of this is tunable. When we go into the fetch dialog, we put the password back in and go ahead and fetch the file. It loads all the different file blocks and, looks like I typoed it. I may have typoed it when I created it. There we go. All right. And this is a real-time loading bar here, in that it's actually showing which blocks we have and which ones we're still waiting on. So as it goes across, it's showing we've sent out the request and more and more blocks are coming in. When it gets to the end, we finally have all of the blocks. The file's ready. We concatenate, decrypt with that password that we just provided, and the file's downloaded. Yes, we want to keep this file, and now we're able to view our data that's getting more and more dangerous to be caught with. I am going to be releasing the code for the botnet itself: both the nginx side of things, which is basically an nginx configuration file, that's all there is to it, and the web application side of things with the Ruby on Rails application. Again, it's a research project. It's not the most stable software out there. But you'll at least be able to see how I do things, how I track the blocks. All of that is going to be available. Code will be on GitHub, but it will be linked to from my personal site, and the slides will be up there as well, as well as a video of the presentation. With that, I'll open it up for questions.
I think we have two microphones at two different locations in the room here, so if we could use those to make sure I can hear you, that would be great. Yes. Hi. Hello. I wanted to ask you what happens if the three-letter agency seizes your system while it's still operating? Still connected to the net? So if they seize it and take it offline, the replication fails. Right. If they keep it online, if they are able to take control of the operating system while it stays online, then they would be able to rebuild the file. So you want to take the normal physical security measures to make it as difficult as possible for them to take control without actually unplugging the system, or at least disconnecting it from the network. Thanks. I'm wondering, if the internet connection to the server goes down, does that mean all your files disappear with it too, because now all nodes are disconnected? Correct. If the internet connection goes down, if the nodes can no longer talk to the server, then the data replication fails and the blocks are lost. If it comes back online quickly enough, probably within five minutes or so, you'll probably have enough nodes left that you can recover the data, but it's not guaranteed. So the purpose of this is definitely to store data where it is better to lose it entirely than to have somebody recover that data, decrypt it, and be able to pin it on you. Thank you. Over here. What about the file size limits of what the browser will let you store? Yes. So each node is generally able to store roughly five megabytes of data without prompting the user, and we definitely don't want the user to be prompted to allow more data. But that's five megabytes per node. So if you have, say, 10,000 nodes, that's 50,000 megabytes, and even if your replication cuts that by a factor of 10 or so, that's still a lot of data that can be stored in this botnet. Yes?
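The capacity arithmetic in that answer works out to 10,000 nodes times 5 MB, or 50,000 MB raw; dividing by the replication factor gives the usable capacity. A minimal sketch, noting that the ~5 MB per-node figure is a typical browser default before the user is prompted, not a guarantee:

```javascript
// Back-of-the-envelope botnet storage capacity: raw storage across all
// nodes, divided by how many copies of each block are kept.
function botnetCapacityMb(nodeCount, mbPerNode, replicationFactor) {
  const rawMb = nodeCount * mbPerNode;
  return rawMb / replicationFactor;
}
```

So at a replication factor of 10, a 10,000-node botnet holds roughly 5,000 MB (about 5 GB) of unique data.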
Would it be possible to set a timeout on the web storage to make the node-side blocks self-destruct after a certain amount of time? Yes, you can definitely add such a timeout. It's sort of a fail-safe, kill-switch type thing, where if the node cannot talk to the server within a certain number of seconds, it simply wipes the local storage in the browser, so that even if the nodes are recovered or seized, more work has to be done at least in order to access that data. What kind of transfer overhead is there in comparison to the file size, both on the server and the node end? So in terms of the actual algorithm, I don't know exactly as a percentage of file size, but it's basically JSON encoding, then AES encryption, and then it's just chopping it up into blocks. I mean, how much data is being sent back and forth? What kind of bandwidth are you using compared to the file size? So it's going to depend entirely on how much data you're storing and how much is stored in the browser. Those check queue commands are very small. That's a POST request with no data, asking, is there anything for me to do, and normally it's just getting back an empty array: there's nothing to do. The heartbeat command is what you saw up there on the screen, with the block IDs and the MD5 for each block, so there's a little bit more, but usually it's just getting back a 200 OK response. So it's pretty lightweight. The total amount of bandwidth is going to depend on your tuning parameters for how quickly you're checking that queue and how often you're sending the heartbeats. Those can all be tuned depending on how stable the particular nodes in this botnet are. Yes?
Do you have any way of protecting against, say, a malicious user who connects and sets their local storage to be persistent in their browser? I assume you have it set for something transitory and temporary, so it's not permanent with the domain once it's offline. So we do store it in local storage, meaning that it is going to be more persistent, and the reason for doing that is, say you have a browser with multiple tabs open and they're all going through that proxy: you want the user to be able to close tabs, move to other tabs, and have that data stay there, so you're not needing to replicate unnecessarily. It would be possible to use session storage, which expires sooner. But no matter what you're doing, if you have a deliberately poisoned botnet and that three letter agency is able to get a sufficiently large number of nodes, a sufficiently high percentage of nodes, then regardless of how you set it, if they're logging that traffic they'll be able to log those blocks. So it may provide a little bit of additional security, but not significantly so. Yes. Are there any inherent restrictions or reasons why you wouldn't have the clients connect to a series of failover servers in the event that your power goes out or your internet connection is dropped? So you could, however that configuration would need to be pushed down from the C2 server, and that gives that three letter agency multiple chances. If they seize that first server and everything goes offline, but replication is still being done through a second, third, fourth, fifth server, then once they do forensic analysis on the first server they'll see, well, we screwed up our chances with this one, but we know that we have to take different tactics and possibly poison the botnet, since it still exists and is being replicated through those other servers as well.
So again, it would definitely provide a higher availability guarantee, but it would provide a significantly reduced confidentiality guarantee at that point. Thank you. Yes. When you mentioned the legal questions outstanding, have you consulted legal counsel about that? I have not. I've got a card for you after. Sounds good. Yeah, I'd definitely be interested in exploring that side of things a little bit more. Do you have a sense, empirically, of what percentage of the file lives on the server at any given moment because of replication? Empirically, no. Theoretically, it depends on how quickly you need to replicate. The more stable your nodes are, the longer those nodes are online, the less often you're going to need to replicate. And it's that replication that causes the data to flow through the server again. Any time a file is uploaded, any time a file is rebuilt, and any time a block is replicated, that data is stored on the server with a timeout of 20 seconds. For a relatively fast botnet, where you have at least one node for each block that's going to reply much more quickly than that, you could probably tune that down to more like 5 or 10 seconds. But it's hard to say for sure, because it depends entirely on the makeup of that botnet. All right, I think we're done. Thank you very much.