Thanks for coming to my talk. This is Broken Brokers in Boxes: fuzzing breaks everything, even Erlang. It's the story of how I found three different vulnerabilities in open source message brokers earlier in 2021.

We're going to start with a little introduction, just talking about what a message broker is, why it's important, and a little about how it works. Then we'll review fuzzing: the different domains of fuzzing, how to make good test cases, and especially how this applies to network protocol fuzzing, which is what I used in this research. Then we'll talk about how software fails. There are a lot of different ways for software to fail, so we'll discuss some of them and their implications. I'll briefly touch on why Erlang is a good environment for message brokers and other highly available, highly concurrent service applications. Then I'll talk about how I took the targets of my fuzzing, these different message brokers, and put them in Docker containers as a way to help me find resource exhaustion problems. And finally, we'll sum up.

As I mentioned, the vulnerabilities I found are in message brokers, and the easiest way to explain a message broker is that it's like Slack for robots: a messaging system for all the different parts of a system. You can see in the picture here that, like Slack, it has a publisher-subscriber model. You can join a channel or a room, or, in message broker terms, a message queue, and you can publish messages to that queue. Various subscribers are able to subscribe to specific message queues, and every time somebody publishes a message into one, they get notified; they get the message. And of course, system components can be both publishers and subscribers. That's the fundamental function of a message broker.

Where would you use these? All sorts of places. One example: maybe you're an electric power utility and you have smart meters on thousands or tens of thousands of homes. Those might all be publishing to various message queues, and then you've got backend billing systems or monitoring systems that pull the messages from those queues and do whatever work they need to do. Likewise, you could imagine an Internet of Things home security system where the various door and window sensors publish messages or events into message queues, and those get picked up by other parts of the system. And on and on. You can see that a message broker can be a very central and very important part of a larger application system.

The vulnerabilities themselves: there are three of them, in three different message brokers. One is in RabbitMQ, which is a pretty popular open source message broker, one is in EMQX, and one is in VerneMQ. All of these I found using protocol fuzzing. The message passing works via network protocols, so for RabbitMQ I fuzzed the AMQP protocol, and for the other two, EMQX and VerneMQ, I fuzzed the MQTT protocol.
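To make the publish/subscribe model concrete, here's a minimal sketch of a publisher and a subscriber talking to a RabbitMQ broker, using the pika AMQP client for Python. This isn't code from the talk; the queue name and connection details are made up for illustration.

```python
import pika

# Connect to a RabbitMQ broker on the default AMQP port (5672).
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Publisher side: make sure the queue exists, then publish a message into it.
channel.queue_declare(queue="meter-readings")
channel.basic_publish(exchange="",
                      routing_key="meter-readings",
                      body=b"meter 42: 7.3 kWh")

# Subscriber side: get handed each message that lands in the queue.
def on_message(ch, method, properties, body):
    print("received:", body)

channel.basic_consume(queue="meter-readings",
                      on_message_callback=on_message,
                      auto_ack=True)
channel.start_consuming()  # blocks, delivering messages as they arrive
```

In a real system the publisher and the subscriber would be separate processes, often on different machines, with the broker sitting in the middle.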
So that's all well and good, but wouldn't it be fun to actually see something break? Let's go take a look at that. I actually have RabbitMQ running here inside a Docker container. It's gone through all of its initialization, it's ready for business, and it's listening for incoming AMQP network connections. I also have an exploit script, which we'll look at a little later. For now, I'm just going to run that script so you can see what it looks like when RabbitMQ fails. It's just a little Python script, so I'll run it here. And then, over here... that's it. That's the fireworks right there. That was RabbitMQ going down. You can actually see that this time we got a couple of messages about memory, and that is in fact the problem with this vulnerability: it chews up all the available memory, and the process gets shut down automatically. OK, I know it wasn't really like stuff blowing up, but this is the nature of application security: it doesn't look like much, but it's pretty important.

So, I told you that I used fuzz testing to locate these vulnerabilities. Let's talk a little about what fuzz testing is and all the different ways you can do it. HD Moore created this definition of fuzzing, which is pretty great. Basically, it says you're sending data that's intentionally malformed to an application, and if the application can't handle it, if it fails in some way, then you know you've found a vulnerability that might be exploitable. This definition implies a few things. First, it implies that somehow you're going to come up with this badly formed data, lots of badly formed data: somehow you're going to create test cases that you want to deliver. Second, you do actually have to deliver them to your target. And third, you have to know whether some failure occurred in the target software. So there are three steps in fuzzing: you figure out what the test cases are, you deliver them to the target, and you try to figure out if something broke, if something failed.

Interestingly, that's pretty open-ended, right? Fuzzing means you deliver badly formed input to something and see if it breaks, and there are actually lots of different layers at which you can do fuzz testing. You can fuzz arguments to individual functions; in fact, if you're familiar with libFuzzer, which is part of the LLVM ecosystem, there's support for this built right into that compiler ecosystem. You can fuzz command line arguments: call command line applications with weird and funky arguments and see what happens. For applications that process some kind of file, you can fuzz that and create various weird input files. Likewise, in web applications you can fuzz form parameters, the things getting filled into the form, as well as other parts of HTTP requests. APIs can be fuzzed; they take input, so you can create badly formed inputs for them. And then, of course, network protocols. Network protocols are just conversations between different pieces of software over a network. They have a certain form, they expect certain kinds of messages, and so you can fuzz that: you can create badly formed network protocol messages and deliver them to a target.
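Putting the three steps together for a network target like that, the skeleton of a protocol fuzzer might look something like this. This is a sketch of the idea, not any particular tool, and the generate step here is pure random bytes, which, as we're about to see, is the lowest-quality option.

```python
import random
import socket

def generate_test_case() -> bytes:
    # Step 1: create a test case. Pure random bytes, the crudest approach.
    return bytes(random.randrange(256) for _ in range(64))

def deliver(host: str, port: int, test_case: bytes) -> None:
    # Step 2: deliver it to the target over the network.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(test_case)

def target_alive(host: str, port: int) -> bool:
    # Step 3: crude failure detection -- will the target still talk to us?
    try:
        socket.create_connection((host, port), timeout=5).close()
        return True
    except OSError:
        return False

for i in range(100_000):
    case = generate_test_case()
    try:
        deliver("localhost", 5672, case)
    except OSError:
        pass  # errors during delivery can themselves signal a failure
    if not target_alive("localhost", 5672):
        print(f"target died on test case {i}; save it and investigate")
        break
```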
The part about creating the test cases has some subtlety to it. First of all, it's an infinite space problem: for any input, you can create an infinite number of badly formed variations. But nobody has infinite time for testing, so what you want to do with fuzzing is use a technique that gives you higher quality test cases. And what I mean by higher quality is that they look more like what they're supposed to look like, even though they're still deliberately malformed. The reason is this: there's a spectrum for test cases, from easy to ignore to looks legitimate.

If somebody said, OK, write a fuzzer, probably the first thing you would do is create completely randomized test cases. That's OK, but it doesn't work very well, because almost every test case you create looks nothing like what it's supposed to look like. The target software gets this randomized piece of data, this message, and it's very easy to ignore. It might look for a certain header, and most of the time it won't find it, because the data is randomized. So these test cases are very easy to ignore, and we'd say they have low quality, because they don't really get into the target software, go down different control pathways, and exercise it. If you're familiar with the infinite monkey theorem: with truly random test cases, just because of randomness, eventually you'll choose something that kind of looks the way it's supposed to look, but it's going to take a really long time to get there. So test case quality is low for random fuzzing.

The next step up from that is template fuzzing, or mutational fuzzing. Here you start out with a known good input or message, the template, and in order to create test cases, you introduce anomalies into it: you mutate it in various ways. It's better, because the test cases mostly look the way they're supposed to look, so you're going to go down some interesting control paths in your target software. But it has shortcomings too. If you're fuzzing network protocols, and the messages have things like length fields or checksum fields or session IDs or other semantically meaningful pieces, the mutational fuzzer doesn't know anything about that. It's just messing up a known good template input. So while the test case quality is significantly better than random, we can still do better.
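In code, that template idea is an upgraded generate step for the earlier skeleton: start from a good input and corrupt a few bytes. Notice that it has no idea which bytes are types, lengths, or checksums; that blindness is exactly the shortcoming just described. (The template value here is an arbitrary stand-in, not a real protocol message.)

```python
import random

# A known good message, captured from a real conversation (stand-in value).
TEMPLATE = b"\x00\x01\x00\x00example-good-message"

def generate_test_case() -> bytes:
    # Mutational fuzzing: start from a good input and corrupt a few bytes.
    data = bytearray(TEMPLATE)
    for _ in range(random.randint(1, 3)):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)
```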
Better than that is what we call generational fuzzing, or model-based fuzzing. Here the fuzzing tool understands the data it's creating test cases for. If it's a file format fuzzer, it knows what the file should look like: a JPEG fuzzer knows the JPEG specification, how the file should look, what the different structures are, and what the fields inside mean. Likewise, a generational network protocol fuzzer knows what each message should look like and what each field means. That allows it to systematically break every rule and create test cases that are very close to correct, which means they'll go down lots of control pathways, but they're still deliberately malformed, which helps uncover the vulnerabilities, the bugs, in the target software. I have a surrealist painting on this slide because that's my analogy for generational fuzzing: things that look familiar, put together in unfamiliar ways, and they kind of mess with your mind.

You've probably also heard of coverage-guided mutational fuzzing; American Fuzzy Lop, or AFL, is the most famous fuzzer of this kind. It's an enhanced version of template fuzzing where you mutate a known good input to create test cases, but every time you deliver a test case, you examine the control paths that got executed in the target binary, and you use that information to guide how you do the mutation for subsequent test cases. For example, every time AFL reaches a new control pathway, it takes a note and uses that input as a starting point for further mutations. The general rule is that if you're fuzzing, you want high-quality test cases, to give you the best chance of finding bugs in the testing time you have available.

So, specifically in the arena of network protocol fuzzing: what are network protocols? They're really just conversations. This is a Wireshark capture of the AMQP protocol. You can see the conversation starts when somebody connects to the message broker and sends this protocol header message, just saying: OK, here I am, I'm speaking AMQP, and here's the version of the protocol that I use. And there's the response to that. After that, you can send other messages, like open or begin and so forth. The point of this is that if you're going to do network protocol fuzzing right, you have to understand the whole conversation. If you want to deliver fuzzed test cases of the open message, then in order to deliver those effectively, you first have to open up the connection, send the protocol header, get the protocol header response, and then send your test case, your anomalized open message. That's one tricky thing about fuzzing network protocols. Likewise, if there's a session ID or something like that, you want your fuzzer to be able to handle it correctly, so that it can get the conversation to the point where you actually deliver the badly formed message, the test case.

Another place where generational fuzzing really excels is with TLV structures. TLV stands for type-length-value, and it's a common way of encoding data in messages, protocols, and file formats. What we're looking at here is a capture of an AMQP open message. Remember, this is midway through the conversation: the endpoints have already exchanged the protocol header messages, and here's an open message. The highlighted blue part is the actual AMQP message; the rest is the TCP and IP headers and so forth. I'm just going to highlight a couple of TLV structures for you in here. If you look at the ASCII dump, you can see there are some human-readable words and codes. The first one is right here: it's an identifying name, which here is mybroker. In order to encode that into this protocol, we give it a type: we're saying, OK, coming up here is a string, which is type 0xA1. Then there's a length that says this string is eight octets, eight bytes, long. And then there's the value, which is the actual ASCII of the string. So that's what a TLV structure looks like. And if you think about fuzzing a TLV structure, a mutational fuzzer doesn't know anything about what the types are or what the length means. But a generational fuzzer understands that.
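Here's a sketch of what it means for a generational fuzzer to understand a field like that. Because it builds the TLV structure itself, it can emit a long value with a correct length, make the length lie about the value, or swap in a different type code, all while keeping the rest of the message well formed. The function and names here are mine, not from any particular fuzzer; 0xA1 is the string type code from the capture above.

```python
import struct

def encode_tlv(type_code: int, value: bytes, claimed_len: int | None = None) -> bytes:
    """Build a type/length/value field; claimed_len lets us lie about the length."""
    length = len(value) if claimed_len is None else claimed_len
    return struct.pack("BB", type_code, length) + value

# Well formed: type 0xA1 (string), length 8, value "mybroker".
good = encode_tlv(0xA1, b"mybroker")

# Anomalies a generational fuzzer can construct deliberately:
long_value = encode_tlv(0xA1, b"A" * 200)                    # long, but consistent
length_lie = encode_tlv(0xA1, b"mybroker", claimed_len=255)  # length disagrees
wrong_type = encode_tlv(0x40, b"mybroker")                   # unexpected type code
```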
So if, for example, the fuzzer wanted to try out a very long value, it could adjust the length appropriately, so that the message would still look pretty correct to the target software but would exercise the length of whatever buffer the value was going to get read into.

Here's another TLV-type structure. You can see en_US here and fi_FI; these are locale designators, a language and a location specifier. Here again there's a type, 0xA3, whatever that means, and then a length of five and the value, this one being the fi_FI. And in this one there's actually another length and another value: another length of five, and the en_US. So that's this whole structure, here and down here. Those are TLV structures, and again, if you think about creating fuzz test cases, you can see that a generational fuzzer would really understand how these are structured and what they should look like, and would be able to systematically break the rules.

Now, this one's fun. On the top is that same AMQP open message, and on the bottom is one of the test cases. It's anomalized, and this is the one that can actually break RabbitMQ. It's like the spot-the-difference puzzle in the newspaper comics, but here, I'm just going to show you. The only difference between these two messages, where the top one is valid and the bottom one breaks affected versions of RabbitMQ, is that my fuzzer has changed one of the type codes in a TLV structure: instead of an 0xA3, we've specified a 0x40. And that's enough to cause RabbitMQ, as it attempts to deal with this message, to eat up all the available memory and get killed.
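Expressed as code, the entire anomaly is a one-byte edit of a valid message. The captured open frame itself isn't reproduced in this transcript, so this is just the shape of it; the offset of the type byte depends on the particular frame.

```python
def anomalize(valid_open_frame: bytes, type_byte_offset: int) -> bytes:
    # Reproduce the fuzzer's trick: overwrite a single TLV type code.
    frame = bytearray(valid_open_frame)
    assert frame[type_byte_offset] == 0xA3  # the type code in the valid frame
    frame[type_byte_offset] = 0x40          # the unexpected type that breaks it
    return bytes(frame)
```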
So, I showed you the demo: I delivered the exploit, the bad open message, and you saw RabbitMQ die and drop back to the command line. What exactly happened there?

Traditionally in cybersecurity, we've been really focused on buffer overflows and process crashes. If you start getting interested in cybersecurity, everyone says, oh, go read that paper, Smashing the Stack for Fun and Profit, and you go read it, and it's a good read. But that's really only one kind of bug and one kind of failure. Historically we've focused on it partly because it's so easy to mess up buffers when you're coding in C, and also because, if the conditions are just right, an attacker can supply an input that contains their own code, and that code ends up getting executed. That's called remote code execution, and it's super cool, but it's only one kind of failure. These types of crashes are very much about reading or writing memory that doesn't belong to you: you get a segmentation fault, the process crashes, and it's very obvious that a failure has occurred. But there are other types of failures, such as resource exhaustion, and we'll talk a little about those.

Then Java came to prominence, and one of the big things about Java was that you couldn't have a buffer overrun, or at least you wouldn't have a crash from one. If you tried to write past the end of a buffer, instead of your application crashing, it would throw an ArrayIndexOutOfBoundsException. There was no process crash, and there was no exploit mechanism by which you could supply your own code in an input and have it get run. So the programming environment, the runtime environment, of an application makes a difference, and it's important. One of the reasons Java avoids these kinds of failures is its virtual machine architecture: another layer of insulation between your application and the metal, the processor, which makes certain kinds of failures much harder to cause.

But even so, there are plenty of other ways that software can fail, as we'll see. A crash is an obvious mode of failure. A kernel panic is an obvious mode of failure. But there are all these other ways, too. You can accidentally leak information that you shouldn't be leaking, which is what Heartbleed was. You can end up in an infinite loop, so that your application becomes unresponsive. You can eat up memory or processing power or disk space inappropriately in some kind of loop. You can have data in the application get corrupted somehow, which you usually don't notice until later; if you have a database that ends up getting written with garbage, that would be a problem. And sometimes things just stop working that should still be working: that's broken functionality.

The things that cause these failures are always some mechanism in the code; basically, developers make mistakes because they're human. For example, they might not check that data doesn't get written past the end of a buffer, which would be a buffer overflow. They might not validate the input they're getting correctly. They might handle memory incorrectly, and so on and so forth. And then in another bucket, we've got the consequences of these failures. A failure might mean that you've lost money. It might mean that you don't have the computing resources you thought you had. And for cyber-physical systems, it could mean destruction or death as well. The real kicker is that there's a completely arbitrary, completely unpredictable relationship between the mistakes developers make in code, the ways the software fails, and the consequences of those failures. You just cannot call it. So all you can really do is test better while you're making applications, and try to make sure the software doesn't fail, or fails only under minimal sets of circumstances. That's one of the great things about fuzzing: you're bombarding an application with badly formed inputs to see when things go wrong, when it fails, and once you know that, you can go fix it and make it better. It can make your application more robust and more secure.

So we've talked a little about failures in C and Java. Erlang is an interesting environment, and to be honest, not something I had really encountered until I started this research with the message brokers. Think about building an application that responds to a network protocol: you're getting these messages in, you're trying to parse them and figure out what they are and what to do with them. The normal model of programming is that you basically try to think of every possible thing that could go wrong and write code for it. You say, oh, well, what if I get a type that's not a type I recognize? What do I do then? Or what if I only receive half a message? What do I do then? And so on and so forth. Erlang inverts that model with what's called a fail-fast philosophy. The idea with Erlang is that you write code to match the way things should happen, and when they don't, you just let that code die. Then there are supervisors in your application that, when things fail, recreate the appropriate process to try again, or skip over the failure, or whatever is appropriate.
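Very loosely transliterated into Python, the shape of that idea looks like this. Erlang's real supervisors are much richer, with restart strategies and whole supervision trees, so treat this as a cartoon of the philosophy rather than how Erlang actually works.

```python
def supervise(worker, max_restarts: int = 5) -> None:
    # Fail fast: the worker only handles the cases that should happen,
    # and anything unexpected kills it. The supervisor restarts it.
    for attempt in range(1, max_restarts + 1):
        try:
            worker()
            return  # worker finished normally
        except Exception as exc:
            print(f"worker died ({exc!r}); restart {attempt} of {max_restarts}")
    print("too many failures; giving up")
```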
But the point is that this is an inversion of the traditional way of doing network protocol parsing, or file parsing, or dealing with any kind of input. And the reason you would do this is that parsing is always hard. If you look at most of the vulnerabilities in the world, they have to do with an application not responding correctly to some input that wasn't quite right, and so on and so forth. Part of the challenge is that when developers write parsing code, they have a specification in front of them of what the data should look like, so they're expecting it to look like that, and they code, I would say, optimistically, to parse the data they're getting in the expected format. But of course, input is never to be trusted, so when you're parsing input data or validating it, you have to be especially careful and code defensively. Erlang's fail-fast philosophy helps invert this model, and that makes it good for applications like message brokers. And finally, like Java, Erlang has a virtual machine architecture: your application code isn't actually running on the processor, it's running on a virtual machine on top of the processor. That means certain types of application-layer failures are nearly impossible, so you're very unlikely to have, for example, a remote-code-execution type exploit.

Despite that, though, think about other failure modes. You saw in the demo: I sent a test case, and the application went down. The virtual machine went down. Obviously that's bad; it's just a different kind of failure. At the very least, it's a denial of service: if an attacker can run my little Python script and cause a message broker to die, then obviously that's a problem.

So the last topic I want to talk about is using Docker containers for fuzz-test targets, because this turned out to be a really good way to see problems with resources like memory. Think about how you would do this normally. If you had an application like RabbitMQ, with this vulnerability where it eats up all the available memory, normally you might run RabbitMQ on its own computer in your lab, or maybe on a virtual machine. But it would be hard to see a memory exhaustion bug, because by default, operating systems give you as much RAM as they can, and they also give you swap space on disk: when the RAM fills up, they take unused parts of it and write them out to disk, using the disk as additional RAM. So if you do have an application that's consuming all the available memory, it's not going to crash immediately. It eats up all the available RAM, and then the operating system thrashes around for a while, swapping memory out to disk and back, trying to give the application as much memory as it possibly can. Eventually, in Linux at least, there's this thing called the OOM killer, the out-of-memory killer: if a process is taking too much memory, the operating system's OOM killer will shut that process down. But in the meantime there's all this swapping going on, the system becomes less and less responsive, and it just thrashes for a long time.

Putting fuzz targets in containers makes it much easier to constrain the environment. With a Docker container, it's very easy, when you run it, to say: this container only has this much memory, and no, I'm not going to give it any swap. That means that when you're fuzzing, if the process eats up all the available memory, it fails quickly, because it's not trying to do any swapping to disk or any of that nonsense. And of course, containers provide all sorts of other good benefits. You can put the target inside a virtual network, which is probably a good idea when you're fuzzing, so that you're not interacting with other things, and you can limit disk space and other resources as well. That's what's great about containers: they allow you to define this box inside which your targets run. In addition, just having a Dockerfile, a very repeatable way of creating a container image, is really nice for fuzz testing, because it gives you a very reliable, very repeatable starting point. And reliable and repeatable are great words in testing.
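The run script itself isn't reproduced in this transcript, but from the walkthrough that follows, the invocation it performs boils down to something like this. The image name matches the example described below; the exact flags are my reconstruction.

```bash
# Box the fuzz target in: 512 MB of RAM, no swap, AMQP port exposed.
# Setting --memory-swap equal to --memory means the container gets no swap,
# so a memory exhaustion bug fails fast instead of thrashing.
docker run -it --rm \
    -p 5672:5672 \
    --memory=512m \
    --memory-swap=512m \
    rabbitmq-prebuilt-box
```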
This is an actual example of how I've used containers for fuzz-testing targets; this one is for the older, vulnerable version of RabbitMQ. I use four different files for each software application that I want to test. One of them just defines the image name; here it's rabbitmq-prebuilt-box. Then I've got a build script, which pulls in the image name and just does a docker build, like you would do for anything. The docker build, of course, uses this Dockerfile, and this one's really simple: I'm pulling a specific version of RabbitMQ from Docker Hub, and I'm also enabling the AMQP network protocol, because that's what I used for the fuzz testing.

The run script has most of the magic. We do a docker run with -it, which means interactive, with a terminal, and --rm, so that when this container is finished, it's automatically removed. I expose the AMQP port, 5672, so that my fuzzing tool can interact with this application using AMQP. And then this part is the real magic related to memory: I say, OK, the available memory for this container is half a gig of RAM, and then I specify a memory-swap value that is the same size, which actually means there will be no swap. So I'm giving it half a gig and no swap, and that means that as I'm fuzzing, if I'm causing memory problems, they show up pretty quickly. This Dockerfile is super simple, but I've also done variations where the Dockerfile actually builds the application from its source, and of course those are a little more complicated. But like I said before, the repeatability of building these things, assembling them the same way every time, is really, really useful.

Before we get to the summary, I just want to go back and show you what the exploit script looks like. So here it is. We were talking about fuzzing network protocols, and you can see here that we have a header string, which is basically the ASCII of AMQP and then a version number. In order to deliver the badly formed open message, we first make a TCP connection to the target, then we send that protocol header message, then we receive the protocol header from the message broker, and then we actually send the badly formed open message. That's what the exploit looks like.
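Reconstructed from that description, the script has roughly this shape. The anomalized open frame itself isn't reproduced in the transcript, so it's a placeholder below, and the protocol header bytes are my reading of the AMQP 1.0 spec ("AMQP" in ASCII, a protocol id of zero, then version 1.0.0), so treat this as a sketch of the exploit rather than the exploit itself.

```python
import socket

HOST, PORT = "localhost", 5672

# "AMQP" in ASCII, then a protocol id and version number.
PROTOCOL_HEADER = b"AMQP\x00\x01\x00\x00"

# Placeholder: the anomalized open message produced by the fuzzer.
BAD_OPEN = b""

with socket.create_connection((HOST, PORT)) as s:
    s.sendall(PROTOCOL_HEADER)             # say hello in AMQP
    print("broker header:", s.recv(4096))  # broker answers with its own header
    s.sendall(BAD_OPEN)                    # then deliver the badly formed open
```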
And this is really just a simplified version of what the fuzzer did in the first place; I just wanted to show you that. So let's wrap up.

Thank you for listening. I hope you enjoyed the talk. I had a lot of fun doing this research, and I was very excited to get these vulnerabilities acknowledged and fixed. To sum up: I think all failures are important. Traditionally, we've focused on process crashes, and buffer overflows especially, but it's important to remember that cybersecurity is about confidentiality, integrity, and availability, and damage to any one of those means you have a cybersecurity problem. In the case of the vulnerability I just showed you in RabbitMQ, it's an availability problem: an attacker can fairly easily render the message broker unavailable for anyone who's using it.

Another takeaway is that applications, even in, you know, air-quotes safe environments like Java or Erlang, still fail. You can structure things differently, you can rule out certain kinds of failures, and you can make it almost impossible for developers to make certain kinds of mistakes. But software is software, and no matter where it's running, you can still have vulnerabilities, you can still have robustness problems, you can still have security problems.

And finally, there's the idea that putting fuzz-testing targets inside containers makes it easy to constrain their environment, and makes it easy to do things consistently and get consistent results.

So that's the end of my talk. I hope you enjoyed it, and I hope you enjoy the rest of the show. Thank you.